DALL·E 2 is a new AI system that can create realistic images and art from natural-language descriptions, and it is poised to disrupt the creative industry.

According to OpenAI: “DALL·E 2 can create original, realistic images and art from text descriptions. It can combine concepts, attributes, and styles. It can expand images beyond what is in the original canvas, creating expansive new compositions. In addition, DALL·E 2 can make realistic edits to existing pictures from a natural language caption.

It can add and remove elements while considering shadows, reflections, and textures. In addition, it can take an image and create different variations of it inspired by the original. DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called “diffusion,” which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.”
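
The “diffusion” process described above can be pictured with a toy sketch. The snippet below is not OpenAI’s model (the real system learns the denoising step with a large neural network conditioned on the text prompt); it is a minimal illustration, assuming a hypothetical denoise_step helper and a random stand-in target, of how a pattern of random dots is gradually nudged toward an image.

```python
import numpy as np

# Toy sketch of the "diffusion" idea: start from pure noise and repeatedly
# nudge the pattern toward a target image. This is NOT OpenAI's model --
# the real system learns the denoising step with a large neural network
# conditioned on the text prompt; here a hypothetical `denoise_step` just
# moves a fraction of the way toward a fixed stand-in target.

rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))          # stand-in for "the image the model recognizes"
image = rng.standard_normal((64, 64, 3))  # begin with a pattern of random dots (noise)

def denoise_step(current, guess, strength=0.05):
    """Move the noisy pattern slightly toward the model's current guess."""
    return current + strength * (guess - current)

for step in range(200):                   # gradually alter the pattern over many steps
    image = denoise_step(image, target)

print("remaining distance to target:", float(np.abs(image - target).mean()))
```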

In January 2021, OpenAI introduced a neural network called DALL·E, which creates images from text captions for a wide range of concepts expressible in natural language. DALL·E is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions using a dataset of text–image pairs. OpenAI found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.

GPT-3 (Generative Pre-trained Transformer 3) showed that language could be used to instruct a large neural network to perform a variety of text-generation tasks, and Image GPT showed that the same type of neural network could also generate high-fidelity images. OpenAI extended these findings to show that manipulating visual concepts through language is now within reach. One year later, in April 2022, OpenAI announced its newest system, DALL·E 2, which generates more realistic and accurate images at 4x greater resolution and can combine concepts, attributes, and styles.

DALL·E was developed and announced alongside Contrastive Language-Image Pre-training (CLIP), a separate model trained on 400 million image–caption pairs scraped from the Internet and capable of zero-shot classification. Its role is to “understand and rank” DALL·E’s output: given an image, it predicts which caption is most appropriate from a list of 32,768 captions randomly sampled from the dataset (one of which is the correct answer). This model filters a larger initial set of images generated by DALL·E down to the most appropriate outputs. DALL·E 2, in turn, uses a diffusion decoder conditioned on CLIP image embeddings, which a prior model produces from CLIP text embeddings.
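
As a rough sketch of the ranking idea, the snippet below uses OpenAI’s open-source CLIP package (installable via pip from the openai/CLIP GitHub repository) together with PyTorch to score a handful of candidate captions against an image. It illustrates CLIP-style caption ranking, not the internal reranker used on DALL·E’s outputs; “photo.jpg” and the candidate captions are hypothetical placeholders.

```python
# Sketch of CLIP-style caption ranking using OpenAI's open-source `clip`
# package (pip install git+https://github.com/openai/CLIP.git) and PyTorch.
# Not the internal reranker that filtered DALL-E's outputs; the image path
# and captions below are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
captions = ["an astronaut riding a horse", "a bowl of soup", "a corgi playing a trumpet"]
text = clip.tokenize(captions).to(device)

with torch.no_grad():
    # CLIP scores how well each caption matches the image; higher is better.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for caption, p in sorted(zip(captions, probs.tolist()), key=lambda x: -x[1]):
    print(f"{p:.3f}  {caption}")
```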

Microsoft is investing significantly in DALL·E 2, OpenAI’s AI-powered system that generates images from text, by bringing it to first-party apps and services. At its Ignite conference, Microsoft announced the integration of DALL·E 2 with the newly launched Microsoft Designer app and the Image Creator tool in Bing and Microsoft Edge.

With the advent of DALL·E 2 and open-source alternatives like Stable Diffusion, AI image generators have exploded in popularity in recent years. In September, OpenAI said that more than 1.5 million users, including artists, creative directors, and authors, were actively creating over 2 million images a day with DALL·E 2. Brands such as Stitch Fix, Nestlé, and Heinz have piloted DALL·E 2 for ad campaigns and other commercial use cases, and some architectural firms have used DALL·E 2 and similar tools to conceptualize new buildings.

Seeking to bring OpenAI’s tech to an even wider audience, Microsoft is launching Designer, a Canva-like web app that can generate designs for presentations, posters, digital postcards, invitations, graphics, and more to share on social media and other channels. Designer combines user-created content with DALL·E 2 to ideate designs, offering drop-downs and text boxes for further customization and personalization. Within Designer, users can choose from various templates with predefined dimensions for platforms like Instagram, LinkedIn, Facebook ads, and Instagram Stories. Prebuilt templates are available from the web, as are shapes, photos, icons, and headings that can be added to projects.

Another new Microsoft-developed app underpinned by DALL·E 2 is Image Creator, heading to Bing and Edge in the coming weeks. As the name implies, Image Creator (accessed via the Bing Images tab, bing.com/create, or the Image Creator icon in the Edge sidebar) generates art from a text prompt by funnelling requests to DALL·E 2, acting as a front-end client for OpenAI’s still-in-beta DALL·E 2 service. Typing in a description of something, any additional context such as location or activity, and an art style will yield an image from Image Creator. Microsoft says Image Creator “will soon create images that don’t yet exist, limited only by your imagination.” Unlike Designer, Image Creator in Bing and Edge will be completely free to use, but Microsoft, wary of potential abuse and misuse, says it will take a “measured approach” to rolling out the app.
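
Image Creator itself is a consumer front end rather than a public API, but the kind of request it funnels to OpenAI can be approximated with a direct call to the DALL·E 2 image-generation endpoint. The sketch below assumes the v0.x-era openai Python library and an OPENAI_API_KEY environment variable; the prompt and settings are illustrative only.

```python
# Sketch of a direct call to OpenAI's DALL-E 2 image-generation endpoint,
# the kind of request Image Creator ultimately funnels to the service.
# Uses the `openai` Python library (v0.x-era API); OPENAI_API_KEY must be
# set in the environment. The prompt is just an example.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Image.create(
    prompt="a lighthouse at sunset, watercolor style",  # description + context + art style
    n=1,                     # number of images to generate
    size="1024x1024",        # DALL-E 2 supports 256x256, 512x512, and 1024x1024
)

print(response["data"][0]["url"])  # URL of the generated image
```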

To address ethical concerns, OpenAI has developed safety mitigations that it continues to improve, including:

a. Preventing Harmful Generations: OpenAI has limited DALL·E 2’s ability to generate violent, hateful, or adult images. It minimized DALL·E 2’s exposure to these concepts by removing the most explicit content from the training data, and it used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.

b. Curbing Misuse: OpenAI’s content policy does not allow users to generate violent, adult, or political content, among other categories. In addition, the system won’t generate images if its filters identify text prompts or image uploads that may violate these policies (a sketch of this kind of automated prompt screening follows the list below). OpenAI also has automated and human monitoring systems to guard against misuse.

c. Phased Deployment Based on Learning: Learning from real-world use is vital in developing and deploying AI responsibly. OpenAI began by previewing DALL·E 2 to a limited number of trusted users. Then, they slowly added more users as they learned more about the technology’s capabilities and limitations and gained confidence in their safety systems. Finally, they made DALL·E 2 available in beta in July 2022.
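
OpenAI has not published the exact filters behind DALL·E 2, but its public Moderation endpoint illustrates the kind of automated prompt screening described in (b) above. The sketch below is an assumption-laden illustration, not OpenAI’s internal system: it uses the v0.x-era openai Python library and a hypothetical prompt_allowed helper, and the example prompt is made up.

```python
# Illustration of automated prompt screening, NOT OpenAI's internal DALL-E 2
# filters: the public Moderation endpoint flags text that may violate policy.
# Requires the `openai` package (v0.x-era API) and OPENAI_API_KEY in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def prompt_allowed(prompt: str) -> bool:
    """Return False if the moderation model flags the prompt."""
    result = openai.Moderation.create(input=prompt)
    return not result["results"][0]["flagged"]

if prompt_allowed("a watercolor painting of a quiet harbor at dawn"):
    print("Prompt passed screening; it can be sent to the image generator.")
else:
    print("Prompt blocked by the content filter.")
```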

To summarize, DALL·E 2, if successful, will empower people to express themselves creatively. It will also help OpenAI understand how advanced AI systems see and understand the human world, which is critical to its mission of creating AI that benefits humanity. Moreover, it could become a game-changer for organizations that find it challenging to develop and retain experts in this field. At the same time, this advancement in technology is a threat to people in the creative field, as AI could replace humans!