DALL-E 2

[Image: a woman with a big smile and perfect teeth, wearing a Mardi Gras mask]

To say I have been fascinated with text-to-image AI would be a major understatement. The image above is of a woman who does not exist, wearing a “mask” and other Mardi Gras paraphernalia.

What Is DALL-E?

DALL-E is an AI model developed by OpenAI that generates unique images from textual descriptions. The model is trained on a massive dataset of images and texts, learning to understand and generate a diverse range of image styles and content. When given a textual description as input, the model generates an image that corresponds to the description. The generated images can be highly imaginative and combine features from multiple real-world objects or scenes to create novel and creative outputs.

How Does It Work?

The original DALL-E is built on a transformer-based architecture, similar to those used in language models like GPT-3. It is trained on a massive dataset of images paired with textual descriptions, learning to map textual descriptions to visual representations.

When given a textual description as input, the original DALL-E encodes the text as a sequence of tokens and then predicts a sequence of image tokens one at a time, which a separate decoder turns into pixels. During training, the predicted tokens are compared with the tokens of the real image paired with that caption, and the difference is used to update the model’s parameters through backpropagation and gradient descent. DALL-E 2 takes a different route: it maps the caption to a CLIP image embedding and then uses a diffusion model to generate the picture from that embedding.

The model is trained in a generative fashion, meaning it learns to generate images from scratch, rather than modifying existing images. This allows DALL-E to generate highly imaginative and creative images, combining features from multiple real-world objects or scenes to create novel outputs.
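
To make the training idea concrete (generate something, compare it with the real image, nudge the parameters), here is a minimal sketch in plain Python. It is a toy, not DALL-E's architecture: the "model" is a single linear layer, the three-number caption vector and four-pixel image are made-up values, the loss is a simple mean squared error, and names such as `predict`, `train_step`, and `train` are my own for illustration.

```python
import random

def predict(weights, text_vec):
    """Toy 'generator': one linear layer mapping a text vector to pixel values."""
    return [sum(w * x for w, x in zip(row, text_vec)) for row in weights]

def mse(pred, target):
    """Mean squared difference between generated and ground-truth pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def train_step(weights, text_vec, target, lr=0.1):
    """One gradient-descent update; returns the loss measured before the update."""
    pred = predict(weights, text_vec)
    loss = mse(pred, target)
    n = len(target)
    for i, row in enumerate(weights):
        # Gradient of the loss with respect to output pixel i.
        grad_out = 2.0 * (pred[i] - target[i]) / n
        for j in range(len(row)):
            # Chain rule: the output pixel depends on w[i][j] through text_vec[j].
            row[j] -= lr * grad_out * text_vec[j]
    return loss

def train(steps=200, seed=0):
    """Fit the toy model to a single (caption, image) pair."""
    rng = random.Random(seed)
    text_vec = [1.0, 0.5, -0.5]       # hypothetical encoded caption
    target = [0.2, 0.8, 0.4, 0.6]     # hypothetical 4-pixel "image"
    weights = [[rng.uniform(-0.1, 0.1) for _ in text_vec] for _ in target]
    losses = [train_step(weights, text_vec, target) for _ in range(steps)]
    return weights, text_vec, target, losses
```

Calling `train()` returns the learned weights along with the loss history. The loss shrinks at every step, which is the whole point: each update moves the generated pixels a little closer to the real ones. Real systems do the same thing with billions of parameters and millions of image and caption pairs.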

Some Examples

I came up with a project called The State of AI, in which I use AI to generate 12 images of a state for a calendar. Here are some examples, with the prompts I used as captions:

December in Hawaii: Christmas lights on Kalakaua Avenue, Oahu: “4k Ultra high details, oil painting style, brush strokes, vivid colors”

These are all very stylized, since that is the premise of the calendar series. Each one is also the final image I chose after working through setting tweaks and variations. The results are very cool, but one place where I find DALL-E really shines is realism.

“a photograph from the 1970s of a teenage kid leaning against his new 1971 Datsun 240Z, realistic face and body”

At a glance, this image looks right. Closer inspection reveals flaws, but this technology is in its infancy. Also, the prompt has not been refined at all: just an idea and then an image.

I do a lot of 3D printing. One of the things I find most amazing in that world is that I can dream up something, sit down with Fusion 360, draw it up, export it, slice it, and print it. From idea to something in my hand in a few hours. Text-to-image has me excited in the same way. If I need an image, I just describe it, tweak some parameters, refine the options, and I have an image.

I was going to go into some of the other AI text-to-image generators, such as Midjourney and Stable Diffusion, but I think I will cover each one individually. They all have different strengths.

Here are some other images and their prompts: