This post is part of a series exploring multimodal generative AI (GAI) technologies from text, to image, to audio and video generation. Check out some of the other posts in this series:
- Hands on with Adobe Firefly: Finally an image generator that can be used in school
- Hands on with AI audio generation: GAI voice, music, and sound effects
- Hands on with chat + search: connecting generative AI to the internet
- Hands on with Bing Image Creator: Microsoft’s image generator just got serious
DALL-E 3 in ChatGPT: The basics
If you’re using ChatGPT with a Plus subscription I’m going to assume that you’re fairly familiar with image generation prompting (there’s really no reason to pay for Plus unless, like me, you use it daily for work or study: just use Microsoft Bing for free). Standard advice applies: give as much detail as possible if you want specific images, including information on the style, form, colours, and so on.
With that in mind, here are some of the basics for DALL-E 3 in ChatGPT which differ from other image generation platforms.
First of all, you have to turn it on. ChatGPT’s various modes (Default, Bing search, Advanced Data Analysis, plugins, and DALL-E 3) all run separately and must be selected from the drop-down menu before prompting.
Once you’ve enabled it, it will stay active for the duration of the chat and any subsequent chats until you either change it or log out. It’s worth pointing out that when you’re using DALL-E 3, you don’t (currently) have internet access as well. The model’s training data extends to January 2022.
As I said, it’s pretty standard fare as far as image generation is concerned. If you want a cat, ask for a cat:
Unlike Adobe Firefly or Midjourney, there are no switches, toggles, or controls to set the aspect ratio or style, generate variations, and so on: you do all of that in the context of the chat thread.
This is where working with DALL-E 3 diverges from other platforms. The dialogic nature of this model means that rather than crafting the “perfect” prompt you can start bouncing ideas back and forth, refining and testing them. More on that later.
You have three choices of aspect ratio: “wide” (1792×1024), “tall” (1024×1792) and “square” (1024×1024). Square is the default, and you can change it by prompting for a ratio directly or letting the model infer which one you want. For example, typing “wide” or including detail like “website header image” will both result in a landscape ratio.
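As an aside for anyone who wants to script this rather than chat: the same three sizes are exposed as the `size` parameter of OpenAI’s Images API when using the `dall-e-3` model. Here’s a minimal sketch; the keyword-to-size mapping, the fallback to square, and the example prompt are my own assumptions, not part of ChatGPT itself:

```python
import os

# The three aspect ratios DALL-E 3 supports, expressed as the API's size strings.
ASPECT_SIZES = {
    "square": "1024x1024",  # default
    "wide": "1792x1024",
    "tall": "1024x1792",
}

def dalle3_size(aspect: str) -> str:
    """Map a plain-English aspect keyword to the API's size string.

    Unknown keywords fall back to square, mirroring ChatGPT's default.
    """
    return ASPECT_SIZES.get(aspect, ASPECT_SIZES["square"])

if __name__ == "__main__":
    # Only attempt a real generation if an API key is configured.
    if os.environ.get("OPENAI_API_KEY"):
        from openai import OpenAI

        client = OpenAI()
        result = client.images.generate(
            model="dall-e-3",
            prompt="minimalist black and white fruit basket logo",  # hypothetical prompt
            size=dalle3_size("wide"),
            n=1,  # dall-e-3 accepts only one image per request
        )
        print(result.data[0].url)
```

In the chat interface none of this is visible, of course; the model simply infers the size from your wording.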
Refining and testing ideas in DALL-E 3
With most image generation, you’re either going in cold and experimenting, or you know exactly what you want and you have to create detailed prompts to bring that to life.
With ChatGPT and DALL-E 3, you can have the best of both worlds, experimenting and refining as you go. You can also do this with Microsoft Bing Chat, though I’ve noticed it’s a little slower and not as good at referencing its own images.
The biggest thing here for me is the workflow. You can go from a text-based chat to variations of images in a single chat thread, like the one that follows. Beginning with a prompt for some mock marketing text for a farmers market, I’ve used ChatGPT to generate a flyer, and then some variations in different styles, then a specific style, and finally some mock photos of merchandise.
> Generate four alternatives in different styles: one retro 1950s, one cyberpunk, one impressionist rural charming, one of your choice and explain your choice.
After generating the four images, I decided to go with the wildcard. ChatGPT generated a “minimalist” black and white version. I created a few more variations on that one, and then prompted for a text-free black on white image.
At this point, the model started to veer off a little. It’s better than most at staying true to the prompt, but not perfect. Text generation in images is also much better than in other models, but again, far from accurate every time. Some of this is down to issues with replicating text, and some of it is the way it handles the prompts. As you can see in the images below, the first attempt ignored the instruction to generate a flat image and put it on a mock-up of a notebook.
After trying again, I got the image on the right with the outline of the fruit basket and no text. Still not quite on a white background, but close enough. From here, I’d recommend taking this image into an editor such as Adobe Photoshop if you want to add text.
Finally, I’ve taken the “idea” of the minimalist design and added it to some photos. I probably could have gotten a little more control here with more careful prompting: you couldn’t use this just yet for actual branding as it’s obviously created new “logos” in each image, and added colour to our previously black and white minimalist design. Still, not bad if you’re just doing some quick brainstorming.
Generating image prompts using data, descriptions, and other input
The biggest advantage of DALL-E 3 in ChatGPT is that you can use other sources (other than your mind, I mean) to generate the images. This means that you can use other models like ChatGPT Advanced Data Analysis, websites with Bing search enabled, or just copy in chunks of text from elsewhere for inspiration.
For example, if I upload a mock financial statement into Advanced Data Analysis and get it to generate a “story” from the data, I can generate some images in the DALL-E model. Not particularly useful for generating graphs and charts, but if you wanted to turn data like this into an image for a slide deck or website then it could create some interesting graphics.
For the next one, I copied and pasted the text from my About page directly into the DALL-E 3 model and asked for some illustrations. The one that stood out was the image below, which includes someone (me, but with hair…) standing on a conference stage talking to a crowd. You can see it’s picked up quite a few details and created text, although not entirely accurately (it’s grabbed the names of the two boards I’m on, but misspelled “Agents” and “Autism” in parts of the image, as well as the frankly bizarre “confeereintion”).
There’s also someone at the back of the crowd who appears to be having an existential crisis on their iPad, which I hope doesn’t actually happen when I run a session.
Diversity and stereotypes
That image, which I wasn’t expecting to include direct references to YCA and Reframing Autism, highlighted a few other issues. Every image model, no matter how sophisticated, suffers issues of bias and representation. As we learn more about them, developers are putting in place guardrails and filters. I wrote a lot about this in the Bing post: Hands on with Bing Image Creator: Microsoft’s image generator just got serious.
In this one, I noticed the “puzzle” imagery. Puzzle pieces are a common feature of autism imagery in these models because of the ubiquitous puzzle symbolism associated with organisations like Autism Speaks in the US. Many people in the autistic community, however, dislike the puzzle imagery as it suggests a problem to be solved, or the idea that autism doesn’t “fit”.
Many people in the communities I work with and online prefer other symbols such as the infinity loop which don’t have similar negative connotations. Still, the image generator doesn’t know that, and defaults to images like this:
With that in mind, I thought I’d try “photo of an autistic person”. In Bing Image Creator, this prompt actually got blocked as part of its discriminatory speech filtering. Ouch. DALL-E 3 let the prompt through, however, and generated the following images.
On the whole, much better than Midjourney, which typically generates quite dark pictures, almost invariably of young, sad-looking, white males. But if you look closely, there’s still a “seriousness” to all the images, three of the four are male, and all are white. And, in the last one, there’s the puzzle iconography again (I’m not even going to mention that kid’s cardigan…).
Obviously picking up on cues from its training, the model added the following text beneath the images: Please let me know if any of these images resonate with your vision or if you’d like further adjustments.
OK then. I’d like them to be more diverse, and to celebrate that diversity…
DALL-E 3 got carried away with the celebrations, and we’ve ended up dangerously close to “women laughing alone with salad” territory…
Which model should I use?
So far, in terms of image generation, I’ve written about Firefly, Bing, and now ChatGPT. In earlier articles I also discussed how Midjourney could be used in subjects like media, but I haven’t used that platform for over a month, not least because Adobe in particular has pushed out a more “ethical” model that embeds Content Credentials metadata.
So which one should you use?
It really depends on the purpose, and your access. Bing Chat with DALL-E 3 is currently free to access with a generous number of generations. Adobe Firefly gives you 25 images on a free account, and then 1000 or 3000 for the paid Creative Cloud subscription levels. ChatGPT with DALL-E 3 is only available to Plus users for around AU$30 per month.
All three (now that Firefly is on v2) offer good quality images. Adobe’s product is perhaps the most “ethical”, though I’m sure many Adobe Stock contributors would argue against that. Adobe also has the best safety features from my experiments so far, and anyone who agrees to the Adobe Creative Cloud terms and conditions, including students under the education licence, can use it.
Bing and ChatGPT have the advantage of the “chat” mode, which allows much easier back-and-forth editing than Firefly. At the end of the day, I encourage you to get in and experiment. If you’re interested in ChatGPT with DALL-E 3, sign up for a month, then cancel if you find that Bing or Firefly suits your needs.
On November 8th I’ll be running a webinar on image generation. Practical Strategies for Image Generation in Education will cover six strategies and focus on Bing and Firefly for image generation in K-12 and tertiary contexts. Check out the webinar details on Eventbrite here.
Got questions or comments, or want to get in touch to discuss professional learning and GAI consultancy services? Use the form below: