OpenAI Has Improved Its Image Gen – But Do We Need More “Offensive” AI?

OpenAI has just released a long-overdue update to its image generation capabilities. I say long overdue because, if you compare the output of DALL-E 3 to competing image generators like Midjourney, Adobe Firefly, or even the slew of open-source models built on Stable Diffusion, you can see that it has been lagging behind for some time now.

With about as much fanfare as they could muster, OpenAI also released their video generation platform, Sora, over Christmas 2024, and given that Sora obviously uses image generation under the hood, it was a little surprising that DALL-E didn’t get an upgrade at the same time.

Clearly, they were working on the image generation updates in the background. The March update to ChatGPT has brought a number of changes, including improved quality, more varied styles, and the ability to render text.

But rather than focusing on the much more consistent images, or the impressive control over text generation that outclasses any other image generator I’ve seen, CEO Sam Altman decided to focus his announcement on another upcoming feature: “offensive” images.

The decision to relax guardrails isn’t new, and Altman has hinted at it before. The breathless excitement around the potential for more “offensive” content is also perfectly in line with the US administration’s war on woke. It’s Musk’s hostile takeover of Twitter in the name of “freedom of speech”. It’s Zuckerberg’s “masculine energy.” It’s as predictable and tedious as you might expect from this bizarre fraternity of billionaires.

It’s also not a particularly hot take to suggest AI can be used to create offensive images. Stable Diffusion-based open-source image models have flooded the internet with AI-generated pornography, including deepfakes and CSAM. But Altman’s stance legitimises the use of AI to create violent and pornographic images, and does so in the spurious name of “creativity”.

Right now, OpenAI’s image generator still has guardrails, though they are already more relaxed than previous versions of DALL-E. In the following examples, I tested out its restrictions around violent images, misinformation, and nudity. It goes without saying that some readers might find these images “offensive”. Creative? Not so much.

Violence

Since I have the paid subscription to ChatGPT, I can run both the new and the old image models simultaneously. The new image generation model is “coming soon” in the free version, and a request for a “war zone” image with wounded people is still a flat no:

Prompt used in ChatGPT-4o (free) with DALL-E 3

Switch to the updated model, however, and it’s a different story:

Image generated in ChatGPT Plus with new ChatGPT 4o image generator

Blood and gore have been out of bounds for OpenAI’s models up until now, but a certain level of violence is apparently now OK. Here’s an example, even if it is only a flesh wound:

Misinformation

Violent images like the first example above could easily contribute to clickbait and viral misinformation, luring viewers into sharing shock-and-awe articles on social media. Another relaxed guardrail makes misinformation even more likely. Like X’s Grok, OpenAI has seemingly dropped its restrictions on generating images of real people. It’s particularly successful with celebrities and politicians, who are obviously far better represented in the training data than you or I:

Much like the average feed on X, OpenAI’s image sharing page on its Sora website is already filled with AI-generated “satire”:

screenshot of thumbnails generated on the Sora platform

Nudity

Violence and misinformation aside, we all know where the real money is. A huge volume of AI image generation is already centred on explicit images, including deepfakes. All of Altman’s comments thus far point towards OpenAI releasing a version of its image generator that allows for the creation of nude images, and there is no real technical barrier to them doing so.

Right now, the new model will not generate nude images.

“Tasteful”

However, it is already possible to see how fragile those guardrails are. If you request an image of a nude person, male or female, the model will begin to generate the image. Clearly, OpenAI is using two methods to filter content: vetting the words of the prompt, and applying image recognition to the image as it is generated. As soon as the model hits a nipple, it pretends it could never have generated the image and gives variations of the same canned response:

How long before even these spurious guardrails are removed?

“Creative expression”

Compared to other, less mainstream AI models out there, OpenAI’s new image generator is currently pretty tame. No extreme violence, no explicit images, and a polite demurring to any user that requests something too spicy. But there is one area of OpenAI’s new approach to “creative expression” that some users have found particularly offensive: OpenAI has also relaxed its guardrails prohibiting users from generating images in the style of other artists.

Most notably, the internet has been flooded with profile pics, family photos, and memes generated by ChatGPT “in the style of Studio Ghibli”.

Studio Ghibli’s founder, animator and director Hayao Miyazaki, famously referred to Artificial Intelligence as “an insult to life itself”.

OpenAI is already embroiled in numerous lawsuits regarding the scraping of copyrighted data for training its models. Until now, its image generation has at least obfuscated these problems in the output. The irony of his model being used to drown X and Facebook in Ghibli derivatives is obviously lost on Sam Altman, however, who took it as an opportunity to whine about how mean everyone is being to him while he tries to “cure cancer or whatever”.

I’m not sure how trivialising the creation of violent and explicit images or filling social media with tactless anime is going to cure cancer. But please, if everyone could just stop being so mean to Sam he’d really appreciate it.

The Future of Multimodal GenAI

All of this – both the relaxed content restrictions and the cavalier approach to copyright – is a frustrating distraction from the potential of this technology. I have no doubt that multimodal output that combines text, image, audio, video, and code generation is the most useful avenue for Generative AI, and I’ve been saying so for a long time now. I just hope that someone other than OpenAI, or at least someone other than Sam Altman, gets to drive these advances forwards.

I don’t think Altman, Musk, and Zuckerberg are qualified to tell us what counts as “offensive”, and what should or should not be permitted in platforms that have scaled to hundreds of millions, or billions, of users. Anyone who really wants to create violent and explicit content with AI can already do so: we don’t need these mainstream platforms to further legitimise it.

And that’s probably the biggest indication that this is about money, not “creative expression”. Open-source image generators earn OpenAI nothing, and there is obviously a huge market share to be captured from users who would be willing to pay to create offensive images with ChatGPT.

The future of multimodal GenAI could be about improving access to creative technologies, or reducing the learning curve or cost of industry standard image and video editing. It could be about allowing individuals to quickly plan and create content without access to huge funds or powerful hardware.

Or, it could be about gore and pornography.

It looks like OpenAI, at least, will be following the money.

Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
