Simon Willison’s Weblog: Introducing 4o Image Generation

Source URL: https://simonwillison.net/2025/Mar/25/introducing-4o-image-generation/#atom-everything
Source: Simon Willison’s Weblog
Title: Introducing 4o Image Generation

Feedly Summary: Introducing 4o Image Generation
When OpenAI first announced GPT-4o back in May 2024, one of the most exciting features was true multi-modality, in that it could both input and output audio and images. The "o" stood for "omni", and the image output examples in that launch post looked really impressive.
It’s taken them over ten months (and Gemini beat them to it) but today they’re finally making those image generation abilities available, live right now in ChatGPT for paying customers.
My test prompt for any model that can manipulate incoming images is "Turn this into a selfie with a bear", because you should never take a selfie with a bear! I fed ChatGPT this selfie and got back this result:

That’s pretty great! It mangled the text on my T-Shirt (which says "LAWRENCE.COM" in a creative font) and added a second visible AirPod. It’s very clearly me though, and that’s definitely a bear.
There are plenty more examples in OpenAI’s launch post, but as usual the most interesting details are tucked away in the updates to the system card. There’s lots in there about their approach to safety and bias, including a section on "Ahistorical and Unrealistic Bias" which feels inspired by Gemini’s embarrassing early missteps.
One section that stood out to me is their approach to images of public figures. The new policy is much more permissive than for DALL-E – highlights mine:

4o image generation is capable, in many instances, of generating a depiction of a public figure based solely on a text prompt.
At launch, we are not blocking the capability to generate adult public figures but are instead implementing the same safeguards that we have implemented for editing images of photorealistic uploads of people. For instance, this includes seeking to block the generation of photorealistic images of public figures who are minors and of material that violates our policies related to violence, hateful imagery, instructions for illicit activities, erotic content, and other areas. Public figures who wish for their depiction not to be generated can opt out.
This approach is more fine-grained than the way we dealt with public figures in our DALL·E series of models, where we used technical mitigations intended to prevent any images of a public figure from being generated. This change opens the possibility of helpful and beneficial uses in areas like educational, historical, satirical and political speech. After launch, we will continue to monitor usage of this capability, evaluating our policies, and will adjust them if needed.

Given that "public figures who wish for their depiction not to be generated can opt out" I wonder if we’ll see a stampede of public figures to do exactly that!
Tags: openai, ai, multi-modal-output, llms, ai-ethics, llm-release, generative-ai, chatgpt, dalle, gemini

AI Summary and Description: Yes

Summary: The text discusses OpenAI’s launch of native image generation in GPT-4o, now available in ChatGPT for paying customers, which can both generate images from text prompts and manipulate uploaded images. It highlights a more permissive policy on generating images of public figures and the safety and bias material in the updated system card.

Detailed Description:
OpenAI’s release of 4o image generation delivers the image-output part of the multi-modality promised when GPT-4o was first announced in May 2024. The launch is relevant for professionals in AI and cloud computing security because it raises security, ethical, and compliance questions around generative AI technologies.

Key Highlights:
– **Multi-modality in AI**: GPT-4o can now generate images from text prompts and manipulate uploaded images, part of a broader trend toward AI systems that handle multiple types of input and output.
– **Image Generation Capabilities**: The model can, in many instances, generate a depiction of a public figure based solely on a text prompt, and OpenAI is not blocking that capability for adult public figures at launch. This adds functionality but also raises ethical and privacy concerns.
– **Policy Changes**: OpenAI’s approach to generating images of public figures is more permissive than it was for earlier models (e.g., DALL-E), where technical mitigations were intended to prevent any such images from being generated. Safeguards against misuse remain, and public figures can opt out of having their depiction generated, which raises legal and ethical questions of its own.
– **Safety and Bias Considerations**: The system card updates describe OpenAI’s approach to safety and bias, including a section on "Ahistorical and Unrealistic Bias". The safeguards include seeking to block photorealistic images of public figures who are minors, as well as material that violates policies on violence, hateful imagery, instructions for illicit activities, and erotic content.
– **Monitoring and Adaptation**: OpenAI plans to continue evaluating the usage of this new capability, indicating that they may adapt their policies based on real-world applications and feedback.

Implications for Security and Compliance:
– The new capabilities in GPT-4o increase the need for robust ethical guidelines and compliance frameworks within AI development, particularly concerning the generation of images of public figures.
– Professionals in security and compliance should remain vigilant about the implications of these technologies, especially regarding privacy rights and intellectual property.
– As AI becomes more integrated into various sectors, the necessity for clear governance and regulatory frameworks cannot be overstated, especially in relation to public figures and sensitive content.

This launch not only showcases advancements in AI capabilities but also underscores the evolving landscape of AI ethics, security, and user rights, all key areas for ongoing attention among professionals in these fields.