Simon Willison’s Weblog: Create and edit images with Gemini 2.0 in preview

Source URL: https://simonwillison.net/2025/May/7/gemini-images-preview/#atom-everything
Source: Simon Willison’s Weblog
Title: Create and edit images with Gemini 2.0 in preview

Feedly Summary: Create and edit images with Gemini 2.0 in preview
Gemini 2.0 Flash has had image generation capabilities for a while now, and they’re now available via the paid Gemini API – at 3.9 cents per generated image.
According to the API documentation you need to use the new gemini-2.0-flash-preview-image-generation model ID and specify {"responseModalities":["TEXT","IMAGE"]} as part of your request.
Here’s an example that calls the API using curl (fetching the Gemini API key with `llm keys get gemini`):
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$(llm keys get gemini)" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Photo of a raccoon in a trash can with a paw-written sign that says I love trash"}
]
}],
"generationConfig":{"responseModalities":["TEXT","IMAGE"]}
}' > /tmp/raccoon.json
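For comparison, here is a minimal Python sketch of the same request using only the standard library. The endpoint, model ID and `responseModalities` setting come from the post. The helper names `build_text_to_image_request` and `generate` are hypothetical, as is reading the key from an environment variable instead of `llm keys get`.

```python
import json
import urllib.request

# Endpoint and model ID as given in the post (v1beta, preview model).
MODEL_ID = "gemini-2.0-flash-preview-image-generation"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_text_to_image_request(prompt: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Request both modalities, as the API documentation requires.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


def generate(prompt: str, api_key: str) -> dict:
    """POST the request (key in the query string, like the curl call) and parse the reply."""
    body = json.dumps(build_text_to_image_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `generate(prompt, os.environ["GEMINI_API_KEY"])` should return the same JSON that the curl command saves to /tmp/raccoon.json. The `GEMINI_API_KEY` variable name is a hypothetical choice.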
Here’s the response. I got Gemini 2.5 Pro to vibe-code me a new debug tool for visualizing that JSON. If you visit that tool and click the "Load an example" link you’ll see the raccoon image rendered from the response.
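The debug tool itself isn't reproduced here. The core step such a tool needs is pulling the text and the base64-encoded images out of the response, and it can be sketched in Python as follows. The sketch assumes the `candidates[].content.parts[]` response shape, with images returned as inline base64 data under `inlineData`/`mimeType`. It also falls back to snake_case keys. `extract_parts` and `save_images` are hypothetical helper names.

```python
import base64
from pathlib import Path


def extract_parts(response: dict) -> list[tuple[str, object]]:
    """Return ("text", str) and ("image", (mime_type, bytes)) items in response order.

    Assumes each part carries either "text" or inline image data; the
    camelCase keys are tried first, then snake_case as a fallback.
    """
    items: list[tuple[str, object]] = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                items.append(("text", part["text"]))
            inline = part.get("inlineData") or part.get("inline_data")
            if inline:
                mime = inline.get("mimeType") or inline.get("mime_type", "")
                items.append(("image", (mime, base64.b64decode(inline["data"]))))
    return items


def save_images(response: dict, out_dir: str) -> list[Path]:
    """Decode every inline image and write it to out_dir, numbered in order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for kind, value in extract_parts(response):
        if kind != "image":
            continue
        mime, data = value
        # Pick a file extension from the MIME type; unknown types get .bin.
        ext = {"image/png": "png", "image/jpeg": "jpg"}.get(mime, "bin")
        path = out / f"image_{len(paths):02d}.{ext}"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

For example, `save_images(json.load(open("/tmp/raccoon.json")), "/tmp/raccoon-images")` would write the decoded raccoon picture to disk, provided the response contains inline image data in the assumed shape.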

The other prompt I tried was this one:

Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way

The result of that one was a 41MB JSON file(!) containing 28 images – which presumably cost over a dollar since images are 3.9 cents each.
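The "over a dollar" figure follows from the per-image price. This assumes all 28 images were billed at 3.9 cents each and ignores any charge for the text in the response:

```latex
28 \times 3.9\ \text{cents} = 109.2\ \text{cents} \approx \$1.09
```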
If you want to see that one you can click the "Load a really big example" link in the debug tool, then wait for your browser to fetch and render the full 41MB JSON file.
The most interesting feature of Gemini (as with GPT-4o images) is the ability to accept images as inputs. I tried that out with a photo of a pelican, like this:
cat > /tmp/request.json << EOF
{
"contents": [{
"parts":[
{"text": "Modify this photo to add an inappropriate hat"},
{
"inline_data": {
"mime_type":"image/jpeg",
"data": "$(base64 -i pelican.jpg)"
}
}
]
}],
"generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
EOF

# Execute the curl command with the JSON file
curl -X POST \
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key='$(llm keys get gemini) \
-H 'Content-Type: application/json' \
-d @/tmp/request.json \
> /tmp/out.json
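The heredoc splices the `base64` output directly into a JSON string through shell substitution. As a minimal sketch using the same request shape, the body can instead be built in Python with the `json` and `base64` modules, which take care of the JSON quoting. The `inline_data`/`mime_type` keys are taken from the request in the post. `build_image_edit_request` and `write_request_file` are hypothetical names.

```python
import base64
import json
from pathlib import Path


def build_image_edit_request(prompt: str, image_bytes: bytes,
                             mime_type: str = "image/jpeg") -> dict:
    """Build a request body that pairs a text instruction with an input image."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {
                    "inline_data": {
                        "mime_type": mime_type,
                        # b64encode returns one unwrapped string (no line breaks).
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                },
            ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


def write_request_file(prompt: str, image_path: str, out_path: str) -> None:
    """Counterpart of the heredoc step: write the JSON body to out_path."""
    body = build_image_edit_request(prompt, Path(image_path).read_bytes())
    Path(out_path).write_text(json.dumps(body))
```

`write_request_file("Modify this photo to add an inappropriate hat", "pelican.jpg", "/tmp/request.json")` would produce a file that the curl command in the post can send unchanged with `-d @/tmp/request.json`.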
And now the pelican is wearing a hat.

Via Hacker News
Tags: vision-llms, text-to-image, gemini, generative-ai, ai, llms, vibe-coding, tools

AI Summary and Description: Yes

Summary: The text discusses Gemini 2.0 Flash's image generation capabilities, now available through the paid Gemini API. It highlights features such as accepting images as input for editing and returning generated images alongside text in the same response. This information is particularly relevant for professionals working in AI, specifically those exploring image generation and its integration with language models.

Detailed Description: The text provides an overview of Gemini 2.0 Flash's image generation features. The following points summarize its significance:

– **Image Generation API**: Gemini 2.0 Flash's image generation is available through the paid Gemini API at 3.9 cents per generated image, making it possible to build image generation into applications.
– **Content Generation**: Requests must use the `gemini-2.0-flash-preview-image-generation` model ID and set `"responseModalities": ["TEXT", "IMAGE"]` in `generationConfig`, so a single response can contain both text and images.
– **Example API Call**: A sample API call is provided using the `curl` command, demonstrating how to generate an image based on a descriptive text prompt.
– For instance, the author generated an image of a raccoon in a trash can holding a paw-written sign that says "I love trash".
– **Large JSON Outputs**: A single prompt asking for a vegetarian recipe with many inline illustrations produced a 41MB JSON file containing 28 images, which at 3.9 cents per image presumably cost over a dollar. Complex prompts can therefore yield large, costly responses.
– **Image Input and Editing**: Like GPT-4o's image generation, Gemini can accept images as input and modify them, which makes it useful for creative editing tasks. The author's example adds a hat to a photo of a pelican.
– **Integration with Tools**: The author had Gemini 2.5 Pro vibe-code a debug tool for visualizing the API's JSON responses, a practical aid for developers inspecting AI-generated content.

Overall, these capabilities make Gemini 2.0 Flash relevant to applications built around creative, generative tasks across industries. Security and compliance professionals may also need to consider data management and privacy implications when implementing these features.