Source URL: https://simonwillison.net/2025/Sep/25/improved-gemini-25-flash-and-flash-lite/#atom-everything
Source: Simon Willison’s Weblog
Title: Improved Gemini 2.5 Flash and Flash-Lite
Two new preview models from Google – updates to their fast and inexpensive Flash and Flash Lite families:
The latest version of Gemini 2.5 Flash-Lite was trained and built based on three key themes:
Better instruction following: The model is significantly better at following complex instructions and system prompts.
Reduced verbosity: It now produces more concise answers, a key factor in reducing token costs and latency for high-throughput applications (see charts above).
Stronger multimodal & translation capabilities: This update features more accurate audio transcription, better image understanding, and improved translation quality.
[…]
This latest 2.5 Flash model comes with improvements in two key areas we heard consistent feedback on:
Better agentic tool use: We’ve improved how the model uses tools, leading to better performance in more complex, agentic and multi-step applications. This model shows noticeable improvements on key agentic benchmarks, including a 5% gain on SWE-Bench Verified, compared to our last release (48.9% → 54%).
More efficient: With thinking on, the model is now significantly more cost-efficient—achieving higher quality outputs while using fewer tokens, reducing latency and cost (see charts above).
They also added two new convenience model IDs: gemini-flash-latest and gemini-flash-lite-latest, which will always resolve to the most recent model in that family.
I released llm-gemini 0.26 adding support for the new models and new aliases. I also used the response.set_resolved_model() method added in LLM 0.27 to ensure that the correct model ID would be recorded for those -latest uses.
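The idea behind the -latest aliases and set_resolved_model() can be illustrated with a toy sketch. This is not the actual llm-gemini implementation, just a minimal illustration of alias resolution, assuming the aliases map to the dated preview IDs mentioned in this post:

```python
# Toy sketch (NOT the real llm-gemini code): a "-latest" alias resolves
# to the newest dated model ID in its family, and the resolved ID is
# what gets recorded in the log rather than the alias itself.
ALIASES = {
    "gemini-flash-latest": "gemini-2.5-flash-preview-09-2025",
    "gemini-flash-lite-latest": "gemini-2.5-flash-lite-preview-09-2025",
}

def resolve(model_id: str) -> str:
    """Return the concrete model ID an alias points at (identity for non-aliases)."""
    return ALIASES.get(model_id, model_id)

print(resolve("gemini-flash-latest"))  # gemini-2.5-flash-preview-09-2025
print(resolve("gemini-2.5-pro"))       # gemini-2.5-pro
```

Recording the resolved ID matters because a log entry that just says gemini-flash-latest would become ambiguous as soon as Google points the alias at a newer model.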
Both of these models support optional reasoning tokens. I had them draw me pelicans riding bicycles in both thinking and non-thinking mode, using commands that looked like this:
llm -m gemini-2.5-flash-preview-09-2025 -o thinking_budget 4000 "Generate an SVG of a pelican riding a bicycle"
I then got each model to describe the image it had drawn using commands like this:
llm -a https://static.simonwillison.net/static/2025/gemini-2.5-flash-preview-09-2025-thinking.png -m gemini-2.5-flash-preview-09-2025 -o thinking_budget 2000 'Detailed single line alt text for this image'
gemini-2.5-flash-preview-09-2025-thinking
A minimalist stick figure graphic depicts a person with a white oval body and a dot head cycling a gray bicycle, carrying a large, bright yellow rectangular box resting high on their back.
gemini-2.5-flash-preview-09-2025
A simple cartoon drawing of a pelican riding a bicycle, with the text "A Pelican Riding a Bicycle" above it.
gemini-2.5-flash-lite-preview-09-2025-thinking
A quirky, simplified cartoon illustration of a white bird with a round body, black eye, and bright yellow beak, sitting astride a dark gray, two-wheeled vehicle with its peach-colored feet dangling below.
gemini-2.5-flash-lite-preview-09-2025
A minimalist, side-profile illustration of a stylized yellow chick or bird character riding a dark-wheeled vehicle on a green strip against a white background.
Via Hacker News
Tags: google, llms, llm, gemini, pelican-riding-a-bicycle, llm-reasoning, llm-release
AI Summary and Description: Yes
Summary: The text discusses the latest updates to Google’s Gemini 2.5 Flash and Flash-Lite models, highlighting notable improvements in instruction-following, reduced verbosity, and enhanced multimodal capabilities. These updates reflect advancements in AI model development that could impact performance in high-throughput applications, which are of interest to AI, cloud, and infrastructure security professionals.
Detailed Description:
The document outlines the enhancements made to two new AI models from Google, Gemini 2.5 Flash and Flash-Lite. These improvements are especially relevant for AI security professionals and infrastructure developers as they reflect ongoing advancements in AI technology that can influence both performance and security.
Key Points:
– **Better Instruction Following**: The Flash-Lite model has improved its ability to handle complex instructions effectively, which is crucial for developing more sophisticated AI applications.
– **Reduced Verbosity**: The model now provides more concise answers, helping to reduce token costs and latency. This optimization can be vital for organizations working with large data sets or those requiring quick response times in cloud applications.
– **Stronger Multimodal & Translation Capabilities**: Enhanced audio transcription, image understanding, and translation quality show that the model can deliver better performance across various types of media input.
– **Improved Agentic Tool Use**: The model demonstrates a 5% performance gain on agentic benchmarks, highlighting its increased capability to manage complex tasks independently.
– **Cost Efficiency**: By using fewer tokens while delivering high-quality outputs, the new models can significantly reduce operational costs for businesses relying on cloud AI services.
– **New Model IDs**: The introduction of convenience IDs like gemini-flash-latest streamlines always running the most recent model in a family, with cost and upgrade-planning implications for companies aiming to stay current with AI technology.
This update shows that advancements in large language models (LLMs) not only improve usability but can also have profound implications on the efficiency and cost-effectiveness of deploying AI technologies. It suggests a trajectory where businesses can leverage improved AI capabilities to enhance their security protocols, resource management, and compliance with operational standards.
Overall, the release of these models has the potential to redefine how organizations implement AI technologies across their infrastructures, necessitating a reassessment of security measures and compliance frameworks associated with these new capabilities.