Simon Willison’s Weblog: Gemini 2.5 Models now support implicit caching

Source URL: https://simonwillison.net/2025/May/9/gemini-implicit-caching/#atom-everything
Source: Simon Willison’s Weblog
Title: Gemini 2.5 Models now support implicit caching

Feedly Summary: Gemini 2.5 Models now support implicit caching
I just spotted a cacheTokensDetails key in the token usage JSON while running a long chain of prompts against Gemini 2.5 Flash – despite not configuring caching myself:
{"cachedContentTokenCount": 200658, "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 204082}], "cacheTokensDetails": [{"modality": "TEXT", "tokenCount": 200658}], "thoughtsTokenCount": 2326}
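A minimal sketch of how you might check this on your own traffic, using the google-genai Python SDK — the model ID, file name, and exact attribute names here are my assumptions, not taken from the post:

```python
# Sketch: inspect Gemini's token usage metadata to spot implicit cache hits.
# Assumes the google-genai SDK and an API key in the environment; the model ID
# is a guess at the 2.5 Flash preview name, not confirmed by the post.
from google import genai

client = genai.Client()  # reads the API key from the environment

long_context = open("big_shared_document.txt").read()  # hypothetical shared context

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=long_context + "\n\nSummarise the key points.",
)

usage = response.usage_metadata
print("prompt tokens:", usage.prompt_token_count)
print("cached tokens:", usage.cached_content_token_count)  # non-zero on a cache hit
```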
I went searching and it turns out Gemini had a massive upgrade to their prompt caching earlier today:

Implicit caching directly passes cache cost savings to developers without the need to create an explicit cache. Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount.
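
The practical implication is that prompt ordering matters: keep the large, stable content at the start of every request and put the part that varies at the end, so repeated calls share a long common prefix. A rough sketch of that structure (the variable names and helper function are illustrative, not from the post):

```python
# Sketch: structure requests so repeated calls share a long common prefix,
# which is what makes them eligible for Gemini's implicit cache hits.
SYSTEM_INSTRUCTIONS = "You are a careful analyst..."        # stable across requests
LARGE_REFERENCE_DOCUMENT = open("reference.txt").read()     # stable, large
SHARED_PREFIX = f"{SYSTEM_INSTRUCTIONS}\n\n{LARGE_REFERENCE_DOCUMENT}"

def build_prompt(user_question: str) -> str:
    # Only the trailing part varies between requests; the long prefix is
    # identical each time, so follow-up calls can hit the implicit cache.
    return f"{SHARED_PREFIX}\n\nQuestion: {user_question}"
```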

Previously you needed to both explicitly configure the cache and pay a per-hour charge to keep that cache warm.
This new mechanism is so much more convenient! It imitates how both DeepSeek and OpenAI implement prompt caching, leaving Anthropic as the remaining large provider that requires you to manually configure prompt caching to get it to work.
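For contrast, the old explicit mechanism meant creating a cache object up front, paying to keep it alive, and referencing it from each request. A rough sketch from memory of that flow in the google-genai SDK — the class, field, and model names may not match the current API exactly:

```python
# Sketch of the *explicit* caching flow that was previously required
# (class and parameter names are from memory and may differ from the SDK).
from google import genai
from google.genai import types

client = genai.Client()

cache = client.caches.create(
    model="gemini-2.5-flash-preview-04-17",       # assumed model ID
    config=types.CreateCachedContentConfig(
        contents=[open("reference.txt").read()],  # the content to keep warm
        ttl="3600s",                              # you paid per hour to keep it alive
    ),
)

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Summarise the key points.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
```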
Tags: prompt-caching, gemini, prompt-engineering, generative-ai, llm-pricing, ai, llms

AI Summary and Description: Yes

Summary: The text discusses the recent upgrade to the Gemini 2.5 models, particularly their new implicit caching features. This enhancement significantly reduces costs and simplifies the process for developers, aligning Gemini’s capabilities with industry practices in generative AI and large language models (LLMs).

Detailed Description: The update to the Gemini 2.5 models introduces implicit caching that impacts how prompts are managed and executed, ultimately benefiting developers working with AI. The following points highlight the significance of this upgrade:

– **Implicit Caching**: Requests that share a common prefix with a recent request are automatically eligible for a cache hit; no cache needs to be created or managed by the developer.
– **Cost Savings**: Cached tokens are billed at a 75% discount, without the explicit setup or hourly storage charge previously required (see the back-of-the-envelope estimate after this list).
– **Ease of Use**: Unlike prior requirements to explicitly configure caching, the new system simplifies operations and allows developers to focus more on application development rather than infrastructure management.
– **Industry Comparison**: The mechanism aligns Gemini with competitors such as DeepSeek and OpenAI, both of which offer similar implicit caching capabilities, leaving Anthropic as a distinct player requiring manual configuration.
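
Using the token counts from the usage JSON quoted above, a back-of-the-envelope estimate of the effect — the per-token price below is a placeholder; only the 75% discount on cached tokens comes from the announcement:

```python
# Back-of-the-envelope: effective prompt cost with the 75% discount on cached tokens.
prompt_tokens = 204_082               # total prompt tokens from the post's usage JSON
cached_tokens = 200_658               # tokens served from the implicit cache
price_per_token = 0.15 / 1_000_000    # placeholder input price (USD/token), not from the post

full_price = prompt_tokens * price_per_token
discounted = (prompt_tokens - cached_tokens) * price_per_token \
           + cached_tokens * price_per_token * 0.25   # cached tokens billed at 25%
print(f"without caching: ${full_price:.6f}")
print(f"with implicit caching: ${discounted:.6f}")    # roughly a 74% saving on this request
```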

Overall, this upgrade emphasizes the shift in generative AI towards more user-friendly and economically viable solutions, reflecting ongoing trends in AI security and infrastructure that prioritize efficiency and resource management. Security and compliance professionals would benefit from understanding how such optimizations can influence infrastructure deployment strategies and cost forecasting for AI initiatives.