Source URL: https://simonwillison.net/2025/May/21/gemini-diffusion/
Source: Simon Willison’s Weblog
Title: Gemini Diffusion
Feedly Summary: Gemini Diffusion
Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google’s first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of the usual token-by-token autoregressive generation.
Google describe it like this:
Traditional autoregressive language models generate text one word – or token – at a time. This sequential process can be slow, and limit the quality and coherence of the output.
Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step. This means they can iterate on a solution very quickly and error correct during the generation process. This helps them excel at tasks like editing, including in the context of math and code.
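For intuition, here’s a minimal Python sketch of one common formulation of text diffusion, masked denoising: start from a fully masked draft and fill in positions in parallel over a fixed number of refinement steps. This is an illustration only; Google hasn’t published Gemini Diffusion’s actual algorithm, and the `model.predict` API here is a hypothetical stand-in.

```python
MASK = "<mask>"

def denoise(model, prompt, length=32, steps=8):
    # Start from pure "noise": a draft where every position is masked.
    tokens = [MASK] * length
    for step in range(steps):
        # Hypothetical API: returns {position: (token, confidence)} for
        # every masked position, conditioned on the prompt and the draft.
        guesses = model.predict(prompt, tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Unmask an equal share of positions each step, most confident first.
        k = max(1, len(masked) // (steps - step))
        for i in sorted(masked, key=lambda i: -guesses[i][1])[:k]:
            tokens[i] = guesses[i][0]
    return " ".join(tokens)
```

Because each refinement step updates many positions at once, the number of model calls scales with the step count rather than the output length, which is where the speed advantage over token-at-a-time generation comes from.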
The key feature, then, is speed. I made it through the waitlist and tried it out just now, and wow, they are not kidding about it being fast.
In this video I prompt it with “Build a simulated chat app” and it responds at 857 tokens/second, resulting in an interactive HTML+JavaScript page (embedded in the chat tool, Claude Artifacts style) within single-digit seconds.
The performance feels similar to the Cerebras Coder tool, which used Cerebras to run Llama3.1-70b at around 2,000 tokens/second.
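Throughput numbers like these are straightforward to measure yourself against any streaming API. A minimal sketch, where `generate_stream` is a stand-in for whatever client function yields tokens as they arrive (not a real Gemini or Cerebras call):

```python
import time

def tokens_per_second(generate_stream, prompt):
    # Consume the token stream, timing the full generation.
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate_stream(prompt))
    return n_tokens / (time.perf_counter() - start)
```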
Prior to this, the only commercial-grade diffusion language model I’d encountered was Inception Mercury, back in February this year.
Tags: llm-release, gemini, google, generative-ai, ai, google-io, llms
AI Summary and Description: Yes
Summary: The text introduces Gemini Diffusion, Google’s first large language model (LLM) to use diffusion techniques for text generation, with speed as the headline improvement and, per Google, gains in output quality. The advancement is significant for practitioners focused on AI and generative AI security, as it signals a shift in how LLMs can be optimized and carries implications for AI application development.
Detailed Description: The announcement of Gemini Diffusion marks a potential breakthrough in the field of large language models, with significant implications for various sectors, including AI security and infrastructure:
– **Diffusion Models vs. Traditional LLMs**:
– Traditional autoregressive models generate text sequentially, which can slow down performance.
– Diffusion models refine and generate outputs through a process of iterative noise correction, leading to faster and possibly more coherent outputs.
– **Speed and Efficiency**:
– In the author’s hands-on test, Gemini Diffusion generated a response at 857 tokens/second.
– This capability allows for rapid generation of complex outputs, including coding and interactive content, significantly enhancing usability for developers.
– **Practical Applications**:
– The model demonstrates potential in various tasks, including:
– Code generation.
– Editing and refining existing text.
– Development of interactive applications, as showcased by a simulated chat app prompt.
– **Comparison with Other Models**:
– Performance is compared to the Cerebras Coder tool, which ran Llama3.1-70b at around 2,000 tokens/second, highlighting the competitive landscape of high-throughput LLM serving.
– **Implications for Security and Compliance**:
– The introduction of such technology raises questions about the safeguards needed when text can be generated at these speeds.
– As LLMs become faster and more integrated into applications, professionals in AI security will need to address possible vulnerabilities and establish governance frameworks.
Overall, Gemini Diffusion represents a significant advancement in generative AI capabilities, prompting professionals in AI, cloud computing, and security to consider the potential impacts and necessary adaptations in their security approaches and compliance practices.