Simon Willison’s Weblog: Introducing Gemma 3n: The developer guide

Source URL: https://simonwillison.net/2025/Jun/26/gemma-3n/
Source: Simon Willison’s Weblog
Title: Introducing Gemma 3n: The developer guide

Feedly Summary: Introducing Gemma 3n: The developer guide
Extremely consequential new open weights model release from Google today:

Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs.

Optimized for on-device: Engineered with a focus on efficiency, Gemma 3n models are available in two sizes based on effective parameters: E2B and E4B. While their raw parameter count is 5B and 8B respectively, architectural innovations allow them to run with a memory footprint comparable to traditional 2B and 4B models, operating with as little as 2GB (E2B) and 3GB (E4B) of memory.

This is very exciting: a 2B and 4B model optimized for end-user devices which accepts text, images and audio as inputs!
Gemma 3n is also the most comprehensive day one launch I’ve seen for any model: Google partnered with “AMD, Axolotl, Docker, Hugging Face, llama.cpp, LMStudio, MLX, NVIDIA, Ollama, RedHat, SGLang, Unsloth, and vLLM” so there are dozens of ways to try this out right now.
So far I’ve run two variants on my Mac laptop. Ollama offer a 7.5GB version (full tag gemma3n:e4b-it-q4_K_M) of the 4B model, which I ran like this:
ollama pull gemma3n
llm install llm-ollama
llm -m gemma3n:latest "Generate an SVG of a pelican riding a bicycle"

It drew me this:

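Incidentally, llm-ollama talks to Ollama’s local HTTP API under the hood, so you can also hit the model directly with curl. A minimal sketch, assuming Ollama is running on its default port 11434:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3n:latest",
  "prompt": "Generate an SVG of a pelican riding a bicycle",
  "stream": false
}'
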
The Ollama version doesn’t appear to support image or audio input yet.
… but the mlx-vlm version does!
First I tried that on this WAV file like so (using a recipe adapted from Prince Canuma’s video):
uv run --with mlx-vlm mlx_vlm.generate \
  --model gg-hf-gm/gemma-3n-E4B-it \
  --max-tokens 100 \
  --temperature 0.7 \
  --prompt "Transcribe the following speech segment in English:" \
  --audio pelican-joke-request.wav

That downloaded a 15.74 GB bfloat16 version of the model and output the following correct transcription:

Tell me a joke about a pelican.

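If you want to record a similar test clip yourself on macOS, ffmpeg’s avfoundation input can capture from the microphone. A rough sketch — the audio device index after the colon varies by machine, the output filename is just a placeholder, and I haven’t confirmed what sample rate Gemma 3n prefers (16kHz mono is a common choice for speech):

# list available capture devices first
ffmpeg -f avfoundation -list_devices true -i ""
# record ~5 seconds of 16kHz mono audio from audio device 0
ffmpeg -f avfoundation -i ":0" -t 5 -ar 16000 -ac 1 my-joke-request.wav
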
Then I had it draw me a pelican for good measure:
uv run --with mlx-vlm mlx_vlm.generate \
  --model gg-hf-gm/gemma-3n-E4B-it \
  --max-tokens 100 \
  --temperature 0.7 \
  --prompt "Generate an SVG of a pelican riding a bicycle"

I quite like this one:

It’s interesting to see such a striking visual difference between those 7.5GB and 15GB model quantizations.
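If you want to compare the two pelicans side by side, one option is to render each SVG to a PNG with librsvg (brew install librsvg). A quick sketch — the filenames are placeholders, and since the responses may include some text around the markup you might need to trim them down to just the <svg>…</svg> portion first:

# rasterize each saved SVG for easier comparison
rsvg-convert ollama-pelican.svg -o ollama-pelican.png
rsvg-convert mlx-pelican.svg -o mlx-pelican.png
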
Tags: google, ai, generative-ai, local-llms, llms, vision-llms, mlx, ollama, pelican-riding-a-bicycle, gemma, llm-release, prince-canuma

AI Summary and Description: Yes

Summary: The text discusses the release of Gemma 3n, a new open weights model by Google, designed for multimodal input (image, audio, text) and optimized for on-device efficiency. This model demonstrates significant advancements in architecture and usability, allowing end-users to process complex types of input with manageable resource consumption. The collaborative launch with multiple tech partners highlights its importance in the AI and generative AI landscape.

Detailed Description:

– **Product Overview**:
  – Gemma 3n is a multimodal model capable of accepting various input types, including images, audio, and text, while providing text outputs.
  – It is optimized for on-device usage, making it accessible for end-user devices without demanding extensive resources.

– **Technical Specifications**:
  – The model comes in two versions, E2B and E4B, featuring 5 billion and 8 billion parameters respectively but designed to have a memory footprint similar to that of traditional 2 billion and 4 billion parameter models (requiring as little as 2GB and 3GB of memory).
  – This architectural innovation is crucial for improving the performance of multimodal systems on low-resource platforms.

– **Collaboration and Availability**:
  – The launch included partnerships with prominent tech companies, which enhances the model’s usability and demonstrates strong community backing for its deployment across various platforms. Partners include:
    – AMD
    – Docker
    – Hugging Face
    – NVIDIA
    – RedHat
    – among others.

– **Practical Applications**:
  – Users can experiment with handling different types of media, such as generating images (SVG) or transcribing audio. Sample commands illustrate how to run the model using frameworks like Ollama and mlx-vlm.
  – The provided examples showcase the ease of use, from prompt generation to high-quality output, emphasizing the model’s versatility in application.

– **Importance for Professionals**:
  – Security and compliance professionals in AI, cloud, and infrastructure sectors should take note of the advancements in model optimization and multimodal processing, as these are pivotal for developing robust AI applications and services in resource-constrained environments.
  – The collaborative approach to the model’s launch suggests an evolving landscape where interoperability and shared innovation are key in AI development.

The release of Gemma 3n is significant not only for the capabilities it offers but also for its implications in security contexts, where running AI models efficiently on local devices is becoming increasingly critical.