Simon Willison’s Weblog: Notes on Google’s Gemma 3

Source URL: https://simonwillison.net/2025/Mar/12/gemma-3/
Source: Simon Willison’s Weblog
Title: Notes on Google’s Gemma 3

Feedly Summary: Google’s Gemma team released an impressive new model today (under their not-open-source Gemma license). Gemma 3 comes in four sizes – 1B, 4B, 12B, and 27B – and while 1B is text-only, the larger three models are all multi-modal for vision:

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

I tried out the largest model using the latest Ollama – this is the second time I’ve spotted a major model release partnering with Ollama on launch day, the first being Mistral Small 3.
I ran this (after upgrading Ollama through their menu icon upgrade option):
ollama pull gemma3:27b
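
Ollama’s own CLI can also run the model directly, without any extra tooling (the one-shot prompt here is just an illustration):
# interactive chat session with the pulled model
ollama run gemma3:27b
# or pass a single prompt and get one response back
ollama run gemma3:27b 'Five facts about pelicans'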

That pulled 17GB of model weights. I’ve been trying it out using LLM and llm-ollama:
llm install llm-ollama
llm -m gemma3:27b 'Build a single page HTML+CSS+JavaScript UI that gives me a large textarea for writing in which constantly saves what I have entered to localStorage (restoring when I reload the page) and displays a word counter'

That was a replay of a prompt I ran against Claude Artifacts a few months ago. Here’s what Gemma built, and the full chat transcript. It’s a simple example but it worked just right.
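Since the three larger sizes are multi-modal, it should in theory also be possible to pass an image using LLM's -a attachment option (a sketch I haven't verified against llm-ollama's vision support; photo.jpg is a placeholder):
llm -m gemma3:27b -a photo.jpg 'describe this image'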

Something I’ve been curious about recently is longer context support: how well can a local model on my laptop deal with summarization or data extraction tasks against longer pieces of text?
I decided to try my Hacker News summarize script using Gemma, against the thread there discussing the Gemma 3 technical paper.
First I did a quick token count (using the OpenAI tokenizer, which usually produces a number similar to other models’ tokenizers):
curl 'https://hn.algolia.com/api/v1/items/43340491' | ttok

This returned 22,260 – well within Gemma’s documented limits but still a healthy number considering just last year most models topped out at 4,000 or 8,000.
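ttok, if you don’t already have it, is a small pip-installable CLI that counts the tokens piped to it:
pip install ttok
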
I ran my script like this:
hn-summary.sh 43340491 -m gemma3:27b
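
For anyone who hasn’t seen that script: it’s a small wrapper that fetches the Hacker News thread as JSON and pipes it into LLM with a summarization system prompt. A rough sketch of the idea (not the exact script, and the system prompt wording here is illustrative):
#!/bin/bash
# hn-summary.sh (sketch): fetch a Hacker News thread and summarize it
# Usage: hn-summary.sh <item-id> [extra llm options, e.g. -m gemma3:27b]
id=$1
shift
curl -s "https://hn.algolia.com/api/v1/items/$id" | \
  llm -s 'Summarize the themes of the opinions expressed here, including illustrative quotes.' "$@"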

It did a pretty good job! Here’s the full prompt and response. The one big miss is that it ignored my instructions to include illustrative quotes – I don’t know if modifying the prompt will fix that but it’s disappointing that it didn’t handle that well, given how important direct quotes are for building confidence in RAG-style responses.
Here’s what I got for Generate an SVG of a pelican riding a bicycle:
llm -m gemma3:27b 'Generate an SVG of a pelican riding a bicycle'
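
To actually view the result you can redirect the output to a file (you may need to strip any surrounding markdown fences from the response first):
llm -m gemma3:27b 'Generate an SVG of a pelican riding a bicycle' > pelican.svg
open pelican.svg  # macOS; use xdg-open on Linux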

You can also try out the new Gemma in Google AI Studio, and via their API. I added support for it to llm-gemini 0.15, though sadly it appears vision mode doesn’t work with that API hosted model yet.
llm install -U llm-gemini
llm keys set gemini
# paste key here
llm -m gemma-3-27b-it 'five facts about pelicans of interest to skunks'

Here’s what I got. I’m not sure how pricing works for that hosted model.
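To double-check which model IDs the plugin registered you can list them:
llm models | grep -i gemma
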
Gemma 3 is also already available through MLX-VLM – here’s their model collection – but I haven’t tried that version yet.
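I’d expect invoking it to look roughly like this (a sketch based on MLX-VLM’s documented CLI; the model name assumes mlx-community’s usual quantized naming and photo.jpg is a placeholder):
pip install mlx-vlm
python -m mlx_vlm.generate --model mlx-community/gemma-3-27b-it-4bit \
  --prompt 'Describe this image' --image photo.jpg
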
Tags: google, ai, generative-ai, llms, vision-llms, mlx, ollama, pelican-riding-a-bicycle, gemma

AI Summary and Description: Yes

Summary: Google’s Gemma team has released the Gemma 3 model, featuring multi-modal capabilities that enhance AI functions such as vision-language processing and language understanding. The model marks a significant advance in AI capabilities, particularly in handling extensive context and in improving interactions with users through structured outputs and function calling.

Detailed Description:
– **Model Overview**:
– **Gemma 3 Family**: The new model comes in four sizes: 1B (text-only), 4B, 12B, and 27B; the three larger models all support both text and vision inputs.
– **Multimodal Capabilities**: Gemma 3’s multi-modal capability allows it to process and generate outputs that integrate both visual and textual information.

– **Technical Enhancements**:
– **Extended Context Window**: The model can handle context windows of up to 128,000 tokens, vastly exceeding the typical limits of previous models (usually between 4,000 to 8,000 tokens).
– **Language Support**: It understands over 140 languages, making it versatile for global applications.
– **Improved Functionalities**: The enhanced math, reasoning, and chat capabilities include structured outputs and the ability to invoke functions.

– **Practical Application Example**:
– The largest model (27B) was used to generate a web UI that saves a user’s input to localStorage, showcasing practical usage in web development.
– When tested with a summarization script against the Hacker News thread discussing its technical paper, Gemma 3 handled the task successfully, highlighting its ability to process longer texts, although it ignored the instruction to include direct quotes, which are important for building trust in responses.

– **Installation and Testing**:
– The model is accessible via the Ollama platform and can be pulled using specific commands.
– It can also be accessed through Google AI Studio, the Gemini API, and MLX-VLM, although some features (like vision mode) do not yet work with the API-hosted model.

– **Future Considerations**:
– The release prompts questions about how users can effectively leverage the long context support for tasks like summarization or data extraction.
– There is also curiosity about model pricing and the implications of its use in applications demanding high fidelity and accuracy.

This advancement in AI, particularly in generative AI and LLMs, is noteworthy for professionals in the field, as it pushes the boundaries of what these models can achieve, blending capabilities that improve user interaction and information processing.