Simon Willison’s Weblog: Notes on Google’s Gemma 3

Source URL: https://simonwillison.net/2025/Mar/12/gemma-3/
Source: Simon Willison’s Weblog
Title: Notes on Google’s Gemma 3

Feedly Summary: Google’s Gemma team released an impressive new model today (under their not-open-source Gemma license). Gemma 3 comes in four sizes – 1B, 4B, 12B, and 27B – and while 1B is text-only, the larger three models are all multi-modal for vision:

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

I tried out the largest model using the latest Ollama – this is the second time I’ve spotted a major model release partnering with Ollama on launch day, the first being Mistral Small 3.
I ran this (after upgrading Ollama through their menu icon upgrade option):
ollama pull gemma3:27b
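
Ollama’s own CLI can also run the model directly, without any extra tooling (the one-shot prompt here is just an illustration):
# interactive chat session with the pulled model
ollama run gemma3:27b
# or pass a single prompt and get one response back
ollama run gemma3:27b 'Five facts about pelicans'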

That pulled 17GB of model weights. I’ve been trying it out using LLM and llm-ollama:
llm install llm-ollama
llm -m gemma3:27b 'Build a single page HTML+CSS+JavaScript UI that gives me a large textarea for writing in which constantly saves what I have entered to localStorage (restoring when I reload the page) and displays a word counter'

That was a replay of a prompt I ran against Claude Artifacts a few months ago. Here’s what Gemma built, and the full chat transcript. It’s a simple example but it worked just right.
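Since the three larger sizes are multi-modal, it should in theory also be possible to pass an image using LLM's -a attachment option (a sketch I haven't verified against llm-ollama's vision support; photo.jpg is a placeholder):
llm -m gemma3:27b -a photo.jpg 'describe this image'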

Something I’ve been curious about recently is longer context support: how well can a local model on my laptop deal with summarization or data extraction tasks against longer pieces of text?
I decided to try my Hacker News summarize script using Gemma, against the thread there discussing the Gemma 3 technical paper.
First I did a quick token count (using the OpenAI tokenizer, which usually produces a number similar to other models’ tokenizers):
curl 'https://hn.algolia.com/api/v1/items/43340491' | ttok

This returned 22,260 – well within Gemma’s documented limits but still a healthy number considering just last year most models topped out at 4,000 or 8,000.
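ttok, if you don’t already have it, is a small pip-installable CLI that counts the tokens piped to it:
pip install ttok
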
I ran my script like this:
hn-summary.sh 43340491 -m gemma3:27b
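
For anyone who hasn’t seen that script: it’s a small wrapper that fetches the Hacker News thread as JSON and pipes it into LLM with a summarization system prompt. A rough sketch of the idea (not the exact script, and the system prompt wording here is illustrative):
#!/bin/bash
# hn-summary.sh (sketch): fetch a Hacker News thread and summarize it
# Usage: hn-summary.sh <item-id> [extra llm options, e.g. -m gemma3:27b]
id=$1
shift
curl -s "https://hn.algolia.com/api/v1/items/$id" | \
  llm -s 'Summarize the themes of the opinions expressed here, including illustrative quotes.' "$@"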

It did a pretty good job! Here’s the full prompt and response. The one big miss is that it ignored my instructions to include illustrative quotes – I don’t know if modifying the prompt will fix that but it’s disappointing that it didn’t handle that well, given how important direct quotes are for building confidence in RAG-style responses.
Here’s what I got for Generate an SVG of a pelican riding a bicycle:
llm -m gemma3:27b 'Generate an SVG of a pelican riding a bicycle'
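
To actually view the result you can redirect the output to a file (you may need to strip any surrounding markdown fences from the response first):
llm -m gemma3:27b 'Generate an SVG of a pelican riding a bicycle' > pelican.svg
open pelican.svg  # macOS; use xdg-open on Linux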

You can also try out the new Gemma in Google AI Studio, and via their API. I added support for it to llm-gemini 0.15, though sadly it appears vision mode doesn’t work with that API hosted model yet.
llm install -U llm-gemini
llm keys set gemini
# paste key here
llm -m gemma-3-27b-it 'five facts about pelicans of interest to skunks'

Here’s what I got. I’m not sure how pricing works for that hosted model.
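To double-check which model IDs the plugin registered you can list them:
llm models | grep -i gemma
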
Gemma 3 is also already available through MLX-VLM – here’s their model collection – but I haven’t tried that version yet.
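I’d expect invoking it to look roughly like this (a sketch based on MLX-VLM’s documented CLI; the model name assumes mlx-community’s usual quantized naming and photo.jpg is a placeholder):
pip install mlx-vlm
python -m mlx_vlm.generate --model mlx-community/gemma-3-27b-it-4bit \
  --prompt 'Describe this image' --image photo.jpg
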
Tags: google, ai, generative-ai, llms, vision-llms, mlx, ollama, pelican-riding-a-bicycle, gemma

AI Summary and Description: Yes

Summary: Google’s Gemma team has released the Gemma 3 model, featuring multi-modal capabilities that enhance AI functions such as vision-language processing and language understanding. The model marks a significant advance in AI capabilities, particularly in handling extensive context and in improving interactions with users through structured outputs and function calling.

Detailed Description:
– **Model Overview**:
– **Gemma 3 Family**: The new model comes in four sizes: 1B (text-only), 4B, 12B, and 27B; the three larger models all support both text and vision inputs.
– **Multimodal Capabilities**: Gemma 3’s multi-modal capability allows it to process and generate outputs that integrate both visual and textual information.

– **Technical Enhancements**:
– **Extended Context Window**: The model can handle context windows of up to 128,000 tokens, vastly exceeding the typical limits of previous models (usually between 4,000 to 8,000 tokens).
– **Language Support**: It understands over 140 languages, making it versatile for global applications.
– **Improved Functionalities**: The enhanced math, reasoning, and chat capabilities include structured outputs and the ability to invoke functions.

– **Practical Application Example**:
– The largest model (27B) was used to generate a web UI that saves a user’s input to localStorage, showcasing practical usage in web development.
– When tested with a summarization script against the Hacker News thread discussing its technical paper, Gemma 3 handled the task successfully, highlighting its ability to process longer texts, although it ignored the instruction to include direct quotes, which are important for building trust in responses.

– **Installation and Testing**:
– The model is accessible via the Ollama platform and can be pulled using specific commands.
– It can also be accessed through Google AI Studio, the Gemini API, and MLX-VLM, although some features (like vision mode) do not yet work with the API-hosted model.

– **Future Considerations**:
– The release prompts questions about how users can effectively leverage the long context support for tasks like summarization or data extraction.
– There is also curiosity about model pricing and the implications of its use in applications demanding high fidelity and accuracy.

This advancement in AI, particularly in generative AI and LLMs, is noteworthy for professionals in the field, as it pushes the boundaries of what these models can achieve, blending capabilities that improve user interaction and information processing.