Simon Willison’s Weblog: Introducing EmbeddingGemma

Source URL: https://simonwillison.net/2025/Sep/4/embedding-gemma/#atom-everything
Source: Simon Willison’s Weblog
Title: Introducing EmbeddingGemma

Feedly Summary: Introducing EmbeddingGemma
Brand new open weights (under the slightly janky Gemma license) 308M parameter embedding model from Google:

Based on the Gemma 3 architecture, EmbeddingGemma is trained on 100+ languages and is small enough to run on less than 200MB of RAM with quantization.

It’s available via sentence-transformers, llama.cpp, MLX, Ollama, LMStudio and more.
As usual for these smaller models there’s a Transformers.js demo that runs directly in the browser (in Chrome variants) – Semantic Galaxy loads a ~400MB model and then lets you run embeddings against hundreds of text sentences, map them in a 2D space and run similarity searches to zoom to points within that space.

Tags: google, ai, embeddings, transformers-js, gemma

AI Summary and Description: Yes

Summary: The text introduces EmbeddingGemma, an efficient open-weights embedding model from Google designed for multilingual applications. Its lightweight architecture and integration with various platforms make it particularly relevant for AI professionals focusing on natural language processing and embedding techniques.

Detailed Description: The content highlights EmbeddingGemma, a new embedding model from Google that has implications for AI, particularly in natural language processing tasks. Here are the major points of significance:

– **Model Overview**:
  – EmbeddingGemma is a 308M parameter embedding model.
  – It is based on the Gemma 3 architecture, known for its performance and efficiency.

– **Efficiency**:
  – The model is designed to be lightweight, running in less than 200MB of RAM when quantized, which is critical for deployment in resource-constrained environments (a back-of-envelope sizing follows this list).
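
As a rough back-of-envelope check (an illustrative sketch, not an official sizing), weight memory scales with parameter count times bytes per parameter, so a 4-bit quantized 308M parameter model lands comfortably under the quoted 200MB:

```python
# Back-of-envelope weight-memory estimate for a 308M-parameter model at
# different precisions. This ignores activations, the tokenizer, and
# runtime overhead, so real usage will be somewhat higher.
PARAMS = 308_000_000

for label, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    mb = PARAMS * bytes_per_param / (1024 ** 2)
    print(f"{label:>9}: ~{mb:,.0f} MB")

# int4 works out to roughly 150 MB of weights, consistent with the
# "less than 200MB of RAM with quantization" figure quoted above.
```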

– **Multilingual Capability**:
  – The model is trained on over 100 languages, making it versatile for global applications.

– **Accessibility**:
  – Available through various platforms, including:
    – sentence-transformers
    – llama.cpp
    – MLX
    – Ollama
    – LMStudio
  – This range of integrations makes it usable across different systems (a minimal usage sketch follows this list).
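
As a minimal sketch of the sentence-transformers route (assuming a recent sentence-transformers release; the Hugging Face model ID `google/embeddinggemma-300m` and the example sentences are assumptions, not stated in the post):

```python
# Minimal sketch: encode a few sentences and compare them with
# sentence-transformers. The model ID below is an assumption.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")  # assumed Hugging Face model ID

# EmbeddingGemma is trained on 100+ languages, so mixed-language input
# is a reasonable test case.
sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Le renard brun rapide saute par-dessus le chien paresseux.",
    "Stock markets fell sharply on Tuesday.",
]

embeddings = model.encode(sentences)                      # shape: (3, embedding_dim)
similarities = model.similarity(embeddings, embeddings)   # pairwise cosine similarity matrix
print(similarities)
```

The first two sentences (the same text in English and French) should score noticeably closer to each other than either does to the third.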

– **Interactive Tools**:
  – The model ships with a Transformers.js demo that runs directly in the browser.
  – Users can load a ~400MB model, embed hundreds of text sentences, visualize them in a 2D space, and run similarity searches against them (an offline analogue is sketched after this list).
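
The demo itself runs on Transformers.js in the browser, but the same pipeline (embed, project to 2D, search by similarity) can be sketched offline in Python. This is an illustrative analogue of what Semantic Galaxy does, not its actual implementation; it assumes numpy, scikit-learn, and the same assumed model ID as above.

```python
# Illustrative analogue of the Semantic Galaxy demo: embed sentences,
# project them into 2D for plotting, then rank them against a query.
import numpy as np
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")  # assumed model ID

sentences = [
    "A recipe for sourdough bread",
    "How to train a puppy",
    "Baking a baguette at home",
    "Teaching a dog to sit",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# 2D map: reduce the embedding space to two dimensions for plotting.
coords_2d = PCA(n_components=2).fit_transform(embeddings)

# Similarity search: cosine similarity against a query embedding
# (dot product suffices because the vectors are normalized).
query = model.encode(["bread baking tips"], normalize_embeddings=True)
scores = embeddings @ query.T
ranked = np.argsort(scores.ravel())[::-1]
for idx in ranked[:2]:
    print(f"{scores[idx, 0]:.3f}  {sentences[idx]}")
```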

– **Practical Implications**:
  – The availability of such models democratizes access to advanced AI capabilities, enabling developers to incorporate sophisticated embedding techniques into their applications without heavy infrastructure costs.
  – The multilingual support broadens market access and user engagement in diverse linguistic settings.

The introduction of EmbeddingGemma is a notable step toward accessible AI technology, giving practitioners, including security and compliance professionals, a practical path to building and adapting language processing solutions.