Simon Willison’s Weblog: State-of-the-art text embedding via the Gemini API

Source URL: https://simonwillison.net/2025/Mar/7/gemini-embeddings/#atom-everything
Source: Simon Willison’s Weblog
Title: State-of-the-art text embedding via the Gemini API

Feedly Summary: State-of-the-art text embedding via the Gemini API
Gemini just released their new text embedding model, with the snappy name gemini-embedding-exp-03-07. It supports 8,000 input tokens – up from 3,000 – and outputs vectors that are a lot larger than those of their previous text-embedding-004 model – that one output 768-dimension vectors, the new one outputs 3,072.
Storing that many floating point numbers for each embedded record can use a lot of space. Thankfully, the new model supports Matryoshka Representation Learning – this means you can simply truncate the vectors to trade accuracy for storage.
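
For illustration, here is a minimal sketch of what that truncation amounts to – keep the leading dimensions, then re-normalize – assuming the API returns the full 3,072-dimension embedding as a plain list of floats (the helper name is mine, not part of any API):

```python
import math

def truncate_embedding(vector: list[float], dims: int = 768) -> list[float]:
    """Matryoshka-style truncation: keep the leading dims, then re-normalize
    so downstream cosine-similarity comparisons still behave sensibly."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```
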
I added support for the new model in llm-gemini 0.14. LLM doesn’t yet have direct support for Matryoshka truncation so I instead registered different truncated sizes of the model under different IDs: gemini-embedding-exp-03-07-2048, gemini-embedding-exp-03-07-1024, gemini-embedding-exp-03-07-512, gemini-embedding-exp-03-07-256, gemini-embedding-exp-03-07-128.
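
Once the plugin is installed, those registered IDs work like any other LLM embedding model. A usage sketch via LLM's Python API (assumes llm plus llm-gemini 0.14 are installed and a Gemini API key is configured):

```python
import llm

# Model ID as registered by llm-gemini 0.14 for the 128-dimension truncation.
model = llm.get_embedding_model("gemini-embedding-exp-03-07-128")
vector = model.embed("hello world")  # returns a list of floats
print(len(vector))  # should print 128 for this truncated variant
```
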
The model is currently free while it is in preview, but comes with a strict rate limit – 5 requests per minute and just 100 requests a day. I quickly tripped those limits while testing out the new model – I hope they can bump those up soon.
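
Given those limits, client-side pacing is one obvious workaround while experimenting – a hypothetical sketch, not anything the API itself provides:

```python
import time

MIN_INTERVAL = 60 / 5  # seconds between calls to stay under 5 requests/minute

def paced_embed(texts, embed_fn):
    """Call embed_fn on each text, sleeping between calls to respect the
    preview rate limit (the 100-requests-per-day cap still applies)."""
    for text in texts:
        yield embed_fn(text)
        time.sleep(MIN_INTERVAL)
```
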
Via @officiallogank
Tags: embeddings, gemini, ai, google, llm

AI Summary and Description: Yes

Summary: The release of Gemini’s new text embedding model, gemini-embedding-exp-03-07, significantly enhances the capabilities of AI-driven applications by increasing input token support and output vector dimensions. This advancement, combined with the innovative Matryoshka Representation Learning for efficient storage, presents important considerations for developers in AI and cloud computing.

Detailed Description: The announcement of the new text embedding model through the Gemini API showcases advancements in AI technology that may impact various applications and industries. Here are the key points:

– **Enhanced Capacity**: The new model supports 8,000 input tokens, up from the previous limit of 3,000, allowing for more comprehensive text processing.
– **Improved Output Vectors**: The model’s output has grown from 768 dimensions in the prior version to 3,072 in the new one, indicating an enhancement in the model’s capacity to represent textual information accurately.
– **Storage Efficiency**: With the increased size of embedded records, storage costs become a concern. The model supports Matryoshka Representation Learning, which allows users to truncate vectors: they can reduce vector size to save space, albeit at the cost of some accuracy (a rough storage estimate is sketched after this list).
– **LLM Integration**: Support for the new model has been added in llm-gemini version 0.14. Although direct support for Matryoshka truncation isn’t available yet, truncated sizes are registered under unique IDs for easier access and use.
– **Testing Limitations**: Currently, the model is available in a preview phase with usage limits set to five requests per minute and a maximum of 100 requests per day, which may hinder extensive testing and adoption during the initial rollout.
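
To make the storage trade-off concrete, a back-of-the-envelope estimate, assuming each dimension is stored as a 4-byte float32 (actual costs depend on the vector store used):

```python
BYTES_PER_FLOAT32 = 4

for dims in (3072, 2048, 1024, 512, 256, 128):
    kib = dims * BYTES_PER_FLOAT32 / 1024
    print(f"{dims:>5} dims -> {kib:.1f} KiB per record")
# Full 3,072-dim vectors cost 12.0 KiB each; truncating to 128 dims cuts that to 0.5 KiB.
```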

These advancements reflect the ongoing evolution in AI and cloud computing, particularly in the management of data embeddings, which is essential for developing applications in natural language processing, machine learning, and AI-based systems. Professionals in these fields should stay informed on such developments to leverage the full potential of new technologies while considering the implications of storage and accuracy.