Simon Willison’s Weblog: State-of-the-art text embedding via the Gemini API

Source URL: https://simonwillison.net/2025/Mar/7/gemini-embeddings/#atom-everything
Source: Simon Willison’s Weblog
Title: State-of-the-art text embedding via the Gemini API

Feedly Summary: State-of-the-art text embedding via the Gemini API
Gemini just released their new text embedding model, with the snappy name gemini-embedding-exp-03-07. It supports 8,000 input tokens – up from 3,000 – and outputs vectors that are a lot larger than those of their previous text-embedding-004 model – that one output 768-dimension vectors, the new one outputs 3,072.
Storing that many floating point numbers for each embedded record can use a lot of space. Thankfully, the new model supports Matryoshka Representation Learning – this means you can simply truncate the vectors to trade accuracy for storage.
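
For illustration, here is a minimal sketch of what that truncation amounts to – keep the leading dimensions, then re-normalize – assuming the API returns the full 3,072-dimension embedding as a plain list of floats (the helper name is mine, not part of any API):

```python
import math

def truncate_embedding(vector: list[float], dims: int = 768) -> list[float]:
    """Matryoshka-style truncation: keep the leading dims, then re-normalize
    so downstream cosine-similarity comparisons still behave sensibly."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```
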
I added support for the new model in llm-gemini 0.14. LLM doesn’t yet have direct support for Matryoshka truncation so I instead registered different truncated sizes of the model under different IDs: gemini-embedding-exp-03-07-2048, gemini-embedding-exp-03-07-1024, gemini-embedding-exp-03-07-512, gemini-embedding-exp-03-07-256, gemini-embedding-exp-03-07-128.
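
Once the plugin is installed, those registered IDs work like any other LLM embedding model. A usage sketch via LLM's Python API (assumes llm plus llm-gemini 0.14 are installed and a Gemini API key is configured):

```python
import llm

# Model ID as registered by llm-gemini 0.14 for the 128-dimension truncation.
model = llm.get_embedding_model("gemini-embedding-exp-03-07-128")
vector = model.embed("hello world")  # returns a list of floats
print(len(vector))  # should print 128 for this truncated variant
```
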
The model is currently free while it is in preview, but comes with a strict rate limit – 5 requests per minute and just 100 requests a day. I quickly tripped those limits while testing out the new model – I hope they can bump those up soon.
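
Given those limits, client-side pacing is one obvious workaround while experimenting – a hypothetical sketch, not anything the API itself provides:

```python
import time

MIN_INTERVAL = 60 / 5  # seconds between calls to stay under 5 requests/minute

def paced_embed(texts, embed_fn):
    """Call embed_fn on each text, sleeping between calls to respect the
    preview rate limit (the 100-requests-per-day cap still applies)."""
    for text in texts:
        yield embed_fn(text)
        time.sleep(MIN_INTERVAL)
```
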
Via @officiallogank
Tags: embeddings, gemini, ai, google, llm

AI Summary and Description: Yes

Summary: The release of Gemini’s new text embedding model, gemini-embedding-exp-03-07, significantly enhances the capabilities of AI-driven applications by increasing input token support and output vector dimensions. This advancement, combined with the innovative Matryoshka Representation Learning for efficient storage, presents important considerations for developers in AI and cloud computing.

Detailed Description: The announcement of the new text embedding model through the Gemini API showcases advancements in AI technology that may impact various applications and industries. Here are the key points:

– **Enhanced Capacity**: The new model supports 8,000 input tokens, up from the previous limit of 3,000, allowing for more comprehensive text processing.
– **Improved Output Vectors**: The model’s output has grown from 768 dimensions in the prior version to 3,072 in the new one, indicating an enhancement in the model’s capacity to represent textual information accurately.
– **Storage Efficiency**: With the increased size of embedded records, storage costs become a concern. The model supports Matryoshka Representation Learning, which allows users to truncate vectors: they can reduce vector size to save space, albeit at the cost of some accuracy (a rough storage estimate is sketched after this list).
– **LLM Integration**: Support for the new model has been added in llm-gemini version 0.14. Although direct support for Matryoshka truncation isn’t available yet, truncated sizes are registered under unique IDs for easier access and use.
– **Testing Limitations**: Currently, the model is available in a preview phase with usage limits set to five requests per minute and a maximum of 100 requests per day, which may hinder extensive testing and adoption during the initial rollout.
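
To make the storage trade-off concrete, a back-of-the-envelope estimate, assuming each dimension is stored as a 4-byte float32 (actual costs depend on the vector store used):

```python
BYTES_PER_FLOAT32 = 4

for dims in (3072, 2048, 1024, 512, 256, 128):
    kib = dims * BYTES_PER_FLOAT32 / 1024
    print(f"{dims:>5} dims -> {kib:.1f} KiB per record")
# Full 3,072-dim vectors cost 12.0 KiB each; truncating to 128 dims cuts that to 0.5 KiB.
```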

These advancements reflect the ongoing evolution in AI and cloud computing, particularly in the management of data embeddings, which is essential for developing applications in natural language processing, machine learning, and AI-based systems. Professionals in these fields should stay informed on such developments to leverage the full potential of new technologies while considering the implications of storage and accuracy.