Source URL: https://simonwillison.net/2025/Feb/12/nomic-embed-text-v2/#atom-everything
Source: Simon Willison’s Weblog
Title: Nomic Embed Text V2: An Open Source, Multilingual, Mixture-of-Experts Embedding Model
Feedly Summary: Nomic Embed Text V2: An Open Source, Multilingual, Mixture-of-Experts Embedding Model
Nomic continue to release the most interesting and powerful embedding models. Their latest is Embed Text V2, an Apache 2.0 licensed multi-lingual 1.9GB model (here it is on Hugging Face) trained on “1.6 billion high-quality data pairs”, which is the first embedding model I’ve seen to use a Mixture of Experts architecture:
In our experiments, we found that alternating MoE layers with 8 experts and top-2 routing provides the optimal balance between performance and efficiency. This results in 475M total parameters in the model, but only 305M active during training and inference.
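To get a feel for what “8 experts and top-2 routing” means in practice, here’s a toy sketch of a mixture-of-experts layer in NumPy (not Nomic’s actual implementation): a router scores each token vector and only the two highest-scoring expert MLPs are evaluated and mixed.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 16, 32, 8, 2

# One small MLP per expert, plus a linear router
experts = [
    (rng.normal(size=(d_model, d_hidden)), rng.normal(size=(d_hidden, d_model)))
    for _ in range(n_experts)
]
router = rng.normal(size=(d_model, n_experts))

def moe_layer(x):
    """Route each token to its top-2 experts and mix their outputs."""
    logits = x @ router                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of the 2 best experts per token
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        chosen = top[t]
        weights = np.exp(logits[t, chosen])
        weights /= weights.sum()                     # softmax over the selected experts only
        for w, e in zip(weights, chosen):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(token @ w1, 0.0) @ w2)   # ReLU MLP expert
    return out

tokens = rng.normal(size=(4, d_model))               # 4 token vectors
print(moe_layer(tokens).shape)                       # (4, 16): only 2 of the 8 experts ran per token

The parameter saving falls out of the routing: all experts count toward total parameters, but only the selected ones contribute compute per token.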
I first tried it out using uv run like this:
uv run \
  --with einops \
  --with sentence-transformers \
  --python 3.13 python
Then:
from sentence_transformers import SentenceTransformer

# trust_remote_code=True is required because the model ships custom modeling code
model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)
sentences = ["Hello!", "¡Hola!"]
# prompt_name="passage" selects the document-embedding prompt (see the prefix format below)
embeddings = model.encode(sentences, prompt_name="passage")
print(embeddings)
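Since “Hello!” and “¡Hola!” mean the same thing, a quick sanity check on the multilingual alignment is to compare the two vectors directly. Here’s a minimal sketch using NumPy cosine similarity, continuing the session above (the exact score will vary):

import numpy as np

# embeddings is the 2 x 768 array returned by model.encode() above
a, b = embeddings[0], embeddings[1]
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity between the two sentences: {similarity:.4f}")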
Then I got it working on my laptop using the llm-sentence-transformers plugin like this:
llm install llm-sentence-transformers
llm install einops # additional necessary package
llm sentence-transformers register nomic-ai/nomic-embed-text-v2-moe --trust-remote-code
llm embed -m sentence-transformers/nomic-ai/nomic-embed-text-v2-moe -c 'string to embed'
This outputs a 768-item JSON array of floating point numbers to the terminal. These are Matryoshka embeddings, which means you can truncate that down to just the first 256 items and get similarity calculations that still work, albeit slightly less well.
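To see the Matryoshka property in action, here’s a rough sketch reusing the vectors a and b from the snippet above, comparing similarity at the full 768 dimensions and again after truncating to the first 256:

import numpy as np

def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

short_a, short_b = a[:256], b[:256]  # keep only the first 256 of the 768 dimensions

print("768-dim similarity:", cosine(a, b))
print("256-dim similarity:", cosine(short_a, short_b))  # typically close, slightly noisier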
To use this for RAG you’ll need to conform to Nomic’s custom prompt format. For documents to be searched:
search_document: text of document goes here
And for search queries:
search_query: term to search for
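Here’s a rough sketch of a toy retrieval step, continuing the same Python session and prepending the prefixes by hand (the prompt_name= argument shown earlier should apply equivalent prefixes automatically):

import numpy as np

def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

docs = [
    "Nomic Embed Text V2 is a multilingual MoE embedding model.",
    "Matryoshka embeddings can be truncated to fewer dimensions.",
]
query = "which model uses a mixture of experts?"

# Prefix documents and queries per Nomic's format, then embed
doc_vectors = model.encode(["search_document: " + d for d in docs])
query_vector = model.encode("search_query: " + query)

# Rank documents by similarity to the query
ranked = sorted(zip(docs, doc_vectors), key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
for doc, vec in ranked:
    print(round(cosine(query_vector, vec), 4), doc)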
I landed a new --prepend option for the llm embed-multi command to help with that, but it’s not out in a full release just yet.
Via @nomic_ai
Tags: embeddings, llm, nomic, ai, rag, uv, python
AI Summary and Description: Yes
Summary: The text discusses the release of Nomic’s Embed Text V2, a multilingual embedding model that uses a Mixture-of-Experts architecture to balance performance and efficiency. It also walks through practical setup steps and shows how to generate embeddings for text in multiple languages.
Detailed Description:
Nomic, known for developing innovative embedding models, has introduced Embed Text V2. This model, licensed under Apache 2.0, is a significant advancement in the field of machine learning, especially for those focused on natural language processing (NLP) and AI applications. Below are the key points:
– **Model Overview**:
– Size: 1.9GB multilingual model trained on 1.6 billion high-quality data pairs.
– Architecture: the first embedding model the author has seen to use a Mixture-of-Experts (MoE) approach, alternating MoE layers with 8 experts and top-2 routing.
– Parameter Efficiency: The model contains 475 million total parameters, with only 305 million active during training and inference, balancing computational resources with effective output.
– **Implementation Steps**:
– The text provides a step-by-step guide to using the model with Python, highlighting the libraries needed (einops and sentence-transformers).
– Sample code is included for embedding sentences in English and Spanish, demonstrating ease of integration for developers.
– Additional packages required for full functionality (such as einops) are noted, along with how to register the model with the llm-sentence-transformers plugin.
– **Functionality**:
– The output format is described: the llm embed command prints a 768-item JSON array of floating-point numbers, and the vectors are Matryoshka embeddings.
– Users can truncate these embeddings (for example to the first 256 dimensions) and similarity calculations still work, with only a slight loss in quality.
– **Usage Scenarios**:
– The model is well suited to Retrieval-Augmented Generation (RAG) applications, which require Nomic’s specific prompt prefixes for documents (search_document:) and search queries (search_query:).
– A new --prepend option for the llm embed-multi command has been added to help with this, though it is not yet available in a full release.
Overall, Nomic’s Embed Text V2 showcases advancements in AI embedding technologies, emphasizing efficiency and multilingual capabilities, while offering practical steps for implementation. Security, compliance, and privacy considerations may arise as organizations deploy such machine learning models, which should be addressed by professionals engaged in those areas.