Simon Willison’s Weblog: Codestral Embed

Source URL: https://simonwillison.net/2025/May/28/codestral-embed/#atom-everything

Codestral Embed
Brand new embedding model from Mistral, specifically trained for code. Mistral claim that:

Codestral Embed significantly outperforms leading code embedders in the market today: Voyage Code 3, Cohere Embed v4.0 and OpenAI’s large embedding model.

The model is designed to work at different sizes. They show performance numbers for 256-, 512-, 1024- and 1546-dimension vectors in binary (256 bits = 32 bytes of storage per record), int8 and float32 representations.
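The storage arithmetic behind those combinations is worth spelling out. A quick sketch in plain Python (the function name is mine, not Mistral's):

```python
# Bytes needed to store one embedding record at each precision:
# binary = 1 bit per dimension, int8 = 1 byte, float32 = 4 bytes.
def storage_bytes(dims, precision):
    return {"binary": dims // 8, "int8": dims, "float32": dims * 4}[precision]

for dims in (256, 512, 1024, 1546):
    sizes = {p: storage_bytes(dims, p) for p in ("binary", "int8", "float32")}
    print(dims, sizes)

# 256 binary dimensions really do pack into 32 bytes per record
assert storage_bytes(256, "binary") == 32
```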

The dimensions of our embeddings are ordered by relevance. For any integer target dimension n, you can choose to keep the first n dimensions for a smooth trade-off between quality and cost.

I think that means they’re using Matryoshka embeddings.
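If so, shrinking a vector is just truncate-and-renormalize. Here's an illustrative sketch in pure Python with made-up toy vectors (not Mistral's API or real model output):

```python
import math

def truncate(vec, n):
    """Keep the first n Matryoshka dimensions and re-normalize to unit length."""
    head = vec[:n]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    # Dot product; assumes both inputs are unit-length
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimension "embeddings" where the leading dimensions carry
# most of the signal, as Matryoshka training encourages
doc = truncate([0.9, 0.4, 0.2, 0.1, 0.05, 0.03, 0.02, 0.01], 8)
query = truncate([0.8, 0.5, 0.1, 0.2, 0.04, 0.02, 0.01, 0.01], 8)

full = cosine(doc, query)                             # similarity at 8 dims
short = cosine(truncate(doc, 4), truncate(query, 4))  # similarity at 4 dims
# Because the early dimensions dominate, the truncated similarity
# stays close to the full-dimension one.
```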
Here’s the problem: the benchmarks look great, but the model is only available via their API (or for on-prem deployments at “contact us” prices).
I’m perfectly happy to pay for API access to an embedding model like this, but I only want to do that if the model itself is also open weights so I can maintain the option to run it myself in the future if I ever need to.
The reason is that the embeddings I retrieve from this API only maintain their value if I can continue to calculate more of them in the future. If I’m going to spend money on calculating and storing embeddings I want to know that value is guaranteed far into the future.
If the only way to get new embeddings is via an API, and Mistral shut down that API (or go out of business), that investment I’ve made in the embeddings I’ve stored collapses in an instant.
I don’t actually want to run the model myself. Paying Mistral $0.15 per million tokens (50% off for batch discounts) to not have to waste my own server’s RAM and GPU holding that model in memory is a great deal!
In this case, open weights is a feature I want purely because it gives me complete confidence in the future of my investment.
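The economics are easy to check. A sketch of the arithmetic, using the listed price and a corpus size I invented purely for illustration:

```python
PRICE_PER_M_TOKENS = 0.15  # Mistral's listed API price, USD per million tokens
BATCH_DISCOUNT = 0.5       # 50% off for batch jobs

def embedding_cost(tokens, batch=False):
    """USD cost to embed the given number of tokens via the API."""
    cost = tokens / 1_000_000 * PRICE_PER_M_TOKENS
    return cost * BATCH_DISCOUNT if batch else cost

# Embedding a hypothetical 100-million-token code corpus:
print(embedding_cost(100_000_000))              # ~$15 at the list price
print(embedding_cost(100_000_000, batch=True))  # ~$7.50 with the batch discount
```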
Tags: mistral, ai, embeddings

AI Summary and Description: Yes

Summary: The text discusses a new embedding model from Mistral, highlighting its competitive performance compared to existing products. A key concern raised is regarding the model’s availability through APIs and the implications of relying on this access for long-term investment in embeddings.

Detailed Description:
The text provides insights into the latest embedding model named Codestral Embed, developed by Mistral. This model offers several noteworthy features and raises questions about the sustainability of investments in AI embeddings. Here are the major points highlighted in the analysis:

* **Performance Comparison**:
– Codestral Embed reportedly outperforms notable competitors such as Voyage Code 3, Cohere Embed v4.0, and OpenAI’s embedding model.
– Performance metrics cover various vector sizes: 256, 512, 1024, and 1546 dimensions, presented in different precisions (binary, int8, float32).

* **Embedding Dimensions**:
– The model employs an approach that organizes dimensions by relevance, allowing users to select the first n dimensions for a balance between quality and cost.

* **Concerns with API Dependency**:
– The model is primarily accessible via an API, with on-premises options available at custom pricing.
– The author expresses a need for open weights, emphasizing that reliance solely on API access could jeopardize the viability of stored embeddings if the service is discontinued or the provider fails.

* **Investment Assurance**:
– There is a focus on the long-term value of embeddings, stressing that the ability to compute new embeddings independently is crucial for protecting the investment over time.
– The author is willing to pay for the API access if they can ensure future accessibility and control over embeddings.

* **Pricing Insight**:
– The API pricing structure mentioned ($0.15 per million tokens with discounts) hints at the economic considerations underpinning the decision to use a hosted model versus self-hosting.

In summary, the Codestral Embed model signals a potentially transformative innovation in the field of AI embeddings. However, the concerns surrounding API dependency serve as a reminder for professionals in AI and cloud security to consider the implications of vendor lock-in and the importance of maintaining flexibility and control over essential AI resources.