Simon Willison’s Weblog: Codestral 25.01

Source URL: https://simonwillison.net/2025/Jan/13/codestral-2501/
Source: Simon Willison’s Weblog
Title: Codestral 25.01

Feedly Summary: Codestral 25.01
Brand new code-focused model from Mistral. Unlike the first Codestral this one isn’t (yet) available as open weights. The model has a 256k token context – a new record for Mistral.
The new model scored an impressive joint first place with Claude 3.5 Sonnet on the Copilot Arena leaderboard.
Chatbot Arena announced Copilot Arena on 12th November 2024. The leaderboard is driven by results gathered through their Copilot Arena VS Code extension, which provides users with free access to models in exchange for logged usage data plus their votes as to which of two models returns the most useful completion.
So far the only other independent benchmark result I’ve seen is for the Aider Polyglot test. This was less impressive:

Codestral 25.01 scored 11% on the aider polyglot benchmark.
62% o1 (high)
48% DeepSeek V3
16% Qwen 2.5 Coder 32B Instruct
11% Codestral 25.01
4% gpt-4o-mini

The new model can be accessed via my llm-mistral plugin using the codestral-latest alias:
llm install llm-mistral
llm keys set mistral
# Paste Mistral API key here
llm -m codestral-latest "JavaScript to reverse an array"

Via @sophiamyang
Tags: mistral, llm, ai-assisted-programming, generative-ai, ai, llms, aider, evals

AI Summary and Description: Yes

Summary: The text discusses the release of a new code-focused AI model, Codestral 25.01, from Mistral, highlighting its noteworthy context length and performance in AI-focused benchmarks. This information is particularly relevant for professionals interested in AI security, LLMs, and generative AI applications.

Detailed Description:

– The new model Codestral 25.01 introduces a 256k token context length, noted as a record for Mistral.
– The model has achieved recognition by securing a joint first place with another model, Claude 3.5 Sonnet, on the Copilot Arena leaderboard, indicating its competitive performance in the AI space.
– The leaderboard’s rankings are derived from data collected through the Copilot Arena VS Code extension, which gives users free access to models in exchange for logged usage data and their votes on which of two completions is more useful.
– Codestral 25.01’s score on the Aider Polyglot benchmark was notably lower than that of several competing models:
– 62% for o1 (high)
– 48% for DeepSeek V3
– 16% for Qwen 2.5 Coder 32B Instruct
– 11% for Codestral 25.01
– 4% for gpt-4o-mini
– The new model can be accessed through the llm-mistral plugin; the installation instructions above show how developers can try it directly from the command line.
– Tags associated with the post include mistral, llm, ai-assisted-programming, generative-ai, and llms, indicating its relevance within the AI and software development landscape.

The information points to an evolving landscape in AI model development, offering useful context for professionals assessing the capabilities of code-focused models before deploying them.