Simon Willison’s Weblog: Devstral

Source URL: https://simonwillison.net/2025/May/21/devstral/#atom-everything
Source: Simon Willison’s Weblog
Title: Devstral

Feedly Summary: Devstral
New Apache 2.0 licensed LLM release from Mistral, this time specifically trained for code.

Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA models by more than 6% points. When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 (671B) and Qwen3 232B-A22B.

I’m always suspicious of small models like this that claim great benchmarks against much larger rivals, but there’s a Devstral model that is just 14GB on Ollama, so it’s quite easy to try out for yourself.
I fetched it like this:
ollama pull devstral

Then ran it in an llm chat session with llm-ollama like this:
llm install llm-ollama
llm chat -m devstral
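
The same model can also be scripted from Python rather than used through the interactive chat session. Here's a minimal sketch using the llm library's Python API, assuming the llm-ollama plugin exposes the pulled model under the name devstral:

```python
import llm

# Assumes `llm install llm-ollama` and `ollama pull devstral` have already been
# run, and that the plugin registers the model under the name "devstral".
model = llm.get_model("devstral")

response = model.prompt(
    "Write a Python function that fetches a CSV file from a URL "
    "and imports it into a SQLite database."
)
print(response.text())
```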

Initial impressions: I think this one is pretty good! Here’s a full transcript where I had it write Python code to fetch a CSV file from a URL and import it into a SQLite database, creating the table with the necessary columns. Honestly I need to retire that challenge, it’s been a while since a model failed at it, but it’s still interesting to see how it handles follow-up prompts to demand things like asyncio or a different HTTP client library.
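
For context, this is roughly the shape of script that prompt asks for. The sketch below is a standard-library illustration of the task, not Devstral's actual output; the URL, database path, and table name are placeholders:

```python
import csv
import io
import sqlite3
import urllib.request


def import_csv_to_sqlite(url: str, db_path: str, table: str) -> int:
    """Fetch a CSV from `url` and load its rows into `table` in `db_path`."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")

    reader = csv.reader(io.StringIO(text))
    header = next(reader)

    # Quote identifiers so arbitrary header names become valid column names;
    # assumes every data row has the same number of fields as the header.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        rows = list(reader)
        conn.executemany(
            f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', rows
        )
    conn.close()
    return len(rows)


if __name__ == "__main__":
    count = import_csv_to_sqlite("https://example.com/data.csv", "data.db", "imported")
    print(f"Imported {count} rows")
```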
Tags: llm, ai, ollama, llms, llm-release, mistral, ai-assisted-programming, generative-ai

AI Summary and Description: Yes

Summary: The text discusses the release of Devstral, a new open-source LLM (Large Language Model) from Mistral, specifically trained for code generation. It highlights its performance benchmarks against larger models and includes practical insights on how to deploy and test this model, making it relevant for professionals interested in AI and LLM security, as well as AI-assisted programming.

Detailed Description: The text provides essential insights into the capabilities and performance of Devstral, a newly released open-source LLM, and includes user experiences and benchmarks against larger models. Key points include:

– **Model Performance**: Devstral recorded a remarkable 46.8% on the SWE-Bench Verified benchmark, surpassing previously established state-of-the-art (SoTA) open-source models by over 6 percentage points.
– **Comparative Evaluation**: The model outperformed significantly larger models, such as Deepseek-V3-0324 (671B parameters) and Qwen3 232B-A22B.
– **Model Accessibility**: Notably, Devstral is relatively lightweight at just 14GB, making it easy for users to download and test. The installation command mentioned is:
– `ollama pull devstral`
– **User Experience**: The author shares initial impressions of the model, portraying it as effective at generating Python code for specific tasks, such as fetching a CSV file from a URL and importing it into a SQLite database, and at handling follow-up prompts that change the requirements.

Practical Insights:
– The benchmarks and user tests provide a real-world evaluation of LLM performance that security professionals, developers, and engineers can use to assess the model’s applicability for their projects.
– The discussion around follow-up prompts emphasizes the model’s agility in handling changing programming requirements, relevant to AI-assisted programming environments (a sketch of one such follow-up appears below).
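
To illustrate the kind of follow-up being described, here is a hedged sketch of an asyncio variant of the same task. The original post doesn't name the "different HTTP client library", so httpx is used purely for illustration:

```python
import asyncio
import csv
import io
import sqlite3

import httpx  # illustrative choice; the post doesn't specify a client library


async def fetch_csv(url: str) -> list[list[str]]:
    """Download the CSV asynchronously and return its rows, header first."""
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        response.raise_for_status()
    return list(csv.reader(io.StringIO(response.text)))


def write_rows(rows: list[list[str]], db_path: str, table: str) -> None:
    """Create the table from the header row and bulk-insert the data rows."""
    header, *data = rows
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with sqlite3.connect(db_path) as conn:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', data
        )


async def main() -> None:
    rows = await fetch_csv("https://example.com/data.csv")  # placeholder URL
    write_rows(rows, "data.db", "imported")


if __name__ == "__main__":
    asyncio.run(main())
```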

Overall, Devstral seems to offer a promising tool for developers in the AI space, particularly when considering the security and compliance aspects of implementing LLMs in software development practices.