Simon Willison’s Weblog: Mistral-Small 3.2

Source URL: https://simonwillison.net/2025/Jun/20/mistral-small-32/
Source: Simon Willison’s Weblog
Title: Mistral-Small 3.2

Feedly Summary: Mistral-Small 3.2
Released on Hugging Face a couple of hours ago, so far there aren’t any quantizations to run it on a Mac but I’m sure those will emerge pretty quickly.
This is a minor bump to Mistral Small 3.1, one of my favorite local models. I’ve been running Small 3.1 via Ollama where it’s a 15GB download – these 24 billion parameter models are a great balance between capabilities and not using up all of the available RAM on my laptop. I expect Ollama will add 3.2 imminently.
According to Mistral:

Small-3.2 improves in the following categories:

Instruction following: Small-3.2 is better at following precise instructions
Repetition errors: Small-3.2 produces fewer infinite generations or repetitive answers
Function calling: Small-3.2’s function calling template is more robust (see here and examples)

Interestingly they recommend running it with a temperature of 0.15 – many models recommend a default of 0.7. They also provide a suggested system prompt which includes a note that “Your knowledge base was last updated on 2023-10-01”.
It’s not currently available via Mistral’s API, or through any of the third-party LLM hosting vendors that I’ve checked, so I’ve not been able to run a prompt through the model myself yet.
Tags: ai, generative-ai, llms, mistral, llm-tool-use, llm-release

AI Summary and Description: Yes

Summary: The text discusses the recent release of Mistral-Small 3.2, highlighting its enhancements over version 3.1, particularly in instruction following, reduced repetition errors, and more robust function calling. These upgrades matter for AI practitioners who run local models on devices with limited RAM.

Detailed Description: The text provides insights into the new version of the Mistral-Small model tailored for local deployment. Key highlights from the release include:

* **Release Context**:
– Mistral-Small 3.2 was recently made available on Hugging Face.
– No quantizations suitable for running it on a Mac are available yet, though they are expected soon.
– This new version is a minor update from Mistral Small 3.1.

* **Model Specifications**:
– The model features 24 billion parameters, striking a balance between capability and resource efficiency—ideal for users with constraints on RAM.
– Simon Willison reports running version 3.1 via Ollama, where it is a 15GB download (a minimal local-run sketch follows this list).
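
As a minimal sketch, here is how a local prompt through the Ollama Python client might look. The model tag is an assumption (check the Ollama library for the actual 3.2 name once support lands):

```python
# Minimal sketch: prompt Mistral Small locally through the Ollama Python client.
# First pull the weights, e.g.: ollama pull mistral-small3.1  (~15GB)
# The model tag below is an assumption; swap in the 3.2 tag once Ollama ships it.
import ollama

response = ollama.chat(
    model="mistral-small3.1",  # assumed tag, not confirmed for 3.2
    messages=[{"role": "user", "content": "Summarise what changed in Mistral Small 3.2."}],
)
print(response.message.content)
```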

* **Improvements in Small-3.2**:
– **Instruction Following**: Enhanced precision in adhering to user instructions, thus increasing usability in interactive scenarios.
– **Repetition Errors**: The model is less prone to repetitive answers or runaway (infinite) generations, a common failure mode in generative models.
– **Function Calling**: An upgraded, more robust function calling template, which should make tool use and API integrations more reliable (a rough sketch follows this list).
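
Mistral's notes do not reproduce the revised template itself, so the following is only a rough sketch of function calling against the model using an OpenAI-style tool schema, which the Ollama Python client accepts. The `get_weather` tool and the model tag are hypothetical:

```python
# Sketch of function calling with an OpenAI-style tool schema via Ollama.
# `get_weather` is a hypothetical tool; the model tag is an assumption.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="mistral-small3.1",  # assumed tag; swap in the 3.2 tag once available
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call a tool, its structured call is exposed here:
if response.message.tool_calls:
    for call in response.message.tool_calls:
        print(call.function.name, call.function.arguments)
```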

* **Operational Recommendations**:
– Mistral recommends a temperature of 0.15, well below the common default of 0.7; lower temperatures yield more focused, less random output, which matters for developers who want predictable responses.
– The suggested system prompt notes that the knowledge base was last updated on 2023-10-01, making the model’s knowledge cutoff explicit to users. A sketch applying both recommendations follows this list.
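
As a minimal sketch of applying those two recommendations, the call below goes through Ollama's OpenAI-compatible endpoint (served locally at http://localhost:11434/v1). The model tag and the system prompt wording are assumptions rather than Mistral's published template:

```python
# Sketch: apply Mistral's recommended temperature (0.15) and a system prompt
# noting the knowledge cutoff, via Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

# The API key is ignored by a local Ollama server but required by the client.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="mistral-small3.1",  # assumed tag; use the 3.2 tag once available
    temperature=0.15,          # Mistral's recommendation, vs. the common 0.7 default
    messages=[
        # Paraphrase of the note Mistral includes in its suggested system prompt:
        {"role": "system", "content": "Your knowledge base was last updated on 2023-10-01."},
        {"role": "user", "content": "What do you know about events after your cutoff?"},
    ],
)
print(response.choices[0].message.content)
```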

* **Availability**:
– Mistral-Small 3.2 is not yet accessible via Mistral’s API or any third-party hosting vendors, so users cannot yet try it through hosted endpoints.

Overall, the release of Mistral-Small 3.2 represents a meaningful advancement in local AI model deployment, particularly for developers building practical applications in resource-limited environments. Its improvements in instruction adherence and repetition reduction are especially noteworthy for user experience and application reliability.