Source URL: https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct-2507/
Source: Simon Willison’s Weblog
Title: Qwen/Qwen3-30B-A3B-Instruct-2507
Feedly Summary: Qwen/Qwen3-30B-A3B-Instruct-2507
New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said:
Smarter, faster, and local deployment-friendly.
✨ Key Enhancements:
✅ Enhanced reasoning, coding, and math skills
✅ Broader multilingual knowledge
✅ Improved long-context understanding (up to 256K tokens)
✅ Better alignment with user intent and open-ended tasks
✅ No more `<think>` blocks — operating only in non-thinking mode
🔧 With 3B activated parameters, it’s approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking
I tried the chat.qwen.ai hosted model with “Generate an SVG of a pelican riding a bicycle” and got this:
I particularly enjoyed this detail from the SVG source code:
<!-- Bonus: Pelican's smile -->
<path d="M245,145 Q250,150 255,145" fill="none" stroke="#d4a037" stroke-width="2"/>
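That smile path is a quadratic Bézier curve: it starts at (245,145), is pulled toward the control point (250,150), and ends at (255,145). Since SVG's y axis grows downward, the curve bulges below its endpoints, producing an upturned mouth. Evaluating the standard quadratic Bézier formula confirms the dip:

```javascript
// Quadratic Bézier: B(t) = (1-t)²·P0 + 2(1-t)t·P1 + t²·P2
// P0 = start, P1 = control point, P2 = end (from the SVG "Q" command).
function quadBezier(p0, p1, p2, t) {
  const u = 1 - t;
  return {
    x: u * u * p0.x + 2 * u * t * p1.x + t * t * p2.x,
    y: u * u * p0.y + 2 * u * t * p1.y + t * t * p2.y,
  };
}

const mid = quadBezier(
  { x: 245, y: 145 }, // M245,145
  { x: 250, y: 150 }, // Q control point
  { x: 255, y: 145 }, // end point
  0.5
);
// mid is { x: 250, y: 147.5 } — y > 145 means the curve's midpoint
// sits below the endpoints on screen: the smile.
```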
I went looking for quantized versions that could fit on my Mac and found lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit from LM Studio. Getting that up and running was a 32.46GB download and it appears to use just over 30GB of RAM.
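That memory figure lines up with back-of-the-envelope arithmetic: an 8-bit quantization stores roughly one byte per weight, so taking the ~30B total parameters implied by the model name (a rough assumption; quantization scale factors, activations, and the KV cache add more on top), the weights alone need about 30 GB:

```javascript
// Rough weight-memory estimate for a quantized model:
// billions of parameters × bits-per-weight / 8 bits-per-byte,
// ignoring quantization scales, activations, and KV cache.
function estimateWeightGB(paramsBillions, bitsPerWeight) {
  return paramsBillions * (bitsPerWeight / 8);
}

estimateWeightGB(30, 8); // 30 GB — consistent with "just over 30GB of RAM"
estimateWeightGB(30, 4); // 15 GB — why 4-bit quants fit on smaller Macs
```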
The pelican I got from that one wasn’t as good:
I then tried that local model on the "Write an HTML and JavaScript page implementing space invaders" task that I ran against GLM-4.5 Air. The output looked promising, in particular it seemed to be putting more effort into the design of the invaders (GLM-4.5 Air just used rectangles):
// Draw enemy ship
ctx.fillStyle = this.color;
// Ship body
ctx.fillRect(this.x, this.y, this.width, this.height);
// Enemy eyes
ctx.fillStyle = '#fff';
ctx.fillRect(this.x + 6, this.y + 5, 4, 4);
ctx.fillRect(this.x + this.width - 10, this.y + 5, 4, 4);
// Enemy antennae
ctx.fillStyle = '#f00';
if (this.type === 1) {
// Basic enemy
ctx.fillRect(this.x + this.width / 2 - 1, this.y - 5, 2, 5);
} else if (this.type === 2) {
// Fast enemy
ctx.fillRect(this.x + this.width / 4 - 1, this.y - 5, 2, 5);
ctx.fillRect(this.x + (3 * this.width) / 4 - 1, this.y - 5, 2, 5);
} else if (this.type === 3) {
// Armored enemy
ctx.fillRect(this.x + this.width / 2 - 1, this.y - 8, 2, 8);
ctx.fillStyle = '#0f0';
ctx.fillRect(this.x + this.width / 2 - 1, this.y - 6, 2, 3);
}
But the resulting code didn’t actually work:
That same prompt against the unquantized Qwen-hosted model produced a different game, which was sadly also unplayable - this time because everything moved too fast.
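The generated code isn't shown, but one plausible culprit for that failure mode is scaling movement per frame rather than per elapsed second, so the game speeds up on high-refresh-rate displays. A frame-rate-independent update looks roughly like this (a sketch, not the model's actual output):

```javascript
// Frame-rate-independent movement: scale speed by elapsed seconds,
// so the game plays the same on a 60 Hz and a 240 Hz display.
function updatePosition(x, speedPerSecond, dtMs) {
  return x + speedPerSecond * (dtMs / 1000);
}

// Browser game loop: requestAnimationFrame passes a millisecond
// timestamp, from which we derive the delta since the last frame.
function makeLoop(draw) {
  let last = null;
  return function frame(now) {
    const dtMs = last === null ? 0 : now - last;
    last = now;
    draw(dtMs);
    requestAnimationFrame(frame);
  };
}
// Usage in a browser: requestAnimationFrame(makeLoop(dt => { /* move & render */ }));
```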
This new Qwen model is a non-reasoning model, whereas GLM-4.5 and GLM-4.5 Air are both reasoners. It looks like at this scale the "reasoning" may make a material difference in terms of getting code that works out of the box.
Tags: ai, generative-ai, llms, qwen, mlx, llm-reasoning, llm-release, lm-studio
AI Summary and Description: Yes
Summary: The text discusses the latest updates to the Qwen/Qwen3-30B-A3B model, highlighting its enhancements in reasoning, coding, and multilingual capabilities. It provides insights into local deployment performance and contrasts the outputs with other models like GLM-4.5, indicating implications for developers and AI practitioners in achieving effective generative AI applications.
Detailed Description:
The text centers on the latest model update from Qwen, Qwen3-30B-A3B-Instruct-2507, which improves on the previous Qwen3-30B-A3B release from April. Here are the key insights and details outlined in the text:
– **Model Enhancements**:
– **Reasoning, Coding, and Math Skills**: The model showcases improved capabilities in reasoning tasks, programming, and mathematical operations.
– **Broader Multilingual Knowledge**: This enhancement allows the model to generate content in a wider array of languages, benefiting diverse user bases.
– **Long-Context Understanding**: The model can now handle inputs with up to 256,000 tokens (256K), which is critical for in-depth and continuous dialogues or complex task instructions.
– **Alignment with User Intent**: Better performance in aligning output with user instructions and handling open-ended prompts, leading to more useful responses.
– **Non-Thinking Mode**: The transition away from “thinking” blocks means the model operates more straightforwardly, potentially improving efficiency in generating responses.
– **Performance Comparison**:
– The text makes a comparative analysis between Qwen3-30B-A3B and other models like GPT-4o and GLM-4.5, indicating that it is approaching their performance levels.
– Real-world tests with prompts such as generating SVG graphics and writing game code illustrate the model’s practical application. However, it also highlights drawbacks, where the generated code did not function correctly, pointing to a need for further refinement of the model’s reasoning abilities.
– **Local Deployment**:
– The process of running the model locally through a quantized version is discussed, emphasizing its significant resource requirements (over 30GB of RAM), which may pose challenges for end-users with limited hardware capabilities.
– **Insights for Professionals**:
– This report is crucial for AI developers, as it outlines practical implications and limitations in deploying advanced AI models for real-world applications.
– It highlights the importance of reasoning capabilities in generative AI for producing functional outputs, informing decisions on model selection based on task needs.
In summary, the text provides valuable insights about the advancements in the Qwen model that can directly influence developers and practitioners in AI/ML, especially in areas of generative AI security and software development involving AI tools.