Source URL: https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct-2507/
Source: Simon Willison’s Weblog
Title: Qwen/Qwen3-30B-A3B-Instruct-2507
Feedly Summary: Qwen/Qwen3-30B-A3B-Instruct-2507
New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said:
Smarter, faster, and local deployment-friendly.
✨ Key Enhancements:
✅ Enhanced reasoning, coding, and math skills
✅ Broader multilingual knowledge
✅ Improved long-context understanding (up to 256K tokens)
✅ Better alignment with user intent and open-ended tasks
✅ No more `<think>` blocks — operating only in non-thinking mode
🔧 With 3B activated parameters, it’s approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking
I tried the chat.qwen.ai hosted model with “Generate an SVG of a pelican riding a bicycle” and got this:
I particularly enjoyed this detail from the SVG source code:
<!-- Bonus: Pelican's smile -->
<path d="M245,145 Q250,150 255,145" fill="none" stroke="#d4a037" stroke-width="2"/>
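That smile path is a quadratic Bézier curve: it starts at (245,145), is pulled toward the control point (250,150), and ends at (255,145). Since SVG's y axis grows downward, the curve bulges below its endpoints, producing an upturned mouth. Evaluating the standard quadratic Bézier formula confirms the dip:

```javascript
// Quadratic Bézier: B(t) = (1-t)²·P0 + 2(1-t)t·P1 + t²·P2
// P0 = start, P1 = control point, P2 = end (from the SVG "Q" command).
function quadBezier(p0, p1, p2, t) {
  const u = 1 - t;
  return {
    x: u * u * p0.x + 2 * u * t * p1.x + t * t * p2.x,
    y: u * u * p0.y + 2 * u * t * p1.y + t * t * p2.y,
  };
}

const mid = quadBezier(
  { x: 245, y: 145 }, // M245,145
  { x: 250, y: 150 }, // Q control point
  { x: 255, y: 145 }, // end point
  0.5
);
// mid is { x: 250, y: 147.5 } — y > 145 means the curve's midpoint
// sits below the endpoints on screen: the smile.
```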
I went looking for quantized versions that could fit on my Mac and found lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit from LM Studio. Getting that up and running was a 32.46GB download and it appears to use just over 30GB of RAM.
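That memory figure lines up with back-of-the-envelope arithmetic: an 8-bit quantization stores roughly one byte per weight, so taking the ~30B total parameters implied by the model name (a rough assumption; quantization scale factors, activations, and the KV cache add more on top), the weights alone need about 30 GB:

```javascript
// Rough weight-memory estimate for a quantized model:
// billions of parameters × bits-per-weight / 8 bits-per-byte,
// ignoring quantization scales, activations, and KV cache.
function estimateWeightGB(paramsBillions, bitsPerWeight) {
  return paramsBillions * (bitsPerWeight / 8);
}

estimateWeightGB(30, 8); // 30 GB — consistent with "just over 30GB of RAM"
estimateWeightGB(30, 4); // 15 GB — why 4-bit quants fit on smaller Macs
```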
The pelican I got from that one wasn’t as good:
I then tried that local model on the "Write an HTML and JavaScript page implementing space invaders" task that I ran against GLM-4.5 Air. The output looked promising, in particular it seemed to be putting more effort into the design of the invaders (GLM-4.5 Air just used rectangles):
// Draw enemy ship
ctx.fillStyle = this.color;
// Ship body
ctx.fillRect(this.x, this.y, this.width, this.height);
// Enemy eyes
ctx.fillStyle = '#fff';
ctx.fillRect(this.x + 6, this.y + 5, 4, 4);
ctx.fillRect(this.x + this.width - 10, this.y + 5, 4, 4);
// Enemy antennae
ctx.fillStyle = '#f00';
if (this.type === 1) {
// Basic enemy
ctx.fillRect(this.x + this.width / 2 - 1, this.y - 5, 2, 5);
} else if (this.type === 2) {
// Fast enemy
ctx.fillRect(this.x + this.width / 4 - 1, this.y - 5, 2, 5);
ctx.fillRect(this.x + (3 * this.width) / 4 - 1, this.y - 5, 2, 5);
} else if (this.type === 3) {
// Armored enemy
ctx.fillRect(this.x + this.width / 2 - 1, this.y - 8, 2, 8);
ctx.fillStyle = '#0f0';
ctx.fillRect(this.x + this.width / 2 - 1, this.y - 6, 2, 3);
}
But the resulting code didn’t actually work:
That same prompt against the unquantized Qwen-hosted model produced a different game, which was sadly also unplayable - this time because everything moved too fast.
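The generated code isn't shown, but one plausible culprit for that failure mode is scaling movement per frame rather than per elapsed second, so the game speeds up on high-refresh-rate displays. A frame-rate-independent update looks roughly like this (a sketch, not the model's actual output):

```javascript
// Frame-rate-independent movement: scale speed by elapsed seconds,
// so the game plays the same on a 60 Hz and a 240 Hz display.
function updatePosition(x, speedPerSecond, dtMs) {
  return x + speedPerSecond * (dtMs / 1000);
}

// Browser game loop: requestAnimationFrame passes a millisecond
// timestamp, from which we derive the delta since the last frame.
function makeLoop(draw) {
  let last = null;
  return function frame(now) {
    const dtMs = last === null ? 0 : now - last;
    last = now;
    draw(dtMs);
    requestAnimationFrame(frame);
  };
}
// Usage in a browser: requestAnimationFrame(makeLoop(dt => { /* move & render */ }));
```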
This new Qwen model is a non-reasoning model, whereas GLM-4.5 and GLM-4.5 Air are both reasoners. It looks like at this scale the "reasoning" may make a material difference in terms of getting code that works out of the box.
Tags: ai, generative-ai, llms, qwen, mlx, llm-reasoning, llm-release, lm-studio
AI Summary and Description: Yes
Summary: The text discusses the latest updates to the Qwen/Qwen3-30B-A3B model, highlighting its enhancements in reasoning, coding, and multilingual capabilities. It provides insights into local deployment performance and contrasts the outputs with other models like GLM-4.5, indicating implications for developers and AI practitioners in achieving effective generative AI applications.
Detailed Description:
The text centers on the latest model update from Qwen, Qwen3-30B-A3B-Instruct-2507, which improves on the previous Qwen3-30B-A3B release from April. Here are the key insights and details outlined in the text:
– **Model Enhancements**:
– **Reasoning, Coding, and Math Skills**: The model showcases improved capabilities in reasoning tasks, programming, and mathematical operations.
– **Broader Multilingual Knowledge**: This enhancement allows the model to generate content in a wider array of languages, benefiting diverse user bases.
– **Long-Context Understanding**: The model can now handle inputs with up to 256,000 tokens (256K), which is critical for in-depth and continuous dialogues or complex task instructions.
– **Alignment with User Intent**: Better performance in aligning output with user instructions and handling open-ended prompts, leading to more useful responses.
– **Non-Thinking Mode**: The transition away from “thinking” blocks means the model operates more straightforwardly, potentially improving efficiency in generating responses.
– **Performance Comparison**:
– The text makes a comparative analysis between Qwen3-30B-A3B and other models like GPT-4o and GLM-4.5, indicating that it is approaching their performance levels.
– Real-world tests with prompts such as generating SVG graphics and writing game code illustrate the model’s practical application. However, it also highlights drawbacks, where the generated code did not function correctly, pointing to a need for further refinement of the model’s reasoning abilities.
– **Local Deployment**:
– The process of running the model locally through a quantized version is discussed, emphasizing its significant resource requirements (over 30GB of RAM), which may pose challenges for end-users with limited hardware capabilities.
– **Insights for Professionals**:
– This report is crucial for AI developers, as it outlines practical implications and limitations in deploying advanced AI models for real-world applications.
– It highlights the importance of reasoning capabilities in generative AI for producing functional outputs, informing decisions on model selection based on task needs.
In summary, the text provides valuable insights about the advancements in the Qwen model that can directly influence developers and practitioners in AI/ML, especially in areas of generative AI security and software development involving AI tools.