Source URL: https://simonwillison.net/2025/Sep/12/qwen3-next/#atom-everything
Source: Simon Willison’s Weblog
Title: Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!
Feedly Summary: Qwen3-Next-80B-A3B
Qwen announced two new models via their Twitter account (nothing on their blog yet): Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking.
They make some big claims on performance:
Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship.
Qwen3-Next-80B-A3B-Thinking outperforms Gemini-2.5-Flash-Thinking.
The name “80B-A3B” indicates 80 billion parameters, of which only 3 billion are active at a time. You still need enough GPU-accessible RAM to hold all 80 billion in memory at once, but only 3 billion are used for each generated token, which provides a significant speedup in responding to prompts.
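As a rough back-of-envelope (my arithmetic, not from the announcement), here is what that total-versus-active split means for memory and per-token weight reads at a few common precisions:

```python
# Back-of-envelope memory math for an 80B-total / 3B-active MoE model.
# Rough estimates only: real deployments add overhead for the KV cache,
# activations, and framework buffers.

TOTAL_PARAMS = 80e9   # every expert must sit in GPU-accessible RAM
ACTIVE_PARAMS = 3e9   # parameters actually used per generated token

BYTES_PER_PARAM = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    weights_gb = TOTAL_PARAMS * nbytes / 1e9
    active_gb = ACTIVE_PARAMS * nbytes / 1e9
    print(f"{precision:>9}: ~{weights_gb:.0f} GB to hold the weights, "
          f"~{active_gb:.1f} GB of weights read per token")
```

At fp16/bf16 that works out to roughly 160 GB of weights, consistent with the ~150 GB Hugging Face downloads mentioned below.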
More details from their tweet:
80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B. (esp. @ 32K+ context!)
Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed & recall
Ultra-sparse MoE: 512 experts, 10 routed + 1 shared
Multi-Token Prediction → turbo-charged speculative decoding
Beats Qwen3-32B in perf, rivals Qwen3-235B in reasoning & long-context
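The “ultra-sparse MoE” line describes a router that, for each token, picks 10 of the 512 experts and always adds 1 shared expert. Here's a minimal, illustrative sketch of that routing pattern (toy dimensions and simplified gating; not Qwen's actual implementation):

```python
import numpy as np

NUM_EXPERTS = 512   # experts, per the tweet
TOP_K = 10          # routed experts selected per token
D_MODEL = 64        # toy hidden size, purely for illustration

rng = np.random.default_rng(0)
router_weights = rng.normal(size=(D_MODEL, NUM_EXPERTS))

def route(token_hidden):
    """Pick TOP_K experts for one token; softmax-normalize their gates."""
    logits = token_hidden @ router_weights        # shape: (NUM_EXPERTS,)
    top_idx = np.argsort(logits)[-TOP_K:]         # the 10 winning experts
    gates = np.exp(logits[top_idx] - logits[top_idx].max())
    gates /= gates.sum()
    return top_idx, gates

hidden = rng.normal(size=D_MODEL)
experts, gates = route(hidden)
# The layer output would be shared_expert(hidden) plus the gate-weighted
# sum of the selected expert MLPs: only the 10 routed experts plus the
# shared one run per token, which is where the "3B active" figure comes from.
print(experts, gates.round(3))
```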
The models on Hugging Face are around 150GB each so I decided to try them out via OpenRouter instead (Thinking, Instruct).
I used my llm-openrouter plugin, which I installed like this:
llm install llm-openrouter
llm keys set openrouter
# paste key here
Then found the model IDs with this command:
llm models -q next
Which output:
OpenRouter: openrouter/qwen/qwen3-next-80b-a3b-thinking
OpenRouter: openrouter/qwen/qwen3-next-80b-a3b-instruct
I have an LLM prompt template saved called pelican-svg which I created like this:
llm "Generate an SVG of a pelican riding a bicycle" –save pelican-svg
This means I can run my pelican benchmark like this:
llm -t pelican-svg -m openrouter/qwen/qwen3-next-80b-a3b-thinking
Or like this:
llm -t pelican-svg -m openrouter/qwen/qwen3-next-80b-a3b-instruct
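The llm tool also has a Python API, so the same benchmark can be scripted. A minimal sketch, assuming the llm-openrouter plugin and key are configured as above:

```python
import llm

# Assumes `llm install llm-openrouter` and `llm keys set openrouter`
# have already been run, as shown earlier.
model = llm.get_model("openrouter/qwen/qwen3-next-80b-a3b-instruct")
response = model.prompt("Generate an SVG of a pelican riding a bicycle")
print(response.text())
```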
Here’s the thinking model output (exported with llm logs -c | pbcopy after I ran the prompt):
I liked the "Whimsical style with smooth curves and friendly proportions (no anatomical accuracy needed for bicycle riding!)" note in the transcript.
The instruct (non-reasoning) model gave me this:
"Who needs legs!?" indeed! I like the penguin-flamingo emoji sequence it’s decided on for pelicans.
Tags: ai, generative-ai, llms, llm, qwen, pelican-riding-a-bicycle, llm-reasoning, llm-release, openrouter, ai-in-china
AI Summary and Description: Yes
Summary: The text provides an overview of Qwen’s new AI models, Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking, highlighting their performance claims, architectural details, and use case examples. It is particularly relevant for professionals in AI security and generative AI security due to its implications for model optimization and inference speed.
Detailed Description:
The text describes the launch of two new AI models by Qwen, elaborating on their capabilities, performance metrics, and architectural innovations. Below are the key points discussed in the text:
– **Model Specifications**:
– The Qwen3-Next-80B-A3B model features 80 billion parameters, but only 3 billion are active during inference, allowing for quicker processing and reduced computational costs.
– This architecture is designed for significant speed increases in response times when handling prompts, particularly with long contexts.
– **Performance Claims**:
– The Instruct model claims performance nearly equal to Qwen’s previous flagship model with 235 billion parameters.
– The Thinking model is positioned as outperforming Google’s Gemini-2.5-Flash-Thinking in reasoning capabilities.
– **Technological Innovations**:
– **Hybrid Architecture**: Utilizes Gated DeltaNet and Gated Attention techniques to optimize both speed and recall.
– **Ultra-sparse Mixture of Experts (MoE)**: Features 512 experts; for each token, 10 routed experts are activated alongside 1 always-on shared expert, enhancing processing efficiency.
– **Multi-Token Prediction**: Enables more efficient speculative decoding, which further improves response speed (see the sketch after this list).
– **Practical Implementation**:
– Users can access the models via OpenRouter, and the text describes prompting techniques using installed plugins to evaluate model performance through a creative prompt related to generating SVG images.
– **Use Case**:
– The benchmark example involving generating a whimsical image of a pelican riding a bicycle showcases the models’ capabilities in producing detailed and creative outputs with specific aesthetic requests.
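To make the multi-token prediction point concrete: it feeds speculative decoding, where a cheap draft proposes several tokens ahead and the expensive model verifies them. Below is an illustrative sketch of the accept/reject loop with toy stand-in models (greedy verification; real systems check all drafts in one batched forward pass, and this is not Qwen's implementation):

```python
import random

def draft_model(context, k):
    # Toy stand-in for a cheap drafter proposing k next tokens.
    random.seed(len(context))
    return [random.randint(0, 9) for _ in range(k)]

def target_model(context):
    # Toy stand-in for the expensive model's single best next token.
    random.seed(len(context) * 7)
    return random.randint(0, 9)

def speculative_step(context, k=4):
    """One round of speculative decoding with greedy verification."""
    accepted = []
    for token in draft_model(context, k):
        if target_model(context + accepted) == token:
            accepted.append(token)              # draft matched: keep it
        else:
            accepted.append(target_model(context + accepted))
            break                               # first mismatch ends the round
    return accepted  # 1..k tokens emitted per expensive verification round

print(speculative_step([1, 2, 3]))
```

The payoff: whenever draft tokens are accepted, the target model emits several tokens for the cost of a single verification round.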
By understanding the performance capabilities and architectural advantages of these new models, security and compliance professionals can anticipate how such advances in generative AI may affect AI security, model training methodologies, and the ethical considerations around deploying models in production.