Simon Willison’s Weblog: Qwen3-Coder: Agentic Coding in the World

Source URL: https://simonwillison.net/2025/Jul/22/qwen3-coder/
Source: Simon Willison’s Weblog
Title: Qwen3-Coder: Agentic Coding in the World

Feedly Summary: Qwen3-Coder: Agentic Coding in the World
It turns out that as I was typing up my notes on Qwen3-235B-A22B-Instruct-2507 the Qwen team were unleashing something much bigger:

Today, we’re announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we’re excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct — a 480B-parameter Mixture-of-Experts model with 35B active parameters which supports the context length of 256K tokens natively and 1M tokens with extrapolation methods, offering exceptional performance in both coding and agentic tasks.

This is another Apache 2.0 licensed open weights model, available as Qwen3-Coder-480B-A35B-Instruct and Qwen3-Coder-480B-A35B-Instruct-FP8 on Hugging Face.
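Because the weights are open, the standard Hugging Face loading pattern applies, though the 480B MoE needs a large multi-GPU node to actually run. Here's a minimal sketch, assuming the repo lives under the Qwen organization on Hugging Face:

```python
# Minimal sketch of loading the open weights with Hugging Face transformers.
# Assumes the repo id "Qwen/Qwen3-Coder-480B-A35B-Instruct"; the 480B MoE
# needs hundreds of GB of GPU memory, so treat this as illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # the -FP8 variant trades precision for memory
    device_map="auto",   # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```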
I used qwen3-coder-480b-a35b-instruct on the Hyperbolic playground to run my “Generate an SVG of a pelican riding a bicycle” test prompt:

I actually slightly prefer the one I got from qwen3-235b-a22b-07-25.
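If you'd rather script that test than use the playground UI, here's a hedged sketch against Hyperbolic's API. I'm assuming an OpenAI-compatible endpoint and that the model is served under its Hugging Face id; check Hyperbolic's docs for the actual base URL and model slug:

```python
# Sketch: the pelican test prompt via an OpenAI-compatible client.
# The base URL and model slug below are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",  # assumed endpoint
    api_key="YOUR_HYPERBOLIC_API_KEY",
)
response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # assumed model slug
    messages=[{"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}],
)
print(response.choices[0].message.content)
```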
In addition to the new model, Qwen released their own take on an agentic terminal coding assistant called qwen-code, which they describe in their blog post as being "Forked from Gemini Code" (they mean gemini-cli). Gemini CLI is itself Apache 2.0 licensed, so a fork is in keeping with the license.
They focused really hard on code performance for this release, including generating synthetic data that was tested across 20,000 parallel environments on Alibaba Cloud:

In the post-training phase of Qwen3-Coder, we introduced long-horizon RL (Agent RL) to encourage the model to solve real-world tasks through multi-turn interactions using tools. The key challenge of Agent RL lies in environment scaling. To address this, we built a scalable system capable of running 20,000 independent environments in parallel, leveraging Alibaba Cloud’s infrastructure. The infrastructure provides the necessary feedback for large-scale reinforcement learning and supports evaluation at scale. As a result, Qwen3-Coder achieves state-of-the-art performance among open-source models on SWE-Bench Verified without test-time scaling.
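To make the shape of that setup concrete, here is a toy fan-out/collect sketch of parallel environment rollouts. Everything in it (turn counts, rewards, the rollout function) is schematic stand-in code, not Qwen's actual Agent RL system:

```python
# Toy illustration of the parallel-rollout pattern: many independent
# environments each run a multi-turn tool-use episode and report a reward
# (e.g. whether the task's tests passed). Purely schematic.
from concurrent.futures import ProcessPoolExecutor

def rollout(env_id: int) -> dict:
    """Run one multi-turn episode in an isolated environment."""
    trajectory = []
    for turn in range(8):                         # bounded multi-turn loop
        action = f"tool_call[{env_id}:{turn}]"    # placeholder model action
        observation = f"result[{env_id}:{turn}]"  # placeholder env feedback
        trajectory.append((action, observation))
    reward = 1.0 if env_id % 2 == 0 else 0.0      # placeholder test outcome
    return {"env_id": env_id, "trajectory": trajectory, "reward": reward}

if __name__ == "__main__":
    # The announcement describes 20,000 parallel environments; a local demo
    # uses far fewer, but the fan-out/collect shape is the same.
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(rollout, range(64)))
    mean_reward = sum(r["reward"] for r in results) / len(results)
    print(f"collected {len(results)} trajectories, mean reward {mean_reward:.2f}")
```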

To further burnish their coding credentials, the announcement includes instructions for running the new model with both Claude Code and Cline, using custom API base URLs that point to Qwen’s own compatibility proxies.
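The trick that makes this work is that both tools speak a standard API shape, so a compatible proxy can stand in for the real endpoint; Claude Code, for instance, can be redirected with its ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables. The same pattern in the Anthropic Python SDK looks like this; the proxy URL and model name below are placeholders, with the real values in Qwen's announcement:

```python
# Sketch of the compatibility-proxy pattern with the Anthropic SDK: point
# the client at an Anthropic-compatible proxy instead of api.anthropic.com.
# Placeholder URL and model name; Qwen's announcement has the real ones.
import anthropic

client = anthropic.Anthropic(
    base_url="https://example-qwen-proxy.invalid",  # placeholder proxy URL
    api_key="YOUR_QWEN_API_KEY",
)
message = client.messages.create(
    model="qwen3-coder",  # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Refactor this recursive function to be iterative."}],
)
print(message.content[0].text)
```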
Pricing for Qwen’s own hosted models (through Alibaba Cloud) looks competitive. This is the first model I’ve seen that sets different prices for four different sizes of input:

This kind of pricing reflects the fact that inference against longer inputs is more expensive to process. Gemini 2.5 Pro, for comparison, has two different prices depending on whether the input is above or below 200,000 tokens.
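The mechanics of tiered input pricing are easy to sketch: the whole prompt is billed at the rate of whichever tier its token count falls into. The boundaries and rates below are invented for illustration only, not Qwen's actual prices:

```python
# Illustration of tiered input pricing: the whole prompt is billed at the
# rate of the tier its token count falls into. These tiers and rates are
# invented for illustration; they are not Qwen's actual prices.
HYPOTHETICAL_TIERS = [
    # (tier ceiling in input tokens, USD per million input tokens)
    (32_000, 1.00),
    (128_000, 2.00),
    (256_000, 3.00),
    (1_000_000, 6.00),
]

def input_cost_usd(input_tokens: int) -> float:
    """Return the input cost for a prompt of the given length."""
    for ceiling, rate_per_million in HYPOTHETICAL_TIERS:
        if input_tokens <= ceiling:
            return input_tokens * rate_per_million / 1_000_000
    raise ValueError("prompt exceeds the 1M-token extrapolated limit")

print(f"${input_cost_usd(20_000):.4f}")   # short prompt, cheapest tier
print(f"${input_cost_usd(300_000):.4f}")  # long prompt, most expensive tier
```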
Via @Alibaba_Qwen
Tags: ai, generative-ai, llms, ai-assisted-programming, qwen, llm-pricing, llm-release, coding-agents

AI Summary and Description: Yes

**Summary:**
The text discusses the launch of Qwen3-Coder, a high-performance code-generation model that brings significant improvements in size, functionality, and AI-assisted-programming capability. It highlights advances in agentic AI models, infrastructure scaling via Alibaba Cloud, and pricing tied to input length, all of which is significant for professionals in AI, cloud computing, and software security.

**Detailed Description:**
The content details the release of Qwen3-Coder, an advanced generative AI model dedicated to coding tasks. Here are the key highlights:

– **Model Specifications:**
– Qwen3-Coder is introduced as Qwen’s most agentic code model to date; its most powerful variant, Qwen3-Coder-480B-A35B-Instruct, is a 480B-parameter Mixture-of-Experts model with 35B active parameters.
– The model supports a native context length of 256K tokens and can extrapolate to handle up to 1M tokens, which improves its utility for complex coding and “agentic” tasks.

– **Availability and Licensing:**
– The weights are released under the Apache 2.0 license and published on Hugging Face, giving developers permissive, open usage options.

– **Integration of Reinforcement Learning:**
– The model’s post-training incorporates long-horizon reinforcement learning (Agent RL), which encourages it to solve real-world coding tasks through multi-turn interactions using tools.
– The environment scaling challenge, essential for effective training and testing, is addressed by implementing a scalable system capable of managing 20,000 parallel environments on Alibaba Cloud.

– **Performance Metrics:**
– Qwen3-Coder achieves state-of-the-art performance among open-source models on the SWE-Bench Verified benchmark, without requiring test-time scaling.

– **Infrastructure Utilization:**
– The effective performance of the model is bolstered by utilizing Alibaba Cloud’s infrastructure, which sustains large-scale reinforcement learning and evaluation capabilities.

– **Market Positioning:**
– Qwen’s competitive pricing model differentiates costs based on input sizes, indicating a thoughtful approach to pricing for varying demand and computational resource use.
– This trend reflects broader patterns observed in other models, such as Gemini 2.5 Pro, which uses tiered pricing based on token lengths.

– **Additional Tools:**
– Qwen also introduced qwen-code, an agentic terminal coding assistant forked from gemini-cli, highlighting the growing ecosystem of AI tooling around the main model.

This information is particularly relevant for professionals tracking AI advancements, cloud infrastructure, and the integration of AI into software development. The model’s performance gains and reliance on large-scale cloud resources merit attention from those monitoring AI security implications and compliance standards.