Simon Willison’s Weblog: Qwen3-235B-A22B-Thinking-2507

Source URL: https://simonwillison.net/2025/Jul/25/qwen3-235b-a22b-thinking-2507/#atom-everything
Source: Simon Willison’s Weblog
Title: Qwen3-235B-A22B-Thinking-2507

Feedly Summary: Qwen3-235B-A22B-Thinking-2507
The third Qwen model release this week, following Qwen3-235B-A22B-Instruct-2507 on Monday 21st and Qwen3-Coder-480B-A35B-Instruct on Tuesday 22nd.
Those two were both non-reasoning models – a change from the previous models in the Qwen 3 family, which combined reasoning and non-reasoning in the same model, controlled by /think and /no_think tokens.
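For the earlier hybrid models, that switch was just text appended to the message. Here's a minimal sketch of how it worked, assuming an OpenAI-compatible endpoint serving the April Qwen3-235B-A22B – the base URL and model name are illustrative:

```python
# Toggling reasoning in the earlier hybrid Qwen 3 models via the
# documented soft switch appended to the user message.
from openai import OpenAI

# Assumption: a local OpenAI-compatible server (e.g. vLLM) is running.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(prompt: str, think: bool) -> str:
    # "/think" enables the reasoning trace, "/no_think" suppresses it.
    switch = "/think" if think else "/no_think"
    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B",
        messages=[{"role": "user", "content": f"{prompt} {switch}"}],
    )
    return response.choices[0].message.content

print(ask("What is 17 * 23?", think=False))
```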
Today’s model, Qwen3-235B-A22B-Thinking-2507 (also released as an FP8 variant), is their new thinking variant.
Qwen claim “state-of-the-art results among open-source thinking models” and have increased the context length to 262,144 tokens – a big jump from April’s Qwen3-235B-A22B, which was “32,768 natively and 131,072 tokens with YaRN”.
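YaRN is the RoPE-scaling trick the April model used to stretch its 32,768-token native window to 131,072. A sketch of what that extension looks like in a Hugging Face config, assuming the rope_scaling keys used by current transformers releases (older versions spell "rope_type" as "type"):

```python
# How the April Qwen3-235B-A22B reached 131,072 tokens: YaRN scaling
# over the 32,768-token native window (factor 4.0, per Qwen's recipe).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen3-235B-A22B")
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # 32,768 * 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072
# Pass this config to AutoModelForCausalLM.from_pretrained(...) when loading.
```

The new 2507 model makes this workaround unnecessary for anything up to 262,144 tokens.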
Their own published benchmarks show comparable scores to DeepSeek-R1-0528, OpenAI’s o3 and o4-mini, Gemini 2.5 Pro and Claude Opus 4 in thinking mode.
The new model is already available via OpenRouter.
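If you want to poke at it yourself, here's a minimal sketch of an OpenRouter call – the model slug is my guess at OpenRouter's naming convention, so check their catalog before running it:

```python
# Querying the new thinking model through OpenRouter's
# OpenAI-compatible chat completions API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder: use your own key
)

response = client.chat.completions.create(
    # Assumed slug; confirm against the OpenRouter model catalog.
    model="qwen/qwen3-235b-a22b-thinking-2507",
    messages=[{
        "role": "user",
        "content": "Generate an SVG of a pelican riding a bicycle",
    }],
)
print(response.choices[0].message.content)
```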
But how good is its pelican?
I tried it with "Generate an SVG of a pelican riding a bicycle" via OpenRouter, and it thought for 166 seconds – nearly three minutes! I have never seen a model think for that long. No wonder the documentation includes the following:

However, since the model may require longer token sequences for reasoning, we strongly recommend using a context length greater than 131,072 when possible.
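Following that advice when serving the model yourself might look like this – a sketch using vLLM, with illustrative hardware settings (a 235B MoE model realistically needs a multi-GPU node):

```python
# Serving the new model with the full 262,144-token window, per the
# documentation's recommendation to stay above 131,072.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",
    max_model_len=262144,    # full native window
    tensor_parallel_size=8,  # illustrative; size to your hardware
)
params = SamplingParams(max_tokens=32768)  # leave room for long traces
outputs = llm.generate(
    ["Generate an SVG of a pelican riding a bicycle"], params
)
print(outputs[0].outputs[0].text)
```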

Here’s a copy of that thinking trace. It was really fun to scan through.

The finished pelican? Not so great! I like the beak though.

Via @Alibaba_Qwen
Tags: ai, generative-ai, llms, qwen, pelican-riding-a-bicycle, llm-reasoning, llm-release

AI Summary and Description: Yes

Summary: The text discusses the release of a new AI model, Qwen3-235B-A22B-Thinking-2507, highlighting its advancements in reasoning capabilities and its increased context length. This development is pertinent for AI professionals, particularly those tracking the evolution of large language models (LLMs) and their security considerations.

Detailed Description:

The text describes the launch of the Qwen3-235B-A22B-Thinking-2507 model, the reasoning-focused upgrade in the Qwen 3 series. Key aspects include:

– **Model Variants and Releases**: The model follows the week's two non-reasoning releases and reflects the family's strategic shift from combined reasoning/non-reasoning models to separate dedicated variants.
– **Enhanced Context Length**: A jump to 262,144 tokens from the April model's 32,768 native (131,072 with YaRN), leaving room for longer inputs and the lengthy reasoning traces the model produces.
– **Comparison with Competitors**: Qwen's published benchmarks show scores comparable to DeepSeek-R1-0528, OpenAI's o3 and o4-mini, Gemini 2.5 Pro, and Claude Opus 4 in thinking mode, reinforcing its standing among open-source thinking models.
– **Performance Observations**: The model thought for 166 seconds on the pelican-SVG test, illustrating the depth of its new reasoning behavior while raising questions about latency in real-time applications.
– **Documentation and Recommendations**: The documentation recommends a context length greater than 131,072 tokens so that long reasoning sequences have room to complete.

This information carries significant implications for professionals in AI, especially concerning:

– **Future Developments in AI Models**: As LLMs grow more complex and capable, the security measures built around them must evolve in step.
– **Comparative Analysis for Security**: As new models are benchmarked against existing ones, understanding their reasoning behavior and the potential vulnerabilities that arise from long, complex token processing becomes critical.
– **Application in Real-World Scenarios**: The performance metrics and first-hand observations shared here can guide developers optimizing AI deployments, particularly in applications that require both secure and efficient reasoning.

In summary, the Qwen3-235B-A22B-Thinking-2507 model's reasoning improvements and extended context length are directly relevant to ongoing discussions of AI security, performance, and governance in the burgeoning field of generative AI.