Simon Willison’s Weblog: Qwen3-235B-A22B-Thinking-2507

Source URL: https://simonwillison.net/2025/Jul/25/qwen3-235b-a22b-thinking-2507/#atom-everything
Source: Simon Willison’s Weblog
Title: Qwen3-235B-A22B-Thinking-2507

Feedly Summary: Qwen3-235B-A22B-Thinking-2507
The third Qwen model release this week, following Qwen3-235B-A22B-Instruct-2507 on Monday 21st and Qwen3-Coder-480B-A35B-Instruct on Tuesday 22nd.
Those two were both non-reasoning models – a change from the previous models in the Qwen 3 family, which combined reasoning and non-reasoning in the same model, controlled by /think and /no_think tokens.
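For the earlier hybrid models, that switch was just text appended to the message. Here's a minimal sketch of how it worked, assuming an OpenAI-compatible endpoint serving the April Qwen3-235B-A22B – the base URL and model name are illustrative:

```python
# Toggling reasoning in the earlier hybrid Qwen 3 models via the
# documented soft switch appended to the user message.
from openai import OpenAI

# Assumption: a local OpenAI-compatible server (e.g. vLLM) is running.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(prompt: str, think: bool) -> str:
    # "/think" enables the reasoning trace, "/no_think" suppresses it.
    switch = "/think" if think else "/no_think"
    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B",
        messages=[{"role": "user", "content": f"{prompt} {switch}"}],
    )
    return response.choices[0].message.content

print(ask("What is 17 * 23?", think=False))
```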
Today’s model, Qwen3-235B-A22B-Thinking-2507 (also released as an FP8 variant), is their new thinking variant.
Qwen claim “state-of-the-art results among open-source thinking models” and have increased the context length to 262,144 tokens – a big jump from April’s Qwen3-235B-A22B, which was “32,768 natively and 131,072 tokens with YaRN”.
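YaRN is the RoPE-scaling trick the April model used to stretch its 32,768-token native window to 131,072. A sketch of what that extension looks like in a Hugging Face config, assuming the rope_scaling keys used by current transformers releases (older versions spell "rope_type" as "type"):

```python
# How the April Qwen3-235B-A22B reached 131,072 tokens: YaRN scaling
# over the 32,768-token native window (factor 4.0, per Qwen's recipe).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen3-235B-A22B")
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # 32,768 * 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072
# Pass this config to AutoModelForCausalLM.from_pretrained(...) when loading.
```

The new 2507 model makes this workaround unnecessary for anything up to 262,144 tokens.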
Their own published benchmarks show comparable scores to DeepSeek-R1-0528, OpenAI’s o3 and o4-mini, Gemini 2.5 Pro and Claude Opus 4 in thinking mode.
The new model is already available via OpenRouter.
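If you want to poke at it yourself, here's a minimal sketch of an OpenRouter call – the model slug is my guess at OpenRouter's naming convention, so check their catalog before running it:

```python
# Querying the new thinking model through OpenRouter's
# OpenAI-compatible chat completions API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder: use your own key
)

response = client.chat.completions.create(
    # Assumed slug; confirm against the OpenRouter model catalog.
    model="qwen/qwen3-235b-a22b-thinking-2507",
    messages=[{
        "role": "user",
        "content": "Generate an SVG of a pelican riding a bicycle",
    }],
)
print(response.choices[0].message.content)
```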
But how good is its pelican?
I tried it with "Generate an SVG of a pelican riding a bicycle" via OpenRouter, and it thought for 166 seconds – nearly three minutes! I have never seen a model think for that long. No wonder the documentation includes the following:

However, since the model may require longer token sequences for reasoning, we strongly recommend using a context length greater than 131,072 when possible.
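Following that advice when serving the model yourself might look like this – a sketch using vLLM, with illustrative hardware settings (a 235B MoE model realistically needs a multi-GPU node):

```python
# Serving the new model with the full 262,144-token window, per the
# documentation's recommendation to stay above 131,072.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",
    max_model_len=262144,    # full native window
    tensor_parallel_size=8,  # illustrative; size to your hardware
)
params = SamplingParams(max_tokens=32768)  # leave room for long traces
outputs = llm.generate(
    ["Generate an SVG of a pelican riding a bicycle"], params
)
print(outputs[0].outputs[0].text)
```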

Here’s a copy of that thinking trace. It was really fun to scan through.

The finished pelican? Not so great! I like the beak though.

Via @Alibaba_Qwen
Tags: ai, generative-ai, llms, qwen, pelican-riding-a-bicycle, llm-reasoning, llm-release

AI Summary and Description: Yes

Summary: The text discusses the release of a new AI model, Qwen3-235B-A22B-Thinking-2507, highlighting its advancements in reasoning capabilities and its increased context length. This development is pertinent for AI professionals, particularly those tracking the evolution of large language models (LLMs) and their security considerations.

Detailed Description:

The text describes the launch of the Qwen3-235B-A22B-Thinking-2507 model, the reasoning-focused upgrade in the Qwen 3 series. Key aspects include:

– **Model Variants and Releases**: The model follows the week's two non-reasoning releases and reflects the family's strategic shift from combined reasoning/non-reasoning models to separate dedicated variants.
– **Enhanced Context Length**: A jump to 262,144 tokens from the April model's 32,768 native (131,072 with YaRN), leaving room for longer inputs and the lengthy reasoning traces the model produces.
– **Comparison with Competitors**: Qwen's published benchmarks show scores comparable to DeepSeek-R1-0528, OpenAI's o3 and o4-mini, Gemini 2.5 Pro, and Claude Opus 4 in thinking mode, reinforcing its standing among open-source thinking models.
– **Performance Observations**: The model thought for 166 seconds on the pelican-SVG test, illustrating the depth of its new reasoning behavior while raising questions about latency in real-time applications.
– **Documentation and Recommendations**: The documentation recommends a context length greater than 131,072 tokens so that long reasoning sequences have room to complete.

This information carries significant implications for professionals in AI, especially concerning:

– **Future Developments in AI Models**: As LLMs grow more complex and capable, the security measures built around them must evolve in step.
– **Comparative Analysis for Security**: As new models are benchmarked against existing ones, understanding their reasoning behavior and the potential vulnerabilities that arise from long, complex token processing becomes critical.
– **Application in Real-World Scenarios**: The performance metrics and first-hand observations shared here can guide developers optimizing AI deployments, particularly in applications that require both secure and efficient reasoning.

In summary, the Qwen3-235B-A22B-Thinking-2507 model's reasoning improvements and extended context length are directly relevant to ongoing discussions of AI security, performance, and governance in the burgeoning field of generative AI.