Simon Willison’s Weblog: Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action

Sep 24, 2025

—

Source URL: https://simonwillison.net/2025/Sep/23/qwen3-vl/
Source: Simon Willison’s Weblog
Title: Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action

Feedly Summary:
I’ve been looking forward to this. Qwen 2.5 VL is one of the best available open weight vision LLMs, so I had high hopes for Qwen 3’s vision models.

“Firstly, we are open-sourcing the flagship model of this series: Qwen3-VL-235B-A22B, available in both Instruct and Thinking versions. The Instruct version matches or even exceeds Gemini 2.5 Pro in major visual perception benchmarks. The Thinking version achieves state-of-the-art results across many multimodal reasoning benchmarks.”

Bold claims against Gemini 2.5 Pro, which are supported by a flurry of self-reported benchmarks.
This initial model is enormous. On Hugging Face both Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking are 235B parameters and weigh 471 GB. Not something I’m going to be able to run on my 64GB Mac!
The Qwen 2.5 VL family included models at 72B, 32B, 7B and 3B sizes. Given the rate Qwen are shipping models at the moment I wouldn’t be surprised to see smaller Qwen 3 VL models show up in just the next few days.
Also from Qwen today, three new API-only closed-weight models: upgraded Qwen 3 Coder, Qwen3-LiveTranslate-Flash (real-time multimodal interpretation), and Qwen3-Max, their new trillion-parameter flagship model, which they describe as their “largest and most capable model to date”.
Via Hacker News
Tags: ai, generative-ai, llms, vision-llms, qwen, llm-reasoning, llm-release, ai-in-china

AI Summary and Description: Yes

Summary: The text discusses the release of Qwen3-VL, an open weight vision LLM from Qwen, highlighting its self-reported benchmark results against Gemini 2.5 Pro and its very large size (235B parameters, 471 GB). This is relevant to professionals in AI and infrastructure security because a model of this scale constrains where and how it can be deployed and secured.

Detailed Description:
The content focuses on the introduction of Qwen3-VL, a powerful vision Large Language Model (LLM) designed for enhanced visual perception and multimodal reasoning. Key highlights include:

– **Model Variants**: The flagship Qwen3-VL-235B-A22B ships in both Instruct and Thinking versions. Qwen says the Instruct version matches or even exceeds Gemini 2.5 Pro on major visual perception benchmarks, and that the Thinking version achieves state-of-the-art results across many multimodal reasoning benchmarks.
– **Benchmark Performance**: The comparison with Gemini 2.5 Pro is supported by a flurry of self-reported benchmarks, which the author describes as bold claims.
– **Model Size**: The flagship Qwen3-VL-235B-A22B model consists of 235 billion parameters and takes up 471 GB of storage. This scale presents a considerable challenge for deployment on standard hardware, such as a 64GB Mac.
– **Future Developments**: The author would not be surprised to see smaller Qwen 3 VL models within days, given Qwen's release pace and the fact that the Qwen 2.5 VL family included 72B, 32B, 7B and 3B sizes.
– **Additional Releases**: On the same day Qwen announced three API-only, closed-weight models: an upgraded Qwen 3 Coder, Qwen3-LiveTranslate-Flash for real-time multimodal interpretation, and Qwen3-Max, a trillion-parameter flagship that Qwen describes as its largest and most capable model to date.
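
The size figures in the list above can be sanity-checked with a little arithmetic. The sketch below is a rough, weights-only estimate. It assumes 1 GB = 10^9 bytes, and the 16-bit, 8-bit and 4-bit precisions are hypothetical values chosen for illustration, not figures from the post. It also ignores activations, KV cache and other runtime overhead.

```python
# Figures quoted in the post.
TOTAL_PARAMS = 235e9      # 235B total parameters
CHECKPOINT_GB = 471       # download size reported on Hugging Face
MAC_RAM_GB = 64           # the author's Mac


def bytes_per_param(checkpoint_gb: float, params: float) -> float:
    """Average storage per parameter implied by a checkpoint size."""
    return checkpoint_gb * 1e9 / params


def weights_gb(params: float, bits_per_param: int) -> float:
    """Weights-only footprint in GB at a given precision (no runtime overhead)."""
    return params * bits_per_param / 8 / 1e9


implied = bytes_per_param(CHECKPOINT_GB, TOTAL_PARAMS)
print(f"Implied storage: {implied:.2f} bytes/parameter")

# Hypothetical precisions, for illustration only (assumption, not from the post).
for bits in (16, 8, 4):
    gb = weights_gb(TOTAL_PARAMS, bits)
    verdict = "fits in" if gb <= MAC_RAM_GB else "exceeds"
    print(f"{bits:>2}-bit weights: {gb:6.1f} GB ({verdict} {MAC_RAM_GB} GB)")
```

Under these assumptions the checkpoint works out to roughly 2 bytes per parameter, and even a hypothetical 4-bit copy of the weights would be larger than 64 GB, which is consistent with the author's remark about not running it on a 64GB Mac.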

Considering these advancements, several implications arise for security and compliance professionals:

– **Security in AI Models**: The deployment of large models requires rigorous compliance with security protocols, especially as they may be used in sensitive applications that involve personal or proprietary data.
– **Infrastructure Readiness**: Organizations must assess their infrastructure capabilities to handle such expansive models, as inadequate resources could lead to security vulnerabilities during deployment.
– **Governance and Compliance**: The advancements in models like Qwen3-VL increase the need for clear governance frameworks to ensure ethical use and compliance with regulatory standards in AI applications.
– **Industry Competitiveness**: The announcement signifies ongoing competition in the AI field, requiring professionals to stay abreast of technological developments to maintain organizational competitiveness.

The evolution of generative AI models like Qwen3-VL emphasizes the critical need for integrated security strategies as organizations leverage these technologies for both operational efficiency and regulatory compliance.
