Source URL: https://simonwillison.net/2025/Sep/22/qwen/
Source: Simon Willison’s Weblog
Title: Four new releases from Qwen
Feedly Summary: It’s been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements):
Qwen3-Next-80B-A3B-Instruct-FP8 and Qwen3-Next-80B-A3B-Thinking-FP8 – official FP8-quantized versions of their Qwen3-Next models. On Hugging Face, Qwen3-Next-80B-A3B-Instruct is 163GB and Qwen3-Next-80B-A3B-Instruct-FP8 is 82.1GB. I wrote about Qwen3-Next on Friday 12th September.
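As a rough sketch of what using the FP8 checkpoint might look like locally (assuming a recent transformers release and hardware that can hold roughly 82GB of weights; the loading options here are generic defaults, not Qwen's official recipe, so check the model card):

```python
# Minimal sketch: loading the FP8 checkpoint with Hugging Face transformers.
# Assumes a recent transformers version and enough GPU memory for ~82GB of
# weights; see the model card for the officially recommended setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's quantized dtype
    device_map="auto",    # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Write a haiku about quantization."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```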
Qwen3-TTS-Flash provides “multi-timbre, multi-lingual, and multi-dialect speech synthesis” according to their blog announcement. It’s not available as open weights; you have to access it via their API instead. Here’s a free live demo.
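Since there are no weights to download, using it means calling a hosted endpoint. The sketch below is deliberately hypothetical: the URL, request fields, and response handling are invented placeholders showing the general shape of a hosted TTS call, not the real Qwen3-TTS-Flash API, so consult their API documentation for the actual interface:

```python
# Hypothetical sketch of a hosted text-to-speech call. The endpoint URL,
# request fields, and response format below are placeholders, NOT the
# real Qwen3-TTS-Flash API.
import os
import requests

resp = requests.post(
    "https://tts.example.invalid/v1/synthesize",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "qwen3-tts-flash",
        "text": "Hello from a multi-dialect speech model.",
        "voice": "example-voice",  # placeholder voice identifier
    },
    timeout=60,
)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)  # assumes the service returns raw audio bytes
```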
Qwen3-Omni is today’s most exciting announcement: a brand new 30B parameter "omni" model supporting text, audio, and video input with text and audio output! You can try it on chat.qwen.ai by selecting the "Use voice and video chat" icon – you’ll need to be signed in with Google or GitHub. This one is open weights, released under Apache 2.0 as Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and Qwen3-Omni-30B-A3B-Captioner on Hugging Face. The Instruct model is 70.5GB, so this should be relatively accessible for running on expensive home devices.
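For running the open weights locally, loading presumably follows the standard transformers multimodal pattern. This is a sketch using only the generic Auto classes; a brand-new architecture like this may need a dedicated model class, a newer transformers release, and extra audio/video dependencies, so treat the details as assumptions and check the model card:

```python
# Sketch: loading the open-weights Omni model with generic Auto classes.
# A model this new may require a dedicated class or a bleeding-edge
# transformers install; the model card has the authoritative instructions.
from transformers import AutoProcessor, AutoModel

model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # the 70.5GB checkpoint likely needs multiple GPUs
)
```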
Qwen-Image-Edit-2509 is an updated version of their excellent Qwen-Image-Edit model, which I first tried last month. Their blog post calls it "the monthly iteration of Qwen-Image-Edit", so I guess they’re planning more frequent updates. The new model adds multi-image inputs. I used it via chat.qwen.ai to turn a photo of our dog into a dragon in the style of one of Natalie’s ceramic pots.
Here’s the prompt I used, feeding in two separate images. Weirdly it used the edges of the landscape photo to fill in the gaps on the otherwise portrait output. It turned the chair seat into a bowl too!
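To try that kind of multi-image edit locally, something like the following diffusers sketch should be close. The generic DiffusionPipeline loader is standard, but the list-of-images call signature is my assumption based on the announced "multi-image inputs" feature; verify against the model card before relying on it:

```python
# Sketch: multi-image editing with diffusers. DiffusionPipeline dispatches
# to whatever pipeline class the repo declares; passing a list of images
# is an assumption based on the announced multi-image input support.
import torch
from diffusers import DiffusionPipeline
from PIL import Image

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to("cuda")

dog = Image.open("dog.jpg")
pot = Image.open("ceramic_pot.jpg")
result = pipe(
    image=[dog, pot],  # assumed multi-image input format
    prompt="Turn the dog into a dragon in the style of the ceramic pot",
).images[0]
result.save("dragon.png")
```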
Tags: text-to-speech, ai, qwen, llms, multi-modal-output, llm-release, ai-in-china, generative-ai
AI Summary and Description: Yes
Summary: The text covers four releases from the Qwen team: FP8-quantized Qwen3-Next checkpoints, the API-only Qwen3-TTS-Flash speech-synthesis model, the multi-modal open-weights Qwen3-Omni model, and an updated Qwen-Image-Edit. Their significance lies in expanding generative AI and LLM capabilities across text, speech, image, and video, for both open-weights and hosted use.
Detailed Description: The content surveys several recent model releases from the Qwen team that advance generative AI technologies. The major points:
– **Qwen3-Next Models**:
– Two official FP8-quantized versions of the Qwen3-Next models have been released.
– FP8 quantization roughly halves the checkpoint size: Qwen3-Next-80B-A3B-Instruct is 163GB, while the FP8 version is 82.1GB – still a demanding download and deployment (a quick size check follows this item).
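The halving is easy to sanity-check with back-of-the-envelope arithmetic: 80 billion weights at 2 bytes each (BF16) is about 160GB, and at 1 byte each (FP8) about 80GB, which lines up with the published 163GB and 82.1GB once you allow a few GB for components kept at higher precision:

```python
# Rough size estimate for an 80B-parameter model at two weight precisions.
# Uses decimal GB; real checkpoints add a few GB of overhead (embeddings,
# layers kept at higher precision, metadata).
params = 80e9

for name, bytes_per_weight in [("BF16", 2), ("FP8", 1)]:
    print(f"{name}: ~{params * bytes_per_weight / 1e9:.0f} GB")

# BF16: ~160 GB  (published checkpoint: 163GB)
# FP8:  ~80 GB   (published checkpoint: 82.1GB)
```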
– **Qwen3-TTS-Flash**:
– A speech-synthesis model supporting multi-timbre, multi-lingual, and multi-dialect output.
– Access is via API rather than open weights, reflecting a trend toward hosted delivery of some advanced AI functionality.
– **Qwen3-Omni Model**:
– A new 30B-parameter model that accepts text, audio, and video inputs and produces text and audio outputs.
– Significantly, it is available with open weights under an Apache 2.0 license, making it more accessible for developers and researchers.
– **Qwen-Image-Edit Update**:
– A monthly iterative update to the image editing model, now supporting multi-image inputs.
– A practical example demonstrates the model’s ability to creatively transform and combine images.
Insights:
– These advancements point to a rapid evolution in generative AI, emphasizing the importance of LLMs in both consumer and enterprise solutions.
– The availability of open-weights models encourages wider experimentation and could accelerate community-driven development of AI applications.
– The trend towards multi-modal models indicates a significant move towards more integrated AI solutions, which can handle various types of data input and output, enhancing user engagement and usability.
For security and compliance professionals, the mix of open-weights and API-only access, together with the computational demands of deploying models of this size, underlines the importance of secure, compliant infrastructure that can handle significant data and processing loads without compromising user privacy or data integrity.