Source URL: https://simonwillison.net/2025/Sep/22/qwen/
Source: Simon Willison’s Weblog
Title: Four new releases from Qwen
Feedly Summary: It’s been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements):
Qwen3-Next-80B-A3B-Instruct-FP8 and Qwen3-Next-80B-A3B-Thinking-FP8 – official FP8-quantized versions of their Qwen3-Next models. On Hugging Face, Qwen3-Next-80B-A3B-Instruct is 163GB and Qwen3-Next-80B-A3B-Instruct-FP8 is 82.1GB. I wrote about Qwen3-Next on Friday 12th September.
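As a rough sketch of what using the FP8 checkpoint might look like locally (assuming a recent transformers release and hardware that can hold roughly 82GB of weights; the loading options here are generic defaults, not Qwen's official recipe, so check the model card):

```python
# Minimal sketch: loading the FP8 checkpoint with Hugging Face transformers.
# Assumes a recent transformers version and enough GPU memory for ~82GB of
# weights; see the model card for the officially recommended setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's quantized dtype
    device_map="auto",    # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Write a haiku about quantization."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```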
Qwen3-TTS-Flash provides “multi-timbre, multi-lingual, and multi-dialect speech synthesis” according to their blog announcement. It’s not available as open weights; you have to access it via their API instead. Here’s a free live demo.
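Since there are no weights to download, using it means calling a hosted endpoint. The sketch below is deliberately hypothetical: the URL, request fields, and response handling are invented placeholders showing the general shape of a hosted TTS call, not the real Qwen3-TTS-Flash API, so consult their API documentation for the actual interface:

```python
# Hypothetical sketch of a hosted text-to-speech call. The endpoint URL,
# request fields, and response format below are placeholders, NOT the
# real Qwen3-TTS-Flash API.
import os
import requests

resp = requests.post(
    "https://tts.example.invalid/v1/synthesize",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "qwen3-tts-flash",
        "text": "Hello from a multi-dialect speech model.",
        "voice": "example-voice",  # placeholder voice identifier
    },
    timeout=60,
)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)  # assumes the service returns raw audio bytes
```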
Qwen3-Omni is today’s most exciting announcement: a brand new 30B parameter "omni" model supporting text, audio, and video input with text and audio output! You can try it on chat.qwen.ai by selecting the "Use voice and video chat" icon – you’ll need to be signed in with Google or GitHub. This one is open weights, released under Apache 2.0 as Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and Qwen3-Omni-30B-A3B-Captioner on Hugging Face. The Instruct model is 70.5GB, so this should be relatively accessible for running on expensive home devices.
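For running the open weights locally, loading presumably follows the standard transformers multimodal pattern. This is a sketch using only the generic Auto classes; a brand-new architecture like this may need a dedicated model class, a newer transformers release, and extra audio/video dependencies, so treat the details as assumptions and check the model card:

```python
# Sketch: loading the open-weights Omni model with generic Auto classes.
# A model this new may require a dedicated class or a bleeding-edge
# transformers install; the model card has the authoritative instructions.
from transformers import AutoProcessor, AutoModel

model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # the 70.5GB checkpoint likely needs multiple GPUs
)
```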
Qwen-Image-Edit-2509 is an updated version of their excellent Qwen-Image-Edit model, which I first tried last month. Their blog post calls it "the monthly iteration of Qwen-Image-Edit", so I guess they’re planning more frequent updates. The new model adds multi-image inputs. I used it via chat.qwen.ai to turn a photo of our dog into a dragon in the style of one of Natalie’s ceramic pots.
Here’s the prompt I used, feeding in two separate images. Weirdly it used the edges of the landscape photo to fill in the gaps on the otherwise portrait output. It turned the chair seat into a bowl too!
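To try that kind of multi-image edit locally, something like the following diffusers sketch should be close. The generic DiffusionPipeline loader is standard, but the list-of-images call signature is my assumption based on the announced "multi-image inputs" feature; verify against the model card before relying on it:

```python
# Sketch: multi-image editing with diffusers. DiffusionPipeline dispatches
# to whatever pipeline class the repo declares; passing a list of images
# is an assumption based on the announced multi-image input support.
import torch
from diffusers import DiffusionPipeline
from PIL import Image

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to("cuda")

dog = Image.open("dog.jpg")
pot = Image.open("ceramic_pot.jpg")
result = pipe(
    image=[dog, pot],  # assumed multi-image input format
    prompt="Turn the dog into a dragon in the style of the ceramic pot",
).images[0]
result.save("dragon.png")
```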
Tags: text-to-speech, ai, qwen, llms, multi-modal-output, llm-release, ai-in-china, generative-ai
AI Summary and Description: Yes
Summary: The text covers four releases from the Qwen team: FP8-quantized Qwen3-Next checkpoints, the API-only Qwen3-TTS-Flash speech-synthesis model, the multi-modal open-weights Qwen3-Omni model, and an updated Qwen-Image-Edit. Their significance lies in expanding generative AI and LLM capabilities across text, speech, image, and video, for both open-weights and hosted use.
Detailed Description: The content surveys several recent model releases from the Qwen team that advance generative AI technologies. The major points:
– **Qwen3-Next Models**:
– Two official FP8-quantized versions of the Qwen3-Next models have been released.
– FP8 quantization roughly halves the checkpoint size: Qwen3-Next-80B-A3B-Instruct is 163GB, while the FP8 version is 82.1GB – still a demanding download and deployment (a quick size check follows this item).
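The halving is easy to sanity-check with back-of-the-envelope arithmetic: 80 billion weights at 2 bytes each (BF16) is about 160GB, and at 1 byte each (FP8) about 80GB, which lines up with the published 163GB and 82.1GB once you allow a few GB for components kept at higher precision:

```python
# Rough size estimate for an 80B-parameter model at two weight precisions.
# Uses decimal GB; real checkpoints add a few GB of overhead (embeddings,
# layers kept at higher precision, metadata).
params = 80e9

for name, bytes_per_weight in [("BF16", 2), ("FP8", 1)]:
    print(f"{name}: ~{params * bytes_per_weight / 1e9:.0f} GB")

# BF16: ~160 GB  (published checkpoint: 163GB)
# FP8:  ~80 GB   (published checkpoint: 82.1GB)
```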
– **Qwen3-TTS-Flash**:
– A speech-synthesis model supporting multi-timbre, multi-lingual, and multi-dialect output.
– Access is via API rather than open weights, reflecting a trend toward hosted delivery of some advanced AI functionality.
– **Qwen3-Omni Model**:
– A new 30B-parameter model that accepts text, audio, and video inputs and produces text and audio outputs.
– Significantly, it is available with open weights under an Apache 2.0 license, making it more accessible for developers and researchers.
– **Qwen-Image-Edit Update**:
– A monthly iterative update to the image editing model, now supporting multi-image inputs.
– A practical example demonstrates the model’s ability to creatively transform and combine images.
Insights:
– These advancements point to a rapid evolution in generative AI, emphasizing the importance of LLMs in both consumer and enterprise solutions.
– The availability of open-weights models encourages wider experimentation and could accelerate community-driven development of AI applications.
– The trend towards multi-modal models indicates a significant move towards more integrated AI solutions, which can handle various types of data input and output, enhancing user engagement and usability.
For security and compliance professionals, the mix of open-weights and API-only access, together with the computational demands of deploying models of this size, underlines the importance of secure, compliant infrastructure that can handle significant data and processing loads without compromising user privacy or data integrity.