Source URL: https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/#atom-everything
Source: Simon Willison’s Weblog
Title: Qwen2.5-VL-32B: Smarter and Lighter
Feedly Summary: Qwen2.5-VL-32B: Smarter and Lighter
The second big open weight LLM release from China today – the first being DeepSeek v3-0324.
Qwen’s previous vision model was Qwen2.5 VL, released in January in 3B, 7B and 72B sizes.
Today’s release is a 32B model, which is quickly becoming my personal favourite model size – large enough to have GPT-4-class capabilities, but small enough that on my 64GB Mac there’s still enough RAM for me to run other memory-hungry applications like Firefox and VS Code.
Qwen claim that the new model (when compared to their previous 2.5 VL family) can "align more closely with human preferences", is better at "mathematical reasoning" and provides "enhanced accuracy and detailed analysis in tasks such as image parsing, content recognition, and visual logic deduction".
They also offer some presumably carefully selected benchmark results showing it out-performing Gemma 3-27B, Mistral Small 3.1 24B and GPT-4o-0513 (there have been two more recent GPT-4o releases since that one, 2024-08-06 and 2024-11-20).
As usual, Prince Canuma had MLX versions of the models live within hours of the release, in 4 bit, 6 bit, 8 bit, and bf16 variants.
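As a rough guide to what those variants mean for memory, the weight footprint scales with bits per parameter. The sketch below assumes a nominal 32 billion parameters and ignores quantization metadata (scales and biases), any layers left unquantized, and runtime overhead such as the KV cache, so real downloads and peak memory will be somewhat larger. The function name is only illustrative.

```python
# Back-of-envelope weight sizes for a nominal 32B-parameter model.
# Assumption: exactly 32e9 parameters; ignores quantization scales,
# unquantized layers, and runtime memory (KV cache, activations).

NOMINAL_PARAMS = 32_000_000_000


def approx_weight_gb(bits_per_param: int, params: int = NOMINAL_PARAMS) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


variants = {"4bit": 4, "6bit": 6, "8bit": 8, "bf16": 16}
sizes = {name: approx_weight_gb(bits) for name, bits in variants.items()}

for name, gb in sizes.items():
    print(f"{name}: ~{gb:.0f} GB of weights")
```

On these assumptions the 4-bit variant needs about a quarter of the bf16 footprint, which is what makes a 32B model comfortable on a 64GB machine.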
I ran the 4-bit version (an 18GB model download) using uv and Prince’s mlx-vlm like this:
uv run --with 'numpy<2' --with mlx-vlm \
python -m mlx_vlm.generate \
--model mlx-community/Qwen2.5-VL-32B-Instruct-4bit \
--max-tokens 1000 \
--temperature 0.0 \
--prompt "Describe this image." \
--image Mpaboundrycdfw-1.png
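To repeat this over several images, the same command can be assembled from Python. This is a minimal sketch rather than anything from the original post: `build_command` and `describe` are hypothetical helper names, it assumes `uv` is on the PATH, and it uses only the flags shown in the command above.

```python
import shlex
import subprocess

MODEL = "mlx-community/Qwen2.5-VL-32B-Instruct-4bit"


def build_command(image_path: str, prompt: str = "Describe this image.",
                  max_tokens: int = 1000) -> list[str]:
    """Build the argv list for the uv / mlx_vlm.generate invocation."""
    return [
        "uv", "run", "--with", "numpy<2", "--with", "mlx-vlm",
        "python", "-m", "mlx_vlm.generate",
        "--model", MODEL,
        "--max-tokens", str(max_tokens),
        "--temperature", "0.0",
        "--prompt", prompt,
        "--image", image_path,
    ]


def describe(image_path: str) -> str:
    """Run the command and return its stdout (downloads the model on first use)."""
    result = subprocess.run(build_command(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the shell-quoted command rather than running it;
    # call describe() to actually execute it.
    print(shlex.join(build_command("Mpaboundrycdfw-1.png")))
```

Passing the arguments as a list avoids shell quoting problems with `numpy<2`, which would otherwise be read as a redirect if left unquoted.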
Here's the image:
And here's the resulting description, which I found to be very impressive:
This image appears to be a map highlighting specific areas along the coastline, likely in California, based on the geographical features and place names. Here is a detailed description of the image:
Geographic Features:
The map shows a section of the coastline, with land on the right and the sea on the left.
The land area includes parts of SanMateo County, as noted in the top-right corner.
The coastline features several named locations, including:
Point Montara
Pillar Point
Kings Rock
VicenteCreek
DennistonCreek
State Beach
Highlighted Areas:
The map contains two distinct areas marked with different colors:
Red Area (Top Section)
Labeled as "MONTARA" and "MONTARA M R" (possibly referring to a "Marine Reserves" or similar protected area).
This area is marked with a dark red color and extends from the coastline inland.
Blue Area (Bottom Section)
Labeled as "PILLAR" and "PILLAR P O I N T" along with "SMCA" (Special Management Area).
This area is marked with a blue color and is located closer to the coastline, including the area around "Pillar Point."
Depth Contours:
The map includes depth markings in the sea, indicating the bathymetrical features:
20 fathom (fm) line is marked in the upper-left part of the sea.
10 fathom (fm) line is marked closer to the coastline.
30 fathom (fm) line is marked further out to sea.
Other Features:
State Beach: Marked near the top-right, indicating a protected recreational area.
Kings Rock: A prominent feature near the coastline, likely a rocky outcropping.
Creeks: The map shows several creeks, including VicenteCreek and DennistonCreek, which flow into the sea.
Protected Areas:
The map highlights specific protected areas:
Marine Reserves:
"MONTARA M R" (Marine Reserves) in red.
Special Management Area (SMCA)
"PILLAR P O I N T" in blue, indicating a Special Management Area.
Grid and Coordinates:
The map includes a grid with latitude and longitude markings:
Latitude ranges from approximately 37°25'N to 37°35'N.
Longitude ranges from approximately 122°22.5'W to 122°35.5'W.
Topography:
The land area shows topographic features, including elevations and vegetation, with green areas indicating higher elevations or vegetated land.
Other Labels:
"SMR": Likely stands for "State Managed Reserves."
"SMCA": Likely stands for "Special Management Control Area."
In summary, this map highlights specific protected areas along the coastline, including a red "Marine Reserves" area and a blue "Special Management Area" near "Pillar Point." The map also includes depth markings, geographical features, and place names, providing a detailed view of the region's natural and protected areas.
It included the following runtime statistics:
Prompt: 1051 tokens, 111.985 tokens-per-sec
Generation: 760 tokens, 17.328 tokens-per-sec
Peak memory: 21.110 GB
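Those throughput figures translate into wall-clock time with simple division. A small sketch of the arithmetic, using only the numbers reported above (it ignores model load time, which the statistics do not report):

```python
# Convert the reported token counts and rates into elapsed seconds.
prompt_tokens, prompt_tps = 1051, 111.985
gen_tokens, gen_tps = 760, 17.328

prompt_seconds = prompt_tokens / prompt_tps
gen_seconds = gen_tokens / gen_tps
total_seconds = prompt_seconds + gen_seconds

print(f"Prompt processing: {prompt_seconds:.1f}s")
print(f"Generation:        {gen_seconds:.1f}s")
print(f"Total:             {total_seconds:.1f}s")
```

That works out to roughly 9 seconds to process the prompt and about 44 seconds to generate the description, a little under a minute in total.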
Tags: vision-llms, llm-release, generative-ai, ai, qwen, llms, uv, mlx
AI Summary and Description: Yes
Summary: The text discusses the release of Qwen2.5-VL-32B, a new open-weight 32B vision-language model that Qwen claim aligns more closely with human preferences and improves on mathematical reasoning and image analysis compared with the earlier Qwen2.5 VL family. It also shows the model running locally on a 64GB Mac, which may interest professionals working on AI security and LLM applications.
Detailed Description:
The text gives an overview of Qwen2.5-VL-32B, a new open-weight vision model, together with a hands-on test of it running locally. The model is part of the evolving landscape of generative AI, which is relevant to AI security professionals as they navigate challenges related to model safety, data protection, and compliance.
Key Points:
- **Model Overview:**
  - Qwen2.5-VL-32B is a 32-billion-parameter vision model; the author describes this size as large enough for GPT-4-class capabilities while still leaving RAM free on a 64GB Mac.
  - This release follows the Qwen2.5 VL model series, released in January in 3B, 7B and 72B sizes.
- **Performance Improvements:**
  - Qwen claim the model aligns more closely with human preferences than the previous 2.5 VL family.
  - Qwen also claim better mathematical reasoning, plus enhanced accuracy and more detailed analysis in image parsing, content recognition, and visual logic deduction.
- **Benchmark Results:**
  - In Qwen's own, presumably carefully selected, benchmark results, Qwen2.5-VL-32B outperformed Gemma 3-27B, Mistral Small 3.1 24B, and GPT-4o-0513. The author notes that GPT-4o-0513 has since been superseded by two newer GPT-4o releases.
- **Practical Applications:**
  - The text shows a single uv command that runs the 4-bit MLX build through mlx-vlm to describe a map image, followed by the model's full output.
  - Highlights the ability to generate detailed descriptions based on visual inputs, which is relevant for industries reliant on visual data analysis.
- **Technical Specifications:**
  - The example output covers place names, highlighted protected areas, depth contours, and a latitude/longitude grid read from the map.
  - Runtime statistics are included: a 1051-token prompt processed at about 112 tokens per second, 760 generated tokens at about 17 tokens per second, and peak memory of 21.110 GB.
Overall, the Qwen2.5-VL-32B represents an important development in the generative AI sector, offering enhanced functionality that security and compliance professionals in AI must consider when evaluating the deployment and use of such models in sensitive contexts.