Simon Willison’s Weblog: qwen2.5vl in Ollama – Experimental News Clipping Site

Source URL: https://simonwillison.net/2025/May/18/qwen25vl-in-ollama/#atom-everything
Source: Simon Willison’s Weblog
Title: qwen2.5vl in Ollama

Feedly Summary: qwen2.5vl in Ollama
Ollama announced a complete overhaul of their vision support the other day. Here’s the first new model they’ve shipped since then – a packaged version of Qwen 2.5 VL which was first released on January 26th 2025. Here are my notes from that release.
I upgraded Ollama (it auto-updates so I just had to restart it from the tray icon) and ran this:
ollama pull qwen2.5vl

This downloaded a 6GB model file. I tried it out against my photo of Cleo rolling on the beach:
llm -a https://static.simonwillison.net/static/2025/cleo-sand.jpg \
'describe this image' -m qwen2.5vl

And got a pretty good result:

The image shows a dog lying on its back on a sandy beach. The dog appears to be a medium to large breed with a dark coat, possibly black or dark brown. It is wearing a red collar or harness around its chest. The dog’s legs are spread out, and its belly is exposed, suggesting it might be rolling around or playing in the sand. The sand is light-colored and appears to be dry, with some small footprints and marks visible around the dog. The lighting in the image suggests it is taken during the daytime, with the sun casting a shadow of the dog to the left side of the image. The overall scene gives a relaxed and playful impression, typical of a dog enjoying time outdoors on a beach.
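
For readers who would rather call Ollama directly than go through llm, here is a minimal sketch of the same kind of request against Ollama's local HTTP API. It assumes the default endpoint at http://localhost:11434/api/generate and that the image has already been saved to a local file. The helper names build_payload and describe_image are illustrative only and are not part of Ollama or llm.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(image_bytes: bytes, prompt: str, model: str = "qwen2.5vl") -> dict:
    """Build the JSON body for a single non-streaming vision prompt."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "describe this image") -> str:
    """Send a local image file to Ollama and return the model's text response."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling describe_image("cleo-sand.jpg") only works with an Ollama server running locally and the model already pulled. Building the payload needs neither.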

Qwen 2.5 VL has a strong reputation for OCR, so I tried it on my poster:
llm -a https://static.simonwillison.net/static/2025/poster.jpg \
'convert to markdown' -m qwen2.5vl

The result that came back:

It looks like the image you provided is a jumbled and distorted text, making it difficult to interpret. If you have a specific question or need help with a particular topic, please feel free to ask, and I’ll do my best to assist you!

I’m not sure what went wrong here. My best guess is that the maximum resolution the model can handle is too small to make out the text, or maybe Ollama resized the image to the point of illegibility before handing it to the model?
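
The resizing guess can be made concrete with some arithmetic. The sketch below assumes a hypothetical pixel budget: the 1,000,000 figure, the poster dimensions and the text height are made up for illustration and are not Ollama's or Qwen's real limits. It shows how much an aspect-preserving downscale would shrink small text.

```python
import math


def scaled_size(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Return (width, height) after an aspect-preserving downscale to fit max_pixels.

    Images already within the budget are returned unchanged.
    """
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Floor both sides so the result never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def scaled_text_height(text_px: float, width: int, height: int, max_pixels: int) -> float:
    """Estimate how tall text of text_px pixels would be after the downscale."""
    new_width, _ = scaled_size(width, height, max_pixels)
    return text_px * new_width / width


# Hypothetical numbers: a 4000x6000 poster photo with 40px body text,
# squeezed into an assumed 1,000,000-pixel budget.
print(scaled_size(4000, 6000, 1_000_000))
print(round(scaled_text_height(40, 4000, 6000, 1_000_000), 1))
```

Under those made-up numbers the body text ends up around 8 pixels tall, which would plausibly be too small to read. Whether anything like this happened here is still a guess.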
Tags: vision-llms, llm, ollama, generative-ai, ai, qwen, llms

AI Summary and Description: Yes

Summary: The text discusses Ollama's release of a packaged version of Qwen 2.5 VL and tests the model on two tasks: image description, where it did well, and optical character recognition (OCR), where it failed on a poster image. This is relevant for professionals working in AI and generative AI security, as it shows both what a locally run vision LLM can do and where its output needs checking.

Detailed Description: The content centers on Ollama's packaged release of Qwen 2.5 VL and on how the model performed in two quick tests, one image description and one OCR task. Below are the key insights:

– **Model Launch**: Ollama shipped a packaged version of Qwen 2.5 VL, its first new model since overhauling its vision support. The model itself was first released on January 26th 2025.
– **Package and Usage**: After upgrading Ollama, the author downloaded the model (a 6GB file) with ollama pull qwen2.5vl and prompted it from the command line using llm with an image attachment.
– **Image Analysis Results**:
    – The model gave a pretty good description of a photo of a dog rolling on a beach.
    – The description covered the dog's approximate size and coat color, its red collar or harness, the sand, and the daytime lighting.
– **OCR Capabilities**:
    – Despite the model's strong reputation for OCR, the attempt to convert a poster image to markdown failed: the model replied that the image was jumbled and distorted text.
    – The author's guesses at the cause are that the model's maximum resolution is too small to make out the text, or that Ollama resized the image to the point of illegibility before handing it to the model.

In summary, the post shows both the capability and the limits of a locally run vision LLM: a detailed image description on one hand, a failed OCR attempt on the other. For security and compliance professionals, understanding these capabilities and limitations is crucial when designing frameworks for the safe and effective deployment of AI technologies.

* Important points for professionals to consider:
– The growing relevance of vision-based LLMs in different sectors.
– The necessity for rigorous testing of AI outputs for reliability, especially in OCR contexts.
– Considerations for image resolution and preprocessing that can affect model performance.
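
One lightweight way to test OCR output for reliability is to check it against phrases known to appear in the source image. The following is a minimal sketch of that idea. The function name missing_phrases is illustrative, and the expected phrases are placeholders, since the poster's actual text is not given in the post.

```python
def missing_phrases(ocr_output: str, expected: list[str]) -> list[str]:
    """Return the expected phrases absent from the OCR output.

    Matching ignores case and collapses runs of whitespace.
    """
    normalized = " ".join(ocr_output.lower().split())
    return [p for p in expected if " ".join(p.lower().split()) not in normalized]


# Start of the refusal quoted in the post, checked against placeholder phrases.
refusal = "It looks like the image you provided is a jumbled and distorted text"
print(missing_phrases(refusal, ["Example Title", "Example Date"]))
```

A non-empty result flags output that should be reviewed by hand before it is trusted.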