Simon Willison’s Weblog: deepseek-ai/DeepSeek-R1-0528

Source URL: https://simonwillison.net/2025/May/31/deepseek-aideepseek-r1-0528/
Source: Simon Willison’s Weblog
Title: deepseek-ai/DeepSeek-R1-0528

Feedly Summary: deepseek-ai/DeepSeek-R1-0528
Sadly the trend for terrible naming of models has infested the Chinese AI labs as well.
DeepSeek-R1-0528 is a brand new and much improved open weights reasoning model from DeepSeek, a major step up from the DeepSeek R1 they released back in January.

In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by […] Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro. […]
Beyond its improved reasoning capabilities, this version also offers a reduced hallucination rate, enhanced support for function calling, and better experience for vibe coding.

The new R1 comes in two sizes: a 685B model called deepseek-ai/DeepSeek-R1-0528 (the previous R1 was 671B) and an 8B variant distilled from Qwen 3 called deepseek-ai/DeepSeek-R1-0528-Qwen3-8B.
The January release of R1 had a much larger collection of distilled models: four based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and two based on Llama 3 (Llama 3.1 8B and Llama 3.3 70B Instruct).
No Llama model at all this time. I wonder if that’s because Qwen 3 is really good and Apache 2 licensed, while Llama continues to stick with their janky license terms.
Further adding to the confusion, Ollama have mixed the two new models into their existing deepseek-r1 label. Last week, running ollama pull deepseek-r1:8B from Ollama would get you DeepSeek-R1-Distill-Llama-8B (Internet Archive link); today that same command gets you DeepSeek-R1-0528-Qwen3-8B, a completely different base model.
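If you're not sure which base model a given tag currently points at, ollama show should report the underlying architecture, parameter count and quantization for whatever you have pulled locally:
ollama show deepseek-r1:8b
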
If this bothers you as much as it bothers me, thankfully you can use the more explicit tag deepseek-r1:8b-0528-qwen3-q8_0. I ran it like this:
ollama pull deepseek-r1:8b-0528-qwen3-q8_0

It fetched 8.19GB of model weights, then I had it draw me a pelican.
llm install llm-ollama
llm -m deepseek-r1:8b-0528-qwen3-q8_0 \
"Generate an SVG of a pelican riding a bicycle"
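
If you'd rather skip the LLM CLI, Ollama itself can take a prompt directly; something like this should work:
ollama run deepseek-r1:8b-0528-qwen3-q8_0 "Generate an SVG of a pelican riding a bicycle"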

Since the “reasoning” traces of these models are almost always the most interesting part, I’ll quote the response here in full:

We are going to create an SVG that depicts a pelican riding a bicycle.
Steps:

Create the basic structure of an SVG with width and height set appropriately (e.g., 400×200).
Draw a pelican: We’ll break it down into parts – body, beak, eye, feet, wings, etc.
Draw a bicycle: We can represent it as two circles for wheels, a rectangle for the frame, and some other details.

Let’s design the SVG with the following:

The pelican will be facing to the right (positive x-axis) so that it is riding the bike from left to right.
The bicycle will have two wheels and a simple frame in front of the pelican.

We’ll use absolute positioning for each part. Let’s assume we set up coordinates such that:

The top-left of the SVG is (0,0), and we want the scene to be drawn from left to right.
The pelican will be positioned at (50, 120) with a body length of about 30 units.

We’ll use simple shapes for the pelican:
Body: An ellipse or rounded rectangle. Let’s do an ellipse for the body.
Beak: A triangle pointing to the right.
Head: Another ellipse but smaller and positioned in front of the body.
Wings: Two triangles (one above each other, one slightly behind) that are upside down.
Bicycle:

Wheels: two circles at fixed positions relative to the frame.
Frame: a rectangle with some details for handlebars and seat.

Since this is a fun illustration, we’ll keep it simple but recognizable.
Let’s code accordingly.
</think>
Here’s an SVG of a pelican riding a bicycle. This example uses basic shapes to create a stylized version:
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 200">
<!-- Pelican body -->
<ellipse cx="150" cy="150" rx="80" ry="60" fill="#FFD700"/>

<!-- Pelican head (larger for comedic effect) -->
<circle cx="90" cy="120" r="45" fill="#FFA3CE"/>

<!-- Pelican beak -->
<polygon points="80,110 60,120 80,130 70,140 60,150" fill="#FFC233"/>

<!-- Pelican eye -->
<circle cx="95" cy="115" r="3" fill="#fff"/>

<!-- Bicycle wheels -->
<ellipse cx="380" cy="140" rx="25" ry="60" transform="rotate(90 380 140)" fill="black"/>
<ellipse cx="380" cy="70" rx="25" ry="60" transform="rotate(90 380 70)" fill="black"/>

<!-- Bicycle frame -->
<path d="M 15,40 L 15,140 H 370 L 370,140 V 40 Z" stroke="gray" stroke-width="6" fill="none"/>

<!-- Handlebars and seat (stylized) -->
<path d="M 280,40 C 290,35 300,30 310,40 C 320,50 330,60 310,70 L 310,70 Q 290,50 280,40" fill="none" stroke="gray"/>

<!-- Pedals -->
<circle cx="340" cy="70" r="15" fill="#3366CC"/>
<polygon points="340,65 330,45 350,45" fill="#F29E3D"/>

<circle cx="340" cy="140" r="15" fill="#3366CC"/>
<polygon points="340,135 330,115 350,115" fill="#F29E3D"/>

<!-- Neck connecting head to body -->
<path d="M 80,120 C 70,160 100,140 150,120" stroke="#FFA3CE" stroke-width="8"/>
</svg>
Explanation:

Pelican Body: A large yellow ellipse.

Head: A pink circle with a prominent beak (yellow-orange triangle).

Bicycle:

Wheels are black ellipses rotated vertically to simulate spokes.
The frame is a gray rectangle extending from the pelican’s back to the right side of the SVG.
Handlebars and seat are simplified curves in gray.

This creates a comical scene with the pelican riding forward. You can adjust colors, sizes, or positions by modifying cx, cy, rx, ry attributes and fill/stroke values.

Love the thought process, and the explanation! The actual rendered SVG does leave a lot to be desired though:

To be fair, this is just using the ~8GB Qwen3 Q8_0 model on my laptop. I don't have the hardware to run the full-sized R1, but it's available as deepseek-reasoner through DeepSeek's API, so I tried it there using the llm-deepseek plugin:
llm install llm-deepseek
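
You'll also need a DeepSeek API key configured; assuming the plugin registers its key under the name deepseek, that's:
llm keys set deepseek

Then: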
llm -m deepseek-reasoner \
"Generate an SVG of a pelican riding a bicycle"

This one came out a lot better:

Tags: llm, llm-reasoning, deepseek, ollama, ai, llms, llm-release, generative-ai

AI Summary and Description: Yes

Summary: The text discusses the release of a new AI model, DeepSeek-R1-0528, from Chinese AI lab DeepSeek, highlighting advancements in reasoning and inference capabilities, a reduced hallucination rate, and improved support for function calling. It also touches on issues related to model naming and versioning, offering insights into the nuances of generative AI.

Detailed Description:
The provided text outlines a recent development in AI technology with the launch of the DeepSeek-R1-0528 model. This release is particularly relevant for professionals working in AI, especially those focused on generative AI security. Here are the main points and implications from the analysis:

– **Model Enhancements:**
– The DeepSeek-R1-0528 shows notable improvements over its predecessor, including:
– Enhanced reasoning and inference capabilities, approaching those of leading models like O3 and Gemini 2.5 Pro.
– Reduced hallucination rates, which is critical for applications relying on accurate outputs from generative AI.
– Improved function-calling support, enabling more complex interactions with the model.

– **Model Variants:**
– The model is offered in different sizes:
– A larger 685B version, compared to the previous 671B version.
– An 8B distilled variant based on the Qwen 3 model.
– The absence of Llama-based distillations in this release suggests strategic choices by DeepSeek regarding model licensing and performance.

– **Confusion in Model Naming:**
– The text raises concerns about the naming conventions related to model versions, indicating that users might struggle to keep track of which version they are using, especially when version tags are mixed and unclear.
– Suggestions are made for more explicit tagging of models to mitigate user confusion.

– **Practical Application Example:**
– The text illustrates practical usability with a coding example showing how to generate an SVG image of a pelican riding a bicycle using the model, demonstrating its application in a meaningful and creative way.

– **User Experience with Models:**
– The author tests different model sizes, noting improvements in quality based on the model used and mentioning limitations based on available hardware, highlighting real-world considerations for AI deployment.

Implications for Security and Compliance Professionals:
– **Model Variations and Version Control:** Understanding the evolution of generative AI models and their versioning can be crucial for security and compliance as organizations adopt these technologies.
– **Reduction of Hallucination Rates:** The accuracy of AI outputs has direct implications for systems that rely on generative AI; a reduced hallucination rate lowers risk in decision-making processes.
– **Licensing and Governance:** The choice of permissively licensed models (such as Apache 2.0 licensed Qwen 3) over models with more restrictive terms might play a significant role in governance and compliance strategies, especially in regulated industries. Choosing the right models based on licensing can impact both ethical considerations and legal compliance.

Overall, this text informs AI practitioners about critical updates in model capabilities, practical implications for deployment, and best practices for managing versioning and compliance in a rapidly evolving technological landscape.