Simon Willison’s Weblog: DeepSeek 3.1

Source URL: https://simonwillison.net/2025/Aug/22/deepseek-31/#atom-everything
Source: Simon Willison’s Weblog
Title: DeepSeek 3.1

Feedly Summary: DeepSeek 3.1
The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it’s a hybrid reasoning model.
DeepSeek claim:

DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

Drew Breunig points out that their benchmarks show “the same scores with 25-50% fewer tokens” – at least across AIME 2025, GPQA Diamond, and LiveCodeBench.
The DeepSeek release includes prompt examples for a coding agent, a Python agent and a search agent – yet more evidence that the leading AI labs have settled on those as the three most important agentic patterns for their models to support.
Here’s the pelican riding a bicycle it drew me (transcript), which I ran from my phone using OpenRouter chat.
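For anyone who wants to reproduce that kind of request programmatically, here is a minimal sketch of calling DeepSeek 3.1 through OpenRouter's OpenAI-compatible chat API. The model slug and client setup are assumptions based on OpenRouter's usual conventions rather than details from the post; check openrouter.ai/models for the exact identifier.

```python
# Minimal sketch: querying DeepSeek 3.1 via OpenRouter's OpenAI-compatible API.
# The model slug below is an assumption; verify it on openrouter.ai/models.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder, use your own key
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",  # assumed slug, not confirmed by the post
    messages=[
        {
            "role": "user",
            "content": "Generate an SVG of a pelican riding a bicycle",
        }
    ],
)

print(response.choices[0].message.content)
```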

Tags: ai, prompt-engineering, generative-ai, llms, drew-breunig, pelican-riding-a-bicycle, llm-reasoning, deepseek, llm-release, openrouter, coding-agents, ai-in-china

AI Summary and Description: Yes

Summary: The text discusses the release of DeepSeek 3.1, a new hybrid reasoning model with improved capabilities in generating responses efficiently. Its performance benchmarks show a significant reduction in token usage while maintaining answer quality, indicating advancements in generative AI and reasoning processes relevant to AI security and development.

Detailed Description: The content highlights the introduction of DeepSeek 3.1, a new iteration from DeepSeek that emphasizes efficiency and productivity in AI responses.

– **Model Specifications**:
  – DeepSeek 3.1 is a hybrid reasoning model with 685 billion parameters, the same scale as its predecessor, DeepSeek v3.

– **Performance Metrics**:
  – The model achieves answer quality comparable to DeepSeek-R1-0528 while responding more quickly.
  – Benchmarks indicate that DeepSeek 3.1 uses 25-50% fewer tokens for similar scores on AIME 2025, GPQA Diamond, and LiveCodeBench; fewer output tokens mean lower computational cost and faster responses.

– **Prominent Features**:
  – The release includes prompt examples for a coding agent, a Python agent, and a search agent.
  – The prominence of these three agentic patterns suggests they have emerged as the ones leading AI labs consider most important for their models to support, in line with current trends in AI development.

– **Cultural Reference**:
  – The text includes a light commentary on the model’s creative output (e.g., a pelican riding a bicycle), demonstrating the ability of such models to generate whimsical and entertaining content.

The advancements in DeepSeek 3.1 are particularly relevant for professionals in AI development: they show the ongoing optimization of reasoning in generative AI, where maintaining answer quality with fewer tokens directly improves the cost, latency, and operational efficiency of deployed systems.