Simon Willison’s Weblog: DeepSeek 3.1

Source URL: https://simonwillison.net/2025/Aug/22/deepseek-31/#atom-everything
Source: Simon Willison’s Weblog
Title: DeepSeek 3.1

Feedly Summary: DeepSeek 3.1
The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it’s a hybrid reasoning model.
DeepSeek claim:

DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

Drew Breunig points out that their benchmarks show “the same scores with 25-50% fewer tokens” – at least across AIME 2025, GPQA Diamond, and LiveCodeBench.
The DeepSeek release includes prompt examples for a coding agent, a Python agent and a search agent – yet more evidence that the leading AI labs have settled on those as the three most important agentic patterns for their models to support.
Here’s the pelican riding a bicycle it drew me (transcript), which I ran from my phone using OpenRouter chat.
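For anyone who wants to reproduce that kind of request programmatically, here is a minimal sketch of calling DeepSeek 3.1 through OpenRouter's OpenAI-compatible chat API. The model slug and client setup are assumptions based on OpenRouter's usual conventions rather than details from the post; check openrouter.ai/models for the exact identifier.

```python
# Minimal sketch: querying DeepSeek 3.1 via OpenRouter's OpenAI-compatible API.
# The model slug below is an assumption; verify it on openrouter.ai/models.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder, use your own key
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",  # assumed slug, not confirmed by the post
    messages=[
        {
            "role": "user",
            "content": "Generate an SVG of a pelican riding a bicycle",
        }
    ],
)

print(response.choices[0].message.content)
```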

Tags: ai, prompt-engineering, generative-ai, llms, drew-breunig, pelican-riding-a-bicycle, llm-reasoning, deepseek, llm-release, openrouter, coding-agents, ai-in-china

AI Summary and Description: Yes

Summary: The text discusses the release of DeepSeek 3.1, a new hybrid reasoning model with improved capabilities in generating responses efficiently. Its performance benchmarks show a significant reduction in token usage while maintaining answer quality, indicating advancements in generative AI and reasoning processes relevant to AI security and development.

Detailed Description: The content highlights the introduction of DeepSeek 3.1, a new iteration from DeepSeek that emphasizes efficiency and productivity in AI responses.

– **Model Specifications**:
  – DeepSeek 3.1 is a hybrid reasoning model with 685 billion parameters, the same scale as its predecessor, DeepSeek v3.

– **Performance Metrics**:
  – The model achieves answer quality comparable to DeepSeek-R1-0528 while responding more quickly.
  – Benchmarks indicate that DeepSeek 3.1 uses 25-50% fewer tokens for similar scores on AIME 2025, GPQA Diamond, and LiveCodeBench; fewer output tokens mean lower computational cost and faster responses.

– **Prominent Features**:
  – The release includes prompt examples for a coding agent, a Python agent, and a search agent.
  – The prominence of these three agentic patterns suggests they have emerged as the ones leading AI labs consider most important for their models to support, in line with current trends in AI development.

– **Cultural Reference**:
  – The text includes a light commentary on the model’s creative output (e.g., a pelican riding a bicycle), demonstrating the ability of such models to generate whimsical and entertaining content.

The advancements in DeepSeek 3.1 are particularly relevant for professionals in AI development: they show the ongoing optimization of reasoning in generative AI, where maintaining answer quality with fewer tokens directly improves the cost, latency, and operational efficiency of deployed systems.