Source URL: https://simonwillison.net/2025/Jan/29/on-deepseek-and-export-controls/
Source: Simon Willison’s Weblog
Title: On DeepSeek and Export Controls
Anthropic CEO (and previously GPT-2/GPT-3 development lead at OpenAI) Dario Amodei’s essay about DeepSeek includes a lot of interesting background on the last few years of AI development.
Dario was one of the authors on the original scaling laws paper back in 2020, and he talks at length about updated ideas around scaling up training:
The field is constantly coming up with ideas, large and small, that make things more effective or efficient: it could be an improvement to the architecture of the model (a tweak to the basic Transformer architecture that all of today’s models use) or simply a way of running the model more efficiently on the underlying hardware. New generations of hardware also have the same effect. What this typically does is shift the curve: if the innovation is a 2x “compute multiplier” (CM), then it allows you to get 40% on a coding task for $5M instead of $10M; or 60% for $50M instead of $100M, etc.
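The curve-shifting arithmetic in that quote can be sketched in a few lines of Python. The baseline cost figures mirror the essay's examples; the lookup table, function name, and the compounding example are illustrative assumptions of mine, not anything stated in the essay:

```python
# Toy sketch of how a "compute multiplier" (CM) shifts a training-cost curve.
# Baseline figures mirror the quoted example; everything else is made up.

BASELINE_COST = {40: 10_000_000, 60: 100_000_000}  # benchmark score % -> training cost ($)

def cost_with_multiplier(score: int, cm: float) -> float:
    """Cost to reach `score` once a cumulative compute multiplier `cm` applies."""
    return BASELINE_COST[score] / cm

print(cost_with_multiplier(40, 2.0))        # a single 2x CM: $10M -> $5M
print(cost_with_multiplier(60, 2.0))        # same shift higher on the curve: $100M -> $50M
print(cost_with_multiplier(60, 2.0 * 2.0))  # independent 2x multipliers compound: $100M -> $25M
```

The point of the sketch is that a CM moves every point on the curve by the same factor, and independent multipliers compound, which is why the cost of reaching a fixed capability level falls steadily over time.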
He argues that DeepSeek v3, while impressive, represented an expected evolution of models based on current scaling laws.
[…] even if you take DeepSeek’s training cost at face value, they are on-trend at best and probably not even that. For example this is less steep than the original GPT-4 to Claude 3.5 Sonnet inference price differential (10x), and 3.5 Sonnet is a better model than GPT-4. All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese.
Dario includes details about Claude 3.5 Sonnet that I’ve not seen shared anywhere before:
Claude 3.5 Sonnet cost "a few $10M’s to train"
3.5 Sonnet "was not trained in any way that involved a larger or more expensive model (contrary to some rumors)" – I’ve seen those rumors, they involved Sonnet being a distilled version of a larger, unreleased 3.5 Opus.
Sonnet’s training was conducted "9-12 months ago" – that would be roughly between January and April 2024. If you ask Sonnet about its training cut-off it tells you "April 2024" – that’s surprising, because presumably the cut-off should be at the start of that training period?
The general message here is that the advances in DeepSeek v3 fit the overall trend of how we would expect modern models to improve, including the notable drop in training cost.
Dario is less impressed by DeepSeek R1, calling it "much less interesting from an innovation or engineering perspective than V3". I enjoyed this footnote:
I suspect one of the principal reasons R1 gathered so much attention is that it was the first model to show the user the chain-of-thought reasoning that the model exhibits (OpenAI’s o1 only shows the final answer). DeepSeek showed that users find this interesting. To be clear this is a user interface choice and is not related to the model itself.
The rest of the piece argues for continued export controls on chips to China, on the basis that if future AI unlocks "extremely rapid advances in science and technology" the US needs to get there first.
Tags: anthropic, openai, deepseek, ai, llms, generative-ai, inference-scaling, o1
AI Summary and Description: Yes
Summary: The article discusses insights from Dario Amodei, CEO of Anthropic, regarding recent advancements in AI models like DeepSeek and their training costs compared to existing models. The text emphasizes the ongoing trends in model training efficiency and advocates for continued export controls on AI-related chips to China to maintain a competitive advantage.
Detailed Description:
– **Dario Amodei’s Insights**: As a key figure in AI development, Amodei presents a critical analysis of recent technological advancements in AI, particularly focusing on the evolution of models like DeepSeek.
– **Model Efficiency Improvements**: The text highlights how innovations in both model architecture and hardware are leading to more effective training processes. This is exemplified by the “compute multiplier” (CM) concept, which implies that improvements enable significant savings in training costs.
– Example: A 2x compute multiplier lets a lab reach the same benchmark score for half the cost (e.g., $5M instead of $10M for 40% on a coding task).
– **DeepSeek v3 Assessment**:
– While praised for its capabilities, Amodei posits that the advancements are in line with existing trends, rather than groundbreaking innovations.
– He contrasts the training cost of DeepSeek v3 with other models, suggesting that it fits within an expected reduction trend.
– **Claude 3.5 Sonnet Details**:
– The article reveals insights into the training of Claude 3.5 Sonnet, debunking rumors about its relationship to larger models and specifying its training time frame.
– **User Experience and Model Interpretation**:
– DeepSeek R1’s popularity is attributed largely to its display of chain-of-thought reasoning, which captured user attention; Amodei notes this is a user-interface choice rather than a property of the model itself, in contrast to OpenAI’s o1, which shows only the final answer.
– **Export Controls Discussion**:
– Amodei argues for the necessity of maintaining export controls on semiconductor technology to ensure the U.S. remains at the forefront of AI advancements, particularly if future developments in AI lead to rapid technological breakthroughs.
Key Implications for Security and Compliance Professionals:
– **Awareness of Export Controls**: Understanding the implications of regulations on AI chips and technology is vital, particularly concerning national security and international competitiveness in AI.
– **Model Development Considerations**: Insights into training efficiencies and advancements can guide investment decisions in AI technologies, emphasizing the necessity of remaining updated on trends within the industry.
– **Ethics and Governance**: As AI technologies develop, there will be increased scrutiny and expectations regarding responsible AI deployment, necessitating strong governance frameworks to mitigate risks associated with rapid advancements.
This analysis underscores the evolving landscape of AI and the importance of strategic compliance efforts amidst ongoing technological evolution.