Simon Willison’s Weblog: Quoting Ben Thompson

Source URL: https://simonwillison.net/2025/Jan/28/ben-thompson/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Ben Thompson

Feedly Summary: H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.
Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.
— Ben Thompson, DeepSeek FAQ
Tags: deepseek, ai, gpus, nvidia

AI Summary and Description: Yes

Summary: The text contrasts the H800 and H100 GPUs, explaining how restrictions on chip availability pushed DeepSeek toward innovative optimizations in its model architecture and training infrastructure. The insight is relevant for professionals in AI and infrastructure security because it highlights how organizations adapt their technology deployment to regulatory constraints.

Detailed Description: The passage outlines the implications of hardware limitations imposed by chip bans, using DeepSeek’s model optimization strategy in the absence of H100 GPUs as its example. This is significant for those in AI and infrastructure, as it shows how organizations adapt their technology choices in the face of regulatory and supply chain challenges.

* Key Points:
– **Chip Ban Context**: The chip ban prohibited H100 GPUs but left the H800 available, so DeepSeek trained on H800s.
– **Misconception on Memory Bandwidth**: The common assumption was that training leading-edge models requires high interchip memory bandwidth; DeepSeek instead optimized around its absence (see the sketch after this list).
– **Optimized Model Structure**: DeepSeek designed both its model structure and its infrastructure to work effectively within the H800’s bandwidth limits.
– **Hypothetical Scenario**: With access to H100s, DeepSeek would probably have used a larger training cluster with far fewer of the optimizations aimed at overcoming limited bandwidth.
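
To make the bandwidth point concrete, below is a minimal back-of-envelope sketch, not DeepSeek’s actual method, of how interconnect bandwidth bounds the time spent synchronizing gradients across GPUs. The ~900 GB/s and ~400 GB/s link figures, the 70B-parameter model, the 8-GPU ring all-reduce, and the function name `ring_allreduce_seconds` are all illustrative assumptions rather than details from the source.

```python
# Back-of-envelope sketch of why interchip bandwidth matters for training.
# All numbers below are illustrative assumptions, not figures from the source:
# ~900 GB/s for an H100-class interconnect link and ~400 GB/s for a
# bandwidth-restricted H800-class link.

def ring_allreduce_seconds(grad_bytes: float, n_gpus: int, bandwidth_gb_s: float) -> float:
    """Approximate time for one ring all-reduce of `grad_bytes` of gradients.

    A ring all-reduce sends roughly 2 * (n - 1) / n * grad_bytes over each
    GPU's interconnect link, so sync time scales inversely with bandwidth.
    """
    traffic_bytes = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    return traffic_bytes / (bandwidth_gb_s * 1e9)

# Example: 16-bit gradients for a hypothetical 70B-parameter model on 8 GPUs.
grad_bytes = 70e9 * 2
for label, bw in [("H100-class, ~900 GB/s", 900), ("H800-class, ~400 GB/s", 400)]:
    t = ring_allreduce_seconds(grad_bytes, n_gpus=8, bandwidth_gb_s=bw)
    print(f"{label}: ~{t:.2f} s per full gradient sync")
```

Under these assumptions, halving the link bandwidth roughly doubles the time spent on each gradient synchronization, which is the kind of overhead DeepSeek’s model and infrastructure design worked to minimize.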

This analysis underscores the importance of understanding hardware capabilities and constraints when planning AI development and deployment. Such insight can help security and compliance professionals anticipate the risks and adjustments involved in keeping infrastructure secure and efficient.