Source URL: https://simonwillison.net/2025/Jan/27/deepseek-nvidia/
Source: Simon Willison’s Weblog
Title: The impact of competition and DeepSeek on Nvidia
Feedly Summary: The impact of competition and DeepSeek on Nvidia
Long, excellent piece by Jeffrey Emanuel capturing the current state of the AI/LLM industry. The original title is “The Short Case for Nvidia Stock" – I’m using the Hacker News alternative title here, but even that I feel under-sells this essay.
Jeffrey has a rare combination of experience in both computer science and investment analysis. He combines both worlds here, evaluating NVIDIA’s challenges by providing deep insight into a whole host of relevant and interesting topics.
As Jeffrey describes it, NVIDIA’s moat has four components: high-quality Linux drivers, CUDA as an industry standard, the fast GPU interconnect technology they acquired from Mellanox in 2019, and the flywheel effect where they can invest their enormous profits (75-90% margin in some cases!) into more R&D.
Each of these is under threat.
Technologies like MLX, Triton and JAX are undermining the CUDA advantage by making it easier for ML developers to target multiple backends – plus LLMs themselves are getting capable enough to help port things to alternative architectures.
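To illustrate the portability point, here is a minimal, purely illustrative sketch (plain Python, with made-up backend names) of the compiler-style pattern these frameworks use: a kernel is written once at a high level, and a dispatch layer lowers it to whichever hardware target is available, rather than hard-coding CUDA.

```python
def matmul_reference(a, b):
    """Pure-Python reference kernel: C = A @ B for lists of lists."""
    rows, cols = len(a), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(cols)] for i in range(rows)]

# A toy dispatch table standing in for a compiler's lowering step.
# In a real framework each entry would emit backend-specific code
# (PTX for NVIDIA, etc.); here every target shares one reference
# implementation to keep the sketch self-contained.
BACKENDS = {
    "cuda": matmul_reference,
    "rocm": matmul_reference,
    "cpu": matmul_reference,
}

def run_matmul(backend, a, b):
    """Run the same high-level kernel on the requested backend."""
    return BACKENDS[backend](a, b)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The caller's code is identical regardless of target hardware:
assert run_matmul("cuda", a, b) == run_matmul("cpu", a, b)
print(run_matmul("cpu", a, b))  # [[19, 22], [43, 50]]
```

The lock-in risk to CUDA comes precisely from this shape: once code targets the abstraction layer instead of the backend, swapping hardware is a one-line change.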
GPU interconnect helps multiple GPUs work together on tasks like model training. Companies like Cerebras are developing enormous chips that can get way more done on a single chip.
Those 75-90% margins provide a huge incentive for other companies to catch up – including the customers who spend the most on NVIDIA at the moment – Microsoft, Amazon, Meta, Google, Apple – all of whom have their own internal silicon projects:
Now, it’s no secret that there is a strong power law distribution of Nvidia’s hyper-scaler customer base, with the top handful of customers representing the lion’s share of high-margin revenue. How should one think about the future of this business when literally every single one of these VIP customers is building their own custom chips specifically for AI training and inference?
The real joy of this article is the way it describes technical details of modern LLMs in a relatively accessible manner. I love this description of the inference-scaling tricks used by o1 and R1, compared to traditional transformers:
Basically, the way Transformers work in terms of predicting the next token at each step is that, if they start out on a bad "path" in their initial response, they become almost like a prevaricating child who tries to spin a yarn about why they are actually correct, even if they should have realized mid-stream using common sense that what they are saying couldn’t possibly be correct.
Because the models are always seeking to be internally consistent and to have each successive generated token flow naturally from the preceding tokens and context, it’s very hard for them to course-correct and backtrack. By breaking the inference process into what is effectively many intermediate stages, they can try lots of different things and see what’s working and keep trying to course-correct and try other approaches until they can reach a fairly high threshold of confidence that they aren’t talking nonsense.
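The core idea in that quote – sample several candidate reasoning paths, score them, and keep the best instead of committing to one greedy decoding – can be sketched in a few lines. This is a toy illustration, not DeepSeek's or OpenAI's actual method: the candidates and the verifier are stand-ins for sampled token sequences and a learned or rule-based scorer.

```python
def score(path, expected=42):
    """Stand-in verifier: reward answers close to a known-checkable target."""
    return -abs(path["answer"] - expected)

def best_of_n(candidates):
    """Pick the highest-scoring reasoning path instead of the first one.

    A greedy decoder is stuck with candidates[0] even if it started
    down a bad path; searching over several paths allows backtracking.
    """
    return max(candidates, key=score)

# Four sampled "paths", each ending in a final answer. The first
# (greedy) path happens to be wrong; search recovers the right one.
candidates = [{"answer": a} for a in (40, 41, 42, 39)]
print(best_of_n(candidates)["answer"])  # 42
```

Real inference-scaling systems do this at the level of intermediate reasoning steps, not just final answers, but the economics are the same: more compute at inference time buys the ability to course-correct.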
The last quarter of the article talks about the seismic waves rocking the industry right now caused by DeepSeek v3 and R1. v3 remains the top-ranked open weights model, despite being around 45x more efficient in training than its competition: bad news if you are selling GPUs! R1 represents another huge breakthrough in efficiency both for training and for inference – the DeepSeek R1 API is currently 27x cheaper than OpenAI’s o1, for a similar level of quality.
Jeffrey summarized some of the key ideas from the v3 paper like this:
A major innovation is their sophisticated mixed-precision training framework that lets them use 8-bit floating point numbers (FP8) throughout the entire training process. […]
DeepSeek cracked this problem by developing a clever system that breaks numbers into small tiles for activations and blocks for weights, and strategically uses high-precision calculations at key points in the network. Unlike other labs that train in high precision and then compress later (losing some quality in the process), DeepSeek’s native FP8 approach means they get the massive memory savings without compromising performance. When you’re training across thousands of GPUs, this dramatic reduction in memory requirements per GPU translates into needing far fewer GPUs overall.
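The tiling idea is worth making concrete. Below is a hedged sketch of block-wise quantization in plain Python – the tile size and 8-bit-style level count are illustrative, and DeepSeek's real kernels operate on FP8 tensors on the GPU – but it shows why per-tile scales matter: with one global scale, a single outlier value flattens every small value to zero; with per-tile scales, the outlier only degrades its own tile.

```python
def quantize_tile(tile, levels=256):
    """Quantize one tile to `levels` integer levels with its own scale."""
    amax = max(abs(x) for x in tile) or 1.0
    scale = amax / (levels // 2 - 1)  # map [-amax, amax] onto ~int8 range
    return [round(x / scale) for x in tile], scale

def dequantize_tile(q, scale):
    return [v * scale for v in q]

def blockwise_roundtrip(vec, tile_size=4):
    """Quantize then dequantize each tile independently."""
    out = []
    for i in range(0, len(vec), tile_size):
        q, s = quantize_tile(vec[i:i + tile_size])
        out.extend(dequantize_tile(q, s))
    return out

# One large outlier among small activations: per-tile scales confine
# its damage to the tile it lives in.
x = [0.01, -0.02, 0.03, 0.015, 100.0, 0.01, -0.01, 0.02]
tiled_err = max(abs(a - b) for a, b in zip(x, blockwise_roundtrip(x)))

# Contrast: a single global scale for the whole vector.
q, s = quantize_tile(x, levels=256)
global_err = max(abs(a - b) for a, b in zip(x, dequantize_tile(q, s)))

print(tiled_err < global_err or global_err == 0)  # per-tile wins here
```

The same trade-off drives the memory story: storing 8-bit values plus one scale per small tile costs roughly half the memory of 16-bit values while keeping enough precision to train on directly.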
Then for R1:
With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. […]
The technical breakthrough here was their novel approach to reward modeling. Rather than using complex neural reward models that can lead to "reward hacking" (where the model finds bogus ways to boost their rewards that don’t actually lead to better real-world model performance), they developed a clever rule-based system that combines accuracy rewards (verifying final answers) with format rewards (encouraging structured thinking). This simpler approach turned out to be more robust and scalable than the process-based reward models that others have tried.
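A rule-based reward of the kind described is simple enough to sketch directly. The tag names and weights below are illustrative (the R1 paper uses a think/answer structure, but these exact values are my assumption, not DeepSeek's spec): an accuracy reward verifies the final answer against a checkable ground truth, and a format reward checks that the model produced a structured reasoning block.

```python
import re

def accuracy_reward(completion, expected_answer):
    """Verifiable check on the final answer, e.g. exact match for math."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == expected_answer else 0.0

def format_reward(completion):
    """Reward the presence of a structured reasoning block."""
    has_think = re.search(r"<think>.+?</think>", completion, re.DOTALL)
    return 0.5 if has_think else 0.0

def total_reward(completion, expected_answer):
    # Both components are simple rules, so there is no learned reward
    # model for the policy to "hack" - the signal is exactly what it says.
    return accuracy_reward(completion, expected_answer) + format_reward(completion)

good = "<think>2 + 2 = 4 because ...</think><answer>4</answer>"
bad = "<answer>5</answer>"
print(total_reward(good, "4"))  # 1.5
print(total_reward(bad, "4"))   # 0.0
```

Because neither rule is a trainable model, the usual reward-hacking failure mode – the policy finding inputs that fool the reward model – has no surface to attack, which is the robustness point Jeffrey is making.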
This article is packed with insights like that – it’s worth spending the time absorbing the whole thing.
Via Hacker News
Tags: cerebras, nvidia, generative-ai, groq, deepseek, ai, llms, mlx, inference-scaling
AI Summary and Description: Yes
Summary: The text provides an in-depth analysis of the competitive landscape surrounding Nvidia within the AI and LLM industry, emphasizing the emerging threats from companies like DeepSeek and their innovative technologies. It highlights Nvidia’s strengths and vulnerabilities, particularly in relation to its key customer base, which is increasingly pursuing custom silicon for AI applications.
Detailed Description:
The article by Jeffrey Emanuel delves into the current dynamics affecting Nvidia, a leading player in the AI and GPU markets, while also discussing significant rivals like DeepSeek. It provides valuable insights for security, privacy, and compliance professionals, particularly regarding emerging technologies and competitive strategies within the AI sector. Here are the major points covered:
– **Nvidia’s Competitive Advantages**:
– High-quality Linux drivers that facilitate software performance.
– CUDA as the industry standard protocol for parallel computing and GPU programming.
– Advanced GPU interconnect technology that supports efficient multi-GPU configurations, enhancing model training capabilities.
– A robust R&D investment strategy supported by high-profit margins (75-90%).
– **Emerging Threats**:
– New technologies such as MLX, Triton, and JAX threaten Nvidia’s CUDA advantage, enabling developers to target various backend systems more easily.
– The rise of companies like Cerebras, which are challenging Nvidia by developing larger chips that outperform Nvidia’s offerings in specific tasks, particularly model training.
– **Concentration of Power**:
– Nvidia’s revenue heavily depends on a small number of customers (Microsoft, Amazon, Meta, Google, Apple) who are now investing in their own silicon projects, potentially undermining Nvidia’s market position.
– **Innovations by DeepSeek**:
– DeepSeek’s v3 model and R1 API are game changers, notable for their efficiency (around 45 times more efficient in training) and lower operational costs (R1 being 27 times cheaper than OpenAI’s o1).
– DeepSeek’s sophisticated mixed-precision training, utilizing FP8 throughout training, allows for significant memory savings without sacrificing performance.
– **Advances in AI Methodologies**:
– The R1 model incorporates novel reinforcement learning techniques to develop reasoning capabilities autonomously, a significant shift from traditional supervised learning approaches.
– A new reward modeling system minimizes ‘reward hacking’ issues that have plagued other models, thereby ensuring more reliable performance.
– **Implications for the Industry**:
– The article emphasizes that the competitive landscape is shifting rapidly due to these technological innovations and the development of custom silicon by key players.
– Security and compliance professionals should watch for implications related to data handling, model performance, and governance as these changes occur.
By synthesizing these elements, the article offers a comprehensive look into the AI industry’s current state, underscoring essential developments and potential shifts that may impact security and compliance measures in AI technologies.