Simon Willison’s Weblog: DeepSeek_V3.pdf

Source URL: https://simonwillison.net/2024/Dec/26/deepseek-v3/#atom-everything
Source: Simon Willison’s Weblog
Title: DeepSeek_V3.pdf

Feedly Summary: DeepSeek_V3.pdf
The DeepSeek v3 paper (and model card) are out, after yesterday’s mysterious release of the undocumented model weights.
Plenty of interesting details in here. The model was pre-trained on 14.8 trillion “high-quality and diverse tokens” (not otherwise documented).

Following this, we conduct post-training, including Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on the base model of DeepSeek-V3, to align it with human preferences and further unlock its potential. During the post-training stage, we distill the reasoning capability from the DeepSeek-R1 series of models, and meanwhile carefully maintain the balance between model accuracy and generation length.

By far the most interesting detail though is how much the training cost. DeepSeek v3 trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. For comparison, Meta AI’s Llama 3.1 405B (smaller than DeepSeek v3’s 685B parameters) trained on 11x that – 30,840,000 GPU hours, also on 15 trillion tokens.
DeepSeek v3 benchmarks comparably to Claude 3.5 Sonnet, indicating that it’s now possible to train a frontier-class model (at least for the 2024 version of the frontier) for less than $6 million!
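The cost figures above can be sanity-checked with simple arithmetic (the ~$2 per H800 GPU-hour rental rate is implied by dividing the quoted cost by the quoted hours, not stated here directly):

```python
# Back-of-envelope check of the training-cost figures quoted above.
deepseek_gpu_hours = 2_788_000   # H800 GPU hours for DeepSeek v3
deepseek_cost_usd = 5_576_000    # estimated training cost

# Implied rental rate per GPU-hour
rate = deepseek_cost_usd / deepseek_gpu_hours
print(f"Implied rate: ${rate:.2f}/GPU-hour")

# Llama 3.1 405B used roughly 11x the GPU hours
llama_gpu_hours = 30_840_000
ratio = llama_gpu_hours / deepseek_gpu_hours
print(f"Llama 3.1 405B used {ratio:.1f}x the GPU hours")
```

Running this confirms the implied $2/GPU-hour rate and the roughly 11x compute gap cited in the post.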
DeepSeek also announced their API pricing. From February 8th onwards:

Input: $0.27/million tokens ($0.07/million tokens with cache hits)
Output: $1.10/million tokens

Claude 3.5 Sonnet is currently $3/million for input and $15/million for output, so if the models are indeed of equivalent quality this is a dramatic new twist in the ongoing LLM pricing wars.
Via @deepseek_ai
Tags: deepseek, training-data, llms, ai, generative-ai, llm-pricing, llama, meta

AI Summary and Description: Yes

Summary: The release of the DeepSeek v3 paper introduces a significant advancement in LLM technology, particularly with its impressive training costs and performance metrics. This innovation could reshape the competitive landscape of AI models, especially regarding cost efficiency and capability.

Detailed Description:
The DeepSeek v3 paper details the introduction of a new language model, highlighting several key aspects that impact the fields of AI and generative AI security:

– **Training Scale and Cost**:
– DeepSeek v3 was trained using 2,788,000 H800 GPU hours, amounting to an estimated cost of $5,576,000. This positions it as a cost-effective option for frontier-class AI models, especially when compared to Meta AI’s Llama 3.1 405B, which used 30,840,000 GPU hours, roughly 11x the compute.

– **Model Specifications**:
– DeepSeek v3 features 685 billion parameters and was pre-trained on 14.8 trillion high-quality tokens.
– The model underwent extensive post-training refinements through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), enhancing its reasoning capability while balancing accuracy and generation length.

– **Benchmarking Performance**:
– The performance of DeepSeek v3 is on par with that of Claude 3.5 Sonnet, indicating it can compete effectively in terms of quality.

– **API Pricing**:
– Starting February 8th, DeepSeek’s API pricing is set at $0.27 per million input tokens and $1.10 per million output tokens, substantially lower than Claude 3.5 Sonnet’s $3 per million for input and $15 per million for output. This pricing strategy signals potential shifts in the generative AI market.

– **Implications for AI Industry**:
– The advancements in model training and cost efficiency could encourage wider adoption of such models across various sectors, while also raising the stakes in the competitive LLM market.

This analysis reveals how the developments in DeepSeek v3 could create significant implications for professionals in AI, as well as for those focusing on security, privacy, and compliance in deploying cutting-edge generative AI technologies. With pricing becoming increasingly competitive, organizations may need to consider both quality and cost-effectiveness in their AI deployments.