Tag: DeepSeek

  • Simon Willison’s Weblog: DeepSeek_V3.pdf

    Source URL: https://simonwillison.net/2024/Dec/26/deepseek-v3/#atom-everything Source: Simon Willison’s Weblog Title: DeepSeek_V3.pdf Feedly Summary: DeepSeek_V3.pdf The DeepSeek v3 paper (and model card) are out, after yesterday’s mysterious release of the undocumented model weights. Plenty of interesting details in here. The model pre-trained on 14.8 trillion “high-quality and diverse tokens" (not otherwise documented). Following this, we conduct post-training, including…

  • Hacker News: DeepSeek-V3

    Source URL: https://github.com/deepseek-ai/DeepSeek-V3 Source: Hacker News Title: DeepSeek-V3 Feedly Summary: Comments AI Summary and Description: Yes Summary: The text introduces DeepSeek-V3, a significant advancement in language model technology, showcasing its innovative architecture and training techniques designed for improving efficiency and performance. For AI, cloud, and infrastructure security professionals, the novel methodologies and benchmarks presented can…

  • Simon Willison’s Weblog: deepseek-ai/DeepSeek-V3-Base

    Source URL: https://simonwillison.net/2024/Dec/25/deepseek-v3/#atom-everything Source: Simon Willison’s Weblog Title: deepseek-ai/DeepSeek-V3-Base Feedly Summary: deepseek-ai/DeepSeek-V3-Base No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant. It’s a huge model – 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs…

  • Slashdot: DeepSeek’s First Reasoning Model R1-Lite-Preview Beats OpenAI o1 Performance

    Source URL: https://slashdot.org/story/24/11/20/2129207/deepseeks-first-reasoning-model-r1-lite-preview-beats-openai-o1-performance?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: DeepSeek’s First Reasoning Model R1-Lite-Preview Beats OpenAI o1 Performance Feedly Summary: AI Summary and Description: Yes Summary: DeepSeek, a Chinese AI offshoot, has released a new reasoning-focused large language model, the R1-Lite-Preview, via its AI chatbot. This model demonstrates advanced reasoning capabilities and transparency in its processing, drawing attention…