Tag: DeepSeek

  • Simon Willison’s Weblog: Run DeepSeek R1 or V3 with MLX Distributed

    Source URL: https://simonwillison.net/2025/Jan/22/mlx-distributed/ Source: Simon Willison’s Weblog Title: Run DeepSeek R1 or V3 with MLX Distributed Feedly Summary: Run DeepSeek R1 or V3 with MLX Distributed Handy detailed instructions from Awni Hannun on running the enormous DeepSeek R1 or v3 models on a cluster of Macs using the distributed communication feature of Apple’s MLX library.…
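    The post documents Awni Hannun's multi-Mac recipe in detail; the sketch below is not that recipe but a minimal, hedged illustration of the MLX distributed communication primitive it builds on: every process contributes a local array and all_sum reduces it across the cluster. The launch method (mpirun) and the array shape are assumptions for illustration only.

      import mlx.core as mx

      # Join the distributed group; with MPI this is e.g. `mpirun -np 2 python demo.py`
      # (assumed launcher -- follow the linked instructions for the real multi-Mac setup).
      group = mx.distributed.init()
      print(f"rank {group.rank()} of {group.size()}")

      # Each rank holds a local shard; all_sum aggregates it across every machine.
      local = mx.ones((4,)) * (group.rank() + 1)
      total = mx.distributed.all_sum(local)
      mx.eval(total)  # MLX is lazy, so force evaluation of the reduction
      print(f"rank {group.rank()} sees {total}")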

  • Slashdot: Cutting-Edge Chinese ‘Reasoning’ Model Rivals OpenAI O1

    Source URL: https://slashdot.org/story/25/01/21/2138247/cutting-edge-chinese-reasoning-model-rivals-openai-o1?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Cutting-Edge Chinese ‘Reasoning’ Model Rivals OpenAI O1 Feedly Summary: AI Summary and Description: Yes Summary: The release of DeepSeek’s R1 model family marks a significant advancement in the availability of high-performing AI models, particularly for math and coding tasks. With an open MIT license, these models…

  • Hacker News: Official DeepSeek R1 Now on Ollama

    Source URL: https://ollama.com/library/deepseek-r1 Source: Hacker News Title: Official DeepSeek R1 Now on Ollama Feedly Summary: Comments AI Summary and Description: Yes Summary: The text provides an overview of DeepSeek’s first-generation reasoning models that exhibit performance comparable to OpenAI’s offerings across math, code, and reasoning tasks. This information is highly relevant for practitioners in AI and…
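    For readers who want to try the model the page points at, here is a minimal, hedged sketch that calls a locally running Ollama server over its HTTP API using only the Python standard library. It assumes `ollama pull deepseek-r1` has already been run and that the server is listening on its default port; the prompt is illustrative.

      import json
      import urllib.request

      payload = {
          "model": "deepseek-r1",      # model name from the linked Ollama library page
          "prompt": "Why is the sky blue?",
          "stream": False,             # ask for a single JSON object rather than a stream
      }
      req = urllib.request.Request(
          "http://localhost:11434/api/generate",
          data=json.dumps(payload).encode("utf-8"),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          print(json.loads(resp.read())["response"])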

  • Hacker News: DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks

    Source URL: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B Source: Hacker News Title: DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks Feedly Summary: Comments AI Summary and Description: Yes Summary: The text describes the introduction of DeepSeek-R1 and DeepSeek-R1-Zero, first-generation reasoning models; DeepSeek-R1-Zero is trained with large-scale reinforcement learning without prior supervised fine-tuning. These models exhibit significant reasoning capabilities but also face challenges like endless…
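    The model card linked above can be exercised directly with Hugging Face transformers; the sketch below is a hedged example of loading the 1.5B distilled checkpoint and running one prompt. The dtype/device settings (which need accelerate installed) and the token budget are assumptions, not the model card's recommended configuration.

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"   # repo from the URL above
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id, torch_dtype="auto", device_map="auto"      # device_map needs accelerate
      )

      messages = [{"role": "user", "content": "What is 17 * 24?"}]
      inputs = tokenizer.apply_chat_template(
          messages, add_generation_prompt=True, return_tensors="pt"
      ).to(model.device)

      outputs = model.generate(inputs, max_new_tokens=512)
      # Decode only the newly generated tokens, skipping the echoed prompt.
      print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))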

  • Simon Willison’s Weblog: DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B

    Source URL: https://simonwillison.net/2025/Jan/20/deepseek-r1/ Source: Simon Willison’s Weblog Title: DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B Feedly Summary: DeepSeek are the Chinese AI lab who dropped the best currently available open weights LLM on Christmas day, DeepSeek v3. That model was trained in part using their unreleased R1 “reasoning” model. Today they’ve released R1 itself, along with a whole…

  • Hacker News: DeepSeek-R1

    Source URL: https://github.com/deepseek-ai/DeepSeek-R1 Source: Hacker News Title: DeepSeek-R1 Feedly Summary: Comments AI Summary and Description: Yes Summary: The text presents advancements in AI reasoning models, specifically DeepSeek-R1-Zero and DeepSeek-R1, emphasizing DeepSeek-R1-Zero’s unique approach of training solely through large-scale reinforcement learning (RL) without initial supervised fine-tuning. These models demonstrate significant reasoning capabilities and highlight breakthroughs in…

  • Simon Willison’s Weblog: DeepSeek API Docs: Rate Limit

    Source URL: https://simonwillison.net/2025/Jan/18/deepseek-api-docs-rate-limit/#atom-everything Source: Simon Willison’s Weblog Title: DeepSeek API Docs: Rate Limit Feedly Summary: DeepSeek API Docs: Rate Limit This is surprising: DeepSeek offer the only hosted LLM API I’ve seen that doesn’t implement rate limits: DeepSeek API does NOT constrain user’s rate limit. We will try our best to serve every request. However,…
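    As a hedged sketch of what calling that unthrottled API looks like: DeepSeek’s hosted API is OpenAI-compatible, so the standard openai Python client can be pointed at it. The base URL and model names below reflect DeepSeek’s public docs at the time of writing; the environment variable name and prompt are assumptions.

      import os
      from openai import OpenAI

      client = OpenAI(
          api_key=os.environ["DEEPSEEK_API_KEY"],  # a DeepSeek key, not an OpenAI key
          base_url="https://api.deepseek.com",
      )

      response = client.chat.completions.create(
          model="deepseek-chat",                   # "deepseek-reasoner" selects R1
          messages=[{"role": "user", "content": "Summarise your rate limit policy."}],
      )
      print(response.choices[0].message.content)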

  • Simon Willison’s Weblog: Codestral 25.01

    Source URL: https://simonwillison.net/2025/Jan/13/codestral-2501/ Source: Simon Willison’s Weblog Title: Codestral 25.01 Feedly Summary: Codestral 25.01 Brand new code-focused model from Mistral. Unlike the first Codestral this one isn’t (yet) available as open weights. The model has a 256k token context – a new record for Mistral. The new model scored an impressive joint first place with…

  • Simon Willison’s Weblog: Weeknotes: Starting 2025 a little slow

    Source URL: https://simonwillison.net/2025/Jan/4/weeknotes/#atom-everything Source: Simon Willison’s Weblog Title: Weeknotes: Starting 2025 a little slow Feedly Summary: I published my review of 2024 in LLMs and then got into a fight with most of the internet over the phone microphone targeted ads conspiracy theory. In my last weeknotes I talked about how December in LLMs has…

  • Hacker News: Notes on the New Deepseek v3

    Source URL: https://composio.dev/blog/notes-on-new-deepseek-v3/ Source: Hacker News Title: Notes on the New Deepseek v3 Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the release of Deepseek’s v3 model, a 671B-parameter mixture-of-experts model that showcases exceptional performance, surpassing both open-source and proprietary competitors at a significantly lower training cost. It highlights the engineering…