Deepseek v3 – Experimental News Clipping Site

Cloud Blog: How Baseten achieves 225% better cost-performance for AI inference (and you can too)

Sep 4, 2025

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/how-baseten-achieves-better-cost-performance-for-ai-inference/ Source: Cloud Blog Title: How Baseten achieves 225% better cost-performance for AI inference (and you can too) Feedly Summary: Baseten is one of a growing number of AI infrastructure providers, helping other startups run their models and experiments at speed and scale. Given the importance of those two factors to its customers,…

Simon Willison’s Weblog: DeepSeek 3.1

Aug 22, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Aug/22/deepseek-31/#atom-everything Source: Simon Willison’s Weblog Title: DeepSeek 3.1 Feedly Summary: DeepSeek 3.1 The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it’s a hybrid reasoning model. DeepSeek claim: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly. Drew Breunig points out that their benchmarks…

Simon Willison’s Weblog: moonshotai/Kimi-K2-Instruct

Jul 11, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/11/kimi-k2/#atom-everything Source: Simon Willison’s Weblog Title: moonshotai/Kimi-K2-Instruct Feedly Summary: moonshotai/Kimi-K2-Instruct Colossal new open weights model release today from Moonshot AI, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the Moon. My HuggingFace storage calculator says the repository is 958.52 GB. It’s a…

Cloud Blog: AI Hypercomputer developer experience enhancements from Q1 25: build faster, scale bigger

May 16, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/compute/ai-hypercomputer-enhancements-for-the-developer/ Source: Cloud Blog Title: AI Hypercomputer developer experience enhancements from Q1 25: build faster, scale bigger Feedly Summary: Building cutting-edge AI models is exciting, whether you’re iterating in your notebook or orchestrating large clusters. However, scaling up training can present significant challenges, including navigating complex infrastructure, configuring software and dependencies across numerous…

Slashdot: In ‘Milestone’ for Open Source, Meta Releases New Benchmark-Beating Llama 4 Models

Apr 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://news.slashdot.org/story/25/04/06/182233/in-milestone-for-open-source-meta-releases-new-benchmark-beating-llama-4-models?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: In ‘Milestone’ for Open Source, Meta Releases New Benchmark-Beating Llama 4 Models Feedly Summary: Mark Zuckerberg recently announced the launch of four new Llama Large Language Models (LLMs) that reinforce Meta’s commitment to open source AI. These models, particularly Llama 4 Scout and…

Simon Willison’s Weblog: Quoting Ahmed Al-Dahle

Apr 5, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Apr/5/llama-4/#atom-everything Source: Simon Willison’s Weblog Title: Quoting Ahmed Al-Dahle Feedly Summary: The Llama series have been re-designed to use state of the art mixture-of-experts (MoE) architecture and natively trained with multimodality. We’re dropping Llama 4 Scout & Llama 4 Maverick, and previewing Llama 4 Behemoth. 📌 Llama 4 Scout is highest performing small…

Simon Willison’s Weblog: deepseek-ai/DeepSeek-V3-0324

Mar 27, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Mar/24/deepseek/ Source: Simon Willison’s Weblog Title: deepseek-ai/DeepSeek-V3-0324 Feedly Summary: deepseek-ai/DeepSeek-V3-0324 Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model, baking the release date into the name DeepSeek-V3-0324. The license is MIT, the README is empty and the release adds up to a total of 641 GB…

Simon Willison’s Weblog: Qwen2.5-VL-32B: Smarter and Lighter

Mar 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/#atom-everything Source: Simon Willison’s Weblog Title: Qwen2.5-VL-32B: Smarter and Lighter Feedly Summary: Qwen2.5-VL-32B: Smarter and Lighter The second big open weight LLM release from China today – the first being DeepSeek v3-0324. Qwen’s previous vision model was Qwen2.5 VL, released in January in 3B, 7B and 72B sizes. Today’s release is a 32B…

Simon Willison’s Weblog: deepseek-ai/DeepSeek-V3-0324

Mar 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Mar/24/deepseek/ Source: Simon Willison’s Weblog Title: deepseek-ai/DeepSeek-V3-0324 Feedly Summary: deepseek-ai/DeepSeek-V3-0324 Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model, baking the release date into the name DeepSeek-V3-0324. The license is MIT, the README is empty and the release adds up to a total of 641 GB…

Simon Willison’s Weblog: My Thoughts on the Future of "AI"

Mar 19, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Mar/19/my-thoughts-on-the-future-of-ai/ Source: Simon Willison’s Weblog Title: My Thoughts on the Future of "AI" Feedly Summary: My Thoughts on the Future of “AI” Nicholas Carlini, previously deeply skeptical about the utility of LLMs, discusses at length his thoughts on where the technology might go. He presents compelling, detailed arguments for both ends of the…

Tag: Deepseek v3