Tag: open weights

  • Simon Willison’s Weblog: Codestral Embed

    Source URL: https://simonwillison.net/2025/May/28/codestral-embed/#atom-everything
    Summary: Brand new embedding model from Mistral, specifically trained for code. Mistral claim that: Codestral Embed significantly outperforms leading code embedders in the market today: Voyage Code 3, Cohere Embed v4.0 and OpenAI’s large embedding model. The model is designed to work…
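
    A truncated announcement like this doesn’t show what a code embedding model is used for in practice. Below is a minimal retrieval sketch in Python: the embed helper is hypothetical (a stand-in for whatever client call returns Codestral Embed vectors, stubbed with random vectors so the sketch runs standalone), and cosine similarity over the vectors is the standard primitive for code search.

        import numpy as np

        def embed(snippets: list[str]) -> np.ndarray:
            # Hypothetical stand-in for an embedding API call; a real
            # implementation would send the snippets to a model such as
            # Codestral Embed. Random vectors keep the sketch runnable.
            rng = np.random.default_rng(0)
            return rng.normal(size=(len(snippets), 256))

        def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
            # Normalize rows, then a dot product yields pairwise cosine scores.
            a = a / np.linalg.norm(a, axis=1, keepdims=True)
            b = b / np.linalg.norm(b, axis=1, keepdims=True)
            return a @ b.T

        corpus = [
            "def add(a, b): return a + b",
            "def read_file(path): return open(path).read()",
        ]
        query = ["sum two numbers"]

        # Rank corpus snippets by similarity to the query embedding.
        scores = cosine_similarity(embed(query), embed(corpus))[0]
        for snippet, score in sorted(zip(corpus, scores), key=lambda t: -t[1]):
            print(f"{score:.3f}  {snippet}")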

  • Simon Willison’s Weblog: Building software on top of Large Language Models

    Source URL: https://simonwillison.net/2025/May/15/building-on-llms/#atom-everything
    Summary: I presented a three-hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they needed to get started writing code that…

  • Simon Willison’s Weblog: Medium is the new large

    Source URL: https://simonwillison.net/2025/May/7/medium-is-the-new-large/#atom-everything
    Summary: New model release from Mistral – this time closed source/proprietary. Mistral Medium claims strong benchmark scores similar to GPT-4o and Claude 3.7 Sonnet, but is priced at $0.40/million input tokens and $2/million output tokens – about the…
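
    To make that pricing concrete, here is a quick cost calculation in Python at the quoted rates; the token counts are hypothetical, chosen only for illustration.

        # Quoted Mistral Medium rates: $0.40 per million input tokens,
        # $2.00 per million output tokens.
        INPUT_RATE = 0.40 / 1_000_000   # dollars per input token
        OUTPUT_RATE = 2.00 / 1_000_000  # dollars per output token

        def cost(input_tokens: int, output_tokens: int) -> float:
            # Total dollar cost of one request at the quoted rates.
            return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

        # Hypothetical workload: 50k input tokens, 10k output tokens.
        print(f"${cost(50_000, 10_000):.4f}")  # -> $0.0400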

  • Simon Willison’s Weblog: What people get wrong about the leading Chinese open models: Adoption and censorship

    Source URL: https://simonwillison.net/2025/May/6/what-people-get-wrong-about-the-leading-chinese-models/#atom-everything
    Summary: While I’ve been enjoying trying out Alibaba’s Qwen 3 a lot recently, Nathan Lambert focuses on the elephant in…

  • Simon Willison’s Weblog: Introducing Command A: Max performance, minimal compute

    Source URL: https://simonwillison.net/2025/Mar/13/command-a/#atom-everything
    Summary: New LLM release from Cohere. It’s interesting to see which aspects of the model they’re highlighting, as an indicator of what their commercial customers value the most (highlight mine): Command A…

  • Hacker News: The Illustrated DeepSeek-R1

    Source URL: https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1
    Summary: Discusses the launch of DeepSeek-R1, highlighting its novel training approach, especially for reasoning tasks. The post offers significant insight into the evolving capabilities of…

  • Simon Willison’s Weblog: The impact of competition and DeepSeek on Nvidia

    Source URL: https://simonwillison.net/2025/Jan/27/deepseek-nvidia/
    Summary: Long, excellent piece by Jeffrey Emanuel capturing the current state of the AI/LLM industry. The original title is “The Short Case for Nvidia Stock” – I’m using the Hacker…

  • Simon Willison’s Weblog: DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B

    Source URL: https://simonwillison.net/2025/Jan/20/deepseek-r1/
    Summary: DeepSeek are the Chinese AI lab who dropped the best currently available open weights LLM on Christmas Day, DeepSeek v3. That model was trained in part using their unreleased R1 “reasoning” model. Today they’ve released R1 itself, along with a whole…
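
    A distilled 8B model like this is small enough to try on consumer hardware. Here is a minimal sketch, assuming the ollama Python package with a locally running Ollama server, and assuming the model is available under the tag deepseek-r1:8b (neither detail is from the post):

        import ollama  # assumes a local Ollama server is running

        # Model tag is an assumption; pull it first with: ollama pull deepseek-r1:8b
        response = ollama.chat(
            model="deepseek-r1:8b",
            messages=[{"role": "user", "content": "Why is the sky blue?"}],
        )
        print(response["message"]["content"])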

  • Simon Willison’s Weblog: Codestral 25.01

    Source URL: https://simonwillison.net/2025/Jan/13/codestral-2501/
    Summary: Brand new code-focused model from Mistral. Unlike the first Codestral, this one isn’t (yet) available as open weights. The model has a 256k token context – a new record for Mistral. The new model scored an impressive joint first place with…

  • Simon Willison’s Weblog: Pixtral Large

    Source URL: https://simonwillison.net/2024/Nov/18/pixtral-large/
    Summary: New today from Mistral: Today we announce Pixtral Large, a 124B open-weights multimodal model built on top of Mistral Large 2. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. The weights are out on…