Tag: tokens

  • Simon Willison’s Weblog: Faster inference

    Source URL: https://simonwillison.net/2025/Aug/1/faster-inference/
    Feedly Summary: Two interesting examples of inference speed as a flagship feature of LLM services today. First, Cerebras announced two new monthly plans for their extremely high speed hosted model service: Cerebras Code Pro ($50/month, 1,000 messages a day) and Cerebras Code Max ($200/month, 5,000/day). …

  • Simon Willison’s Weblog: Reverse engineering some updates to Claude

    Source URL: https://simonwillison.net/2025/Jul/31/updates-to-claude/#atom-everything
    Feedly Summary: Anthropic released two major new features for their consumer-facing Claude apps in the past couple of days. Sadly, they don’t do a very good job of updating the release notes for those apps – neither of these releases came…

  • Simon Willison’s Weblog: Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

    Source URL: https://simonwillison.net/2025/Jul/31/qwen3-coder-flash/
    Feedly Summary: Qwen just released their sixth model(!) this July, called Qwen3-Coder-30B-A3B-Instruct – listed as Qwen3-Coder-Flash in their chat.qwen.ai interface. It’s 30.5B total parameters with 3.3B active at any one time. This means…
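    A rough sketch of what "trying it out locally" can look like: LM Studio serves downloaded models over an OpenAI-compatible HTTP API (by default at http://localhost:1234/v1), so any OpenAI client library can talk to a local Qwen3-Coder-Flash instance. This is an illustrative assumption about the setup, not code from the post; the model identifier and prompt below are placeholders.

    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API; the key can be any non-empty string.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="qwen3-coder-30b-a3b-instruct",  # hypothetical identifier; use the name LM Studio shows for the loaded model
        messages=[
            {"role": "user", "content": "Write a JavaScript function that reverses a string."},
        ],
    )
    print(response.choices[0].message.content)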

  • Simon Willison’s Weblog: Qwen/Qwen3-30B-A3B-Instruct-2507

    Source URL: https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct-2507/
    Feedly Summary: New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said: Smarter, faster, and local deployment-friendly. ✨ Key Enhancements: ✅ Enhanced reasoning, coding, and math skills ✅ Broader multilingual knowledge ✅ Improved long-context understanding (up…

  • Simon Willison’s Weblog: My 2.5 year old laptop can write Space Invaders in JavaScript now

    Source URL: https://simonwillison.net/2025/Jul/29/space-invaders/
    Feedly Summary: I wrote about the new GLM-4.5 model family yesterday – new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such as…

  • Simon Willison’s Weblog: Qwen3-235B-A22B-Thinking-2507

    Source URL: https://simonwillison.net/2025/Jul/25/qwen3-235b-a22b-thinking-2507/#atom-everything
    Feedly Summary: The third Qwen model release this week, following Qwen3-235B-A22B-Instruct-2507 on Monday 21st and Qwen3-Coder-480B-A35B-Instruct on Tuesday 22nd. Those two were both non-reasoning models – a change from the previous models in the Qwen 3 family, which combined reasoning and non-reasoning in the same model,…

  • Simon Willison’s Weblog: Qwen3-Coder: Agentic Coding in the World

    Source URL: https://simonwillison.net/2025/Jul/22/qwen3-coder/
    Feedly Summary: It turns out that as I was typing up my notes on Qwen3-235B-A22B-Instruct-2507 the Qwen team were unleashing something much bigger: Today, we’re announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder…