reasoning tokens – Experimental News Clipping Site

Simon Willison’s Weblog: Improved Gemini 2.5 Flash and Flash-Lite

Sep 25, 2025

—

by

Source URL: https://simonwillison.net/2025/Sep/25/improved-gemini-25-flash-and-flash-lite/#atom-everything Source: Simon Willison’s Weblog Title: Improved Gemini 2.5 Flash and Flash-Lite Feedly Summary: Improved Gemini 2.5 Flash and Flash-Lite Two new preview models from Google – updates to their fast and inexpensive Flash and Flash Lite families: The latest version of Gemini 2.5 Flash-Lite was trained and built based on three key…

Simon Willison’s Weblog: GPT-5: Key characteristics, pricing and model card

Aug 7, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Aug/7/gpt-5/#atom-everything Source: Simon Willison’s Weblog Title: GPT-5: Key characteristics, pricing and model card Feedly Summary: I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still an LLM – it’s not a dramatic departure…

Simon Willison’s Weblog: Grok 4

Jul 10, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jul/10/grok-4/#atom-everything Source: Simon Willison’s Weblog Title: Grok 4 Feedly Summary: Grok 4 Released last night, Grok 4 is now available via both API and a paid subscription for end-users. Key characteristics: image and text input, text output. 256,000 context length (twice that of Grok 3). It’s a reasoning model where you can’t see…

Simon Willison’s Weblog: AbsenceBench: Language Models Can’t Tell What’s Missing

Jun 20, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jun/20/absencebench/#atom-everything Source: Simon Willison’s Weblog Title: AbsenceBench: Language Models Can’t Tell What’s Missing Feedly Summary: AbsenceBench: Language Models Can’t Tell What’s Missing Here’s another interesting result to file under the “jagged frontier" of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing "Needle…

Simon Willison’s Weblog: Exploring Promptfoo via Dave Guarino’s SNAP evals

Apr 24, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Apr/24/exploring-promptfoo/#atom-everything Source: Simon Willison’s Weblog Title: Exploring Promptfoo via Dave Guarino’s SNAP evals Feedly Summary: I used part three (here’s parts one and two) of Dave Guarino’s series on evaluating how well LLMs can answer questions about SNAP (aka food stamps) as an excuse to explore Promptfoo, an LLM eval tool. SNAP (Supplemental…

Simon Willison’s Weblog: Gemini 2.5 Pro Preview pricing

Apr 4, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Apr/4/gemini-25-pro-pricing/ Source: Simon Willison’s Weblog Title: Gemini 2.5 Pro Preview pricing Feedly Summary: Gemini 2.5 Pro Preview pricing Google’s Gemini 2.5 Pro is currently the top model on LM Arena and, from my own testing, a superb model for OCR, audio transcription and long-context coding. You can now pay for it! The new…

Simon Willison’s Weblog: o3-mini is really good at writing internal documentation

Feb 5, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Feb/5/o3-mini-documentation/#atom-everything Source: Simon Willison’s Weblog Title: o3-mini is really good at writing internal documentation Feedly Summary: o3-mini is really good at writing internal documentation I wanted to refresh my knowledge of how the Datasette permissions system works today. I already have extensive hand-written documentation for that, but I thought it would be interesting…

Simon Willison’s Weblog: Notes on OpenAI’s new o1 chain-of-thought models

Sep 12, 2024

—

by

system automation

in Uncategorized

Source URL: https://simonwillison.net/2024/Sep/12/openai-o1/ Source: Simon Willison’s Weblog Title: Notes on OpenAI’s new o1 chain-of-thought models Feedly Summary: OpenAI released two major new preview models today: o1-preview and o1-mini (that mini one is also a preview, despite the name) – previously rumored as having the codename “strawberry". There’s a lot to understand about these models –…

Tag: reasoning tokens