Tag: reasoning tokens

  • Simon Willison’s Weblog: GPT-5: Key characteristics, pricing and model card

    Source URL: https://simonwillison.net/2025/Aug/7/gpt-5/#atom-everything
    Source: Simon Willison’s Weblog
    Title: GPT-5: Key characteristics, pricing and model card
    Feedly Summary: I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still an LLM – it’s not a dramatic departure…

  • Simon Willison’s Weblog: AbsenceBench: Language Models Can’t Tell What’s Missing

    Source URL: https://simonwillison.net/2025/Jun/20/absencebench/#atom-everything
    Source: Simon Willison’s Weblog
    Title: AbsenceBench: Language Models Can’t Tell What’s Missing
    Feedly Summary: Here’s another interesting result to file under the “jagged frontier” of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing “Needle…

  • Simon Willison’s Weblog: Exploring Promptfoo via Dave Guarino’s SNAP evals

    Source URL: https://simonwillison.net/2025/Apr/24/exploring-promptfoo/#atom-everything
    Source: Simon Willison’s Weblog
    Title: Exploring Promptfoo via Dave Guarino’s SNAP evals
    Feedly Summary: I used part three (here’s parts one and two) of Dave Guarino’s series on evaluating how well LLMs can answer questions about SNAP (aka food stamps) as an excuse to explore Promptfoo, an LLM eval tool. SNAP (Supplemental…

  • Simon Willison’s Weblog: Gemini 2.5 Pro Preview pricing

    Source URL: https://simonwillison.net/2025/Apr/4/gemini-25-pro-pricing/
    Source: Simon Willison’s Weblog
    Title: Gemini 2.5 Pro Preview pricing
    Feedly Summary: Google’s Gemini 2.5 Pro is currently the top model on LM Arena and, from my own testing, a superb model for OCR, audio transcription and long-context coding. You can now pay for it! The new…

  • Simon Willison’s Weblog: o3-mini is really good at writing internal documentation

    Source URL: https://simonwillison.net/2025/Feb/5/o3-mini-documentation/#atom-everything
    Source: Simon Willison’s Weblog
    Title: o3-mini is really good at writing internal documentation
    Feedly Summary: I wanted to refresh my knowledge of how the Datasette permissions system works today. I already have extensive hand-written documentation for that, but I thought it would be interesting…

  • Simon Willison’s Weblog: Notes on OpenAI’s new o1 chain-of-thought models

    Source URL: https://simonwillison.net/2024/Sep/12/openai-o1/
    Source: Simon Willison’s Weblog
    Title: Notes on OpenAI’s new o1 chain-of-thought models
    Feedly Summary: OpenAI released two major new preview models today: o1-preview and o1-mini (that mini one is also a preview, despite the name) – previously rumored as having the codename “strawberry”. There’s a lot to understand about these models –…