Tag: token usage

  • Simon Willison’s Weblog: claude-trace

    Source URL: https://simonwillison.net/2025/Jun/2/claude-trace/ Source: Simon Willison’s Weblog Title: claude-trace Feedly Summary: claude-trace I’ve been thinking for a while it would be interesting to run some kind of HTTP proxy against the Claude Code CLI app and take a peek at how it works. Mario Zechner just published a really nice version of that. It works…
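    claude-trace itself handles the interception; purely to illustrate the proxy idea (not how claude-trace is actually implemented), a mitmproxy addon along these lines could log the token usage reported in Anthropic API responses. The hostname check and field names are assumptions based on the public Anthropic Messages API, not verified against the tool.

      # illustrative_usage_logger.py - a mitmproxy addon sketch, NOT claude-trace itself.
      # Assumes the CLI talks to api.anthropic.com and that JSON responses carry a
      # "usage" object with input_tokens / output_tokens, as the Messages API documents.
      import json
      from mitmproxy import http

      def response(flow: http.HTTPFlow) -> None:
          if flow.request.pretty_host != "api.anthropic.com":
              return
          try:
              body = json.loads(flow.response.get_text())
          except (ValueError, TypeError):
              return  # streaming or non-JSON response; skip it
          usage = body.get("usage")
          if usage:
              print(f"{flow.request.path}: "
                    f"in={usage.get('input_tokens')} out={usage.get('output_tokens')}")

    Run with something like mitmdump -s illustrative_usage_logger.py and point the CLI's HTTPS proxy settings at it (how Claude Code picks up proxy configuration is an assumption here, not something the summary confirms).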

  • Tomasz Tunguz: 1000x Increase in AI Demand

    Source URL: https://www.tomtunguz.com/nvda-2025-05-29/ Source: Tomasz Tunguz Title: 1000x Increase in AI Demand Feedly Summary: NVIDIA announced earnings yesterday. In addition to continued exceptional growth, the most interesting observations revolve around a shift from simple one-shot AI to reasoning. Reasoning improves accuracy for robots – like telling a person to stop and think about an answer…

  • Slashdot: Google’s Gemini 2.5 Models Gain "Deep Think" Reasoning

    Source URL: https://tech.slashdot.org/story/25/05/20/1915256/googles-gemini-25-models-gain-deep-think-reasoning?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Google’s Gemini 2.5 Models Gain "Deep Think" Reasoning Feedly Summary: AI Summary and Description: Yes Summary: Google has rolled out significant enhancements to its Gemini 2.5 AI models, particularly a new “Deep Think” reasoning mode that improves the models’ performance on complex tasks by allowing for hypothesis evaluation. These…

  • Simon Willison’s Weblog: Gemini 2.5 Models now support implicit caching

    Source URL: https://simonwillison.net/2025/May/9/gemini-implicit-caching/#atom-everything Source: Simon Willison’s Weblog Title: Gemini 2.5 Models now support implicit caching Feedly Summary: Gemini 2.5 Models now support implicit caching I just spotted a cacheTokensDetails key in the token usage JSON while running a long chain of prompts against Gemini 2.5 Flash – despite not configuring caching myself: {"cachedContentTokenCount": 200658, "promptTokensDetails":…
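    The cached count sits alongside the regular prompt counts in that usage block, so comparing the two fields shows how much of a prompt was served from the implicit cache. A minimal sketch; every number other than the 200658 quoted above is a placeholder:

      import json

      # Shaped like the usageMetadata block quoted above; only 200658 comes from
      # the post, the other counts are made up for illustration.
      usage = json.loads("""
      {
        "promptTokenCount": 203500,
        "cachedContentTokenCount": 200658,
        "candidatesTokenCount": 1450
      }
      """)

      cached = usage.get("cachedContentTokenCount", 0)
      prompt = usage.get("promptTokenCount", 0)
      if prompt:
          print(f"{cached} of {prompt} prompt tokens came from the cache "
                f"({cached / prompt:.1%})")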

  • Simon Willison’s Weblog: Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (also LLM 0.25)

    Source URL: https://simonwillison.net/2025/May/5/llm-video-frames/#atom-everything Source: Simon Willison’s Weblog Title: Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (also LLM 0.25) Feedly Summary: The new llm-video-frames plugin can turn a video file into a sequence of JPEG frames and feed them directly into a long context vision LLM such…
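    The underlying idea (turn the video into numbered JPEGs, then hand those images to a vision model) can be sketched with plain ffmpeg. This illustrates the approach rather than the plugin's actual implementation; the filename and frame rate are arbitrary choices.

      import pathlib
      import subprocess

      video = "demo.mp4"                      # hypothetical input file
      out_dir = pathlib.Path("frames")
      out_dir.mkdir(exist_ok=True)

      # One JPEG per second of video; adjust fps to trade token cost against coverage.
      subprocess.run(
          [
              "ffmpeg", "-i", video,
              "-vf", "fps=1",                 # sample one frame per second
              "-q:v", "2",                    # high JPEG quality
              str(out_dir / "frame_%04d.jpg"),
          ],
          check=True,
      )

      frames = sorted(out_dir.glob("*.jpg"))
      print(f"{len(frames)} frames ready to attach to a vision model prompt")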

  • Simon Willison’s Weblog: llm-fragment-symbex

    Source URL: https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/#atom-everything Source: Simon Willison’s Weblog Title: llm-fragment-symbex Feedly Summary: llm-fragment-symbex I released a new LLM fragment loader plugin that builds on top of my Symbex project. Symbex is a CLI tool I wrote that can run against a folder full of Python code and output functions, classes, methods or just their docstrings and…
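    Symbex does the heavy lifting in the plugin; as a rough stand-in for the kind of output it produces, this sketch uses Python's ast module to list top-level functions and classes with the first line of their docstrings. The target filename is hypothetical and this is not Symbex's code.

      import ast
      from pathlib import Path

      def outline(path: str):
          """Yield (kind, name, first docstring line) for top-level defs in a file."""
          tree = ast.parse(Path(path).read_text())
          for node in tree.body:
              if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                  kind = "class" if isinstance(node, ast.ClassDef) else "def"
                  doc = (ast.get_docstring(node) or "").splitlines()
                  yield kind, node.name, doc[0] if doc else ""

      for kind, name, summary in outline("example.py"):   # hypothetical target file
          print(f"{kind} {name}: {summary}")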

  • Simon Willison’s Weblog: Start building with Gemini 2.5 Flash

    Source URL: https://simonwillison.net/2025/Apr/17/start-building-with-gemini-25-flash/ Source: Simon Willison’s Weblog Title: Start building with Gemini 2.5 Flash Feedly Summary: Start building with Gemini 2.5 Flash Google Gemini’s latest model is Gemini 2.5 Flash, available in (paid) preview as gemini-2.5-flash-preview-04-17. Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while…
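    A minimal call against that preview model id, assuming the google-genai Python package and a Gemini API key in the environment; the prompt is arbitrary, and the usage_metadata line is the part relevant to this tag, since it reports the prompt and output token counts for the call.

      from google import genai

      client = genai.Client()  # assumes an API key is available in the environment

      response = client.models.generate_content(
          model="gemini-2.5-flash-preview-04-17",
          contents="In one sentence, what is the difference between prompt and output tokens?",
      )

      print(response.text)
      print(response.usage_metadata)  # prompt, candidate and (if any) cached token counts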

  • Simon Willison’s Weblog: Gemini 2.5 Pro Preview pricing

    Source URL: https://simonwillison.net/2025/Apr/4/gemini-25-pro-pricing/ Source: Simon Willison’s Weblog Title: Gemini 2.5 Pro Preview pricing Feedly Summary: Gemini 2.5 Pro Preview pricing Google’s Gemini 2.5 Pro is currently the top model on LM Arena and, from my own testing, a superb model for OCR, audio transcription and long-context coding. You can now pay for it! The new…
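    Published per-token prices plus the per-call usage counts are all you need to estimate spend. A small sketch with placeholder rates; substitute the actual published per-million-token prices, which differ between input and output and may vary with prompt size.

      # Placeholder rates in USD per million tokens; NOT the actual published prices.
      INPUT_PRICE_PER_M = 1.00
      OUTPUT_PRICE_PER_M = 8.00

      def estimated_cost_usd(input_tokens: int, output_tokens: int) -> float:
          """Rough cost of one call from its reported token usage."""
          return (
              input_tokens / 1_000_000 * INPUT_PRICE_PER_M
              + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
          )

      print(f"${estimated_cost_usd(250_000, 4_000):.4f}")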

  • Hacker News: Hunyuan T1 Mamba Reasoning model beats R1 on speed and metrics

    Source URL: https://tencent.github.io/llm.hunyuan.T1/README_EN.html Source: Hacker News Title: Hunyuan T1 Mamba Reasoning model beats R1 on speed and metrics Feedly Summary: Comments AI Summary and Description: Yes Summary: The text describes Tencent’s innovative Hunyuan-T1 reasoning model, a significant advancement in large language models that utilizes reinforcement learning and a novel architecture to improve reasoning capabilities and…

  • Simon Willison’s Weblog: New audio models from OpenAI, but how much can we rely on them?

    Source URL: https://simonwillison.net/2025/Mar/20/new-openai-audio-models/#atom-everything Source: Simon Willison’s Weblog Title: New audio models from OpenAI, but how much can we rely on them? Feedly Summary: OpenAI announced several new audio-related API features today, for both text-to-speech and speech-to-text. They’re very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction…