Tag: context window
-
Simon Willison’s Weblog: Kimi-K2-Instruct-0905
Source URL: https://simonwillison.net/2025/Sep/6/kimi-k2-instruct-0905/#atom-everything
Source: Simon Willison’s Weblog
Title: Kimi-K2-Instruct-0905
Feedly Summary: Kimi-K2-Instruct-0905 New not-quite-MIT licensed model from Chinese Moonshot AI, a follow-up to the highly regarded Kimi-K2 model they released in July. This one is an incremental improvement – I’ve seen it referred to online as “Kimi K-2.1”. It scores a little higher on a…
-
Cloud Blog: How Baseten achieves 225% better cost-performance for AI inference (and you can too)
Source URL: https://cloud.google.com/blog/products/ai-machine-learning/how-baseten-achieves-better-cost-performance-for-ai-inference/
Source: Cloud Blog
Title: How Baseten achieves 225% better cost-performance for AI inference (and you can too)
Feedly Summary: Baseten is one of a growing number of AI infrastructure providers, helping other startups run their models and experiments at speed and scale. Given the importance of those two factors to its customers,…
-
Simon Willison’s Weblog: Introducing gpt-realtime
Source URL: https://simonwillison.net/2025/Sep/1/introducing-gpt-realtime/#atom-everything
Source: Simon Willison’s Weblog
Title: Introducing gpt-realtime
Feedly Summary: Introducing gpt-realtime Released a few days ago (August 28th), gpt-realtime is OpenAI’s new “most advanced speech-to-speech model”. It looks like this is a replacement for the older gpt-4o-realtime-preview model that was released last October. This is a slightly confusing release. The previous realtime…
-
Simon Willison’s Weblog: too many model context protocol servers and LLM allocations on the dance floor
Source URL: https://simonwillison.net/2025/Aug/22/too-many-mcps/#atom-everything
Source: Simon Willison’s Weblog
Title: too many model context protocol servers and LLM allocations on the dance floor
Feedly Summary: too many model context protocol servers and LLM allocations on the dance floor Useful reminder from Geoffrey Huntley of the infrequently discussed significant token cost of using MCP. Geoffrey estimates that…
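The token cost the entry refers to comes from every connected MCP server's tool definitions being loaded into the model's context window before the conversation even starts. As a rough illustration only (not Geoffrey Huntley's actual figures), the overhead can be sketched with the common ~4 characters-per-token heuristic; all numbers below are hypothetical assumptions:

```python
# Back-of-the-envelope estimate of context-window tokens consumed by MCP
# tool definitions. Every number here is an illustrative assumption; the
# ~4 chars-per-token ratio is a common rough heuristic, not a measurement.

def estimate_mcp_tokens(servers: int, tools_per_server: int,
                        chars_per_schema: int, chars_per_token: int = 4) -> int:
    """Approximate tokens used by loading every tool's JSON schema into context."""
    total_chars = servers * tools_per_server * chars_per_schema
    return total_chars // chars_per_token

# Hypothetical setup: 6 servers, 10 tools each, ~1,500 chars of schema per tool
print(estimate_mcp_tokens(6, 10, 1500))  # -> 22500 tokens before any user input
```

Even these made-up numbers show how a handful of servers can quietly eat a five-figure slice of the context window.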
-
Simon Willison’s Weblog: Claude Sonnet 4 now supports 1M tokens of context
Source URL: https://simonwillison.net/2025/Aug/12/claude-sonnet-4-1m/
Source: Simon Willison’s Weblog
Title: Claude Sonnet 4 now supports 1M tokens of context
Feedly Summary: Claude Sonnet 4 now supports 1M tokens of context Gemini and OpenAI both have million token models, so it’s good to see Anthropic catching up. This is 5x the previous 200,000 context length limit of the…
-
Cloud Blog: Taming the stragglers: Maximize AI training performance with automated straggler detection
Source URL: https://cloud.google.com/blog/products/compute/stragglers-in-ai-a-guide-to-automated-straggler-detection/
Source: Cloud Blog
Title: Taming the stragglers: Maximize AI training performance with automated straggler detection
Feedly Summary: Stragglers are an industry-wide issue for developers working with large-scale machine learning workloads. The larger and more powerful these systems become, the more their performance is hostage to the subtle misbehavior of a single component.…
-
AWS News Blog: OpenAI open weight models now available on AWS
Source URL: https://aws.amazon.com/blogs/aws/openai-open-weight-models-now-available-on-aws/
Source: AWS News Blog
Title: OpenAI open weight models now available on AWS
Feedly Summary: AWS continues to expand access to the most advanced foundation models with OpenAI open weight models now available in Amazon Bedrock and Amazon SageMaker JumpStart. Accessing these new models from OpenAI on AWS, gpt-oss-120b and gpt-oss-20b, gives…
-
Simon Willison’s Weblog: Coding with LLMs in the summer of 2025 (an update)
Source URL: https://simonwillison.net/2025/Jul/21/coding-with-llms/#atom-everything
Source: Simon Willison’s Weblog
Title: Coding with LLMs in the summer of 2025 (an update)
Feedly Summary: Coding with LLMs in the summer of 2025 (an update) Salvatore Sanfilippo describes his current AI-assisted development workflow. He’s all-in on LLMs for code review, exploratory prototyping, pair-design and writing “part of the code under…
-
Cloud Blog: Announcing Vertex AI Agent Engine Memory Bank available for everyone in preview
Source URL: https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-memory-bank-in-public-preview/
Source: Cloud Blog
Title: Announcing Vertex AI Agent Engine Memory Bank available for everyone in preview
Feedly Summary: Developers are racing to productize agents, but a common limitation is the absence of memory. Without memory, agents treat each interaction as the first, asking repetitive questions and failing to recall user preferences. This…
-
Irrational Exuberance: What can agents actually do?
Source URL: https://lethain.com/what-can-agents-do/
Source: Irrational Exuberance
Title: What can agents actually do?
Feedly Summary: There’s a lot of excitement about what AI (specifically the latest wave of LLM-anchored AI) can do, and how AI-first companies are different from the prior generations of companies. There are a lot of important and real opportunities at hand, but…