Tags:
-
Simon Willison’s Weblog: model.yaml
Source URL: https://simonwillison.net/2025/Jun/21/model-yaml/#atom-everything Source: Simon Willison’s Weblog Title: model.yaml Feedly Summary: From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an “open standard for defining crossplatform, composable AI models”. A model can be defined using a…
-
Simon Willison’s Weblog: AbsenceBench: Language Models Can’t Tell What’s Missing
Source URL: https://simonwillison.net/2025/Jun/20/absencebench/#atom-everything Source: Simon Willison’s Weblog Title: AbsenceBench: Language Models Can’t Tell What’s Missing Feedly Summary: Here’s another interesting result to file under the “jagged frontier” of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing “Needle…
-
Simon Willison’s Weblog: Agentic Misalignment: How LLMs could be insider threats
Source URL: https://simonwillison.net/2025/Jun/20/agentic-misalignment/#atom-everything Source: Simon Willison’s Weblog Title: Agentic Misalignment: How LLMs could be insider threats Feedly Summary: One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be…
-
Simon Willison’s Weblog: Mistral-Small 3.2
Source URL: https://simonwillison.net/2025/Jun/20/mistral-small-32/ Source: Simon Willison’s Weblog Title: Mistral-Small 3.2 Feedly Summary: Released on Hugging Face a couple of hours ago; so far there aren’t any quantizations to run it on a Mac but I’m sure those will emerge pretty quickly. This is a minor bump to Mistral Small 3.1, one of my…
-
Simon Willison’s Weblog: Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk
Source URL: https://simonwillison.net/2025/Jun/19/atlassian-prompt-injection-mcp/ Source: Simon Willison’s Weblog Title: Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk Feedly Summary: Stop me if you’ve heard this one before: A…
-
Simon Willison’s Weblog: How OpenElections Uses LLMs
Source URL: https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-everything Source: Simon Willison’s Weblog Title: How OpenElections Uses LLMs Feedly Summary: The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in…
-
Simon Willison’s Weblog: Quoting Workaccount2 on Hacker News
Source URL: https://simonwillison.net/2025/Jun/18/context-rot/#atom-everything Source: Simon Willison’s Weblog Title: Quoting Workaccount2 on Hacker News Feedly Summary: They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot…
-
Simon Willison’s Weblog: Coding agents require skilled operators
Source URL: https://simonwillison.net/2025/Jun/18/coding-agents/#atom-everything Source: Simon Willison’s Weblog Title: Coding agents require skilled operators Feedly Summary: I wrote this recently in a conversation about whether coding agents can work as a replacement for human programmers. The “agentic” coding tools we have right now work like this: A skilled individual with both deep domain understanding and deep…
-
Simon Willison’s Weblog: Trying out the new Gemini 2.5 model family
Source URL: https://simonwillison.net/2025/Jun/17/gemini-2-5/ Source: Simon Willison’s Weblog Title: Trying out the new Gemini 2.5 model family Feedly Summary: After many months of previews, Gemini 2.5 Pro and Flash have reached general availability with new, memorable model IDs: gemini-2.5-pro and gemini-2.5-flash. They are joined by a new preview model with an unmemorable name: gemini-2.5-flash-lite-preview-06-17 is a…
-
Simon Willison’s Weblog: 100% effective
Source URL: https://simonwillison.net/2025/Jun/16/100-percent/#atom-everything Source: Simon Willison’s Weblog Title: 100% effective Feedly Summary: Every time I get into an online conversation about prompt injection it’s inevitable that someone will argue that a mitigation which works 99% of the time is still worthwhile because there’s no such thing as a security fix that is 100% guaranteed to…