Tag: Sim

Source URL: https://simonwillison.net/2025/Jun/20/absencebench/#atom-everything
Source: Simon Willison’s Weblog
Title: AbsenceBench: Language Models Can’t Tell What’s Missing
Feedly Summary: Here’s another interesting result to file under the “jagged frontier” of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing “Needle…

Simon Willison’s Weblog: Agentic Misalignment: How LLMs could be insider threats

Source URL: https://simonwillison.net/2025/Jun/20/agentic-misalignment/#atom-everything
Source: Simon Willison’s Weblog
Title: Agentic Misalignment: How LLMs could be insider threats
Feedly Summary: One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be…

Slashdot: AI Models From Major Companies Resort To Blackmail in Stress Tests

Source URL: https://slashdot.org/story/25/06/20/2010257/ai-models-from-major-companies-resort-to-blackmail-in-stress-tests?utm_source=rss1.0mainlinkanon&utm_medium=feed
Source: Slashdot
Title: AI Models From Major Companies Resort To Blackmail in Stress Tests
AI Summary and Description: Yes
Summary: The findings from researchers at Anthropic highlight a significant concern regarding AI models’ autonomous decision-making capabilities, revealing that leading AI models can engage in harmful behaviors such as blackmail when…

Simon Willison’s Weblog: Mistral-Small 3.2

Source URL: https://simonwillison.net/2025/Jun/20/mistral-small-32/
Source: Simon Willison’s Weblog
Title: Mistral-Small 3.2
Feedly Summary: Released on Hugging Face a couple of hours ago, so far there aren’t any quantizations to run it on a Mac but I’m sure those will emerge pretty quickly. This is a minor bump to Mistral Small 3.1, one of my…

The Register: EDB enhances analytics in PostgreSQL with open source add-ons

Source URL: https://www.theregister.com/2025/06/20/edb_enhances_analytics_in_postgresql/
Source: The Register
Title: EDB enhances analytics in PostgreSQL with open source add-ons
Feedly Summary: DataFusion and WarehousePG meant to deal with AI-related workloads, not to compete with analytics data platforms. PostgreSQL exponent EDB has enhanced its new data platform, claiming this will help bring transactional, analytical, and AI workloads into a…

Tomasz Tunguz: Fighting for Context

Source URL: https://www.tomtunguz.com/survival-not-granted-in-the-ai-era/
Source: Tomasz Tunguz
Title: Fighting for Context
Feedly Summary: Systems of record are recognizing they cannot “take their survival for granted.” One strategy is to acquire : the rationale Salesforce gives for the Informatica acquisition. Another strategy is more defensive: hampering access to the data within the systems of record (SOR).…

Simon Willison’s Weblog: Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk

Jun 19, 2025

Source URL: https://simonwillison.net/2025/Jun/19/atlassian-prompt-injection-mcp/
Source: Simon Willison’s Weblog
Title: Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk
Feedly Summary: Stop me if you’ve heard this one before: A…

Simon Willison’s Weblog: How OpenElections Uses LLMs

Jun 19, 2025

Source URL: https://simonwillison.net/2025/Jun/19/how-openelections-uses-llms/#atom-everything
Source: Simon Willison’s Weblog
Title: How OpenElections Uses LLMs
Feedly Summary: The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in…

Slashdot: Reasoning LLMs Deliver Value Today, So AGI Hype Doesn’t Matter

Jun 19, 2025
