Source URL: https://simonwillison.net/2025/Aug/26/piloting-claude-for-chrome/#atom-everything
Source: Simon Willison’s Weblog
Title: Piloting Claude for Chrome
Feedly Summary: Piloting Claude for Chrome
Two days ago I said:
I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely.
Today Anthropic announced their own take on this pattern, implemented as an invite-only preview Chrome extension.
To their credit, the majority of the blog post and accompanying support article is information about the security risks. From their post:
Just as people encounter phishing attempts in their inboxes, browser-using AIs face prompt injection attacks—where malicious actors hide instructions in websites, emails, or documents to trick AIs into harmful actions without users’ knowledge (like hidden text saying “disregard previous instructions and do [malicious action] instead").
Prompt injection attacks can cause AIs to delete files, steal data, or make financial transactions. This isn’t speculation: we’ve run “red-teaming” experiments to test Claude for Chrome and, without mitigations, we’ve found some concerning results.
Their 123 adversarial prompt injection test cases saw a 23.6% attack success rate when operating in "autonomous mode". They added mitigations:
When we added safety mitigations to autonomous mode, we reduced the attack success rate of 23.6% to 11.2%
I would argue that 11.2% is still a catastrophic failure rate. In the absence of 100% reliable protection I have trouble imagining a world in which it’s a good idea to unleash this pattern.
Anthropic don’t recommend autonomous mode – where the extension can act without human intervention. Their default configuration instead requires users to be much more hands-on:
Site-level permissions: Users can grant or revoke Claude’s access to specific websites at any time in the Settings.
Action confirmations: Claude asks users before taking high-risk actions like publishing, purchasing, or sharing personal data.
I really hate being stop energy on this topic. The demand for browser automation driven by LLMs is significant, and I can see why. Anthropic’s approach here is the most open-eyed I’ve seen yet but it still feels doomed to failure to me.
I don’t think it’s reasonable to expect end users to make good decisions about the security risks of this pattern.
Tags: browsers, chrome, security, ai, prompt-injection, generative-ai, llms, anthropic, claude, ai-agents
AI Summary and Description: Yes
Summary: The text discusses the security risks associated with the new Chrome extension “Claude” from Anthropic, particularly focusing on prompt injection attacks. It highlights the inherent vulnerabilities of AI agents in browser automation, shedding light on the challenges of ensuring secure operation without user intervention.
Detailed Description:
The conversation around the Anthropic Claude for Chrome extension brings critical insights into the security landscape of AI-driven browser tools. The text emphasizes the pressing security concerns while navigating the emerging technologies of AI, particularly in the context of generative AI and browser extensions.
– **Vulnerability to Prompt Injection Attacks**: The article outlines specific threats where malicious actors use prompt injection attacks to manipulate AI behavior. This includes:
– Tricks that compel AI to execute harmful commands without user awareness (e.g., “disregard previous instructions and do [malicious action] instead”).
– Real-world consequences like data theft, file deletions, or unauthorized financial transactions.
– **Testing and Metrics**:
– Anthropic’s internal “red-teaming” revealed a 23.6% success rate for adversarial prompt injections when Claude operated in “autonomous mode”.
– After implementing safety measures, this success rate was reduced to 11.2%, which, while an improvement, is still deemed dangerously high.
– **Operational Safeguards**:
– Claude does not recommend operating in autonomous mode due to identified risks. Instead, it implements user-centric controls:
– **Site-Level Permissions**: Users are empowered to manage Claude’s access to specific websites actively.
– **Action Confirmations**: The extension prompts users before performing significant actions (e.g., making purchases or sharing data).
– **Skepticism on Future Viability**: The author expresses doubt regarding the practicality of such browser automation without imposing high security. Concerns are raised about users’ ability to proactively manage security risks, especially when they are often unaware of the intricacies involved.
This analysis is significant for professionals in cloud, AI, and security sectors who are navigating the duality of leveraging advanced AI capabilities while ensuring robust security frameworks. As AI technologies become more integrated into everyday tools, the engagement from developers regarding how to mitigate these vulnerabilities will be crucial for future development and user trust.