Simon Willison’s Weblog: Piloting Claude for Chrome

Aug 26, 2025

—

Source URL: https://simonwillison.net/2025/Aug/26/piloting-claude-for-chrome/#atom-everything
Source: Simon Willison’s Weblog
Title: Piloting Claude for Chrome

Feedly Summary: Piloting Claude for Chrome
Two days ago I said:

I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely.

Today Anthropic announced their own take on this pattern, implemented as an invite-only preview Chrome extension.
To their credit, the majority of the blog post and accompanying support article is information about the security risks. From their post:

Just as people encounter phishing attempts in their inboxes, browser-using AIs face prompt injection attacks—where malicious actors hide instructions in websites, emails, or documents to trick AIs into harmful actions without users’ knowledge (like hidden text saying “disregard previous instructions and do [malicious action] instead").
Prompt injection attacks can cause AIs to delete files, steal data, or make financial transactions. This isn’t speculation: we’ve run “red-teaming” experiments to test Claude for Chrome and, without mitigations, we’ve found some concerning results.

Their 123 adversarial prompt injection test cases saw a 23.6% attack success rate when operating in "autonomous mode". They added mitigations:

When we added safety mitigations to autonomous mode, we reduced the attack success rate of 23.6% to 11.2%

I would argue that 11.2% is still a catastrophic failure rate. In the absence of 100% reliable protection I have trouble imagining a world in which it’s a good idea to unleash this pattern.
Anthropic don’t recommend autonomous mode – where the extension can act without human intervention. Their default configuration instead requires users to be much more hands-on:

Site-level permissions: Users can grant or revoke Claude’s access to specific websites at any time in the Settings.
Action confirmations: Claude asks users before taking high-risk actions like publishing, purchasing, or sharing personal data.

I really hate being stop energy on this topic. The demand for browser automation driven by LLMs is significant, and I can see why. Anthropic’s approach here is the most open-eyed I’ve seen yet but it still feels doomed to failure to me.
I don’t think it’s reasonable to expect end users to make good decisions about the security risks of this pattern.
Tags: browsers, chrome, security, ai, prompt-injection, generative-ai, llms, anthropic, claude, ai-agents

AI Summary and Description: Yes

Summary: The text discusses the security risks associated with the new Chrome extension “Claude” from Anthropic, particularly focusing on prompt injection attacks. It highlights the inherent vulnerabilities of AI agents in browser automation, shedding light on the challenges of ensuring secure operation without user intervention.

Detailed Description:
The conversation around the Anthropic Claude for Chrome extension brings critical insights into the security landscape of AI-driven browser tools. The text emphasizes the pressing security concerns while navigating the emerging technologies of AI, particularly in the context of generative AI and browser extensions.

– **Vulnerability to Prompt Injection Attacks**: The article outlines specific threats where malicious actors use prompt injection attacks to manipulate AI behavior. This includes:
– Tricks that compel AI to execute harmful commands without user awareness (e.g., “disregard previous instructions and do [malicious action] instead”).
– Real-world consequences like data theft, file deletions, or unauthorized financial transactions.

– **Testing and Metrics**:
– Anthropic’s internal “red-teaming” revealed a 23.6% success rate for adversarial prompt injections when Claude operated in “autonomous mode”.
– After implementing safety measures, this success rate was reduced to 11.2%, which, while an improvement, is still deemed dangerously high.

– **Operational Safeguards**:
– Claude does not recommend operating in autonomous mode due to identified risks. Instead, it implements user-centric controls:
– **Site-Level Permissions**: Users are empowered to manage Claude’s access to specific websites actively.
– **Action Confirmations**: The extension prompts users before performing significant actions (e.g., making purchases or sharing data).

– **Skepticism on Future Viability**: The author expresses doubt regarding the practicality of such browser automation without imposing high security. Concerns are raised about users’ ability to proactively manage security risks, especially when they are often unaware of the intricacies involved.

This analysis is significant for professionals in cloud, AI, and security sectors who are navigating the duality of leveraging advanced AI capabilities while ensuring robust security frameworks. As AI technologies become more integrated into everyday tools, the engagement from developers regarding how to mitigate these vulnerabilities will be crucial for future development and user trust.

.NET 1 10 2 2025 3 5 a access Act actions advanced advanced AI adversarial after age agent agentic agentic browser agents AGI AI AI behavior AI capabilities AI technologies ai-agents All analysis and Anthropic Anthropic Claude app Aria ARM art as at ated attack attacks Auto automation autonomous aware awareness Behavior being Bi Box browser browser automation browser extension Browser Extensions built by C capabilities centric CERN challenge challenges Chrome Chrome extension CI CIA Claude Cloud co command concept concerns Configuration Context control controls conversation critical D data data theft day days de decision decisions default demand developer developers development document drive driven dual e edge email emerging Emerging Technologies end end users energy engagement exp extensions face fail fault file financial financial transactions for framework frameworks future g Gen generative Generative AI Go gs H hands harm high Highlight HR http HTTPS human in information injection injection attacks injections insights instruction inter intern io ite J Just k knowledge l land law led level Li line llm llms lm M making Malicious Actor malicious actors man measures metrics mission mitigation mitigations Mode N new no NSA o oE of on only ons open operation operational OPM oS out per permissions personal data phi phishing phishing attempts pilot post Power powered pre Preview pro proactive professionals prompt prompt injection attack prompt injection attacks prompt injections prompt-injection prompts protection ps publishing Q R rag Raise rate RCE re real red review Risk risks RMF Ro robust security robust security frameworks RoT row RSA Rust s s pattern safe safeguards safety safety measures sec sector secure Secure Operation security security concerns security framework security frameworks security landscape security risk security risks sequence settings SHA sharing Sig Sim Simon Willison size sizes skepticism SoC source specific SRE SSE SSO support T Tags: taking team teaming tech technologies ted test test cases Testing text the theft threat threats Time to tool tools Tor TP transactions trust two UI up US use user User Awareness user trust Users uth V vulnerabilities vulnerability Ware web website Wi world x yt z