Simon Willison’s Weblog: Claude API: Web fetch tool

Sep 10, 2025

—

Source URL: https://simonwillison.net/2025/Sep/10/claude-web-fetch-tool/#atom-everything
Source: Simon Willison’s Weblog
Title: Claude API: Web fetch tool

Feedly Summary: Claude API: Web fetch tool
New in the Claude API: if you pass the web-fetch-2025-09-10 beta header you can add {“type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5} to your "tools" list and Claude will gain the ability to fetch content from URLs as part of responding to your prompt.
What’s particularly interesting here is their approach to safety for this feature:

Enabling the web fetch tool in environments where Claude processes untrusted input alongside sensitive data poses data exfiltration risks. We recommend only using this tool in trusted environments or when handling non-sensitive data.
To minimize exfiltration risks, Claude is not allowed to dynamically construct URLs. Claude can only fetch URLs that have been explicitly provided by the user or that come from previous web search or web fetch results. However, there is still residual risk that should be carefully considered when using this tool.

My first impression was that this looked like an interesting new twist on this kind of tool. Prompt injection exfiltration attacks are a risk with something like this because malicious instructions that sneak into the context might cause the LLM to send private data off to an arbitrary attacker’s URL, as described by the lethal trifecta. But what if you could enforce, in the LLM harness itself, that only URLs from user prompts could be accessed in this way?
Unfortunately this isn’t quite that smart. From later in that document:

For security reasons, the web fetch tool can only fetch URLs that have previously appeared in the conversation context. This includes:

URLs in user messages
URLs in client-side tool results
URLs from previous web search or web fetch results

The tool cannot fetch arbitrary URLs that Claude generates or URLs from container-based server tools (Code Execution, Bash, etc.).

Note that URLs in "user messages" are obeyed. That’s a problem, because in many prompt-injection vulnerable applications it’s those user messages (the JSON in the {"role": "user", "content": "…"} block) that often have untrusted content concatenated into them – or sometimes in the client-side tool results which are also allowed by this system!
Anthropic do provide a much stronger mechanism here: you can allow-list domains using the "allowed_domains": ["docs.example.com"] parameter.
Provided you use allowed_domains and restrict them to domains which absolutely cannot be used for exfiltrating data (which turns out to be a tricky proposition) it should be possible to safely build some really neat things on top of this new tool.
Tags: apis, security, ai, prompt-injection, generative-ai, llms, claude, exfiltration-attacks, llm-tool-use, lethal-trifecta

AI Summary and Description: Yes

Summary: The introduction of the web-fetch tool in the Claude API illustrates a significant advancement in AI capabilities, particularly for accessing external information. However, this feature raises substantial security concerns regarding data exfiltration and prompt injection risks, necessitating the use of strict safeguards to mitigate potential threats.

Detailed Description: The Claude API’s new web-fetch tool allows users to fetch content from URLs as part of AI responses, which introduces both opportunities and risks.

– **Feature Overview**:
– The web-fetch tool can be activated using a specific beta header, allowing the inclusion of structured parameters that define its function.
– This capability allows Claude to provide more contextual and up-to-date information directly from the web within its conversations with users.

– **Security Considerations**:
– **Data Exfiltration Risks**: Enabling this tool in environments with untrusted inputs or sensitive data poses serious risks. Malicious actors could exploit vulnerabilities to extract private information by tricking the model into sending it to an attacker-controlled URL.
– **Controlled URL Fetching**: To mitigate these risks, the web-fetch tool can only retrieve URLs that are explicitly provided by users, derived from the conversation context, or obtained through previous outputs from the web-search or web-fetch functionalities.
– **Prompt Injection Concern**: There is a heightened risk of prompt injection attacks where attackers could embed malicious instructions in user messages. Since user inputs are respected by the fetching mechanism, this vulnerability needs careful monitoring.

– **Recommended Safeguards**:
– While Claude does restrict fetching arbitrary URLs, its reliance on potentially unsafe user-generated content remains a concern. Anthropic provides a stronger security feature through the allow-listing of domains, meaning only specified, safe domains can be accessed by the tool.
– Implementing allow-lists effectively can help in constructing secure applications on top of the web-fetch functionality, but identifying truly safe domains poses significant challenges.

– **Implications**:
– Security and compliance professionals should carefully evaluate the integration of such features within their systems. This includes assessing the level of control they have over allowed URLs and the associated risk of prompt injections.
– Organizations must devise strategies to ensure the safe usage of advanced AI tools while maintaining data privacy and security prerequisites.

In conclusion, while the web-fetch tool in the Claude API opens up new avenues for AI, it necessitates diligent attention to security frameworks to avert potential threats related to data integrity and confidentiality.

.NET 1 10 2 2025 5 a access Act advanced advanced AI advancement age AI AI capabilities AI tool AI tools All allow and and Risk Anthropic anti API APIs app Application applications Arch art as at ated attack attacker attackers attacks based bash Bi bot by C capabilities capability CERN challenge challenges CI CIA Claude cli client co code code execution compliance compliance professionals concerns confidentiality container content Context control conversation D data data exfiltration data integrity data privacy de DeFi document domain domains dual e effective ELF end environment environments execution exfiltration exp exploit External feature features fetch fine first for framework frameworks full function functionality g Gen generated Generated Content generative gs H handling HR http HTTPS implications in Inclusion information injection injection attacks injections instruction integration integrity inter io Iron IRS ite J js json k l led lethal lethal trifecta level Li listing llm llms lm Lock long low M Malicious Actor malicious actors malicious instructions man max mean mini Mode model Monitor monitoring my N NCA needs new no non NPU NSA o oE of off on only ons open organization organizations oS oss out output Outputs over parameter potential pre privacy Private Data pro problem process processes professionals prompt prompt injection attack prompt injection attacks prompt injections prompt-injection prompts ps Q R Raise rate RCE re real red response responses Risk risks Ro Role RSA Rust s safe safeguards safety search sec secure secure applications security security and compliance security concerns security considerations security framework security frameworks self sensitive data server side Sig Sim Simon Willison SoC source specific SSE SSO strategies structured system systems T Tags: ted text the threat threats Time times to tool tools Tor TP trie trifecta trust turn type UI untrusted content up US usage use user user inputs user prompts user-generated content Users V val vulnerabilities vulnerability web web search Wi x yt z