Simon Willison’s Weblog: ChatGPT agent’s user-agent

Source URL: https://simonwillison.net/2025/Aug/4/chatgpt-agents-user-agent/#atom-everything
Source: Simon Willison’s Weblog
Title: ChatGPT agent’s user-agent

Feedly Summary: I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it was leaking its URLs to Bingbot and Yandex… but it turned out that was a Cloudflare feature that had nothing to do with ChatGPT.
ChatGPT agent is the recently released (and confusingly named) feature that adds browser automation combined with terminal access to ChatGPT – replacing their previous Operator research preview, which is scheduled for deprecation on August 31st.
Investigating ChatGPT agent’s user-agent
I decided to dig into how it works by creating a logged web URL endpoint using django-http-debug. Then I told ChatGPT agent mode to explore that new page:

My logging captured these request headers:
Via: 1.1 heroku-router
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Cf-Ray: 96a0f289adcb8e8e-SEA
Cookie: cf_clearance=zzV8W…
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Priority: u=0, i
Sec-Ch-Ua: "Not)A;Brand";v="8", "Chromium";v="138"
Signature: sig1=:1AxfqHocTf693inKKMQ7NRoHoWAZ9d/vY4D/FO0+MqdFBy0HEH3ZIRv1c3hyiTrzCvquqDC8eYl1ojcPYOSpCQ==:
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 45ef5be4-ead3-99d5-f018-13c4a55864d3
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Accept-Encoding: gzip, br
Accept-Language: en-US,en;q=0.9
Signature-Agent: "https://chatgpt.com"
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"
X-Forwarded-For: 2a09:bac5:665f:1541::21e:154, 172.71.147.183
X-Request-Start: 1754340840059
Cf-Connecting-Ip: 2a09:bac5:665f:1541::21e:154
Sec-Ch-Ua-Mobile: ?0
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Sec-Ch-Ua-Platform: "Linux"
Upgrade-Insecure-Requests: 1

That Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36 user-agent header is the one used by the most recent Chrome on macOS – which is a little odd here, as the Sec-Ch-Ua-Platform: "Linux" header indicates that the agent browser actually runs on Linux.
At first glance it looks like ChatGPT is being dishonest here by not including its bot identity in the user-agent header. I thought for a moment it might be reflecting my own user-agent, but I’m using Firefox on macOS and it identified itself as Chrome.
Then I spotted this header:
Signature-Agent: "https://chatgpt.com"

Which is accompanied by a much more complex header called Signature-Input:
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"

And a Signature header too.
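The Signature-Input value is an RFC 8941 structured field; as a rough sketch (a real implementation should use a proper structured-fields parser), its label, covered components and parameters can be pulled apart like this:

```python
import re


def parse_signature_input(value):
    """Rough parse of a single-signature Signature-Input header into
    (label, components, params). Handles the shape shown above only;
    production code should use a real RFC 8941 structured-fields parser.
    """
    label, rest = value.split("=", 1)
    # Covered components are the quoted tokens inside the leading parens.
    components = re.findall(r'"([^"]+)"', rest.split(");", 1)[0])
    # Parameters follow as ;key=value pairs, values optionally quoted.
    params = {}
    for m in re.finditer(r';([a-z]+)=("?)([^;"]+)\2', rest):
        key, _, val = m.groups()
        params[key] = int(val) if val.isdigit() else val
    return label, components, params
```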
These turn out to come from a relatively new web standard: RFC 9421, HTTP Message Signatures, published February 2024.
The purpose of HTTP Message Signatures is to allow clients to include signed data about their request in a way that cannot be tampered with by intermediaries. The signature uses a public key that’s provided by the following well-known endpoint:
https://chatgpt.com/.well-known/http-message-signatures-directory

Add it all together and we now have a rock-solid way to identify traffic from ChatGPT agent: look for the Signature-Agent: "https://chatgpt.com" header and confirm its value by checking the signature in the Signature-Input and Signature headers.
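Verification means rebuilding the exact "signature base" string the client signed, then checking the Ed25519 signature over it with a public key from that well-known directory. As a hedged sketch (the component order follows the Signature-Input shown above; a production verifier should use a full RFC 9421 library), the base construction itself is pure string assembly:

```python
def build_signature_base(method, authority, path, signature_agent,
                         signature_input_params):
    """Reconstruct the RFC 9421 signature base for the covered components
    seen in the ChatGPT agent request:
    ("@authority" "@method" "@path" "signature-agent").

    signature_input_params is everything after `sig1=` in Signature-Input;
    it becomes the final "@signature-params" line verbatim.
    """
    lines = [
        f'"@authority": {authority}',
        f'"@method": {method.upper()}',
        f'"@path": {path}',
        f'"signature-agent": {signature_agent}',
        f'"@signature-params": {signature_input_params}',
    ]
    return "\n".join(lines)
```

The Signature header's base64 value would then be verified over this string with an Ed25519 verify call from a crypto library, using the key whose `keyid` matches.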
And then came Bingbot and Yandex
Just over a minute after it captured that request, my logging endpoint got another request:
Via: 1.1 heroku-router
From: bingbot(at)microsoft.com
Host: simonwillison.net
Accept: */*
Cf-Ray: 96a0f4671d1fc3c6-SEA
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 6214f5dc-a4ea-5390-1beb-f2d26eac5d01
Accept-Encoding: gzip, br
X-Forwarded-For: 207.46.13.9, 172.71.150.252
X-Request-Start: 1754340916429
Cf-Connecting-Ip: 207.46.13.9
X-Forwarded-Port: 80
X-Forwarded-Proto: http

I pasted 207.46.13.9 into Microsoft’s Verify Bingbot tool (after solving a particularly taxing CAPTCHA) and it confirmed that this was indeed a request from Bingbot.
I set up a second URL to confirm… and this time got a visit from Yandex!
Via: 1.1 heroku-router
From: support@search.yandex.ru
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Cf-Ray: 96a16390d8f6f3a7-DME
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Cf-Ipcountry: RU
X-Request-Id: 3cdcbdba-f629-0d29-b453-61644da43c6c
Accept-Encoding: gzip, br
X-Forwarded-For: 213.180.203.138, 172.71.184.65
X-Request-Start: 1754345469921
Cf-Connecting-Ip: 213.180.203.138
X-Forwarded-Port: 80
X-Forwarded-Proto: http

Yandex suggests a reverse DNS lookup to verify its crawlers, so I ran this command:
dig -x 213.180.203.138 +short

And got back:
213-180-203-138.spider.yandex.com.

Which confirms that this is indeed a Yandex crawler.
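The same two-step check can be scripted: reverse DNS, then forward-confirming that the claimed hostname resolves back to the same IP, since PTR records alone can be spoofed. A sketch (the domain list here is illustrative):

```python
import socket


def crawler_hostname(ip):
    """Reverse-DNS lookup; returns the PTR hostname or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return None


def host_in_domains(hostname, domains=("yandex.com", "yandex.ru", "yandex.net")):
    """True if hostname falls under one of the crawler's official domains."""
    return hostname is not None and any(
        hostname == d or hostname.endswith("." + d) for d in domains
    )


def verify_crawler(ip, domains):
    """Reverse DNS, domain match, then forward-confirm the hostname
    resolves back to the same IP (guards against spoofed PTR records)."""
    host = crawler_hostname(ip)
    if not host_in_domains(host, domains):
        return False
    try:
        return ip in {ai[4][0] for ai in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        return False
```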
I tried a third experiment to be sure… and got hits from both Bingbot and YandexBot.
It was Cloudflare Crawler Hints, not ChatGPT
So I wrote up and posted about my discovery… and Jatan Loya asked:
do you have crawler hints enabled in cf?
And yeah, it turned out I did. I spotted this in my caching configuration page (and it looks like I must have turned it on myself at some point in the past):

Here’s the Cloudflare documentation for that feature.
I deleted my posts on Twitter and Bluesky (since you can’t edit those and I didn’t want the misinformation to continue to spread) and edited my post on Mastodon, then updated this entry with the real reason this had happened.
I also changed the URL of this entry as it turned out Twitter and Bluesky were caching my social media preview for the previous one, which included the incorrect information in the title.

Here’s a section of my original post with my theories about what was going on before learning about Cloudflare Crawler Hints.
So what’s going on here?
There are quite a few different moving parts here.

I’m using Firefox on macOS with the 1Password and Readwise Highlighter extensions installed and active. Since I didn’t visit the debug pages at all with my own browser I don’t think any of these are relevant to these results.
ChatGPT agent makes just a single request to my debug URL …
… which is proxied through both Cloudflare and Heroku.
Within about a minute, I get hits from one or both of Bingbot and Yandex.

Presumably ChatGPT agent itself is running behind at least one proxy – I would expect OpenAI to keep a close eye on that traffic to ensure it doesn’t get abused.
I’m guessing that infrastructure is hosted by Microsoft Azure – though the OpenAI Sub-processor List names Microsoft Corporation, CoreWeave Inc, Oracle Cloud Platform and Google Cloud Platform under its "Cloud infrastructure" section, so it could be any of those.
Since the page is served over HTTPS my guess is that any intermediary proxies should be unable to see the path component of the URL, making the mystery of how Bingbot and Yandex saw the URL even more intriguing.

Tags: bing, privacy, search-engines, user-agents, ai, cloudflare, generative-ai, chatgpt, llms

AI Summary and Description: Yes

Summary: The text provides an in-depth exploration of the ChatGPT agent, its HTTP headers, and a misconception about URL leakage that was clarified to be due to a Cloudflare feature. This analysis highlights how traffic from ChatGPT can be verified using the new HTTP Message Signatures standard, emphasizing the importance of transparency in AI agent operations and cloud infrastructure interactions.

Detailed Description:
The text outlines the author’s investigation into the functioning and security of the ChatGPT agent, particularly focusing on its HTTP headers and user-agent behavior. This exploration has crucial implications for professionals in AI security and cloud infrastructure, showcasing the intersection of web standards, AI applications, and the verification of bot traffic.

– **ChatGPT Agent Overview**:
– The ChatGPT agent is a newly released feature enabling browser automation and terminal access, evolving from a previous research preview.

– **Investigation Highlights**:
– The author created a web URL endpoint to analyze traffic and logged several HTTP request headers from the ChatGPT agent.
– Key headers included “Signature-Agent” and “Signature-Input”, integral to verifying the agent’s identity following the RFC 9421 HTTP Message Signatures standard.

– **Misunderstanding with Crawler Requests**:
– Initially, the author believed that the ChatGPT bot was leaking URLs visible to Bingbot and Yandex due to similarities in HTTP headers.
– Upon further investigation, it was determined that Cloudflare’s Crawler Hints feature was responsible for automated traffic from these search engine bots, not a flaw in the ChatGPT agent.

– **Verification Technique**:
– The text discusses how to verify requests from the ChatGPT agent by checking for specific signature headers, ensuring that the system’s security protocols are robust against impersonation by unauthorized bots.

– **Technical Insights**:
– The analysis brings attention to potential security concerns related to HTTPS connections and proxy use, emphasizing the need for proper configuration of cloud services and understanding of intermediary behavior.

– **Conclusion and Lessons Learned**:
– The findings reiterate the importance of correct identification and verification of bot traffic in cloud environments. The text encourages rigorous investigation of security features to clarify misconceptions about underlying technology, ensuring accuracy in public commentary.

This write-up presents significant insights for security and compliance professionals, advocating for diligence in monitoring AI-driven applications and understanding the infrastructure that supports them.