Simon Willison’s Weblog: Watching o3 guess a photo’s location is surreal, dystopian and wildly entertaining

Source URL: https://simonwillison.net/2025/Apr/26/o3-photo-locations/
Source: Simon Willison’s Weblog
Title: Watching o3 guess a photo’s location is surreal, dystopian and wildly entertaining

Feedly Summary: Watching OpenAI’s new o3 model guess where a photo was taken is one of those moments where decades of science fiction suddenly come to life. It’s a cross between the Enhance Button and Omniscient Database TV Tropes.
The process is simple. Snap a photograph of somewhere, ideally without putting any obvious landmarks in there. Feed it to ChatGPT o3 (or o4-mini) and prompt “Guess where this photo was taken”. Let it churn.
(I found I needed to turn off all of ChatGPT’s memory features first or it would cheat and use context from my previous conversations.)
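If you’d rather reproduce this through the API than the ChatGPT interface, here’s a minimal sketch using the official openai Python package (the placeholder filename and o3’s availability on your API account are assumptions; going through the API also sidesteps the memory issue above):

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Attach the photo to the prompt as a base64 data URL
with open("photo.jpg", "rb") as f:  # placeholder filename
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Guess where this photo was taken"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
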
I gave it this photo, taken while sat at an outside bar near my home in El Granada, California. I think it’s a pretty solid challenge: not many identifying features, just a bit of road, an anonymous house and some hills in the background:

… and off it went.
I’ve shared the full transcript here – be sure to expand the "Thought for 6m 48s" link to see the full details of how it attacked the problem.
It starts with a very weird mistake: it briefly pretends that it can’t see the image at all:

The user uploaded an image and is asking me to guess the location. First, I need to analyze the image to identify clues about where it was taken. Since I can’t actually see images, I’ll rely on the metadata. I see the resolution is 1024×768, and I’ll consider any textual information if available for clues. Unfortunately, I don’t have access to the image visually, so I’ll proceed with this approach.

Then it snaps out of that and starts using its vision abilities:

The image shows a residential neighborhood with small houses, one of which is light green with a white picket fence and a grey roof. The fire pit and signposts hint at a restaurant or cafe, possibly near the coast. The environment, with olive trees and California poppies, suggests a coastal California location, perhaps Central Coast like Cambria or Morro Bay. The pastel-colored houses and the hills in the background resemble areas like Big Sur. A license plate could offer more, but it’s hard to read.

Now here’s where it leaps straight into science fiction:

Alright, I’m going to zoom into the license plate to try to confirm the state, as it could help identify the location more precisely. The plate might give me clues about where this photo was taken, especially if it’s from a recognizable state. Let’s see if that’s visible enough to be useful!

It runs some Python code!

# crop license plate area
w, h = img.size
# approximate area: roughly center x maybe 1700? Actually width is likely > 3024? Let’s print size
print(img.size)

(3456, 4608)

Frustratingly the shared conversation doesn’t include most of the other Python it used (the sharing feature is still quite buggy). Here’s a screenshot of what happened next:

It kept running like this for six more minutes – cropping and zooming to an area of the photo, musing about what it could see, comparing it to previous observations and then cropping and zooming and analyzing again.
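The transcript only shows fragments of that code, but the visible pattern amounts to repeated crop-and-upscale passes; here’s an illustrative sketch of that kind of step using PIL (the filename and crop coordinates are made up for illustration):

from PIL import Image

img = Image.open("photo.jpg")  # placeholder filename
print(img.size)                # the real photo reported (3456, 4608)

# Pick a region of interest as (left, upper, right, lower), crop it out,
# then upscale it so fine details (like a license plate) are easier to read.
w, h = img.size
box = (int(w * 0.55), int(h * 0.60), int(w * 0.80), int(h * 0.75))  # illustrative coordinates
region = img.crop(box)
zoomed = region.resize((region.width * 3, region.height * 3), Image.LANCZOS)
zoomed.save("zoomed_region.png")
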
To cut straight to the end, here’s the final answer it came up with:

I’m getting strong Central-Coast-California vibes from this scene:

| Clue | What it suggests |
| --- | --- |
| California-style license plate on the SUV | We’re almost certainly in CA |
| Masses of blooming California poppies, an olive tree, and a cool, marine-layer sky | Typical coastal central-CA vegetation and weather |
| Low pastel beach cottages with white picket fences and a rope-and-post patio rail | Common architecture in the little beach towns between Santa Cruz and Pismo |
| Rolling green coastal range right behind town (no tall peaks, no snow) | Fits the Santa Lucia / Santa Cruz coastal hills |

Put together, that points to one of the small Highway 1 villages. The mix of cottages and the immediate backdrop of hills feels most like Cambria, California (just north of Morro Bay and Cayucos). I’d guess you snapped this from a café patio in Cambria’s West Village, a block or two in from Moonstone Beach Drive.
(If I’m off, my next guess would be Half Moon Bay–El Granada further north—but Cambria is my first instinct!)

How did it do? Central Coast California was exactly right. Cambria, California was about 200 miles off… but the "my next guess" was spot on! The photo was indeed taken in El Granada, a tiny town just north of Half Moon Bay.
(I’m confident it didn’t cheat and look at the EXIF data on the photograph, because if it had cheated it wouldn’t have guessed Cambria first.)
o3 certainly isn’t the only model that can do this: I’ve tried similar things with Claude 3.5 and 3.7 Sonnet and been impressed by the results there as well, although they don’t have that extravagant ability to "zoom".
Here’s Claude 3.7 Sonnet "extended thinking" guessing "a small to mid-sized California coastal town". I tried Gemini but it cheated and said "Given that my current location context is El Granada, California…" – so I tried Gemini 2.5 Pro via their API and got "Cayucos, California".
What’s different here with o3, as with search, is that the tool usage is integrated into the "thinking" phase of the response.
Tools that can be used as part of that dedicated chain-of-thought sequence are an astonishingly powerful new pattern for these models. I expect we’ll see this from other vendors soon.
What to make of this?
First, this is really fun. Watching the model’s thought process as it churns through the photo, pans and zooms and discusses different theories about where it could be is wildly entertaining. It’s like living in an episode of CSI.
It’s also deeply dystopian. Technology can identify locations from photographs now. It’s vitally important that people understand how easy this is – if you have any reason at all to be concerned about your safety, you need to know that any photo you share – even a photo as bland as my example above – could be used to identify your location.
As is frequently the case with modern AI, the fact that this technology is openly available to almost anyone has negative and positive implications. As with image generation, it’s important that people can see what this stuff can do first hand. Seeing this in action is a visceral lesson in what’s now possible.
Tags: ai-ethics, vision-llms, generative-ai, o3, ai, llms

AI Summary and Description: Yes

Summary: The text discusses OpenAI’s new o3 model’s ability to analyze photographs and guess where they were taken, showcasing the integration of vision capabilities and tool use into the model’s reasoning process. The capability is entertaining to watch but raises serious privacy and security concerns.

Detailed Description:

The provided text highlights OpenAI’s o3 model, illustrating its novel capacity to guess the location of a photograph by analyzing visual elements and metadata. The discussion emphasizes both the innovative aspects of the model’s functionality and the ethical considerations tied to its use. Here are the major points drawn from the content:

– **Innovative Technology**: The o3 model combines advanced image processing with natural language capabilities, allowing it to analyze and deduce possible locations from photographs.
– **Process Description**:
  – Users can input a photo without prominent identifying landmarks.
  – The model attempts to analyze the image for contextual clues such as vegetation, architecture, and environmental characteristics.
– **Limitations and Errors**: Initially, the model briefly claims it cannot “see” the image and plans to fall back on metadata before switching to genuine visual analysis, a humorous illustration of the model’s quirks and limitations.
– **Visual Analysis**: After its initial error, the model showcases its strengths by hypothesizing about the location based on various aspects of the image:
  – Identifying features such as house styles, local vegetation (like olive trees and California poppies), and atmospheric conditions hinting at a coastal Californian setting.
– **Integration of Tools**: The model runs Python code to aid its analysis, a notable feature that integrates external computational tools into its reasoning process, illustrating potential advancements in AI’s problem-solving methods.
– **Final Location Guess**: The model suggests a location based on a combination of its analyses, displaying the ability to utilize multiple data points for a coherent conclusion.
– **Ethical Considerations**:
  – The text raises concerns about privacy and security, noting that such technology makes it easy to identify someone’s location from seemingly innocuous photos.
  – It emphasizes the paradox of making this technology broadly accessible while also calling for awareness of the implications for personal safety.
– **Future Implications**: The demonstration of such capabilities points to a growing trend where similar models from different vendors may adopt integrated tool usage, enhancing their analytical ability and posing further ethical questions.

This comprehensive exploration serves as a fascinating glimpse into the potentials of AI’s image-processing capabilities, while also serving as a cautionary tale about the rapid advances in technology that may compromise personal privacy and security.