Source URL: https://simonwillison.net/2025/Oct/7/gemini-25-computer-use-captchas/
Source: Simon Willison’s Weblog
Title: Gemini 2.5 Computer Use can solve Google’s own CAPTCHAs
Feedly Summary: Google just introduced a new Gemini 2.5 Computer Use model, specially designed to operate a GUI by interacting with visible elements using a virtual mouse and keyboard. I just tried their demo… and watched it solve Google's own CAPTCHA without me even asking it to.
The official demo is hosted at gemini.browserbase.com, and one of the click-to-try example prompts shown there is the following:
Go to Hacker News and find the most controversial post from today, then read the top 3 comments and summarize the debate.
I activated the demo and Gemini decided to start by navigating to www.google.com in order to search for "hacker news". But Google served a CAPTCHA challenge, presumably because of a large volume of suspicious traffic from the Browserbase IP range.
The model instantly got to solving that CAPTCHA:
It went through a few rounds of this, solved all of them and continued on to Google Search, where it ran the search for "hacker news", navigated to the site and then did an admittedly unimpressive job of solving the original prompt. It looked at just one thread and reported back on what it found there. I was hoping it would consider more than one option to discover the "most controversial post from today".
The Gemini 2.5 Computer Use Model card (PDF) talks about training the model to "recognize when it is tasked with a high-stakes action" and request user confirmation before proceeding, but it has nothing to say about not solving CAPTCHAs. So I guess this behaviour is the model working as intended!
Something that did impress me – aside from the unprompted CAPTCHA solve against Google’s very own system – was the quality of the mouse usage. I’ve written about Computer Use models before from both Anthropic and OpenAI (they called their version "Operator") and by far the biggest challenge for them is accurately clicking the right targets with the mouse.
It would take a formal eval to determine whether Gemini really is best at this, but given the Gemini models' previous demonstrations of both bounding boxes and image segmentation masks it doesn't surprise me that a Gemini model can do a great job of clicking on the right elements in a screenshot of an operating system or browser.
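Accurate clicking ultimately comes down to mapping the model's predicted coordinates onto the live screen. As a rough sketch of why bounding-box skills transfer: Gemini's documented spatial outputs use boxes in `[ymin, xmin, ymax, xmax]` order, normalized to a 0–1000 grid, so a computer-use harness has to rescale a predicted box to real pixels and click its center. The harness function below is illustrative, not any vendor's actual code:

```python
# Illustrative sketch: convert a Gemini-style bounding box, normalized to a
# 0-1000 grid in [ymin, xmin, ymax, xmax] order, into the pixel coordinates
# of the box center, which is where a virtual mouse click should land.

def box_to_click_point(box, screen_width, screen_height):
    """Return (x, y) pixel coordinates for the center of a normalized box."""
    ymin, xmin, ymax, xmax = box
    x = (xmin + xmax) / 2 / 1000 * screen_width
    y = (ymin + ymax) / 2 / 1000 * screen_height
    return round(x), round(y)

# A box covering the top-left quarter of a 1920x1080 screen clicks at its center:
print(box_to_click_point([0, 0, 500, 500], 1920, 1080))  # (480, 270)
```

A harness would then hand those pixel coordinates to whatever drives the virtual mouse (Browserbase, Playwright, etc.); the hard part the models compete on is predicting a tight enough box in the first place.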
Tags: captchas, google, ai, generative-ai, llms, gemini, llm-tool-use, ai-ethics
AI Summary and Description: Yes
Summary: The introduction of Google’s Gemini 2.5 Computer Use model showcases advancements in AI interaction with GUI elements, particularly its ability to solve CAPTCHAs without prompting. This has implications for AI security, ethics, and the evolving capabilities of generative AI models.
Detailed Description: Google’s newly unveiled Gemini 2.5 Computer Use model has significant implications for AI interactions with user interfaces, presenting improved functionalities that include the ability to solve CAPTCHAs without specific user instructions. This raises various considerations in the realms of AI security and ethics, which are crucial for professionals in security and compliance.
Key Points:
– **CAPTCHA Solving:** The model’s capability to automatically solve Google’s CAPTCHA suggests sophisticated understanding and interaction with web elements.
– **GUI Interaction:** Gemini 2.5 operates using a virtual mouse and keyboard to navigate GUI elements, indicating advanced user simulation capabilities.
– **Demo Experience:** The demo provided an example where the model performed a search on Hacker News, highlighting its application in real-world scenarios.
– **Ethical Considerations:** The model card's silence on the implications of solving CAPTCHAs raises ethical concerns about AI interactions with security mechanisms.
– **Mouse Accuracy:** The model demonstrated improved accuracy in executing mouse commands, an essential factor for effective human-computer interaction.
– **Comparative Analysis:** The comparison with other Computer Use models (such as those from Anthropic and OpenAI's "Operator") illustrates a competitive landscape and indicates that formal evaluation of these capabilities will be essential.
In summary, professionals focusing on AI, security, and compliance should consider the implications of such advancements, particularly in maintaining ethical standards in AI deployment and the potential for misuse in circumventing security measures like CAPTCHAs.