The Register: Anubis guards gates against hordes of LLM bot crawlers

Source URL: https://www.theregister.com/2025/07/09/anubis_fighting_the_llm_hordes/
Source: The Register
Title: Anubis guards gates against hordes of LLM bot crawlers

Feedly Summary: Using proof of work to block the web-crawlers of ‘AI’ companies
Anubis is a sort of CAPTCHA test, but flipped: instead of checking visitors are human, it aims to make web crawling prohibitively expensive for companies trying to feed their hungry LLM bots.…

AI Summary and Description: Yes

Summary: The text discusses Anubis, a tool that uses a proof-of-work mechanism to deter unauthorized web crawling by AI companies. The concept is particularly relevant for professionals concerned with AI security and with protecting content against automated data mining.

Detailed Description: The core idea presented in the text centers on using a proof-of-work system to deter web crawlers operated by AI companies, particularly those gathering training data for large language models (LLMs). The method inverts the traditional CAPTCHA approach, which verifies that visitors are human; instead, it raises the computational cost for automated systems attempting to collect data from websites.

Key Points:
– **Anubis Mechanism**: Introduced as a CAPTCHA inversion, Anubis aims to increase the operational cost for AI bots, thereby reducing their ability to scrape information efficiently.
– **Defense Strategy**: By making web crawling expensive and resource-intensive, website owners can protect their content from being harvested indiscriminately by LLM-driven applications.
– **Significance for AI Security**: This technique represents a critical step in safeguarding proprietary data from AI tools that depend on extensive data inputs for training and operational purposes.
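The asymmetry the points above rely on (a trivial cost for one human page view, a prohibitive cost at crawler scale) can be sketched as a Hashcash-style proof-of-work exchange. This is a generic illustration of the technique, not Anubis's actual protocol: the function names, challenge format, and difficulty below are assumptions.

```python
import hashlib
import itertools

# Illustrative difficulty: required number of leading hex zeros in the digest.
# (An assumption for this sketch; real deployments tune this per client.)
DIFFICULTY = 4

def solve_challenge(challenge: str) -> int:
    """Client side: brute-force a nonce until the hash meets the target.
    This is the expensive step that a crawler must repeat per challenge."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash confirms the work was done."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

nonce = solve_challenge("example-challenge")
assert verify("example-challenge", nonce)
```

The design point is the cost imbalance: verification is one hash for the server, while solving takes many thousands of hashes on average, which is negligible for a single browser session but adds up quickly for a bot fetching millions of pages.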

This approach points toward a more secure digital environment in which content creators retain greater control over their assets, which matters as reliance on AI technologies grows. It also opens discussion of the ethics of data usage and of the protective measures websites can adopt against what many view as data theft.