The Register: Perplexity AI accused of scraping content against websites’ will with unlisted IP ranges

Aug 4, 2025

—

Source URL: https://www.theregister.com/2025/08/04/perplexity_ai_crawlers_accused_data_raids/
Source: The Register
Title: Perplexity AI accused of scraping content against websites’ will with unlisted IP ranges

Feedly Summary: Cloudflare finds AI search biz ignoring crawl prohibitions and trying to hide its spiders
Perplexity, an AI search startup, has been spotted trying to disguise its content-scraping bots while flouting websites’ no-crawl directives.…


Summary: Perplexity, an AI search startup, has been accused of ignoring websites’ crawl prohibitions and disguising its content-scraping bots, according to findings attributed to Cloudflare. The allegations raise concerns about ethical web scraping and about data privacy and security in AI applications.

Detailed Description:
Perplexity is an emerging player in the AI search engine market. The core allegation is that the company tried to bypass restrictions set by websites that explicitly ask not to be crawled by automated bots. If accurate, this matters for both AI and information security, because crawl directives are advisory: they only work when crawlers choose to comply with web standards and ethical guidelines.
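To make the no-crawl mechanism concrete, here is a minimal sketch of how a compliant crawler would consult a site’s `robots.txt` before fetching a page, using Python’s standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the `example.com` URL, and the rules themselves are hypothetical and are not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that announces itself honestly is told not to fetch the page.
declared = is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/article")

# The same request sent under a generic browser-style user agent matches
# only the wildcard rule, so the file alone cannot stop it.
generic = is_allowed(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/article")

print(declared, generic)
```

The check depends entirely on the crawler identifying itself truthfully and choosing to honor the answer, which is why disguised bots undermine it.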

Key Points:

– **Crawling Violations**: Perplexity reportedly ignored directives from websites that explicitly deny permission to crawl their content. This raises questions about respect for data ownership rights and ethical standards in AI development.

– **Disguising Bots**: Masking the identity of its scraping bots, reportedly by using unlisted IP ranges, suggests a lack of transparency and potentially exposes both the company and its users to legal and reputational risk.

– **Implications for AI Security**: The incident underscores the need for clearer guidelines and enforceable security measures so that AI applications operate within legal frameworks and ethical boundaries.

– **Broader Context**: This case reflects a wider challenge in the AI landscape, where the balance between using data to train models and respecting web governance is often precarious.
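The reference to unlisted IP ranges can be illustrated with a small sketch of how a site operator might compare a request against the address ranges a crawler operator publishes. Everything here is an assumption for illustration: the ranges are reserved documentation addresses, and `ExampleAIBot`, `is_from_published_range`, and `classify_request` are hypothetical names, not any vendor’s actual tooling.

```python
import ipaddress

# Hypothetical published crawler ranges (reserved documentation addresses).
PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def is_from_published_range(ip: str, ranges: list[str]) -> bool:
    """Return True if `ip` falls inside any of the listed CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    for cidr in ranges:
        net = ipaddress.ip_network(cidr)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


def classify_request(user_agent: str, ip: str, bot_token: str, ranges: list[str]) -> str:
    """Label a request by whether its claimed identity matches its source IP."""
    claims_bot = bot_token.lower() in user_agent.lower()
    listed = is_from_published_range(ip, ranges)
    if claims_bot and listed:
        return "declared crawler"
    if claims_bot:
        return "claims crawler, unlisted IP"
    if listed:
        return "listed IP, no crawler user agent"
    return "unverified"


print(classify_request("ExampleAIBot/1.0", "192.0.2.10", "ExampleAIBot", PUBLISHED_RANGES))
print(classify_request("Mozilla/5.0", "198.51.100.7", "ExampleAIBot", PUBLISHED_RANGES))
```

A crawler that uses both a generic user agent and an unlisted address lands in the "unverified" bucket alongside ordinary visitors, so a check like this cannot identify it on its own. That limitation is what makes disguised crawling hard for site operators to block.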

Behavior of this kind can erode public trust in AI technologies and may prompt regulators to impose stricter rules on how AI companies access and use web content. Compliance professionals should follow cases like this one to track how AI regulation and data privacy expectations are evolving.
