Tag: robots.txt
-
Slashdot: RSS Co-Creator Launches New Protocol For AI Data Licensing
Source URL: https://tech.slashdot.org/story/25/09/10/2320207/rss-co-creator-launches-new-protocol-for-ai-data-licensing?utm_source=rss1.0mainlinkanon&utm_medium=feed
Summary: The Real Simple Licensing (RSL) initiative seeks to standardize and simplify the licensing of online content for AI training, backed by major publishers such as Reddit and Medium. It aims to create…
-
Slashdot: Are AI Web Crawlers ‘Destroying Websites’ In Their Hunt for Training Data?
Source URL: https://tech.slashdot.org/story/25/08/31/1820249/are-ai-web-crawlers-destroying-websites-in-their-hunt-for-training-data
Summary: The text discusses the adverse effects of AI web crawlers on website performance, highlighting the increasing web traffic attributed to these bots. It addresses the challenges website owners face…
-
The Cloudflare Blog: A deeper look at AI crawlers: breaking down traffic by purpose and industry
Source URL: https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/
Summary: We are extending AI-related insights on Cloudflare Radar with new industry-focused data and a breakdown of bot traffic by purpose, such as training or user action.
-
The Cloudflare Blog: Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
Source URL: https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
Summary: Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
-
Slashdot: Perplexity is Using Stealth, Undeclared Crawlers To Evade Website No-Crawl Directives, Cloudflare Says
Source URL: https://tech.slashdot.org/story/25/08/04/1459240/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives-cloudflare-says?utm_source=rss1.0mainlinkanon&utm_medium=feed
Summary: The report highlights ethical concerns regarding the web crawling practices of the AI startup Perplexity. By using undetected methods to bypass website restrictions on automated access, this behavior…
-
Simon Willison’s Weblog: TIL: Rate limiting by IP using Cloudflare’s rate limiting rules
Source URL: https://simonwillison.net/2025/Jul/3/rate-limiting-by-ip/#atom-everything
Summary: My blog started timing out on some requests a few days ago, and it turned out there were misbehaving crawlers that were spidering my /search/…
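The entry above concerns Cloudflare's rate limiting rules, which are configured in Cloudflare rather than written as application code. To illustrate the general idea of limiting requests per client IP, here is a minimal fixed-window sketch in Python. It is an assumption-laden illustration, not how Cloudflare implements or configures its rules; the class name `PerIPRateLimiter`, the limits and the IP addresses are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PerIPRateLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per IP per `window` seconds.

    Illustration only; this is not Cloudflare's implementation.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        # Maps client IP -> (start time of its current window, requests counted in it)
        self._state: Dict[str, Tuple[float, int]] = {}

    def allow(self, ip: str) -> bool:
        """Return True if this request from `ip` fits within its current window's budget."""
        now = self.clock()
        start, count = self._state.get(ip, (now, 0))
        if now - start >= self.window:
            # The previous window has expired: open a fresh one for this IP.
            start, count = now, 0
        if count >= self.limit:
            # Over budget: reject without incrementing the counter.
            self._state[ip] = (start, count)
            return False
        self._state[ip] = (start, count + 1)
        return True


if __name__ == "__main__":
    fake_now = [0.0]  # a controllable clock for the demo
    limiter = PerIPRateLimiter(limit=3, window=10.0, clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
    print(limiter.allow("198.51.100.2"))  # True: each IP has its own budget
    fake_now[0] = 10.0
    print(limiter.allow("203.0.113.7"))  # True: a new window has started
```

A fixed window is the simplest scheme to reason about; it allows short bursts at window boundaries, which sliding-window or token-bucket variants smooth out.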
-
The Cloudflare Blog: From Googlebot to GPTBot: who’s crawling your site in 2025
Source URL: https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/
Summary: From May 2024 to May 2025, crawler traffic rose 18%, with GPTBot growing 305% and Googlebot 96%. The text discusses the evolution of web crawlers, particularly focusing on the…
-
The Cloudflare Blog: Control content use for AI training with Cloudflare’s managed robots.txt and blocking for monetized content
Source URL: https://blog.cloudflare.com/control-content-use-for-ai-training/
Summary: Cloudflare is making it easier for publishers and content creators of all sizes to prevent their content from being scraped for AI training by managing robots.txt on their behalf.
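Since robots.txt is a voluntary convention, its rules only take effect when a crawler chooses to check them before fetching. The sketch below shows that check with Python's standard-library `urllib.robotparser`. The robots.txt contents (blocking GPTBot site-wide while allowing everyone else) and the example.com URL are hypothetical, and this is not a description of how Cloudflare's managed robots.txt works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot everywhere,
# leave all other user agents unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/some-post"
    for agent in ("GPTBot", "Googlebot"):
        # can_fetch() applies the first group matching the agent, else the "*" group
        print(agent, "allowed:", rp.can_fetch(agent, url))
```

Run as a script, this should report GPTBot as disallowed and Googlebot as allowed. A crawler that simply skips this check, or changes its user agent, is not stopped by robots.txt itself.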
-
Simon Willison’s Weblog: System Card: Claude Opus 4 & Claude Sonnet 4
Source URL: https://simonwillison.net/2025/May/25/claude-4-system-card/#atom-everything
Summary: Direct link to a PDF on Anthropic’s CDN because they don’t appear to have a landing page anywhere for this document. Anthropic’s system cards are always worth…
-
Simon Willison’s Weblog: Claude feature drop
Source URL: https://simonwillison.net/2025/May/2/claude-search/
Summary: It’s not in their release notes yet but Anthropic pushed some big new features today. Alex Albert: We’ve improved web search and rolled it out worldwide to all paid plans. Web search now combines light Research functionality, allowing Claude to automatically adjust…