Tag: data scraping

  • The Register: Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content

    Source URL: https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/ Source: The Register Title: Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content Feedly Summary: Slop-making machine will feed unauthorized scrapers what they so richly deserve, hopefully without poisoning the internet Cloudflare has created a bot-busting AI to make life hell for AI crawlers.… AI…

  • Slashdot: AI Crawlers Haven’t Learned To Play Nice With Websites

    Source URL: https://slashdot.org/story/25/03/19/1027251/ai-crawlers-havent-learned-to-play-nice-with-websites?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: AI Crawlers Haven’t Learned To Play Nice With Websites Feedly Summary: AI Summary and Description: Yes Summary: SourceHut is experiencing service disruptions due to aggressive web crawling by AI companies collecting data for training large language models (LLMs). They have implemented mitigations, including blocking certain cloud providers due to…

  • Hacker News: AI crawlers haven’t learned to play nice with websites

    Source URL: https://www.theregister.com/2025/03/18/ai_crawlers_sourcehut/ Source: Hacker News Title: AI crawlers haven’t learned to play nice with websites Feedly Summary: Comments AI Summary and Description: Yes Summary: SourceHut reports that excessive crawling by AI companies’ web crawlers is disrupting its services. These crawlers, primarily for training large language models (LLMs), have compelled SourceHut to implement several mitigations,…

  • The Register: AI crawlers haven’t learned to play nice with websites

    Source URL: https://www.theregister.com/2025/03/18/ai_crawlers_sourcehut/ Source: The Register Title: AI crawlers haven’t learned to play nice with websites Feedly Summary: SourceHut says it’s getting DDoSed by LLM bots SourceHut, an open source git-hosting service, says web crawlers for AI companies are slowing down services through their excessive demands for data.… AI Summary and Description: Yes Summary: The…

  • Slashdot: BlueSky Proposes ‘New Standard’ for When Scraping Data for AI Training

    Source URL: https://tech.slashdot.org/story/25/03/17/0434237/bluesky-proposes-new-standard-for-when-scraping-data-for-ai-training?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: BlueSky Proposes ‘New Standard’ for When Scraping Data for AI Training Feedly Summary: AI Summary and Description: Yes Summary: The article discusses Bluesky’s proposal for user data consent regarding scraping for generative AI training and archiving. This initiative signifies a potential shift in how user data privacy is managed…

  • Hacker News: AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

    Source URL: https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/ Source: Hacker News Title: AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the creation of a new malware named Nepenthes, designed by a software developer to combat AI web crawlers that ignore “no scraping” directives…

  • Slashdot: Developer Creates Infinite Maze That Traps AI Training Bots

    Source URL: https://slashdot.org/story/25/01/23/2135205/developer-creates-infinite-maze-that-traps-ai-training-bots?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Developer Creates Infinite Maze That Traps AI Training Bots Feedly Summary: AI Summary and Description: Yes Summary: The text discusses the development of an open-source program called Nepenthes, designed to trap AI web crawlers in an endless loop of link generation, effectively wasting their resources. This innovative approach provides…

  • Hacker News: Nepenthes is a tarpit to catch AI web crawlers

    Source URL: https://zadzmo.org/code/nepenthes/ Source: Hacker News Title: Nepenthes is a tarpit to catch AI web crawlers Feedly Summary: Comments AI Summary and Description: Yes Summary: The text describes “Nepenthes,” a tarpit software devised to trap web crawlers, particularly those scraping data for large language models (LLMs). It offers unique functionalities and deployment setups, with explicit…