Tag: robots.txt

  • The Register: Copyright-ignoring AI scraper bots laugh at robots.txt so the IETF is trying to improve it

    Source URL: https://www.theregister.com/2025/04/09/ietf_ai_preferences_working_group/
    Feedly Summary: Recently formed AI Preferences Working Group has August deadline to deliver proposals. The Internet Engineering Task Force has chartered a group it hopes will create a standard that lets content creators…

  • Slashdot: Open Source Devs Say AI Crawlers Dominate Traffic, Forcing Blocks On Entire Countries

    Source URL: https://tech.slashdot.org/story/25/03/26/016244/open-source-devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries
    Summary: The text discusses the challenges faced by software developers, particularly open source maintainers, in managing aggressive AI crawler traffic that overwhelms their repositories. This scenario underscores the urgent…

  • Hacker News: Devs say AI crawlers dominate traffic, forcing blocks on entire countries

    Source URL: https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
    Summary: The text discusses the challenges faced by software developers in managing aggressive AI crawler traffic that negatively affects open-source projects, leading to significant service instability and increased operational…

  • Hacker News: IETF setting standards for AI preferences

    Source URL: https://www.ietf.org/blog/aipref-wg/
    Summary: The text discusses the formation of the AI Preferences (AIPREF) Working Group, aimed at standardizing how content preferences are expressed for AI model training, amid concerns from content publishers about unauthorized use. This…

  • The Register: Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content

    Source URL: https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/
    Feedly Summary: Slop-making machine will feed unauthorized scrapers what they so richly deserve, hopefully without poisoning the internet. Cloudflare has created a bot-busting AI to make life hell for AI crawlers.…

  • Hacker News: FOSS infrastructure is under attack by AI companies

    Source URL: https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/
    Summary: The text discusses recent disruptions faced by open-source projects due to aggressive AI crawlers that disregard robots.txt directives, leading to significant operational challenges and increased workloads for system administrators. It highlights…

  • Hacker News: AI crawlers haven’t learned to play nice with websites

    Source URL: https://www.theregister.com/2025/03/18/ai_crawlers_sourcehut/
    Summary: SourceHut reports that excessive crawling by AI companies’ web crawlers is disrupting its services. These crawlers, primarily for training large language models (LLMs), have compelled SourceHut to implement several mitigations,…

  • Hacker News: Please stop externalizing your costs directly into my face

    Source URL: https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html
    Summary: The text reflects a sysadmin’s frustration with the disruptive impact of LLM crawlers on operational stability. It discusses ongoing battles against the misuse of computing resources by malicious bots, underscoring…

  • The Register: AI crawlers haven’t learned to play nice with websites

    Source URL: https://www.theregister.com/2025/03/18/ai_crawlers_sourcehut/
    Feedly Summary: SourceHut says it’s getting DDoSed by LLM bots. SourceHut, an open source git-hosting service, says web crawlers for AI companies are slowing down services through their excessive demands for data.…
    Summary: The…

  • Slashdot: BlueSky Proposes ‘New Standard’ for When Scraping Data for AI Training

    Source URL: https://tech.slashdot.org/story/25/03/17/0434237/bluesky-proposes-new-standard-for-when-scraping-data-for-ai-training?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Summary: The article discusses Bluesky’s proposal for user data consent regarding scraping for generative AI training and archiving. This initiative signifies a potential shift in how user data privacy is managed…