Tag: scraper

  • The Register: Copyright-ignoring AI scraper bots laugh at robots.txt so the IETF is trying to improve it

    Source URL: https://www.theregister.com/2025/04/09/ietf_ai_preferences_working_group/
    Source: The Register
    Feedly Summary: Recently formed AI Preferences Working Group has August deadline to deliver proposals. The Internet Engineering Task Force has chartered a group it hopes will create a standard that lets content creators…

  • Hacker News: AI bots are destroying Open Access

    Source URL: https://go-to-hellman.blogspot.com/2025/03/ai-bots-are-destroying-open-access.html
    Source: Hacker News
    Summary: The post discusses the ongoing battle between AI companies and institutions such as libraries and open-access publishers, highlighting the aggressive tactics of AI bots that threaten the availability of quality information. It points…

  • The Register: Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content

    Source URL: https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/
    Source: The Register
    Feedly Summary: Slop-making machine will feed unauthorized scrapers what they so richly deserve, hopefully without poisoning the internet. Cloudflare has created a bot-busting AI to make life hell for AI crawlers.…

  • The Register: AI crawlers haven’t learned to play nice with websites

    Source URL: https://www.theregister.com/2025/03/18/ai_crawlers_sourcehut/
    Source: The Register
    Feedly Summary: SourceHut says it’s getting DDoSed by LLM bots. SourceHut, an open source git-hosting service, says web crawlers for AI companies are slowing down its services through their excessive demands for data.…

  • Slashdot: Bluesky Proposes ‘New Standard’ for When Scraping Data for AI Training

    Source URL: https://tech.slashdot.org/story/25/03/17/0434237/bluesky-proposes-new-standard-for-when-scraping-data-for-ai-training?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Source: Slashdot
    Summary: The article discusses Bluesky’s proposal for letting users state whether they consent to their data being scraped for generative AI training and archiving. This initiative signals a potential shift in how user data privacy is managed…

  • Simon Willison’s Weblog: Cutting-edge web scraping techniques at NICAR

    Source URL: https://simonwillison.net/2025/Mar/8/cutting-edge-web-scraping/#atom-everything
    Source: Simon Willison’s Weblog
    Feedly Summary: Here’s the handout for a workshop I presented this morning at NICAR 2025 on web scraping, focusing on lesser-known tips and tricks that became possible only with recent developments in LLMs. For…