Tag: scraping

  • The Register: Perplexity vexed by Cloudflare’s claims its bots are bad

    Source URL: https://www.theregister.com/2025/08/05/perplexity_vexed_by_cloudflares_claims/
    Source: The Register
    Title: Perplexity vexed by Cloudflare’s claims its bots are bad
    Feedly Summary: AI search biz insists its content capture and summarization is okay because someone asked for it. AI search biz Perplexity claims that Cloudflare has mischaracterized its site crawlers as malicious bots and that the content delivery network…

  • Slashdot: Nearly 100,000 ChatGPT Conversations Were Searchable on Google

    Source URL: https://yro.slashdot.org/story/25/08/05/1535248/nearly-100000-chatgpt-conversations-were-searchable-on-google?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Source: Slashdot
    Title: Nearly 100,000 ChatGPT Conversations Were Searchable on Google
    Feedly Summary:
    AI Summary and Description: Yes
    Summary: The text discusses a significant privacy concern regarding nearly 100,000 publicly shared conversations from OpenAI’s ChatGPT that were indexed by Google. It highlights the potential risks involved when users share conversations, revealing a…

  • The Register: Perplexity AI accused of scraping content against websites’ will with unlisted IP ranges

    Source URL: https://www.theregister.com/2025/08/04/perplexity_ai_crawlers_accused_data_raids/
    Source: The Register
    Title: Perplexity AI accused of scraping content against websites’ will with unlisted IP ranges
    Feedly Summary: Cloudflare finds AI search biz ignoring crawl prohibitions and trying to hide its spiders. Perplexity, an AI search startup, has been spotted trying to disguise its content-scraping bots while flouting websites’ no-crawl directives.…

  • The Cloudflare Blog: Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives

    Source URL: https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
    Source: The Cloudflare Blog
    Title: Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
    Feedly Summary: Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
    AI Summary and Description: Yes
    Summary: The…

  • Cloud Blog: How Jina AI built its 100-billion-token web grounding system with Cloud Run GPUs

    Source URL: https://cloud.google.com/blog/products/application-development/how-jina-ai-built-its-100-billion-token-web-grounding-system-with-cloud-run-gpus/
    Source: Cloud Blog
    Title: How Jina AI built its 100-billion-token web grounding system with Cloud Run GPUs
    Feedly Summary: Editor’s note: The Jina AI Reader is a specialized tool that transforms raw web content from URLs or local files into a clean, structured, and LLM-friendly format. In this post, Han Xiao details…

  • Slashdot: Browser Extensions Turn Nearly 1 Million Browsers Into Website-Scraping Bots

    Source URL: https://tech.slashdot.org/story/25/07/09/2257245/browser-extensions-turn-nearly-1-million-browsers-into-website-scraping-bots?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Source: Slashdot
    Title: Browser Extensions Turn Nearly 1 Million Browsers Into Website-Scraping Bots
    Feedly Summary:
    AI Summary and Description: Yes
    Summary: The text discusses the alarming discovery of over 240 browser extensions that have exploited users’ browsers to scrape sensitive data without their consent. This incident highlights substantial privacy and security implications,…

  • Slashdot: The Open-Source Software Saving the Internet From AI Bot Scrapers

    Source URL: https://news.slashdot.org/story/25/07/07/2146228/the-open-source-software-saving-the-internet-from-ai-bot-scrapers?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Source: Slashdot
    Title: The Open-Source Software Saving the Internet From AI Bot Scrapers
    Feedly Summary:
    AI Summary and Description: Yes
    Summary: The text discusses “Anubis,” a tool designed to combat AI bot scrapers by using browser features to automate CAPTCHA verification through cryptographic math. Its adoption by notable organizations highlights the tool’s…

  • The Register: Cloudflare creates AI crawler tollbooth to pay publishers

    Source URL: https://www.theregister.com/2025/07/01/cloudflare_creates_ai_crawler_toll/
    Source: The Register
    Title: Cloudflare creates AI crawler tollbooth to pay publishers
    Feedly Summary: The bargain between content makers and crawlers has broken down. ai-pocalypse: Cloudflare has started blocking AI web crawlers by default in a bid to become the internet’s gatekeeper.…
    AI Summary and Description: Yes
    Summary: The text highlights a…

  • Slashdot: Cloudflare Flips AI Scraping Model With Pay-Per-Crawl System For Publishers

    Source URL: https://tech.slashdot.org/story/25/07/01/1745245/cloudflare-flips-ai-scraping-model-with-pay-per-crawl-system-for-publishers?utm_source=rss1.0mainlinkanon&utm_medium=feed
    Source: Slashdot
    Title: Cloudflare Flips AI Scraping Model With Pay-Per-Crawl System For Publishers
    Feedly Summary:
    AI Summary and Description: Yes
    Summary: Cloudflare’s new “Pay Per Crawl” program introduces a monetization option for website owners, allowing them to charge AI companies for content access used for model training. This initiative is significant as…