Tag: data licensing
-
Slashdot: Reddit Wants ‘Deeper Integration’ with Google in Exchange for Licensed AI Training Data
Source URL: https://tech.slashdot.org/story/25/09/22/0313234/reddit-wants-deeper-integration-with-google-in-exchange-for-licensed-ai-training-data?utm_source=rss1.0mainlinkanon&utm_medium=feed
Source: Slashdot
Summary: Reddit is in ongoing negotiations with Google over a new deal that would involve deeper integration with Google's AI products and a dynamic pricing structure for licensing Reddit's data.…
-
Simon Willison’s Weblog: Releasing the largest multilingual open pretraining dataset
Source URL: https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/#atom-everything
Source: Simon Willison’s Weblog
Summary: Common Corpus is a new “open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)” released by French AI Lab PleIAs. This appears to be the largest available…
-
Wired: This Startup Wants YouTube Creators to Get Paid for AI Training Data
Source URL: https://www.wired.com/story/license-to-scrape-youtube-ai-data-license-creators/
Source: Wired
Summary: While big platforms like Reddit have signed deals with the AI giants, YouTube leaves licensing in the hands of individual creators. The “License to Scrape” program aims to give those streaming stars proper leverage. AI…