Tag: training practices
-
Slashdot: Senator Introduces Bill To Compel More Transparency From AI Developers
Source URL: https://yro.slashdot.org/story/24/11/26/0047249/senator-introduces-bill-to-compel-more-transparency-from-ai-developers?utm_source=rss1.0mainlinkanon&utm_medium=feed
Summary: The introduction of the TRAIN Act, aimed at enhancing the rights of human creators regarding the use of their work in training AI models, highlights a significant step towards accountability in…
-
Wired: New York Times Says OpenAI Erased Potential Lawsuit Evidence
Source URL: https://www.wired.com/story/new-york-times-openai-erased-potential-lawsuit-evidence/
Summary: As part of an ongoing copyright lawsuit, The New York Times says it spent 150 hours sifting through OpenAI’s training data looking for potential evidence—only for OpenAI to delete all of its work.
-
Slashdot: AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models
Source URL: https://news.slashdot.org/story/24/11/16/0326222/ai-lab-pleias-releases-fully-open-dataset-as-amd-ai2-release-open-ai-models?utm_source=rss1.0mainlinkanon&utm_medium=feed
Summary: The text outlines PleIAs’ commitment to open training for large language models (LLMs) through the release of Common Corpus, highlighting the significance of open data for LLM…
-
Simon Willison’s Weblog: Releasing the largest multilingual open pretraining dataset
Source URL: https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/#atom-everything
Summary: Common Corpus is a new “open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)” released by French AI Lab PleIAs. This appears to be the largest available…
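For anyone who wants to poke at the corpus itself, below is a minimal sketch of previewing a few records with the Hugging Face datasets library. The dataset identifier PleIAs/common_corpus and the "text" field name are assumptions not stated in the post above; check the dataset page on the Hub before relying on them.

# Minimal sketch: stream a handful of records from Common Corpus without
# downloading the full ~2-trillion-token dataset.
# Assumptions (not from the post above): the corpus is published on the
# Hugging Face Hub as "PleIAs/common_corpus" and each record has a "text" field.
from itertools import islice
from datasets import load_dataset

def preview_common_corpus(n_records: int = 3) -> None:
    # streaming=True iterates over the remote shards instead of downloading everything
    ds = load_dataset("PleIAs/common_corpus", split="train", streaming=True)
    for record in islice(ds, n_records):
        text = record.get("text", "")
        print(f"--- record ({len(text)} chars) ---")
        print(text[:300])

if __name__ == "__main__":
    preview_common_corpus()

At this scale, streaming is the only practical way to sample the data locally; a full download would likely run to several terabytes of text.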