Tag: training practices

  • Slashdot: Microsoft’s LinkedInn Sued For Disclosing Customer Information To Train AI Models

    Source URL: https://yro.slashdot.org/story/25/01/22/236253/microsofts-linkedinn-sued-for-disclosing-customer-information-to-train-ai-models?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Microsoft’s LinkedInn Sued For Disclosing Customer Information To Train AI Models Feedly Summary: AI Summary and Description: Yes Summary: The text reports on a lawsuit against LinkedIn by its Premium customers, alleging unauthorized disclosure of private messages to third parties for training generative AI models, resulting in contract breach…

  • Slashdot: Authors Seek Meta’s Torrent Client Logs and Seeding Data In AI Piracy Probe

    Source URL: https://meta.slashdot.org/story/25/01/20/224209/authors-seek-metas-torrent-client-logs-and-seeding-data-in-ai-piracy-probe Source: Slashdot Title: Authors Seek Meta’s Torrent Client Logs and Seeding Data In AI Piracy Probe Feedly Summary: AI Summary and Description: Yes Summary: The text discusses legal actions against Meta for allegedly using pirated materials, specifically books from the piracy site LibGen, to train its AI models. It highlights arguments about…

  • Hacker News: Zuckerberg appeared to know Llama trained on Libgen

    Source URL: https://www.rollingstone.com/culture/culture-news/ai-meta-pirated-library-zuckerberg-1235235394/ Source: Hacker News Title: Zuckerberg appeared to know Llama trained on Libgen Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The unsealed internal communications at Meta reveal its questionable practices in using pirated text from Library Genesis for training its AI model, Llama. This raises significant legal concerns about copyright infringement…

  • The Register: Court docs allege Meta trained its AI models on contentious trove of maybe-pirated content

    Source URL: https://www.theregister.com/2025/01/10/meta_libgen_allegation/ Source: The Register Title: Court docs allege Meta trained its AI models on contentious trove of maybe-pirated content Feedly Summary: Did Zuck’s definition of ‘free expression’ just get even broader? Meta allegedly downloaded material from an online source that’s been sued for breaching copyright, because it wanted the material to train its…

  • Wired: Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

    Source URL: https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/ Source: Wired Title: Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal Feedly Summary: One of the most important AI copyright legal battles just took a major turn. AI Summary and Description: Yes Summary: Meta has faced a significant legal setback regarding its training practices for…

  • Hacker News: OpenAI whistleblower found dead in San Francisco apartment

    Source URL: https://www.mercurynews.com/2024/12/13/openai-whistleblower-found-dead-in-san-francisco-apartment/ Source: Hacker News Title: OpenAI whistleblower found dead in San Francisco apartment Feedly Summary: Comments AI Summary and Description: Yes Summary: The text details the death of Suchir Balaji, a former OpenAI researcher and whistleblower, amid ongoing lawsuits against the company regarding its data practices and potential copyright violations related to the…

  • Slashdot: Senator Introduces Bill To Compel More Transparency From AI Developers

    Source URL: https://yro.slashdot.org/story/24/11/26/0047249/senator-introduces-bill-to-compel-more-transparency-from-ai-developers?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Senator Introduces Bill To Compel More Transparency From AI Developers Feedly Summary: AI Summary and Description: Yes Summary: The introduction of the TRAIN Act, aimed at enhancing the rights of human creators regarding the use of their work in training AI models, highlights a significant step towards accountability in…

  • Wired: New York Times Says OpenAI Erased Potential Lawsuit Evidence

    Source URL: https://www.wired.com/story/new-york-times-openai-erased-potential-lawsuit-evidence/ Source: Wired Title: New York Times Says OpenAI Erased Potential Lawsuit Evidence Feedly Summary: As part of an ongoing copyright lawsuit, The New York Times says it spent 150 hours sifting through OpenAI’s training data looking for potential evidence—only for OpenAI to delete all of its work. AI Summary and Description: Yes…

  • Slashdot: AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models

    Source URL: https://news.slashdot.org/story/24/11/16/0326222/ai-lab-pleias-releases-fully-open-dataset-as-amd-ai2-release-open-ai-models?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models Feedly Summary: AI Summary and Description: Yes Summary: The text outlines PleIAs’ commitment to open training for large language models (LLMs) through the release of Common Corpus, highlighting the significance of open data for LLM…

  • Simon Willison’s Weblog: Releasing the largest multilingual open pretraining dataset

    Source URL: https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/#atom-everything Source: Simon Willison’s Weblog Title: Releasing the largest multilingual open pretraining dataset Feedly Summary: Releasing the largest multilingual open pretraining dataset Common Corpus is a new “open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)" released by French AI Lab PleIAs. This appears to be the largest available…