Slashdot: Lawsuit Accuses Meta Of Training AI On Torrented 82TB Dataset Of Pirated Books

Feb 16, 2025

—

Source URL: https://yro.slashdot.org/story/25/02/16/0346210/lawsuit-accuses-meta-of-training-ai-on-torrented-82tb-dataset-of-pirated-books?utm_source=rss1.0mainlinkanon&utm_medium=feed
Source: Slashdot
Title: Lawsuit Accuses Meta Of Training AI On Torrented 82TB Dataset Of Pirated Books

AI Summary and Description: Yes

**Summary:** The text discusses a class action lawsuit accusing Meta of copyright infringement for training AI on illegally acquired data. It highlights ethical concerns raised internally about using content from shadow libraries, as well as alleged efforts to conceal the downloading, both of which are relevant to professionals in AI ethics, compliance, and data security.

**Detailed Description:**
This case highlights significant issues in AI development practices concerning data acquisition, ethical considerations, and compliance with copyright laws. Key points include:

– **Lawsuit Context:** Meta is facing a class action lawsuit for allegedly infringing copyrights by using data sourced from torrent sites, which raises questions about its compliance with intellectual property laws.

– **Extent of Illegally Acquired Data:** Meta allegedly used 81.7TB of copyrighted material from shadow libraries, a figure that underlines the scale and potential impact of the infringement.

– **Internal Ethical Concerns:** Meta employees expressed discomfort and ethical objections to the practice of using such data, pointing to a potential culture of ethical negligence within the organization related to AI model training.

– **Impact of Corporate Decisions:** The involvement of senior leadership, specifically the claim that concerns reached CEO Mark Zuckerberg, implies a top-down approach to decisions about data acquisition practices.

– **Use of VPNs for Anonymity:** Employees discussed using VPNs to conceal their IP addresses while downloading this data, which indicates a deliberate effort to avoid detection and raises further ethical and compliance concerns about internal security practices.

– **Governance and Compliance Implications:** This situation highlights the necessity for robust governance frameworks within organizations that manage AI development, particularly concerning compliance with copyright laws and ethical standards related to data usage.

This case serves as a critical reminder for security and compliance professionals in the tech industry to ensure rigorous adherence to ethical standards, data integrity, and legal compliance when developing AI technologies.
