Source URL: https://arstechnica.com/tech-policy/2025/02/meta-torrented-over-81-7tb-of-pirated-books-to-train-ai-authors-say/
Source: Hacker News
Title: Meta torrented & seeded 81.7 TB dataset containing copyrighted data
Summary: The text presents serious allegations that Meta violated copyright by using pirated books, without authorization, to train its AI models. Newly unsealed emails allegedly show large-scale illegal downloading with potential legal consequences, highlighting risks in AI training practices that are relevant to security and compliance professionals.
Detailed Description: The situation revolves around Meta’s alleged illegal activities concerning the training of its AI models using copyrighted materials. Here are the key points:
– **Evidence of Copyright Violation**: Newly unsealed emails suggest that Meta engaged in a significant torrenting operation, accumulating over 162 terabytes of data from pirated sources.
– **Torrented Data Sources**: The data was reportedly sourced from known shadow libraries, specifically Z-Library and LibGen, raising major ethical and legal questions regarding data acquisition for AI training.
– **Legal Implications**: The allegations raise copyright-infringement concerns not only about downloading the material but also about seeding it, which means redistributing pirated content to other torrent users. This could expose Meta to serious penalties.
– **Internal Concerns**: Emails from Meta employees reflect an evolving acknowledgment of the potential legal risks associated with torrenting. Initial humorous dismissals gave way to serious discussions with legal teams about the risks of using corporate resources to download pirated content.
– **Comparison with Smaller Cases**: The authors highlight the contrast between Meta’s large-scale alleged infringement and smaller past cases of piracy that led to significant legal action.
This case calls into question the ethics of training on pirated data and raises concerns for AI developers about copyright compliance and the governance of AI training data. Professionals in AI security, compliance, and governance should follow such developments closely to ensure their own practices meet legal standards and ethical guidelines.