Meta Faces Allegations of Using Pirated Books to Train AI Systems
Meta Platforms, the parent company of Facebook, Instagram, and WhatsApp, is facing allegations of using pirated versions of copyrighted books to train its AI systems. The claims, outlined in newly disclosed court filings, assert that Meta’s CEO, Mark Zuckerberg, approved the use of the contentious materials despite internal concerns.
Prominent authors, including Ta-Nehisi Coates and comedian Sarah Silverman, are among those suing Meta for copyright infringement. Their lawsuit, filed in 2023, contends that Meta misused their works to train its large language model, Llama.
Evidence in filings made public on Wednesday in a California federal court suggests Meta knowingly used LibGen, a dataset that allegedly contains millions of pirated works, to train its AI models.
Internal Communications Spark Controversy
According to the authors, internal communications obtained during the discovery process demonstrate that Zuckerberg approved the use of LibGen for training purposes. The dataset, which Meta allegedly distributed via peer-to-peer torrents, was flagged internally as “pirated” by Meta’s AI executive team.
The court filings quote internal Meta communications stating, “LibGen is a dataset we know to be pirated.” Despite these concerns, the decision to proceed reportedly received executive approval. Meta has not commented on the allegations, and company spokespeople did not immediately respond to requests for comment.
Expanding Legal Claims
The authors requested court approval on Wednesday to amend their original complaint, citing this new evidence as critical to their case. They argue that the revelations strengthen their claims of copyright infringement and justify reviving a previously dismissed claim regarding the removal of copyright management information (CMI).
The authors have also introduced a new claim of computer fraud. U.S. District Judge Vince Chhabria, who is overseeing the case, allowed the writers to file an updated complaint but expressed skepticism about the validity of the CMI and fraud claims during a hearing on Thursday. The judge has not yet issued a final ruling on these claims.
Widening Debate on AI Training Practices
This case is part of a broader debate over the use of copyrighted materials in AI training. Multiple lawsuits have accused tech companies of misappropriating works by authors, artists, and other creators to develop AI tools without permission. Defendants in these cases, including Meta, have often argued that their use of such materials qualifies as “fair use.”
The outcome of this lawsuit could have far-reaching implications for the tech industry, potentially shaping how companies source data for AI training and how they weigh that need against intellectual property rights.