New research reveals that Meta’s Llama 3.1 70B AI model has memorized nearly the complete text of popular books, including Harry Potter and the Philosopher’s Stone, The Great Gatsby, and 1984. This discovery could expose Meta to over $1 billion in statutory damages if courts rule against the company in ongoing copyright infringement cases, fundamentally shifting the legal landscape around AI training on copyrighted materials.
What you should know: Researchers tested 13 AI models to determine how much copyrighted book content they could reproduce verbatim, finding dramatic differences between companies.
- Meta’s model demonstrated extensive memorization of entire books, while most other models from Google, DeepSeek, EleutherAI, and Microsoft showed minimal verbatim retention.
- The testing method involved splitting book excerpts into prefix and suffix sections, then prompting models to complete the passages.
- Researchers examined 36 copyrighted books, including works by authors currently suing Meta in the Kadrey v Meta Platforms case.
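The prefix-and-suffix test described above can be sketched in a few lines. This is a minimal illustration, not the researchers’ code: the window lengths, the stride, the whitespace-level token lists, and the `complete` callable (a stand-in for a real model call) are all assumptions made for the example.

```python
from typing import Callable, Iterator, List, Tuple

Tokens = List[str]


def split_windows(tokens: Tokens, prefix_len: int = 50, suffix_len: int = 50,
                  stride: int = 10) -> Iterator[Tuple[Tokens, Tokens]]:
    """Slide over the text and yield (prefix, suffix) pairs.

    The lengths and stride are illustrative defaults, not values from the study.
    """
    window = prefix_len + suffix_len
    for start in range(0, len(tokens) - window + 1, stride):
        chunk = tokens[start:start + window]
        yield chunk[:prefix_len], chunk[prefix_len:]


def verbatim_rate(tokens: Tokens, complete: Callable[[Tokens, int], Tokens],
                  prefix_len: int = 50, suffix_len: int = 50,
                  stride: int = 10) -> float:
    """Fraction of windows whose true suffix the model reproduces exactly."""
    hits = total = 0
    for prefix, suffix in split_windows(tokens, prefix_len, suffix_len, stride):
        total += 1
        if complete(prefix, suffix_len) == suffix:
            hits += 1
    return hits / total if total else 0.0


def make_lookup_model(memorized: Tokens) -> Callable[[Tokens, int], Tokens]:
    """Hypothetical stand-in for a model that has memorized `memorized` perfectly."""
    def complete(prefix: Tokens, n: int) -> Tokens:
        k = len(prefix)
        for i in range(len(memorized) - k + 1):
            if memorized[i:i + k] == prefix:
                return memorized[i + k:i + k + n]
        return []  # prefix not found: the stand-in produces nothing
    return complete
```

A model that has memorized a book would score close to 1.0 on that book with this measure, while a model that only paraphrases would score close to 0.0.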
The big picture: This memorization finding challenges AI companies’ core legal defense that their models transform rather than replicate copyrighted works.
- Tech companies have argued their models create “fresh combinations of words” based on training data, qualifying for fair use protections.
- The discovery that at least one major model can reproduce books verbatim undermines claims that AI systems merely learn “general relationships between words.”
- “AI models are not just ‘plagiarism machines’, as some have alleged, but it also means that they do more than just learn general relationships between words,” says Mark Lemley, a Stanford University law professor.
Why this matters: The financial stakes are enormous, with researchers estimating that copyright infringement on just 3% of the Books3 dataset could result in nearly $1 billion in damages.
- The Books3 dataset contains nearly 200,000 copyrighted books, including many pirated copies, used to train popular AI models.
- Legal precedent from this case could determine whether AI companies must seek permission before training on copyrighted materials.
- The memorization capability varies significantly between models and books, making it “very hard to set a clear legal rule that will work across all cases.”
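The article does not say how the researchers reached the “nearly $1 billion” figure. One way to arrive at a number of that size, assuming US statutory damages under 17 U.S.C. § 504(c) (a $750 minimum per work and a ceiling of $150,000 per work for willful infringement) and rounding Books3 to 200,000 titles, is the arithmetic below. Treat it as a back-of-the-envelope sketch, not the researchers’ calculation.

```python
# Rounded from "nearly 200,000" in the text; an assumption for the example.
BOOKS3_SIZE = 200_000
INFRINGED_PERCENT = 3  # "just 3% of the Books3 dataset"

# US statutory damages per infringed work, 17 U.S.C. § 504(c).
MIN_PER_WORK = 750
MAX_WILLFUL_PER_WORK = 150_000


def statutory_damages(n_works: int, percent: int, per_work: int) -> int:
    """Damages in dollars if `percent` of `n_works` are each awarded `per_work`."""
    infringed_works = n_works * percent // 100
    return infringed_works * per_work


low = statutory_damages(BOOKS3_SIZE, INFRINGED_PERCENT, MIN_PER_WORK)
high = statutory_damages(BOOKS3_SIZE, INFRINGED_PERCENT, MAX_WILLFUL_PER_WORK)

print(f"{low:,}")   # 4,500,000
print(f"{high:,}")  # 900,000,000
```

Under these assumptions, 3% of the dataset is 6,000 books, and the upper bound of $900 million is consistent with the “nearly $1 billion” estimate.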
Legal implications differ by jurisdiction: The memorization findings carry different weight in US versus UK copyright law.
- US courts will evaluate whether the memorization qualifies under the “fair use” doctrine, which provides broader exceptions for the use of copyrighted material.
- UK copyright law follows the “fair dealing” concept, which has much narrower exceptions, so AI models that memorized pirated books are unlikely to qualify for protection.
- Robert Lands at Howard Kennedy, a London law firm, notes the finding could be “very significant from a copyright perspective” in the UK.
What they’re saying: Legal experts emphasize this creates a powerful forensic tool while highlighting fundamental questions about AI training rights.
- “The question is, did they have the right to do it?” asks Randy McCarthy at Hall Estill, an Oklahoma law firm, noting that companies typically acknowledge training on copyrighted materials.
- Meta spokesperson Emil Vazquez maintains that “fair use of copyrighted materials is vital” for developing AI models, stating “We disagree with Plaintiffs’ assertions, and the full record tells a different story.”
- Mark Lemley, who previously defended Meta but dropped the company as a client in January 2025, says he still believes Meta should win the case.
Key details: The research methodology provides a replicable framework for testing AI memorization across different models and content types.
- Testing involved comparing the probability that a model completes a passage verbatim against the probability expected by random chance, to separate genuine memorization from coincidence.
- The technique could serve as a “good forensic tool” for identifying the extent of AI memorization in future legal proceedings.
- Multiple ongoing lawsuits in US and UK courts will determine whether this memorization constitutes copyright infringement.
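The comparison against random chance can be illustrated with a small calculation. This is a sketch under stated assumptions, not the study’s exact procedure: it takes the per-token probabilities a model assigns to the true suffix as given, and it models “chance” as picking every token uniformly from a vocabulary of `vocab_size` tokens. Both function names are hypothetical.

```python
import math
from typing import Sequence


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """Log-probability of the whole suffix: the sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def log_ratio_vs_chance(token_probs: Sequence[float], vocab_size: int) -> float:
    """How much more likely the model makes the suffix than uniform guessing.

    Returns log(P_model / P_chance); 0 means no better than chance,
    large positive values point toward memorization.
    """
    chance = -len(token_probs) * math.log(vocab_size)  # log of (1 / vocab_size) ** n
    return sequence_log_prob(token_probs) - chance


# Illustrative numbers only: a 50-token suffix, each token given probability 0.9
# by the model, against a hypothetical 100,000-token vocabulary.
memorized_like = log_ratio_vs_chance([0.9] * 50, vocab_size=100_000)
print(memorized_like > 0)  # True
```

Working in log space avoids numerical underflow, since the probability of any 50-token sequence under uniform guessing is far too small to represent directly.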
Meta's AI memorised books verbatim – that could cost it billions