Federal judge rules Anthropic's use of 7 million books for AI training is fair use — but a damages trial looms
Jun 24, 2025
Key Points
- A federal judge ruled that Anthropic's use of 7 million books to train Claude constitutes fair use under US copyright law, the first major test of this doctrine in generative AI.
- The court found Anthropic infringed copyright by storing pirated books in a central repository, setting up a December damages trial; statutory awards could exceed $1 billion, though the speakers expect an actual award closer to $10–$20 per book.
- The ruling creates precedent that transformative AI training may qualify as fair use, potentially extending to video models like Sora training on YouTube content and shifting publisher leverage toward licensing deals.
Summary
A federal judge ruled that Anthropic's use of 7 million books to train Claude qualifies as fair use under US copyright law. This is the first major application of fair use doctrine to generative AI training, a legal issue all AI labs are tracking closely.
The court found that Anthropic infringed copyright by storing over 7 million pirated books in a central repository. A damages trial is scheduled for December, with statutory damages potentially reaching $150,000 per book, so total exposure could exceed $1 billion. Both speakers, however, expect the actual award to land far below the statutory maximum, perhaps around $10–$20 per book. Anthropic could have obtained the same books legitimately, for example through public libraries, rather than pirating them.
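A back-of-envelope check on the figures above. This sketch assumes, purely for illustration, that all 7 million books would be eligible for statutory damages; the actual eligible count has not been determined by the court and could be far smaller.

```python
# Illustrative damages arithmetic only; eligibility per book is assumed,
# not established by the ruling.
BOOKS = 7_000_000

statutory_max = BOOKS * 150_000   # statutory ceiling per infringed work
expected_low = BOOKS * 10         # speakers' low per-book estimate
expected_high = BOOKS * 20        # speakers' high per-book estimate

print(f"Statutory maximum exposure: ${statutory_max:,}")
# Statutory maximum exposure: $1,050,000,000,000
print(f"Expected range: ${expected_low:,} to ${expected_high:,}")
# Expected range: $70,000,000 to $140,000,000
```

Even the speakers' modest per-book estimates sum to a nine-figure award, which is why the ceiling figure, while technically "over $1 billion," overstates the likely outcome by orders of magnitude.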
The fair use ruling carries broader strategic weight. One speaker argues that fair use is essential to AI development: without it, US models would be crippled while Chinese competitors train on data unconstrained by copyright. The ruling establishes that transformative use, such as summarizing the themes of Harry Potter or extracting its concepts, is distinct from reproducing the original work, which LLM users rarely seek.
The decision raises unresolved questions about video training data. If book training qualifies as fair use, the same logic could extend to video models like OpenAI's Sora training on YouTube content. The speakers also note that rights holders may seek licensing deals with AI labs as an alternative. Publishers like Simon & Schuster or Penguin Random House could grant deeper access in exchange for per-query payments.
Authors filed the case against Anthropic, which is backed by Amazon and Alphabet, over the company's use of books to train Claude without compensation.