Anthropic judge rejects $1.5B AI copyright settlement
What the Case Is Actually About
- Multiple commenters stress that this suit is not about training on copyrighted books in general.
- The judge has already ruled that using purchased and scanned books for training is fair use; the problem is that Anthropic downloaded pirated copies (LibGen, Pirate Library Mirror) and kept them in a “central library.”
- The alleged infringement is at procurement / library-creation time, not at model-training time. Whether using pirated copies for training is fair use is described as ambiguous or unresolved in this ruling.
Judge’s Rejection of the Settlement
- The settlement was rejected “without prejudice” mainly for procedural reasons, not because the dollar amount is clearly too low or too high.
- Concerns raised:
  - How authors are notified, how they file claims, and how payments are administered.
  - Whether Anthropic is properly protected from later, duplicative suits (“double dipping”).
  - Whether lawyers’ fees will consume too much of the $1.5B pool.
- Commenters expect that the parties can fix these issues without changing the per-book amount.
Is ~$3,000 per Book Fair?
- One author in the thread (with 3 included books) feels ~$9k total is fair, especially for titles with low advances that never earned out.
- Others argue it’s too low relative to statutory damages (up to $150k per work for willful infringement), the value of the models built on the corpus, and the deterrence needed so companies don’t just “steal first, pay later.”
- Some see it as a windfall: at roughly 100× the price of a ~$30 retail copy, ~$3k per book looks clearly punitive for a single pirated copy (rough arithmetic is sketched after this list).
- Disputes arise over who should get the money (authors vs publishers; impact of advances and rights arrangements).
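A rough back-of-the-envelope sketch of the figures the thread trades in; all numbers are the approximate ones quoted above, not values taken from the settlement papers.

```python
# Rough arithmetic behind the thread's figures (approximate numbers from the
# discussion above, not from the settlement documents themselves).
settlement_fund = 1_500_000_000      # reported ~$1.5B fund
per_work = 3_000                     # reported ~$3,000 per included book

print(f"implied works covered:  ~{settlement_fund / per_work:,.0f}")  # ~500,000
print(f"author with 3 books:    ~${3 * per_work:,}")                  # ~$9,000

retail_price = 30                    # the ~$30 retail book used as a comparison
statutory_max = 150_000              # statutory ceiling for willful infringement
print(f"multiple of retail:     ~{per_work / retail_price:.0f}x")     # ~100x
print(f"share of statutory cap: ~{per_work / statutory_max:.0%}")     # ~2%
```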
Copying, Fair Use, and “Statistical” Learning
- A long subthread debates whether training is “copying” or merely learning statistics:
  - One side: training explicitly reproduces the protected sequences during optimization, models can regurgitate text and code, and that output is causally linked to the underlying infringement.
  - Other side: proper training captures aggregate statistics, not exact memorization; accidental verbatim output is overfitting, a failure mode rather than the goal (a toy sketch of this tension follows the list).
- Analogy battles: pirated Photoshop used to make a game; humans imitating style; music “substantial similarity” cases; whether style, as opposed to expression, is protectable.
- Some insist copyright should hinge on outputs (substantial similarity), not internal representations; others say illegal acquisition by itself is enough to trigger liability.
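A toy sketch of the tension in that subthread, not a claim about how Anthropic trains its models: a bigram “model” whose only parameters are aggregate next-word counts, yet which regurgitates a verbatim span of its tiny training text under greedy decoding.

```python
# Toy bigram language model: the learned parameters are aggregate next-word
# counts (the "just statistics" view), yet with a tiny corpus greedy decoding
# regurgitates a verbatim span of the training text (the "memorization" view).
from collections import Counter, defaultdict

corpus = "the quick brown fox jumps over the lazy dog".split()

# "Training": accumulate next-word frequencies; no verbatim text is stored as such.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, steps=6):
    """Greedy decoding: always pick the most frequent continuation."""
    out = [start]
    for _ in range(steps):
        options = counts.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # "the quick brown fox jumps over the" -- a verbatim prefix
```

With a large and varied corpus the same counting scheme stops tracking any single source, which is roughly the overfitting point the “statistics” camp is making.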
Humans vs Machines
- One camp warns: if “learning from copyrighted works” is treated as infringement, it logically extends to humans and would destroy normal artistic practice.
- The opposing view: the law can, and should, treat corporate AI systems differently from human creators; scale, profit motive, and ambitions to replace creative work wholesale all matter.
Broader IP and AI Concerns
- Philosophical split:
  - Anti-IP voices say copyright is overlong, protects incumbents, and isn’t needed for creativity in many domains.
  - Pro-IP voices argue that large, risky investments (drugs, blockbuster films, complex software) depend on enforceable rights.
- Some predict generative AI will erode markets for books, news, and other writing by capturing value without paying sources; others doubt it has meaningfully replaced books for them.