Anthropic judge rejects $1.5B AI copyright settlement
What the Case Is Actually About
- Multiple commenters stress that this suit is not about training on copyrighted books in general.
- The judge has already ruled that using purchased and scanned books for training is fair use; the problem is that Anthropic downloaded pirated copies (LibGen, Pirate Library Mirror) and kept them in a “central library.”
- The alleged infringement is at procurement / library-creation time, not at model-training time. Whether using pirated copies for training is fair use is described as ambiguous or unresolved in this ruling.
Judge’s Rejection of the Settlement
- The settlement was rejected “without prejudice” mainly for procedural reasons, not because the dollar amount is clearly too low or too high.
- Concerns raised:
  - How authors are notified, how they file claims, and how payments are administered.
  - Whether Anthropic is properly protected from later, duplicative suits (“double dipping”).
  - Whether lawyers’ fees will consume too much of the $1.5B pool.
- Commenters expect that the parties can fix these issues without changing the per-book amount.
Is ~$3,000 per Book Fair?
- One author in the thread (with 3 included books) feels ~$9k total is fair, especially for titles with low advances that never earned out.
- Others argue it’s too low relative to statutory damages (up to $150k per work for willful infringement), the value of the models built on the corpus, and the deterrence needed so companies don’t just “steal first, pay later.”
- Some see it as a windfall: at roughly 100× the price of a ~$30 retail copy, ~$3k per book looks clearly punitive for a single pirated copy (rough arithmetic is sketched after this list).
- Disputes arise over who should get the money (authors vs publishers; impact of advances and rights arrangements).
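A rough back-of-the-envelope sketch of the figures the thread trades in; all numbers are the approximate ones quoted above, not values taken from the settlement papers.

```python
# Rough arithmetic behind the thread's figures (approximate numbers from the
# discussion above, not from the settlement documents themselves).
settlement_fund = 1_500_000_000      # reported ~$1.5B fund
per_work = 3_000                     # reported ~$3,000 per included book

print(f"implied works covered:  ~{settlement_fund / per_work:,.0f}")  # ~500,000
print(f"author with 3 books:    ~${3 * per_work:,}")                  # ~$9,000

retail_price = 30                    # the ~$30 retail book used as a comparison
statutory_max = 150_000              # statutory ceiling for willful infringement
print(f"multiple of retail:     ~{per_work / retail_price:.0f}x")     # ~100x
print(f"share of statutory cap: ~{per_work / statutory_max:.0%}")     # ~2%
```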
Copying, Fair Use, and “Statistical” Learning
- A long subthread debates whether training is “copying” or merely learning statistics:
  - One side: training explicitly reproduces the protected sequences during optimization, models can regurgitate text and code, and that output is causally linked to the underlying infringement.
  - Other side: proper training captures aggregate statistics, not exact memorization; accidental verbatim output is overfitting, a failure mode rather than the goal (a toy sketch of this tension follows the list).
- Analogy battles: pirated Photoshop used to make a game; humans imitating style; music “substantial similarity” cases; whether style, as opposed to expression, is protectable.
- Some insist copyright should hinge on outputs (substantial similarity), not internal representations; others say illegal acquisition by itself is enough to trigger liability.
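A toy sketch of the tension in that subthread, not a claim about how Anthropic trains its models: a bigram “model” whose only parameters are aggregate next-word counts, yet which regurgitates a verbatim span of its tiny training text under greedy decoding.

```python
# Toy bigram language model: the learned parameters are aggregate next-word
# counts (the "just statistics" view), yet with a tiny corpus greedy decoding
# regurgitates a verbatim span of the training text (the "memorization" view).
from collections import Counter, defaultdict

corpus = "the quick brown fox jumps over the lazy dog".split()

# "Training": accumulate next-word frequencies; no verbatim text is stored as such.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, steps=6):
    """Greedy decoding: always pick the most frequent continuation."""
    out = [start]
    for _ in range(steps):
        options = counts.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # "the quick brown fox jumps over the" -- a verbatim prefix
```

With a large and varied corpus the same counting scheme stops tracking any single source, which is roughly the overfitting point the “statistics” camp is making.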
Humans vs Machines
- One camp warns: if “learning from copyrighted works” is treated as infringement, it logically extends to humans and would destroy normal artistic practice.
- The opposing view: the law can, and should, treat corporate AI systems differently from human creators; scale, profit motive, and ambitions to replace creative work wholesale all matter.
Broader IP and AI Concerns
- Philosophical split:
  - Anti-IP voices say copyright is overlong, protects incumbents, and isn’t needed for creativity in many domains.
  - Pro-IP voices argue that large, risky investments (drugs, blockbuster films, complex software) depend on enforceable rights.
- Some predict generative AI will erode markets for books, news, and other writing by capturing value without paying sources; others doubt it has meaningfully replaced books for them.