Anthropic judge rejects $1.5B AI copyright settlement

What the Case Is Actually About

  • Multiple commenters stress that this suit is not about training on copyrighted books in general.
  • The judge has already ruled that using purchased and scanned books for training is fair use; the problem is Anthropic downloading pirated copies (LibGen, Pirate Library Mirror) and keeping them in a “central library.”
  • The alleged infringement is at procurement / library creation time, not at model-training time. Whether using pirated copies for training is fair use is described as ambiguous or unresolved in this ruling.

Judge’s Rejection of the Settlement

  • The settlement was rejected “without prejudice” mainly for procedural reasons, not because the dollar amount is clearly too low or too high.
  • Concerns raised:
    • How authors are notified, how they file claims, and how payments are administered.
    • Whether Anthropic is properly protected from later, duplicative suits (“double dipping”).
    • Whether lawyers’ fees will consume too much of the $1.5B pool.
  • Commenters expect the parties could fix these issues without changing the per-book amount.

Is ~$3,000 per Book Fair?

  • One author in the thread (with 3 included books) feels ~$9k total is fair, especially for titles with low advances that never earned out.
  • Others argue it’s too low relative to statutory damages (up to $150k per work for willful infringement), the value of models built on the corpus, and the deterrence needed so companies don’t just “steal first, pay later.”
  • Some see it as a windfall: compared to a $30 book, ~$3k/book is ~100×, clearly punitive for one pirated copy.
  • Disputes arise over who should get the money (authors vs publishers; impact of advances and rights arrangements).
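
The figures commenters trade in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the approximate numbers quoted in the thread, and the implied book count assumes the whole pool is split evenly at ~$3,000 per book with nothing deducted for lawyers' fees, which is an assumption, not a term of the settlement.

```python
# Approximate figures quoted in the thread (not exact settlement terms).
SETTLEMENT_POOL = 1_500_000_000   # $1.5B total pool
PER_BOOK = 3_000                  # ~$3,000 per book
RETAIL_PRICE = 30                 # the $30 book used in the "windfall" comparison
STATUTORY_MAX_WILLFUL = 150_000   # statutory ceiling cited by commenters


def author_payout(num_books: int, per_book: int = PER_BOOK) -> int:
    """Gross payout for an author with `num_books` included titles."""
    return num_books * per_book


# ASSUMPTION: the pool is divided evenly and no fees come out of it.
implied_books = SETTLEMENT_POOL // PER_BOOK

# How the per-book amount compares with the two reference points in the thread.
retail_multiple = PER_BOOK / RETAIL_PRICE
share_of_statutory_max = PER_BOOK / STATUTORY_MAX_WILLFUL

print(f"Author with 3 books: ${author_payout(3):,}")
print(f"Implied books covered: {implied_books:,}")
print(f"Multiple of a ${RETAIL_PRICE} retail price: {retail_multiple:.0f}x")
print(f"Share of the statutory maximum: {share_of_statutory_max:.0%}")
```

Both camps are reading the same number against different baselines: the per-book amount is about 100 times a retail copy but only about 2% of the statutory ceiling.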

Copying, Fair Use, and “Statistical” Learning

  • A long subthread debates whether training is “copying” or merely learning statistics:
    • One side: training explicitly reproduces sequences during optimization; models can regurgitate text and code; this is causally linked to underlying infringement.
    • Other side: proper training is about aggregate statistics, not exact memorization; accidental verbatim output is overfitting, not the intent.
  • Analogy battles: pirated Photoshop used to make a game; humans imitating style; music “substantial similarity” cases; whether style vs expression is protectable.
  • Some insist copyright should hinge on outputs (substantial similarity), not internal representations; others say illegal acquisition itself is enough to trigger liability.

Humans vs Machines

  • One camp warns: if “learning from copyrighted works” is treated as infringement, it logically extends to humans and would destroy normal artistic practice.
  • The opposing view: the law can, and should, treat corporate AI systems differently from human creators; scale, profit motive, and ambitions to replace all creative work matter.

Broader IP and AI Concerns

  • Philosophical split:
    • Anti-IP voices say copyright is overlong, protects incumbents, and isn’t needed for creativity in many domains.
    • Pro-IP voices argue that large, risky investments (drugs, blockbuster films, complex software) depend on enforceable rights.
  • Some predict generative AI will erode markets for books, news, and other writing by capturing value without paying sources; others doubt it has meaningfully replaced books for them.