A federal judge sides with Anthropic in lawsuit over training AI on books
Scope of the ruling: training vs piracy
- Many commenters read the decision as follows:
  - Training LLMs on copyrighted books can be fair use if the use is transformative and the model isn’t a market substitute for the works.
  - Acquiring books via piracy is not fair use; the judge calls that “inherently, irredeemably infringing,” and leaves damages for a separate trial.
- Several see the ruling as analogous to Google Books: destructively scanning purchased books and storing the full text is allowed as long as downstream access is constrained.
Transformative use and human analogies
- Supporters argue training is like a person reading books and forming internal representations, then creating new works; copyright protects expression, not ideas or knowledge.
- Critics respond that an LLM is “a tool, not a person,” and that calling its learning “reading” is anthropomorphism; a model is a machine built from copyrighted works.
- Scale is a key counterargument: a human can’t memorize or reproduce millions of works; LLMs can approximate that at industrial scale.
Memorization, outputs, and open weights
- The order assumes, for the sake of argument, that substantial memorization occurred, but still finds training to be fair use when outputs are filtered to prevent verbatim reproduction, analogizing to Google’s snippet limits.
- This worries some: hosted, filtered models may be safe, but open-weight models might be vulnerable if users can extract memorized text.
- Others point to a separate case holding that model weights themselves are not infringing derivative works; infringement turns on specific outputs and uses.
Contracts, licenses, and “no AI training” clauses
- Commenters debate whether publishers can block training via contract terms or EULAs.
- Physical books generally carry no enforceable license terms beyond copyright itself; ebooks and databases, which usually come with license terms, are different.
- Fair use can remove the need for a license, but it does not override a signed contract; breach-of-contract remedies would be separate from copyright.
Economic and ethical concerns
- Skeptics see “plagiarism automation at scale”: a small number of firms monetize the distilled product of billions of human hours without compensation, potentially chilling future creation and driving DRM and information silos.
- Others emphasize copyright’s constitutional purpose (promoting progress, not guaranteeing payment for every use) and warn against using copyright to halt a broadly useful technology.
- Some propose intermediate solutions, such as an “LLM levy” analogous to cassette-copying royalties, with pooled payments to rights holders.