A federal judge sides with Anthropic in lawsuit over training AI on books

Scope of the ruling: training vs piracy

  • Many commenters read the decision as making two distinct holdings:
    • Training LLMs on copyrighted books can be fair use if the use is transformative and the model isn’t a market substitute for the works.
    • Acquiring books via piracy is not fair use; the judge calls that “inherently, irredeemably infringing,” and leaves damages for a separate trial.
  • Several see this as analogous to Google Books: destructive scanning of purchased books and storing full text is allowed if downstream access is constrained.

Transformative use and human analogies

  • Supporters argue training is like a person reading books and forming internal representations, then creating new works; copyright protects expression, not ideas or knowledge.
  • Critics respond that an LLM is “a tool, not a person,” and that calling its learning “reading” is anthropomorphism; a model is a machine built from copyrighted works.
  • Scale is a key counterargument: a human can’t memorize or reproduce millions of works, whereas an LLM can approximate that at industrial scale.

Memorization, outputs, and open weights

  • The order assumes, for the sake of argument, that the model memorized the works substantially, but still finds training to be fair use when outputs are filtered to prevent verbatim reproduction, analogizing to Google’s snippet limits.
  • Commenters disagree about what this means for open weights:
    • Hosted, filtered models may be safe, but open-weight models might be vulnerable if users can extract memorized text.
    • Others point to a separate case holding that model weights themselves are not infringing derivative works; infringement turns on specific outputs and uses.

Contracts, licenses, and “no AI training” clauses

  • Commenters debate whether publishers can block training via contract terms or EULAs.
  • Physical books generally carry no enforceable license terms beyond copyright itself; ebooks and databases are different because they usually come with license terms.
  • Fair use can remove the need for a copyright license, but it does not override a signed contract; breach-of-contract remedies would be separate from copyright.

Economic and ethical concerns

  • Skeptics see “plagiarism automation at scale”: a small number of firms monetize the distilled product of billions of human hours without compensation, potentially chilling future creation and driving DRM and information silos.
  • Others emphasize copyright’s constitutional purpose (promote progress, not guarantee pay for every use) and warn against using copyright to halt a broadly useful technology.
  • Some propose intermediate solutions, like an “LLM levy” analogous to cassette-copying royalties, with pooled payments to rights holders.