Thomson Reuters wins first major AI copyright case in the US

What the case was actually about

  • Ross Intelligence built a legal search tool meant to substitute for Westlaw.
  • Ross used Westlaw headnotes and key‑number indexing as training data, via human “translations” of those headnotes.
  • The court held that Westlaw’s headnotes are individually copyrightable works, not mere uncopyrightable facts or raw case law.
  • Ross’s system was non‑generative: essentially a semantic search engine built on that headnote‑derived data, returning existing case opinions rather than new text.

Fair use analysis and copyrightability

  • The judge emphasized purpose and market effect (the first and fourth fair‑use factors): Ross meant to create a cheaper market substitute using Westlaw’s value‑added annotations.
  • That intent and commercial competition weighed heavily against fair use, even though end users did not see verbatim headnotes.
  • The opinion analogizes headnote selection to sculpture: choosing which parts of an opinion to quote or summarize is a creative act.
  • Some commenters think extending copyright to “selection” of quotes is overbroad and likely vulnerable on appeal; others say it fits existing doctrine that creativity, not “sweat of the brow,” is what matters.

Implications for AI and LLM training

  • One camp sees this as a narrow ruling about copying proprietary summaries to build a directly competing search service, not about broad LLM training.
  • Another camp thinks the fair‑use reasoning (non‑transformative, market‑substituting use of copyrighted inputs) is a worrying precedent for generative AI trained on news, books, art, etc.
  • Commenters split over whether training itself is infringement or whether only distribution and outputs matter; the thread reaches no consensus.
  • Some note that if generative AIs exist mainly to be cheaper substitutes for human creators, that undercuts fair‑use arguments.

Power, open source, and future licensing regimes

  • Many expect large AI vendors to pivot to licensing major corpora, further entrenching big tech and legacy media, and locking out open‑source and small players.
  • Others argue that licensing “everything” is practically impossible; this pressure might force new law (e.g., compulsory licensing/collecting societies for training data).
  • Several commenters explicitly prefer strong enforcement, even if it slows or restricts AI, to prevent uncompensated mass appropriation.

Broader concerns and side discussions

  • Some worry that non‑Western models will ignore copyright and outpace Western systems.
  • Analogies drawn to Google News snippets, phone books, court reporters, and educational fair use.
  • Some argue this is “good for humans” and creators; others see it as shifting AI power from startups and open communities to a few well‑capitalized firms.