Judge said Meta illegally used books to build its AI
Status of the Case & Headline Dispute
- Commenters stress the judge has not ruled; this was a pretrial hearing on summary judgment.
- The original Wired title framed the case around “the next Taylor Swift,” while the HN title implies a ruling that hasn’t happened.
- The judge appears focused on economic harm: plaintiffs must show Meta’s models plausibly reduce sales or market value of their works, not just that training “feels wrong.”
Harm, Markets, and Substitution
- Many see the plaintiffs’ focus on “lost book sales” as a weak theory of harm, especially given current AI fiction quality.
- Others argue the real harm is long‑term erosion of creative livelihoods and diversion of readers’ attention toward platform content (feeds, chats, AI outputs), which is hard to quantify.
- Commenters debate whether LLM summaries or “books in the style of X” materially substitute for reading or buying the originals. Analogies invoked: Reader’s Digest, Wikipedia, book‑summary sites.
Fair Use, Copying, and Training vs Outputs
- One camp: training is a transformative, intermediate use akin to search indexing or humans learning; infringement, if any, arises only when outputs substantially reproduce a protected work.
- Opposing camp: copying entire books (especially from pirate sources) to train a commercial model is itself infringement, regardless of what the model later emits. Napster/DVD‑copying comparisons recur.
- Long subthread on whether LLMs are “just tools” or effectively a lossy compression of the corpus, capable of verbatim regurgitation; this matters for whether model weights are “copies.”
- Human‑learning analogies are attacked as legally irrelevant, since brains aren’t regulated as fixed media under copyright.
Piracy Sources and Double Standards
- Strong criticism that Meta allegedly used LibGen/Books3–type corpora: “ordinary” downloaders were punished for similar behavior, yet big firms seek fair‑use shelter.
- Others counter that the primary infringers are the uploaders/hosts, and that merely downloading (without seeding) was rarely prosecuted.
Policy, Power, and Global Competition
- Some want drastic copyright reform or abolition; others are furious that corporations may get exceptions while individuals remain exposed.
- Several predict stricter Western rules will advantage Chinese models trained on “everything,” pushing innovation offshore.
Proposed Directions & Workarounds
- Suggestions include:
  - Training only on licensed, open, or public‑domain corpora; emerging “legal” foundation models.
  - Content‑ID‑like systems for AI outputs, with revenue sharing.
  - Per‑book or catalog licenses for training, priced by the market.
  - Tools to trace outputs back to training data and enforce attribution or opt‑out.