Zuckerberg approved training Llama on LibGen [pdf]

Meta, LibGen, and the LLaMA Lawsuit

  • Thread centers on court filings showing that Meta leadership approved downloading data from LibGen (a shadow library of pirated books) for LLaMA training.
  • Some see this as straightforward, large‑scale, willful copyright infringement (downloading and distributing pirated works).
  • Others argue the key unresolved legal question is whether training on such data (as opposed to outputting it) violates copyright.

Copyright, Fair Use, and Model Training

  • One side:
    • Training on unlicensed, pirated content is no different from any other copyright violation.
    • Evidence of torrenting and seeding is particularly damning.
    • “Free to use” models still underpin commercial products, so noncommercial rhetoric is irrelevant.
  • Other side:
    • Models are not archives or compressed copies of the training set; weights are tiny relative to the input data.
    • The real legal issue is reproducing copyrighted text in outputs, not ingesting it.
    • Training is likened to a human learning from books, which is not restricted.

Big Tech vs Big Copyright and Power Asymmetry

  • Many highlight perceived hypocrisy: big tech aggressively enforces its own IP while ignoring others’.
  • Some expect an eventual narrow “AI training exemption” or compulsory licensing regime that entrenches big players and harms smaller competitors.
  • Commenters compare this with other platforms (YouTube, Spotify, Reddit, Google Books), where initial piracy or uncompensated use eventually led to negotiated deals.

Shadow Libraries and Access to Knowledge

  • LibGen and similar sites are praised as de facto global research libraries, especially where paywalls and high per‑article prices block access.
  • Frustration that individuals have been heavily punished for similar behavior, while corporations quietly exploit the same resources.
  • Repeated references to past prosecutions over academic journal downloads to highlight “free for me, not for thee.”

Economic and Social Fallout

  • Concerns about creators’ livelihoods if training on copyrighted works is free and widespread.
  • Others argue royalties are already negligible in a saturated attention economy and that copyright has been eroding since the rise of the internet.
  • Broader anxiety about AI, inequality, and whether responses like UBI or stronger IP enforcement are viable or will just benefit elites.