Zuckerberg approved training Llama on LibGen [pdf]
Meta, LibGen, and the LLaMA Lawsuit
- Thread centers on court filings showing Meta leadership approved downloading data from LibGen (a shadow library of pirated books) for LLaMA training.
- Some see this as straightforward, large‑scale, willful copyright infringement (downloading and distributing pirated works).
- Others argue the key unresolved legal question is whether training on such data (as opposed to outputting it) violates copyright.
Copyright, Fair Use, and Model Training
- One side:
  - Training on unlicensed, pirated content is no different from any other copyright violation.
  - Evidence of torrenting and seeding is particularly damning.
  - “Free to use” models still underpin commercial products, so noncommercial rhetoric is irrelevant.
- Other side:
  - Models are not archives or compressed copies of the training set; weights are tiny relative to input data.
  - The real legal issue is reproducing copyrighted text in outputs, not ingesting it.
  - Training is likened to a human learning from books, which is not restricted.
Big Tech vs Big Copyright and Power Asymmetry
- Many highlight perceived hypocrisy: big tech aggressively enforces its own IP while ignoring others’.
- Some expect an eventual narrow “AI training exemption” or compulsory licensing regime that entrenches big players and harms smaller competitors.
- Comparisons are drawn with other platforms (YouTube, Spotify, Reddit, Google Books), where initial piracy or uncompensated use eventually led to negotiated deals.
Shadow Libraries and Access to Knowledge
- LibGen and similar sites are praised as de facto global research libraries, especially where paywalls and high per‑article prices block access.
- Frustration that individuals have been heavily punished for similar behavior, while corporations quietly exploit the same resources.
- Repeated references to past prosecutions over academic journal downloads, used to highlight a “free for me, not for thee” double standard.
Economic and Social Fallout
- Concerns about creators’ livelihoods if training on copyrighted works is free and widespread.
- Others argue royalties are already negligible in a saturated attention economy; copyright has been eroding since the internet.
- Broader anxiety about AI, inequality, and whether responses like UBI or stronger IP enforcement are viable or will just benefit elites.