The current state of the theory that the GPL propagates to AI models

License vs. copyright, and the role of fair use

  • Much of the debate is framed as a copyright question, not a contract question: if training is fair use, license terms (GPL, MIT, proprietary) may not bite at all.
  • Some argue that in the US, training on legally obtained public material is already treated as fair use, making license type irrelevant.
  • Others push back: fair use is US‑specific, limited or absent elsewhere, and not clearly settled for LLMs; litigation is ongoing and outcomes may diverge by domain and jurisdiction.

GPL enforceability and “virality”

  • Commenters distinguish between enforcing the GPL on GPL code itself (well‑tested) and enforcing its “propagation” to larger combined works (much less tested).
  • Several note that the GPL doesn’t magically relicense other code; it simply withholds permission to use the GPL code unless its distribution conditions are met.
  • Enforcement history (BusyBox, Cisco, French judgments) is cited as evidence of the GPL’s robustness, but it mostly covers straightforward distribution violations, not exotic propagation theories.

Does GPL propagate to models or outputs?

  • Many doubt that models trained on GPL code become GPL themselves, or that all outputs inherit GPL terms; that’s seen as an extreme, legally unsupported position.
  • Others argue that if a model can reproduce GPL’d code (or large chunks of copyrighted text) on demand, that looks like copying, not mere “learning” (a detection sketch follows this list).
  • Analogy disputes: some equate training with humans learning from code; others stress that LLMs are stored, redistributable artifacts, unlike human brains.
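
A minimal sketch of what the “reproduce on demand” test could look like in practice, in Python: compare a model output against a known GPL source file and measure the longest verbatim run. The helper names and the ~200‑character threshold are illustrative assumptions, not legal standards or an established tool.

    # Sketch: measure verbatim overlap between a model output and a known
    # GPL-licensed source file. All thresholds are illustrative assumptions.
    from difflib import SequenceMatcher

    def longest_verbatim_run(model_output: str, gpl_source: str) -> int:
        """Length, in characters, of the longest block shared verbatim."""
        m = SequenceMatcher(None, model_output, gpl_source, autojunk=False)
        match = m.find_longest_match(0, len(model_output), 0, len(gpl_source))
        return match.size

    def looks_like_copying(model_output: str, gpl_source: str,
                           min_run: int = 200) -> bool:
        # Hypothetical cutoff: a ~200-character identical span of
        # non-trivial code is hard to explain as independent re-creation.
        return longest_verbatim_run(model_output, gpl_source) >= min_run

Real extraction studies use subtler measures (token‑level matching, normalization of whitespace and identifiers), but the underlying question is the same one commenters raise: is the output substantially a copy rather than something newly generated?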

New license ideas and free‑software tensions

  • Proposals include licenses that forbid AI training entirely, or that allow it only if the resulting models and weights are open.
  • Critics say such clauses would violate “freedom 0” (the freedom to run the program for any purpose) and likely make the license non‑free; under GPLv3 they might also count as impermissible “further restrictions.”
  • Others suspect courts would treat anti‑training clauses as void wherever training is fair use, or that such terms would need contract‑style click‑through agreements rather than pure copyright licenses to hold up.

Proof, training data, and “copyright laundering”

  • A recurring concern: models act as “copyright‑laundering machines,” mining open and copyleft code into proprietary services with little traceability.
  • People ask how to prove a model used GPL/AGPL data, and conversely how to prove that particular outputs are clean.
  • Suggested mechanisms: discovery in litigation, training‑data disclosure mandates, model inversion / extraction research, or requiring published datasets (one screening approach is sketched below).
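
If training datasets were published, one way to screen an output for “cleanliness” would be an n‑gram index over the copyleft portion of the corpus. A minimal sketch, assuming whitespace tokenization and n = 8; both choices, and every name here, are illustrative assumptions rather than an established auditing method.

    # Sketch: index every n-gram of a published copyleft corpus, then
    # report what fraction of an output's n-grams appear in that index.
    # The tokenizer (str.split) and n = 8 are illustrative assumptions.
    from typing import Iterable, Iterator

    def ngrams(tokens: list[str], n: int = 8) -> Iterator[tuple[str, ...]]:
        # Sliding window of n consecutive tokens.
        return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def build_index(corpus_texts: Iterable[str], n: int = 8) -> set[tuple[str, ...]]:
        """Collect every n-gram of whitespace-tokenized source in the corpus."""
        index: set[tuple[str, ...]] = set()
        for text in corpus_texts:
            index.update(ngrams(text.split(), n))
        return index

    def overlap_ratio(output: str, index: set[tuple[str, ...]], n: int = 8) -> float:
        """Fraction of the output's n-grams found in the indexed corpus."""
        grams = list(ngrams(output.split(), n))
        return sum(g in index for g in grams) / len(grams) if grams else 0.0

A high ratio does not prove infringement, and a low one does not prove independence (renamed identifiers defeat exact matching), which is why commenters point to discovery and extraction research rather than any single test.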

Policy, reform, and community reaction

  • Some want legislative clarification, or shorter copyright terms combined with opt‑in public datasets that pay royalties.
  • Others distrust new laws, pointing to DMCA‑style capture by large firms, and would rather let courts refine fair‑use boundaries.
  • There is visible disillusionment: some stop contributing to OSS, feeling their licenses are ignored; others embrace LLMs as transformative productivity tools, deepening the values split within the developer community.