The current state of the theory that GPL propagates to AI models
License vs. copyright, and the role of fair use
- Much of the debate is framed as copyright, not contract: if training is fair use, license terms (GPL, MIT, proprietary) may not bite at all.
- Some argue that in the US, training on legally obtained public material is already treated as fair use, making license type irrelevant.
- Others push back: fair use is US‑specific, limited or absent elsewhere, and not clearly settled for LLMs; litigation is ongoing and outcomes may diverge by domain and jurisdiction.
GPL enforceability and “virality”
- Commenters distinguish enforcing the GPL on GPL code itself (well‑tested) from enforcing its “propagation” to larger combined works (much less tested).
- Several note that the GPL doesn’t magically relicense other code; it simply withholds permission to copy and distribute GPL code unless its conditions are met.
- Enforcement history (BusyBox, FSF v. Cisco, French court judgments) is cited as evidence of the GPL’s robustness, but mostly for straightforward distribution violations, not exotic propagation theories.
Does GPL propagate to models or outputs?
- Many doubt that models trained on GPL code become GPL themselves, or that all outputs inherit GPL terms; that’s seen as an extreme, legally unsupported position.
- Others argue that if a model can reproduce GPL’d code (or large chunks of copyrighted text) verbatim on demand, that looks like copying, not mere “learning” (a detection sketch follows this list).
- Analogy disputes: some equate training to humans learning from code; others stress that LLMs are stored, redistributable artifacts, unlike human brains.
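The “reproduction on demand” claim is testable in principle: scan model output for long verbatim overlaps with a known corpus of GPL code. Below is a minimal sketch of such a check; the `find_verbatim_overlaps` helper, the window size, and the sample inputs are illustrative assumptions, not an established tool.

```python
def ngrams(tokens, n):
    """Yield every contiguous n-token window of a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def find_verbatim_overlaps(model_output, gpl_corpus, n=20):
    """Return windows of model_output that appear verbatim in gpl_corpus.

    Long shared windows (tens of tokens of code) suggest copying
    rather than independently generated text.
    """
    corpus_grams = set()
    for doc in gpl_corpus:
        corpus_grams.update(ngrams(doc.split(), n))
    return [" ".join(g) for g in ngrams(model_output.split(), n)
            if g in corpus_grams]

# Hypothetical usage: in practice gpl_corpus would hold real licensed files.
gpl_corpus = ['int main ( void ) { puts ( "hello" ) ; return 0 ; }']
output = 'here is a program : int main ( void ) { puts ( "hello" ) ; return 0 ; }'
print(find_verbatim_overlaps(output, gpl_corpus, n=8))
```

Whitespace tokenization is crude for source code; a real analysis would normalize identifiers and formatting, since trivial renaming defeats exact matching.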
New license ideas and free‑software tensions
- Proposals include licenses that forbid AI training entirely, or allow it only if resulting models and weights are open.
- Critics say such clauses would violate “freedom 0” (the freedom to run the program for any purpose) and likely render the license non‑free; under GPLv3 they might also count as prohibited “further restrictions.”
- Others suspect courts would treat anti‑training clauses as void where training is fair use, or require contract‑style click‑through instead of pure copyright licenses.
Proof, training data, and “copyright laundering”
- A recurring concern: models act as “copyright‑laundering machines” – mining open and copyleft code into proprietary services with little traceability.
- People ask how to prove a model used GPL/AGPL data, and conversely how to prove that particular outputs are clean.
- Suggested mechanisms: discovery in litigation, training‑data disclosure mandates, model inversion / extraction research, or requiring published datasets (one possible shape of disclosure is sketched below).
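As a concrete illustration of the disclosure idea, a trainer could publish compact fingerprints of the training set instead of the raw data, letting a rights‑holder later test whether their file was ingested. A minimal sketch, assuming SHA‑256 over windows of normalized source lines; the shingle size and threshold are illustrative choices, not a proposed standard.

```python
import hashlib

def fingerprint(source_text, shingle=5):
    """Hash every `shingle`-line window of normalized source text.

    Publishing these digests (not the code itself) would let a
    rights-holder ask "was my file in the training set?" without
    the trainer revealing the corpus.
    """
    lines = [ln.strip() for ln in source_text.splitlines() if ln.strip()]
    if len(lines) < shingle:
        windows = ["\n".join(lines)] if lines else []
    else:
        windows = ["\n".join(lines[i:i + shingle])
                   for i in range(len(lines) - shingle + 1)]
    return {hashlib.sha256(w.encode()).hexdigest() for w in windows}

def likely_ingested(my_file_text, published_digests, threshold=0.5):
    """True if a large fraction of the file's shingles match the published set."""
    mine = fingerprint(my_file_text)
    return bool(mine) and len(mine & published_digests) / len(mine) >= threshold
```

Hash‑based fingerprints only catch near‑verbatim inclusion; they say nothing about paraphrased or transformed training data, which is exactly where the legal questions get hard.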
Policy, reform, and community reaction
- Some want legislative clarification or shorter copyright terms plus opt‑in public datasets with royalties.
- Others distrust new legislation, pointing to DMCA‑style regulatory capture by large firms, and prefer letting courts refine fair‑use boundaries.
- There is visible disillusionment: some have stopped contributing to OSS, feeling their licenses are simply ignored; others embrace LLMs as transformative productivity tools, deepening the values split inside the developer community.