Judge dismisses majority of GitHub Copilot copyright claims
Scope of the ruling
- Commenters note the judge mainly dismissed the DMCA “copyright management information” (CMI) claims (17 U.S.C. § 1202), not all copyright issues.
- Two claims survive: breach of contract and open‑source license violations; some see these as potentially important but legally weaker.
- Several emphasize the decision is about output behavior, not definitively about the legality of training on copyrighted code.
Reproduction vs training-time infringement
- Many argue copyright law is primarily about unauthorized duplication, not merely reading or accessing works.
- One view: Copilot rarely emits memorized code in “benign” situations, so plaintiffs struggled to show specific, infringing reproductions with removed CMI.
- Counter‑view: even rare verbatim regurgitation matters; if a model can output copyrighted code, both provider and user may face infringement claims.
Liability: who is responsible?
- One camp: machines have no agency; the human who copies model output into a product is the infringer, analogous to copying from Stack Overflow.
- Another camp: the operator (e.g., the Copilot provider) is distributing copies on request, similar to Napster or other services facilitating mass infringement.
- Some expect corporate tools will add second‑layer scanners to flag outputs that match known copyrighted code.
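As a rough illustration of the "second-layer scanner" idea, the sketch below fingerprints a suggestion as hashed runs of `k` tokens and reports what fraction of those fingerprints also appear in a known source file. Everything here is an assumption for illustration: the function names, the in-memory `corpus` dictionary, the tokenizer, and the `k=5` and `threshold=0.8` defaults are hypothetical and not taken from any real product discussed in the thread.

```python
import hashlib
import re


def tokenize(code: str) -> list[str]:
    # Identifiers, digit runs, and single punctuation characters.
    # Whitespace is dropped, so reformatting alone does not hide a match.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def fingerprints(code: str, k: int = 5) -> set[str]:
    # Hash every run of k consecutive tokens (a "shingle").
    tokens = tokenize(code)
    if len(tokens) < k:
        return set()
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - k + 1)
    }


def containment(suggestion: str, known: str, k: int = 5) -> float:
    # Fraction of the suggestion's shingles that also occur in the known file.
    s = fingerprints(suggestion, k)
    if not s:
        return 0.0
    return len(s & fingerprints(known, k)) / len(s)


def flag_matches(suggestion: str, corpus: dict[str, str],
                 threshold: float = 0.8, k: int = 5) -> list[tuple[str, float]]:
    # Return (name, score) for every known file at or above the threshold,
    # highest score first. `corpus` maps a hypothetical file name to its text.
    scored = [(name, containment(suggestion, src, k)) for name, src in corpus.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: -item[1])


if __name__ == "__main__":
    corpus = {"repo_a/util.py": "def add(a, b):\n    return a + b\n"}
    print(flag_matches("def add(a,b):   return a+b", corpus))
```

Because the comparison is token-based, a suggestion that differs from a known file only in whitespace still scores as a full match, while renamed identifiers would lower the score. A production scanner would need an indexed corpus and a more robust similarity measure; this only shows the shape of the check.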
AI as “copyright laundering”
- Strong worry that LLMs let companies “wash” open‑source and GPL code into proprietary products, selling assistance built on unpaid community labor.
- Others argue someone intent on stealing code can already just download it; using an LLM is a roundabout, weak “loophole”.
Open source, centralization, and power
- Several express betrayal: open‑source contributions now fuel closed, capital‑intensive AI services they can’t replicate or audit.
- There’s concern that AI weakens copyright for individual authors only to re‑centralize control in large firms with the compute to train models.
- Responses range from quitting open source or self‑hosting git, to deliberately using public domain or “no‑restrictions” licensing to support open models.
Human vs machine learning and clean-room analogies
- Long debate over whether “training” is analogous to a human reading code:
  - One side says yes; if humans may learn from code, tools helping them should also be allowed.
  - The other side stresses that legal rights attach to humans, not models; scale and automation change the equation.
- Clean‑room reverse engineering is repeatedly invoked as the traditional, more disciplined way to avoid infringement.
Practical risks and anecdotes
- Multiple anecdotes describe LLMs outputting near‑identical code (including typos) from older online examples, sometimes from repos with no license.
- This leads some developers to use AI only for guidance, not to copy suggested code directly.