2024-08-28

Judge dismisses majority of GitHub Copilot copyright claims

Scope of the ruling

Commenters note that the judge mainly dismissed the DMCA “copyright management information” (CMI) claims under 17 U.S.C. § 1202; the ruling does not resolve every copyright issue.
Two claims survive: breach of contract and open‑source license violations; some see these as potentially important but legally weaker.
Several emphasize that the decision concerns Copilot's output behavior and does not settle whether training on copyrighted code is legal.

Reproduction vs training-time infringement

Many argue copyright law is primarily about unauthorized duplication, not merely reading or accessing works.
One view: Copilot rarely emits memorized code in “benign” situations, so plaintiffs struggled to show specific, infringing reproductions with removed CMI.
Counter‑view: even rare verbatim regurgitation matters; if a model can output copyrighted code, both provider and user may face infringement claims.

Liability: who is responsible?

One camp: machines have no agency; the human who copies model output into a product is the infringer, analogous to copying from Stack Overflow.
Another camp: the operator (e.g. Copilot provider) is distributing copies on request, similar to Napster or other services facilitating mass infringement.
Some expect corporate tools will add second‑layer scanners to flag outputs that match known copyrighted code.
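As a rough illustration of what such a second-layer scanner might do, the sketch below compares a suggestion against a small reference corpus using overlapping token windows ("shingles"). Everything here is an assumption made for illustration: the function names, the window size k, the 0.8 threshold, and the in-memory corpus are hypothetical, and nothing in the thread describes how any real product implements this.

```python
import re

def tokenize(code):
    # Split code into identifiers, numbers, and single punctuation
    # characters, so whitespace and line breaks do not affect matching.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def shingles(tokens, k=5):
    # Every run of k consecutive tokens, as a set of tuples.
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def containment(output, reference, k=5):
    # Fraction of the output's shingles that also occur in the reference.
    out = shingles(tokenize(output), k)
    if not out:
        return 0.0  # output too short to judge
    ref = shingles(tokenize(reference), k)
    return len(out & ref) / len(out)

def flag_matches(output, corpus, threshold=0.8, k=5):
    # Hypothetical policy: flag any corpus entry that covers at least
    # `threshold` of the output's shingles.
    return [name for name, ref in corpus.items()
            if containment(output, ref, k) >= threshold]

corpus = {"example/add.py": "def add(a, b):\n    return a + b\n"}
print(flag_matches("def add(a,b):  return a+b", corpus))
print(flag_matches("for i in range(10): print(i)", corpus))
```

Containment (rather than symmetric similarity) is used so that a short suggestion copied out of a much larger file still scores highly. A scanner like this only catches near-verbatim matches; renamed identifiers or reordered statements would slip through.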

AI as “copyright laundering”

Strong worry that LLMs let companies “wash” open‑source and GPL code into proprietary products, selling assistance built on unpaid community labor.
Others argue someone intent on stealing code can already just download it; using an LLM is a roundabout, weak “loophole”.

Open source, centralization, and power

Several express betrayal: open‑source contributions now fuel closed, capital‑intensive AI services they can’t replicate or audit.
There’s concern that AI erodes copyright protections only to re‑centralize control in the large firms that have the compute to train models.
Responses range from quitting open source or self‑hosting git, to deliberately using public domain or “no‑restrictions” licensing to support open models.

Human vs machine learning and clean-room analogies

Long debate over whether “training” is analogous to a human reading code:
- One side says yes; if humans may learn from code, tools helping them should also be allowed.
- The other side stresses that legal rights attach to humans, not models; scale and automation change the equation.
Clean‑room reverse engineering is repeatedly invoked as the traditional, more disciplined way to avoid infringement.

Practical risks and anecdotes

Multiple anecdotes describe LLMs outputting near‑identical code (including typos) from older online examples, sometimes from repos with no license.
This leads some developers to use AI only for guidance rather than copying suggested code directly.
