2024-07-09

Judge dismisses DMCA copyright claim in GitHub Copilot suit

Case outcome and legal reasoning

The judge dismissed the DMCA §1202(b) claim because the plaintiffs did not show Copilot outputting their code identically with copyright/attribution information stripped, which the court held §1202(b) requires.
Commenters note the ruling is narrow: about DMCA “copyright management information,” not all copyright issues.
Some think the plaintiffs’ strategy was weak: they alleged verbatim copying but couldn’t produce a single example from their own code that the court accepted.

Evidence and “identicality”

People recall public demos of Copilot reproducing famous snippets (e.g., Quake’s fast inverse square root) or of models reproducing NYT text, but note:
- Those rights-holders weren’t plaintiffs here.
- Courts require evidence tied to plaintiffs’ works, not “in theory this happens.”
GitHub reportedly added a “copyright filter”; commenters debate whether that is prudence or “destroying evidence.” Others note that old versions still exist and can be subpoenaed.

Training on copyrighted code and fair use

One side: training on public code (even GPL, art, prose) is non‑infringing “learning”; function and style aren’t protected, only expression.
Other side: training creates a derivative commercial product built on copyrighted works without consent or compensation; fair use was never meant for mass AI training.
Dispute over whether model weights are a derivative work and whether paraphrased output can still infringe or violate licenses (e.g., GPL conditions, attribution).

Ethics, scale, and impact on creators

Critics see AI training on non‑consenting artists’ and coders’ work as “pure exploitation,” especially when it displaces their income.
Defenders argue automation has always displaced labor; the economic problem is distribution, not the tool.
Scale and lack of accountability of machine agents are recurring concerns.

Licensing, GitHub, and OSS reactions

Debate on whether GitHub’s ToS gives it rights to use code for Copilot; some quote language that seems limited to “providing the service.”
Edge cases: code uploaded by non‑authors; GPL projects mirrored on GitHub; authors whose code was uploaded by others.
Some propose “no-AI” or anti-training licenses; others note that if training is ruled fair use, such clauses may be ineffective, and that licenses containing them are not FOSS.
A few developers say they’ll stop publishing open source or avoid GitHub; others think the OSS ecosystem will largely continue.

Technical behavior of LLMs

Discussion of memorization vs abstraction: models usually compress patterns, but can “recite” training data when given certain prompts.
Filters that avoid verbatim output don’t prevent close paraphrases, which may still raise legal and ethical questions.
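
To make the verbatim-vs-paraphrase point concrete, here is a minimal sketch in Python. It is not GitHub's actual filter, whose design is not described in the discussion; the function names, the placeholder token "ID", the 4-token window, and the sample snippets are all assumptions chosen for illustration. It contrasts a whitespace-insensitive exact-match check with a structural comparison that ignores identifier names.

```python
import keyword
import re

# Hypothetical tokenizer: identifiers/keywords, integers, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code: str) -> list[str]:
    """Tokenize and replace every non-keyword identifier with 'ID'."""
    out = []
    for tok in TOKEN_RE.findall(code):
        is_name = tok[0].isalpha() or tok[0] == "_"
        out.append("ID" if is_name and not keyword.iskeyword(tok) else tok)
    return out


def ngrams(seq: list[str], n: int = 4) -> set[tuple[str, ...]]:
    """Return the set of all length-n token windows in seq."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def verbatim_match(output: str, known: str) -> bool:
    """Exact-match filter: is `known` contained in `output`, ignoring whitespace?"""
    return "".join(known.split()) in "".join(output.split())


def structural_similarity(output: str, known: str, n: int = 4) -> float:
    """Jaccard similarity of identifier-normalized token n-grams (0.0 to 1.0)."""
    a, b = ngrams(normalize(output), n), ngrams(normalize(known), n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative snippets (invented for this sketch, not from the case).
KNOWN = """
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value
"""

# Same structure as KNOWN with every identifier renamed.
PARAPHRASE = """
def bound(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""

UNRELATED = """
def total(items):
    result = 0
    for item in items:
        result += item
    return result
"""

if __name__ == "__main__":
    for name, sample in [("paraphrase", PARAPHRASE), ("unrelated", UNRELATED)]:
        print(name, verbatim_match(sample, KNOWN),
              round(structural_similarity(sample, KNOWN), 2))
```

In this sketch the renamed snippet passes the exact-match check untouched, yet its normalized token stream is identical to the original's, so the structural score is 1.0. Whether such similarity matters legally is exactly what commenters dispute; the code only shows why a verbatim filter does not settle the question.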
