Relicensing with AI-Assisted Rewrite

Context: AI-assisted chardet rewrite and relicensing

  • The thread centers on a Python library (chardet) whose v7 was rewritten with an LLM and relicensed from LGPL to MIT while keeping the original name and API.
  • Many see this as a test case for whether AI-assisted “rewrites” can shed copyleft obligations, or whether the result is still a derivative work.

Clean-room reimplementation vs AI use

  • Classic “clean room” pattern: one team studies the original and writes a spec; a second, untainted team implements only from that spec. The IBM PC BIOS reimplementations and NEC v. Intel are cited as precedents.
  • People debate whether an LLM can ever be “clean” if it was trained on the original codebase or similar code.
  • Some propose 2-model or 2-phase pipelines (model A derives specs, model B writes code) as an automated clean room; others argue training contamination makes this non-credible.
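The two-phase proposal above can be sketched as a data-flow whose key property is an isolation boundary: phase B sees only the spec, never the original source. `derive_spec` and `implement_from_spec` are hypothetical stand-ins for the two models, not anything named in the thread:

```python
# Illustrative sketch of an automated "clean room" pipeline.
# derive_spec / implement_from_spec are hypothetical placeholders
# for two separate models; the critics' point is that if either
# model was trained on the original, this boundary may be illusory.

def derive_spec(original_source: str) -> str:
    """Phase A: describe observable behavior, not expression."""
    # A real system would call model A here; this toy version just
    # emits a behavioral description instead of any code.
    return "detect(data: bytes) -> dict with keys encoding, confidence"

def implement_from_spec(spec: str) -> str:
    """Phase B: write new code from the spec alone."""
    # Model B never receives original_source -- only the spec text.
    return f"# implementation written solely from: {spec!r}"

spec = derive_spec("...original library source...")
new_code = implement_from_spec(spec)  # original never crosses this line
```

The design question the thread raises is whether that boundary carries any legal weight when the models' training data already included the original.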

AI authorship, copyrightability, and public domain

  • A recent U.S. decision is discussed: AI itself cannot be an “author”; only humans using AI can hold copyright.
  • One lawyer in the thread stresses this does not mean AI-assisted works are copyright-free; the human operator is usually the author.
  • Others explore the idea that purely machine-generated outputs might fall into a “public domain by default” hole, but note this is legally unsettled and jurisdiction-dependent.

Impact on GPL/copyleft and open source

  • Strong concern that if “AI laundering” can relicense GPL/LGPL code, copyleft effectively dies; any project could be run through an LLM and reissued under MIT or proprietary terms.
  • Some fear this will push developers away from open source toward closed code or a “dark forest” where nothing is published.
  • Others argue that even if code becomes cheap to rewrite, the real leverage remains in maintenance, community, and support.

LLM training data, fair use, and legality

  • Ongoing disputes over whether training on all public code (including GPL and proprietary) is fair use.
  • Some point to recent U.S. rulings treating training as transformative and fair; others emphasize these are not Supreme Court-level precedents, and other countries may diverge.
  • Concerns raised that models can reproduce sizeable verbatim chunks of training data (code, books), making them potential infringers or de facto copyright “laundromats.”

Ethical and practical reactions

  • Many view the chardet relicensing as ethically “scummy” or reckless, especially given visible overlap with the original (shared tests, metadata, docstrings).
  • There’s discussion of tools that fingerprint code similarity, and of using AI to reverse engineer games and binaries, showing how cheap cloning has become.
  • Some propose radical responses: forcing AI outputs under GPL, taxing AI companies, or mandating open-weight models when trained on public data; others dismiss these as impractical.
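The similarity-fingerprinting tools mentioned above typically use something like winnowing over k-gram hashes (the technique behind MOSS-style plagiarism detectors). A minimal sketch, not any specific tool from the thread; the parameters and normalization are my own simplifications:

```python
import hashlib

def fingerprints(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Winnowing: hash every k-gram of the normalized text, then keep
    the minimum hash in each sliding window. Shared fingerprints between
    two files indicate shared k-grams without storing the text itself."""
    norm = "".join(text.split()).lower()  # crude whitespace/case normalization
    hashes = [
        int.from_bytes(
            hashlib.blake2b(norm[i:i + k].encode(), digest_size=4).digest(),
            "big",
        )
        for i in range(len(norm) - k + 1)
    ]
    if not hashes:
        return set()
    picked = set()
    for i in range(max(len(hashes) - window + 1, 1)):
        picked.add(min(hashes[i:i + window]))  # one representative per window
    return picked

def similarity(a: str, b: str) -> float:
    """Jaccard similarity over the two fingerprint sets, in [0, 1]."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / max(len(fa | fb), 1)
```

Real detectors add language-aware normalization (stripping identifiers and comments) so that renamed variables or reworded docstrings still match, which is exactly the kind of overlap commenters reported in the rewritten chardet.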