Relicensing with AI-Assisted Rewrite

Context: AI-assisted chardet rewrite and relicensing

  • The thread centers on a Python library (chardet) whose v7 was rewritten with an LLM and relicensed from LGPL to MIT while keeping the original name and API.
  • Many see this as a test case for whether AI-assisted “rewrites” can shed copyleft obligations, or whether the result is still a derivative work.

Clean-room reimplementation vs AI use

  • Classic “clean room” pattern: one team studies the original and writes a spec; a second, untainted team implements only from that spec. The IBM PC BIOS reimplementations and NEC v. Intel are cited as precedents.
  • People debate whether an LLM can ever be “clean” if it was trained on the original codebase or similar code.
  • Some propose 2-model or 2-phase pipelines (model A derives specs, model B writes code) as an automated clean room; others argue training contamination makes this non-credible.
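The two-phase proposal above can be sketched as a data-flow whose key property is an isolation boundary: phase B sees only the spec, never the original source. `derive_spec` and `implement_from_spec` are hypothetical stand-ins for the two models, not anything named in the thread:

```python
# Illustrative sketch of an automated "clean room" pipeline.
# derive_spec / implement_from_spec are hypothetical placeholders
# for two separate models; the critics' point is that if either
# model was trained on the original, this boundary may be illusory.

def derive_spec(original_source: str) -> str:
    """Phase A: describe observable behavior, not expression."""
    # A real system would call model A here; this toy version just
    # emits a behavioral description instead of any code.
    return "detect(data: bytes) -> dict with keys encoding, confidence"

def implement_from_spec(spec: str) -> str:
    """Phase B: write new code from the spec alone."""
    # Model B never receives original_source -- only the spec text.
    return f"# implementation written solely from: {spec!r}"

spec = derive_spec("...original library source...")
new_code = implement_from_spec(spec)  # original never crosses this line
```

The design question the thread raises is whether that boundary carries any legal weight when the models' training data already included the original.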

AI authorship, copyrightability, and public domain

  • A recent U.S. decision is discussed: AI itself cannot be an “author”; only humans using AI can hold copyright.
  • One lawyer in the thread stresses this does not mean AI-assisted works are copyright-free; the human operator is usually the author.
  • Others explore the idea that purely machine-generated outputs might fall into a “public domain by default” hole, but note this is legally unsettled and jurisdiction-dependent.

Impact on GPL/copyleft and open source

  • Strong concern that if “AI laundering” can relicense GPL/LGPL code, copyleft effectively dies; any project could be run through an LLM and reissued under MIT or proprietary terms.
  • Some fear this will push developers away from open source toward closed code or a “dark forest” where nothing is published.
  • Others argue that even if code becomes cheap to rewrite, the real leverage remains in maintenance, community, and support.

LLM training data, fair use, and legality

  • Ongoing disputes over whether training on all public code (including GPL and proprietary) is fair use.
  • Some point to recent U.S. rulings treating training as transformative and fair; others emphasize these are not Supreme Court-level precedents, and other countries may diverge.
  • Concerns raised that models can reproduce sizeable verbatim chunks of training data (code, books), making them potential infringers or de facto copyright “laundromats.”

Ethical and practical reactions

  • Many view the chardet relicensing as ethically “scummy” or reckless, especially given visible overlap with the original (shared tests, metadata, docstrings).
  • There’s discussion of tools that fingerprint code similarity, and of using AI to reverse engineer games and binaries, showing how cheap cloning has become.
  • Some propose radical responses: forcing AI outputs under GPL, taxing AI companies, or mandating open-weight models when trained on public data; others dismiss these as impractical.
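The similarity-fingerprinting tools mentioned above typically use something like winnowing over k-gram hashes (the technique behind MOSS-style plagiarism detectors). A minimal sketch, not any specific tool from the thread; the parameters and normalization are my own simplifications:

```python
import hashlib

def fingerprints(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Winnowing: hash every k-gram of the normalized text, then keep
    the minimum hash in each sliding window. Shared fingerprints between
    two files indicate shared k-grams without storing the text itself."""
    norm = "".join(text.split()).lower()  # crude whitespace/case normalization
    hashes = [
        int.from_bytes(
            hashlib.blake2b(norm[i:i + k].encode(), digest_size=4).digest(),
            "big",
        )
        for i in range(len(norm) - k + 1)
    ]
    if not hashes:
        return set()
    picked = set()
    for i in range(max(len(hashes) - window + 1, 1)):
        picked.add(min(hashes[i:i + window]))  # one representative per window
    return picked

def similarity(a: str, b: str) -> float:
    """Jaccard similarity over the two fingerprint sets, in [0, 1]."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / max(len(fa | fb), 1)
```

Real detectors add language-aware normalization (stripping identifiers and comments) so that renamed variables or reworded docstrings still match, which is exactly the kind of overlap commenters reported in the rewritten chardet.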