An embarrassingly simple approach to recover unlearned knowledge for LLMs

Overview of the result

  • Paper claims: model “unlearning” is often implemented as small weight updates that suppress specific knowledge while preserving overall performance.
  • Discussion consensus: quantization can effectively erase those tiny deltas, making the “forgotten” knowledge accessible again in the quantized model.
  • Several commenters liken this to removing a thin layer of censorship rather than erasing the underlying memory.
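The mechanism behind the headline claim can be sketched in a few lines. The following is a hypothetical illustration (not the paper's code): unlearning is modeled as a tiny additive delta on pretrained weights, and symmetric uniform quantization with a shared scale rounds most of that delta away, so the quantized "unlearned" model matches the quantized original almost everywhere. All magnitudes and the quantizer are illustrative assumptions.

```python
# Sketch (hypothetical numbers, not the paper's code): a small "unlearning"
# weight update mostly disappears under low-bit uniform quantization.
import numpy as np

rng = np.random.default_rng(0)
w_original = rng.normal(0.0, 0.02, size=1000)   # pretrained weights (assumed scale)
delta = rng.normal(0.0, 1e-4, size=1000)        # tiny unlearning update
w_unlearned = w_original + delta

bits = 4
levels = 2 ** (bits - 1) - 1                    # 7 positive levels for int4
scale = np.abs(w_original).max() / levels       # shared quantization scale

# Integer codes after rounding: most weights land in the same bucket,
# because |delta| is far smaller than one quantization step.
codes_original = np.round(w_original / scale)
codes_unlearned = np.round(w_unlearned / scale)
frac_identical = float(np.mean(codes_original == codes_unlearned))
print(f"fraction of quantized weights unchanged by unlearning: {frac_identical:.3f}")
```

In this toy setup nearly all quantized weights are identical before and after the "unlearning" update, which is the intuition commenters point to: the suppression lives in sub-quantization-step perturbations.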

Unlearning vs. guardrails

  • Distinction:
    • Unlearning = trying to make the model truly forget certain facts via weight changes.
    • Guardrails = instructing the model not to say certain things, while the knowledge remains.
  • Multiple comments argue that most current “unlearning” is closer to “guardrails in weights”: it lowers the probability of certain outputs rather than removing the underlying knowledge.
  • From an information-theoretic angle, some argue that if information can be recovered by any process (like quantization or clever prompting), it was never really removed.

Threat models, safety, and misuse

  • Concern: if unlearning is fragile, models “cleaned” of harmful or copyrighted content may still leak it via quantization or other transformations.
  • Specific risks mentioned: instructions for drugs, poisons, explosives, and other illegal activities.
  • Counterpoint: much of this information is already widely available (e.g., manuals, Wikipedia), and regulators often fixate on AI while ignoring existing channels.
  • Some expect future “quantization-robust unlearning,” but others think quantization is just one of many ways to undo weak unlearning.

Copyright, data ownership, and ethics

  • Long subthread criticizes LLMs as extracting value from a public good (the internet) without compensating most creators, especially small ones.
  • Others compare this to humans, teachers, or encyclopedias learning from and reselling knowledge, arguing the key issue is verbatim copying and IP misuse, not training itself.
  • There is disagreement on whether current practices are “theft” or transformative fair use; courts and new laws are seen as inevitable.

Broader AI debates

  • Some see this as more evidence that we’re just hacking censorship layers onto “spicy autocomplete,” and that speculative AGI/“superalignment” discourse distracts from present harms.
  • Others argue the long-term transformative impact of AI is still likely, analogous (positively or negatively) to past overhyped technologies like 3D printing.

Paper quality and language

  • One commenter criticizes the English of the preprint; others respond that it’s just an arXiv draft, that the writing is acceptable, and that attacking non-native English is unfair or racist.