LLMs corrupt your documents when you delegate

Overall Reaction to the Paper

  • Many see the results as unsurprising: repeatedly “washing” text through LLMs predictably degrades fidelity, akin to repeated JPEG compression or the telephone game.
  • Others argue the experiment is misleading because it doesn’t reflect how serious users or modern agents actually edit documents.

How and Where Degradation Shows Up

  • Users report that each LLM “edit” tends to blur nuance, flatten style, and regress toward generic, “mean” prose (“semantic ablation,” “mean reversion”).
  • Degradation is especially visible in:
    • Long-running editing sessions,
    • Large documents or complex formats (codebases, spreadsheets → markdown),
    • Tasks where the LLM is allowed to rewrite whole files instead of making local changes.
  • Examples include lost sharp phrases, sanitized language, genericized resumes, and codebases that gradually accumulate subtle mistakes.

Humans vs LLMs

  • Some argue a human forced to reproduce a whole document from memory with edits would degrade it even more, so the benchmark setup is unrealistic.
  • Others counter that:
    • Humans don’t work that way; they reference the source and can deliberately aim for exact copying.
    • LLMs lack persistent memory and a concept of “accuracy,” so they need explicit safeguards.
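
One shape such a safeguard could take is a gate that rejects any edit whose diff is larger than expected. The sketch below uses Python's standard `difflib`; the function names and the `max_changed` threshold are illustrative assumptions, not something proposed in the discussion.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added/removed lines of a unified diff."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(), after.splitlines(), lineterm="", n=0
        )
    )
    # The first two entries are the "---"/"+++" file headers;
    # hunk headers start with "@@" and are filtered out below.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def accept_edit(before: str, after: str, max_changed: int = 4) -> bool:
    """Accept an edit only if it touches few enough lines to review by hand.

    max_changed is a hypothetical threshold chosen for this example.
    """
    return len(changed_lines(before, after)) <= max_changed


before = "a\nb\nc\nd\n"
local_fix = "a\nB\nc\nd\n"       # one line changed
full_rewrite = "w\nx\ny\nz\n"    # every line changed

small_ok = accept_edit(before, local_fix)
rewrite_ok = accept_edit(before, full_rewrite)
```

A gate like this does not judge whether an edit is correct; it only flags cases where the model changed far more than the task called for, so that a human looks before the result is saved.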

Harnesses, Tools, and “Agentic” Design

  • A strongly held view is that the paper’s simple read/write harness is the real problem: modern coding agents use diff/patch, str_replace, and other surgical tools, which avoid full round-trip rewrites.
  • Critics claim a better-designed harness and prompts would significantly reduce corruption; supporters note most end-users don’t have such setups and just paste documents into chat UIs.
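
To make the "surgical tool" idea concrete, here is a minimal sketch of a str_replace-style edit, assuming the model supplies only the text to find and its replacement. The function name `str_replace_edit` and the exactly-one-match rule are assumptions for illustration, not the interface of any particular agent.

```python
def str_replace_edit(document: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`.

    Everything outside the matched span is carried over untouched, so the
    model never has to reproduce the rest of the document.
    """
    if not old:
        raise ValueError("target text must be non-empty")
    count = document.count(old)
    if count != 1:
        # Refuse ambiguous or missing targets instead of guessing.
        raise ValueError(f"expected exactly 1 match, found {count}")
    return document.replace(old, new, 1)


original = "Title\n\nThe quick brown fox.\nIt jumps over the lazy dog.\n"
edited = str_replace_edit(original, "quick brown", "slow red")
```

Because the harness does the copying, any text the model did not name explicitly is identical before and after the edit, which is the property a whole-file rewrite cannot guarantee.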

Practical Guidance & Workflows

  • Use LLMs to:
    • Generate drafts, prototypes, and helper scripts, rather than act as unsupervised delegates on canonical documents.
    • Write deterministic tools that then transform data, rather than having the LLM “speak out” the transformed result.
  • Always:
    • Keep documents small and task-scoped.
    • Review diffs, run tests, and constrain write access (e.g., separate users, no direct git pushes).
    • Treat LLM layers as “lossy” or “bullshit” layers between deterministic systems.
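
As an illustration of the "write a deterministic tool" advice, the sketch below converts CSV text to a Markdown table with a small script instead of asking a model to retype the data. The function name `csv_to_markdown` and the sample data are hypothetical, and the sketch assumes every row has the same number of columns.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Convert CSV text to a Markdown table.

    Cells are copied as-is, apart from escaping "|" so that a cell
    cannot break the table structure.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def render(cells: list[str]) -> str:
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [render(header), render(["---"] * len(header))]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)


sample = "name,qty\nwidget,3\ngadget,12\n"
table = csv_to_markdown(sample)
print(table)
```

An LLM can draft a script like this once; after that, the same input always produces the same output, and the script can be reviewed and tested like any other code.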

Broader Concerns

  • Several commenters connect this to “model collapse” and the risk of AI-generated slop eroding both training data and human knowledge online.
  • Others stress that despite inherent, seemingly “incorrigible” errors, LLMs remain highly useful when bounded, audited, and used as assistive—not autonomous—systems.