2026-05-09

LLMs corrupt your documents when you delegate

Overall Reaction to the Paper

Many see the results as unsurprising: repeatedly “washing” text through LLMs predictably degrades fidelity, akin to repeated JPEG compression or the telephone game.
Others argue the experiment is misleading because it doesn’t reflect how serious users or modern agents actually edit documents.

How and Where Degradation Shows Up

Users report that each LLM “edit” tends to blur nuance, flatten style, and regress toward generic, “mean” prose (“semantic ablation,” “mean reversion”).
Degradation is especially visible in:
- Long-running editing sessions,
- Large documents or complex formats (codebases, spreadsheets → markdown),
- Tasks where the LLM is allowed to rewrite whole files instead of making local changes.
Examples include lost sharp phrases, sanitized language, genericized resumes, and codebases that gradually accumulate subtle mistakes.
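One safeguard against whole-file rewrites is to accept an LLM's output only if it changed the lines it was asked to change. The sketch below assumes the edit target is a 0-based, half-open range of original line numbers. `changes_outside_range` is a hypothetical helper built on Python's standard `difflib`; it is not a tool from the paper or the thread.

```python
import difflib


def changes_outside_range(original: str, edited: str, start: int, end: int) -> list[str]:
    """List edits that touch original lines outside the allowed range.

    `start` and `end` are 0-based line indices into `original`, half-open
    ([start, end)); insertions are allowed at any position from start to end.
    """
    old_lines = original.splitlines()
    new_lines = edited.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            # Insertions occupy no original lines; check where they land.
            outside = i1 < start or i1 > end
        else:
            # "replace" and "delete" consume original lines i1..i2-1.
            outside = i1 < start or i2 > end
        if outside:
            problems.append(f"{tag}: original lines {i1}-{i2} -> edited lines {j1}-{j2}")
    return problems
```

A harness could reject any response for which the returned list is non-empty, or send it back for human review. Line-level diffing is coarse: a one-word change anywhere shows up as a whole replaced line, which is usually what you want when auditing drift.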

Humans vs LLMs

Some argue that a human forced to reproduce an entire document from memory, with edits, would degrade it even more, so the benchmark setup is unrealistic.
Others counter that:
- Humans don’t work that way; they reference the source and can deliberately aim for exact copying.
- LLMs lack persistent memory and a concept of “accuracy,” so they need explicit safeguards.

Harnesses, Tools, and “Agentic” Design

A strong view in the thread holds that the paper’s simple read/write harness is the real problem: modern coding agents use diff/patch, str_replace, and other surgical edit tools, avoiding full round-trip rewrites.
Critics claim that a better-designed harness and better prompts would significantly reduce corruption; defenders of the paper note that most end users don’t have such setups and simply paste documents into chat UIs.
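A minimal sketch of what a surgical str_replace-style tool does, assuming the model supplies an exact `old` snippet and its replacement. The names `apply_str_replace`, `str_replace_file`, and `EditError` are hypothetical and do not reproduce any particular agent's tool. The point is that text outside the matched span is copied rather than regenerated, and missing or ambiguous matches fail loudly instead of being guessed.

```python
from pathlib import Path


class EditError(ValueError):
    """Raised when an edit cannot be applied unambiguously."""


def apply_str_replace(text: str, old: str, new: str) -> str:
    """Replace the single occurrence of `old` in `text` with `new`.

    Everything outside the matched span is copied unchanged, so the model
    never regenerates text it was not asked to edit.
    """
    if not old:
        raise EditError("old string must be non-empty")
    first = text.find(old)
    if first == -1:
        raise EditError("old string not found; refusing to guess")
    # Search again from the next character so overlapping matches count too.
    if text.find(old, first + 1) != -1:
        raise EditError("old string is not unique; include more surrounding context")
    return text[:first] + new + text[first + len(old):]


def str_replace_file(path: Path, old: str, new: str) -> None:
    """Apply one unique-match replacement to a UTF-8 file in place.

    Bytes are read and written directly so line endings are preserved.
    """
    text = path.read_bytes().decode("utf-8")
    path.write_bytes(apply_str_replace(text, old, new).encode("utf-8"))
```

Requiring a unique match pushes the model to quote enough surrounding context to pin down the location, which is what keeps the rest of the file from drifting.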

Practical Guidance & Workflows

Use LLMs to:
- Generate drafts, prototypes, and helper scripts, rather than acting as unsupervised delegates on canonical documents.
- Write deterministic tools that then transform data, rather than having the LLM “speak out” the transformed result.
Always:
- Keep documents small and task-scoped.
- Review diffs, run tests, and constrain write access (e.g., separate users, no direct git pushes).
- Treat LLM layers as “lossy” or “bullshit” layers between deterministic systems.
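To illustrate writing deterministic tools rather than having the LLM “speak out” the result, here is the kind of small converter a model might be asked to write for the spreadsheet-to-Markdown case, instead of being asked to retype the table itself. This is a sketch assuming comma-separated input whose first row is the header; `csv_to_markdown` is a hypothetical name, and multi-line cells are not handled.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Convert CSV text (first row = header) to a Markdown pipe table.

    Cell contents are copied verbatim; only pipe characters are escaped so
    they cannot break the table structure.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    # Fail loudly on malformed input instead of silently dropping cells.
    # Records are numbered from 1, with the header as record 1.
    for record_no, row in enumerate(body, start=2):
        if len(row) > width:
            raise ValueError(f"record {record_no} has {len(row)} cells; header has {width}")

    def fmt(cells: list[str]) -> str:
        # Pad short rows so every table line has the same number of columns.
        padded = cells + [""] * (width - len(cells))
        return "| " + " | ".join(c.replace("|", "\\|") for c in padded) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)
```

Once such a script has been reviewed, every run copies each cell exactly, so fidelity no longer depends on the model reproducing the data correctly.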

Broader Concerns

Several connect this to “model collapse” and the risk of AI-generated slop eroding both training data and human knowledge online.
Others stress that despite inherent, seemingly “incorrigible” errors, LLMs remain highly useful when bounded, audited, and used as assistive—not autonomous—systems.
