LLMs corrupt your documents when you delegate
Overall Reaction to the Paper
- Many see the results as unsurprising: repeatedly “washing” text through LLMs predictably degrades fidelity, akin to repeated JPEG compression or the telephone game.
- Others argue the experiment is misleading because it doesn’t reflect how serious users or modern agents actually edit documents.
How and Where Degradation Shows Up
- Users report that each LLM “edit” tends to blur nuance, flatten style, and regress toward generic, “mean” prose (“semantic ablation,” “mean reversion”).
- Degradation is especially visible in:
  - Long-running editing sessions,
  - Large documents or complex formats (codebases, spreadsheets → markdown),
  - Tasks where the LLM is allowed to rewrite whole files instead of making local changes.
- Examples include lost sharp phrases, sanitized language, genericized resumes, and codebases that gradually accumulate subtle mistakes.
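One way to make this kind of drift visible is to compare each edited version against the original and track how much changed. The sketch below is only an illustration: `drift_report` is a hypothetical helper name, and line-level comparison with Python's standard `difflib` is an assumption made here, not the metric used in the paper.

```python
import difflib


def drift_report(original: str, edited: str) -> dict:
    """Compare two versions of a document line by line (illustrative helper)."""
    orig_lines = original.splitlines()
    edit_lines = edited.splitlines()
    matcher = difflib.SequenceMatcher(a=orig_lines, b=edit_lines, autojunk=False)
    # Count lines touched by any non-equal opcode (replace, insert, delete).
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return {"similarity": matcher.ratio(), "changed_lines": changed}


original = "alpha\nbeta\ngamma\ndelta\n"
edited = "alpha\nbeta\nGAMMA (reworded)\ndelta\n"
report = drift_report(original, edited)
print(report)
```

If the requested edit was meant to touch one line and the report shows dozens of changed lines, the model probably rewrote more than it was asked to.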
Humans vs LLMs
- Some argue a human forced to reproduce a whole document from memory with edits would degrade it even more, so the benchmark setup is unrealistic.
- Others counter that:
  - Humans don’t work that way; they reference the source and can deliberately aim for exact copying.
  - LLMs lack persistent memory and a concept of “accuracy,” so they need explicit safeguards.
Harnesses, Tools, and “Agentic” Design
- A strongly held view is that the paper’s simple read/write harness is the real problem: modern coding agents use diff/patch, str_replace, and other surgical tools, avoiding full round-trip rewrites.
- Critics claim a better-designed harness and prompts would significantly reduce corruption; supporters note most end-users don’t have such setups and just paste documents into chat UIs.
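The surgical-edit idea raised in this section can be sketched as a tiny exact-match replacement function. This is a minimal illustration, not the implementation used by any particular agent: the function name, signature, and error behavior are assumptions chosen for the example. The point is that the model supplies only the old and new snippets, and everything outside the match is left byte-for-byte untouched.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` in `text` (hypothetical tool).

    Refuses to edit when the match is missing or ambiguous, so a bad
    request fails loudly instead of silently altering the document.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError("old string not found; refusing to edit")
    if count > 1:
        raise ValueError(f"old string matches {count} times; add more context")
    return text.replace(old, new, 1)


doc = "Title\nThe quick brown fox.\nThe end.\n"
edited = str_replace(doc, "quick brown", "slow red")
print(edited)
```

Because the rest of the document never passes through the model's output, it cannot be paraphrased or "smoothed" along the way.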
Practical Guidance & Workflows
- Use LLMs to:
  - Generate drafts, prototypes, and helper scripts, not as unsupervised delegates on canonical documents.
  - Write deterministic tools that then transform data, rather than having the LLM “speak out” the transformed result.
- Always:
  - Keep documents small and task-scoped.
  - Review diffs, run tests, and constrain write access (e.g., separate users, no direct git pushes).
  - Treat LLM layers as “lossy” or “bullshit” layers between deterministic systems.
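As an illustration of the "deterministic tool" advice above, here is the kind of small script an LLM might be asked to write once, after which it can be reviewed, tested, and reused. The example converts CSV text to a Markdown table; the function name `csv_to_markdown` and the choice of CSV input are assumptions made for the sketch, not something specified in the discussion.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Convert CSV text to a Markdown table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows

    def fmt(cells):
        # Escape pipes so cell content cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


table = csv_to_markdown("name,qty\nbolt,3\nnut,12\n")
print(table)
```

The same input always yields the same output, so every cell value is copied by code rather than regenerated by a model.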
Broader Concerns
- Several connect this to “model collapse” and the risk of AI-generated slop eroding both training data and human knowledge online.
- Others stress that despite inherent, seemingly “incorrigible” errors, LLMs remain highly useful when bounded, audited, and used as assistive—not autonomous—systems.