We should revisit literate programming in the agent era

Perceived Problems with Traditional Literate Programming

  • The main historical failure mode: prose drifts out of sync with the code because it is neither executable nor testable.
  • Comments and narrative can “lie” about behavior; there’s no compiler for prose.
  • Natural language is inherently ambiguous; adding more of it can increase confusion rather than clarity.
  • Code is navigated as a graph (jumping between definitions and uses), whereas narrative is linear, which doesn’t match how people or tools read large codebases.

Documentation, Comments, and “Why”

  • Broad agreement that code shows what/how; documentation and comments are most valuable for why and why not (business rules, tradeoffs, hardware quirks, rejected approaches).
  • Disagreement on density of comments: some see “lots of comments” as a code smell; others see it as professionalism and a gift to future maintainers.
  • Tests, commit messages, and VCS history are proposed as alternative or complementary carriers of intent, with debate over whether they’re more discoverable than inline comments.
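To make the what/how-versus-why distinction concrete, here is a small illustrative sketch (the business rule, threshold, and function name are invented for this example): the code alone shows what the function does, while the comment preserves why it does it.

```python
def shipping_cost(order_total: float) -> float:
    """Return the shipping fee for an order (illustrative example)."""
    # WHY: free shipping above $50 is a (hypothetical) rule from marketing;
    # a flat $5 fee was chosen over tiered rates after an A/B test
    # (the rejected approach). None of that is recoverable from the code.
    if order_total >= 50.0:
        return 0.0
    return 5.0
```

A future maintainer can read the conditional and learn *what* happens; only the comment (or a commit message) records the tradeoff behind it.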

How LLMs/Agents Change the Equation

  • Optimistic view:
    • Agents can detect and fix out-of-sync comments, run doctests/notebooks to keep prose “runnable,” and use docs heavily, finally providing a strong incentive to write them.
    • LLMs are good at mapping between compressed (code) and uncompressed (prose) representations, lowering the cost of maintaining both.
    • Agents can leave their own structured comments (e.g., “remarks”) that future agents (and humans) reuse as long-term memory.
  • Skeptical view:
    • More prose increases token cost and can hurt model performance; minimal, precise context is better.
    • Agents already read code well; explanations can be generated on demand, making persistent literate narratives unnecessary.
    • Hallucinated or philosophical “intent” layers give models more room to output confident but wrong text.
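The "runnable prose" idea from the optimistic view can be sketched with plain doctests, assuming nothing beyond the standard library (the `slugify` function is invented for illustration): the examples in the docstring are narrative an agent can actually execute, so drift between prose and behavior surfaces as a test failure rather than a silent lie.

```python
import doctest

def slugify(title: str) -> str:
    """Convert a title to a URL slug (illustrative example).

    The examples below are executable prose: if the behavior changes,
    running the doctests fails, so an agent can detect stale narrative.

    >>> slugify("Hello, World!")
    'hello-world'
    >>> slugify("  Agents & Literate Programming  ")
    'agents-literate-programming'
    """
    # Replace every non-alphanumeric character with a space, then join
    # the resulting words with hyphens.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)

if __name__ == "__main__":
    failures, _ = doctest.testmod()
    assert failures == 0
```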

Lighter-Weight Alternatives and Practices

  • Many prefer “lite” literate programming: good naming, docstrings, module-level overviews, README/architecture docs, doctests, and symmetric test–prod code.
  • Notebooks, Org-mode, Rakudoc/Podlite, and similar tools are cited as practical LP-style environments where examples double as tests.
  • Some propose config- or spec-driven approaches, file-level “intent” markdown that compiles to code, or CUE-like declarative specs combined with LLMs for safer generation.
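Two of these "lite" practices, a module-level overview and symmetric test–prod code, can be combined in one file. A minimal sketch, with an invented token-bucket module as the example (the simplified manually-advanced clock is an assumption made to keep the test deterministic):

```python
"""Token-bucket rate limiting (illustrative 'lite' literate module).

Module-level narrative: the bucket holds up to `capacity` tokens and
refills at `rate` tokens per second; `allow()` spends one token if any
are available. Time is passed in explicitly to keep tests deterministic.
"""

class TokenBucket:
    def __init__(self, capacity: int, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full, permitting a burst
        self.last = 0.0                # timestamp of the last refill

    def _refill(self, now: float) -> None:
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def allow(self, now: float) -> bool:
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Symmetric test code: the test re-tells the module narrative above.
def test_token_bucket() -> None:
    tb = TokenBucket(capacity=2, rate=1.0)
    assert tb.allow(0.0) and tb.allow(0.0)  # burst up to capacity
    assert not tb.allow(0.0)                # empty bucket refuses
    assert tb.allow(1.5)                    # refilled after 1.5 s

test_token_bucket()
```

The docstring carries the story, the names carry intent, and the test mirrors the prose closely enough that a reader (or an agent) can check one against the other.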

Open Questions

  • It remains unclear whether agents can reliably keep rich prose in sync with code at scale.
  • Tension persists between making codebases readable narratives for humans/agents and keeping them minimal, precise, and maintainable.