We should revisit literate programming in the agent era
Perceived Problems with Traditional Literate Programming
- Main historical failure mode: prose drifts out of sync with code, because prose is neither executable nor testable.
- Comments and narrative can “lie” about behavior; there’s no compiler for prose.
- Natural language is inherently ambiguous; adding more of it can increase confusion rather than clarity.
- Code is navigated as a graph (jumping between definitions and uses), while narrative is linear; that mismatch fits neither how people nor how tools read large codebases.
Documentation, Comments, and “Why”
- Broad agreement that code shows what/how; documentation and comments are most valuable for why and why not (business rules, tradeoffs, hardware quirks, rejected approaches).
- Disagreement on density of comments: some see “lots of comments” as a code smell; others see it as professionalism and a gift to future maintainers.
- Tests, commit messages, and VCS history are proposed as alternative or complementary carriers of intent, with debate over whether they’re more discoverable than inline comments.
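A minimal sketch of the what-vs-why distinction the thread keeps returning to. The sensor, its power-up quirk, and the `FakeBus` stub are hypothetical, invented for illustration:

```python
def read_sensor(bus) -> int:
    # WHY, not what: the (hypothetical) sensor's datasheet says the first
    # read after power-up returns stale data, so we discard one reading.
    # A "what" comment ("read twice") would merely restate the code and
    # is the kind of comment that drifts.
    bus.read()         # discard stale first reading
    return bus.read()  # second read is reliable

# Stub bus for demonstration; a real driver would talk to hardware.
class FakeBus:
    def __init__(self, values):
        self._values = iter(values)

    def read(self):
        return next(self._values)

print(read_sensor(FakeBus([0, 42])))  # → 42
```

The "why" comment carries intent the code cannot express; the inline "what" comments are the debatable ones.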
How LLMs/Agents Change the Equation
- Optimistic view:
  - Agents can detect and fix out-of-sync comments, run doctests/notebooks to keep prose “runnable,” and use docs heavily, finally providing a strong incentive to write them.
  - LLMs are good at mapping between compressed (code) and uncompressed (prose) representations, lowering the cost of maintaining both.
  - Agents can leave their own structured comments (e.g., “remarks”) that future agents (and humans) reuse as long-term memory.
- Skeptical view:
  - More prose increases token cost and can hurt model performance; minimal, precise context is better.
  - Agents already read code well; explanations can be generated on demand, making persistent literate narratives unnecessary.
  - Hallucinated or philosophical “intent” layers give models more room to output confident but wrong text.
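The optimistic "runnable prose" point can be made concrete with a standard-library doctest. The `parse_port` function below is an invented example; the mechanism (examples in the docstring executed as tests) is exactly what `doctest` provides:

```python
import doctest

def parse_port(value: str) -> int:
    """Parse a TCP port from a string, rejecting out-of-range values.

    The examples below are executable prose: if the code's behavior
    drifts from the narrative, running doctest flags the drift.

    >>> parse_port("8080")
    8080
    >>> parse_port("70000")
    Traceback (most recent call last):
        ...
    ValueError: port out of range: 70000
    """
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

# An agent (or CI job) can run the prose as a test suite:
runner = doctest.DocTestRunner()
for test in doctest.DocTestFinder().find(parse_port):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed)  # 0 when prose and code agree
```

This is the "lite" end of literate programming: the narrative stays honest because it is executed, which is precisely the property traditional LP prose lacked.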
Lighter-Weight Alternatives and Practices
- Many prefer “lite” literate programming: good naming, docstrings, module-level overviews, README/architecture docs, doctests, and tests whose structure mirrors the production code.
- Notebooks, Org-mode, Rakudoc/Podlite, and similar tools are cited as practical LP-style environments where examples double as tests.
- Some propose config- or spec-driven approaches, file-level “intent” markdown that compiles to code, or CUE-like declarative specs combined with LLMs for safer generation.
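One hedged sketch of the spec-driven idea, assuming nothing about any particular tool: the field names, the `SPEC` shape, and `build_config` are all illustrative inventions. The point is that a declarative spec carries the intent while remaining machine-checkable, unlike free-form prose:

```python
# Hypothetical declarative spec as the source of intent; a CUE-like tool
# or an LLM would consume something in this spirit, not this exact shape.
SPEC = {
    "name": "retry_policy",
    "intent": "Retry transient failures with exponential backoff.",
    "fields": {
        "max_attempts": {"type": int, "default": 3},
        "base_delay_s": {"type": float, "default": 0.5},
    },
}

def build_config(spec: dict, **overrides) -> dict:
    """Derive a validated config from the spec, type-checking overrides."""
    config = {}
    for field, meta in spec["fields"].items():
        value = overrides.get(field, meta["default"])
        if not isinstance(value, meta["type"]):
            raise TypeError(f"{field} must be {meta['type'].__name__}")
        config[field] = value
    return config

policy = build_config(SPEC, max_attempts=5)
print(policy)  # → {'max_attempts': 5, 'base_delay_s': 0.5}
```

Because generation flows from the spec, the "prose-like" layer cannot silently disagree with the artifact it describes; disagreement is a validation error rather than drift.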
Open Questions
- Whether agents can reliably keep rich prose in sync at scale remains unclear.
- Tension persists between making codebases readable narratives for humans/agents and keeping them minimal, precise, and maintainable.