We should revisit literate programming in the agent era

Perceived Problems with Traditional Literate Programming

  • The main historical failure mode: prose drifts out of sync with the code because it is neither executable nor testable.
  • Comments and narrative can “lie” about behavior; there’s no compiler for prose.
  • Natural language is inherently ambiguous; adding more of it can increase confusion rather than clarity.
  • Code is navigated as a graph (jumping between definitions and uses), whereas narrative is linear, which doesn’t match how people or tools read large codebases.

Documentation, Comments, and “Why”

  • Broad agreement that code shows what/how; documentation and comments are most valuable for why and why not (business rules, tradeoffs, hardware quirks, rejected approaches).
  • Disagreement on density of comments: some see “lots of comments” as a code smell; others see it as professionalism and a gift to future maintainers.
  • Tests, commit messages, and VCS history are proposed as alternative or complementary carriers of intent, with debate over whether they’re more discoverable than inline comments.
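To make the what/how-versus-why distinction concrete, here is a small illustrative sketch (the business rule, threshold, and function name are invented for this example): the code alone shows what the function does, while the comment preserves why it does it.

```python
def shipping_cost(order_total: float) -> float:
    """Return the shipping fee for an order (illustrative example)."""
    # WHY: free shipping above $50 is a (hypothetical) rule from marketing;
    # a flat $5 fee was chosen over tiered rates after an A/B test
    # (the rejected approach). None of that is recoverable from the code.
    if order_total >= 50.0:
        return 0.0
    return 5.0
```

A future maintainer can read the conditional and learn *what* happens; only the comment (or a commit message) records the tradeoff behind it.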

How LLMs/Agents Change the Equation

  • Optimistic view:
    • Agents can detect and fix out-of-sync comments, run doctests/notebooks to keep prose “runnable,” and use docs heavily, finally providing a strong incentive to write them.
    • LLMs are good at mapping between compressed (code) and uncompressed (prose) representations, lowering the cost of maintaining both.
    • Agents can leave their own structured comments (e.g., “remarks”) that future agents (and humans) reuse as long-term memory.
  • Skeptical view:
    • More prose increases token cost and can hurt model performance; minimal, precise context is better.
    • Agents already read code well; explanations can be generated on demand, making persistent literate narratives unnecessary.
    • Hallucinated or philosophical “intent” layers give models more room to output confident but wrong text.
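The "runnable prose" idea from the optimistic view can be sketched with plain doctests, assuming nothing beyond the standard library (the `slugify` function is invented for illustration): the examples in the docstring are narrative an agent can actually execute, so drift between prose and behavior surfaces as a test failure rather than a silent lie.

```python
import doctest

def slugify(title: str) -> str:
    """Convert a title to a URL slug (illustrative example).

    The examples below are executable prose: if the behavior changes,
    running the doctests fails, so an agent can detect stale narrative.

    >>> slugify("Hello, World!")
    'hello-world'
    >>> slugify("  Agents & Literate Programming  ")
    'agents-literate-programming'
    """
    # Replace every non-alphanumeric character with a space, then join
    # the resulting words with hyphens.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)

if __name__ == "__main__":
    failures, _ = doctest.testmod()
    assert failures == 0
```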

Lighter-Weight Alternatives and Practices

  • Many prefer “lite” literate programming: good naming, docstrings, module-level overviews, README/architecture docs, doctests, and symmetric test–prod code.
  • Notebooks, Org-mode, Rakudoc/Podlite, and similar tools are cited as practical LP-style environments where examples double as tests.
  • Some propose config- or spec-driven approaches, file-level “intent” markdown that compiles to code, or CUE-like declarative specs combined with LLMs for safer generation.
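Two of these "lite" practices, a module-level overview and symmetric test–prod code, can be combined in one file. A minimal sketch, with an invented token-bucket module as the example (the simplified manually-advanced clock is an assumption made to keep the test deterministic):

```python
"""Token-bucket rate limiting (illustrative 'lite' literate module).

Module-level narrative: the bucket holds up to `capacity` tokens and
refills at `rate` tokens per second; `allow()` spends one token if any
are available. Time is passed in explicitly to keep tests deterministic.
"""

class TokenBucket:
    def __init__(self, capacity: int, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full, permitting a burst
        self.last = 0.0                # timestamp of the last refill

    def _refill(self, now: float) -> None:
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def allow(self, now: float) -> bool:
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Symmetric test code: the test re-tells the module narrative above.
def test_token_bucket() -> None:
    tb = TokenBucket(capacity=2, rate=1.0)
    assert tb.allow(0.0) and tb.allow(0.0)  # burst up to capacity
    assert not tb.allow(0.0)                # empty bucket refuses
    assert tb.allow(1.5)                    # refilled after 1.5 s

test_token_bucket()
```

The docstring carries the story, the names carry intent, and the test mirrors the prose closely enough that a reader (or an agent) can check one against the other.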

Open Questions

  • It remains unclear whether agents can reliably keep rich prose in sync with code at scale.
  • Tension persists between making codebases readable narratives for humans/agents and keeping them minimal, precise, and maintainable.