Unit tests as documentation

Role of Unit Tests as Documentation

  • Many commenters argue that unit tests are a useful form of documentation: they show concrete inputs and outputs, expected behaviors, and “contracts” that should remain stable over time.
  • Tests can help new contributors understand how to call APIs, what shapes of data are expected, and how subsystems interact.
  • Some treat tests (especially in TDD) as executable specifications or “living specs,” particularly useful for legacy or poorly understood code.
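
As a minimal sketch of the points above, the tests below read as a usage guide for a small function. `parse_version` and its behavior are hypothetical, invented only for this illustration; the thread does not discuss any specific API.

```python
import unittest
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Hypothetical function under test: parse 'MAJOR.MINOR.PATCH'."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return tuple(int(p) for p in parts)


class ParseVersionContract(unittest.TestCase):
    """Each test documents one input/output pair a caller can rely on."""

    def test_returns_three_integers(self):
        self.assertEqual(parse_version("1.4.2"), (1, 4, 2))

    def test_ignores_surrounding_whitespace(self):
        self.assertEqual(parse_version(" 2.0.10\n"), (2, 0, 10))

    def test_rejects_missing_components(self):
        with self.assertRaises(ValueError):
            parse_version("1.4")
```

Saved to a file, this can be run with `python -m unittest <file>`. A new contributor can learn the call shape, the return type, and the failure mode from the three test bodies alone.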

Limits and Critiques

  • Strong pushback on treating tests as the sole documentation rather than one source among several.
  • Tests and code show “what happens” but often not “why”: the business rules, trade‑offs, and domain concepts behind the behavior.
  • Tests enumerate finite cases and may omit important edge conditions; the absence of a test does not mean the behavior is undefined.
  • For end‑users and non‑coders, tests are essentially useless as docs.
  • Poor tests (flaky, incomplete, mislabeled) can mislead as much as stale prose.

Integration vs Unit Tests

  • Several commenters contend that integration tests are far more valuable as documentation of real usage patterns and behavior across components.
  • Others defend a mix (the test pyramid): unit tests catch fine‑grained issues and are easier to debug; integration tests validate realistic flows.
  • Debate over whether integration‑heavy strategies can replace most unit tests, with concerns about combinatorial explosion and debugging cost.
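
The contrast can be sketched at small scale. In the example below, `normalize_email` and `UserRegistry` are hypothetical names, and the "integration" test is only an in-process stand-in for a real cross-component test: it exercises two pieces together through a usage flow, while the unit test pins down one piece in isolation.

```python
import unittest
from typing import Dict, Optional


def normalize_email(raw: str) -> str:
    """Hypothetical unit: trim whitespace and lowercase."""
    return raw.strip().lower()


class UserRegistry:
    """Hypothetical component that relies on normalize_email."""

    def __init__(self) -> None:
        self._users: Dict[str, str] = {}

    def register(self, raw_email: str, name: str) -> bool:
        email = normalize_email(raw_email)
        if email in self._users:
            return False  # already registered once normalized
        self._users[email] = name
        return True

    def lookup(self, raw_email: str) -> Optional[str]:
        return self._users.get(normalize_email(raw_email))


class NormalizeEmailUnitTest(unittest.TestCase):
    # Fine-grained: a failure here points directly at normalize_email.
    def test_trims_and_lowercases(self):
        self.assertEqual(normalize_email("  Ada@Example.COM "), "ada@example.com")


class RegistrationFlowIntegrationTest(unittest.TestCase):
    # Flow-level: documents how a caller actually uses the registry.
    def test_register_then_lookup_with_different_casing(self):
        registry = UserRegistry()
        self.assertTrue(registry.register("Ada@Example.com", "Ada"))
        self.assertFalse(registry.register(" ada@example.COM", "Imposter"))
        self.assertEqual(registry.lookup("ADA@example.com"), "Ada")
```

The flow test reads more like real usage, but if it fails, the cause could sit in either component; the unit test narrows that down, which is the debugging argument made for keeping both.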

Keeping Docs and Tests in Sync

  • The thread highlights tools and language features that run documentation examples as tests (Python and Rust doctests, Elixir, D, etc.).
  • This helps ensure examples stay valid, though prose explanations can still rot.
  • Some projects generate docs from tests or embed tests in docs to get “executable documentation.”
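
For Python, the standard-library `doctest` module is one way to do this. The sketch below uses a hypothetical `slugify` helper whose docstring examples double as tests; `check_examples` is also an illustrative helper, not part of `doctest` itself.

```python
import doctest


def slugify(title: str) -> str:
    """Turn a title into a URL slug (hypothetical helper).

    >>> slugify("Unit Tests as Documentation")
    'unit-tests-as-documentation'
    >>> slugify("  Hello,   World!  ")
    'hello-world'
    """
    # Replace every non-alphanumeric character with a space, then join words.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


def check_examples(func) -> doctest.TestResults:
    """Run the docstring examples of one function and return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)
```

Calling `check_examples(slugify)` reports how many examples were attempted and how many failed; running `python -m doctest <file>` on a saved copy checks the same examples from the command line. If `slugify` changes behavior, the examples in its docstring fail instead of silently going stale, although the surrounding prose is still not checked.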

Test Quality: Names, Structure, and Readability

  • To be useful as documentation, tests should be simple, independent, and focused on clear scenarios.
  • Descriptive test names and/or BDD‑style descriptions are promoted, but many note social friction in enforcing good naming.
  • Others claim organization, fixtures, and comments around tests matter more than function names.
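
A small sketch of what descriptive naming buys. The pricing rule and the `apply_discount` function are hypothetical, chosen only to give the test names something to describe.

```python
import unittest


def apply_discount(total_cents: int, code: str) -> int:
    """Hypothetical rule: 'SAVE10' takes 10% off orders of 5000 cents or more."""
    if code == "SAVE10" and total_cents >= 5000:
        return total_cents - total_cents // 10
    return total_cents


class ApplyDiscountTest(unittest.TestCase):
    # Each test is simple, independent, and covers one scenario.
    # A name such as test_discount_1 would say nothing about the rule checked.

    def test_save10_takes_ten_percent_off_orders_at_or_above_minimum(self):
        self.assertEqual(apply_discount(5000, "SAVE10"), 4500)

    def test_save10_is_ignored_below_minimum_order(self):
        self.assertEqual(apply_discount(4999, "SAVE10"), 4999)

    def test_unknown_code_leaves_total_unchanged(self):
        self.assertEqual(apply_discount(8000, "BOGUS"), 8000)
```

Read as a list, the three test names already summarize the rule, which is the effect the BDD-style advocates are after. Those who favor organization and comments over names would get a similar result by grouping the cases under a commented class or fixture.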

AI and Tests/Docs

  • People report success feeding test suites to LLMs to generate human‑readable documentation for under‑documented libraries.
  • Conversely, AI‑generated tests that merely mirror implementation can create noisy “regression tests” with little specification value.
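
The difference between a mirroring test and a specifying test can be illustrated as follows. The `shipping_cost` rule is hypothetical, and the tests are plain assert-style functions that can be called directly or collected by a runner such as pytest.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical rule: flat 5.00 up to 2 kg, then 1.50 per additional kg."""
    if weight_kg <= 2:
        return 5.0
    return 5.0 + 1.5 * (weight_kg - 2)


def test_mirrors_implementation():
    # Low specification value: the expected result is recomputed with the
    # same formula as the code, so the test restates the implementation
    # and would pass even if the formula itself were wrong.
    weight = 4
    expected = 5.0 if weight <= 2 else 5.0 + 1.5 * (weight - 2)
    assert shipping_cost(weight) == expected


def test_states_the_rule():
    # Higher specification value: literal, independently known outcomes.
    assert shipping_cost(2) == 5.0  # at the flat-rate limit
    assert shipping_cost(4) == 8.0  # 5.00 + 2 extra kg * 1.50
```

Both tests pass, but only the second tells a reader what the function is supposed to return; the first only records that the code agrees with a copy of itself.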