Hypothesis: Property-Based Testing for Python

Getting started with property-based testing

  • Several commenters report struggling to apply Hypothesis to code they don’t fully understand, defaulting instead to writing more example-based unit tests.
  • Recommended starter pattern:
    • Begin with very general properties like “does not crash” or “only throws allowed exceptions”.
    • Use Hypothesis’ strategies to generate broad input classes; refine constraints over time instead of trying to model “all possible inputs”.
  • The @example decorator is highlighted as a bridge from hand-written edge cases to generated ones.
  • Property-based tests can be seen as “parameterized tests with autogenerated tables”.
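The starter pattern above can be sketched roughly as follows; `parse_version` is a hypothetical stand-in for real code under test:

```python
from hypothesis import example, given, strategies as st

def parse_version(s: str) -> tuple[int, ...]:
    # Hypothetical stand-in for the code under test:
    # parse "1.2.3" into (1, 2, 3); rejects bad input with ValueError.
    return tuple(int(part) for part in s.split("."))

@given(st.text())   # broad input class: any unicode string
@example("1.2.3")   # hand-written edge cases kept alongside
@example("")        # the generated ones, via @example
def test_only_allowed_exceptions(s):
    # The most general property: never crashes, except with ValueError.
    try:
        parse_version(s)
    except ValueError:
        pass
```

Calling a `@given`-decorated function with no arguments runs it against many generated inputs, and the explicit `@example` cases are always included.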

Typical properties and patterns

  • Round-trip invariants such as decode(encode(x)) == x (e.g., JSON or other serialization formats) are cited as a highly motivating, practical use case.
  • Equality to a simpler or reference implementation (oracle) is common when there’s a naive but trusted version, or when migrating between implementations.
  • For sorting and similar algorithms, suggested properties include:
    • Output length equals input length.
    • Output is ordered.
    • Multiset of elements is preserved.
    • Idempotence: sorting twice = sorting once.
    • Permuting inputs doesn’t change results.
  • Other recurring properties: idempotence, commutativity, associativity, identity elements, order independence, and state-machine style “operation sequences obey invariants” (e.g., UI focus, DB drivers, delete/lookup semantics).
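A minimal sketch of several of these properties, assuming json for the round trip, Python’s built-in sorted() as the algorithm under test, and a set checked against a naive list model for the state-machine pattern (all names are illustrative):

```python
import json
from collections import Counter

from hypothesis import given, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine, invariant, rule, run_state_machine_as_test,
)

# Arbitrary JSON-like values; floats are omitted because NaN != NaN
# would break the equality check.
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children)
    | st.dictionaries(st.text(), children),
)

@given(json_values)
def test_json_round_trip(x):
    assert json.loads(json.dumps(x)) == x

@given(st.lists(st.integers()))
def test_sorted_properties(xs):
    out = sorted(xs)
    assert len(out) == len(xs)                        # length preserved
    assert all(a <= b for a, b in zip(out, out[1:]))  # ordered
    assert Counter(out) == Counter(xs)                # multiset preserved
    assert sorted(out) == out                         # idempotent

# State-machine style: random operation sequences against a set,
# checked after every step against a trusted naive model.
class SetMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.sut = set()   # "system under test" stand-in
        self.model = []    # naive list-based model

    @rule(x=st.integers())
    def add(self, x):
        self.sut.add(x)
        if x not in self.model:
            self.model.append(x)

    @rule(x=st.integers())
    def delete(self, x):
        self.sut.discard(x)
        if x in self.model:
            self.model.remove(x)

    @invariant()
    def agrees_with_model(self):
        assert sorted(self.sut) == sorted(self.model)
```

Hypothesis generates and runs many operation sequences against `SetMachine`, checking the invariant after each step, which is how the delete/lookup-style semantics mentioned above get exercised.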

Shrinking, randomness, and determinism

  • Hypothesis’ shrinking (minimizing failing examples) is repeatedly described as its most powerful feature and more advanced than classic QuickCheck.
  • It uses heuristics (e.g., edge-case values, tricky strings/floats) and maintains a failure database; seeds and failing examples can be replayed, making “random” tests reproducible.
  • Some worry about non-deterministic tests; others counter that you log seeds, commit failing examples, and over time cover more of the input space than fixed tests.
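Hypothesis’ find() exposes the shrinker directly, which makes the minimization behavior easy to see in isolation:

```python
from hypothesis import find, strategies as st

# Ask for any list of integers whose sum is at least 100; Hypothesis
# finds one, then shrinks it toward a minimal example (typically
# something like [100]).
minimal = find(st.lists(st.integers()), lambda xs: sum(xs) >= 100)
assert sum(minimal) >= 100
```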

Use cases and benefits

  • Reported successes include:
    • Finding subtle numeric, Unicode, and boundary bugs (e.g., specific list sizes, Turkish “İ”, ß lowercasing, NaN/∞).
    • Stress-testing APIs to ensure no 500s/NPEs and robust input validation.
    • Verifying data structures, compilers, parsers, SQL/DDL tools, DB migration behavior, and complex drivers.
  • Libraries built on Hypothesis such as Schemathesis are praised for uncovering many API validation bugs.

Critiques, tradeoffs, and adoption

  • Some argue PBT can require complex generators or even model/state-machine implementations; tests risk becoming as complex as the system under test (SUT) and harder to maintain.
  • Others respond that:
    • You do not need to reimplement business logic; you test general properties and relations between functions, often simpler than enumerating examples.
    • PBT complements, not replaces, example-based tests; failing cases can be turned into fixed regression tests.
  • Barriers to adoption include:
    • Misconceptions about “non-deterministic tests” being inherently bad.
    • The learning curve for expressing good properties and strategies.
    • Test runtime concerns; suggested mitigations include fewer examples during development and fuller runs in CI.
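The runtime mitigation above maps directly onto Hypothesis settings profiles (the profile names here are arbitrary):

```python
from hypothesis import settings

# Register once, e.g. in conftest.py.
settings.register_profile("dev", max_examples=10)    # quick local feedback
settings.register_profile("ci", max_examples=1000)   # thorough runs in CI
settings.load_profile("dev")
```

With the pytest plugin, a profile can also be selected per run via --hypothesis-profile=ci, so the same suite runs fast locally and exhaustively in CI.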

Ecosystem and documentation

  • Hypothesis is compared favorably to other PBT libraries (QuickCheck, FsCheck, Rust’s proptest, Go’s rapid, JS’ fast-check), especially in shrinking and heuristics.
  • Some note that Hypothesis integrates more smoothly with pytest than with unittest.
  • The Hypothesis docs’ “Explanations” section and its design-philosophy content are praised for deepening understanding beyond quickstarts.