Hypothesis: Property-Based Testing for Python

Getting started with property-based testing

  • Several commenters report struggling to apply Hypothesis to code they don’t fully understand, defaulting instead to writing more example-based unit tests.
  • Recommended starter pattern:
    • Begin with very general properties like “does not crash” or “only throws allowed exceptions”.
    • Use Hypothesis’ strategies to generate broad input classes; refine constraints over time instead of trying to model “all possible inputs”.
  • The @example decorator is highlighted as a bridge from hand-written edge cases to generated ones.
  • Property-based tests can be seen as “parameterized tests with autogenerated tables”.
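The starter pattern above can be sketched roughly as follows; `parse_version` is a hypothetical stand-in for real code under test:

```python
from hypothesis import example, given, strategies as st

def parse_version(s: str) -> tuple[int, ...]:
    # Hypothetical stand-in for the code under test:
    # parse "1.2.3" into (1, 2, 3); rejects bad input with ValueError.
    return tuple(int(part) for part in s.split("."))

@given(st.text())   # broad input class: any unicode string
@example("1.2.3")   # hand-written edge cases kept alongside
@example("")        # the generated ones, via @example
def test_only_allowed_exceptions(s):
    # The most general property: never crashes, except with ValueError.
    try:
        parse_version(s)
    except ValueError:
        pass
```

Calling a `@given`-decorated function with no arguments runs it against many generated inputs, and the explicit `@example` cases are always included.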

Typical properties and patterns

  • Round-trip invariants such as decode(encode(x)) == x (e.g., JSON or other serialization formats) are cited as a highly motivating, practical use case.
  • Equality to a simpler or reference implementation (oracle) is common when there’s a naive but trusted version, or when migrating between implementations.
  • For sorting and similar algorithms, suggested properties include:
    • Output length equals input length.
    • Output is ordered.
    • Multiset of elements is preserved.
    • Idempotence: sorting twice = sorting once.
    • Permuting inputs doesn’t change results.
  • Other recurring properties: idempotence, commutativity, associativity, identity elements, order independence, and state-machine style “operation sequences obey invariants” (e.g., UI focus, DB drivers, delete/lookup semantics).
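A minimal sketch of several of these properties, assuming json for the round trip, Python’s built-in sorted() as the algorithm under test, and a set checked against a naive list model for the state-machine pattern (all names are illustrative):

```python
import json
from collections import Counter

from hypothesis import given, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine, invariant, rule, run_state_machine_as_test,
)

# Arbitrary JSON-like values; floats are omitted because NaN != NaN
# would break the equality check.
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children)
    | st.dictionaries(st.text(), children),
)

@given(json_values)
def test_json_round_trip(x):
    assert json.loads(json.dumps(x)) == x

@given(st.lists(st.integers()))
def test_sorted_properties(xs):
    out = sorted(xs)
    assert len(out) == len(xs)                        # length preserved
    assert all(a <= b for a, b in zip(out, out[1:]))  # ordered
    assert Counter(out) == Counter(xs)                # multiset preserved
    assert sorted(out) == out                         # idempotent

# State-machine style: random operation sequences against a set,
# checked after every step against a trusted naive model.
class SetMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.sut = set()   # "system under test" stand-in
        self.model = []    # naive list-based model

    @rule(x=st.integers())
    def add(self, x):
        self.sut.add(x)
        if x not in self.model:
            self.model.append(x)

    @rule(x=st.integers())
    def delete(self, x):
        self.sut.discard(x)
        if x in self.model:
            self.model.remove(x)

    @invariant()
    def agrees_with_model(self):
        assert sorted(self.sut) == sorted(self.model)
```

Hypothesis generates and runs many operation sequences against `SetMachine`, checking the invariant after each step, which is how the delete/lookup-style semantics mentioned above get exercised.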

Shrinking, randomness, and determinism

  • Hypothesis’ shrinking (minimizing failing examples) is repeatedly described as its most powerful feature and more advanced than classic QuickCheck.
  • It uses heuristics (e.g., edge-case values, tricky strings/floats) and maintains a failure database; seeds and failing examples can be replayed, making “random” tests reproducible.
  • Some worry about non-deterministic tests; others counter that you log seeds, commit failing examples, and over time cover more of the input space than fixed tests.
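Hypothesis’ find() exposes the shrinker directly, which makes the minimization behavior easy to see in isolation:

```python
from hypothesis import find, strategies as st

# Ask for any list of integers whose sum is at least 100; Hypothesis
# finds one, then shrinks it toward a minimal example (typically
# something like [100]).
minimal = find(st.lists(st.integers()), lambda xs: sum(xs) >= 100)
assert sum(minimal) >= 100
```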

Use cases and benefits

  • Reported successes include:
    • Finding subtle numeric, Unicode, and boundary bugs (e.g., specific list sizes, Turkish “İ”, ß lowercasing, NaN/∞).
    • Stress-testing APIs to ensure no 500s/NPEs and robust input validation.
    • Verifying data structures, compilers, parsers, SQL/DDL tools, DB migration behavior, and complex drivers.
  • Libraries built on Hypothesis such as Schemathesis are praised for uncovering many API validation bugs.

Critiques, tradeoffs, and adoption

  • Some argue PBT can require complex generators or even model/state-machine implementations; tests risk becoming as complex as the system under test (SUT) and harder to maintain.
  • Others respond that:
    • You do not need to reimplement business logic; you test general properties and relations between functions, often simpler than enumerating examples.
    • PBT complements, not replaces, example-based tests; failing cases can be turned into fixed regression tests.
  • Barriers to adoption include:
    • Misconceptions about “non-deterministic tests” being inherently bad.
    • The learning curve for expressing good properties and strategies.
    • Test runtime concerns; suggested mitigations include fewer examples during development and fuller runs in CI.
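The runtime mitigation above maps directly onto Hypothesis settings profiles (the profile names here are arbitrary):

```python
from hypothesis import settings

# Register once, e.g. in conftest.py.
settings.register_profile("dev", max_examples=10)    # quick local feedback
settings.register_profile("ci", max_examples=1000)   # thorough runs in CI
settings.load_profile("dev")
```

With the pytest plugin, a profile can also be selected per run via --hypothesis-profile=ci, so the same suite runs fast locally and exhaustively in CI.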

Ecosystem and documentation

  • Hypothesis is compared favorably to other PBT libraries (QuickCheck, FsCheck, Rust’s proptest, Go’s rapid, JS’ fast-check), especially in shrinking and heuristics.
  • Some note that Hypothesis integrates more smoothly with pytest than with unittest.
  • The Hypothesis docs’ “Explanations” section and its design-philosophy content are praised for deepening understanding beyond quickstarts.