Hypothesis: Property-Based Testing for Python
Getting started with property-based testing
- Several commenters struggle to apply Hypothesis when they don’t fully understand the existing code; they default to writing more example-based unit tests.
- Recommended starter pattern:
- Begin with very general properties like “does not crash” or “only throws allowed exceptions”.
- Use Hypothesis’ strategies to generate broad input classes; refine constraints over time instead of trying to model “all possible inputs”.
- The `@example` decorator is highlighted as a bridge from hand-written edge cases to generated ones (see the sketch below).
- Property-based tests can be seen as “parameterized tests with autogenerated tables”.
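A minimal sketch of this starter pattern, assuming a hypothetical `parse_config` function under test (the import path is made up). The property is deliberately general: the function may reject input, but only with the documented exception type.

```python
from hypothesis import example, given, strategies as st

from myproject.config import parse_config  # hypothetical function under test

# Very general property: malformed input may be rejected, but only with the
# documented exception type; any other exception (or crash) is a bug.
@given(st.text())
@example("")               # hand-written edge case kept alongside generated input
@example("key = value\n")  # another explicit case, doubling as a regression anchor
def test_parse_config_only_raises_value_error(raw):
    try:
        parse_config(raw)
    except ValueError:
        pass               # allowed: documented rejection of bad input
```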
Typical properties and patterns
- Round-trip invariants like `decode(encode(x)) == x` (e.g., JSON or other serialization) are cited as a highly motivating, practical use case.
- Equality to a simpler or reference implementation (an oracle) is common when there’s a naive but trusted version, or when migrating between implementations.
- For sorting and similar algorithms, suggested properties include (see the sketches after this list):
- Output length equals input length.
- Output is ordered.
- Multiset of elements is preserved.
- Idempotence: sorting twice = sorting once.
- Permuting inputs doesn’t change results.
- Other recurring properties: idempotence, commutativity, associativity, identity elements, order independence, and state-machine style “operation sequences obey invariants” (e.g., UI focus, DB drivers, delete/lookup semantics).
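Two sketches of the patterns above. The first shows a round-trip invariant and the sorting properties, using `json` and the built-in `sorted()` as stand-ins for whatever code is actually under test:

```python
import json
from collections import Counter

from hypothesis import given, strategies as st

# JSON round trip, restricted to JSON-representable values so equality holds
# exactly (floats/NaN would need extra care).
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
)

@given(json_values)
def test_json_round_trip(value):
    assert json.loads(json.dumps(value)) == value

@given(st.lists(st.integers()))
def test_sorting_properties(xs):
    out = sorted(xs)
    assert len(out) == len(xs)                        # length preserved
    assert all(a <= b for a, b in zip(out, out[1:]))  # output is ordered
    assert Counter(out) == Counter(xs)                # multiset of elements preserved
    assert sorted(out) == out                         # idempotence: sorting twice = once
    assert sorted(list(reversed(xs))) == out          # permuting input doesn't change the result
```

The second sketches the state-machine style with Hypothesis’ `RuleBasedStateMachine`: random operation sequences are run against a component (here a plain dict stands in for the real store or driver) and checked against a trusted model after every step:

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

class KeyValueMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.store = {}  # stand-in for the real component (driver, cache, ...)
        self.model = {}  # trusted reference model

    @rule(key=st.text(), value=st.integers())
    def put(self, key, value):
        self.store[key] = value
        self.model[key] = value

    @rule(key=st.text())
    def delete(self, key):
        self.store.pop(key, None)
        self.model.pop(key, None)

    @invariant()
    def agrees_with_model(self):
        assert self.store == self.model

TestKeyValue = KeyValueMachine.TestCase  # collected by pytest/unittest
```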
Shrinking, randomness, and determinism
- Hypothesis’ shrinking (minimizing failing examples) is repeatedly described as its most powerful feature and as more advanced than classic QuickCheck’s.
- It uses heuristics (e.g., edge-case values, tricky strings/floats) and maintains a failure database; seeds and failing examples can be replayed, making “random” tests reproducible.
- Some worry about non-deterministic tests; others counter that you log seeds, commit failing examples, and over time cover more of the input space than fixed tests.
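A small reproducibility sketch: `@seed` pins the randomness for one test, and a previously shrunk failure is committed as an explicit `@example` so it is re-checked on every run (the seed value and the property are illustrative; settings such as `derandomize=True` or `print_blob=True` are further knobs for deterministic replay):

```python
from hypothesis import example, given, seed, settings, strategies as st

@seed(20240101)             # arbitrary seed, pinned so the run is reproducible
@settings(max_examples=200)
@given(st.text())
@example("İ")               # e.g., a Unicode case that once failed, kept as a regression
def test_utf8_round_trip(s):
    assert s.encode("utf-8").decode("utf-8") == s
```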
Use cases and benefits
- Reported successes include:
- Finding subtle numeric, Unicode, and boundary bugs (e.g., specific list sizes, Turkish “İ”, ß lowercasing, NaN/∞).
- Stress-testing APIs to ensure no 500s/NPEs and robust input validation (see the sketch after this list).
- Verifying data structures, compilers, parsers, SQL/DDL tools, DB migration behavior, and complex drivers.
- Libraries built on Hypothesis such as Schemathesis are praised for uncovering many API validation bugs.
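The “no 500s” property can be written directly with Hypothesis. This sketch assumes a Flask-style application factory and test client; `myapp`, `create_app`, the `/search` route, and its parameters are placeholders:

```python
from hypothesis import given, settings, strategies as st

from myapp import create_app  # hypothetical application factory

app = create_app()

@settings(max_examples=100)
@given(query=st.text(), limit=st.integers(min_value=-10, max_value=10_000))
def test_search_never_returns_5xx(query, limit):
    with app.test_client() as client:
        response = client.get("/search", query_string={"q": query, "limit": limit})
        # malformed input may be rejected (4xx), but the service must never crash
        assert response.status_code < 500
```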
Critiques, tradeoffs, and adoption
- Some argue PBT can require complex generators or even model/state-machine implementations; tests risk being as complex as the SUT and harder to maintain.
- Others respond that:
- You do not need to reimplement business logic; you test general properties and relations between functions, which is often simpler than enumerating examples.
- PBT complements, not replaces, example-based tests; failing cases can be turned into fixed regression tests.
- Barriers to adoption include:
- Misconceptions about “non-deterministic tests” being inherently bad.
- The learning curve for expressing good properties and strategies.
- Test runtime concerns; suggested mitigations include fewer examples during development and fuller runs in CI.
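One common mitigation, sketched with Hypothesis’ settings profiles (profile names and example counts are arbitrary); this typically lives in `conftest.py`:

```python
import os

from hypothesis import settings

# Small, fast profile for local development; fuller run for CI.
settings.register_profile("dev", max_examples=20)
settings.register_profile("ci", max_examples=1_000, deadline=None)

# Select a profile via an environment variable, defaulting to the dev profile.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
```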
Ecosystem and documentation
- Hypothesis is compared favorably to other PBT libraries (QuickCheck, FsCheck, Rust’s proptest, Go’s rapid, JS’ fast-check), especially in shrinking and heuristics.
- Some note Hypothesis’ pytest integration is better than with `unittest`.
- The Hypothesis docs’ “Explanations” section and its design-philosophy content are praised for deepening understanding beyond quickstarts.