Parse, Don't Validate (2019)
Core idea & interpretations
- Many see the article as: centralize validation at the edges, produce richer types, then trust those types internally instead of sprinkling if/guards everywhere.
- Others phrase it as “translation at the edge” or “use your type system” rather than focusing on the parse/validate wording, which some find confusing.
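The contrast between the two styles can be sketched in TypeScript (the `NonBlank` brand and helper names here are hypothetical, not from the article):

```typescript
// "Validate" checks a condition and discards the evidence; every caller
// still holds a plain string and must re-check the invariant.
function isNonBlank(s: string): boolean {
  return s.trim().length > 0;
}

// "Parse" produces a richer type that can only be created by the parser,
// so downstream code can trust it without guards.
type NonBlank = string & { readonly __brand: "NonBlank" };

function parseNonBlank(s: string): NonBlank | null {
  return s.trim().length > 0 ? (s as NonBlank) : null;
}

function greet(name: NonBlank): string {
  // No guard needed: the type records that the check happened at the edge.
  return `Hello, ${name}!`;
}

const input = parseNonBlank("Ada");
if (input !== null) {
  console.log(greet(input)); // only reachable when parsing succeeded
}
```

The brand compiles away entirely; its only job is to make “unchecked string” and “checked string” distinct types.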
Benefits of parsing into rich types
- Once you parse a raw value (string, JSON, etc.) into a domain type (e.g. `PhoneNumber`, `Email`, `NonEmpty`, `Occupants`), illegal states become unrepresentable, or at least harder to represent.
- This prevents whole classes of bugs: mixing up different string fields, misordered arguments, forgetting to re-check invariants, or re-implementing the same checks in many places.
- Strong functional view: types are propositions and values are proofs (Curry–Howard). A `NonEmpty a` carries the proof that the list isn’t empty; `Option<ValidatedEmail>` carries “may or may not be validated”.
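The article’s examples are Haskell; a rough TypeScript stand-in for `NonEmpty` (hypothetical names) shows how the type itself carries the proof:

```typescript
// A NonEmpty<T> is structurally incapable of being empty:
// the head is stored separately from the rest of the list.
type NonEmpty<T> = { head: T; tail: T[] };

// Parsing happens once, at the edge; failure is represented, not thrown.
function parseNonEmpty<T>(xs: T[]): NonEmpty<T> | null {
  return xs.length > 0 ? { head: xs[0], tail: xs.slice(1) } : null;
}

// Consumers need no length checks: first() is total, not partial.
function first<T>(xs: NonEmpty<T>): T {
  return xs.head;
}

const parsed = parseNonEmpty([1, 2, 3]);
if (parsed !== null) {
  console.log(first(parsed)); // 1
}
```

Unlike the branded-string trick, this representation makes the invariant structural: there is simply no way to build a `NonEmpty<T>` with zero elements.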
Value objects and domain modeling
- Big thread on whether wrappers like `PhoneNumber(String)` or `Email(String)` are worth it:
  - Pro: compile-time separation of concepts, single parsing/validation point, clearer APIs, better refactoring.
  - Con: boilerplate, runtime cost in OO languages, friction with APIs/serialization where everything is strings.
- Some prefer separate `UnvalidatedX`/`ValidatedX` types; others prefer a raw string plus a separate “isValid”/state flag, or move constraints into the database (e.g. SQL constraints).
- Many warn against overzealous validation (e.g. email/phone regexes, dates, calendars) that rejects real-world data or encodes wrong assumptions.
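A minimal sketch of the value-object pattern with a single parsing point (the `Email` type and its deliberately loose check are hypothetical):

```typescript
// Branded wrapper: only parseEmail can produce an Email value.
type Email = string & { readonly __brand: "Email" };

// One place to validate; everywhere else just uses the Email type.
// The check is intentionally permissive, echoing the thread's warning
// against overzealous email regexes that reject real-world addresses.
function parseEmail(raw: string): Email | null {
  const trimmed = raw.trim();
  return trimmed.includes("@") && !trimmed.includes(" ")
    ? (trimmed as Email)
    : null;
}

// APIs become self-documenting: this cannot receive an arbitrary string,
// and the two parameters cannot be swapped without a type error.
function sendWelcome(to: Email, fromName: string): string {
  return `To: ${to} -- welcome from ${fromName}`;
}
```

The same shape works for an `UnvalidatedX`/`ValidatedX` split: keep the raw type around for serialization and only hand the validated type to domain logic.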
Language and ecosystem differences
- Ergonomics vary:
  - Haskell/Rust/F#/Kotlin/Elm/Roc: cheap newtypes, sum types, and pattern matching make this style natural.
  - Java/C#: possible but often verbose; people mention using records, discriminated unions, value/inline classes, or codegen to manage hundreds of value objects.
  - Go’s “zero-value is valid” philosophy clashes somewhat; people still simulate “parse, don’t validate” with `NewT() (T, error)` constructors.
  - Python/JS: type hints, Pydantic, TypeScript, etc. move in this direction; but dynamic cultures still often pass strings/maps around (e.g. Pandas dataframes vs parsed objects).
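Go’s `NewT() (T, error)` smart-constructor idiom ports to most languages; a rough TypeScript analogue (the `Port` type and its bounds are hypothetical) might look like:

```typescript
// Explicit result type standing in for Go's (T, error) return pair.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

type Port = number & { readonly __brand: "Port" };

// Smart constructor: the only way to obtain a Port value.
function newPort(n: number): Result<Port> {
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return { ok: false, error: `invalid port: ${n}` };
  }
  return { ok: true, value: n as Port };
}

const p = newPort(8080);
if (p.ok) {
  console.log(p.value); // 8080
}
```

The discriminated union forces callers to handle the failure case before touching the value, which is the property the Go idiom achieves by convention.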
Tradeoffs, skepticism, and nuance
- Some argue static typing fans underestimate messy, evolving business data and the cost of early, rigid modeling; dynamic/introspective approaches are sometimes better in data engineering/ETL.
- Others counter that types and tests are complementary, and good types improve evolvability if used to express only what a component truly needs.
- Overuse of “everything is a tiny wrapper” can be as harmful as “everything is a string/dict”; judgment and boundaries matter.
- Error reporting: you can still collect multiple errors (`Result<T, List<Error>>`) instead of failing on the first; “parse” doesn’t force fail-fast UX.
Related ideas and tools
- Closely linked to “making impossible states impossible” and “deep interfaces”.
- Mentioned tools/approaches: protobuf/Schematron validators, language-specific “newtype”/abstract/value classes, email/date libraries, dataframe schema validators, BNF/grammars for LLM output.
- Generalized slogan offered: push effects (including validation and error reporting) and untyped data to the edges, and use typed, structured values inside.