Parse, Don't Validate (2019)

Core idea & interpretations

  • Many see the article as: centralize validation at the edges, produce richer types, then trust those types internally instead of sprinkling if/guards everywhere.
  • Others phrase it as “translation at the edge” or “use your type system” rather than focusing on the parse/validate wording, which some find confusing.

Benefits of parsing into rich types

  • Once you parse a raw value (string, JSON, etc.) into a domain type (e.g. PhoneNumber, Email, NonEmpty, Occupants), illegal states become unrepresentable, or at least harder to represent.
  • This prevents whole classes of bugs: mixing up different string fields, misordered arguments, forgetting to re-check invariants, or re-implementing the same checks in many places.
  • Strong functional view: types are propositions and values are proofs (Curry–Howard). A NonEmpty a carries a proof that the list isn’t empty; an Option<ValidatedEmail> makes “may or may not be validated” explicit in the type.

Value objects and domain modeling

  • Big thread on whether wrappers like PhoneNumber(String) or Email(String) are worth it:
    • Pro: compile-time separation of concepts, single parsing/validation point, clearer APIs, better refactoring.
    • Con: boilerplate, runtime cost in OO languages, friction with APIs/serialization where everything is strings.
  • Some prefer separate UnvalidatedX / ValidatedX types; others prefer a raw string plus a separate “isValid”/state flag, or move constraints into the database (e.g. SQL constraints).
  • Many warn against overzealous validation (e.g. email/phone regexes, dates, calendars) that rejects real-world data or encodes wrong assumptions.
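One way the wrapper-type tradeoff discussed above might look in Python; the `Email` type, its single `parse` entry point, and the deliberately loose check are illustrative, not from the thread:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Email:
    """Wrapper intended to be built only via parse(); holding an Email
    implies the (deliberately minimal) check already ran."""
    value: str

    @staticmethod
    def parse(raw: str) -> "Email":
        # Keep the check loose on purpose: strict email regexes often
        # reject real-world addresses or encode wrong assumptions.
        raw = raw.strip()
        if "@" not in raw or raw.startswith("@") or raw.endswith("@"):
            raise ValueError(f"not an email address: {raw!r}")
        return Email(raw)

def send_welcome(to: Email) -> str:
    # The signature documents that validation already happened,
    # and Email can't be confused with some other string field.
    return f"Sending welcome mail to {to.value}"
```

This also shows the cost side of the thread’s debate: a runtime wrapper object per field, plus unwrapping (`to.value`) at serialization boundaries.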

Language and ecosystem differences

  • Ergonomics vary:
    • Haskell/Rust/F#/Kotlin/Elm/Roc: cheap newtypes, sum types, and pattern matching make this style natural.
    • Java/C#: possible but often verbose; people mention using records, discriminated unions (DUs), value/inline classes, or codegen to manage hundreds of value objects.
    • Go’s zero-value philosophy (every type’s zero value should be usable) clashes somewhat with this style; people still simulate “parse, don’t validate” with NewT() (T, error) constructor functions.
    • Python/JS: type hints, Pydantic, TS, etc. move in this direction; but dynamic cultures still often pass strings/maps around (e.g. Pandas dataframes vs parsed objects).
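In a dynamic language the same shape can be approximated with `typing.NewType`, a zero-runtime-cost tag that a type checker distinguishes from the raw type; the `UserId` example here is hypothetical:

```python
from typing import NewType

# A checker (mypy, pyright) treats UserId as distinct from str,
# but at runtime it is still just a string: no wrapper, no cost.
UserId = NewType("UserId", str)

def parse_user_id(raw: str) -> UserId:
    # The single place where the invariant is enforced.
    if not raw.isdigit():
        raise ValueError(f"user id must be numeric: {raw!r}")
    return UserId(raw)

def fetch_user(uid: UserId) -> str:
    # A checker flags fetch_user("123") with a raw, unparsed string;
    # at runtime both work, which is the dynamic-culture caveat above.
    return f"user:{uid}"
```

This captures the tradeoff the thread raises: the guarantee is only as strong as the tooling, since nothing stops untyped code from bypassing `parse_user_id`.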

Tradeoffs, skepticism, and nuance

  • Some argue static typing fans underestimate messy, evolving business data and the cost of early, rigid modeling; dynamic/introspective approaches are sometimes better in data engineering/ETL.
  • Others counter that types and tests are complementary, and good types improve evolvability if used to express only what a component truly needs.
  • Overuse of “everything is a tiny wrapper” can be as harmful as “everything is a string/dict”; judgment and boundaries matter.
  • Error reporting: you can still collect multiple errors (Result<T, List<Error>>) instead of failing on the first; “parse” doesn’t force fail-fast UX.

Related ideas and tools

  • Closely linked to “making impossible states impossible” and “deep interfaces”.
  • Mentioned tools/approaches: protobuf/Schematron validators, language-specific “newtype”/abstract/value classes, email/date libraries, dataframe schema validators, BNF/grammars for LLM output.
  • Generalized slogan offered: push effects (including validation and error reporting) and untyped data to the edges, and use typed, structured values inside.