Stop using zip codes for geospatial analysis (2019)

What ZIP codes actually are

  • Described as mail-sorting constructs, not geographic polygons: abstract sets of delivery points along routes.
  • Can be:
    • Non-contiguous areas.
    • A single point (e.g., a large company).
    • A single line (e.g., a delivery route along a highway).
    • Overlapping or ambiguous when forced into polygons.
  • They reflect USPS operational structure and logistics, not geography or political boundaries; they also change over time as routes change.

Why ZIP codes are problematic for spatial analysis

  • Treating ZIPs as polygons is called a “category error” and can produce misleading results, especially for:
    • Rural vs urban comparisons, where a single ZIP can mix dense town centers with large rural surroundings.
    • Demographic or socioeconomic analysis where internal variation is large.
  • Census “ZIP Code Tabulation Areas” (ZCTAs) only approximate ZIP delivery areas: they can leave regions uncovered, cut across other boundaries such as counties, and drift out of date as USPS routes change.
  • This relates to the Modifiable Areal Unit Problem (MAUP): aggregate statistics and apparent patterns change depending on how the boundaries are drawn.
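
The boundary-dependence point can be illustrated with a toy example. This is a minimal sketch, not anything from the discussion: the households, incomes, and zone edges are all invented, and the "map" is a one-dimensional road to keep the arithmetic visible. It averages the same data under two different zonings.

```python
from statistics import mean

# Hypothetical households along a 1-D road: (position_km, income_in_thousands).
# All values are made up for illustration.
households = [(0.5, 30), (1.5, 32), (2.5, 90), (3.5, 95), (4.5, 31), (5.5, 33)]

def zone_means(points, edges):
    """Average the values of the points falling in each [lo, hi) zone."""
    result = []
    for lo, hi in zip(edges, edges[1:]):
        values = [v for x, v in points if lo <= x < hi]
        result.append(mean(values) if values else None)
    return result

# Two equally plausible ways of cutting the same road into zones.
by_two_km = zone_means(households, [0, 2, 4, 6])   # three 2 km zones
by_three_km = zone_means(households, [0, 3, 6])    # two 3 km zones

print(by_two_km)    # one zone stands out as much richer
print(by_three_km)  # the same data now looks nearly uniform
```

With 2 km zones the wealthy cluster in the middle is clearly visible; with 3 km zones it is split across both zones and the averages look almost the same. Nothing about the households changed, only the boundaries.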

Arguments that ZIPs are “good enough”

  • Widely known by the public, easy to collect on forms, and embedded in addresses.
  • Often “uniform enough” and “contiguous enough by travel time” for coarse analyses, marketing, bulk mail, sales-tax lookup, and sports broadcast blackouts.
  • Seen as a practical first step when you need aggregation but lack more precise spatial data.

Alternatives proposed

  • Census units (blocks, tracts, counties, CSAs): better-defined geographies, some of them (such as tracts) drawn to hold roughly comparable populations, but harder to collect from users and to explain.
  • Spatial grids like H3:
    • Hexagonal cells with consistent neighbor relationships and tunable resolution.
    • Good for counting people/phenomena in areas and joining disparate datasets.
    • Rectangular lat/long bins are also possible but require design choices (cell size, origin) and have edge issues, such as cells that cover less ground area at higher latitudes.
  • Exact addresses + geocoding, then aggregating to chosen units.
  • Custom regions (e.g., DMAs, custom market areas) built from population + internal data.
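
As a rough illustration of the "geocode, then aggregate to a chosen unit" approach, the sketch below bins already-geocoded points into a plain rectangular lat/long grid. It uses only the standard library rather than H3, so it shows the simpler of the two grid options listed above. The function name `grid_cell`, the 0.1-degree cell size, and the sample coordinates are assumptions chosen for the example.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.1):
    """Return the (row, col) index of the cell_deg-sized bin containing a point.

    math.floor (not int) is used so negative coordinates bin consistently.
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

# Hypothetical points that have already been geocoded to (lat, lon).
points = [
    (40.7128, -74.0060),
    (40.7306, -73.9866),
    (40.7580, -73.9855),
    (34.0522, -118.2437),
]

# Aggregate: number of points per grid cell.
counts = Counter(grid_cell(lat, lon) for lat, lon in points)

for cell, n in sorted(counts.items()):
    print(cell, n)
```

The cell index works as a join key across datasets in the same way a ZIP code would, but the cells are fixed and never re-drawn. The trade-offs are the ones noted in the list: the cell size and origin are arbitrary choices, and a 0.1-degree cell is narrower east to west the farther it is from the equator.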

Data, privacy, and international issues

  • ZIP+4 and Canadian postal codes can be extremely granular, verging on household or building identifiers, raising re-identification risk.
  • Postal code concepts differ by country; some boundaries are proprietary or not widely known, and user familiarity varies.
  • Address and postal-code validation is messy, with numerous edge cases and conflicting datasets.
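
One way to make the re-identification concern concrete is to count how many records share each postal code and flag the codes with very few. The sketch below is only an illustration under stated assumptions: the helper name `small_cells`, the threshold `k=5`, and the sample records are all hypothetical, and a real privacy review would involve more than a count threshold.

```python
from collections import Counter

def small_cells(codes, k=5):
    """Return the postal codes that appear in fewer than k records, sorted."""
    counts = Counter(codes)
    return sorted(code for code, n in counts.items() if n < k)

# Hypothetical dataset: one postal code per record (placeholder values).
records = ["12345"] * 6 + ["12345-6789"] * 1 + ["K1A 0B1"] * 2

risky = small_cells(records, k=5)
print(risky)  # codes shared by so few records that they may single out a household
```

Codes flagged this way are candidates for coarsening (for example, truncating ZIP+4 to the five-digit ZIP) or suppression before the data is published.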