Stop using zip codes for geospatial analysis (2019)
What ZIP codes actually are
- Described as mail-sorting constructs, not geographic polygons: abstract sets of delivery points along routes.
- Can be:
  - Non-contiguous areas.
  - A single point (e.g., a large company).
  - A single line (highway routes).
  - Overlapping or ambiguous when forced into polygons.
- They reflect USPS operational structure and logistics, not geography or political boundaries; they also change over time as routes change.
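A minimal sketch of why forcing point sets into polygons creates overlap. The ZIP identifiers and coordinates are made up, and a bounding box stands in for whatever polygon-drawing method a vendor might actually use; the point is only that interleaved delivery points cannot be enclosed without ambiguity.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in arbitrary units; illustrative only
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Hypothetical delivery points for two made-up ZIPs whose routes interleave.
delivery_points: Dict[str, List[Point]] = {
    "00001": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)],
    "00002": [(1.0, 1.0), (3.0, 3.0), (5.0, 1.0)],
}

def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned rectangle containing every point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two rectangles share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def contains(box: Box, point: Point) -> bool:
    """True when the point lies inside or on the edge of the rectangle."""
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]

box_a = bounding_box(delivery_points["00001"])
box_b = bounding_box(delivery_points["00002"])
overlap = boxes_overlap(box_a, box_b)

# Delivery points of ZIP 00002 that a polygon lookup could wrongly assign to 00001.
ambiguous = [p for p in delivery_points["00002"] if contains(box_a, p)]
```

Here `overlap` is true and two of the three points for `00002` fall inside the box drawn for `00001`, so a point-in-polygon lookup has no single right answer for them.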
Why ZIP codes are problematic for spatial analysis
- Treating ZIPs as polygons is called a “category error” and can produce misleading results, especially for:
  - Rural vs urban comparisons, where a single ZIP can mix dense town centers with large rural surroundings.
  - Demographic or socioeconomic analysis where internal variation is large.
- Census “ZIP Code Tabulation Areas” (ZCTAs) are only approximations and may include overlaps, missing regions, and temporal drift.
- This relates to the Modifiable Areal Unit Problem (MAUP): statistics and patterns change depending on how the boundaries are drawn.
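The boundary effect in the last point can be shown with a toy example. The households, positions, and incomes below are invented for illustration; the same six records are averaged under two different zoning schemes.

```python
from statistics import mean

# Hypothetical households: (position along a road in km, income in thousands).
households = [(0.5, 30), (1.5, 32), (2.5, 90), (3.5, 95), (4.5, 31), (5.5, 29)]

def zone_means(data, edges):
    """Mean income per zone; zones are half-open intervals [edges[i], edges[i+1]).

    Assumes every zone contains at least one household.
    """
    result = []
    for lo, hi in zip(edges, edges[1:]):
        values = [income for position, income in data if lo <= position < hi]
        result.append(mean(values))
    return result

# Scheme A: three 2 km zones. The high-income cluster gets its own zone.
scheme_a = zone_means(households, [0, 2, 4, 6])

# Scheme B: two 3 km zones. The cluster is split and averaged away.
scheme_b = zone_means(households, [0, 3, 6])
```

Under scheme A the middle zone averages 92.5 against roughly 30 on either side. Under scheme B both zones average about 51, and the cluster is no longer visible, even though the underlying data did not change.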
Arguments that ZIPs are “good enough”
- Widely known by the public, easy to collect on forms, and embedded in addresses.
- Often “uniform enough” and “contiguous enough by travel time” for coarse analyses, marketing, bulk mail, sales-tax lookup, and sports blackouts.
- Seen as a practical first step when you need aggregation but lack more precise spatial data.
Alternatives proposed
- Census units (blocks, tracts, counties, CSAs): better-defined geographies and often population-normalized, but harder to collect from users and to explain.
- Spatial grids like H3:
  - Hexagonal cells with consistent neighbor relationships and tunable resolution.
  - Good for counting people/phenomena in areas and joining disparate datasets.
  - Rectangular lat/long bins are possible but bring design choices and edge issues.
- Exact addresses + geocoding, then aggregating to chosen units.
- Custom regions (e.g., DMAs, custom market areas) built from population + internal data.
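As a rough sketch of the "geocode, then aggregate to a chosen unit" approach, the following bins already-geocoded points into rectangular lat/long cells using only the standard library. It is not H3 (which uses hexagonal cells and its own library); the cell size, the sample coordinates, and the helper names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def cell_id(lat, lon, size_deg=0.1):
    """Integer (row, column) index of the size_deg x size_deg cell containing the point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def cell_width_km(lat, size_deg=0.1):
    """Approximate east-west width of a cell at this latitude (spherical Earth)."""
    km_per_degree_at_equator = 111.32  # approximate
    return size_deg * km_per_degree_at_equator * math.cos(math.radians(lat))

# Hypothetical geocoded events as (latitude, longitude).
events = [
    (40.7128, -74.0060),
    (40.7130, -74.0050),
    (40.7306, -73.9866),
    (34.0522, -118.2437),
]

# Aggregate: number of events per grid cell.
counts = Counter(cell_id(lat, lon) for lat, lon in events)
```

Two of the design issues mentioned above show up directly. Cells of equal size in degrees are not equal in area: `cell_width_km(60.0)` is half of `cell_width_km(0.0)`. And the third event lands in a different cell from the first two despite being only a few kilometres away, because a cell edge happens to fall between them.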
Data, privacy, and international issues
- ZIP+4 and Canadian postal codes can be extremely granular, verging on household or building identifiers, raising re-identification risk.
- Postal code concepts differ by country; some boundaries are proprietary or not widely known, and user familiarity varies.
- Address and postal-code validation is messy, with numerous edge cases and conflicting datasets.
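One way to gauge the re-identification risk of granular codes is to count how many records share each code and flag the small groups. The sketch below assumes US-style ZIP+4 strings and an arbitrary threshold `k`; the records are invented, and a group-size check like this is only a first screen, not a privacy guarantee.

```python
from collections import Counter

# Hypothetical records, one ZIP+4 code per person.
records = [
    "94107-1234", "94107-1234", "94107-5678", "94110-0001",
    "10001-0002", "10001-0002", "10002-0003",
]

def small_groups(codes, k):
    """Codes shared by fewer than k records, sorted for stable output."""
    counts = Counter(codes)
    return sorted(code for code, n in counts.items() if n < k)

def coarsen(code, digits):
    """Keep only the first `digits` digits of a code, ignoring the hyphen."""
    return code.replace("-", "")[:digits]

k = 2  # assumed minimum group size; choose according to your own policy

risky_full = small_groups(records, k)
risky_zip5 = small_groups([coarsen(c, 5) for c in records], k)
risky_zip3 = small_groups([coarsen(c, 3) for c in records], k)
```

With this data, three full ZIP+4 codes identify a single record each, two 5-digit ZIPs still do, and none do at the 3-digit prefix level. Coarsening trades spatial precision for larger groups.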