You'll regret using natural keys

Natural vs. Surrogate Primary Keys

  • Many argue you “almost always” want an internal, synthetic primary key (int or UUID) and treat natural identifiers as regular columns with unique constraints.
  • Recurrent theme: every “obviously stable” natural key eventually changes or is reused (emails, usernames, SSNs, government IDs, license plates, VINs, gamer tags, phone numbers, addresses, etc.).
  • Opposing view: natural keys are essential for modelling real‑world identity and forcing you to think about what really makes an entity unique; surrogate keys don’t save you from messy external reality.

External IDs and Evolving Reality

  • Multiple horror stories: systems using external IDs (SSN, national IDs, stock codes, journal ISSNs, Discord usernames, PSN/Steam logins, phone numbers) as PKs, later broken by policy changes, duplicates, format changes, gender changes, or recycled numbers.
  • Strong advice from many: never use IDs you don’t control as your primary key; treat them as data + unique indexes and keep a stable internal ID.

Practical Schema Design Patterns

  • Common compromise: synthetic PK + natural “business key” enforced via UNIQUE, possibly composite.
  • Natural keys often work fine as alternate keys and for human‑facing lookups, but not as PKs that propagate everywhere via foreign keys.
  • Some note you can refer to alternate unique keys in foreign keys (not just PKs), though ORMs often push toward a single synthetic PK.

Performance and UUID Debates

  • Pro‑surrogate camp highlights:
    • Smaller, fixed‑size integer PKs make indexes and joins cheaper.
    • Sequential or k‑sortable keys (auto‑increment, UUIDv7, Snowflake‑style) avoid B‑tree fragmentation; pure UUIDv4 can hurt at scale.
  • Some like dual IDs: internal int PK + public UUID or short random string (Stripe‑style), to prevent enumeration and “German tank problem” info leakage.

Security, Privacy, and PII

  • Natural keys that contain PII (emails, SSNs, national IDs) “infect” every referencing table; complicates compliance (HIPAA/GDPR).
  • Debate over whether surrogate IDs themselves count as PII; some standards treat any linkable identifier as sensitive, others carve out internal codes as non‑PHI.

Distributed Systems and Integration

  • In SOA/microservices, data warehouses, job queues, caches, and external integrations, changing a PK becomes extremely painful.
  • Partitioned systems may need natural‑ish keys for dedup across partitions, but most still recommend an internal surrogate plus natural uniqueness where needed.

Overall Takeaways

  • Broad (but not unanimous) consensus:
    • Use surrogate PKs under your control.
    • Enforce real‑world uniqueness with natural keys via constraints.
    • Expect natural identifiers to change; design so that change has minimal blast radius.