You'll regret using natural keys
Natural vs. Surrogate Primary Keys
- Many argue you “almost always” want an internal, synthetic primary key (int or UUID) and treat natural identifiers as regular columns with unique constraints.
- Recurrent theme: every “obviously stable” natural key eventually changes or is reused (emails, usernames, SSNs, government IDs, license plates, VINs, gamer tags, phone numbers, addresses, etc.).
- Opposing view: natural keys are essential for modelling real‑world identity and forcing you to think about what really makes an entity unique; surrogate keys don’t save you from messy external reality.
External IDs and Evolving Reality
- Multiple horror stories: systems using external IDs (SSN, national IDs, stock codes, journal ISSNs, Discord usernames, PSN/Steam logins, phone numbers) as PKs, later broken by policy changes, duplicates, format changes, gender changes, or recycled numbers.
- Strong advice from many: never use IDs you don’t control as your primary key; treat them as data + unique indexes and keep a stable internal ID.
Practical Schema Design Patterns
- Common compromise: synthetic PK + natural “business key” enforced via UNIQUE, possibly composite.
- Natural keys often work fine as alternate keys and for human‑facing lookups, but not as PKs that propagate everywhere via foreign keys.
- Some note you can refer to alternate unique keys in foreign keys (not just PKs), though ORMs often push toward a single synthetic PK.
Performance and UUID Debates
- Pro‑surrogate camp highlights:
- Smaller, fixed‑size integer PKs make indexes and joins cheaper.
- Sequential or k‑sortable keys (auto‑increment, UUIDv7, Snowflake‑style) avoid B‑tree fragmentation; pure UUIDv4 can hurt at scale.
- Some like dual IDs: internal int PK + public UUID or short random string (Stripe‑style), to prevent enumeration and “German tank problem” info leakage.
Security, Privacy, and PII
- Natural keys that contain PII (emails, SSNs, national IDs) “infect” every referencing table; complicates compliance (HIPAA/GDPR).
- Debate over whether surrogate IDs themselves count as PII; some standards treat any linkable identifier as sensitive, others carve out internal codes as non‑PHI.
Distributed Systems and Integration
- In SOA/microservices, data warehouses, job queues, caches, and external integrations, changing a PK becomes extremely painful.
- Partitioned systems may need natural‑ish keys for dedup across partitions, but most still recommend an internal surrogate plus natural uniqueness where needed.
Overall Takeaways
- Broad (but not unanimous) consensus:
- Use surrogate PKs under your control.
- Enforce real‑world uniqueness with natural keys via constraints.
- Expect natural identifiers to change; design so that change has minimal blast radius.