Scaling PostgreSQL to power 800M ChatGPT users
PostgreSQL-at-scale architecture
- Core setup: one PostgreSQL primary handling all writes plus ~50 read replicas; read-heavy traffic is offloaded, write-heavy shardable workloads are moved to Azure CosmosDB and other sharded systems.
- New tables are no longer added to the main Postgres deployment; new features default to sharded systems.
Write scalability and MVCC limits
- Discussion centers on MVCC causing write and read amplification, bloat, and autovacuum complexity under heavy write load.
- Some point to LSM-tree systems (e.g., TiDB, RocksDB-style designs) as better suited for high write throughput; others report mixed performance experiences with these systems.
- Several note that MySQL or SQL Server can outperform Postgres on certain write-heavy or query-planning workloads, but licensing and cost make them unattractive for startups.
Sharding vs single primary
- Strong debate over whether sharding is “just a DB concern” or necessarily leaks into the application via joins, cross-shard transactions, and consistency semantics.
- Comments highlight that cross-shard operations often become non-transactional or rely on 2PC with eventual consistency and operational complexity (schema changes, resharding, observability).
- Some argue OpenAI effectively did shard—just by moving workloads to different databases instead of sharded Postgres itself.
Replication, hardware, and infra
- Curiosity about replication details: likely async streaming replication; concerns about lagging replicas causing WAL retention and potential slowdowns.
- Alternatives discussed: shipping WAL to object storage and having replicas pull from there, with higher baseline lag and dependence on object-store performance.
- Thread dives into massive Azure/AWS VM SKUs (hundreds–thousands of cores, tens of TB RAM), their high cost, and advice to prefer multiple “medium” boxes over giant NUMA monsters.
Operational lessons
- Emphasis on “boring” techniques: connection pooling (pgbouncer), query optimization, caching, schema-change timeouts.
- Anecdote on idle transactions exhausting connection slots and using compile-time checks to prevent holding connections across async waits.
- One theme: Postgres scales very far if used mainly as a transactional “source of truth” while offloading search/analytics/discovery elsewhere.
Reception of the article
- Some praise it as a grounded example that a single primary plus replicas can support enormous scale and that many companies over-shard prematurely.
- Others criticize it as vague, repetitive, Azure/CosmosDB marketing with little novel technical detail, and point out the resulting multi-database complexity and lock-in.