Making Postgres scale
What a “modern” database vs Postgres would mean
- Some argue that Postgres’s 1980s origin doesn’t matter: it’s still actively developed and best-in-class for relational + ACID.
- Others want a “new” DB that is:
- Natively horizontally scalable (multi-writer, not just bigger boxes).
- Postgres-compatible but with first‑class NoSQL/JSON support that avoids JSONB race conditions.
- Under a permissive, non-telemetry, non-“bizarro” license.
Vertical vs horizontal scaling of Postgres
- Several commenters say 99.9% of companies can live on one large Postgres node plus a replica; availability is usually a bigger issue than raw scale.
- A detailed example: a single 16TB/100B+ row Postgres instance on beefy bare metal (EPYC, 1TB RAM, NVMe+ZFS) handles ~150k inserts/s, 40k tx/s, 4M reads/s with “nothing weird” beyond:
- ZFS, non-standard Postgres block size, aggressive autovacuum tuning.
- Clarification that “Postgres doesn’t scale” usually means “you can’t just add more machines for writes.” Reads can be scaled with replicas, but fully consistent reads and multi-writer setups still need sharding or another system.
- Elastic, on-demand scaling (cloud-style) is distinguished from simply being able to handle high sustained load.
Sharding / external scale-out approaches
- Classic approach: application-level sharding via stored procedures (e.g., PL/Proxy) has been proven at very large scale.
- Citus: works very well if schema is designed around a shard/distribution key (multi-tenant is a sweet spot), but retrofitting existing schemas can take months; cross-shard FKs and some patterns remain hard.
- PgDog (topic of the article):
- Rust-based proxy that shards, load-balances, and pools connections in front of regular Postgres nodes.
- Positioned as “Vitess for Postgres,” avoiding Azure lock-in and aiming for easy migration from “one big DB” (treating 1 shard as base case).
- Better replica handling noted vs Citus, but currently limited cross-shard aggregates; GROUP BY and basic aggregates are being implemented, postgres_fdw suggested as interim fallback.
Alternative databases & trade-offs
- Aurora (PG/MySQL): some report it being significantly slower than community Postgres for small queries; Aurora DSQL is mentioned as a promising, but early, distributed Postgres-compatible system.
- CockroachDB: praised for transparent distribution and Postgres-like SQL, but:
- Multiple reports of very high resource usage and substantially higher cost vs self-hosted Postgres.
- Frustration with changing licenses, telemetry/phone-home requirements, and aggressive enterprise focus.
- Oracle DB: cited as an example that already delivers horizontally scalable, fully transactional SQL + NoSQL features, but proprietary and not aligned with “fully open” desires.
Other technical subthreads
- JSONB: concurrent updates to different fields update the whole document; correctness may require row locks or version columns, unlike Mongo’s per-field writes.
- Stored procedures: some advocate stored-proc-only access for safety, encapsulation, and SECURITY DEFINER; others find dev experience poor despite PL/Python, PL/Perl, PL/Rust, etc.