2025-12-20

Go ahead, self-host Postgres

When 24/7 Uptime Really Matters

Strong disagreement on how often “3 AM pages” are truly justified.
Some describe near-universal expectations of 24/7 availability (overnight batch jobs, banking/healthcare integrations, SLAs, reporting), even when humans aren’t working.
Others argue many important systems accept overnight or weekend downtime, have no pager rotation, and rely on manual fallbacks or VIP-only workarounds.
Uptime is also a sales/reputation lever: enterprises expect “always on” even if usage doesn’t strictly require it.

Self-Hosting vs Managed Postgres

Many report years or decades of trouble-free self-hosting with simple setups: single server, automated backups, basic monitoring.
Others emphasize that production-grade setups (backups, PITR, replicas, failover, upgrades, tuning) are nontrivial and time-consuming, especially without in-house DB expertise.
Managed services (RDS, Cloud SQL, AlloyDB, Supabase, etc.) are praised for backups, upgrades, monitoring, and reduced operational toil, but criticized as expensive and opaque, with limited control during incidents.
Both sides agree: managed DBs do not eliminate the need for database skills, disaster recovery planning, or backup verification.

High Availability and Clustering

Postgres is widely seen as lacking “batteries-included” HA compared to MongoDB’s replica sets.
Common HA tooling includes Patroni, CloudNativePG, the Zalando operator, Autobase, and pg_auto_failover, but these add complexity and do not guarantee zero downtime in every failure mode.
Some argue most businesses don’t actually need true zero-downtime HA; fast recovery plus occasional brief outages is acceptable. Others find that for critical workloads, Postgres HA remains too hard without specialist DBAs.

Backups, Monitoring, and Reliability

There is consensus that no backup strategy (including RDS's) should be trusted blindly; test restores regularly.
Tools mentioned: pgBackRest, Barman, ZFS snapshots, WAL archiving, pgdash, netdata, pganalyze.
A recurring failure mode: running out of disk space on managed or self-hosted nodes, leading to painful recovery.
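
Because disk exhaustion comes up repeatedly, here is a minimal sketch of a free-space check that a cron job or monitoring agent could run against the filesystem holding the Postgres data directory. The path and the 85% threshold are illustrative assumptions, not values from the discussion. The sketch uses only the standard library's shutil.disk_usage and measures usage against the space available to unprivileged users, which excludes root-reserved blocks.

```python
import shutil

# Hypothetical data directory path; adjust to your installation.
DATA_DIR = "/var/lib/postgresql"
# Assumed alert threshold: warn when the filesystem is 85% full.
THRESHOLD = 0.85


def usage_fraction(total: int, free: int) -> float:
    """Fraction of the filesystem that is NOT available to unprivileged users."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1 - free / total


def should_alert(total: int, free: int, threshold: float = THRESHOLD) -> bool:
    """True when usage has reached or crossed the alert threshold."""
    return usage_fraction(total, free) >= threshold


def check(path: str = DATA_DIR, threshold: float = THRESHOLD) -> bool:
    """Check the filesystem that holds `path`, print a status line, return alert flag."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    frac = usage_fraction(usage.total, usage.free)
    alert = should_alert(usage.total, usage.free, threshold)
    status = "ALERT" if alert else "ok"
    print(f"{status}: {path} is {frac:.1%} full, {usage.free // 2**30} GiB free")
    return alert
```

A scheduled job might call check() and page or alert when it returns True; the threshold should leave enough headroom for WAL growth and maintenance work such as VACUUM FULL or index rebuilds.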

Performance, Latency, and Cost

Self-hosted Postgres on bare metal or cheap VPS/Hetzner-style servers with NVMe is reported to vastly outperform cloud-managed offerings at a fraction of the price.
Network latency between app and DB can dominate query time; colocating them (same host or LAN) yields large speedups.
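
Back-of-the-envelope arithmetic makes the effect easy to see. When an application issues queries one after another, each pays a full network round trip, so total time grows roughly as queries × (round-trip time + execution time). The function and the example latencies below are illustrative assumptions, not measurements from the discussion.

```python
def request_time_ms(n_queries: int, rtt_ms: float, exec_ms: float) -> float:
    """Estimate wall time for n sequential queries, each paying one round trip.

    Ignores connection setup, pipelining, batching, and server-side contention.
    """
    return n_queries * (rtt_ms + exec_ms)


# Hypothetical page that runs 50 small queries taking 0.1 ms each on the server.
remote = request_time_ms(50, rtt_ms=1.0, exec_ms=0.1)   # assumed 1 ms network hop
local = request_time_ms(50, rtt_ms=0.05, exec_ms=0.1)   # assumed same host/LAN
# Under these assumed numbers: about 55 ms remote versus 7.5 ms local.
print(f"remote: {remote:.1f} ms, local: {local:.1f} ms, speedup: {remote / local:.1f}x")
```

In this model the round trip, not query execution, dominates the remote case. That is why colocation, or reducing the number of round trips through joins and batching, pays off.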
For small projects, some advocate SQLite + Litestream instead of any networked database.

People, Skills, and Responsibility

Management often prefers big-name cloud/SaaS for blame-shifting and reduced “bus factor,” even if cost is higher.
Others argue companies overpay for cloud while still needing infra engineers; black-box debugging of managed services can be as hard as self-hosting.
Several lament that basic sysadmin skills (Unix, RAID, backups) are now seen as exotic, and that fear of terminals helps drive adoption of expensive managed databases.
