Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous I/O

Platform support and implementation

  • The new asynchronous I/O in Postgres 18 has multiple backends, selected via the io_method setting:
    • io_uring (Linux-only; requires kernel support and a build with liburing; a minimal liburing sketch follows this list).
    • worker (a pool of background I/O worker processes), which works on every operating system and is the default in the beta.
    • sync, which preserves the old synchronous behavior.
  • Windows:
    • Windows has long had async I/O via IOCP and now IoRing, but Postgres doesn’t yet use them in 18.
    • There are prototype implementations for IOCP and IoRing; IoRing is attractive because it supports async flush, but it currently lacks scatter/gather, which is essential for the buffer manager.
    • Async networking on Windows will likely still need IOCP, since IoRing doesn’t support sockets.
  • FreeBSD: aio(4) is seen as a strong in‑kernel AIO implementation and there are patches to integrate it with Postgres for a future release.
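
  As background on the io_uring bullet above, here is a minimal, self-contained liburing sketch of the submit/complete model that backend builds on. It is ordinary application code, not Postgres source; the file path, queue depth, and 8 KiB read are arbitrary choices for illustration.

      /* build: gcc uring_read.c -luring */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <liburing.h>

      int main(int argc, char **argv)
      {
          struct io_uring ring;
          struct io_uring_sqe *sqe;
          struct io_uring_cqe *cqe;
          char buf[8192];

          if (argc < 2 || io_uring_queue_init(8, &ring, 0) < 0)
              return 1;

          int fd = open(argv[1], O_RDONLY);
          if (fd < 0) { perror("open"); return 1; }

          /* Queue one 8 KiB read at offset 0 and hand it to the kernel. */
          sqe = io_uring_get_sqe(&ring);
          io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
          io_uring_submit(&ring);

          /* Other work could happen here while the read is in flight; a real
           * AIO consumer would keep many such requests queued at once. */

          /* Reap the completion; cqe->res is the byte count or a negative errno. */
          io_uring_wait_cqe(&ring, &cqe);
          printf("read returned %d\n", cqe->res);
          io_uring_cqe_seen(&ring, cqe);

          io_uring_queue_exit(&ring);
          close(fd);
          return 0;
      }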

Performance characteristics and low-level tuning

  • The worker backend can outperform sync, and in some cases it helps even when data is already in the OS page cache, because kernel→user copies and checksum verification are parallelized across the I/O workers; that matters on memory-bandwidth-limited CPUs.
  • Optimistic use of preadv2(..., RWF_NOWAIT) (try a non-blocking read first, fall back to asynchronous I/O on a miss) was tested, but the duplicate page-cache lookups on misses cost more than they saved; see the sketch after this list.
  • Direct I/O is an explicit long-term goal; it can already be enabled experimentally (via the debug_io_direct developer option) and shows big gains for some read-heavy workloads, but more AIO plumbing (asynchronous writes, readahead) is needed first.
  • The discussion dives into kernel details: SMAP overhead, large folios, RWF_UNCACHED, TLB shootdowns, and mmap strategies; these mostly highlight how hard it is to make large, one-shot I/O truly fast.
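
  The optimistic-read idea above can be shown with a small, standalone sketch (again, not Postgres code): ask the kernel to complete the read only if the page is already cached, and treat EAGAIN as the cue to fall back to a real read. The cost noted in the discussion is that on a miss the page cache gets searched here and then again when the fallback read is issued.

      /* build: gcc nowait_read.c   (RWF_NOWAIT needs Linux 4.14+ and a recent glibc) */
      #define _GNU_SOURCE
      #include <sys/uio.h>
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
          char buf[8192];
          struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

          if (argc < 2)
              return 1;

          int fd = open(argv[1], O_RDONLY);
          if (fd < 0) { perror("open"); return 1; }

          /* RWF_NOWAIT: copy the data only if it is already in the page cache;
           * otherwise fail immediately with EAGAIN instead of blocking on disk. */
          ssize_t n = preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
          if (n >= 0)
              printf("cache hit: %zd bytes copied without blocking\n", n);
          else if (errno == EAGAIN)
              printf("cache miss: a real (asynchronous) read is needed,\n"
                     "which repeats the page-cache lookup\n");
          else
              perror("preadv2");

          close(fd);
          return 0;
      }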

io_uring security and operational concerns

  • Some participants recall kernels where io_uring had serious CVEs and was disabled by default.
  • Others argue disabling it blindly is inappropriate; risk depends on whether you run untrusted workloads on a shared kernel.
  • A practical constraint: if your cloud/provider disables io_uring, you can’t opt in even if your own risk profile would allow it.

Cloud vs bare metal I/O

  • Several comments contrast modest cloud IOPS limits (e.g., ~20k IOPS on EBS) with cheap consumer NVMe drives advertising ~1M+ IOPS.
  • Instance-local NVMe in the cloud can be fast but is ephemeral (wiped on stop/hibernate); some therefore see it as risky for primary database storage, while others note that major managed database services do exactly that, with appropriate backup/restore flows.
  • Complaints about cloud throttling (IOPS, bandwidth, vCPU) and especially high egress pricing; some note entire companies couldn’t exist if they paid cloud rates instead of running on providers like Hetzner.
  • Counterpoint: enterprises often choose AWS/Azure for brand, perceived safety, and operational convenience, even at much higher cost.

Real-world Postgres deployment and backups

  • One user describes a cost-effective bare-metal Postgres setup on Hetzner with NVMe drives, Tailscale for secure access, PGTune, PgHero, pg_cron, and a custom pg_dump + zstd + S3 backup tool.
  • Others stress that pg_dump-style logical backups alone are insufficient for many cases and recommend mature tools such as pgBackRest or Barman, which handle point-in-time recovery (PITR), replication-friendly restores, retention policies, encryption, and well-tested edge cases.
  • Debate over whether snapshots count as “real” backups: acceptable for some risk profiles, but not for all.

Connection scalability and threading roadmap

  • People ask when Postgres will support more concurrent connections without pgbouncer.
  • Consensus: this likely requires moving away from “process per connection,” so it’s a multi‑release effort.
  • There is active work toward multithreading, with preparatory changes already landing in 18; pgbouncer remains the practical solution for now.

Comparisons to MySQL/InnoDB and storage layout

  • Some note Postgres is just now adopting async I/O strategies InnoDB has used for years.
  • Discussion of heap tables vs clustered B-trees (a simplified sketch follows this list):
    • Postgres stores table data in heap files; indexes (including the primary key) are B-trees whose entries point into the heap by tuple ID.
    • InnoDB uses a clustered primary-key B-tree whose leaf pages contain the row data; secondary indexes store primary-key values, so a secondary lookup descends a second B-tree.
    • Tradeoffs: clustered storage can be superior for PK-range scans and predictable OLTP, while heaps can be better when there are many secondary indexes, wide or varied access patterns, or some analytical workloads.
  • There’s interest in richer storage engines for Postgres (e.g., OrioleDB‑style, LSM‑like, better compression and page compression).
  • Some argue that even if naive "apples to apples" benchmarks show the MySQL family ahead, Postgres's richer feature set (range types, exclusion constraints, arrays, etc.) often lets you model problems in ways that make overall system performance better, provided the ORM does not hide those features.
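
  To make the heap-versus-clustered comparison concrete, here is a deliberately simplified, hypothetical sketch of what each design's index entries point at. These types are illustrative only, not real Postgres or InnoDB structures.

      #include <stdint.h>

      /* Postgres-style heap table: every index entry (primary or secondary)
       * carries a tuple identifier, a physical (block, line pointer) address
       * in the heap, so a lookup is one B-tree descent plus a heap page fetch. */
      typedef struct { uint32_t block; uint16_t line_pointer; } HeapTupleId;

      /* InnoDB-style clustered table: rows live in the leaves of the primary-key
       * B-tree; a secondary index entry carries the primary-key value, so a
       * secondary lookup is one descent of the secondary index plus a second
       * descent of the clustered primary-key tree. */
      typedef struct { int64_t primary_key; } ClusteredRowRef;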

Ecosystem, adoption, and sentiment

  • Multiple commenters praise the quality and pace of Postgres development and see this as a foundational change that will enable more async I/O use in future versions (e.g., checkpointer, WAL, networking).
  • There’s enthusiasm for running Postgres 18 with async I/O on NVMe, including on managed services (one cloud Postgres provider confirms they’ll ship 18 on day one).
  • Some reflect that the old “MySQL vs Postgres” debate has largely been resolved in Postgres’s favor for new projects, especially given its ecosystem (e.g., pgvector, embedded ML extensions).