Jepsen: Amazon RDS for PostgreSQL 17.4
Scope of the issue
- The tested system is Amazon RDS for PostgreSQL multi‑AZ clusters (the newer “cluster” flavor with two readable standbys), not:
- Single‑instance RDS,
- Classic multi‑AZ instance failover, or
- Plain upstream single‑node Postgres.
- The key finding: multi‑AZ clusters violate snapshot isolation and behave more like “parallel snapshot isolation,” including “long fork” / fractured‑read style anomalies.
- The anomalies occur on healthy systems, without fault injection.
Root cause and relation to upstream Postgres
- Several commenters explain a subtle upstream behavior:
- On the primary, visibility order is based on when the backend marks a transaction as committed.
- On replicas, visibility is based on WAL commit record order.
- These orders can diverge, so a replica can observe a transaction T2 while missing a transaction T1 that committed before T2 on the primary.
- This explains how a read‑only transaction on a replica can observe inconsistent snapshots even if the primary has proper snapshot isolation.
- There is ongoing upstream work to improve cross‑node snapshot consistency, but it’s unfinished and involves serious tradeoffs (e.g., read‑your‑writes vs durability/latency).
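The divergence described above can be illustrated with a toy model. This is pure Python, not actual Postgres internals: it only shows why two nodes exposing commits in different orders can each take an internally consistent snapshot that no single commit order explains.

```python
# Toy model (illustrative only, not Postgres internals).
# Two transactions T1 and T2 commit concurrently. Suppose their WAL
# commit records land in the order [T1, T2], but the primary's
# backends flip the "visible" flag in the order [T2, T1].

wal_order = ["T1", "T2"]               # order of commit records in the WAL
primary_visible_order = ["T2", "T1"]   # order backends mark commits visible

def snapshot(order, n):
    """Transactions visible after the first n commit events on a node."""
    return set(order[:n])

# A reader on the primary between the two visibility events sees only T2.
primary_snap = snapshot(primary_visible_order, 1)
# A reader on a replica that has replayed one WAL record sees only T1.
replica_snap = snapshot(wal_order, 1)

print(primary_snap)   # {'T2'}
print(replica_snap)   # {'T1'}
# No single commit order yields both snapshots: a "long fork".
assert primary_snap != replica_snap
```

Each node is self-consistent; the anomaly only appears when you compare reads taken on the primary and on a replica.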
Practical impact & example anomalies
- It’s not just “slightly stale reads”; you can see states that could never arise in any serial or single‑snapshot execution.
- Examples discussed:
- Chained background updates (GPS → postal code → city) observed out of logical order (city updated without postal code, etc.).
- “First commenter” / uniqueness checks granting the same badge to multiple users.
- Git‑like “read‑check‑write” flows ending in hashes that don’t correspond to any valid state.
- The risk is highest when applications:
- Assume snapshot isolation, and
- Use read replicas (multi‑AZ reader endpoint) in logic that conditions writes on prior reads.
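The "first commenter" badge case above can be sketched as a toy simulation (pure Python, with made-up state; not real application code): when the uniqueness check runs against a lagging replica snapshot while the write goes to the primary, both requests pass the check.

```python
# Hypothetical "first commenter" flow. The replica's snapshot lags the
# writer, so each request believes no comment exists yet.

writer_comments = []    # state on the writer (primary)
replica_comments = []   # stale snapshot on the reader endpoint
badges = []

def post_first_comment(user):
    # Read-check on the replica: sees no prior comments (stale).
    if len(replica_comments) == 0:
        writer_comments.append(user)   # write lands on the writer
        badges.append(user)            # badge conditioned on the stale read

post_first_comment("alice")
post_first_comment("bob")   # replica has not replayed alice's commit yet

print(badges)   # ['alice', 'bob'] -- the "unique" badge granted twice
```

Running the same check inside one transaction on the writer (or enforcing uniqueness with a constraint) avoids this class of bug regardless of replica lag.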
AWS guarantees, documentation, and tradeoffs
- Upstream Postgres documents snapshot isolation; commenters argue AWS does not clearly state that multi‑AZ clusters weaken this.
- Some see this as a bug or at least an undocumented deviation; others frame it as a deliberate performance/availability tradeoff in a distributed system with “no free lunch.”
- Several expect AWS either to:
- Fix the behavior (with potential latency/throughput costs), or
- Explicitly document the weaker guarantees and recommended usage (e.g., critical transactions against the writer only).
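One hedged sketch of the "critical transactions against the writer only" recommendation: route any read whose result will gate a write to the writer endpoint, and reserve the reader endpoint for stale-tolerant display queries. The endpoint hostnames below are invented for illustration.

```python
# Hypothetical endpoint-routing helper. Hostnames are made up; real
# RDS multi-AZ clusters expose separate writer and reader endpoints.

WRITER_ENDPOINT = "mycluster.cluster-xyz.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-xyz.us-east-1.rds.amazonaws.com"

def endpoint_for(read_only: bool, conditions_a_write: bool) -> str:
    """Send any read that feeds a later write to the writer, so the
    read and the write observe the same (primary) snapshot."""
    if conditions_a_write or not read_only:
        return WRITER_ENDPOINT
    return READER_ENDPOINT   # pure, stale-tolerant reads only

print(endpoint_for(read_only=True, conditions_a_write=True))    # writer
print(endpoint_for(read_only=True, conditions_a_write=False))   # reader
```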
RDS flavors and other systems
- Confusion is noted between:
- Multi‑AZ instances (classic synchronous replica for failover only), and
- Multi‑AZ clusters (two readable standbys with quorum‑like behavior).
- Some speculate that similar anomalies may appear in other Postgres replication setups, but this remains unclear; behavior is confirmed safe only for single‑node Postgres.
- Aurora is discussed: its shared‑storage architecture differs, so its behavior may be different, but it was not tested here.
Reaction to Jepsen and writing style
- Many praise the rigor and clarity of the Jepsen report and wish more vendor docs were equally precise.
- Others find the style dense/academic and initially inaccessible; multiple replies offer explanations, learning advice, and suggest using LLMs or tutorials to bridge the gap.
- One critical view claims the report lacks context and overstates failure; others counter that checking advertised guarantees against actual behavior is precisely the point.
Broader themes
- Thread reiterates that distributed system guarantees (including major cloud offerings) are often weaker or more complex than users assume.
- There is side discussion of other Jepsen targets (MongoDB, ZooKeeper, FoundationDB) and a desire for comprehensive Jepsen coverage of all RDS variants.
- Several commenters note that many developers, even seniors and architects, do not understand isolation levels, which makes these subtle consistency issues especially dangerous in real applications.