IO Devices and Latency
Interactive Visuals and Accessibility
- Commenters widely praise the animations as some of the best latency explanations they’ve seen; many say they forgot it was effectively an ad.
- Visuals are implemented with heavy use of d3.js; other libraries like GSAP and SVG.js are mentioned as alternatives.
- Some users who browse with JavaScript disabled see no visuals at all and request static images as a fallback.
- Others report breakage from browser extensions (dark mode, ad blockers, user styles) and some browser-specific issues (Safari, Chrome/Firefox mismatches).
Durability, Replication, and Probability
- The article’s “1 in a million” durability remark is viewed as too pessimistic: commenters note that failures are only dangerous during the short window before a replica is replaced.
- One commenter provides a back-of-the-envelope recalculation showing far lower failure probability if failures are independent and replacement happens in ~30 minutes, but another cautions that failures are often correlated.
- The product uses semi-synchronous replication: the primary waits for at least one replica ACK before commit, introducing a network hop on writes but favoring read-heavy workloads.
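The independent-failures recalculation mentioned above can be sketched as simple arithmetic. This is a hypothetical back-of-envelope, not the commenter's actual numbers: it assumes a 1% annualized failure rate per node, a primary plus one replica, a ~30-minute replacement window, and independent failures (which, as noted, real-world correlated failures violate).

```python
# Back-of-envelope durability estimate (all inputs assumed, for illustration):
# two copies of the data; losing both requires a second failure during the
# short window before the first failed node is replaced.

afr = 0.01                      # annualized failure rate per node (assumed)
window_hours = 0.5              # ~30-minute replacement window (assumed)
hours_per_year = 24 * 365

# Expected node failures per year across the primary + replica pair.
failures_per_year = 2 * afr

# Given one node is down, probability the surviving copy also fails
# within the replacement window.
p_second_failure = afr * (window_hours / hours_per_year)

# Annual probability of losing both copies, assuming independence.
p_data_loss = failures_per_year * p_second_failure
print(f"~{p_data_loss:.2e} per year")  # → ~1.14e-08 per year
```

Under these assumptions the annual loss probability is on the order of 1e-8, orders of magnitude below "1 in a million"; correlated failures (shared rack, power, or firmware) would push it back up.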
Local NVMe vs Networked Storage and “Unlimited IOPS”
- Strong support for using local NVMe instead of cloud network volumes (EBS and similar) due to latency, IOPS caps, and network-attached cloud storage being “unusually slow.”
- Some nuance: network-attached storage makes maintenance/drains and durability easier, especially for systems that don’t implement replication themselves.
- “Unlimited IOPS” is defended as “practically unlimited” for MySQL: CPU becomes the bottleneck long before the physical NVMe IOPS limit is hit.
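The "CPU becomes the bottleneck first" defense can be made concrete with rough numbers. All figures below are assumptions for illustration, not measurements from the thread: a data-center NVMe drive sustaining on the order of a million random IOPS, versus a CPU-bound MySQL instance pushing tens of thousands of queries per second, each touching a few pages.

```python
# Rough illustration of "practically unlimited" IOPS (all numbers assumed):
# the database's CPU-bound query rate generates far fewer IOs than the
# device can serve, so the physical IOPS ceiling is never reached.

nvme_iops = 1_000_000        # assumed sustained random IOPS for the device
queries_per_sec = 50_000     # assumed CPU-bound MySQL query throughput
ios_per_query = 4            # assumed page reads per query (cache misses)

demand = queries_per_sec * ios_per_query
print(f"demand {demand} IOPS vs device {nvme_iops} IOPS "
      f"({demand / nvme_iops:.0%} utilized)")
```

Even with generous per-query IO counts, demand lands well under the device ceiling, which is the sense in which "unlimited" is defended.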
IOPS Limits, SSD Latency, and Hardware Differences
- Several fio benchmarks are shared comparing random writes vs fsync, O_DIRECT vs buffered IO, consumer vs enterprise NVMe.
- Key observations:
  - Raw random writes can be tens of microseconds; durable sync writes are often ~250–300µs on consumer drives and much faster on enterprise drives with power-loss protection.
  - Enterprise SSDs may acknowledge fsync before flushing to flash, relying on capacitors to guarantee durability on power loss.
  - NVMe performance varies widely by device class and power-saving state; numbers in the article are broadly plausible but depend heavily on hardware and configuration.
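The buffered-vs-durable gap the benchmarks explore can be reproduced with a few lines of Python instead of fio. This is a micro-benchmark sketch only; as the thread stresses, absolute numbers depend entirely on the drive, filesystem, and whether the device has power-loss protection.

```python
import os
import tempfile
import time

# Time a buffered 4 KiB write vs. a write followed by fsync.
# Buffered writes land in the page cache; fsync forces them to stable
# storage, which is where the ~250-300µs consumer-drive cost shows up.

def avg_write_latency(do_fsync: bool, iterations: int = 200) -> float:
    fd, path = tempfile.mkstemp()
    buf = b"\0" * 4096
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, buf)
            if do_fsync:
                os.fsync(fd)  # force data (and metadata) to stable storage
        return (time.perf_counter() - start) / iterations
    finally:
        os.close(fd)
        os.unlink(path)

buffered = avg_write_latency(do_fsync=False)
durable = avg_write_latency(do_fsync=True)
print(f"buffered: {buffered * 1e6:.1f}µs  fsync: {durable * 1e6:.1f}µs")
```

On an enterprise drive with capacitor-backed power-loss protection, the fsync column shrinks dramatically because the drive can acknowledge before the data reaches flash.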
SQLite + NVMe vs Client-Server Databases
- One subthread promotes SQLite-on-NVMe as a pattern: avoid the network hop, get microsecond-scale operations, and rely on a single writer.
- Counterarguments:
  - Multi-writer scenarios and multiple webservers rapidly complicate SQLite usage; Postgres/MySQL are easier once you need a shared database.
  - Local Postgres on the same host, using Unix sockets, is common and often “fast enough” while preserving scaling options.
  - Some argue SQLite’s single-writer constraint is manageable for mostly-read workloads; others say you’ll hit that limit earlier than you think.
- There is back-and-forth on whether IPC/network overhead is negligible compared to query execution; opinions differ on how much optimization this really buys in web apps.
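The SQLite-on-local-disk pattern from this subthread is easy to sketch with Python's stdlib `sqlite3` module. Table and column names below are made up for illustration; the key points are WAL mode (so readers don't block the single writer) and batching writes into one transaction to amortize the durable-commit cost.

```python
import os
import sqlite3
import tempfile

# Single-writer SQLite on a local disk: no network hop, one process owns
# the database file. WAL mode lets concurrent readers proceed while the
# one writer commits.

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")    # concurrent readers, one writer
conn.execute("PRAGMA synchronous=NORMAL")  # sync at checkpoints, not every commit
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

with conn:  # one transaction: 1000 inserts, one commit
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(f"event-{i}",) for i in range(1000)],
    )

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # → 1000
conn.close()
```

The counterarguments in the list above kick in exactly where this sketch stops: a second webserver or a second writer has no clean way into this file, which is when Postgres/MySQL start looking simpler.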
Cloud Operations, Local SSD Reliability, and Drains
- Prior bad experiences with GCP Local SSD (bad blocks) are contrasted with more recent reports of no such issues in testing.
- Local SSD setups rely on higher-level replication (e.g., MySQL semi-sync across AZs) plus orchestration to rapidly detect and replace failing nodes.
- Commenters highlight cloud “events”/drains (e.g., EC2 termination for maintenance) as a major operational risk for local-only storage: miss the event and local data disappears.
- Some note that for many orgs, the complexity of scripting automatic rebuilds on wiped local disks makes network-attached storage (EBS, etc.) more attractive.
Cloud IOPS Throttling and Economics
- IOPS limits on EBS-type volumes are explained as packet/operation rate limits, distinct from raw bandwidth, with both volume-level and instance-level caps.
- Moving to local NVMe removes artificial IOPS caps but trades off the elasticity of EBS and its ability to survive instance resizes or failures transparently.
- There’s curiosity about whether local NVMe is not only a latency win but also a throughput-per-dollar win; consensus is that it depends on workload and scaling patterns.
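The distinction between an operation-rate cap and a bandwidth cap is clearest with small random IOs. A minimal arithmetic sketch, using AWS gp3's published baseline figures (3,000 IOPS, 125 MB/s) and an assumed database-page-sized IO:

```python
# Why an IOPS cap and a bandwidth cap are separate limits: with small
# random IOs, the op-rate cap binds long before bandwidth does.
# 3,000 IOPS / 125 MB/s are gp3 baseline figures; IO size is assumed.

iops_cap = 3_000
bandwidth_cap_mb = 125
io_size_kb = 4                  # typical database page-sized random IO

throughput_mb = iops_cap * io_size_kb / 1024
print(f"{throughput_mb:.1f} MB/s at the IOPS cap "
      f"vs {bandwidth_cap_mb} MB/s bandwidth cap")
```

At 4 KiB per operation the volume saturates its IOPS cap at roughly 12 MB/s, a tenth of its bandwidth cap; large sequential IOs flip which limit binds. Local NVMe removes the op-rate throttle entirely, which is the trade discussed above.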
Educational, Historical, and Corrective Notes
- Many see the article as ideal teaching material for high school/university courses on storage and latency; several plan to link it in classes or to family.
- Old mainframe/tape and COBOL anecdotes underline how physical device behavior (e.g., tape overshoot, drum memories) shaped algorithms and access patterns.
- One commenter challenges specific HDD numbers (e.g., average rotational latency) and offers more detailed track-count estimates, pointing to an in-depth HDD performance paper.
- Some minor nitpicks appear (e.g., missing intermediate technologies between tape and HDD), but they don’t detract from broad praise for clarity and visuals.