2026-03-04

10% of Firefox crashes are caused by bitflips

How Firefox Is Attributing Crashes to Bitflips

Firefox added a post-crash memory tester that runs on user machines; code is public (Rust runner + separate memtest crate).
Described techniques include:
- Writing known bit patterns to RAM and reading back to detect flips.
- Using “magic” sentinel values in data structures and checking whether they differ by only one or a few bits.
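Both techniques can be illustrated with a minimal sketch. This is not Firefox's actual code: `MAGIC`, `classify_sentinel`, `pattern_test`, and the two-bit threshold are hypothetical names and values chosen for illustration. A real memory tester also has to deal with caches and physical addressing, which a Python `bytearray` does not.

```python
MAGIC = 0xDEADBEEF  # hypothetical sentinel value, not taken from Firefox


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def classify_sentinel(observed: int, expected: int = MAGIC, max_flips: int = 2) -> str:
    """Guess why a sentinel no longer matches its expected value."""
    distance = hamming_distance(observed, expected)
    if distance == 0:
        return "intact"
    if distance <= max_flips:
        # Only one or a few bits differ: consistent with a hardware bitflip.
        return "possible bitflip"
    # Many bits differ: more likely a software overwrite (e.g. use-after-free).
    return "overwritten"


def pattern_test(buf: bytearray, patterns=(0x00, 0xFF, 0xAA, 0x55)) -> list:
    """Fill buf with each pattern, read it back, and collect mismatches.

    Returns a list of (offset, expected_byte, actual_byte) tuples.
    """
    mismatches = []
    for pattern in patterns:
        for i in range(len(buf)):
            buf[i] = pattern
        for i, value in enumerate(buf):
            if value != pattern:
                mismatches.append((i, pattern, value))
    return mismatches
```

For example, `classify_sentinel(MAGIC ^ (1 << 7))` reports a possible bitflip, while `classify_sentinel(0)` reports an overwrite because many bits differ. The threshold is a judgment call: a small Hamming distance is suggestive of a hardware fault, not proof of one.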
Reported measurement: ~5% of crashes are flagged as “potentially” caused by bad or flaky memory. The author then extrapolates to ~10–15% using a “conservative heuristic” that is not fully explained.
Several commenters note that “potential” and the missing details make the true rate unclear.

Skepticism About the 10–15% Claim

Some find 10% of crashes from hardware defects “huge” and hard to believe, suspecting biased telemetry (e.g., small number of very bad machines).
Others criticize the extrapolation from 5% to 10% as unsupported handwaving.
Some raise concerns that rare races, allocator or kernel bugs, or Firefox-specific issues could be misclassified as hardware faults.
Counter‑argument: large-scale crash triage in other systems (OSes, games, Go toolchain) also reveals a nontrivial tail of crashes best explained by memory or CPU faults.

User Reports and Comparative Behavior

Mixed experiences: some users see Firefox crash frequently (often on exit or under high tab count), others report near-zero crashes over years.
Multiple anecdotes of Firefox being the first app to fail on machines later diagnosed with bad RAM or misconfigured/overclocked memory.
Others claim Chromium-based browsers crash less on the same hardware, suggesting Firefox might simply be buggier or more memory-hungry.
It’s noted that crashes are concentrated on faulty machines, so “10% of crashes” does not mean 10% of users are impacted.
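The arithmetic behind that last point can be made concrete with invented numbers. Every figure below is a hypothetical assumption, not a measurement from the Firefox data: 1,000 machines, 10 of them with faulty memory, one software crash per machine, and 11 extra hardware-induced crashes on each faulty machine.

```python
def crash_shares(machines: int, faulty: int,
                 sw_crashes_per_machine: float,
                 hw_crashes_per_faulty: float) -> tuple:
    """Return (share of crashes caused by hardware, share of machines that are faulty)."""
    software_crashes = machines * sw_crashes_per_machine
    hardware_crashes = faulty * hw_crashes_per_faulty
    total_crashes = software_crashes + hardware_crashes
    return hardware_crashes / total_crashes, faulty / machines


# Hypothetical fleet: 1% of machines are faulty, but they crash far more often.
crash_share, machine_share = crash_shares(1000, 10, 1.0, 11.0)
print(f"hardware share of crashes: {crash_share:.1%}")
print(f"share of machines affected: {machine_share:.1%}")
```

Under these assumptions, roughly a tenth of all crashes come from one percent of machines, which is why a per-crash rate cannot be read as a per-user rate.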

Hardware, ECC, and Bitflip Context

Commenters emphasize that bitflips can arise from marginal RAM, heat, aging, PSU issues, or misconfiguration, not only cosmic rays.
ECC RAM and CPU cache ECC significantly reduce or surface errors but don’t eliminate them; many consumer systems lack full ECC support.
DDR5’s on-die “ECC” is distinguished from system-wide ECC; seen as improving yield/error rates but not equivalent to traditional ECC DIMMs.

Mitigations and Open Questions

Suggestions:
- Run analysis locally and inform users when memory appears flaky.
- Map out bad RAM regions in the OS.
- Add redundancy/checksums for critical in-memory data.
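As a minimal sketch of the checksum suggestion, assuming a hypothetical `GuardedBlob` wrapper (not an existing Firefox or library type), critical data can be stored alongside a CRC32 and verified on every read. A checksum only detects corruption; recovering from it would need a redundant copy or an error-correcting code.

```python
import zlib


class GuardedBlob:
    """Hypothetical wrapper that stores a CRC32 next to a critical buffer."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self._crc = zlib.crc32(self._data)

    def read(self) -> bytes:
        # Recompute the checksum on every read and compare with the stored one.
        if zlib.crc32(self._data) != self._crc:
            raise RuntimeError("checksum mismatch: data changed since it was stored")
        return bytes(self._data)


blob = GuardedBlob(b"session state")
assert blob.read() == b"session state"

# Simulate a single bitflip in the stored bytes.
blob._data[3] ^= 0x10
try:
    blob.read()
    detected = False
except RuntimeError:
    detected = True
```

The trade-off is extra CPU work on each access, which is part of why commenters disagree about whether this is worthwhile outside safety-critical systems.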
Some argue engineering around bad hardware isn’t worthwhile except in safety‑critical systems; others say robustness to hardware faults is increasingly important.
Several commenters express interest in comparable data from Chrome and in a proper, detailed write‑up of Firefox’s methodology.
