How NASA built Artemis II’s fault-tolerant computer
Agile/DevOps vs Deterministic Space Systems
- Several commenters contrast modern Agile/DevOps practices with the discipline needed for deterministic, safety‑critical systems.
- Some argue Agile can be compatible with strong architecture and reliability; others say in practice it degrades rigor and obscures worst‑case behavior.
- Commenters also debate whether “agile” means anything at all or is just a buzzword used to justify rushed, low‑quality work.
Redundancy and Fail-Silent Architecture
- The article’s “fail-silent” quad-redundant design attracts attention: each pair of CPUs detects its own errors and goes silent, while a higher-level system picks the first healthy channel.
- Commenters contrast this with classic triple-voting systems, noting the different trust model (self-detection vs external voting).
- Several commenters ask what happens if both CPUs in a pair produce the same wrong result; responses note that this is extremely unlikely but not impossible.
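The difference between the two trust models can be sketched in a few lines of Python. This is a toy illustration, not the Orion implementation: the function names (`self_checking_pair`, `select_first_healthy`, `majority_vote`) and the simulated bit flip are assumptions made for the example, and the discussion does not describe how the real pairs compare their results.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Compute = Callable[[int], int]


def self_checking_pair(cpu_a: Compute, cpu_b: Compute, x: int) -> Optional[int]:
    """Fail-silent channel: publish a value only if both lanes agree."""
    a, b = cpu_a(x), cpu_b(x)
    return a if a == b else None  # None models "going silent"


def select_first_healthy(channels: Sequence[Optional[int]]) -> Optional[int]:
    """Higher-level selector: take the first channel that is not silent."""
    for out in channels:
        if out is not None:
            return out
    return None


def majority_vote(values: Sequence[int]) -> Optional[int]:
    """Classic external voting: accept a value only with a strict majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else None


def good(x: int) -> int:
    return x * 2


def flipped(x: int) -> int:
    return (x * 2) ^ 1  # simulated single-bit upset in the result


# Channel 0 has one faulty lane and goes silent; channel 1 is healthy.
outputs = [
    self_checking_pair(good, flipped, 21),
    self_checking_pair(good, good, 21),
]
result = select_first_healthy(outputs)

# The case commenters worry about: both lanes are wrong in the same way,
# so the pair agrees with itself and the error is not detected.
undetected = self_checking_pair(flipped, flipped, 21)

# External voting for comparison: the single faulty unit is outvoted.
voted = majority_vote([good(21), flipped(21), good(21)])
```

In this sketch the selector never judges whether a value is correct; it only trusts that a channel which speaks has already checked itself. The voter, by contrast, trusts no single unit and decides from the outside.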
Hardware, RTOS, and Radiation Hardening
- Discussion mentions rad-hardened PowerPC (RAD750-class) CPUs, small RAM, and RTOSes like INTEGRITY-178 and VxWorks.
- Time-Triggered Ethernet and ARINC-style scheduling are framed as long-established in aerospace/automotive safety systems.
- Others note that rad-hard fabrication processes lag commercial nodes by many generations, and that such systems rely heavily on redundancy.
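The idea behind the time-triggered, ARINC-style scheduling mentioned above is that the timeline is fixed in advance rather than decided at run time. A minimal sketch, assuming a hypothetical 100 ms major frame and invented partition names and window lengths (none of these values come from the discussion):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Window:
    partition: str
    start_ms: int
    duration_ms: int


# Hypothetical static schedule; it repeats every major frame.
MAJOR_FRAME_MS = 100
SCHEDULE = (
    Window("guidance", 0, 40),
    Window("navigation", 40, 30),
    Window("telemetry", 70, 20),
    # 90..100 ms is left as idle slack
)


def validate(schedule: Sequence[Window], major_frame_ms: int) -> None:
    """Reject schedules whose windows overlap or overrun the major frame."""
    end = 0
    for w in schedule:
        if w.start_ms < end:
            raise ValueError(f"window for {w.partition} overlaps the previous one")
        end = w.start_ms + w.duration_ms
    if end > major_frame_ms:
        raise ValueError("schedule overruns the major frame")


def partition_at(t_ms: int,
                 schedule: Sequence[Window] = SCHEDULE,
                 major_frame_ms: int = MAJOR_FRAME_MS) -> Optional[str]:
    """Return which partition owns the CPU at absolute time t_ms, if any."""
    offset = t_ms % major_frame_ms
    for w in schedule:
        if w.start_ms <= offset < w.start_ms + w.duration_ms:
            return w.partition
    return None  # idle slack


validate(SCHEDULE, MAJOR_FRAME_MS)
```

Because the table is static, the answer to "who runs at time t" is a pure lookup, which is what makes worst-case timing analyzable ahead of flight.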
Backup Flight Software and Dissimilar Redundancy
- A detailed comment describes Orion’s separate Backup Flight Software stack using a different CPU, OS, and NASA’s cFS framework.
- This “dissimilar redundancy” is praised for avoiding common-mode software failures, though it’s noted even independent teams can replicate the same design bug.
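As a toy illustration of dissimilar redundancy, the sketch below implements the same function (integer square root) with two unrelated algorithms and falls back to the second when the first fails. Everything here is an assumption made for the example: the function names, the choice of algorithms, and the exception-based handover, which stands in for whatever takeover mechanism the real backup uses.

```python
from typing import Callable, Tuple


def isqrt_newton(n: int) -> int:
    """Primary implementation: integer Newton iteration."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def isqrt_bisect(n: int) -> int:
    """Backup implementation: binary search, sharing no code with the primary."""
    if n < 0:
        raise ValueError("n must be non-negative")
    lo, hi = 0, n + 1  # invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def run_with_backup(primary: Callable[[int], int],
                    backup: Callable[[int], int],
                    n: int) -> Tuple[int, str]:
    """Use the primary result; hand over to the backup if the primary fails."""
    try:
        return primary(n), "primary"
    except Exception:
        return backup(n), "backup"


def crashing_primary(n: int) -> int:
    raise RuntimeError("simulated primary software fault")


normal = run_with_backup(isqrt_newton, isqrt_bisect, 10)
failover = run_with_backup(crashing_primary, isqrt_bisect, 10)
```

A coding bug in one algorithm is unlikely to appear in the other, which is the benefit commenters praise. The caveat they raise also shows up here: if both implementations were written against the same wrong specification, they would agree on the same wrong answer.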
Skepticism, Cost, and Comparisons
- Some see the system as overengineered and bureaucratic, “throwing money at redundancy,” and point to Artemis’ cost and schedule issues.
- Others stress that human-rated spaceflight demands extreme reliability and can’t be compared to web apps or even unmanned systems.
Reliability Culture vs Modern Software
- Multiple comments lament the perceived decline in software quality in mainstream development.
- Commenters reflect on differing “good enough” standards: what is acceptable for CRUD apps is not acceptable for life-critical guidance systems.