The Therac-25 Incident (2021)
Therac-25 as a systemic failure, not a lone “bug”
- Many comments stress that Therac-25 was not just “bad code” but a system failure: missing hardware interlocks, weak processes, slow incident escalation, poor field feedback, and bad safety assumptions.
- Older models had mechanical interlocks and, reportedly, even the same software fault, but the result was a blown fuse rather than an overdose; removing those interlocks without a new safety concept was seen as the key blunder (the shared fault is sketched just after this list).
- Several people argue there is almost never a single “root cause”; instead multiple defenses fail (“Swiss cheese model”).
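The shared fault mentioned above was, per Leveson and Turner's accident report, a race on shared state between the operator's data-entry task and the beam set-up task: if the operator edited the prescription within roughly eight seconds, while the magnets were still being set, the edit appeared on screen but was not picked up by the set-up code. Below is a loose, hypothetical Python model of that failure shape (a time-of-check to time-of-use race on an unsynchronized shared variable); the names and timing are illustrative, not the original assembly code.

```python
import threading
import time

# Unsynchronized shared state, standing in for the prescription the
# operator can still edit on screen (names here are illustrative).
prescription = {"mode": "xray", "energy_mev": 25}

def setup_task():
    """Beam set-up: snapshots the prescription, then spends time driving
    the magnets; an edit made during that window is silently missed."""
    snapshot = dict(prescription)      # time-of-check
    time.sleep(0.5)                    # magnets settling (the race window)
    fire_beam(snapshot)                # time-of-use: acts on stale data

def operator_edit():
    """Operator corrects the mode while the magnets are still moving."""
    time.sleep(0.1)
    prescription["mode"] = "electron"  # shown on screen, never re-read

def fire_beam(params):
    print(f"firing with {params}, but the screen now shows {prescription}")

t_setup = threading.Thread(target=setup_task)
t_edit = threading.Thread(target=operator_edit)
t_setup.start(); t_edit.start()
t_setup.join(); t_edit.join()
```

On the Therac-20 the same kind of stale-state bug reportedly existed, but a hardware interlock blew a fuse instead of letting the beam fire, which is exactly the point the commenters are making.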
Software vs hardware, and the role of independent failsafes
- Strong emphasis that safety engineering should assume software will fail and layer in independent hardware protections (interlocks, radiation sensors, physical limits); a minimal sketch of that veto pattern follows this list.
- Electromechanical failsafes are praised because their failure modes are independent of the software’s, and a tripped interlock is harder to ignore than a cryptic on‑screen error.
- Examples from industrial automation and aviation reinforce the idea: hard‑wired e‑stops, independent instruments, and formal failure analysis (e.g., required at Boeing).
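To make the “assume software fails, protect in hardware” point concrete, here is a hypothetical sketch in which the software’s decision to fire is only ever a request, and an independently wired interlock plus a separate dose-rate sensor can each veto it. All names, signals, and limits are assumptions for illustration, not any real device’s API; in a real system the vetoes would be wired, not coded.

```python
from dataclasses import dataclass

# Hard physical limit enforced outside the treatment software (illustrative value).
DOSE_RATE_LIMIT_GY_PER_S = 2.0

@dataclass
class HardwareInputs:
    """Signals read from independently wired hardware, not derived from
    the control software's own state (all names here are hypothetical)."""
    interlock_closed: bool     # e.g., door switch or turntable-position relay
    measured_dose_rate: float  # from a sensor that does not trust the software

def beam_permitted(software_says_fire: bool, hw: HardwareInputs) -> bool:
    """Defense in depth: the software's decision is necessary but never
    sufficient; any independent protection can veto the beam."""
    if not software_says_fire:
        return False
    if not hw.interlock_closed:                            # electromechanical veto
        return False
    if hw.measured_dose_rate > DOSE_RATE_LIMIT_GY_PER_S:   # independent sensor veto
        return False
    return True

# The software believes firing is safe, but the open interlock vetoes it.
print(beam_permitted(True, HardwareInputs(interlock_closed=False,
                                          measured_dose_rate=0.5)))  # -> False
```

The design point is that `beam_permitted` never consults the software’s own model of the machine state; each veto comes from a source whose failure is uncorrelated with a software defect.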
“Most deadly bug?” – other catastrophic software-related failures
- Candidates mentioned:
  - Boeing 737 MAX / MCAS (hundreds of deaths; debate over “bug” vs. bad design, reliance on a single angle-of-attack sensor, and avoidance of additional pilot training).
  - Air France 447 and how the flight controls handled conflicting pilot inputs.
  - The London Ambulance Service dispatch-system collapse in the early 1990s.
  - The UK Post Office Horizon scandal (false accounting, bankruptcies, suicides).
  - The 1991 Patriot missile clock-drift error at Dhahran (see the worked example after this list).
  - Alleged AI targeting systems in warfare.
- Several note gray areas: where bad policy, concealment, or economics matter more than pure coding faults.
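The Patriot failure lends itself to a worked example. Per the commonly cited GAO analysis, system uptime was counted in tenths of a second and converted to seconds using a 24-bit fixed-point approximation of 0.1; the truncation error accumulated over roughly 100 hours of continuous operation into a tracking error of several hundred meters. The sketch below reproduces that arithmetic; the constants are approximations for illustration, not the deployed code.

```python
# Fixed-point drift arithmetic behind the 1991 Patriot timing failure,
# following the commonly cited GAO analysis.

TRUE_TICK = 0.1                      # seconds per clock tick
# 0.1 truncated to a 24-bit binary fraction: floor(0.1 * 2**24) / 2**24
STORED_TICK = int(TRUE_TICK * 2**24) / 2**24
ERROR_PER_TICK = TRUE_TICK - STORED_TICK      # ~9.5e-8 s lost per tick

uptime_hours = 100                   # roughly the battery's continuous uptime
ticks = uptime_hours * 3600 * 10     # ticks elapsed over that period
clock_drift_s = ticks * ERROR_PER_TICK

SCUD_SPEED_M_PER_S = 1676            # approximate closing speed of a Scud
tracking_error_m = clock_drift_s * SCUD_SPEED_M_PER_S

print(f"drift after {uptime_hours} h: {clock_drift_s:.3f} s")
print(f"tracking error: {tracking_error_m:.0f} m")
# Roughly 0.34 s of drift and ~575 m of error -- enough for the range gate
# to look in the wrong place and dismiss the incoming missile.
```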
Process, culture, and developer quality
- One camp: quality is primarily the result of process, feedback loops, and organizational culture (reporting incidents, fixing them, documenting, independent QA, regulation).
- Another camp: good developers are a necessary precondition; no process can compensate for uniformly poor engineers.
- Many settle on a combined view: talent, process, and a culture of caring about quality are all required, especially for safety‑critical systems.
AI, “vibe‑coding,” and future Therac-style incidents
- Strong concern that LLM‑generated, untested code and “vibe‑coding” culture will recreate Therac‑style failures.
- An LLM‑induced outage cited in the thread is seen as a warning; commenters fear agentic systems being wired up to real hardware or medical devices.
Education, regulation, and ethics
- Many were taught Therac‑25 (and analogs like the Tacoma Narrows Bridge and Hyatt Regency walkway collapses) in CS/engineering ethics courses; others never saw it, or watched classmates treat it as a joke.
- Some point to modern standards (e.g., medical device software standards such as IEC 62304 and FDA scrutiny) as reasons a Therac‑25‑level incident is now less likely, while others doubt that process alone can prevent failures without ethical, empowered engineers.