Air traffic failure caused by two locations 3600nm apart sharing 3-letter code

Abbreviation confusion (“nm”)

  • Many readers initially interpreted “3600nm” as nanometers, not nautical miles.
  • Discussion over correct aviation notation: some say “NM” or “nmi” are standard; others report widespread real-world use of lowercase “nm” in cockpits and marine systems.
  • Several note the irony of an article about identifier collisions itself using an ambiguous unit abbreviation.
  • Broader gripe that aviation still mixes legacy units (feet, inches of mercury, gallons) and inconsistent conventions.

Failure mode and safety trade‑offs

  • The system correctly detected inconsistent flight-plan data and refused to propagate it, but then shut down entirely (including the backup, which ran the same code).
  • One camp argues full shutdown is appropriate for safety:
    • If a “valid” plan can’t be processed, either upstream data is untrustworthy or the system itself is faulty.
    • In both cases, continued automatic processing might produce undetected unsafe states, so forcing manual control is safer.
  • Others argue it’s unreasonable for a single corrupt plan to halt a national system; better to:
    • Reject or flag the individual plan.
    • Continue tracking the physical aircraft via radar/transponder.
    • Use manual handling only for the problematic flight.
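The "isolate, don't halt" position above can be sketched in a few lines. This is a hypothetical illustration, not the real system's logic: `validate`, `process_batch`, and the entry/exit-time check are invented for the example. The point is structural — a validation failure quarantines one plan for manual handling instead of stopping the processing loop.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingResult:
    accepted: list = field(default_factory=list)
    flagged: list = field(default_factory=list)   # routed to manual handling

def validate(plan: dict) -> None:
    """Hypothetical consistency check: raise ValueError on impossible data,
    e.g. a flight that exits an airspace before it entered it."""
    entry, exit_ = plan.get("entry_time"), plan.get("exit_time")
    if entry is not None and exit_ is not None and exit_ < entry:
        raise ValueError("exit before entry: inconsistent plan")

def process_batch(plans: list) -> ProcessingResult:
    result = ProcessingResult()
    for plan in plans:
        try:
            validate(plan)
            result.accepted.append(plan)
        except ValueError:
            # Quarantine just this plan; keep processing the rest.
            result.flagged.append(plan)
    return result
```

The counter-argument from the first camp still applies: this pattern assumes a flagged plan is an input problem rather than a symptom of a faulty processor, which is exactly the distinction the shutdown-everything design refuses to gamble on.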

Identifiers and waypoint code collisions

  • Core technical trigger: two distinct navigation locations (Deauville VOR in France and Devil’s Lake VOR in the US) share the three-letter code “DVL”.
  • Some developers would have assumed three-letter identifiers are globally unique; aviation practitioners note they’re only regionally unique and long known to collide.
  • Suggestions include:
    • Namespacing codes by issuing authority.
    • Using surrogate keys internally while still accepting non-unique human-facing codes.
    • Moving toward globally unique waypoints, though commenters note enormous retrofit cost.
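The namespacing and surrogate-key suggestions combine naturally: treat the human-facing code as unique only within its issuing authority, and hand out an opaque internal key per (authority, code) pair. A minimal sketch, with invented names (`WaypointCode`, `register`) and simplified authority labels:

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class WaypointCode:
    authority: str   # issuing authority / region, e.g. "FR" or "US"
    code: str        # the non-unique human-facing identifier

_surrogate = count(1)
_registry: dict = {}

def register(authority: str, code: str) -> int:
    """Map an (authority, code) pair to a stable internal surrogate key.
    Two authorities reusing the same three letters get distinct keys."""
    key = WaypointCode(authority, code)
    if key not in _registry:
        _registry[key] = next(_surrogate)
    return _registry[key]
```

With this scheme `register("FR", "DVL")` and `register("US", "DVL")` yield different internal keys, so downstream logic never confuses the two navaids even though pilots and planners keep typing "DVL".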

Robustness, DoS, and input validation

  • Multiple comments frame this as a denial-of-service vulnerability: one crafted or unlucky flight plan can halt all automated processing.
  • Proposed mitigations:
    • Harden parsers to treat weird but parseable plans as “reject/flag” rather than fatal.
    • Fuzzing and better test suites around edge cases like duplicate waypoints, implicit segments, and long overflight routes.
  • Some see the shutdown as over-cautious — failing harder than necessary — given the other safety nets already in place (procedural separation, TCAS, etc.), especially over oceanic tracks with limited radar coverage.
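The "reject/flag rather than fatal" hardening is essentially a total parser: one that classifies every input and never escalates to an exception, which also makes it trivially fuzzable. A toy sketch (the route grammar, `parse_route`, and its rejection rules are all invented for illustration):

```python
import random
import string

def parse_route(raw: str):
    """Hypothetical total parser: always returns ('ok', waypoints) or
    ('reject', reason); by construction it never raises."""
    try:
        tokens = raw.strip().split()
        if not tokens:
            return ("reject", "empty route")
        if len(set(tokens)) != len(tokens):
            return ("reject", "duplicate waypoint")
        if any(not t.isalnum() for t in tokens):
            return ("reject", "malformed token")
        return ("ok", tokens)
    except Exception as exc:          # belt-and-braces: any bug becomes a reject
        return ("reject", f"parse error: {exc}")

# Cheap fuzzing: random garbage must never escape the ok/reject contract.
rng = random.Random(0)
for _ in range(1000):
    raw = "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 40)))
    status, _ = parse_route(raw)
    assert status in ("ok", "reject")
```

Edge cases the thread calls out — duplicate waypoints, implicit segments, long overflight routes — would each become explicit reject reasons (and fuzz-test seeds) rather than untested fatal paths.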

Incident management and operational issues

  • Excerpts from UK CAA reports highlight non-software contributors to the length and impact of the outage:
    • Key engineer was on-call offsite; physical presence was required for a full restart.
    • Escalation to higher-level engineers and the vendor was slow.
    • The Level 3 engineer didn’t recognize the fault message and needed vendor help.
    • System connectivity and data status documentation were unclear.
    • A misplaced password database delayed the restart of one server, because the correct credentials couldn’t be validated.
  • Some argue this shows remote-only staffing is insufficient for critical infrastructure; others focus on the need for architectures that support full remote recovery.

Broader statistical and risk discussion

  • A long subthread debates whether shutting down air traffic truly results in “zero excess deaths.”
  • One side cites research around 9/11 suggesting increased road injuries (and likely deaths) when people substituted driving for flying.
  • The other side emphasizes lack of statistically significant excess mortality directly attributable to the airspace shutdown and stresses careful use of “excess deaths” vs raw increases.
  • Meta-point: in safety discussions, statistical significance, causality, and practical risk modeling can diverge.

Software engineering lessons and meta‑discussion

  • Recurring themes:
    • “Falsehoods programmers believe about identifiers” – assuming uniqueness or invariance of human-generated keys.
    • Desire for type systems with physical units and richer invariants; mentions of languages and libraries that support units, but note that mainstream stacks rarely enforce this.
    • Debate over bug-tracker hygiene: whether to keep very old/low-priority bugs open vs close as “won’t fix,” balancing honesty, triage overhead, and the value of long-lived records.
    • Comparisons to chaos engineering (e.g., Netflix’s Chaos Monkey) and periodic synthetic failures to keep fallback paths realistic and well-practiced.
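The wish for unit-aware types ties back to the "nm" confusion at the top of the thread: even without a units-aware language or library, distinct wrapper types force conversions to be explicit. A minimal sketch in plain Python (the `NauticalMiles`/`Feet` classes are invented for the example; real projects would more likely reach for a units library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feet:
    value: float

@dataclass(frozen=True)
class NauticalMiles:
    value: float

    def to_feet(self) -> Feet:
        # 1 nautical mile is exactly 1852 m, i.e. ~6076.115 ft.
        return Feet(self.value * 6076.115)

# Mixing a NauticalMiles with a bare number now fails loudly:
# NauticalMiles(3600) + 5  ->  TypeError
```

This doesn't catch a mislabeled *input* (the "3600nm" headline problem), but it does stop unit errors from propagating silently once a value has been tagged — the invariant the thread wishes mainstream stacks enforced.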