Multiple airlines disrupted due to Microsoft Azure outage
Scope and impact of the outage
- Commenters report a “worldwide IT outage” far beyond airlines: ATMs, supermarkets, satellite TV, banks, hospitals, train companies, airports (Berlin BER, Melbourne, Palma de Mallorca, ALB, others), and UK retailers’ tills and card machines.
- Some hospitals reportedly delayed surgeries; multiple airports are reverting to manual / paper-based check-in and experiencing long queues.
- Several note that Linux-based systems and Mac endpoints in their environments appear unaffected.
CrowdStrike update and Windows dependency
- Core narrative: a bad CrowdStrike endpoint update causes Windows kernel panics/BSODs (often involving csagent.sys), leading to boot loops on enterprise Windows machines.
- Mac and Linux clients are said to be unaffected or less affected because they do not run this code in kernel mode.
- CrowdStrike is described as widely used enterprise anti‑malware / “endpoint security,” especially in regulated sectors (government, finance, travel), often to satisfy compliance checkboxes.
- Some characterize such tools as de‑facto “malware” that introduces risk while offering mainly audit comfort.
Relationship to Azure outage
- Debate on whether the Azure incident and the CrowdStrike issue are independent:
  - Some argue they are separate events; others think Azure’s problems were triggered by massive Windows VM failures or by CrowdStrike running within Azure infrastructure.
  - Some feel that “too many black swans on the same day” suggests a link; later, a commenter claims confirmation that the issues are related.
  - Technical guesses include crash-looping VMs overloading orchestration systems, and dependencies on Windows-based AD or storage.
Centralization, monoculture, and resilience
- Many see this as a lesson about:
  - Windows desktop/server monoculture.
  - Reliance on single security vendors.
  - Heavy concentration on one cloud (especially Azure).
- Arguments:
  - Pro‑cloud: on‑prem environments are usually less reliable; cloud offers better disaster recovery (DR).
  - Critical: cloud failures are highly correlated, increasing the blast radius; on‑prem and heterogeneous stacks (including Linux) showed more resilience here.
- Calls for more decentralization, hybrid/on‑prem resurgence, and “de‑monopolizing” the desktop OS market.
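The “highly correlated failures” argument can be made concrete with a toy model that does not come from the thread. Assume n organizations, each unavailable on a given day with probability p. If their failures are independent, a mass outage needs many unlucky draws at once; if they all share one dependency (one vendor, one cloud), a single failure takes everyone down together. The average number of organizations down per day is n·p in both cases; only the tail changes. The function names, n and p below are illustrative assumptions.

```python
from math import comb


def p_at_least_k_down_independent(n: int, k: int, p: float) -> float:
    """P(at least k of n orgs are down) when each fails independently with prob p.

    This is the upper tail of a Binomial(n, p) distribution.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_at_least_k_down_shared(n: int, k: int, p: float) -> float:
    """Same question when all n orgs depend on one component failing with prob p.

    Either nobody is down (component healthy) or all n are down at once.
    """
    if k <= 0:
        return 1.0
    return p if k <= n else 0.0


if __name__ == "__main__":
    n, p = 100, 0.01  # hypothetical: 100 orgs, 1% daily failure chance each
    for k in (1, 10, 50):
        print(k, p_at_least_k_down_independent(n, k, p), p_at_least_k_down_shared(n, k, p))
```

With these assumed numbers, half of the organizations failing on the same day is vanishingly unlikely under independence (below 1e-70), while with a shared dependency it happens with the full 1% probability.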
Deployment, QA, and rollout practices
- Strong criticism that an update capable of bricking vast numbers of machines escaped testing.
- Multiple people call out lack of canary or phased rollouts for such a critical kernel-level component, and the dangers of centralized automatic updates without robust safeguards.
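A canary or phased rollout of the kind commenters asked for can be sketched as follows. This is a generic illustration, not CrowdStrike’s actual pipeline: `phased_rollout`, `apply_update`, the stage percentages and the failure threshold are all hypothetical. The update goes to a small slice of hosts first, and the rollout halts if too many hosts in that slice come back unhealthy, so a crash-inducing update stops at the canary stage instead of reaching the whole fleet.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    deployed: List[str] = field(default_factory=list)  # hosts that received the update
    halted: bool = False  # True if a stage tripped the health gate
    failed_stage: Optional[int] = None  # index of the stage that tripped it


def phased_rollout(
    hosts: Sequence[str],
    stage_percents: Sequence[int],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy to cumulative percentages of hosts (e.g. 1%, 10%, 100%).

    apply_update(host) pushes the update and returns True if the host is
    healthy afterwards (hypothetical health check). If the failure rate
    within a stage exceeds max_failure_rate, later stages are skipped.
    """
    result = RolloutResult()
    n = len(hosts)
    done = 0  # number of hosts already updated
    for stage, pct in enumerate(stage_percents):
        target = min(n, (n * pct + 99) // 100)  # integer ceil(n * pct / 100)
        batch = hosts[done:target]
        if not batch:
            continue
        failures = sum(1 for host in batch if not apply_update(host))
        result.deployed.extend(batch)
        done = target
        if failures / len(batch) > max_failure_rate:
            result.halted = True
            result.failed_stage = stage
            break
    return result


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    # Simulate a bad update that crashes every host it touches.
    bad = phased_rollout(fleet, (1, 10, 100), apply_update=lambda host: False)
    print(len(bad.deployed), bad.halted, bad.failed_stage)  # 10 True 0
```

In this simulation only 10 of the 1,000 hypothetical hosts receive the bad update before the gate stops the rollout. A real system would also need a way to roll back or recover the canary hosts, which this sketch does not model.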
Side discussion: blocking of archive.ph
- Brief tangent: some users in Italy report archive.ph being blocked, apparently via government–ISP collaboration; others can access it, suggesting ISP-level or DNS-based blocking.
- Workarounds mentioned include alternative DNS, VPN, or Tor; the stated motivations (copyright vs. child abuse material) are reported as unclear or inconsistent across the notices.