Multiple airlines disrupted due to Microsoft Azure outage

Scope and impact of the outage

  • Commenters report a “worldwide IT outage” far beyond airlines: ATMs, supermarkets, satellite TV, banks, hospitals, train companies, airports (Berlin BER, Melbourne, Palma de Mallorca, ALB, others), and UK retailers’ tills and card machines.
  • Some hospitals reportedly delayed surgeries; multiple airports are reverting to manual / paper-based check-in and experiencing long queues.
  • Several note that Linux-based systems and Mac endpoints in their environments appear unaffected.

CrowdStrike update and Windows dependency

  • Core narrative: a bad CrowdStrike endpoint update causes Windows kernel crashes (BSODs, often involving csagent.sys), leading to boot loops on enterprise Windows machines.
  • Mac and Linux clients are said to be unaffected or less affected because they do not run this code in kernel mode.
  • CrowdStrike is described as widely used enterprise anti‑malware / “endpoint security,” especially in regulated sectors (government, finance, travel), often to satisfy compliance checkboxes.
  • Some characterize such tools as de‑facto “malware” that introduces risk while offering mainly audit comfort.

Relationship to Azure outage

  • Debate on whether the Azure incident and the CrowdStrike issue are independent:
    • Some argue they are separate events; others think Azure’s problems were triggered by massive Windows VM failures or CrowdStrike use within Azure infrastructure.
    • Some feel that “too many black swans on the same day” suggests a link; a later commenter claims confirmation that the issues are related.
  • Technical guesses include crash-looping VMs overloading orchestration systems, and dependencies on Windows-based AD or storage.

Centralization, monoculture, and resilience

  • Many see this as a lesson about:
    • Windows desktop/server monoculture.
    • Reliance on single security vendors.
    • Heavy concentration on one cloud (especially Azure).
  • Arguments:
    • Pro‑cloud: on‑prem environments are usually less reliable; cloud offers better DR.
    • Critical: cloud failures are highly correlated, increasing blast radius; on‑prem and heterogeneous stacks (including Linux) showed more resilience here.
    • Calls for more decentralization, hybrid/on‑prem resurgence, and “de‑monopolizing” the desktop OS market.

Deployment, QA, and rollout practices

  • Strong criticism that an update capable of bricking vast numbers of machines escaped testing.
  • Multiple people call out lack of canary or phased rollouts for such a critical kernel-level component, and the dangers of centralized automatic updates without robust safeguards.
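
The canary / phased rollout idea raised by commenters can be sketched roughly as follows. This is a minimal illustration, not a description of how CrowdStrike or any other vendor deploys updates: the function `phased_rollout`, the `apply_update` callback, the stage fractions, and the 1% failure threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StageResult:
    fraction: float   # cumulative share of the fleet targeted by this stage
    attempted: int    # hosts updated in this stage
    failed: int       # hosts that reported an unhealthy state afterwards
    halted: bool      # True if this stage tripped the failure threshold


def phased_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> List[StageResult]:
    """Update hosts in growing cumulative stages; stop at the first bad stage.

    `apply_update` is a hypothetical callback that pushes the update to one
    host and returns True if the host is healthy afterwards.
    """
    results: List[StageResult] = []
    done = 0  # number of hosts already updated
    for fraction in stages:
        # Cumulative target for this stage, always advancing by at least one host.
        target = min(len(hosts), max(done + 1, round(len(hosts) * fraction)))
        batch = hosts[done:target]
        failed = sum(1 for host in batch if not apply_update(host))
        done = target
        halted = bool(batch) and failed / len(batch) > max_failure_rate
        results.append(StageResult(fraction, len(batch), failed, halted))
        if halted:
            break  # do not push a failing update to the rest of the fleet
    return results


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    # Simulate an update that crashes every machine it touches.
    outcome = phased_rollout(fleet, (0.01, 0.10, 1.0), lambda host: False)
    touched = sum(stage.attempted for stage in outcome)
    print(f"stages run: {len(outcome)}, hosts touched: {touched}")
```

In this sketch, an update that crashes every host is stopped after the first 1% stage, so only 10 of the 1,000 simulated hosts are touched. A real system would also need a soak period between stages and an out-of-band health signal, since a machine stuck in a boot loop cannot report its own failure.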

Side discussion: blocking of archive.ph

  • Brief tangent: some users in Italy report archive.ph being blocked, apparently via government–ISP collaboration; others can access it, suggesting ISP-level or DNS-based blocking.
  • Workarounds mentioned include alternative DNS, VPN, or Tor; motivations (copyright vs child abuse material) are reported as unclear/inconsistent in the notices.