CrowdStrike accepting the Pwnie Award for "most epic fail" at DEF CON
Reactions to accepting the Pwnie Award
- Many see showing up and accepting the “most epic fail” award as the least-bad PR option: refusing would look evasive, while attending allows public contrition and serves as a reminder to staff.
- Others call it “tone deaf” and trivializing a catastrophe; they view it as laughing off a disaster that caused global disruption.
- Some note the acceptance speech came across as sober and self‑critical, not jokey; critics respond that context (a fun con talk, applause, trophy) makes it inappropriate regardless of tone.
Human impact and seriousness of the outage
- Commenters describe severe real‑world impact: grounded flights, hospital and ER disruptions, 911 outages, pharmacy issues, lost business and productivity.
- Debate over deaths: some are “certain” people died indirectly (delayed care, emergency stress); others say no concrete evidence has surfaced and emphasize that hospitals have downtime procedures.
- Several note that even “just” elevated stress and missed life events (funerals, last goodbyes, surgeries) are serious harms.
Liability, lawsuits, and contracts
- Many ask why there are few visible lawsuits given claimed losses in the billions.
- CrowdStrike (CS) contracts reportedly cap liability in the low millions; some argue that won’t withstand “gross negligence” claims, especially from insurers.
- Delta’s suit and CS’s public response are discussed: CS points to contractual caps, hints at aggressive discovery into Delta’s IT practices, and suggests Delta’s prolonged outage was partly its own fault.
- Some expect insurers and reinsurers to be the main drivers of any serious reckoning, e.g., by surcharging or refusing coverage when CS is in the stack.
Responsibility: CrowdStrike vs customers
- Strong consensus that CS’s process was egregious: an update that crashes essentially 100% of target Windows systems implies fundamental testing and rollout failures.
- Key detail: the “rapid response” update apparently bypassed customers’ usual staged rollout controls, leaving them unable to canary it.
- Others argue enterprises also bear blame for:
  - Allowing a third‑party kernel driver to be a single point of failure on critical systems.
  - Not designing fallback procedures and “analog” continuity plans robust enough for such outages.
  - Over‑relying on cloud and endpoint tools to satisfy auditors and insurers, not genuine risk analysis.
Software vs civil engineering and calls for accountability
- Large sub‑thread compares software to civil engineering:
  - One side: bridges have clear standards, licensing, and personal liability; software should evolve similar norms, especially for life‑critical systems.
  - Opposing view: software changes too fast, is vastly more complex, and is attacked continuously; perfect safety is impossible and over‑regulation would cripple competitiveness.
- Some advocate for a professionalized “real engineering” tier with licenses and sign‑off liability for safety‑critical code; others warn it would mostly create rent‑seeking gatekeepers and push innovation offshore.
Security tooling, SPOFs, and industry incentives
- Many criticize the entire model of managed endpoint security:
  - Closed‑source kernel code parsing untrusted input is seen as inherently dangerous.
  - Centralized products that can remotely brick all endpoints are called “security single points of failure.”
- Commenters note that many organizations deploy such tools mainly to tick compliance/insurance boxes; the risk of catastrophic vendor failure was underappreciated.
- Some argue that if a system is truly life‑critical, running networked Windows with third‑party kernel agents is itself negligent, regardless of CS’s bug.
What consequences should follow
- Views range from:
  - “Nuke the company” / bankrupt and reconstitute it as a warning,
  - To “fix the processes, don’t scapegoat individuals,” similar to how some large outages at other providers were handled.
- Skeptics doubt meaningful change will occur without:
  - Legal liability that survives EULAs and caps.
  - Insurance pressure that makes unsafe stacks uninsurable.
  - Cultural shift away from “move fast and break things” toward genuine engineering discipline.