CrowdStrike ex-employees: 'Quality control was not part of our process'

Overall Theme: Speed vs. Quality in a Critical Security Product

  • Many commenters see the outage as strong evidence that velocity was prioritized over quality, especially for “Rapid Response” content.
  • The idea that “quality control wasn’t part of the process” matches multiple readers’ experience of modern tech culture: move fast, cut QA/SDET, let developers absorb testing.
  • Others caution that a single catastrophic event doesn’t prove chronic underinvestment without more data, but agree basic safeguards were clearly missing.

Debate over Ex-Employee Testimony

  • Some dismiss the article’s reliance on former employees, arguing they may be disgruntled, biased, or far from kernel work (e.g., UX).
  • Others counter that:
    • The RCA already confirms serious process failures.
    • Multiple ex-employees across roles reporting consistent issues is meaningful signal.
    • Corporate PR has its own, stronger bias.
  • Several note explicit examples from the article where ex-employee claims about product behavior are weakly or inconsistently rebutted by the company.

Technical and Process Failures

  • Key points drawn from the RCA and discussion:
    • Rapid Response content bypassed the staged rollout and dogfooding used for full sensor releases.
    • A bug in the content validator let malformed content through to a kernel driver that did not safely handle invalid input, causing the crash.
    • Configuration parsing in a kernel module, lack of bounds checks, and insufficient test coverage are seen as fundamental engineering failures.
    • Commenters stress that even “data” updates can be as dangerous as code and must be treated as untrusted input.

Previous Linux Incident and Failure to Generalize

  • A prior Linux bricking incident is discussed: some blame an upstream kernel regression; others argue the lesson should still have been “never push globally without strong testing and rollback.”
  • Point made that you don’t just fix the specific failure, you harden against the entire class of risks.
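The "never push globally" lesson reduces to a simple control loop: deploy in expanding rings, watch a health signal, and roll back everything touched the moment the signal degrades. The sketch below is an assumption-laden illustration; the stage fractions, threshold, and the `deploy`/`rollback`/`healthy_fraction` callbacks are hypothetical names, not any vendor's API.

```python
# Hypothetical staged-rollout sketch; all names and numbers are illustrative.
STAGES = [0.01, 0.10, 0.50, 1.00]  # fraction of the fleet per ring
HEALTH_THRESHOLD = 0.999           # abort if fewer hosts stay healthy

def staged_rollout(hosts, deploy, rollback, healthy_fraction):
    """Deploy ring by ring; halt and roll back on a bad health signal."""
    deployed = []
    for stage in STAGES:
        target_count = max(1, int(len(hosts) * stage))
        for h in hosts[len(deployed):target_count]:
            deploy(h)
            deployed.append(h)
        if healthy_fraction(deployed) < HEALTH_THRESHOLD:
            for h in deployed:  # undo every host touched so far
                rollback(h)
            return False        # halted long before the full fleet was hit
    return True
```

Under this scheme, the malformed-content failure would have bricked roughly the first 1% ring and stopped, which is the "harden against the class of risks" point: the safeguard works regardless of which specific bug slips through.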

Industry Culture, Regulation, and Accountability

  • Many say this is typical of large software orgs: weak QA, hero culture, incentives to hide problems rather than prevent them.
  • Comparisons are drawn to aviation, building codes, and financial trading systems where regulation, independent postmortems, and professional licensing enforce quality.
  • Several advocate similar regulation for critical software and even licensure for software engineers working on safety/security-critical systems.

Security Tool Data Collection and Secrets

  • A side thread highlights that the macOS agent sends environment variables (including secrets) to a cloud SIEM:
    • Some say this is standard for EDR/SIEM and that the SIEM or customer should mask sensitive data.
    • Others argue plaintext secrets in centralized logs are a serious design and compliance problem, especially under regimes like PCI and GDPR.

Impact, Market, and Alternatives

  • Anecdotes describe significant real-world harm (e.g., delayed surgeries) beyond financial loss.
  • Commenters note the outage is effectively a massive self-inflicted denial of service.
  • Despite the incident, the company’s market position remains strong, attributed to compliance and insurer pressure plus the lack of clear drop-in alternatives.
  • Alternatives mentioned: Microsoft Defender/Defender for Endpoint and Sentinel, SentinelOne, Carbon Black, or in-house capability—though insurance and regulations often require third-party EDR.