CrowdStrike ex-employees: 'Quality control was not part of our process'
Overall Theme: Speed vs. Quality in a Critical Security Product
- Many commenters see the outage as strong evidence that velocity was prioritized over quality, especially for “Rapid Response” content.
- The idea that “quality control wasn’t part of the process” matches multiple readers’ experience of modern tech culture: move fast, cut QA/SDET roles, and let developers absorb testing work.
- Others caution that a single catastrophic event doesn’t prove chronic underinvestment without more data, but agree basic safeguards were clearly missing.
Debate over Ex-Employee Testimony
- Some dismiss the article’s reliance on former employees, arguing they may be disgruntled, biased, or far removed from kernel work (e.g., in UX roles).
- Others counter that:
- The RCA already confirms serious process failures.
- Multiple ex-employees across roles reporting consistent issues is meaningful signal.
- Corporate PR has its own, stronger bias.
- Several note explicit examples from the article where ex-employee claims about product behavior are weakly or inconsistently rebutted by the company.
Technical and Process Failures
- Key points drawn from the RCA and discussion:
- Rapid Response content bypassed the staged rollout and dogfooding used for full sensor releases.
- A bug in the content validator allowed malformed content through, crashing a kernel driver that handled invalid input poorly.
- Configuration parsing in a kernel module, lack of bounds checks, and insufficient test coverage are seen as fundamental engineering failures.
- Commenters stress that even “data” updates can be as dangerous as code and must be treated as untrusted input (a defensive-parsing sketch follows this list).
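To make the “untrusted input” point concrete, here is a minimal parsing sketch in Python (not kernel code). The file layout, field names, magic value, and size caps are all hypothetical, not CrowdStrike’s actual format; the point is only that every count and length declared by a delivered content file is checked against the real buffer before it is used.

```python
import struct

# Hypothetical fixed-size header for a content-update file. The layout,
# field names, and limits are illustrative assumptions only.
HEADER = struct.Struct("<4sHHI")   # magic, version, field_count, payload_len
MAGIC = b"CNT1"
MAX_FIELDS = 32                    # assumed sanity cap on field count
MAX_PAYLOAD = 1 << 20              # assumed 1 MiB payload cap

def parse_content(blob: bytes) -> list[bytes]:
    """Treat the update as untrusted: validate every count and length
    against the actual buffer size before indexing into it."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, _version, field_count, payload_len = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if field_count > MAX_FIELDS or payload_len > MAX_PAYLOAD:
        raise ValueError("field count or payload length out of range")
    if HEADER.size + payload_len > len(blob):
        raise ValueError("declared payload exceeds file size")

    fields, offset = [], HEADER.size
    for _ in range(field_count):
        if offset + 2 > len(blob):
            raise ValueError("truncated field length")
        (field_len,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        if offset + field_len > len(blob):
            # This is the class of bounds check whose absence turns a bad
            # content file into an out-of-bounds access.
            raise ValueError("field overruns buffer")
        fields.append(blob[offset:offset + field_len])
        offset += field_len
    return fields
```

A build-side validator should run these checks (or stricter ones) before the file ever ships, and the consumer should still re-validate rather than assume the validator was correct.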
Previous Linux Incident and Failure to Generalize
- A prior Linux bricking incident is discussed: some blame an upstream kernel regression; others argue the lesson should still have been “never push globally without strong testing and rollback.”
- The point is made that you don’t fix only the specific failure; you harden against the entire class of risk (a staged-rollout sketch follows this list).
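As an illustration of “never push globally without strong testing and rollback,” here is a minimal ring-based rollout sketch in Python. The ring names, threshold, and the injected `push_to_ring` / `crash_rate_for` / `rollback` callables are hypothetical; the idea is simply that each stage gates the next on observed health, so a bad update stops at the smallest blast radius.

```python
# Hypothetical ring-based rollout with a health gate between stages.
RINGS = [
    ("internal-dogfood", 0.001),   # fraction of the fleet in each ring
    ("canary",           0.01),
    ("early-adopters",   0.10),
    ("general",          1.00),
]
CRASH_RATE_THRESHOLD = 0.001       # halt if >0.1% of ring hosts crash

def staged_deploy(version, push_to_ring, crash_rate_for, rollback):
    """Push `version` one ring at a time, halting and rolling back as soon
    as crash telemetry from the current ring exceeds the threshold."""
    for ring, fraction in RINGS:
        push_to_ring(version, ring, fraction)
        rate = crash_rate_for(version, ring)   # sampled after a real soak period
        if rate > CRASH_RATE_THRESHOLD:
            rollback(version, ring)
            raise RuntimeError(
                f"rollout of {version} halted in ring {ring}: "
                f"crash rate {rate:.4%}"
            )
        # otherwise proceed to the next, larger ring
```

The same gating applies whether the artifact is a full sensor release or a “Rapid Response” content file.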
Industry Culture, Regulation, and Accountability
- Many say this is typical of large software orgs: weak QA, hero culture, incentives to hide problems rather than prevent them.
- Comparisons are drawn to aviation, building codes, and financial trading systems where regulation, independent postmortems, and professional licensing enforce quality.
- Several advocate similar regulation for critical software and even licensure for software engineers working on safety/security-critical systems.
Security Tool Data Collection and Secrets
- A side thread highlights that the macOS agent sends environment variables (including secrets) to a cloud SIEM:
- Some say this is standard for EDR/SIEM and that the SIEM or customer should mask sensitive data.
- Others argue plaintext secrets in centralized logs are a serious design and compliance problem, especially under regimes like PCI DSS and GDPR (a client-side redaction sketch follows).
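One mitigation raised is masking on the client before anything leaves the host. A minimal sketch, assuming key-name pattern matching is an acceptable heuristic; the patterns and mask string are assumptions, not the product’s actual behavior:

```python
import os
import re

# Assumed patterns for likely-secret environment variable names.
SENSITIVE_KEY = re.compile(r"SECRET|TOKEN|PASSWORD|PASSWD|API_KEY|PRIVATE_KEY", re.I)
MASK = "[REDACTED]"

def redacted_environment(env=None) -> dict:
    """Return a copy of the environment with likely secrets masked, so
    plaintext credentials never reach centralized logs or a SIEM."""
    env = dict(os.environ if env is None else env)
    return {k: MASK if SENSITIVE_KEY.search(k) else v for k, v in env.items()}

if __name__ == "__main__":
    sample = {"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "abc123"}
    print(redacted_environment(sample))
    # -> {'PATH': '/usr/bin', 'AWS_SECRET_ACCESS_KEY': '[REDACTED]'}
```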
Impact, Market, and Alternatives
- Anecdotes describe significant real-world harm (e.g., delayed surgeries) beyond financial loss.
- Commenters note the outage is effectively a massive self-inflicted denial of service.
- Despite the incident, the company’s market position remains strong, attributed to compliance and insurer pressure and a lack of clear drop-in alternatives.
- Alternatives mentioned include Microsoft Defender / Defender for Endpoint with Microsoft Sentinel, SentinelOne, Carbon Black, or in-house capability, though insurance and regulations often require a third-party EDR.