Decisions that eroded trust in Azure – by a former Azure Core engineer

Perceived Azure Reliability vs Competitors

  • Many commenters report Azure as noticeably less reliable than AWS and GCP: more provider-caused incidents, opaque or delayed RCAs, and frequent “eventual consistency”/race-condition style glitches.
  • Several SREs running multi-cloud say a large majority of provider-originated incidents come from Azure, even on simple VM/LB/Kubernetes workloads.
  • A minority report long-term, trouble-free use for basic VM / DB / simple app workloads and argue all clouds have serious issues.

Security and Architecture Concerns

  • The Azure Instance Metadata Service design drew heavy criticism: running on the host side, mixing tenant data in shared memory, and being accessible over unauthenticated HTTP is seen as a big multi-tenant risk and highly SSRF-prone.
  • Manual “break glass” / “digital escort” access to production, including for sensitive government workloads, is widely seen as a red flag; some cite linked investigative reporting as evidence of national-security relevance.
  • Others note AWS/GCP have similar “break glass” concepts, but with stricter scoping and auditing.

Org Culture, Management, and Escalation

  • Recurrent themes: chronic understaffing, high churn, title inflation, weak ownership, and extreme risk aversion that blocks refactoring (“too risky to change anything”).
  • Several current/former big-tech engineers say raising systemic risks up the chain often gets you ignored, labeled difficult, or pushed out; some see emailing the board as naïve but understandable.
  • Commenters tie this to broader corporate incentives: feature velocity and short-term metrics trump reliability and security.

Language, Technical Debt, and Refactoring

  • Debate over the “rewrite it in Rust” mandate: many see language choice as secondary to organizational dysfunction and lack of testing; others argue memory-safe languages materially reduce certain classes of bugs.
  • Heavy Rust crate dependency trees are criticized as a supply-chain risk; others note many crates are maintained by a small number of known teams.

Customer Experiences on Azure

  • Numerous anecdotes: flaky AKS, failing disk attachments, random 400s on identical API calls, slow or broken managed Postgres, APIM outages, Power Platform instability, and mismatched or AI-ish docs.
  • One especially worrying report: Azure OpenAI returning other customers’ prompts/responses under load; commenters treat this as a severe isolation failure.
  • A few users say core VMs/LBs and newer managed offerings (e.g., Postgres flexible server, Functions for simple use cases) work acceptably.

Why Azure Still Wins Deals

  • Explanations center on sales and contracts: bundling with Office/Entra, generous credits, “multi-cloud” narratives, and non-technical executives seeking a “safe” big vendor.
  • Many argue Microsoft’s strength is procurement and lock-in, not technical excellence; switching away is seen as costly and politically hard.

Reactions to the Article and Author

  • Some view the series as a credible, much-needed whistleblow that explains long-standing pain with Azure.
  • Others criticize the tone as dramatized or egocentric, and see public airing + board escalation as career suicide and evidence of poor “organizational skills.”
  • Several note these dynamics (technical debt, manual ops, ignored risk) are common in other large organizations, not unique to Azure.