Decisions that eroded trust in Azure – by a former Azure Core engineer
Perceived Azure Reliability vs Competitors
- Many commenters report Azure as noticeably less reliable than AWS and GCP: more provider-caused incidents, opaque or delayed root-cause analyses (RCAs), and frequent glitches that look like “eventual consistency” or race-condition bugs.
- Several SREs running multi-cloud say a large majority of provider-originated incidents come from Azure, even on simple VM/LB/Kubernetes workloads.
- A minority report long-term, trouble-free use for basic VM / DB / simple app workloads and argue all clouds have serious issues.
Security and Architecture Concerns
- The design of the Azure Instance Metadata Service (IMDS) drew heavy criticism. Commenters see three properties as a serious multi-tenant risk and as making it highly SSRF-prone: it runs on the host side, mixes tenant data in shared memory, and is accessible over unauthenticated HTTP.
- Manual “break glass” / “digital escort” access to production, including for sensitive government workloads, is widely seen as a red flag; some cite linked investigative reporting as evidence of national-security relevance.
- Others note AWS/GCP have similar “break glass” concepts, but with stricter scoping and auditing.
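To make the SSRF concern above concrete, here is a minimal sketch of the kind of pre-flight check a server-side URL fetcher might apply so that user-supplied URLs cannot reach an instance metadata endpoint. It assumes the endpoint sits at the link-local address 169.254.169.254, as Azure documents for IMDS. The function names and the `resolved_ips` parameter are hypothetical and do not come from the thread, and a check like this is only one layer of a defense.

```python
import ipaddress
from typing import Sequence
from urllib.parse import urlsplit


def is_internal_address(ip_text: str) -> bool:
    """Return True if the IP literal is link-local, loopback, or private."""
    ip = ipaddress.ip_address(ip_text)
    # 169.254.0.0/16 (where metadata endpoints live) is link-local.
    return ip.is_link_local or ip.is_loopback or ip.is_private


def is_url_allowed(url: str, resolved_ips: Sequence[str]) -> bool:
    """Hypothetical pre-flight check for a server-side fetcher.

    resolved_ips holds the addresses the caller already resolved for the
    URL's host. The caller is assumed to connect to exactly these
    addresses, so a later DNS change cannot redirect the request.
    """
    if urlsplit(url).scheme not in ("http", "https"):
        return False  # reject file://, gopher://, and similar schemes
    if not resolved_ips:
        return False  # nothing resolved, so nothing can be vetted
    return not any(is_internal_address(ip) for ip in resolved_ips)


if __name__ == "__main__":
    print(is_url_allowed("http://169.254.169.254/metadata/instance",
                         ["169.254.169.254"]))
    print(is_url_allowed("https://example.com/", ["93.184.216.34"]))
```

The check takes already-resolved addresses rather than resolving the host itself, because vetting a hostname and then resolving it again at connect time would leave room for DNS rebinding.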
Org Culture, Management, and Escalation
- Recurrent themes: chronic understaffing, high churn, title inflation, weak ownership, and extreme risk aversion that blocks refactoring (“too risky to change anything”).
- Several current/former big-tech engineers say raising systemic risks up the chain often gets you ignored, labeled difficult, or pushed out; some see emailing the board as naïve but understandable.
- Commenters tie this to broader corporate incentives: feature velocity and short-term metrics trump reliability and security.
Language, Technical Debt, and Refactoring
- Debate over the “rewrite it in Rust” mandate: many see language choice as secondary to organizational dysfunction and lack of testing; others argue memory-safe languages materially reduce certain classes of bugs.
- Heavy Rust crate dependency trees are criticized as a supply-chain risk; others note many crates are maintained by a small number of known teams.
Customer Experiences on Azure
- Numerous anecdotes: flaky AKS, failing disk attachments, random HTTP 400 errors on identical API calls, slow or broken managed Postgres, APIM outages, Power Platform instability, and documentation that does not match actual behavior or reads as AI-generated.
- One especially worrying report: Azure OpenAI returning other customers’ prompts/responses under load; commenters treat this as a severe isolation failure.
- A few users say core VMs/LBs and newer managed offerings (e.g., Postgres flexible server, Functions for simple use cases) work acceptably.
Why Azure Still Wins Deals
- Explanations center on sales and contracts: bundling with Office/Entra, generous credits, “multi-cloud” narratives, and non-technical executives seeking a “safe” big vendor.
- Many argue Microsoft’s strength is procurement and lock-in, not technical excellence; switching away is seen as costly and politically hard.
Reactions to the Article and Author
- Some view the series as a credible, much-needed act of whistleblowing that explains long-standing pain with Azure.
- Others criticize the tone as dramatized or egocentric, and see the public airing and board escalation as career suicide and evidence of poor “organizational skills.”
- Several note these dynamics (technical debt, manual ops, ignored risk) are common in other large organizations, not unique to Azure.