AWS multi-service outage in us-east-1
Immediate symptoms & root cause
- Many reported simultaneous failures across DynamoDB, RDS Proxy, Lambda, SES, SQS, Managed Kafka, STS, IAM, EKS visibility, and AWS console sign-in, primarily in us-east-1.
- Early debugging by users showed dynamodb.us-east-1.amazonaws.com not resolving; manually forcing it to an IP restored access for some (a minimal reproduction sketch follows this list).
- AWS later confirmed the issue was “related to DNS resolution of the DynamoDB API endpoint in US-EAST-1,” followed by a statement that the “underlying DNS issue has been fully mitigated,” though backlogs and throttling persisted (e.g., EC2 launches).
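A minimal sketch of the check users described, using only the Python standard library: it asks whether the DynamoDB endpoint resolves and, if not, falls back to a hard-coded address in the spirit of the /etc/hosts overrides commenters mentioned. The fallback IP below is a documentation placeholder, not a real DynamoDB address.

```python
import socket

ENDPOINT = "dynamodb.us-east-1.amazonaws.com"
# Placeholder only (TEST-NET-1 range); a previously cached, known-good
# address would go here, not this value.
FALLBACK_IP = "192.0.2.10"

def resolve_or_fallback(host: str, fallback: str) -> str:
    """Return the resolved IP for host, or the fallback if DNS fails."""
    try:
        # getaddrinfo raises socket.gaierror when the name does not resolve,
        # which is the symptom users reported for the DynamoDB endpoint.
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
        return infos[0][4][0]
    except socket.gaierror:
        print(f"{host} did not resolve; using fallback {fallback}")
        return fallback

if __name__ == "__main__":
    print(resolve_or_fallback(ENDPOINT, FALLBACK_IP))
```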
Blast radius across the internet
- A large number of external services were degraded or down: Docker Hub, npm/pnpm, Vercel, Twilio, Slack, Signal, Zoom, Jira/Confluence/Bitbucket, Atlassian StatusPage, Coinbase, payment providers, AI services, messaging tools, status pages themselves, and even consumer apps (Ring, Alexa, Robinhood, gaming, media, banking).
- Many organizations in other AWS regions (EU, APAC) saw secondary failures via IAM/STS, control planes, or dependencies on third‑party vendors hosted in us-east-1.
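One way teams limit that coupling is to pin SDK clients to regional STS endpoints rather than the global endpoint, which has historically been served out of us-east-1. A hedged boto3 sketch, assuming credentials are already configured; the region shown is illustrative:

```python
import boto3

# Illustrative region; any region other than us-east-1 works the same way.
REGION = "eu-west-1"

# The global STS endpoint (sts.amazonaws.com) has historically resolved to
# us-east-1. Pointing the client at a regional endpoint keeps token issuance
# within the chosen region.
sts = boto3.client(
    "sts",
    region_name=REGION,
    endpoint_url=f"https://sts.{REGION}.amazonaws.com",
)

# Sanity check: get_caller_identity requires no special permissions.
print(sts.get_caller_identity()["Arn"])
```

The same effect can be configured account-wide via the AWS_STS_REGIONAL_ENDPOINTS=regional setting; either way, IAM itself remains a global control plane, which is the deeper dependency commenters point at.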
us-east-1 as systemic weak point
- Commenters repeatedly describe us-east-1 as historically the least stable and also uniquely central: many “global” control planes (IAM writes, Organizations, Route53 control, CloudFront/ACM, some consoles) still depend on it.
- This leads to the perception that “you can’t fully escape us-east-1” even if workloads are elsewhere, and that outages there can have global effects.
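A concrete instance of that coupling: ACM certificates used by CloudFront must be issued in us-east-1 regardless of where the workload runs, so even certificate automation for an EU-only site keeps a us-east-1 dependency. A minimal boto3 sketch; the domain is a placeholder:

```python
import boto3

# CloudFront only accepts ACM certificates issued in us-east-1, so this client
# is pinned there even if every other resource lives in another region.
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="www.example.com",   # placeholder domain
    ValidationMethod="DNS",
)
print("Certificate ARN:", response["CertificateArn"])
```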
Architecture, redundancy, and reality
- Many note AWS services are layered on a few core primitives (DynamoDB, S3, EC2, Lambda), so a failure in one of them, compounded by DNS problems, can cascade widely; cyclic or hidden dependencies are suspected.
- There is broad agreement that true multi‑region or multi‑cloud HA with strong consistency is difficult and costly (active‑active RDBMS, CAP tradeoffs, data replication, traffic charges, app redesign); a minimal read‑path failover sketch follows this list.
- Some argue most businesses don’t need extreme nines and should pragmatically accept rare regional outages; others counter that critical systems (finance, infra, security) must build independent DR across providers.
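To make the tradeoff concrete, the sketch below shows read-path failover against a hypothetical DynamoDB global table named sessions replicated to two regions. Even this trivial case illustrates the CAP point raised above: the replica read may be stale, and the write path (the hard part) is not addressed at all.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

PRIMARY_REGION = "us-east-1"
FAILOVER_REGION = "eu-west-1"
TABLE = "sessions"  # hypothetical global table replicated to both regions

def get_item_with_failover(key: dict) -> dict | None:
    """Read from the primary region; fall back to a replica if it is unreachable.

    Replica reads may lag (global tables replicate asynchronously), and this
    sketch says nothing about writes, conflict resolution, or failback.
    """
    for region in (PRIMARY_REGION, FAILOVER_REGION):
        table = boto3.resource("dynamodb", region_name=region).Table(TABLE)
        try:
            return table.get_item(Key=key).get("Item")
        except (ClientError, EndpointConnectionError):
            continue  # primary unreachable or erroring; try the next region
    return None

# Example lookup by a hypothetical primary key:
# print(get_item_with_failover({"session_id": "abc123"}))
```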
Self‑hosting and alternative providers
- Several report long, uneventful uptime on bare metal or low‑cost providers (e.g., Hetzner, Netcup), often at a fraction of AWS cost; some note that even simple on‑prem setups or Raspberry Pis outlived multiple us-east-1 incidents.
- Skeptics reply that managed services (especially databases) and global scale justify AWS’s complexity and price; running equivalent HA stacks yourself requires serious ops expertise.
SLAs, status pages, and incentives
- Commenters are cynical about cloud SLAs and compensation (typically credits, no real liability) and about status pages that lag reality or remain misleadingly green.
- Several emphasize that a key “benefit” of AWS is political: when a hyperscaler fails, everyone is down together and blame is deflected from internal teams, which strongly shapes executive preferences.