AWS multi-service outage in us-east-1
Immediate symptoms & root cause
- Many reported simultaneous failures across DynamoDB, RDS Proxy, Lambda, SES, SQS, Managed Kafka, STS, IAM, EKS visibility, and AWS console sign-in, primarily in us-east-1.
- Early debugging by users showed dynamodb.us-east-1.amazonaws.com not resolving; manually forcing it to an IP restored access for some (a minimal reproduction sketch follows this list).
- AWS later confirmed the issue was “related to DNS resolution of the DynamoDB API endpoint in US-EAST-1,” followed by a statement that the “underlying DNS issue has been fully mitigated,” though backlogs and throttling persisted (e.g., EC2 launches).
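A minimal sketch of the check users described, using only the Python standard library: it asks whether the DynamoDB endpoint resolves and, if not, falls back to a hard-coded address in the spirit of the /etc/hosts overrides commenters mentioned. The fallback IP below is a documentation placeholder, not a real DynamoDB address.

```python
import socket

ENDPOINT = "dynamodb.us-east-1.amazonaws.com"
# Placeholder only (TEST-NET-1 range); a previously cached, known-good
# address would go here, not this value.
FALLBACK_IP = "192.0.2.10"

def resolve_or_fallback(host: str, fallback: str) -> str:
    """Return the resolved IP for host, or the fallback if DNS fails."""
    try:
        # getaddrinfo raises socket.gaierror when the name does not resolve,
        # which is the symptom users reported for the DynamoDB endpoint.
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
        return infos[0][4][0]
    except socket.gaierror:
        print(f"{host} did not resolve; using fallback {fallback}")
        return fallback

if __name__ == "__main__":
    print(resolve_or_fallback(ENDPOINT, FALLBACK_IP))
```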
Blast radius across the internet
- A large number of external services were degraded or down: Docker Hub, npm/pnpm, Vercel, Twilio, Slack, Signal, Zoom, Jira/Confluence/Bitbucket, Atlassian StatusPage, Coinbase, payment providers, AI services, messaging tools, status pages themselves, and even consumer apps (Ring, Alexa, Robinhood, gaming, media, banking).
- Many organizations in other AWS regions (EU, APAC) saw secondary failures via IAM/STS, control planes, or dependencies on third‑party vendors hosted in us-east-1.
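One way teams limit that coupling is to pin SDK clients to regional STS endpoints rather than the global endpoint, which has historically been served out of us-east-1. A hedged boto3 sketch, assuming credentials are already configured; the region shown is illustrative:

```python
import boto3

# Illustrative region; any region other than us-east-1 works the same way.
REGION = "eu-west-1"

# The global STS endpoint (sts.amazonaws.com) has historically resolved to
# us-east-1. Pointing the client at a regional endpoint keeps token issuance
# within the chosen region.
sts = boto3.client(
    "sts",
    region_name=REGION,
    endpoint_url=f"https://sts.{REGION}.amazonaws.com",
)

# Sanity check: get_caller_identity requires no special permissions.
print(sts.get_caller_identity()["Arn"])
```

The same effect can be configured account-wide via the AWS_STS_REGIONAL_ENDPOINTS=regional setting; either way, IAM itself remains a global control plane, which is the deeper dependency commenters point at.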
us-east-1 as systemic weak point
- Commenters repeatedly describe us-east-1 as historically the least stable and also uniquely central: many “global” control planes (IAM writes, Organizations, Route53 control, CloudFront/ACM, some consoles) still depend on it.
- This leads to the perception that “you can’t fully escape us-east-1” even if workloads are elsewhere, and that outages there can have global effects.
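A concrete instance of that coupling: ACM certificates used by CloudFront must be issued in us-east-1 regardless of where the workload runs, so even certificate automation for an EU-only site keeps a us-east-1 dependency. A minimal boto3 sketch; the domain is a placeholder:

```python
import boto3

# CloudFront only accepts ACM certificates issued in us-east-1, so this client
# is pinned there even if every other resource lives in another region.
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="www.example.com",   # placeholder domain
    ValidationMethod="DNS",
)
print("Certificate ARN:", response["CertificateArn"])
```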
Architecture, redundancy, and reality
- Many note AWS services are layered on a few core primitives (DynamoDB, S3, EC2, Lambda), so a failure in one of them, compounded by DNS problems, can cascade widely; cyclic or hidden dependencies are suspected.
- There is broad agreement that true multi‑region or multi‑cloud HA with strong consistency is difficult and costly (active‑active RDBMS, CAP tradeoffs, data replication, traffic charges, app redesign); a minimal read‑path failover sketch follows this list.
- Some argue most businesses don’t need extreme nines and should pragmatically accept rare regional outages; others counter that critical systems (finance, infra, security) must build independent DR across providers.
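To make the tradeoff concrete, the sketch below shows read-path failover against a hypothetical DynamoDB global table named sessions replicated to two regions. Even this trivial case illustrates the CAP point raised above: the replica read may be stale, and the write path (the hard part) is not addressed at all.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

PRIMARY_REGION = "us-east-1"
FAILOVER_REGION = "eu-west-1"
TABLE = "sessions"  # hypothetical global table replicated to both regions

def get_item_with_failover(key: dict) -> dict | None:
    """Read from the primary region; fall back to a replica if it is unreachable.

    Replica reads may lag (global tables replicate asynchronously), and this
    sketch says nothing about writes, conflict resolution, or failback.
    """
    for region in (PRIMARY_REGION, FAILOVER_REGION):
        table = boto3.resource("dynamodb", region_name=region).Table(TABLE)
        try:
            return table.get_item(Key=key).get("Item")
        except (ClientError, EndpointConnectionError):
            continue  # primary unreachable or erroring; try the next region
    return None

# Example lookup by a hypothetical primary key:
# print(get_item_with_failover({"session_id": "abc123"}))
```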
Self‑hosting and alternative providers
- Several report long, uneventful uptime on bare metal or low‑cost providers (e.g., Hetzner, Netcup), often at a fraction of AWS cost; some note that even simple on‑prem setups or Raspberry Pis outlived multiple us-east-1 incidents.
- Skeptics reply that managed services (especially databases) and global scale justify AWS’s complexity and price; running equivalent HA stacks yourself requires serious ops expertise.
SLAs, status pages, and incentives
- Commenters are cynical about cloud SLAs and compensation (typically credits, no real liability) and about status pages that lag reality or remain misleadingly green.
- Several emphasize that a key “benefit” of AWS is political: when a hyperscaler fails, everyone is down together and blame is deflected from internal teams, which strongly shapes executive preferences.