AWS multi-service outage in us-east-1

Immediate symptoms & root cause

  • Many reported simultaneous failures across DynamoDB, RDS Proxy, Lambda, SES, SQS, Managed Kafka, STS, IAM, EKS visibility, and AWS console sign-in, primarily in us-east-1.
  • Early debugging by users showed dynamodb.us-east-1.amazonaws.com failing to resolve; manually pinning the hostname to a known IP restored access for some (a sketch of this workaround follows the list).
  • AWS later confirmed the issue was “related to DNS resolution of the DynamoDB API endpoint in US-EAST-1,” and subsequently stated that the “underlying DNS issue has been fully mitigated,” though backlogs and throttling persisted (e.g., for EC2 instance launches).
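
A minimal sketch of the workaround users described, in Python: it checks whether the DynamoDB endpoint still resolves and, if not, prints an /etc/hosts-style pin to a previously observed address. The cached IP is a placeholder, not an official AWS address.

    import socket

    ENDPOINT = "dynamodb.us-east-1.amazonaws.com"
    CACHED_IP = "203.0.113.10"  # placeholder; use an address resolved before the outage

    try:
        # Ask the local resolver for the endpoint on the HTTPS port.
        socket.getaddrinfo(ENDPOINT, 443)
        print(f"{ENDPOINT} resolves normally; no override needed")
    except socket.gaierror:
        # Resolution failed: pinning the hostname (e.g., in /etc/hosts) keeps
        # TLS hostname verification intact while bypassing the broken lookup.
        print("DNS lookup failed; a temporary hosts-file pin would look like:")
        print(f"{CACHED_IP} {ENDPOINT}")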

Blast radius across the internet

  • A large number of external services were degraded or down: Docker Hub, npm/pnpm, Vercel, Twilio, Slack, Signal, Zoom, Jira/Confluence/Bitbucket, Atlassian StatusPage, Coinbase, payment providers, AI services, messaging tools, status pages themselves, and even consumer apps (Ring, Alexa, Robinhood, gaming, media, banking).
  • Many organizations in other AWS regions (EU, APAC) saw secondary failures via IAM/STS, control planes, or dependencies on third‑party vendors hosted in us-east-1.

us-east-1 as systemic weak point

  • Commenters repeatedly describe us-east-1 as historically the least stable region and also uniquely central: many “global” control planes (IAM writes, Organizations, Route53 control, CloudFront/ACM, some consoles) still depend on it (see the ACM example after this list).
  • This leads to the perception that “you can’t fully escape us-east-1” even if workloads are elsewhere, and that outages there can have global effects.
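
One concrete example of that coupling: CloudFront only accepts ACM certificates issued in us-east-1, so even a stack running entirely in another region ends up making us-east-1 API calls. A minimal boto3 sketch (the domain name is hypothetical):

    import boto3

    # CloudFront requires its ACM certificates to live in us-east-1,
    # regardless of where the origin or the rest of the stack runs.
    acm = boto3.client("acm", region_name="us-east-1")

    response = acm.request_certificate(
        DomainName="www.example.com",   # hypothetical domain
        ValidationMethod="DNS",
    )
    print(response["CertificateArn"])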

Architecture, redundancy, and reality

  • Many note that AWS services are layered on a few core primitives (DynamoDB, S3, EC2, Lambda), so a failure in one of them, compounded by DNS problems, can cascade widely; cyclic or hidden dependencies are suspected.
  • There is broad agreement that true multi‑region or multi‑cloud HA with strong consistency is difficult and costly (active‑active RDBMS, CAP tradeoffs, data replication, traffic charges, app redesign); the replication sketch after this list shows one small piece of that work.
  • Some argue most businesses don’t need extreme nines and should pragmatically accept rare regional outages; others counter that critical systems (finance, infra, security) must build independent DR across providers.
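
As one illustration of the machinery involved, here is a hedged boto3 sketch that adds a DynamoDB global-table replica in a second region. It assumes a table named orders already exists with streams enabled; cross-region replication is asynchronous, which is exactly the consistency tradeoff raised above.

    import boto3

    # Assumes an existing table "orders" in us-east-1 with DynamoDB Streams
    # enabled (NEW_AND_OLD_IMAGES), as global tables require.
    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    dynamodb.update_table(
        TableName="orders",                                # hypothetical table name
        ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
    )
    # Replication is asynchronous: the eu-west-1 replica can serve reads during
    # a us-east-1 outage, but the most recent writes may not have arrived yet.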

Self‑hosting and alternative providers

  • Several report long, uneventful uptime on bare metal or low‑cost providers (e.g., Hetzner, Netcup), often at a fraction of AWS cost; some note that even simple on‑prem setups or Raspberry Pis have stayed up through multiple us-east-1 incidents.
  • Skeptics reply that managed services (especially databases) and global scale justify AWS’s complexity and price; running equivalent HA stacks yourself requires serious ops expertise.

SLAs, status pages, and incentives

  • Commenters are cynical about cloud SLAs and compensation (typically credits, no real liability) and about status pages that lag reality or remain misleadingly green.
  • Several emphasize that a key “benefit” of AWS is political: when a hyperscaler fails, everyone is down together and blame is deflected from internal teams, which strongly shapes executive preferences.