AWS outage shows internet users 'at mercy' of too few providers, experts say
Scale and Centralization of AWS
- Commenters highlight how much traffic runs through AWS (and CloudFront/Cloudflare), arguing this concentrates systemic risk in a few “sheds in Virginia.”
- Some see this as basic economics: low distribution costs produce power-law winners (AWS/Azure/GCP).
- Others note that many non-cloud options still exist (colo, bare metal, VPS), and that centralization owes as much to lock-in and marketing as to pure technical merit.
Nature of the Outage (us-east-1)
- Many stress it was not a total regional blackout: existing EC2/Fargate workloads mostly kept running; control planes and some “global” services failed.
- IAM, STS, Lambda, SQS, DynamoDB, EC2 launches, and CloudWatch visibility were common pain points.
- Several teams discovered hidden dependencies on us-east-1 endpoints (e.g., IAM), even for workloads in other regions.
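One way to surface such hidden dependencies is to scan the endpoint hostnames a workload is configured to call and flag any that are not scoped to its own region. The following is a minimal sketch, not an audit tool: it assumes the common `<service>.<region>.amazonaws.com` hostname shape, ignores variants such as FIPS or dual-stack endpoints, and the function names and sample host list are hypothetical.

```python
import re

# Assumed shape of a region-scoped AWS hostname: <service>.<region>.amazonaws.com
_REGIONAL = re.compile(r"^[a-z0-9-]+\.([a-z]{2}(?:-[a-z]+)+-\d)\.amazonaws\.com$")


def endpoint_region(host):
    """Return the region embedded in an AWS hostname, or None if there is none."""
    match = _REGIONAL.match(host.lower())
    return match.group(1) if match else None


def flag_risky_endpoints(hosts, home_region):
    """Return hosts that carry no region (global) or point at another region."""
    return [h for h in hosts if endpoint_region(h) != home_region]


# Hypothetical endpoints used by a workload deployed in eu-west-1.
hosts = [
    "sts.amazonaws.com",                 # global STS endpoint, no region in the name
    "sts.eu-west-1.amazonaws.com",       # regional STS endpoint
    "iam.amazonaws.com",                 # IAM uses a global endpoint
    "dynamodb.us-east-1.amazonaws.com",  # explicitly pinned to another region
    "sqs.eu-west-1.amazonaws.com",
]

for host in flag_risky_endpoints(hosts, "eu-west-1"):
    print("check dependency:", host)
```

A flagged host is only a prompt to investigate; whether it actually fails during a regional incident depends on how the service behind it is operated.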
Lock-In, Data Gravity, and Cost
- Large datasets (terabytes to hundreds of terabytes in S3) are cited as the main practical lock-in, not compute.
- Cross-region or multi-cloud replication is considered prohibitively expensive for many, especially due to storage and egress.
- Some mention that competitors or AWS itself will sometimes waive egress fees for migrations, but the ongoing cost and complexity of duplication remain.
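The cost argument can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the prices passed in are placeholders chosen for round numbers, not quoted rates from AWS or any other provider.

```python
def replication_cost(dataset_tb, storage_usd_per_gb_month, transfer_usd_per_gb,
                     monthly_change_fraction=0.0):
    """Estimate the cost of keeping a full second copy of a dataset elsewhere.

    Returns (one_time_transfer_usd, recurring_usd_per_month).
    """
    dataset_gb = dataset_tb * 1000  # decimal terabytes
    # Initial copy: every byte leaves the source once.
    one_time = dataset_gb * transfer_usd_per_gb
    # Ongoing: storage for the second copy, plus transfer for data that changes.
    recurring = dataset_gb * storage_usd_per_gb_month
    recurring += dataset_gb * monthly_change_fraction * transfer_usd_per_gb
    return one_time, recurring


# Placeholder inputs: 200 TB, $0.02/GB-month storage, $0.05/GB transfer,
# and 10% of the data rewritten each month.
one_time, recurring = replication_cost(200, 0.02, 0.05, monthly_change_fraction=0.10)
print(f"one-time transfer: ${one_time:,.0f}")
print(f"recurring per month: ${recurring:,.0f}")
```

With these placeholder numbers the initial transfer comes to about $10,000 and the recurring cost to about $5,000 per month, which shows why a waived migration fee does not remove the ongoing cost of duplication.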
Multi-Cloud / Multi-Region Resilience
- Broad agreement that true multi-cloud resilience is rare, owing to cognitive overhead, provider differences, orchestration pain, and data consistency issues.
- Cross-region designs are also hard because of stateful systems, eventual consistency, and the need to replay or merge writes after failover.
- Many companies consciously accept rare regional outages as a business tradeoff; others argue they misjudge risk and never properly test failover.
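To illustrate why merging writes after failover is hard, here is a minimal last-writer-wins sketch. It is one simple reconciliation policy, not something the thread prescribes. The `Write` type, the function name, and the sample data are hypothetical, and the sketch assumes timestamps are comparable across regions, which real clocks do not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp: float  # assumed comparable across regions
    region: str


def merge_after_failover(primary_log, standby_log):
    """Merge two write logs with last-writer-wins; ties break on region name.

    Returns (merged, overwritten), where `overwritten` lists each pair in
    which a write from one region replaced a write from the other.
    """
    merged = {}
    overwritten = []
    ordered = sorted(primary_log + standby_log,
                     key=lambda w: (w.timestamp, w.region))
    for write in ordered:
        previous = merged.get(write.key)
        if previous is not None and previous.region != write.region:
            overwritten.append((previous, write))  # the earlier write is lost
        merged[write.key] = write
    return merged, overwritten


# Hypothetical logs: both regions accepted a write to the same key.
primary = [Write("cart:1", "a", 1.0, "us-east-1"),
           Write("cart:2", "x", 2.0, "us-east-1")]
standby = [Write("cart:1", "b", 3.0, "us-west-2")]

merged, overwritten = merge_after_failover(primary, standby)
print(merged["cart:1"].value, len(overwritten))
```

Every entry in `overwritten` is a write that some client believed had succeeded and that this policy discards, which is the kind of tradeoff teams tend to discover only when they test failover.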
Containers and Cloud Lock-In
- One view: Docker normalized “just ship a container and let the cloud handle storage/infra,” encouraging deeper reliance on proprietary services.
- Counterview: containers are orthogonal to storage, reduce host-management toil, and actually make it easier to move between clouds or on-prem.
Alternatives and On-Prem
- Some advocate VPS/local providers or colo to reduce correlated failures and costs, but acknowledge higher operational burden.
- Others report that on-prem/colo setups often had more frequent and longer outages due to limited in-house expertise and slower incident response.
Policy, “Experts,” and Systemic Risk
- Several criticize media “experts” as non-technical policy or legal figures; others defend their role in assessing geopolitical/systemic dependence on foreign hyperscalers.
- A recurring theme: AWS is likely still more reliable than most alternatives; the real issue is how customers architect and test their systems.