Cloudflare Global Network experiencing issues
Outage scope and symptoms
- Users worldwide report widespread 500/5xx errors from multiple Cloudflare POPs (London, Manchester, Warsaw, Sydney, Singapore, US, etc.), often with Cloudflare’s own error page explicitly blaming itself.
- Behavior is flappy: services go up/down repeatedly over ~30–60 minutes; different regions and products (proxy, DNS, Turnstile, WARP, dashboard) are affected unevenly.
- Many major sites and SaaS tools are down or degraded: X/Twitter, ChatGPT, Claude, Supabase, npmjs, uptime monitors, down-checker sites, some government and transport sites, and status pages themselves.
- Failures in Cloudflare challenges/Turnstile block access and logins even on sites not otherwise proxied by Cloudflare, including Cloudflare’s own dashboard.
Speculation on root cause
- Users speculate about:
  - A control plane or routing/BGP issue propagating bad config globally.
  - A DNS or network-layer failure (the “Cloudflare Global Network” component shows as offline).
  - A possible link to scheduled maintenance.
  - A large DDoS (especially in light of recent Azure/AWS issues), though several point out there is no evidence yet; others expect a postmortem to clarify.
  - WARP/Access-specific messages on the status page, with some wondering whether internal routing or VPN-related changes backfired.
Status pages and communication
- The status page lagged the incident by tens of minutes; it initially showed all green except minor items and scheduled maintenance, prompting criticism that status pages are “marketing” and legally constrained.
- Others argue fully automated, accurate status pages at this scale are effectively impossible; a human always has to interpret noisy signals.
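A minimal, self-hosted probe illustrates the point: rather than waiting on the provider’s status page, a team can hit its own site through the CDN, its origin directly, and an unrelated control site. The endpoint URLs below are hypothetical placeholders, and the interpretation rule (edge 5xx plus a healthy origin suggests a provider-side problem) is an assumption for illustration, not anything from the thread or from Cloudflare.

```python
import urllib.request
import urllib.error

# Hypothetical endpoints: your site through the CDN, your origin directly,
# and an unrelated third-party site as a control.
PROBES = {
    "via-cdn": "https://www.example.com/healthz",
    "origin-direct": "https://origin.example.com/healthz",
    "control": "https://example.org/",
}

def probe(url: str, timeout: float = 5.0) -> str:
    """Return a coarse status string for one endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok ({resp.status})"
    except urllib.error.HTTPError as e:
        # 5xx from the edge while the direct origin is healthy points upstream.
        return f"http-error ({e.code})"
    except (urllib.error.URLError, TimeoutError) as e:
        return f"unreachable ({e})"

if __name__ == "__main__":
    for name, url in PROBES.items():
        print(f"{name:15s} {probe(url)}")
```

If “via-cdn” reports 5xx while “origin-direct” and “control” look healthy, the failure is probably upstream of your own deployment, which is exactly the signal people were reconstructing from HN comments while the official page stayed green.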
Developer experience and “phewphoria”
- Many initially blamed their own deployments, restarted servers, or feared misconfigurations before discovering it was Cloudflare.
- Commenters coin or refine a name for the relief felt when an outage isn’t your fault (“phewphoria”), though some prefer problems they caused themselves because they can at least fix those.
- Management pressure and SLA expectations resurface; teams use global outages as leverage to justify redundancy work or to calm executives.
Centralization, risk, and tradeoffs
- Strong concern that Cloudflare (plus AWS/Azure) has become a systemic single point of failure; outages now feel like “turning off the internet.”
- Counterpoint: many small and medium sites need Cloudflare-like DDoS protection and bot filtering (especially against AI scrapers), and are still better off tolerating occasional global Cloudflare outages than maintaining constant bespoke defenses.
- Debate over:
  - Using Cloudflare as registrar, DNS, and CDN all at once (hard to escape during outages).
  - Having fallbacks: alternative CDNs (e.g., Bunny), on-prem or VPS setups, multi-CDN/multi-cloud, separate status-page hosting (see the sketch after this list).
  - Whether most sites actually need Cloudflare versus simpler hosting, caching, and local WAFs.
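To make the fallback idea concrete, here is a minimal sketch assuming the same content is published behind two CDN hostnames (both names are hypothetical); it retries against the secondary only on 5xx or network-level failures. Server-side, the more common equivalent is DNS failover with low TTLs, which still breaks if the registrar and DNS live on the same failing provider, as the first bullet above notes.

```python
import urllib.request
import urllib.error

# Hypothetical hostnames: the same origin published behind two different CDNs.
PRIMARY = "https://cdn-primary.example.com"
SECONDARY = "https://cdn-secondary.example.com"

def fetch_with_fallback(path: str, timeout: float = 5.0) -> bytes:
    """Try the primary CDN first; fall back to the secondary on 5xx or network errors."""
    for base in (PRIMARY, SECONDARY):
        url = base + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code < 500:
                raise  # 4xx is a real answer, not an outage; don't mask it.
            # 5xx: treat as an edge problem and try the next CDN.
        except (urllib.error.URLError, TimeoutError):
            pass  # Network-level failure: try the next CDN.
    raise RuntimeError("all CDNs failed for " + path)

if __name__ == "__main__":
    print(len(fetch_with_fallback("/")))
```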
Broader lessons
- Outage reinforces:
  - The fragility created by centralizing so much traffic and security behind one provider.
  - The difficulty of avoiding single points of failure in practice, even for “multi-cloud” setups that still bottleneck through Cloudflare.
  - The informal role of HN as a de facto, independent “status page” for major internet incidents.