2026-05-20

Incident Report: Railway Blocked by Google Cloud [resolved]

Incident and suspected cause

Railway’s outage traced to its GCP account being put into a “restricted” state; a GCP project was reportedly deleted without warning, removing CloudSQL, overflow VMs, and API access.
Railway reps say they had prior assurances from Google after an earlier auto‑rate‑limit incident that this wouldn’t recur, and that restoration took minutes once a bug was filed, but damage to customers lasted hours.
The exact trigger is still unclear in the thread (possibilities floated: abuse reports, payment issues, anti‑fraud/AI systems, or customer workloads), and several commenters stress that only Railway’s side of the story is visible.

Railway architecture and “not a cloud on a cloud”

Railway markets itself as owning its own metal and not “building a cloud on another cloud.”
Commenters discover core databases and some networking still depended on GCP, contradicting that narrative in their view.
Railway explains that it moved most compute to its own DCs (plus AWS) but left DBs on CloudSQL for HA/replication and to avoid a circular dependency on its own infra; in hindsight this became the critical single point of failure.
Some see this as an understandable technical tradeoff; others call it deceptive or dangerously backwards (DB last to migrate).

Trust in GCP: bans and support

Many recount prior GCP suspensions (including smaller accounts and a Korean government org) and the UniSuper incident where a misconfig deleted a whole private cloud subscription.
General themes: aggressive automated enforcement, weak human support/CSM effectiveness, and fear that even high‑spend accounts can be “auto‑yeeted.”
A minority report good GCP relationships and argue such blow‑ups usually follow earlier warning signs or poor account hygiene.

Redundancy, multi‑cloud, and backups

Strong chorus: “all eggs in one basket” is risky, especially for critical control‑plane components like auth, DNS, and primary DBs.
Others counter that true multi‑cloud is extremely rare, complex, and often unjustified for startups; you typically “start with one egg.”
Several emphasize off‑provider backups and separate billing entities; the 3‑2‑1 backup rule is interpreted as “different accounts/providers,” not just extra buckets.
Discussion notes that shutting an account/subscription can be a global single point of failure despite multi‑region setups.
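
The account-level point above can be illustrated with a small sketch. It is not taken from the thread or from Railway's tooling: the `BackupCopy` type, the provider and account labels, and both function names are hypothetical. It assumes the stricter 3‑2‑1 reading described in the discussion, where at least one copy must live outside the primary provider account, and shows why extra regions inside one account do not satisfy it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    # All labels below are illustrative placeholders, not real account names.
    provider: str  # e.g. "gcp", "aws", "colo"
    account: str   # billing account or subscription that owns the copy
    region: str
    medium: str    # e.g. "managed-db", "object-storage"


def survivors_if_account_closed(copies, provider, account):
    """Return the copies still reachable if one provider account is suspended."""
    return [c for c in copies if (c.provider, c.account) != (provider, account)]


def check_3_2_1(copies, primary):
    """List the ways a set of copies falls short of the stricter 3-2-1 reading."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 media types")
    # Region is deliberately ignored: an account suspension hits every region.
    if not survivors_if_account_closed(copies, primary.provider, primary.account):
        problems.append("no copy outside the primary account")
    return problems


primary = BackupCopy("gcp", "acct-main", "us-west1", "managed-db")
multi_region_only = [
    primary,
    BackupCopy("gcp", "acct-main", "us-east1", "object-storage"),
    BackupCopy("gcp", "acct-main", "europe-west1", "object-storage"),
]
with_offsite = multi_region_only + [
    BackupCopy("aws", "acct-backup", "us-east-1", "object-storage"),
]

print(check_3_2_1(multi_region_only, primary))  # ['no copy outside the primary account']
print(check_3_2_1(with_offsite, primary))       # []
```

The multi-region set passes the copy-count and media checks yet fails the account check, which is the failure mode commenters describe; adding one copy under a separate provider and billing account clears it.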

Comparisons to AWS/Azure and other hosts

Many say they’ve never seen AWS/Azure silently nuke accounts at this scale; AWS is criticized for regional outages (especially us‑east‑1) but praised for warnings and softer enforcement.
Some note other providers (Hetzner, OVH) are also aggressive on KYC/abuse; AWS/Azure are framed as the safer outliers for account risk.
Alternatives floated: Render, Vercel, Fly.io, DigitalOcean, Hetzner, Coolify, self‑hosted/colo, even rsync‑style offsite storage.

User impact and reactions

Hobbyists and small customers experienced long downtime, invalid TLS certs, and 502s; some had to manually redeploy even after Railway marked the incident “resolved.”
New and existing customers describe this as a “wake‑up call”; several immediately migrated to other platforms, saying trust is broken.
Others express continued sympathy for Railway but resolve not to run serious businesses on such a young platform.

Abuse, anti‑fraud, and free tiers

Some operators complain about heavy spam/abuse from Railway IPs and say its abuse prevention is weak.
Railway previously acknowledged internal anti‑fraud misfires that “hard killed” legitimate workloads.
Broader debate: free/cheap compute inevitably attracts abuse; strict KYC and anti‑fraud reduce that but hurt growth and UX.
