Incident Report: Railway Blocked by Google Cloud [resolved]
Incident and suspected cause
- Railway’s outage traced to its GCP account being put into a “restricted” state; a GCP project was reportedly deleted without warning, removing CloudSQL, overflow VMs, and API access.
- Railway reps say Google had assured them, after an earlier automatic rate‑limiting incident, that this wouldn’t recur; they add that restoration took minutes once a bug was filed, but the damage to customers lasted hours.
- The exact trigger remains unclear in the thread (possibilities floated: abuse reports, payment issues, anti‑fraud/AI systems, or customer workloads), and several commenters stress that only one side of the story is visible.
Railway architecture and “not a cloud on a cloud”
- Railway markets itself as owning its own metal and not “building a cloud on another cloud.”
- Commenters point out that core databases and some networking still depended on GCP, which in their view contradicts that narrative.
- Railway explains they moved most compute off GCP to their own DCs (plus AWS), but left DBs on CloudSQL for HA/replication and to avoid a circular dependency on their own infra; in hindsight, this became the critical single point of failure.
- Some see this as an understandable technical tradeoff; others call it deceptive, or dangerously backwards for leaving the database as the last component to migrate.
Trust in GCP: bans and support
- Many recount prior GCP suspensions (including of smaller accounts and a Korean government org) and the UniSuper incident, in which a misconfiguration deleted an entire private cloud subscription.
- General themes: aggressive automated enforcement, weak human support and ineffective customer success managers (CSMs), and fear that even high‑spend accounts can be “auto‑yeeted.”
- A minority report good GCP relationships and argue such blow‑ups usually follow earlier warning signs or poor account hygiene.
Redundancy, multi‑cloud, and backups
- Strong chorus: “all eggs in one basket” is risky, especially for critical control‑plane components like auth, DNS, and primary DBs.
- Others counter that true multi‑cloud is extremely rare, complex, and often unjustified for startups; you typically “start with one egg.”
- Several emphasize off‑provider backups and separate billing entities; the 3‑2‑1 backup rule (three copies, on two different media, one offsite) is interpreted as “different accounts/providers,” not just extra buckets.
- Discussion notes that shutting down an account or subscription can take out every region at once, making it a global single point of failure despite multi‑region setups.
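The stricter 3‑2‑1 reading from the thread can be expressed as a small policy check. This is a minimal Python sketch, not anything Railway or a commenter described: the `BackupCopy` fields, provider labels, and account IDs are hypothetical, and it assumes the first entry is the live (primary) copy. “Offsite” is read as a different provider *and* a different billing account, so a single account suspension cannot remove every copy at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    # Hypothetical inventory fields; real setups will track more detail.
    provider: str         # cloud or host label, e.g. "gcp", "aws", "hetzner"
    billing_account: str  # hypothetical billing-entity / account identifier
    medium: str           # storage type label, e.g. "managed-db", "object-store"


def check_321(copies: list[BackupCopy]) -> list[str]:
    """Return the 3-2-1 rules that `copies` violates (empty list = compliant).

    copies[0] is assumed to be the live primary. A copy only counts as
    offsite if both its provider and its billing account differ from the
    primary's.
    """
    if not copies:
        return ["no copies at all"]
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 distinct media")
    primary = copies[0]
    offsite = [
        c for c in copies[1:]
        if c.provider != primary.provider
        and c.billing_account != primary.billing_account
    ]
    if not offsite:
        problems.append("no copy outside the primary provider and billing account")
    return problems


# Hypothetical single-provider setup: a managed DB plus extra buckets in the same account.
same_basket = [
    BackupCopy("gcp", "acct-1", "managed-db"),
    BackupCopy("gcp", "acct-1", "object-store"),
    BackupCopy("gcp", "acct-1", "object-store"),
]
print(check_321(same_basket))  # ['no copy outside the primary provider and billing account']

# Swap one bucket for a copy at another provider under a separate billing entity.
diversified = same_basket[:2] + [BackupCopy("aws", "acct-2", "object-store")]
print(check_321(diversified))  # []
```

The first setup passes the naive “three copies, two media” count yet still fails, which mirrors the thread’s point: extra buckets inside the same account share that account’s fate.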
Comparisons to AWS/Azure and other hosts
- Many say they’ve never seen AWS/Azure silently nuke accounts at this scale; AWS is criticized for regional outages (especially us‑east‑1) but praised for warnings and softer enforcement.
- Some note that other providers (Hetzner, OVH) are also aggressive about KYC and abuse; AWS/Azure are framed as the safer outliers on account risk.
- Alternatives floated: Render, Vercel, Fly.io, DigitalOcean, Hetzner, Coolify, self‑hosted/colo, even rsync‑style offsite storage.
User impact and reactions
- Hobbyists and small customers experienced long downtime, invalid TLS certs, and 502s; some had to manually redeploy even after Railway marked the incident “resolved.”
- New and existing customers describe this as a “wake‑up call”; several immediately migrated to other platforms, saying trust is broken.
- Others express continued sympathy for Railway but resolve not to run serious businesses on such a young platform.
Abuse, anti‑fraud, and free tiers
- Some operators complain about heavy spam/abuse from Railway IPs and say Railway’s abuse prevention is weak.
- Railway previously acknowledged internal anti‑fraud misfires that “hard killed” legitimate workloads.
- Broader debate: free or cheap compute inevitably attracts abuse; strict KYC and anti‑fraud checks reduce it but hurt growth and UX.