Don't rent the cloud, own instead
Risk, reliability, and disaster planning
- Multiple commenters ask how a single in‑office data center handles disasters: fire, flooding, power failure, earthquakes.
- Past incidents (e.g., OVH fire, burst pipes) are cited to argue that “one DC” without geographic redundancy is inherently fragile; many say “you need at least two.”
- Some note comma’s workloads are offline training rather than user-facing, so weeks of downtime may be tolerable if offsite backups exist.
- Others question humidity and “outside air” cooling, pointing to ASHRAE guidelines and long‑term hardware damage from dust, static, and moisture (a minimal intake‑air check is sketched below).
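To make the cooling objection concrete, here is a minimal sketch of the kind of intake‑air gate a free‑cooling setup needs. The temperature and humidity ranges are assumptions that only approximate an ASHRAE‑style recommended envelope, not the published limits:

```python
# Minimal sketch of an intake-air gate for "free" outside-air cooling.
# The ranges below are assumptions approximating a conservative
# ASHRAE-style recommended envelope, not the published limits.

TEMP_RANGE_C = (18.0, 27.0)   # assumed allowable dry-bulb temperature
RH_RANGE_PCT = (20.0, 60.0)   # assumed allowable relative humidity

def intake_ok(temp_c: float, rh_pct: float) -> bool:
    """True if outside air could be used directly, without conditioning."""
    return (TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]
            and RH_RANGE_PCT[0] <= rh_pct <= RH_RANGE_PCT[1])

# A humid day fails even at a comfortable temperature, which is the
# commenters' point: outside air often needs filtering and treatment anyway.
print(intake_ok(24.0, 85.0))  # False
print(intake_ok(22.0, 45.0))  # True
```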
Cloud vs on‑prem economics
- Repeated theme: at large, steady GPU/HPC scale, on‑prem is dramatically cheaper than hyperscale cloud (10–20× is mentioned; a back‑of‑envelope sketch follows this list).
- Counterpoint: risk‑adjusted and bureaucracy‑adjusted costs often favor opex cloud, especially for public sector and mid‑sized enterprises that struggle to get capex approved.
- Several note that cloud TCO calculators heavily overestimate on‑prem costs, assuming very high hardware prices and labor rates. Others argue many orgs undercount real on‑prem work (24/7 coverage, spares, security, audits).
- Capex vs opex is framed as partly accounting/political: recurring SaaS and cloud line items are often easier to approve than a big one‑time spend, regardless of pure math.
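A back‑of‑envelope model shows where the multiple comes from, though the ratio is very sensitive to the inputs. Every constant below is an illustrative assumption, not a figure from the thread or a real price quote:

```python
# Back-of-envelope: 3-year cost of one high-end GPU, rented vs owned.
# Every constant is an illustrative assumption, not a real quote.

CLOUD_PER_GPU_HOUR = 12.00   # assumed hyperscaler on-demand rate, USD
HOURS_PER_YEAR = 24 * 365
YEARS = 3                    # assumed depreciation horizon

GPU_PURCHASE = 30_000        # assumed GPU purchase price, USD
HOST_SHARE = 10_000          # assumed per-GPU share of server/network/rack, USD
POWER_KW_PER_GPU = 1.0       # assumed draw incl. cooling overhead
POWER_USD_PER_KWH = 0.12     # assumed electricity price
OPS_PER_GPU_YEAR = 2_000     # assumed per-GPU share of staff/colo/maintenance, USD

cloud_total = CLOUD_PER_GPU_HOUR * HOURS_PER_YEAR * YEARS

owned_total = (GPU_PURCHASE + HOST_SHARE
               + POWER_KW_PER_GPU * HOURS_PER_YEAR * YEARS * POWER_USD_PER_KWH
               + OPS_PER_GPU_YEAR * YEARS)

print(f"cloud, 3y per GPU: ${cloud_total:,.0f}")              # $315,360
print(f"owned, 3y per GPU: ${owned_total:,.0f}")              # $49,154
print(f"ratio:             {cloud_total / owned_total:.1f}x")  # 6.4x
```

Under these assumptions the ratio lands around 6×; the 10–20× figures quoted in the thread imply cheaper hardware, longer depreciation, or pricier cloud rates than assumed here.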
Colocation, bare metal, and “managed private cloud”
- Many suggest intermediate options: colocation with owned servers, rented dedicated servers (e.g., Hetzner/OVH), or third‑party “managed private cloud” on bare metal.
- These are described as giving 50–90% of the savings of full on‑prem with far less operational burden, especially if paired with Kubernetes or similar orchestration.
- Real‑world anecdotes cut both ways: multi‑rack colos saving millions versus cloud, but also colos in expensive cities whose pricing approaches the cloud’s.
Operational complexity and skills
- One camp insists running servers/colos is “not that hard” and that cloud operational work (APIs, managed services, outages) is comparably complex.
- The other camp highlights hidden work: 24/7 on‑call, hardware failures, backups, DB management, security hardening, audits, and the pain when senior infra people leave.
- Several point out that you don’t escape ops by using cloud—you just shift it from racking to managing complex cloud stacks and proprietary services.
Startups, scale, and lock‑in
- Common model described: start on cloud to validate the product; consider bare metal/colo/on‑prem only once infra spend reaches the “multiple FTEs per year” range (a break‑even sketch follows this list).
- Some warn that easy cloud onboarding plus proprietary managed services create lock‑in, making later migration very hard and expensive.
- For “compute‑native” companies (ML training, HPC), on‑prem or colo is seen as a core competency and a major competitive lever; for most SaaS or line‑of‑business apps, the risk of running a DC is viewed as unjustified.
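The “multiple FTEs per year” heuristic can be written down directly. The salary, headcount, and savings fraction below are assumptions; the savings figure sits mid‑range of the 50–90% quoted for colo above:

```python
# The "multiple FTEs per year" heuristic, written as a check.
# Salary, headcount, and savings fraction are all assumptions.

FTE_COST = 200_000        # assumed fully loaded infra engineer, USD/yr
ADDED_FTES = 2            # assumed extra headcount to run owned infra
SAVINGS_FRACTION = 0.60   # assumed cut of the cloud bill (mid-range of 50-90%)

def worth_evaluating(annual_cloud_spend: float) -> bool:
    """Leaving cloud pays off only if savings beat the added staff cost."""
    return annual_cloud_spend * SAVINGS_FRACTION > ADDED_FTES * FTE_COST

for spend in (100_000, 500_000, 1_000_000, 5_000_000):
    print(f"${spend:>9,}/yr cloud -> evaluate colo/on-prem: {worth_evaluating(spend)}")
```

With these inputs the crossover sits near $670k/yr of cloud spend, i.e. a few engineers’ worth, matching the thread’s rule of thumb.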
Engineering culture, incentives, and sovereignty
- Supporters of owning hardware stress: deeper technical skills, better optimization incentives when compute is fixed, and psychological benefits of control.
- Skeptics argue many orgs don’t have the talent or desire; they should focus on product, not “building their own Jira and their own data center.”
- EU commenters note sovereignty and US CLOUD Act concerns as an additional driver for on‑prem, EU clouds, or research HPC, especially for health/financial data.