$2 H100s: How the GPU Rental Bubble Burst

Title & pricing confusion

  • Many readers initially misread “$2 H100s” as a purchase price, not rental, and called the original title click‑baity.
  • Once clarified as ~$2/hour to rent, most agreed the drop from ~$8/hour is still sharp and notable for the market.

Economics of $2/hr (and below)

  • Back‑of‑the‑envelope math: at ~$50k per H100, $2/hr means $50,000 ÷ $2 = 25,000 billable hours, i.e. ~2.9 years at 100% utilization and ~3–3.5 years at realistic near‑full utilization, before counting power, networking, rack space, labor, or financing (see the sketch after this list).
  • Some argue $2/hr (or promotional ~$1–1.5/hr) must be loss‑making unless hardware or space/power is effectively “free,” or the provider is burning VC money.
  • Others counter that once hardware is bought, sunk cost is sunk: renting at or near marginal cost (power, ops) is rational to minimize losses versus idling or fire‑selling.
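A minimal sketch of the break‑even arithmetic the commenters are doing. The all‑in capex, utilization, and per‑hour opex below are illustrative assumptions, not figures from the thread:

    # Back-of-the-envelope break-even for renting out an H100 at a fixed hourly rate.
    # All inputs are assumptions for illustration, not measured figures.

    HOURS_PER_YEAR = 24 * 365  # 8,760

    def breakeven_years(capex_usd=50_000,    # assumed all-in cost per H100 (card + share of server)
                        rate_usd_hr=2.00,    # rental price
                        utilization=0.90,    # fraction of hours actually billed
                        opex_usd_hr=0.20):   # assumed power + ops cost per GPU-hour
        """Years of renting needed to recover capex, ignoring financing and resale value."""
        margin_per_hour = (rate_usd_hr - opex_usd_hr) * utilization
        return capex_usd / (margin_per_hour * HOURS_PER_YEAR)

    print(f"at $2/hr: {breakeven_years():.1f} years")                  # ~3.5 years
    print(f"at $8/hr: {breakeven_years(rate_usd_hr=8.00):.1f} years")  # ~0.8 years

With the same assumptions, a promotional ~$1/hr rate does not recover capex within a typical 3–5 year depreciation window, which is why some commenters read those prices as loss‑making unless the hardware or space/power is effectively free.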

Capacity, clustering, and real constraints

  • Key distinction: single PCIe H100s vs. tightly networked SXM/InfiniBand clusters. The latter (used for large‑scale training) still command higher prices.
  • Comments stress that “$2/hr H100” usually comes with constraints: PCIe cards, weaker networking, spot‑style capacity, beta‑quality uptime, or limited user access.
  • Some say the real money is in renting whole, reliable multi‑node clusters; cheap single‑GPU time is a side effect of over‑reserved capacity.

Bubble, AI winter, and Nvidia

  • Opinions are split on whether this signals an AI infra bubble bursting or just a normal boom‑bust supply cycle, as in oil or crypto mining.
  • Some are short or cautious on Nvidia at its current valuation; others argue demand will rebound as cheaper compute spurs more fine‑tuning and inference.
  • Several note that next‑gen GPUs (Blackwell, MI300X, etc.) and ongoing foundational‑model races will keep aggregate compute demand high.

Open models, training vs. inference

  • Broad agreement that open‑weights models (e.g., Llama families) reduce the need for most companies to train from scratch.
  • Thread consensus: only a small number of teams globally need very large H100 clusters for new foundation models; most just fine‑tune and serve inference.
  • This shift depresses demand for top‑end training clusters while expanding use of smaller, cheaper GPU setups.

Infrastructure, reliability, and cloud vs. bare metal

  • Debate over whether low‑cost providers are inherently unreliable versus “good enough” for checkpointed training workloads.
  • Some emphasize that hyperscaler‑style uptime and 45kW+ racks are extremely expensive; others say cloud reliability is oversold and many on‑prem setups could match it.
  • Power cost per H100‑hour is small compared to capex and depreciation; rack density, networking, and financing dominate the economics (see the sketch below).
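A rough sketch of the per‑GPU‑hour split behind that last point; the wattage, overhead multiplier, electricity price, and write‑off period are illustrative assumptions:

    # Rough per-GPU-hour cost split: power vs. straight-line depreciation.
    # All inputs are assumptions for illustration.

    gpu_watts = 700            # H100 SXM board power (TDP)
    overhead = 1.5             # multiplier for cooling, fans, conversion losses
    power_price = 0.10         # assumed $ per kWh (industrial rate)

    capex = 50_000             # assumed all-in cost per H100
    depreciation_years = 3     # assumed straight-line write-off horizon

    power_cost_hr = gpu_watts / 1000 * overhead * power_price
    depreciation_hr = capex / (depreciation_years * 24 * 365)

    print(f"power        ~${power_cost_hr:.2f}/hr")    # ~$0.10/hr
    print(f"depreciation ~${depreciation_hr:.2f}/hr")  # ~$1.90/hr

Even doubling the assumed power price leaves electricity roughly an order of magnitude below hourly depreciation, which is why the thread treats capex, rack density, and networking as the dominant costs.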