We were wrong about GPUs

Nvidia, Virtualization, and Why GPUs Were Hard on Fly

  • Several comments dig into Fly’s technical story: Nvidia’s vGPU licensing and “phone‑home” checks don’t mesh with Fly’s fast‑start microVM model.
  • MIG is described as paravirtualized and tied to Nvidia’s userland stack, not clean PCI devices, making secure cross‑VM sharing difficult without heavy custom work.
  • Ideas like virtio‑cuda, using Nvidia’s vCS via QEMU, or disaggregated emulation are discussed, but generally seen as high‑maintenance and possibly in conflict with Nvidia’s terms.
  • Some argue QEMU startup cost is overstated and that Fly’s Cloud Hypervisor work essentially rebuilt similar VFIO‑style plumbing.
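The "VFIO‑style plumbing" referenced above means assigning a whole physical GPU directly to a guest VM through the Linux kernel's VFIO framework. A rough host‑side sketch, assuming an IOMMU‑enabled host, a GPU at the placeholder PCI address 0000:01:00.0, and a Cloud Hypervisor guest (exact flags and paths vary by kernel and hypervisor version; this is a command fragment, not a tested setup):

```shell
# Detach the GPU from its host driver and hand it to vfio-pci
# (0000:01:00.0 is a placeholder for the real device address)
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

# Pass the device through to a Cloud Hypervisor microVM
cloud-hypervisor \
    --kernel vmlinux \
    --disk path=rootfs.img \
    --cpus boot=4 --memory size=8G \
    --device path=/sys/bus/pci/devices/0000:01:00.0
```

Note that this model only expresses whole‑device passthrough; the fine‑grained MIG/vGPU sharing discussed above is exactly what it cannot do without Nvidia's proprietary host stack.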

Mismatch Between Fly’s Users and GPU Demand

  • A recurring theme: Fly’s core audience wants a PaaS‑like “git push” DX, not low‑level GPU primitives.
  • Commenters say GPU buyers want either (a) big, dedicated clusters for heavy training/inference or (b) fully managed LLM APIs; Fly sits awkwardly between the two.
  • People note that customers who pay hyperscaler‑level GPU prices usually prefer hyperscalers or specialist GPU clouds, not a mid‑tier app platform.

Cost, Reliability, and Alternatives

  • Hobbyists and small teams largely find Fly (and its GPUs) too expensive versus homelabs, cheap VPSes, or dedicated servers; GPU marketplaces like Runpod, Vast, Voltage Park, and others are frequently cited.
  • Some praise Fly’s GPU DX (fast on‑demand machines, simple CLI) but say ongoing costs and storage pricing make continuous or casual use hard to justify.
  • There is skepticism about Fly’s overall reliability history; Fly staff claim it has improved and emphasize autosuspend/auto‑stop as key to cost control.
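The auto‑stop behavior Fly staff point to as the key cost lever is configured per app in fly.toml. A minimal sketch (field names taken from Fly's public docs; the app name is hypothetical, and accepted values and defaults vary by flyctl version):

```toml
# fly.toml -- hypothetical GPU app; stop idle machines to control spend
app = "example-gpu-app"

[http_service]
  internal_port = 8080
  auto_stop_machines = "stop"    # stop machines when traffic drops to zero
  auto_start_machines = true     # start one back up on the next request
  min_machines_running = 0       # allow full scale-to-zero
```

With min_machines_running set to 0 the app scales to zero between requests, which is the main mechanism for making per‑second GPU billing tolerable for casual or bursty use.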

Do Developers Want GPUs or Just LLMs?

  • Many agree with the article’s claim that most developers “want LLMs, not GPUs”: they’d rather call OpenAI/Anthropic/Cloudflare Workers AI than manage drivers, models, and cold starts.
  • Others push back, citing non‑LLM GPU use (vision, “classic” ML, data science) and open‑source LLM self‑hosting as real but more niche workloads.
  • There’s broad agreement that GPU serverless suffers from long cold starts and that today’s API pricing and performance are “good enough” for many apps.

Fly’s Positioning and Takeaways

  • Several commenters say the outcome was predictable: Fly’s brand and DX attract app developers, not infra buyers; succeeding in GPUs would require a different product and focus.
  • Others think Fly exited too early, arguing demand for simpler private LLM and ML pipelines is only beginning.
  • The candid “we were wrong” post is widely respected, but many frame this as a classic product‑market fit miss, not a verdict on cloud GPUs in general.