We were wrong about GPUs

Nvidia, Virtualization, and Why GPUs Were Hard on Fly

  • Several comments dig into Fly’s technical story: Nvidia’s vGPU licensing and “phone‑home” checks don’t mesh with Fly’s fast‑start microVM model.
  • MIG is described as paravirtualized and tied to Nvidia’s userland stack, not clean PCI devices, making secure cross‑VM sharing difficult without heavy custom work.
  • Ideas like virtio‑cuda, using Nvidia’s vCS via QEMU, or disaggregated emulation are discussed, but generally seen as high‑maintenance and possibly in conflict with Nvidia’s terms.
  • Some argue QEMU startup cost is overstated and that Fly’s Cloud Hypervisor work essentially rebuilt similar VFIO‑style plumbing.
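The "VFIO‑style plumbing" referenced above means assigning a whole physical GPU directly to a guest VM through the Linux kernel's VFIO framework. A rough host‑side sketch, assuming an IOMMU‑enabled host, a GPU at the placeholder PCI address 0000:01:00.0, and a Cloud Hypervisor guest (exact flags and paths vary by kernel and hypervisor version; this is a command fragment, not a tested setup):

```shell
# Detach the GPU from its host driver and hand it to vfio-pci
# (0000:01:00.0 is a placeholder for the real device address)
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe

# Pass the device through to a Cloud Hypervisor microVM
cloud-hypervisor \
    --kernel vmlinux \
    --disk path=rootfs.img \
    --cpus boot=4 --memory size=8G \
    --device path=/sys/bus/pci/devices/0000:01:00.0
```

Note that this model only expresses whole‑device passthrough; the fine‑grained MIG/vGPU sharing discussed above is exactly what it cannot do without Nvidia's proprietary host stack.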

Mismatch Between Fly’s Users and GPU Demand

  • A recurring theme: Fly’s core audience wants a PaaS‑like “git push” DX, not low‑level GPU primitives.
  • Commenters say GPU buyers want either (a) big, dedicated clusters for heavy training/inference or (b) fully managed LLM APIs; Fly sits awkwardly between the two.
  • People note that customers who pay hyperscaler‑level GPU prices usually prefer hyperscalers or specialist GPU clouds, not a mid‑tier app platform.

Cost, Reliability, and Alternatives

  • Hobbyists and small teams largely find Fly (and its GPUs) too expensive versus homelabs, cheap VPSes, or dedicated servers; GPU marketplaces like Runpod, Vast, Voltage Park, and others are frequently cited.
  • Some praise Fly’s GPU DX (fast on‑demand machines, simple CLI) but say ongoing costs and storage pricing make continuous or casual use hard to justify.
  • There is skepticism about Fly’s overall reliability history; Fly staff claim it has improved and emphasize autosuspend/auto‑stop as key to cost control.
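The auto‑stop behavior Fly staff point to as the key cost lever is configured per app in fly.toml. A minimal sketch (field names taken from Fly's public docs; the app name is hypothetical, and accepted values and defaults vary by flyctl version):

```toml
# fly.toml -- hypothetical GPU app; stop idle machines to control spend
app = "example-gpu-app"

[http_service]
  internal_port = 8080
  auto_stop_machines = "stop"    # stop machines when traffic drops to zero
  auto_start_machines = true     # start one back up on the next request
  min_machines_running = 0       # allow full scale-to-zero
```

With min_machines_running set to 0 the app scales to zero between requests, which is the main mechanism for making per‑second GPU billing tolerable for casual or bursty use.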

Do Developers Want GPUs or Just LLMs?

  • Many agree with the article’s claim that most developers “want LLMs, not GPUs”: they’d rather call OpenAI/Anthropic/Cloudflare Workers AI than manage drivers, models, and cold starts.
  • Others push back, citing non‑LLM GPU use (vision, “classic” ML, data science) and open‑source LLM self‑hosting as real but more niche workloads.
  • There’s broad agreement that GPU serverless suffers from long cold starts and that today’s API pricing and performance are “good enough” for many apps.

Fly’s Positioning and Takeaways

  • Several commenters say the outcome was predictable: Fly’s brand and DX attract app developers, not infra buyers; succeeding in GPUs would require a different product and focus.
  • Others think Fly exited too early, arguing demand for simpler private LLM and ML pipelines is only beginning.
  • The candid “we were wrong” post is widely respected, but many frame this as a classic product‑market fit miss, not a verdict on cloud GPUs in general.