Show HN: Attaching to a virtual GPU over TCP

How it works

  • Client runs code on a CPU-only Linux machine; a userspace library intercepts CUDA / GPU API calls.
  • Intercepted calls and GPU commands are forwarded over TCP to a remote GPU, so local applications behave as if a GPU were directly attached.
  • No kernel drivers or eBPF are involved; it’s LD_PRELOAD-style API remoting, not PCIe tunneling.
  • Only GPU-related state lives remotely; files, packages, and CPU execution stay on the client instance.
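
The interception-and-forwarding idea above can be illustrated with a toy sketch. This is not the service's implementation: the real system hooks CUDA calls at the shared-library level, while this sketch only shows the general shape of API remoting in pure Python. The names RemoteDevice, REMOTE_OPS, and vector_add, and the length-prefixed JSON wire format, are all assumptions made up for illustration.

```python
import json
import socket
import struct
import threading

# Hypothetical operation that exists only on the "GPU" (server) side.
def _vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

REMOTE_OPS = {"vector_add": _vector_add}

def _send_msg(sock, obj):
    # Assumed wire format: 4-byte big-endian length, then a JSON payload.
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def _recv_msg(sock):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))

def serve_one_client(listener):
    """Server side: run each requested operation and send back the result."""
    conn, _ = listener.accept()
    with conn:
        while True:
            try:
                request = _recv_msg(conn)
            except ConnectionError:
                break  # client disconnected
            result = REMOTE_OPS[request["op"]](*request["args"])
            _send_msg(conn, {"result": result})

class RemoteDevice:
    """Client stub: looks like a local API, forwards every call over TCP."""

    def __init__(self, host, port):
        self._sock = socket.create_connection((host, port))

    def __getattr__(self, op):
        # Only reached for attributes that are not defined locally.
        if op.startswith("_"):
            raise AttributeError(op)

        def call(*args):
            _send_msg(self._sock, {"op": op, "args": list(args)})
            return _recv_msg(self._sock)["result"]

        return call

    def close(self):
        self._sock.close()

def demo():
    # Run the "remote GPU" on localhost in a background thread.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one_client, args=(listener,), daemon=True)
    server.start()

    device = RemoteDevice("127.0.0.1", port)
    try:
        return device.vector_add([1, 2, 3], [10, 20, 30])
    finally:
        device.close()
        server.join(timeout=2)
        listener.close()
```

Calling demo() sends one vector_add request to the local server thread and returns the summed list. Every call in this sketch is a blocking round trip, which is one reason latency matters so much for this kind of design.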

Performance, latency, and bottlenecks

  • The network adds latency, especially for transfers between host RAM and GPU VRAM and for training workloads.
  • The provider claims inference performance (e.g., BERT) is close to local; training is slower but is being optimized.
  • Commenters worry that datasets larger than VRAM would force heavy network I/O every epoch; the creators acknowledge this and say they’ve added optimizations.
  • Some say 10–30 MB/s per GPU is typical for their workloads, so a 10 Gbps link (about 1,250 MB/s before protocol overhead) may suffice.
  • Gaming and highly interactive graphics are considered impractical due to latency.
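
A rough back-of-the-envelope check of the figures above, as a minimal sketch. It ignores protocol overhead and assumes decimal units (1 Gbps = 1,000 Mbit/s). The 100 GB dataset size and the 0.5 ms round-trip time are hypothetical values chosen for illustration and do not come from the discussion, and the per-call model assumes every remoted call blocks for one full round trip.

```python
def link_capacity_mb_per_s(link_gbps):
    """Raw link capacity in MB/s (8 bits per byte, no protocol overhead)."""
    return link_gbps * 1000 / 8

def gpus_supported(link_gbps, per_gpu_mb_per_s):
    """How many GPUs a link could feed at a given steady per-GPU rate."""
    return int(link_capacity_mb_per_s(link_gbps) // per_gpu_mb_per_s)

def transfer_seconds(size_gb, link_gbps):
    """Time to push size_gb gigabytes across the link once."""
    return size_gb * 8 / link_gbps

def blocking_call_overhead_seconds(n_calls, rtt_ms):
    """Added wall time if each remoted call waits one network round trip."""
    return n_calls * rtt_ms / 1000

print(link_capacity_mb_per_s(10))                  # 1250.0 MB/s
print(gpus_supported(10, 30))                      # 41 GPUs at 30 MB/s each
print(gpus_supported(10, 10))                      # 125 GPUs at 10 MB/s each
print(transfer_seconds(100, 10))                   # 80.0 s per pass, hypothetical 100 GB dataset
print(blocking_call_overhead_seconds(10_000, 0.5)) # 5.0 s, hypothetical RTT
```

Under these assumptions, steady-state bandwidth is rarely the limit; re-sending a large dataset every epoch and paying a round trip per call are the costs that add up.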

Use cases discussed

  • On-demand GPUs for development without paying for GPU time when only doing CPU work.
  • Transparent acceleration for GUI and desktop apps (MATLAB, CAD, Blender render farms).
  • ML inference, some training, video transcoding, and potentially hash cracking.
  • Potential for finer-grained GPU sharing or fractional GPUs.

Comparisons and related tools

  • Compared to AWS GPU instances, Nitro, TPUs, Ray, Juice Labs, rCUDA, qCUDA, virtio-gpu, and other GPU-over-network systems.
  • Some see it as a more generic or more transparent alternative to cluster frameworks.

Pricing, access, and hosting

  • Currently beta and free (T4 GPUs); future pay-as-you-go model planned.
  • A100/H100 not yet generally available; pricing “to be determined.”
  • No self-hosting yet, but strongly requested and considered for the future.

Limitations and open questions

  • Linux-only for now; no Vulkan, OpenGL, or DirectX support; library coverage is partial (tested with PyTorch and Hugging Face; some other libraries have issues).
  • Questions about behavior and cleanup after a GPU reset, VRAM residue, and network instability were raised but not fully answered.
  • Some skepticism about real-world value for serious training workloads and concerns about subscription-style “cloud everything” models.