Show HN: Attaching to a virtual GPU over TCP
How it works
- Client runs code on a CPU-only Linux machine; a userspace library intercepts CUDA / GPU API calls.
- Intercepted calls and GPU commands are forwarded over TCP to a remote GPU, making the local system believe a GPU is directly attached.
- No kernel drivers or eBPF are involved; it’s LD_PRELOAD-style API remoting, not PCIe tunneling.
- Only GPU-related state lives remotely; files, packages, and CPU execution stay on the client instance.
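To make the API-remoting idea above concrete, here is a toy sketch in Python. It is not the project's implementation: `FakeDevice`, `RemoteDevice`, the length-prefixed JSON wire format, and the `alloc`/`scale`/`read` calls are all hypothetical stand-ins for a real GPU API. It only illustrates the pattern: the client holds opaque handles while the buffers live on the other end of a TCP connection.

```python
import json
import socket
import struct
import threading

ALLOWED_CALLS = {"alloc", "scale", "read"}  # hypothetical API surface


def send_msg(sock, obj):
    """Send one JSON message with a 4-byte big-endian length prefix."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


class FakeDevice:
    """Stand-in for the remote GPU: buffers live here, not on the client."""

    def __init__(self):
        self.buffers = {}
        self.next_handle = 1

    def alloc(self, values):
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = list(values)
        return handle

    def scale(self, handle, factor):
        self.buffers[handle] = [x * factor for x in self.buffers[handle]]

    def read(self, handle):
        return self.buffers[handle]


def serve_one(listener):
    """Serve a single client until it disconnects."""
    conn, _ = listener.accept()
    device = FakeDevice()
    with conn:
        while True:
            try:
                req = recv_msg(conn)
            except ConnectionError:
                break
            if req["call"] not in ALLOWED_CALLS:
                send_msg(conn, {"error": "unknown call"})
                continue
            result = getattr(device, req["call"])(*req["args"])
            send_msg(conn, {"result": result})


class RemoteDevice:
    """Client-side proxy: every method call becomes one TCP round trip."""

    def __init__(self, address):
        self.sock = socket.create_connection(address)

    def __getattr__(self, name):
        def stub(*args):
            send_msg(self.sock, {"call": name, "args": list(args)})
            reply = recv_msg(self.sock)
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return stub

    def close(self):
        self.sock.close()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # loopback, OS-chosen port
    listener.listen(1)
    server = threading.Thread(target=serve_one, args=(listener,), daemon=True)
    server.start()

    gpu = RemoteDevice(listener.getsockname())
    handle = gpu.alloc([1, 2, 3])   # data is copied to the "device"
    gpu.scale(handle, 10)           # work happens remotely
    out = gpu.read(handle)          # result is copied back
    gpu.close()
    server.join(timeout=5)
    listener.close()
    return out
```

Calling `demo()` runs both ends on loopback. Note that every call in this sketch blocks for a full round trip, which is the cost the latency discussion in the thread is about; a real system would presumably batch or pipeline calls.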
Performance, latency, and bottlenecks
- Network adds latency, especially for GPU VRAM ↔ RAM transfers and training workloads.
- The provider claims inference performance (e.g., BERT) is close to local; training is slower, but they say it is being optimized.
- Commenters worry that datasets larger than VRAM would cause heavy network I/O every epoch; the creators acknowledge this and say they have added optimizations.
- Some commenters report that 10–30 MB/s per GPU is typical for their workloads, so a 10 Gbps link may suffice.
- Gaming and highly interactive graphics are considered impractical due to latency.
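The bandwidth and latency points above can be checked with back-of-the-envelope arithmetic. The sketch below uses decimal units (1 Gbps = 1000 Mbit/s, 8 bits per byte) and ignores protocol overhead. The function names, the 100 GB dataset, and the 0.5 ms round-trip time are illustrative assumptions, not figures from the thread.

```python
def link_mb_per_s(link_gbps):
    """Raw link capacity in MB/s (decimal units, no protocol overhead)."""
    return link_gbps * 1000 / 8


def gpus_per_link(link_gbps, per_gpu_mb_s):
    """How many GPUs a link could feed at a steady per-GPU transfer rate."""
    return int(link_mb_per_s(link_gbps) // per_gpu_mb_s)


def epoch_transfer_seconds(dataset_gb, link_gbps):
    """Time to push a whole dataset over the link once, e.g. once per epoch."""
    return dataset_gb * 1000 / link_mb_per_s(link_gbps)


def sync_call_overhead_seconds(num_calls, rtt_ms):
    """Added wall time if every call waits for one network round trip."""
    return num_calls * rtt_ms / 1000


# A 10 Gbps link carries 1250 MB/s, so at 10-30 MB/s per GPU:
print(gpus_per_link(10, 30))                  # 41
print(gpus_per_link(10, 10))                  # 125

# Hypothetical 100 GB dataset re-sent every epoch over the same link:
print(epoch_transfer_seconds(100, 10))        # 80.0

# Hypothetical 10,000 blocking calls at a 0.5 ms round trip:
print(sync_call_overhead_seconds(10_000, 0.5))  # 5.0
```

Under these assumptions a 10 Gbps link has ample headroom for the quoted steady-state rates, while re-sending a large dataset every epoch or issuing many blocking calls is where the network cost would show up.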
Use cases discussed
- On-demand GPUs for development without paying for GPU time when only doing CPU work.
- Transparent acceleration for GUI and desktop apps (Matlab, CAD, Blender render farms).
- ML inference, some training, video transcoding, and potentially hash cracking.
- Potential for finer-grained GPU sharing or fractional GPUs.
Comparisons and related tools
- Compared to AWS GPU instances, Nitro, TPUs, Ray, Juice Labs, rCUDA, qCUDA, virtio-gpu, and other GPU-over-network systems.
- Some see it as a more generic or more transparent alternative to cluster frameworks.
Pricing, access, and hosting
- Currently beta and free (T4 GPUs); future pay-as-you-go model planned.
- A100/H100 not yet generally available; pricing “to be determined.”
- No self-hosting yet, but strongly requested and considered for the future.
Limitations and open questions
- Linux-only for now; no Vulkan/OpenGL or DirectX; partial library support (tested with PyTorch and Hugging Face; issues reported with some other libraries).
- Behavior and cleanup on GPU reset, VRAM residue, and network instability are raised but not fully resolved.
- Some skepticism about real-world value for serious training workloads and concerns about subscription-style “cloud everything” models.