Serving AI from the Basement – 192GB of VRAM Setup

Hardware setup & use cases

  • Many commenters compare the 8×3090 / 192GB VRAM rig to mining setups and to their own home labs, typically 2–4 GPUs, though some run 16×3090 spread across multiple nodes.
  • Primary motivations discussed: data privacy, avoiding dependence on closed models, running large LLMs locally (70B and up, with experiments running 405B split across CPU and GPU memory), and training/fine-tuning private models; a back-of-envelope memory sketch follows this list.
  • Some ask what practical tasks justify such a rig beyond tinkering; others see it as a learning platform and potential basis for tutorials or even commercial offerings.
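
A quick way to see why 192GB is the interesting threshold is to compare it against approximate weight footprints at different precisions. This is a minimal sketch using the standard params × bytes-per-parameter approximation; all figures are illustrative and ignore KV cache and activation overhead:

```python
# Approximate weight footprint: params x bytes per parameter.
# Figures are illustrative; KV cache and activations need extra headroom.

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """Weight memory in GB for a dense model (1B params x 1 byte ~ 1 GB)."""
    return params_b * bytes_per_param

VRAM_GB = 192  # 8 x 24 GB RTX 3090

for name, params_b, bpp in [
    ("70B  @ FP16 ", 70, 2.0),
    ("70B  @ 4-bit", 70, 0.5),
    ("405B @ FP16 ", 405, 2.0),
    ("405B @ 4-bit", 405, 0.5),
]:
    need = weights_gb(params_b, bpp)
    verdict = "fits" if need < VRAM_GB else "needs CPU offload"
    print(f"{name}: ~{need:5.0f} GB weights -> {verdict}")
```

By this arithmetic a 70B model fits comfortably even at FP16 (~140 GB), while 405B exceeds 192 GB at FP16 and even at 4-bit, which is why commenters resort to splitting it across CPU and GPU memory.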

Power, circuits, cooling, and noise

  • A major thread is residential power constraints: typical 120V/15A circuits in the US vs 230–240V/16A+ in Europe.
  • Several describe tripping breakers with just 2–3 high-end GPUs and stress the need for dedicated 30–50A 240V circuits and PDUs; a rough circuit-budget sketch follows this list.
  • Debate over DIY electrical work: some say it’s straightforward with basic knowledge of the electrical code; others emphasize fire/electrocution risks, code/permit issues, landlord restrictions, and insurance complications.
  • Cooling and heat reuse come up: using basement placement, AC vents, hybrid water heaters, or simply treating the rig as space heating in winter. Summer cooling costs can double the effective power cost.
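
To make the breaker-tripping reports concrete, here is a rough circuit-budget sketch. It assumes the NEC 80% derating for continuous loads and stock 350W board power per 3090; the 400W system overhead is an assumed placeholder:

```python
# Usable continuous watts per circuit, assuming the NEC 80% rule
# for continuous loads; GPU and system draws are assumed figures.

def usable_watts(volts: float, amps: float, derate: float = 0.8) -> float:
    return volts * amps * derate

GPU_W = 350     # stock RTX 3090 board power
SYSTEM_W = 400  # CPU, fans, drives, PSU losses (assumed)

for label, volts, amps in [
    ("US 120V/15A       ", 120, 15),
    ("EU 230V/16A       ", 230, 16),
    ("Dedicated 240V/30A", 240, 30),
]:
    budget = usable_watts(volts, amps)
    gpus = max(int((budget - SYSTEM_W) // GPU_W), 0)
    print(f"{label}: {budget:5.0f} W continuous -> ~{gpus} GPUs")
```

A standard US 15A circuit leaves headroom for roughly two GPUs, matching the reports above; an eight-GPU rig wants a dedicated high-amperage 240V circuit (or several circuits and PSUs).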

Cost vs cloud and utilization

  • Some argue a rig like this only makes sense if it is heavily utilized; otherwise, renting 8×A100/H100 or 3090s on GPU clouds (e.g., RunPod) may be cheaper and simpler (see the break-even sketch after this list).
  • Others report substantial monthly savings vs cloud for continuous workloads, claiming the hardware “pays for itself” over months, even with high power bills.
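
The disagreement largely reduces to utilization, which a break-even sketch makes visible. Every number below (hardware cost, power draw, electricity rate, cloud rate) is an assumed placeholder rather than a figure from the thread:

```python
# Owned-rig vs cloud break-even; every price here is an assumed
# placeholder, not a figure reported in the thread.

HARDWARE_COST = 8 * 700 + 3000  # used 3090s + platform (assumed)
POWER_KW = 2.5                  # average draw under load (assumed)
KWH_PRICE = 0.15                # $/kWh (assumed)
CLOUD_RATE = 8 * 0.45           # $/hr for 8 rented 3090-class GPUs (assumed)
HOURS = 24 * 30                 # one month

for utilization in (1.0, 0.5, 0.1):
    power_cost = POWER_KW * KWH_PRICE * HOURS * utilization
    cloud_cost = CLOUD_RATE * HOURS * utilization
    months = HARDWARE_COST / (cloud_cost - power_cost)
    print(f"{utilization:4.0%} utilization -> break-even in ~{months:.0f} months")
```

Under these assumptions the rig pays for itself in about four months at full utilization, but at a 10% duty cycle the payback stretches past three years, which is the crux of the “just rent it” argument.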

Multi-GPU connectivity: NVLink, PCIe, risers

  • Discussion of NVLink: it is crucial for fast multi-GPU communication, especially in training; some say its benefit for inference is smaller and that available benchmarks are inconclusive (a bandwidth comparison follows this list).
  • PCIe bifurcation (x16 → 2×x8) is highlighted as acceptable on PCIe 4.0, with the claimed performance loss small for most workloads.
  • The x1 USB-cable risers common in crypto mining are seen as problematic for LLMs due to limited bandwidth and reliability; SAS/MCIO adapters, redrivers, and retimers are favored instead.
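
The raw bandwidth numbers explain all three positions; the sketch below compares per-direction link bandwidth for the options discussed. PCIe figures are theoretical maxima, the NVLink figure is NVIDIA’s published spec for the 3090 bridge, and the 1.4 GB payload is an arbitrary illustrative transfer:

```python
# Per-direction link bandwidth in GB/s. PCIe figures are theoretical
# maxima; the NVLink figure is NVIDIA's spec for the 3090 bridge
# (112.5 GB/s bidirectional). Payload size is illustrative.

LINKS = {
    "PCIe 4.0 x16":               32.0,  # ~2 GB/s per lane per direction
    "PCIe 4.0 x8 (bifurcated)":   16.0,
    "PCIe 3.0 x1 (mining riser)":  1.0,
    "NVLink (3090 pair)":         56.25,
}

PAYLOAD_GB = 1.4  # e.g., a shard of activations/gradients (assumed)

for name, bw in LINKS.items():
    ms = PAYLOAD_GB / bw * 1000
    print(f"{name:27s}: {ms:7.1f} ms to move {PAYLOAD_GB} GB")
```

A bifurcated x8 link halves the x16 figure but stays in the same order of magnitude, whereas a mining-style x1 riser is roughly 30× slower than x16: fine for hash workloads that barely touch the bus, painful for tensor-parallel LLM traffic.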

Mac Studio / Apple Silicon vs multi-GPU boxes

  • A Mac Studio with large unified RAM is mentioned as a low-power alternative, suitable for quantized 70B models but too slow for the largest models and insufficient for training from scratch.
  • Consensus: an 8×3090 setup vastly outperforms a single Mac for heavy ML, at the cost of far higher power use and complexity; the throughput estimate below illustrates the gap.
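
A common rule of thumb is that single-stream decode is memory-bandwidth-bound: tokens/s is at most bandwidth divided by the bytes read per token, which is roughly the model’s size for a dense model. Treating spec-sheet bandwidths as upper bounds gives a feel for the gap; the 8×3090 line assumes ideal tensor-parallel scaling, which real interconnects never reach:

```python
# Rule of thumb: single-stream decode tok/s <= memory bandwidth /
# bytes read per token (~ model size for a dense model). Spec-sheet
# bandwidths; the 8-GPU line assumes ideal tensor-parallel scaling.

MODEL_GB = 40  # 70B at ~4-bit quantization (illustrative)

SYSTEMS = {
    "M2 Ultra (unified memory)":       800,  # GB/s
    "Single RTX 3090":                 936,
    "8x RTX 3090 (ideal scaling)": 8 * 936,
}

for name, bandwidth in SYSTEMS.items():
    print(f"{name:29s}: ~{bandwidth / MODEL_GB:6.1f} tok/s upper bound")
```

By this estimate a Mac Studio lands near a single 3090 for one-stream inference on a quantized 70B model at a fraction of the power draw; the aggregate bandwidth and compute of eight GPUs is what pulls ahead for batched serving, larger models, and training.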

Distributed GPU / blockchain ideas

  • Some propose blockchain-based or P2P GPU-sharing networks; others counter that blockchains add unnecessary overhead and that simple job-scheduling/payment platforms or existing projects (e.g., distributed inference networks) are a better fit.