Big GPUs don't need big PCs

Thin Clients, Mini PCs, and Desktop Tradeoffs

  • Many commenters already use small x86 mini PCs, cheap laptops, or Mac Minis as “terminals” and keep powerful desktops or GPU boxes elsewhere (often in a closet) for heavy work.
  • For everyday tasks (web, office, video, light gaming), people say $200–$300 mini PCs or low-end Mac Minis are more than sufficient, with huge wins in power (~6W idle), space, and noise.
  • Others note that full-size desktops with high-TDP CPUs and large coolers remain much faster for sustained CPU-bound workloads (e.g., big test suites, heavy compiles) and stay quieter under full load than thermally constrained minis.
  • Remote workflows work well for some (RDP, remote dev), but others find that IDEs and graphics-heavy tools don’t always remote cleanly, and point out the cost and complexity of owning both a “terminal” and a “server.”

Local LLMs and GPU-Centric Rigs

  • Several people are thinking in terms of “cheapest host that can feed a big GPU,” especially for local LLM inference.
  • There’s disagreement on memory needs: some argue 128GB+ of GPU-addressable memory is “essential,” while others say many strong open models run fine in 32–96GB of VRAM.
  • One camp sees local GPUs as mostly about privacy and censorship avoidance; another argues open/small models are still inferior to top hosted models and rarely worth the hardware cost.
  • Counterarguments highlight benefits of local GPUs: fine-tuning, higher throughput, running image/audio/video models, resale value, and “psychological freedom” to use lots of tokens without per-call charges.
  • Power cost vs cloud is debated: residential electricity often makes local inference uneconomical, though rooftop solar changes the calculus for some (see the back-of-envelope sketch after this list).
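
  A rough back-of-envelope version of the electricity argument, as a minimal sketch: the power draw, throughput, electricity price, and API price below are illustrative assumptions, not figures from the thread.

      # Local inference electricity cost vs. a hosted API, per 1M output tokens.
      # All numbers are illustrative assumptions.
      WATTS_UNDER_LOAD = 450         # assumed draw of GPU + host while generating
      TOKENS_PER_SECOND = 50         # assumed sustained decode throughput
      PRICE_PER_KWH = 0.30           # assumed residential electricity price, USD
      API_PRICE_PER_M_TOKENS = 10.0  # assumed hosted-model price, USD per 1M tokens

      seconds_per_m_tokens = 1_000_000 / TOKENS_PER_SECOND
      kwh_per_m_tokens = WATTS_UNDER_LOAD / 1000 * seconds_per_m_tokens / 3600
      local_cost = kwh_per_m_tokens * PRICE_PER_KWH

      print(f"Local electricity: ${local_cost:.2f} per 1M tokens")   # ~$0.75 here
      print(f"Hosted API:        ${API_PRICE_PER_M_TOKENS:.2f} per 1M tokens")
      # Electricity alone may look cheap; whether local wins overall depends on
      # throughput, idle draw, hardware amortization, and local rates (or solar).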

PCIe Bandwidth, Multi-GPU Scaling, and Switches

  • A key practical takeaway: for single-user LLM inference, PCIe bandwidth is rarely the bottleneck. Once the weights and KV cache are on the GPU, per-token traffic is tiny; even x1 links can be enough (see the first sketch after this list).
  • This makes a Pi 5 or other very low-end host paired with a high-end GPU surprisingly viable, provided BAR sizing/Resizable BAR and other quirks are handled.
  • Multi-GPU disappointment is called “expected”: many frameworks split models by layers (pipeline parallelism), leaving GPUs idle in sequence; true tensor parallelism needs more lanes and better interconnects (NVLink, fast PCIe P2P, RDMA). The second sketch after this list illustrates the idle-GPU effect.
  • Past crypto-mining boards are cited as precedent for many-GPU, few-CPU-lane systems, but their x1-per-GPU design is only suitable for very bandwidth-light workloads.
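
  To make the x1 claim concrete, here is a minimal sketch of steady-state host-to-GPU traffic during decoding once the weights and KV cache already live in VRAM; the token rate and per-token I/O are deliberately generous assumptions, not measurements.

      # Steady-state PCIe traffic for single-user decoding.
      # Figures are illustrative assumptions.
      PCIE_X1_GEN3_GB_S = 0.985       # roughly 1 GB/s usable per Gen3 lane
      TOKENS_PER_SECOND = 50          # assumed decode rate
      BYTES_PER_TOKEN_IO = 16 * 1024  # generous guess: token ids, sampled output,
                                      # and sampling metadata per step

      traffic_gb_s = TOKENS_PER_SECOND * BYTES_PER_TOKEN_IO / 1e9
      print(f"Steady-state traffic: {traffic_gb_s * 1000:.2f} MB/s")
      print(f"x1 Gen3 utilization:  {traffic_gb_s / PCIE_X1_GEN3_GB_S:.3%}")
      # Well under 1% of a single lane. The one-time cost is loading the weights:
      # a ~40GB model over ~1 GB/s takes on the order of 40 seconds.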
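
  And a toy model of why layer-split multi-GPU setups disappoint a single user (idealized, ignoring overlap and communication costs):

      # Idealized per-GPU busy fraction under layer-split (pipeline) parallelism:
      # with one in-flight request, each pipeline stage only works while the
      # current token is passing through its layers.
      def pipeline_utilization(num_gpus: int, concurrent_requests: int) -> float:
          return min(1.0, concurrent_requests / num_gpus)

      for gpus in (2, 4, 8):
          print(f"{gpus} GPUs, 1 request: each roughly "
                f"{pipeline_utilization(gpus, 1):.0%} busy")
      # Tensor parallelism keeps every GPU active on every layer, but it has to
      # exchange activations between GPUs at each layer, which is where lane
      # counts and interconnects (NVLink, PCIe P2P, RDMA) start to matter.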

Memory, Form Factors, and Future Integration

  • There’s a recurring desire for GPUs with DIMM/CAMM memory slots or even “GPU sockets,” but others point out the huge bandwidth gap between socketed DDR and soldered GDDR/HBM, plus signal-integrity and stability challenges (a rough comparison follows this list).
  • Some envision PCIe meshes or cheap switches allowing GPU-to-GPU DMA without heavy dependence on a host CPU; such switches exist today but remain prohibitively expensive for hobbyists.
  • Many expect more CPU+GPU-in-one-package designs (Apple-style SoCs, AMD Strix Halo, NVIDIA Grace/GB10), with large shared memory pools, to become increasingly common.
  • A few go further, imagining GPU-like boards that are essentially standalone computers with Ethernet and minimal host needs, potentially backed by future “high bandwidth flash” instead of DRAM.
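
  To put the DDR-vs-GDDR/HBM gap in rough numbers (ballpark public figures, treated here as order-of-magnitude assumptions rather than exact specs):

      # Approximate peak memory bandwidth, GB/s.
      bandwidth_gb_s = {
          "DDR5-5600, dual channel (typical desktop)": 90,
          "LPDDR5X, 256-bit (Strix Halo / Apple M-class)": 270,
          "GDDR6X, 384-bit (high-end consumer GPU)": 1000,
          "HBM3 (datacenter GPU)": 3350,
      }
      baseline = bandwidth_gb_s["DDR5-5600, dual channel (typical desktop)"]
      for name, bw in bandwidth_gb_s.items():
          print(f"{name:48s} {bw:5d} GB/s ({bw / baseline:5.1f}x)")
      # Decode speed for large models is largely memory-bandwidth-bound, so a
      # socketed-DDR "GPU" gives up roughly an order of magnitude of throughput
      # before the signal-integrity and stability concerns even come up.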

Other Notes

  • People ask about ARM/Pi gaming benchmarks and CPU-heavy features like constrained decoding, where CPU load can still spike.
  • There’s mention of community tools/sites tracking best GPU value for local LLMs, with feedback about data quality and used-market pricing.
  • Some meta-discussion appears about the article author’s frequent presence on HN, with practical suggestions for hiding specific domains via custom CSS.