25L Portable NV-linked Dual 3090 LLM Rig
Role of RTX 3090s & NVLink
- Several commenters see the 3090 as a “sweet spot” for training: fast VRAM and last consumer gen with NVLink, making inter-GPU parameter copies significantly faster than on 4090/5090 (which are PCIe-limited).
- Others argue NVLink is not "an absolute must" for two or a handful of GPUs; with modern PCIe you often won't saturate the bus, and some sources say NVLink only starts to matter at very large GPU counts.
- One person running 14×3090s stresses optimizing for "power per token" rather than raw speed, and highlights heat and noise as the primary constraints (see the back-of-envelope sketch after this list).
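A back-of-envelope Python sketch of the bandwidth and power-per-token points above, using assumed round numbers (payload size, link rates, wall draw) rather than anything measured in the thread:

    # Illustrative only: every figure below is an assumption, not a measurement.
    payload_gb = 14.0        # e.g. fp16 gradients of a ~7B model (assumed size)
    nvlink_gb_s = 56.0       # 3090 NVLink bridge, per direction (approximate)
    pcie4_x16_gb_s = 32.0    # PCIe 4.0 x16, per direction (approximate)

    print(f"NVLink copy: {payload_gb / nvlink_gb_s:.2f} s")
    print(f"PCIe   copy: {payload_gb / pcie4_x16_gb_s:.2f} s")

    # "Power per token": wall power divided by generation rate -> joules per token.
    rig_watts = 800.0        # assumed steady-state draw of a dual-3090 rig
    tokens_per_s = 25.0      # mid-range of the 20-30 tok/s figure reported later
    print(f"{rig_watts / tokens_per_s:.0f} J per token")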
Power, Cost, Renting & Used Market
- Back-of-the-napkin comparison: 4×3090 (~96 GB VRAM total) vs a single 48 GB RTX 6000 Ada. The RTX 6000 wins on training/inference speed, power draw (≈300 W vs ≈1400 W rated), and operating cost, especially with expensive electricity (cost arithmetic sketched after this list).
- Another commenter counters that TDP isn’t actual draw: multi-GPU inference typically uses far less than peak wattage.
- Renting the hardware out via GPU marketplaces can lose money on 3090s where electricity is expensive, and barely breaks even with an RTX 6000; some liken owning vs renting to boat economics.
- Used 3090s are relatively cheap but many are ex-mining; some worry about lifespan and corroded heatsinks, others report multi-year trouble‑free use.
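To put rough numbers on the rated-draw comparison, a hedged Python sketch; the electricity price and 24/7 duty cycle are assumptions, and per the counterpoint above actual inference draw sits well below TDP:

    # Illustrative operating-cost arithmetic; price and duty cycle are assumed.
    kwh_price = 0.30                 # $/kWh, an "expensive electricity" case
    hours_per_month = 24 * 30

    for name, watts in [("4x RTX 3090 (rated)", 1400), ("RTX 6000 Ada (rated)", 300)]:
        kwh = watts / 1000 * hours_per_month
        print(f"{name}: ~${kwh * kwh_price:.0f}/month at full rated draw")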
Build, PCIe & Cooling Concerns
- Multiple warnings about motherboard choice: some X670 boards run the second GPU at only PCIe 4.0 x4; NVLink doesn't replace a fast CPU↔GPU link, especially when offloading to system RAM or swapping models (see the bandwidth sketch after this list).
- Case fit and airflow are recurring issues. The article’s build reportedly has GPUs resting on fans and stressed PCIe cables; commenters recommend larger HTPC/server cases, blower-style GPUs for dense packing, and sometimes moving rigs to garages.
- Splitters, riser cables, and multi-PSU setups are common in >4 GPU builds, but complicate power and heat management.
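A quick sense of what a PCIe 4.0 x4 slot costs when streaming weights, using assumed nominal link rates (real-world throughput is lower):

    # Time to stream a model-sized blob across the CPU<->GPU link (assumed rates).
    model_gb = 24.0          # e.g. a quantized model filling one 3090's VRAM
    for slot, gb_s in [("x16", 32.0), ("x4", 8.0)]:
        print(f"PCIe 4.0 {slot}: ~{model_gb / gb_s:.1f} s to move {model_gb:.0f} GB")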
Alternatives & Experimental Hardware
- Suggestions include: a single RTX 6000 Ada, second‑hand 4090s (some modded to 48 GB VRAM), SXM2 V100s with adapter boards, cheap AMD MI50s (with reliability caveats), and upcoming Intel Arc Pro B60 dual‑GPU boards (seen as too slow compared with older Nvidia cards).
- Some criticize Nvidia’s product segmentation for driving a gray market of VRAM‑modded gaming cards and hacked drivers.
Local LLM Experience vs Hosted Models
- Owners of dual‑3090 rigs report local LLMs are fun and “sovereign,” but many feel open-weight models still lag SOTA hosted systems in quality, hallucination rate, and instruction following.
- Throughput of roughly 20–30 tokens/s on dual 3090s is seen as acceptable; newer MoE models plus CPU offload (e.g., via llama.cpp options) can run very large models but may hurt responsiveness, especially under Ollama (a hedged llama.cpp example follows this list).
- Some keep one 3090 for lighter models and fall back to ChatGPT/hosted models for serious work.
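To make the CPU-offload point concrete, a minimal sketch of how such a launch might look with llama.cpp's server binary. The model path, context size, and the --override-tensor pattern are assumptions (that flag exists only in newer builds), not a setup anyone in the thread described:

    # Hypothetical llama.cpp launch: dense/attention layers on the two 3090s,
    # MoE expert tensors kept in system RAM. Adjust flags to your build.
    import subprocess

    cmd = [
        "./llama-server",
        "-m", "models/large-moe-model.gguf",  # hypothetical model file
        "--n-gpu-layers", "999",              # offload every layer that fits
        "--tensor-split", "1,1",              # split evenly across two 3090s
        "--override-tensor", "exps=CPU",      # newer builds: keep expert tensors in RAM
        "--ctx-size", "8192",
    ]
    subprocess.run(cmd, check=True)

Keeping expert tensors in system RAM trades VRAM for per-token latency, which lines up with the responsiveness complaints above.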
SMB / Offline Use & Other Uses
- Commenters agree that SMBs can feasibly run offline ML/LLM boxes for sensitive data, though "serious" LLM workloads may want something bigger than this dual‑3090 rig, such as a small cluster.
- Outside LLMs, suggested uses include gaming, 3D rendering, fluid simulations, Plex transcoding, 3D printer monitoring, space heating (e.g., Monero mining), and even solo tabletop RPGs with an LLM DM.
Meta: Article & Site Critiques
- Critiques of the article include: ambiguous motherboard choice, misleading or non‑quantified benchmarks, reliance on older/small models, and a physically marginal build (card clearance, fan mounting, cable strain).
- The site’s UX draws complaints: copy/paste blocking (worked around by browser extensions), confusing price display, and intermittent 403 errors/changed URLs.