All You Need Is 4x 4090 GPUs to Train Your Own Model

Hardware capabilities & training scale

  • 4× RTX 4090s (24 GB each) give 96 GB of total VRAM, which the author says is enough to train LLMs from scratch up to ~1B parameters.
  • Other commenters argue 96 GB should support full fine-tuning of models up to ~5B parameters with techniques like gradient checkpointing.
  • Author reports ~7 days to train a 500M-parameter model on 100B tokens.
  • Parallelism is typically done via Distributed Data Parallel (DDP), which replicates the model on each GPU, rather than by pooling VRAM over NVLink (which the 4090 lacks).
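The ~7-day figure above can be sanity-checked with a back-of-envelope compute estimate. This sketch uses the standard ~6·N·D FLOPs approximation for transformer training; the per-GPU throughput (~165 TFLOPS dense BF16 tensor) and the 40% utilization figure are assumptions, not from the discussion. It lands within ~2× of the reported ~7 days, i.e. the same ballpark:

```python
# Back-of-envelope check of "~7 days for 500M params on 100B tokens".
# Assumptions (not from the thread): the ~6*N*D FLOPs rule of thumb for
# transformer training, ~165 TFLOPS dense BF16 tensor throughput per
# RTX 4090, and a model FLOPs utilization (MFU) of 40%.

N_PARAMS = 500e6          # model parameters
N_TOKENS = 100e9          # training tokens
FLOPS_PER_GPU = 165e12    # approximate dense BF16 tensor peak, RTX 4090
N_GPUS = 4
MFU = 0.40                # assumed fraction of peak actually achieved

total_flops = 6 * N_PARAMS * N_TOKENS            # ~3.0e20 FLOPs
effective_rate = N_GPUS * FLOPS_PER_GPU * MFU    # sustained FLOPs/s
days = total_flops / effective_rate / 86_400     # seconds -> days

print(f"total training FLOPs: {total_flops:.1e}")
print(f"estimated wall time:  {days:.1f} days")
```

At these assumed numbers the estimate comes out around 13 days; a higher MFU or a shorter token budget would close the gap to the reported 7.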

Cost, cloud vs on-prem, and ROI

  • The 4× 4090 rig is framed as “all you need,” but commenters point out it is also ~$12k in hardware, plus an expensive CPU/motherboard with enough PCIe lanes.
  • Some argue renting 4× 4090 instances is cheaper and more flexible (roughly <$500 for a ~10-day training run).
  • Others note capex vs opex tradeoffs, resale value of GPUs, and desire to learn low-level quirks as reasons to own hardware.
  • GPU rental market is described as crowded, with competition, varying integrity, and occasional “scammy” behavior (e.g., overselling hardware).
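The capex-vs-opex argument above reduces to a simple breakeven calculation. This sketch uses only the figures quoted in the thread (~$12k hardware, ~$500 per ~10-day rented run) and deliberately ignores electricity, depreciation, resale value, and rental downtime:

```python
# Rough rent-vs-buy breakeven using the thread's own numbers.
# Ignores electricity, depreciation, resale value, and rental availability.

HARDWARE_COST = 12_000     # USD, 4x 4090 rig as framed in the discussion
RENTAL_COST_PER_RUN = 500  # USD, ~10-day rented 4x 4090 run

breakeven_runs = HARDWARE_COST / RENTAL_COST_PER_RUN  # runs to match capex
years_of_runs = breakeven_runs * 10 / 365             # back-to-back 10-day runs

print(f"breakeven after ~{breakeven_runs:.0f} training runs "
      f"(~{years_of_runs:.1f} years of continuous 10-day runs)")
```

The ~24-run breakeven explains why both camps have a case: occasional experimenters are better off renting, while someone training continuously (or valuing resale and hands-on learning) can justify owning.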

Power, cooling, and electrical requirements

  • With ~450 W per 4090 and dual 1500 W PSUs, total draw can approach 3 kW.
  • Several comments insist a dedicated 20–30 A circuit is effectively required, especially in US homes.
  • Discussion compares US vs EU/UK circuits, emphasizing total wattage limits and fire risk from overcurrent.
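The circuit-sizing concern above follows directly from Ohm's-law arithmetic. In this sketch, the ~1,200 W allowance for CPU, motherboard, drives, and PSU inefficiency is an assumption chosen to reproduce the ~3 kW total mentioned in the thread:

```python
# Why a dedicated 20-30 A circuit comes up: current draw at the wall.
# The 1,200 W "rest of system" figure is an assumption covering CPU,
# motherboard, drives, and PSU losses, matching the ~3 kW total above.

GPU_WATTS = 450
N_GPUS = 4
REST_OF_SYSTEM_WATTS = 1_200

total_watts = GPU_WATTS * N_GPUS + REST_OF_SYSTEM_WATTS  # ~3,000 W

for volts, label in [(120, "US 120 V"), (230, "EU/UK 230 V")]:
    amps = total_watts / volts
    print(f"{label}: {amps:.1f} A")
```

On a US 120 V circuit that is ~25 A, well past a standard 15 A household breaker, hence the calls for a dedicated high-amperage circuit; on 230 V it is only ~13 A, which is why the US/EU comparison keeps coming up.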

Model, data, and software-side questions

  • Many readers are more interested in what can realistically be trained and the data/curation process than in the rig itself.
  • Training data suggestions include starting from FineWebEdu.
  • Some ask for examples of model outputs and more detail on post-training methods (e.g., RL/RLHF).

Alternatives & tradeoffs

  • Suggestions include used A100s, 3090s, lower-end 40-series cards (4060/4070 Ti), Tesla P40s, or simply waiting for the 5090 (32 GB VRAM).
  • Objections to 3090s, M4 Mac minis, and Apple Silicon generally: older architectures, weaker memory bandwidth, and limited training support compared with CUDA.

Article quality & AI co-authorship

  • Multiple commenters feel parts of the article read like AI-generated marketing copy, especially references to gaming features (e.g., DLSS 3).
  • The author confirms the text was AI “co-authored”; some readers find this off-putting and prefer purely human-written explanations.