DeepSeek R1-0528

Running R1-0528 Locally: Hardware & Performance

  • The full 671B/685B-parameter model is widely seen as impractical for “average” users.
  • Rough home-level setups discussed:
    • ~768 GB DDR4/DDR5 RAM dual-socket server, CPU-only or mixed CPU+GPU, achieving ~1–1.5 tokens/s on 4-bit quantizations.
    • A Mac M3 Ultra with 512 GB of unified memory, or multi-GPU rigs totaling ~500 GB of VRAM, for higher-speed inference.
    • Some note that with huge swap you can technically run it on almost any PC, but at “one token every 10 minutes”.
  • Quantized/distilled variants (4-bit, 1.58-bit dynamic) can run on high-end consumer GPUs or large-RAM desktops, with users reporting 1–3 tokens/s but still very strong reasoning (a minimal loading sketch follows this list).
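
As a concrete illustration of the quantized-local path, here is a minimal sketch using llama-cpp-python; the GGUF filename, quant level, and layer split are placeholders rather than values from the thread, so tune them to your own RAM/VRAM budget.

```python
# Minimal sketch: load a quantized GGUF build of R1-0528 (or a distilled
# variant) with llama-cpp-python. File path and sizes below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-0528-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=8192,        # context window; larger costs more memory
    n_gpu_layers=20,   # layers kept on the GPU; 0 = CPU-only
    n_threads=16,      # CPU threads for the layers left in RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize: why run LLMs locally?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```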

Cloud Access, Cost, and Privacy

  • Many suggest using hosted versions (OpenRouter, EC2, vast.ai, Bedrock) instead of buying $5k–$10k of hardware (a minimal API sketch follows this list).
  • A single H100 is insufficient for full-precision R1: ~671B parameters at the model’s native FP8 precision is roughly 671 GB of weights alone, far beyond one 80 GB card, so estimates call for 6–8 GPUs or large multi-node setups.
  • Debate over “free” access via OpenRouter/Bittensor:
    • One side: prompts and usage data are valuable and likely monetized or re-sold.
    • Other side: for non-sensitive tasks (e.g., summarizing public content), the tradeoff is acceptable.
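
For the hosted route, a minimal sketch against OpenRouter’s OpenAI-compatible endpoint follows; the model slug is an assumption (check OpenRouter’s catalog for the exact identifier and pricing), and the privacy caveats above apply to anything you send.

```python
# Minimal sketch: call R1-0528 through OpenRouter's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1-0528",  # assumed slug -- verify in the catalog
    messages=[{"role": "user", "content": "Summarize this public article: ..."}],
)
print(resp.choices[0].message.content)
```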

Model Quality, Info, and Benchmarks

  • Frustration that there’s no detailed model card, training details, or official benchmarks yet.
  • Some like the low-drama “quiet drop” style; others compare it to earlier Mistral torrent-era releases.
  • Early third-party signals (LiveCodeBench, Reddit tables) suggest parity with OpenAI’s o1/o4-mini–class models, but details and context are unclear.
  • Broader debate about benchmarks:
    • Many think popular leaderboards are increasingly “overfitted” and unreliable.
    • Preference expressed for live, contamination-resistant, or human-arena-style evaluations, plus “vibe checks.”

Open Weights vs Open Source

  • Strong argument that this is “open weights”, not open source:
    • Weights are downloadable and MIT-licensed, but training data and full pipeline are not provided.
    • Several analogies: weights as binaries, datasets/pipelines as true “source”.
  • Some argue for a multi-dimensional “openness score” (code, data, weights, license, etc.) instead of a binary label (a toy scoring sketch follows this list).
  • Training-data disclosure is seen as legally and practically difficult, especially given likely copyrighted and scraped content.
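
As a toy illustration of what such a multi-dimensional score could look like (the dimensions and weights below are invented for illustration, not a proposed standard):

```python
# Toy sketch of a multi-dimensional "openness score". Dimensions and
# weights are illustrative only -- no standard scheme is implied.
from dataclasses import dataclass

@dataclass
class Release:
    weights: bool        # downloadable weights
    license_osi: bool    # permissive license (e.g. MIT)
    training_code: bool  # full training/eval pipeline published
    training_data: bool  # datasets (or a reproducible recipe) published

def openness_score(r: Release) -> float:
    """Weighted sum in [0, 1]."""
    parts = [
        (r.weights, 0.3),
        (r.license_osi, 0.2),
        (r.training_code, 0.2),
        (r.training_data, 0.3),
    ]
    return sum(w for ok, w in parts if ok)

# R1-0528 as described in the thread: MIT-licensed weights, no data/pipeline.
print(openness_score(Release(weights=True, license_osi=True,
                             training_code=False, training_data=False)))  # 0.5
```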

Platforms, Quantization & Ecosystem

  • OpenRouter already serves R1-0528 through multiple providers; many note its cost is roughly half that of comparable OpenAI offerings for similar capability.
  • Groq is discussed: extremely fast but limited model selection; hosting R1-size models would require thousands of their chips.
  • Community tools:
    • Dynamic 1–1.6-bit quantizations reduce the footprint from ~700 GB to ~185 GB, with tricks to offload the MoE expert layers to CPU RAM while keeping the rest of the model in under 24 GB of VRAM (a launch sketch follows this list).
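
A rough sketch of that MoE-offload trick, launching llama.cpp’s server with expert tensors pinned to CPU RAM while the remaining layers go to the GPU; the flag names and tensor regex follow recent llama.cpp builds and the community offload recipe, so verify them against your local build, and treat the file path as a placeholder.

```python
# Sketch: launch llama.cpp's server with MoE experts kept in CPU RAM.
import subprocess

cmd = [
    "./llama-server",
    "-m", "DeepSeek-R1-0528-UD-IQ1_S.gguf",   # hypothetical dynamic-quant file
    "-c", "8192",                             # context size
    "-ngl", "99",                             # try to put all layers on the GPU...
    "-ot", ".ffn_.*_exps.=CPU",               # ...but force MoE expert tensors to CPU RAM
    "--port", "8080",
]
subprocess.run(cmd, check=True)
```

Once running, the server exposes an OpenAI-compatible endpoint on the chosen port, so the same client code shown in the cloud section can be pointed at localhost instead.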

Motivations for Local LLMs & Use Cases

  • Reasons to run locally despite pain:
    • Data privacy and regulatory needs (law, medical, finance, internal docs).
    • Very cheap high-volume or always-on workloads vs API billing.
    • Latency-sensitive coding autocomplete.
  • Concrete examples shared:
    • Trading-volume signal analyzer summarizing news locally.
    • Document management (auto titling/tagging) and structured extraction (a minimal extraction sketch follows this list).
    • Coding assistants using smaller DeepSeek/Qwen-based models for completion.
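
A hedged sketch of the auto-titling/tagging use case against a locally hosted OpenAI-compatible endpoint (e.g., the llama-server launched above); the schema, prompt, and endpoint are illustrative assumptions, and a reasoning model’s “thinking” output may need to be stripped before the JSON parses cleanly.

```python
# Minimal sketch of local auto-titling/tagging via an OpenAI-compatible
# local endpoint. Schema, prompt, and port are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

def tag_document(text: str) -> dict:
    """Return {"title": str, "tags": [str, ...]} for a document."""
    resp = client.chat.completions.create(
        model="local",  # placeholder; llama-server serves whatever it was launched with
        messages=[
            {"role": "system",
             "content": "Reply with JSON only: {\"title\": str, \"tags\": [str]}."},
            {"role": "user", "content": text[:4000]},
        ],
        temperature=0.0,
    )
    return json.loads(resp.choices[0].message.content)

print(tag_document("Quarterly trading volume rose 12% on ..."))
```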

Market & Narrative

  • Some speculate about the timing relative to Nvidia earnings and the lab’s hedge-fund backing; others question whether the release date materially affects markets.
  • Discussion that DeepSeek relies on Nvidia hardware yet may simultaneously reduce the perceived need for massive, ultra-expensive GPU clusters, shifting procurement strategies and geopolitics (e.g., interest in Huawei GPUs).