DeepSeek R1-0528
Running R1-0528 Locally: Hardware & Performance
- The full 671B/685B-parameter model is widely seen as impractical for “average” users.
- Rough home-level setups discussed:
  - ~768 GB DDR4/DDR5 RAM in a dual-socket server, CPU-only or mixed CPU+GPU, achieving ~1–1.5 tokens/s on 4-bit quantizations.
  - Mac M3 Ultra with 512 GB unified memory, or multi-GPU rigs totaling ~500 GB VRAM, for higher-speed inference.
- Some note that with enough swap you can technically run it on almost any PC, but at “one token every 10 minutes”.
- Quantized/distilled variants (4-bit, 1.58-bit dynamic) can run on high-end consumer GPUs or large-RAM desktops, with users reporting 1–3 tokens/s but still very strong reasoning (see the memory sketch after this list).
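For a sense of where these RAM/VRAM figures come from, here is a minimal back-of-envelope sketch in Python of the weight footprint at different bit-widths. The ~671B parameter count comes from the discussion above; the 10% overhead for KV cache and runtime buffers is an assumption, and real dynamic quants keep some layers at higher precision, so actual files run larger than the naive estimate.

```python
# Back-of-envelope weight footprint for a ~671B-parameter model at several
# quantization levels. The 10% overhead for KV cache / runtime buffers is an
# assumption; dynamic quants keep some layers at higher precision, so real
# files (e.g. ~185 GB for the 1.58-bit dynamic quant) exceed the naive number.
PARAMS = 671e9      # total parameters (MoE: all experts are stored in memory)
OVERHEAD = 1.10     # assumed extra for KV cache and buffers

def footprint_gib(bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights, in GiB."""
    return PARAMS * bits_per_weight / 8 * OVERHEAD / 2**30

for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4), ("1.58-bit", 1.58)]:
    print(f"{label:>9}: ~{footprint_gib(bits):,.0f} GiB")
```

With these assumptions, FP8 lands around the ~700 GB figure mentioned below, 4-bit fits in a ~768 GB RAM server or a 512 GB Mac, and the 1.58-bit estimate sits well under the published dynamic-quant size.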
Cloud Access, Cost, and Privacy
- Many suggest using hosted versions (OpenRouter, EC2, vast.ai, Bedrock) instead of buying $5k–$10k hardware.
- A single H100 is insufficient for full-precision R1; estimates range from 6–8 GPUs up to large multi-node setups (see the back-of-envelope sketch after this list).
- Debate over “free” access via OpenRouter/Bittensor:
  - One side: prompts and usage data are valuable and likely monetized or re-sold.
  - Other side: for non-sensitive tasks (e.g., summarizing public content), the tradeoff is acceptable.
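The GPU-count estimates above follow from dividing model size by per-GPU memory. A minimal sketch, assuming 80 GB H100s, ~671 GB of native FP8 weights, and an assumed 15% headroom for KV cache and activations:

```python
import math

H100_MEM_GB = 80        # HBM per H100 (80 GB variant)
WEIGHTS_GB = 671        # ~671 GB of weights at DeepSeek's native FP8 precision
USABLE_FRACTION = 0.85  # assumption: keep ~15% headroom for KV cache/activations

gpus = math.ceil(WEIGHTS_GB / (H100_MEM_GB * USABLE_FRACTION))
print(f"~{gpus} x H100 just to hold the weights")  # ~10 with these assumptions
```

Shaving the headroom or sharding more aggressively brings this closer to the lower figures quoted in the thread; higher-precision serving pushes it toward multi-node territory.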
Model Quality, Info, and Benchmarks
- Frustration that there’s no detailed model card, training details, or official benchmarks yet.
- Some like the low-drama “quiet drop” style; others compare it to earlier Mistral torrent-era releases.
- Early third-party signals (LiveCodeBench, Reddit tables) suggest parity with OpenAI’s o1/o4-mini–class models, but details and context are unclear.
- Broader debate about benchmarks:
  - Many think popular leaderboards are increasingly “overfitted” and unreliable.
  - Preference expressed for live, contamination-resistant, or human-arena-style evaluations, plus “vibe checks.”
Open Weights vs Open Source
- Strong argument that this is “open weights”, not open source:
  - Weights are downloadable and MIT-licensed, but training data and the full training pipeline are not provided.
  - Several analogies: weights as compiled binaries, datasets/pipelines as the true “source”.
- Some argue for a multi-dimensional “openness score” (code, data, weights, license, etc.) instead of a binary label (a toy sketch of such a rubric follows this list).
- Training-data disclosure is seen as legally and practically difficult, especially given likely copyrighted and scraped content.
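As a toy illustration of the “openness score” idea, here is a sketch of what a multi-dimensional rubric might look like; the dimensions, weights, and example scores are illustrative assumptions, not a proposed standard.

```python
from dataclasses import dataclass

@dataclass
class OpennessReport:
    """Toy rubric: each dimension scored 0.0 to 1.0; the weights are illustrative."""
    weights_released: float    # model weights downloadable?
    license_permissive: float  # e.g. MIT/Apache vs. restricted use
    training_code: float       # training/eval pipeline published?
    training_data: float       # datasets (or detailed data cards) published?
    tech_report: float         # architecture/training details documented?

    DIMENSION_WEIGHTS = {
        "weights_released": 0.3,
        "license_permissive": 0.2,
        "training_code": 0.2,
        "training_data": 0.2,
        "tech_report": 0.1,
    }

    def score(self) -> float:
        return sum(getattr(self, name) * w for name, w in self.DIMENSION_WEIGHTS.items())

# Example: an "open weights" release (weights plus a permissive license,
# but no training data or full pipeline).
release = OpennessReport(weights_released=1.0, license_permissive=1.0,
                         training_code=0.0, training_data=0.0, tech_report=0.5)
print(f"openness score: {release.score():.2f} / 1.00")  # 0.55 with these values
```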
Platforms, Quantization & Ecosystem
- OpenRouter already serves R1-0528 through multiple providers; many note that it costs roughly half as much as certain OpenAI offerings of similar capability.
- Groq is discussed: extremely fast but limited model selection; hosting R1-size models would require thousands of their chips.
- Community tools:
  - Dynamic 1–1.6-bit quantizations reduce the footprint from ~700 GB to ~185 GB, with tricks to offload the MoE expert layers to CPU RAM while keeping the core on <24 GB of VRAM (a minimal offload sketch follows this list).
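The expert-offload tricks referenced above are typically done with llama.cpp’s tensor-placement options; as a simpler illustration of the same keep-the-hot-path-on-GPU idea, here is a minimal llama-cpp-python sketch using the coarser per-layer knob. The GGUF filename is hypothetical, and the layer count is a placeholder to tune against your VRAM.

```python
# Minimal llama-cpp-python sketch: keep only some layers on the GPU so the
# bulk of a large quantized model sits in CPU RAM. The GGUF path is
# hypothetical; point it at whatever quant you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-R1-0528-dynamic-quant.gguf",  # hypothetical local file
    n_gpu_layers=20,   # layers kept in VRAM; lower this to fit under 24 GB
    n_ctx=8192,        # context window; larger contexts need more memory
)

out = llm("Summarize mixture-of-experts offloading in two sentences.", max_tokens=200)
print(out["choices"][0]["text"])
```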
Motivations for Local LLMs & Use Cases
- Reasons to run locally despite pain:
  - Data privacy and regulatory needs (law, medical, finance, internal docs).
  - Very cheap high-volume or always-on workloads vs API billing.
  - Latency-sensitive coding autocomplete.
- Concrete examples shared:
  - Trading-volume signal analyzer summarizing news locally.
  - Document management (auto titling/tagging) and structured extraction (see the sketch after this list).
  - Coding assistants using smaller DeepSeek/Qwen-based models for completion.
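For the document-tagging / structured-extraction use case, one minimal shape is to call a locally hosted, OpenAI-compatible endpoint. The base URL, model name, and JSON schema below are assumptions (llama.cpp’s server, Ollama, and vLLM all expose this style of API); reasoning models may also wrap their answer in thinking text that needs stripping before parsing.

```python
# Minimal sketch of local auto-titling/tagging via an OpenAI-compatible server.
# Base URL, model name, and schema are assumptions; adapt to whatever local
# server (llama.cpp server, Ollama, vLLM) you actually run.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

def title_and_tags(document_text: str) -> dict:
    resp = client.chat.completions.create(
        model="deepseek-r1-0528",  # whatever name your server registers
        messages=[
            {"role": "system",
             "content": "Return only JSON with keys 'title' (string) and 'tags' (list of strings)."},
            {"role": "user", "content": document_text[:8000]},  # crude truncation
        ],
        temperature=0.2,
    )
    text = resp.choices[0].message.content
    # Reasoning models may prepend thinking text; crudely parse from the first '{'.
    return json.loads(text[text.find("{"):])

print(title_and_tags("Quarterly report: trading volume rose 12% after ..."))
```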
Market & Narrative
- Some speculate about the release timing relative to Nvidia earnings and DeepSeek’s hedge-fund backing; others question whether the release date materially affects markets.
- Discussion that DeepSeek both relies on Nvidia hardware and may reduce the perceived need for massive, ultra-expensive GPU clusters, shifting procurement strategies and geopolitics (e.g., interest in Huawei GPUs).