DeepSeek R1-0528

Running R1-0528 Locally: Hardware & Performance

  • The full 671B/685B-parameter model is widely seen as impractical for “average” users.
  • Rough home-level setups discussed:
    • ~768 GB DDR4/DDR5 RAM dual-socket server, CPU-only or mixed CPU+GPU, achieving ~1–1.5 tokens/s on 4-bit quantizations.
    • A Mac M3 Ultra with 512 GB of unified memory, or multi-GPU rigs totaling ~500 GB of VRAM, for higher-speed inference.
    • Some note that with huge swap you can technically run it on almost any PC, but at “one token every 10 minutes”.
  • Quantized/distilled variants (4-bit, 1.58-bit dynamic) can run on high-end consumer GPUs or large-RAM desktops, with users reporting 1–3 tokens/s but still very strong reasoning (a minimal loading sketch follows this list).
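
As a concrete illustration of the quantized-local path, here is a minimal sketch using llama-cpp-python; the GGUF filename, quant level, and layer split are placeholders rather than values from the thread, so tune them to your own RAM/VRAM budget.

```python
# Minimal sketch: load a quantized GGUF build of R1-0528 (or a distilled
# variant) with llama-cpp-python. File path and sizes below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-0528-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=8192,        # context window; larger costs more memory
    n_gpu_layers=20,   # layers kept on the GPU; 0 = CPU-only
    n_threads=16,      # CPU threads for the layers left in RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize: why run LLMs locally?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```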

Cloud Access, Cost, and Privacy

  • Many suggest using hosted versions (OpenRouter, EC2, vast.ai, Bedrock) instead of buying $5k–$10k of hardware (a minimal API sketch follows this list).
  • A single H100 is insufficient for full-precision R1: ~671B parameters at the model’s native FP8 precision is roughly 671 GB of weights alone, far beyond one 80 GB card, so estimates call for 6–8 GPUs or large multi-node setups.
  • Debate over “free” access via OpenRouter/Bittensor:
    • One side: prompts and usage data are valuable and likely monetized or re-sold.
    • Other side: for non-sensitive tasks (e.g., summarizing public content), the tradeoff is acceptable.
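
For the hosted route, a minimal sketch against OpenRouter’s OpenAI-compatible endpoint follows; the model slug is an assumption (check OpenRouter’s catalog for the exact identifier and pricing), and the privacy caveats above apply to anything you send.

```python
# Minimal sketch: call R1-0528 through OpenRouter's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1-0528",  # assumed slug -- verify in the catalog
    messages=[{"role": "user", "content": "Summarize this public article: ..."}],
)
print(resp.choices[0].message.content)
```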

Model Quality, Info, and Benchmarks

  • Frustration that there’s no detailed model card, training details, or official benchmarks yet.
  • Some like the low-drama “quiet drop” style; others compare it to earlier Mistral torrent-era releases.
  • Early third-party signals (LiveCodeBench, Reddit tables) suggest parity with OpenAI’s o1/o4-mini–class models, but details and context are unclear.
  • Broader debate about benchmarks:
    • Many think popular leaderboards are increasingly “overfitted” and unreliable.
    • Preference expressed for live, contamination-resistant, or human-arena-style evaluations, plus “vibe checks.”

Open Weights vs Open Source

  • Strong argument that this is “open weights”, not open source:
    • Weights are downloadable and MIT-licensed, but training data and full pipeline are not provided.
    • Several analogies: weights as binaries, datasets/pipelines as true “source”.
  • Some argue for a multi-dimensional “openness score” (code, data, weights, license, etc.) instead of a binary label (a toy scoring sketch follows this list).
  • Training-data disclosure is seen as legally and practically difficult, especially given likely copyrighted and scraped content.
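
As a toy illustration of what such a multi-dimensional score could look like (the dimensions and weights below are invented for illustration, not a proposed standard):

```python
# Toy sketch of a multi-dimensional "openness score". Dimensions and
# weights are illustrative only -- no standard scheme is implied.
from dataclasses import dataclass

@dataclass
class Release:
    weights: bool        # downloadable weights
    license_osi: bool    # permissive license (e.g. MIT)
    training_code: bool  # full training/eval pipeline published
    training_data: bool  # datasets (or a reproducible recipe) published

def openness_score(r: Release) -> float:
    """Weighted sum in [0, 1]."""
    parts = [
        (r.weights, 0.3),
        (r.license_osi, 0.2),
        (r.training_code, 0.2),
        (r.training_data, 0.3),
    ]
    return sum(w for ok, w in parts if ok)

# R1-0528 as described in the thread: MIT-licensed weights, no data/pipeline.
print(openness_score(Release(weights=True, license_osi=True,
                             training_code=False, training_data=False)))  # 0.5
```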

Platforms, Quantization & Ecosystem

  • OpenRouter already serves R1-0528 through multiple providers; many note its cost is roughly half that of comparable OpenAI offerings for similar capability.
  • Groq is discussed: extremely fast but limited model selection; hosting R1-size models would require thousands of their chips.
  • Community tools:
    • Dynamic 1–1.6-bit quantizations reduce the footprint from ~700 GB to ~185 GB, with tricks to offload the MoE expert layers to CPU RAM while keeping the rest of the model in under 24 GB of VRAM (a launch sketch follows this list).
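
A rough sketch of that MoE-offload trick, launching llama.cpp’s server with expert tensors pinned to CPU RAM while the remaining layers go to the GPU; the flag names and tensor regex follow recent llama.cpp builds and the community offload recipe, so verify them against your local build, and treat the file path as a placeholder.

```python
# Sketch: launch llama.cpp's server with MoE experts kept in CPU RAM.
import subprocess

cmd = [
    "./llama-server",
    "-m", "DeepSeek-R1-0528-UD-IQ1_S.gguf",   # hypothetical dynamic-quant file
    "-c", "8192",                             # context size
    "-ngl", "99",                             # try to put all layers on the GPU...
    "-ot", ".ffn_.*_exps.=CPU",               # ...but force MoE expert tensors to CPU RAM
    "--port", "8080",
]
subprocess.run(cmd, check=True)
```

Once running, the server exposes an OpenAI-compatible endpoint on the chosen port, so the same client code shown in the cloud section can be pointed at localhost instead.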

Motivations for Local LLMs & Use Cases

  • Reasons to run locally despite pain:
    • Data privacy and regulatory needs (law, medical, finance, internal docs).
    • Very cheap high-volume or always-on workloads vs API billing.
    • Latency-sensitive coding autocomplete.
  • Concrete examples shared:
    • Trading-volume signal analyzer summarizing news locally.
    • Document management (auto titling/tagging) and structured extraction (a minimal extraction sketch follows this list).
    • Coding assistants using smaller DeepSeek/Qwen-based models for completion.
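
A hedged sketch of the auto-titling/tagging use case against a locally hosted OpenAI-compatible endpoint (e.g., the llama-server launched above); the schema, prompt, and endpoint are illustrative assumptions, and a reasoning model’s “thinking” output may need to be stripped before the JSON parses cleanly.

```python
# Minimal sketch of local auto-titling/tagging via an OpenAI-compatible
# local endpoint. Schema, prompt, and port are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

def tag_document(text: str) -> dict:
    """Return {"title": str, "tags": [str, ...]} for a document."""
    resp = client.chat.completions.create(
        model="local",  # placeholder; llama-server serves whatever it was launched with
        messages=[
            {"role": "system",
             "content": "Reply with JSON only: {\"title\": str, \"tags\": [str]}."},
            {"role": "user", "content": text[:4000]},
        ],
        temperature=0.0,
    )
    return json.loads(resp.choices[0].message.content)

print(tag_document("Quarterly trading volume rose 12% on ..."))
```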

Market & Narrative

  • Some speculate about the timing relative to Nvidia earnings and the lab’s hedge-fund backing; others question whether the release date materially affects markets.
  • Discussion that DeepSeek relies on Nvidia hardware yet may simultaneously reduce the perceived need for massive, ultra-expensive GPU clusters, shifting procurement strategies and geopolitics (e.g., interest in Huawei GPUs).