Promising results from DeepSeek R1 for code

DeepSeek R1 writing most of a llama.cpp PR

  • A llama.cpp PR claims ~99% of the WASM SIMD code was written by DeepSeek R1, guided by a human over a weekend.
  • The workflow was iterative: repeated re-prompts (4–8 in hard cases), constraints such as “optimize only this part,” and manual debugging and test-writing by the human.
  • Some functions were pure translations (ARM NEON → WASM SIMD); at least one was “invented from scratch” after earlier attempts failed.
  • Commenters disagree on significance: some see a genuine milestone in practical codegen; others call it “glorified translation” and note that review and validation still require full human expertise.

Chain-of-thought as the main value-add

  • Many find R1’s visible reasoning more useful than its final answers, mining the trace for refactoring ideas, bug hypotheses, and overlooked edge cases.
  • Several anecdotes describe wrong final answers whose chain of thought nonetheless contained correct or inspiring ideas.
  • This is contrasted with models that hide their internal traces; some argue OpenAI hurt itself by not exposing o1’s reasoning.

Quality, limits, and “jagged frontier”

  • Experiences are mixed: some users say R1 (and its distills) matches or beats o1, Claude, and Qwen on coding and math; others report gaslighting, wrong assumptions, and destructive edits on complex logic.
  • Rust and bespoke APIs remain hard: models often hallucinate methods, traits, or crate names, even when given examples.
  • Consensus: LLMs excel at clear, localized tasks (ports, boilerplate, SQL, tests) and struggle with underspecified, domain-heavy, or highly coupled changes.

Tools, hosting, and distill models

  • Popular setups: Ollama, LM Studio, EXO, Continue.dev, and Aider. Aider’s own releases are now ~70–80% AI-generated by line count.
  • Most people run distilled Qwen/Llama variants (e.g., 32B at Q4–Q6 quantization) locally on machines with 20–30 GB of memory; the full 671B-parameter R1 is out of reach for most.
  • Some report API outages and latency; others route via third-party hosts.

Economic and governance debates

  • Large subthreads debate whether this heralds mass SWE displacement or just another productivity jump that creates more software and shifts roles toward “product/solution engineers.”
  • Concerns focus less on usefulness than on wages, junior hiring, and concentration of power.
  • DeepSeek’s openness and Chinese origin trigger discussion about geopolitics, motives, censorship (e.g., Taiwan queries), and the lack of any real “moat” in foundation models.