Promising results from DeepSeek R1 for code
DeepSeek R1 writing most of a llama.cpp PR
- A llama.cpp PR claims ~99% of the WASM SIMD code was written by DeepSeek R1, guided by a human over a weekend.
- Workflow was iterative: repeated re-prompts (4–8 times in hard cases), constraints like “optimize only this part,” and manual debugging and test-writing.
- Some functions were pure translations (ARM NEON → WASM SIMD); at least one was “invented from scratch” after earlier attempts failed.
- Commenters disagree on significance: some see a genuine milestone in practical codegen; others call it “glorified translation” and note that review/validation still require full expertise.
Chain-of-thought as the main value-add
- Many find R1’s visible reasoning more useful than its final answers—for refactoring, bug-hunting, and understanding overlooked edge cases.
- Several anecdotes describe wrong final answers but correct or inspiring ideas inside the CoT.
- This is contrasted with models that hide their internal traces; some argue OpenAI hurt itself by not exposing o1’s reasoning.
Quality, limits, and “jagged frontier”
- Experiences are mixed: some users say R1 (and its distills) match or beat o1/Claude/Qwen on coding and math; others report gaslighting, wrong assumptions, and destructive edits on complex logic.
- Rust and bespoke APIs remain hard: models often hallucinate methods, traits, or crate names, even when given examples.
- Consensus: LLMs excel at clear, localized tasks (ports, boilerplate, SQL, tests); they struggle with underspecified, domain-heavy, or tightly coupled changes.
Tools, hosting, and distill models
- Popular setups: Ollama, LM Studio, EXO, Continue.dev, and Aider.
- Aider's own releases are now ~70–80% AI-generated by line count.
- Most people use distilled Qwen/Llama variants (e.g., 32B at Q4–Q6) locally on machines with 20–30 GB of RAM or VRAM; the full 671B R1 is out of reach for most.
- Some report API outages and latency; others route via third-party hosts.
Economic and governance debates
- Large subthreads debate whether this heralds mass SWE displacement or just another productivity jump that creates more software and shifts roles toward “product/solution engineers.”
- Concerns focus less on usefulness than on wages, junior hiring, and concentration of power.
- DeepSeek’s openness and Chinese origin trigger discussion about geopolitics, motives, censorship (e.g., Taiwan queries), and the lack of any real “moat” in foundation models.