Promising results from DeepSeek R1 for code
DeepSeek R1 writing most of a llama.cpp PR
- A llama.cpp PR claims ~99% of the WASM SIMD code was written by DeepSeek R1, guided by a human over a weekend.
- Workflow was iterative: repeated re-prompts (4–8 times in hard cases), constraints like “optimize only this part,” and manual debugging and test-writing.
- Some functions were pure translations (ARM NEON → WASM SIMD); at least one was “invented from scratch” after earlier attempts failed.
- Commenters disagree on significance: some see a genuine milestone in practical codegen; others call it “glorified translation” and note that review/validation still require full expertise.
Chain-of-thought as the main value-add
- Many find R1’s visible reasoning more useful than its final answers—for refactoring, bug-hunting, and understanding overlooked edge cases.
- Several anecdotes describe wrong final answers but correct or inspiring ideas inside the CoT.
- This is contrasted with models that hide their internal traces; some argue OpenAI hurt itself by not exposing o1’s reasoning.
Quality, limits, and “jagged frontier”
- Experiences are mixed: some users say R1 (and its distills) match or beat o1/Claude/Qwen on coding and math; others report gaslighting, wrong assumptions, and destructive edits on complex logic.
- Rust and bespoke APIs remain hard: models often hallucinate methods, traits, or crate names, even when given examples.
- Consensus: LLMs excel at clear, localized tasks (ports, boilerplate, SQL, tests); they struggle with underspecified, domain-heavy, or tightly coupled changes.
Tools, hosting, and distill models
- Popular setups: Ollama, LM Studio, EXO, Continue.dev, and Aider.
- Aider's own releases are now ~70–80% AI-generated by line count.
- Most people use distilled Qwen/Llama variants (e.g., 32B at Q4–Q6) locally on machines with 20–30 GB of RAM or VRAM; the full 671B R1 is out of reach for most.
- Some report API outages and latency; others route via third-party hosts.
Economic and governance debates
- Large subthreads debate whether this heralds mass SWE displacement or just another productivity jump that creates more software and shifts roles toward “product/solution engineers.”
- Concerns focus less on usefulness than on wages, junior hiring, and concentration of power.
- DeepSeek’s openness and Chinese origin trigger discussion about geopolitics, motives, censorship (e.g., Taiwan queries), and the lack of any real “moat” in foundation models.