2026-06-15

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Overall sentiment

Very few people have fully replaced Claude/GPT with local models for daily coding.
Many use a hybrid: local open‑weights for most work, frontier models (Claude/GPT/DeepSeek/Kimi/GLM) for planning, hard bugs, or reviews.
Consensus: local is “good and getting better,” but still generally behind top cloud models for reliability, reasoning, and large-repo work.

Hardware & performance

Useful setups often involve:
- High‑VRAM GPUs (RTX 3090/4090/6000, 4×5070, Radeon 7900XTX/R9700).
- High‑RAM Apple silicon (M4/M5/Mac Studio 64–512 GB) or Strix Halo 128 GB.
Typical performance reports:
- ~40–150 tok/s decode, 400–4000 tok/s prefill on ~30–35B models, depending on MTP, quant, and concurrency.
Electricity is non‑trivial but usually much cheaper than hardware CAPEX; some back‑of‑the‑envelope TCO shows cloud APIs still cheaper for many users.

Model choices & perceived quality

Most‑praised local models for coding:
- Qwen 3.6 27B dense (often Q4–Q6 quant). Frequently compared to somewhere between Haiku and older Sonnet.
- Qwen 3.6 35B A3B (MoE) for speed; often a bit worse than 27B dense at coding.
- Gemma 4 (26–31B, including QAT variants) as a strong general model, sometimes weaker than Qwen at tool‑use.
- DeepSeek V4 Flash / ds4 on high‑end Apple or GPU rigs for reasoning‑heavy work.
Large MoE open models (Qwen 3.5 122B‑A10B, Nemotron, MiniMax) can be strong but require big RAM and are slower; some find them worse than well‑tuned 27B–35B dense.

Harnesses, tooling & workflows

Harness quality is seen as at least as important as the model:
- Popular: pi.dev, Claude Code (pointed at local endpoints), OpenCode, Crush, Aider, custom CLIs, VS Code/Copilot to local llama.cpp servers.
Effective patterns:
- Agentic workflows with strict tools, sub‑agents, or multi‑step plans.
- Frontier model writes spec/plan; local model implements.
- Multiple local agents with narrow roles (plan, schema, code, tests, review).

Limitations & pain points

Common issues:
- Loops, over‑thinking, and broken tool/edit calls, especially with aggressive quantization or poor sampling.
- Context limits and cache quirks; quality drops at long contexts despite large nominal windows.
- Slowness and fan/noise on laptops; complex repos feel “forever” compared to cloud.
- Setup complexity: drivers, quant choices, KV cache settings, batching, harness config.
Many report local models feel like a capable but junior dev; frontier models feel more like a strong mid/senior, especially for architecture and tricky bugs.

Privacy, cost, and strategy

Reasons to go local:
- Data privacy/compliance, avoiding lock‑in, fear of future price hikes or regulation, enjoyment of tinkering.
- “Good enough” for personal projects, scripting, glue code, document tasks.
Reasons not to:
- Hardware cost vs heavily subsidized $20–$100/mo frontier subs.
- Cloud models (esp. DeepSeek/Kimi/GLM) are extremely cheap per token and clearly more capable.
Emerging pattern: layered approach—local first, then cheap open‑weights APIs, then frontier only when really needed.

Related topics