A guide to local coding models
Scope and realism of “local coding model” claims
- Several commenters say the article oversells local models: running an 80B model on 128GB RAM is not comparable to the 4B–7B models people with 8–16GB can realistically use.
- For many, local models are still “toys” for serious coding: fine for small scripts, CRUD, or documentation Q&A, but they fall apart on large codebases and complex refactors, and can’t be trusted for reliable tool use.
- Others report success with 24–32B local coders (e.g. Qwen/Devstral) for targeted tasks, but not as full replacements for Claude/Codex/Gemini.
Economics: subscriptions vs hardware
- Strong thread arguing that cloud inference is currently far cheaper: a back-of-envelope calculation using a 5090 and Qwen2.5-Coder 32B suggests ~7 years of 24/7 utilization to break even with OpenRouter API pricing (a worked sketch follows this list).
- Critics of local-only setups point to hardware depreciation and electricity costs, and note that a maxed-out Mac used as an “LLM box” can’t simultaneously devote all its RAM/compute to dev tools.
- Counterpoint: current prices are heavily subsidized; people expect future “enshittification” (higher prices, lower quality), so investing in local capacity is a hedge.
- Many practitioners run a mix: $20–$100/mo on Claude/Codex/Gemini/Copilot/Cursor plus free/cheap open-weight APIs and occasional local models.
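A minimal version of that break-even arithmetic is sketched below. Every figure (card price, throughput, OpenRouter rate) is an illustrative assumption rather than a number taken from the thread, and electricity and depreciation are ignored, which would only push break-even further out:

```python
# Back-of-envelope break-even for a local RTX 5090 running Qwen2.5-Coder 32B
# versus paying per-token via OpenRouter. All numbers are illustrative
# assumptions; electricity and depreciation are deliberately left out.

gpu_cost_usd = 2500.0      # assumed price of the card
tokens_per_second = 57.0   # assumed sustained generation throughput
api_usd_per_mtok = 0.20    # assumed blended OpenRouter price per million tokens

# Value produced per hour of 24/7 local generation, priced at API rates.
tokens_per_hour = tokens_per_second * 3600
api_value_per_hour_usd = tokens_per_hour / 1e6 * api_usd_per_mtok

breakeven_hours = gpu_cost_usd / api_value_per_hour_usd
print(f"~{breakeven_hours / (24 * 365):.1f} years of 24/7 generation to break even")
# -> roughly 7 years with these assumptions
```

Under these assumptions the card pays for itself only after about seven years of continuous generation, which is the shape of the argument made in the thread.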
Practical use patterns and limits
- $20 plans: some can code for hours by aggressively clearing context and chunking tasks; others hit session limits within 10–60 minutes when doing agentic “auto” coding on big repos.
- Distinction between “vibecoding” (letting the model flail through entire apps) vs engineered workflows (design docs, tests, and careful review). Vibecoding burns tokens and often yields low-quality code.
- Hybrid strategies: use a “thinker” model (Opus, GPT-5.2, Gemini 3) for planning/review and a cheaper or local “executor” (GLM 4.6, Qwen) for implementation to reduce cost.
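A minimal sketch of that “thinker/executor” split, under stated assumptions: both endpoints speak the OpenAI-compatible chat API, the hosted base URL and model ids are placeholders, and the local server could be llama.cpp, Ollama, or LM Studio serving whatever model you actually run:

```python
# Sketch: an expensive hosted model writes the plan, a cheap/local model writes
# the code. Endpoints, keys, and model names below are placeholders.
from openai import OpenAI

planner = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")  # hosted "thinker"
executor = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")  # local "executor"

def plan_then_implement(task: str) -> str:
    # Step 1: the stronger model produces a short implementation plan.
    plan = planner.chat.completions.create(
        model="anthropic/claude-opus-4",  # placeholder model id
        messages=[{"role": "user",
                   "content": f"Write a numbered implementation plan for: {task}"}],
    ).choices[0].message.content

    # Step 2: the cheaper/local model turns the plan into code.
    return executor.chat.completions.create(
        model="qwen2.5-coder-32b-instruct",  # placeholder local model name
        messages=[{"role": "user",
                   "content": f"Implement this plan, code only:\n{plan}"}],
    ).choices[0].message.content
```

The point of the split is purely economic: planning and review need few, high-quality tokens, while implementation burns many tokens that a cheaper model can produce acceptably.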
Tooling: LM Studio, Ollama, agents
- LM Studio praised as the easiest cross‑platform GUI for local models, though it’s proprietary; Ollama and llama.cpp favored by those prioritizing openness and performance.
- Claude Code/Codex/Cursor are widely seen as far ahead of open-source agentic tools (opencode, crush, etc.) due to better prompting, context/RAG, and orchestration.
- Some run Claude Code and Codex against local models via llama.cpp’s Anthropic-compatible API, or route within tools like opencode, Cline, RooCode, and KiloCode.
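A sketch of the local Anthropic-compatible routing mentioned above, assuming a llama.cpp server exposing an Anthropic-style messages endpoint on localhost:8080 and a placeholder model name; Claude Code itself can reportedly be pointed at such an endpoint via its ANTHROPIC_BASE_URL environment variable:

```python
# Sketch: point an Anthropic-style client at a local server instead of the
# hosted API. Base URL, port, and model name are assumptions for illustration.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",   # assumed local llama-server address
    api_key="not-needed-locally",
)

reply = client.messages.create(
    model="qwen2.5-coder-32b-instruct",  # placeholder: whatever the server loaded
    max_tokens=512,
    messages=[{"role": "user",
               "content": "Write a Python function that parses a .env file."}],
)
print(reply.content[0].text)
```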
Philosophy: privacy, autonomy, and future trajectory
- Many value local models for privacy, offline work, and not being beholden to vendors; others see them as hobbies until open weights reliably match frontier quality.
- General expectation: local/open models are closing the gap but remain roughly one generation behind for coding; whether that’s “good enough” depends on project complexity and tolerance for slower, more hands-on workflows.