A guide to local coding models

Scope and realism of “local coding model” claims

  • Several commenters say the article oversells local models: running an 80B model on 128GB RAM is not comparable to the 4B–7B models people with 8–16GB can realistically use (a rough memory estimate is sketched after this list).
  • For many, local models are still “toys” when it comes to serious coding: fine for small scripts, CRUD, or documentation Q&A, but they fall apart on large codebases, complex refactors, or anything requiring reliable tool use.
  • Others report success with 24–32B local coders (e.g. Qwen/Devstral) for targeted tasks, but not as full replacements for Claude/Codex/Gemini.
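
As a rough illustration of why RAM sizes map to model sizes this way, the sketch below estimates a model's memory footprint from parameter count and quantization width. The specific bit-widths and the 20% runtime overhead factor are illustrative assumptions, not measurements.

```python
# Rule of thumb: resident size ≈ params * bits_per_weight / 8, plus runtime
# overhead for the KV cache and context. Bit-widths and the overhead factor
# below are illustrative assumptions.

def approx_model_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate memory footprint in GB for a quantized model plus overhead."""
    return params_billion * bits_per_weight / 8 * overhead

for name, params, bits in [("7B @ ~Q4", 7, 4.5), ("32B @ ~Q4", 32, 4.5), ("80B @ ~Q8", 80, 8.5)]:
    print(f"{name:>10}: ~{approx_model_gb(params, bits):.0f} GB")

# Roughly 5 GB, 22 GB, and 102 GB: an 8-16 GB laptop tops out around 4-7B
# models, while an 80B model needs something like a 128 GB box.
```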

Economics: subscriptions vs hardware

  • Strong thread arguing that cloud inference is currently far cheaper: a back-of-envelope calculation using a 5090 and Qwen2.5-Coder 32B suggests ~7 years of 24/7 utilization to break even with OpenRouter API pricing (the arithmetic is sketched after this list).
  • Critics of local-only setups note hardware depreciation, electricity, and that a maxed-out Mac used as an “LLM box” can’t also devote all RAM/compute to dev tools.
  • Counterpoint: current prices are heavily subsidized; people expect future “enshittification” (higher prices, lower quality), so investing in local capacity is a hedge.
  • Many practitioners run a mix: $20–$100/mo on Claude/Codex/Gemini/Copilot/Cursor plus free/cheap open-weight APIs and occasional local models.
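
A back-of-envelope version of that break-even calculation is sketched below. Every price and throughput figure is a placeholder assumption chosen to land in the same ballpark as the thread's estimate, not the commenter's exact numbers.

```python
# Break-even: how long a GPU must generate tokens 24/7 before its purchase
# price matches what the same tokens would cost via an API. All figures are
# placeholder assumptions.

gpu_cost_usd = 2000.0               # assumed RTX 5090 price
local_tokens_per_sec = 30           # assumed Qwen2.5-Coder 32B decode speed
api_usd_per_million_tokens = 0.30   # assumed OpenRouter price for the same model

tokens_per_day = local_tokens_per_sec * 86_400
api_value_per_day = tokens_per_day / 1e6 * api_usd_per_million_tokens   # ≈ $0.78/day

breakeven_days = gpu_cost_usd / api_value_per_day
print(f"~{breakeven_days:.0f} days (~{breakeven_days / 365:.1f} years) of 24/7 use to recoup the card")

# Electricity makes it worse: ~300 W around the clock at $0.15/kWh is about
# $1.08/day, comparable to or above the API value generated per day under
# these assumptions.
```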

Practical use patterns and limits

  • $20 plans: some can code for hours by aggressively clearing context and chunking tasks; others hit session limits within 10–60 minutes when doing agentic “auto” coding on big repos.
  • Distinction between “vibecoding” (letting the model flail through entire apps) and engineered workflows (design docs, tests, and careful review). Vibecoding burns tokens and often yields low-quality code.
  • Hybrid strategies: use a “thinker” model (Opus, GPT-5.2, Gemini 3) for planning/review and a cheaper or local “executor” (GLM 4.6, Qwen) for implementation to reduce cost.
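
A minimal sketch of that thinker/executor split is below, using two OpenAI-compatible endpoints. The base URLs, model names, and prompts are placeholders, not a specific provider recommendation.

```python
# Minimal sketch of a "thinker plans, executor implements" split.
# Endpoints, model names, and keys are placeholders.
from openai import OpenAI

thinker = OpenAI(base_url="https://api.example-frontier.com/v1", api_key="...")
executor = OpenAI(base_url="http://localhost:8080/v1", api_key="local")  # e.g. a local llama.cpp / LM Studio server

task = "Add retry-with-backoff to the HTTP client in src/http.py"

# 1. The expensive model writes the plan and review criteria.
plan = thinker.chat.completions.create(
    model="frontier-planner",  # placeholder name
    messages=[{"role": "user", "content": f"Write a short implementation plan with tests for: {task}"}],
).choices[0].message.content

# 2. The cheap or local model does the bulk of the token generation.
patch = executor.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",  # placeholder local model name
    messages=[{"role": "user", "content": f"Follow this plan exactly and output a unified diff:\n{plan}"}],
).choices[0].message.content

# 3. The expensive model reviews the output before a human looks at it.
review = thinker.chat.completions.create(
    model="frontier-planner",
    messages=[{"role": "user", "content": f"Review this diff against the plan:\n{plan}\n\n{patch}"}],
).choices[0].message.content

print(review)
```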

Tooling: LM Studio, Ollama, agents

  • LM Studio praised as the easiest cross‑platform GUI for local models, though it’s proprietary; Ollama and llama.cpp favored by those prioritizing openness and performance.
  • Claude Code/Codex/Cursor are widely seen as far ahead of open-source agentic tools (opencode, crush, etc.) due to better prompting, context/RAG, and orchestration.
  • Some run Claude Code and Codex against local models via llama.cpp’s Anthropic-compatible API, or route within tools like opencode, Cline, RooCode, and KiloCode.
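
For the llama.cpp route, a minimal sketch of pointing the Anthropic Python SDK at a local server is below. It assumes the local server exposes an Anthropic-style /v1/messages endpoint, as the thread describes; the host, port, and model name are placeholders.

```python
# Minimal sketch of talking to a local server through the Anthropic SDK,
# assuming the server exposes an Anthropic-style /v1/messages endpoint.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",  # local llama.cpp / proxy endpoint (placeholder)
    api_key="not-needed-locally",      # most local servers ignore the key
)

reply = client.messages.create(
    model="qwen2.5-coder-32b-instruct",  # whatever model the local server has loaded
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a Python function that parses RFC 3339 timestamps."}],
)
print(reply.content[0].text)

# Agent CLIs are typically redirected the same way, e.g. by setting
# ANTHROPIC_BASE_URL to the local endpoint before launching Claude Code.
```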

Philosophy: privacy, autonomy, and future trajectory

  • Many value local models for privacy, offline work, and not being beholden to vendors; others see them as hobbies until open weights reliably match frontier quality.
  • General expectation: local/open models are closing the gap but remain roughly a generation behind for coding; whether that’s “good enough” depends on project complexity and tolerance for slower, more hands-on workflows.