A guide to local coding models

Scope and realism of “local coding model” claims

  • Several commenters say the article oversells local models: running an 80B model on 128GB RAM is not comparable to the 4B–7B models people with 8–16GB can realistically use (a rough memory estimate is sketched after this list).
  • For many, local models are still “toys” when it comes to serious coding: fine for small scripts, CRUD, or documentation Q&A, but they fall apart on large codebases, complex refactors, or anything requiring reliable tool use.
  • Others report success with 24–32B local coders (e.g. Qwen/Devstral) for targeted tasks, but not as full replacements for Claude/Codex/Gemini.
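
As a rough illustration of why RAM sizes map to model sizes this way, the sketch below estimates a model's memory footprint from parameter count and quantization width. The specific bit-widths and the 20% runtime overhead factor are illustrative assumptions, not measurements.

```python
# Rule of thumb: resident size ≈ params * bits_per_weight / 8, plus runtime
# overhead for the KV cache and context. Bit-widths and the overhead factor
# below are illustrative assumptions.

def approx_model_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate memory footprint in GB for a quantized model plus overhead."""
    return params_billion * bits_per_weight / 8 * overhead

for name, params, bits in [("7B @ ~Q4", 7, 4.5), ("32B @ ~Q4", 32, 4.5), ("80B @ ~Q8", 80, 8.5)]:
    print(f"{name:>10}: ~{approx_model_gb(params, bits):.0f} GB")

# Roughly 5 GB, 22 GB, and 102 GB: an 8-16 GB laptop tops out around 4-7B
# models, while an 80B model needs something like a 128 GB box.
```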

Economics: subscriptions vs hardware

  • Strong thread arguing that cloud inference is currently far cheaper: a back-of-envelope calculation using a 5090 and Qwen2.5-Coder 32B suggests ~7 years of 24/7 utilization to break even with OpenRouter API pricing (the arithmetic is sketched after this list).
  • Critics of local-only setups note hardware depreciation, electricity, and that a maxed-out Mac used as an “LLM box” can’t also devote all RAM/compute to dev tools.
  • Counterpoint: current prices are heavily subsidized; people expect future “enshittification” (higher prices, lower quality), so investing in local capacity is a hedge.
  • Many practitioners run a mix: $20–$100/mo on Claude/Codex/Gemini/Copilot/Cursor plus free/cheap open-weight APIs and occasional local models.
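
A back-of-envelope version of that break-even calculation is sketched below. Every price and throughput figure is a placeholder assumption chosen to land in the same ballpark as the thread's estimate, not the commenter's exact numbers.

```python
# Break-even: how long a GPU must generate tokens 24/7 before its purchase
# price matches what the same tokens would cost via an API. All figures are
# placeholder assumptions.

gpu_cost_usd = 2000.0               # assumed RTX 5090 price
local_tokens_per_sec = 30           # assumed Qwen2.5-Coder 32B decode speed
api_usd_per_million_tokens = 0.30   # assumed OpenRouter price for the same model

tokens_per_day = local_tokens_per_sec * 86_400
api_value_per_day = tokens_per_day / 1e6 * api_usd_per_million_tokens   # ≈ $0.78/day

breakeven_days = gpu_cost_usd / api_value_per_day
print(f"~{breakeven_days:.0f} days (~{breakeven_days / 365:.1f} years) of 24/7 use to recoup the card")

# Electricity makes it worse: ~300 W around the clock at $0.15/kWh is about
# $1.08/day, comparable to or above the API value generated per day under
# these assumptions.
```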

Practical use patterns and limits

  • $20 plans: some can code for hours by aggressively clearing context and chunking tasks; others hit session limits within 10–60 minutes when doing agentic “auto” coding on big repos.
  • Distinction between “vibecoding” (letting the model flail through entire apps) and engineered workflows (design docs, tests, and careful review). Vibecoding burns tokens and often yields low-quality code.
  • Hybrid strategies: use a “thinker” model (Opus, GPT-5.2, Gemini 3) for planning/review and a cheaper or local “executor” (GLM 4.6, Qwen) for implementation to reduce cost.
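
A minimal sketch of that thinker/executor split is below, using two OpenAI-compatible endpoints. The base URLs, model names, and prompts are placeholders, not a specific provider recommendation.

```python
# Minimal sketch of a "thinker plans, executor implements" split.
# Endpoints, model names, and keys are placeholders.
from openai import OpenAI

thinker = OpenAI(base_url="https://api.example-frontier.com/v1", api_key="...")
executor = OpenAI(base_url="http://localhost:8080/v1", api_key="local")  # e.g. a local llama.cpp / LM Studio server

task = "Add retry-with-backoff to the HTTP client in src/http.py"

# 1. The expensive model writes the plan and review criteria.
plan = thinker.chat.completions.create(
    model="frontier-planner",  # placeholder name
    messages=[{"role": "user", "content": f"Write a short implementation plan with tests for: {task}"}],
).choices[0].message.content

# 2. The cheap or local model does the bulk of the token generation.
patch = executor.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",  # placeholder local model name
    messages=[{"role": "user", "content": f"Follow this plan exactly and output a unified diff:\n{plan}"}],
).choices[0].message.content

# 3. The expensive model reviews the output before a human looks at it.
review = thinker.chat.completions.create(
    model="frontier-planner",
    messages=[{"role": "user", "content": f"Review this diff against the plan:\n{plan}\n\n{patch}"}],
).choices[0].message.content

print(review)
```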

Tooling: LM Studio, Ollama, agents

  • LM Studio praised as the easiest cross‑platform GUI for local models, though it’s proprietary; Ollama and llama.cpp favored by those prioritizing openness and performance.
  • Claude Code/Codex/Cursor are widely seen as far ahead of open-source agentic tools (opencode, crush, etc.) due to better prompting, context/RAG, and orchestration.
  • Some run Claude Code and Codex against local models via llama.cpp’s Anthropic-compatible API, or route within tools like opencode, Cline, RooCode, and KiloCode.
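
For the llama.cpp route, a minimal sketch of pointing the Anthropic Python SDK at a local server is below. It assumes the local server exposes an Anthropic-style /v1/messages endpoint, as the thread describes; the host, port, and model name are placeholders.

```python
# Minimal sketch of talking to a local server through the Anthropic SDK,
# assuming the server exposes an Anthropic-style /v1/messages endpoint.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",  # local llama.cpp / proxy endpoint (placeholder)
    api_key="not-needed-locally",      # most local servers ignore the key
)

reply = client.messages.create(
    model="qwen2.5-coder-32b-instruct",  # whatever model the local server has loaded
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a Python function that parses RFC 3339 timestamps."}],
)
print(reply.content[0].text)

# Agent CLIs are typically redirected the same way, e.g. by setting
# ANTHROPIC_BASE_URL to the local endpoint before launching Claude Code.
```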

Philosophy: privacy, autonomy, and future trajectory

  • Many value local models for privacy, offline work, and not being beholden to vendors; others see them as hobbies until open weights reliably match frontier quality.
  • General expectation: local/open models are closing the gap but remain roughly a generation behind for coding; whether that’s “good enough” depends on project complexity and tolerance for slower, more hands-on workflows.