Local Qwen isn't a worse Opus, it's a different tool

Local models vs frontier models

  • Local Qwen and similar models are framed as different tools, not drop-in replacements for top hosted models.
  • Strengths cited: privacy, data control, predictable behavior for repetitive tasks, ability to run in air‑gapped or strict enterprise environments, and lower marginal cost for heavy personal use.
  • Weaknesses: struggle with long or complex, multi-step tasks; prone to looping, losing focus, or “giving up”; lower raw capability than Opus/GPT on hard reasoning and broad knowledge.

Performance, hardware, and cost

  • Many commenters run quantized Qwen 27B/35B on RTX 3090, RTX 4090, RTX 6000, or Intel Arc GPUs; reported speeds range from roughly 18–60 tok/s on consumer cards to 130–200 tok/s on high-end hardware.
  • Power use is a concern: long generations drawing hundreds of watts can make some local setups more expensive per token, in energy terms, than cloud APIs.
  • Mixture‑of‑Experts models are noted for better tokens/sec and VRAM fit but sometimes lower quality than comparable dense models.
  • vLLM vs llama.cpp: vLLM is praised for multi-user batching and production serving, llama.cpp for quick startup, flexibility, and single-user workflows. Some report that vLLM is significantly more stable and less prone to looping; others find it slower and less flexible for prosumer use.
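
The power-use point above comes down to simple arithmetic, which can be sketched as follows. This is a rough illustration, not a measurement: the 350 W draw, 30 tok/s speed, and 0.30 per kWh electricity price are assumed example values (the wattage and speed are picked from within the ranges reported above), and the function name is hypothetical. Compare the result against whatever your cloud provider actually charges.

```python
def energy_cost_per_million_tokens(watts: float,
                                   tokens_per_second: float,
                                   price_per_kwh: float) -> float:
    """Electricity cost of generating one million tokens at a steady power draw."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds = 1_000_000 / tokens_per_second   # time needed for 1M tokens
    kwh = watts * seconds / 3_600_000         # watt-seconds -> kilowatt-hours
    return kwh * price_per_kwh


# Assumed example values, not figures from the discussion.
cost = energy_cost_per_million_tokens(watts=350, tokens_per_second=30, price_per_kwh=0.30)
print(f"{cost:.2f} per million tokens")
```

The sketch ignores idle power, prompt processing, and hardware cost, so it gives a lower bound on the real per-token cost of a local setup.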

Harnesses, agents, and tool use

  • A recurring theme: harness design (agents, tool-calling, memory, routing) matters as much as the underlying model.
  • People run multiple models (small/fast vs large/slow) and route tasks based on latency and difficulty.
  • There is interest in local models as “assistants” that handle 80% of the work and escalate hard cases to a cloud “big model,” but building reliable routing, and giving a model enough self-knowledge to recognize when a task is beyond it, is described as nontrivial.
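
The local-first, escalate-when-needed pattern described above might look like the following minimal sketch. Everything here is an assumption made for illustration: the difficulty heuristic, the keyword list, the 0.5 threshold, and the "stuck" check are hypothetical stand-ins, and the two models are plain callables rather than any real client library. It also shows why commenters call this nontrivial, since both heuristics are crude.

```python
from typing import Callable, Tuple

Model = Callable[[str], str]  # hypothetical: any function mapping prompt -> text


def estimate_difficulty(prompt: str) -> float:
    """Hypothetical heuristic: long prompts and multi-step wording score higher (0..1)."""
    score = min(len(prompt) / 4000, 1.0)
    if any(word in prompt.lower() for word in ("step by step", "refactor", "prove")):
        score = min(score + 0.5, 1.0)
    return score


def looks_stuck(text: str, max_repeats: int = 3) -> bool:
    """Crude loop/give-up check: empty output, or one line repeated max_repeats times."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return True
    return any(lines.count(ln) >= max_repeats for ln in set(lines))


def route(prompt: str, local: Model, cloud: Model,
          threshold: float = 0.5) -> Tuple[str, str]:
    """Try the local model first; escalate hard prompts or stuck outputs to the cloud."""
    if estimate_difficulty(prompt) >= threshold:
        return "cloud", cloud(prompt)
    answer = local(prompt)
    if looks_stuck(answer):
        return "cloud", cloud(prompt)
    return "local", answer
```

In use, `local` and `cloud` would wrap whatever inference endpoints you run; returning the chosen route alongside the answer makes it easy to log how often escalation happens.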

Prompting styles, “vibes,” and benchmarks

  • Different models respond best to different prompting styles: some reward under-specification and creativity, others need highly precise instructions, and still others work best with structured formats (XML/JSON).
  • Users report large variability across runs and sensitivity to tiny wording changes; “magic words” and emotional tone sometimes change outcomes.
  • Benchmarks are widely viewed as weak proxies: easily gamed, often misaligned with real workflows, and ignoring harness UX. There are calls for more task-specific, human-rated evaluations.
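
One simple way to put a number on the run-to-run variability mentioned above is to send the same prompt several times and measure how often the most common answer appears. The sketch below assumes short, directly comparable outputs (for example a label or a yes/no answer); the function name and the normalization are hypothetical choices, and free-form text would need human rating or a task-specific comparison instead.

```python
from collections import Counter
from typing import Sequence


def agreement_rate(outputs: Sequence[str]) -> float:
    """Fraction of runs that match the most common answer, after light normalization."""
    if not outputs:
        raise ValueError("need at least one output")
    normalized = [o.strip().lower() for o in outputs]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


# Four hypothetical runs of the same prompt; three agree after normalization.
print(agreement_rate(["Yes", "yes ", "no", "yes"]))  # 0.75
```

Running the same check on two wordings of a prompt gives a rough sense of how sensitive a model is to small phrasing changes.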

Privacy, sovereignty, and skepticism

  • Strong desire for local/open-weight ecosystems for health data, smart homes, and corporate IP, to avoid vendor lock‑in and opaque cloud policies.
  • Skeptics question ROI, the flakiness of hardware/software stacks, and article hype; others argue local capability is improving so fast that today’s limits may not hold for long.