Local Qwen isn't a worse Opus, it's a different tool
Local models vs frontier models
- Local Qwen and similar models are framed as different tools, not drop-in replacements for top hosted models.
- Strengths cited: privacy, data control, predictable behavior for repetitive tasks, ability to run in air‑gapped or strict enterprise environments, and lower marginal cost for heavy personal use.
- Weaknesses: struggle with long or complex, multi-step tasks; prone to looping, losing focus, or “giving up”; lower raw capability than Opus/GPT on hard reasoning and broad knowledge.
Performance, hardware, and cost
- Many commenters run quantized Qwen 27B/35B on 3090, 4090, RTX 6000, or Intel Arc GPUs; reported speeds range from ~18–60 tok/s on consumer cards to 130–200 tok/s on high-end hardware.
- Power use is a concern: long generations at hundreds of watts make some local setups more energy‑costly per token than cloud APIs.
- Mixture‑of‑Experts models are noted for better tokens/sec and VRAM fit but sometimes lower quality than comparable dense models.
- vLLM vs llama.cpp: vLLM is praised for multi-user batching and production serving, llama.cpp for quick startup, flexibility, and single-user workflows. Some report that vLLM is significantly more stable and less prone to looping; others find it slower and less flexible for prosumer use.
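
The power-use concern above comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not a measurement: the function name and the example wattage, throughput, and electricity price are assumptions chosen for the example, and the result covers electricity only (no hardware cost or idle draw).

```python
def energy_cost_per_million_tokens(
    watts: float, tokens_per_second: float, price_per_kwh: float
) -> float:
    """Electricity cost of generating one million tokens.

    watts: average power draw while generating (assumed constant).
    tokens_per_second: sustained generation speed.
    price_per_kwh: electricity price; the result is in the same currency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds (joules) to kWh
    return kwh * price_per_kwh


# Hypothetical setup: 350 W draw, 30 tok/s, 0.30 per kWh.
cost = energy_cost_per_million_tokens(350, 30, 0.30)
print(f"{cost:.2f} per million tokens")
```

With these assumed numbers the electricity alone comes to roughly 0.97 per million generated tokens. Whether that is above or below a given cloud API price depends entirely on the actual wattage, throughput, and local tariff.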
Harnesses, agents, and tool use
- A recurring theme: harness design (agents, tool-calling, memory, routing) matters as much as the underlying model.
- People run multiple models (small/fast vs large/slow) and route tasks based on latency and difficulty.
- There is interest in local models as “assistants” that handle 80% of the work and escalate hard cases to a cloud “big model,” but building reliable routing, and getting a model to recognize when a task is beyond it, is described as nontrivial.
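
The local-first, escalate-on-failure pattern described above might be sketched like this. It is only an illustration under stated assumptions: `local`, `cloud`, and `accept` are hypothetical callables the caller supplies (for example, wrappers around a local server and a hosted API, plus a validity check such as "the output parses" or "the tests pass"). The hard part the discussion points to, a trustworthy `accept` check, is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedAnswer:
    answer: str
    used: str  # "local" or "cloud"


def route(
    prompt: str,
    local: Callable[[str], str],         # hypothetical: calls the local model
    cloud: Callable[[str], str],         # hypothetical: calls the hosted model
    accept: Callable[[str, str], bool],  # hypothetical: is this answer good enough?
    max_local_attempts: int = 2,
) -> RoutedAnswer:
    """Try the local model first; escalate to the cloud model if no answer is accepted."""
    for _ in range(max_local_attempts):
        candidate = local(prompt)
        if accept(prompt, candidate):
            return RoutedAnswer(candidate, "local")
    return RoutedAnswer(cloud(prompt), "cloud")


# Stub models for demonstration only.
result = route(
    "2 + 2 = ?",
    local=lambda p: "4",
    cloud=lambda p: "4 (from cloud)",
    accept=lambda p, a: a.strip() == "4",
)
print(result.used)
```

Retrying locally before escalating trades latency for cost; a real harness would probably also cap total time and log which prompts escalate, so the routing rule can be tuned.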
Prompting styles, “vibes,” and benchmarks
- Different models respond best to different prompting styles: some reward under-specification and creativity, others need highly precise instructions, and some prefer structured formats (XML/JSON).
- Users report large variability across runs and sensitivity to tiny wording changes; “magic words” and emotional tone sometimes change outcomes.
- Benchmarks are widely viewed as weak proxies: easily gamed, often misaligned with real workflows, and blind to harness UX. Commenters call for more task-specific, human-rated evaluations.
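
Given the run-to-run variability reported above, one way to make a task-specific check less anecdotal is to repeat each prompt several times and record a pass rate rather than a single outcome. The sketch below assumes a hypothetical `generate` callable (the model plus harness under test) and a hypothetical `check` callable (an automated or human-supplied judgment); neither comes from the discussion.

```python
from typing import Callable, Dict, Iterable


def pass_rates(
    generate: Callable[[str], str],  # hypothetical: model + harness under test
    prompts: Iterable[str],
    check: Callable[[str], bool],    # hypothetical: does this output pass?
    runs: int = 5,
) -> Dict[str, float]:
    """Run each prompt `runs` times and return the fraction of passing outputs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    rates: Dict[str, float] = {}
    for prompt in prompts:
        passed = sum(1 for _ in range(runs) if check(generate(prompt)))
        rates[prompt] = passed / runs
    return rates
```

Running the same function on two wordings of a prompt gives a rough, quantitative view of the wording sensitivity people describe, though a handful of runs only yields a coarse estimate.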
Privacy, sovereignty, and skepticism
- Strong desire for local/open-weight ecosystems for health data, smart homes, and corporate IP, to avoid vendor lock‑in and opaque cloud policies.
- Skeptics question ROI, the flakiness of hardware/software stacks, and article hype; others argue local capability is improving so fast that today’s limits may not hold for long.