Running local LLMs offline on a ten-hour flight

Local LLM usefulness and limitations

  • Several commenters report that current local models (e.g., Qwen3.x, Gemma) often get stuck in loops or fail on multi-file or “agentic” coding tasks, even on high‑end Macs and GPUs.
  • Others say they are very productive with local models, especially for small, well‑scoped tasks: single‑file refactors, config migrations, scripts, math, library usage examples.
  • Consensus: they are far from frontier cloud models for sustained, multi‑step reasoning, but can be “good enough” for many day‑to‑day tasks if you adjust expectations and workflow.

Sampling, quantization, and tooling details

  • Multiple comments stress the importance of correct sampling parameters. Default or lab‑recommended settings often cause looping.
  • One contributor argues min_p is strictly better than top_p/top_k for avoiding degeneration and loops, and suggests using more distribution‑aware samplers when available.
  • Heavy quantization (and KV‑cache quantization) plus immature GGUF/engine support are blamed for degraded behavior; best results are reported with higher‑precision quants and vLLM.
  • IDE harnesses (Claude Code–style, Cline, Roo Code, custom “AI harnesses”) strongly affect perceived quality; shorter prompts and fewer tools seem to work better for local models.
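The min_p idea mentioned above can be sketched in a few lines. This is an illustrative implementation, not any particular engine's: instead of keeping a fixed top_k count or a fixed top_p cumulative mass, min_p keeps every token whose probability is at least `min_p` times the most likely token's probability, so the cutoff adapts to how peaked the distribution is (the `logits` values and `min_p=0.1` below are made up for the demo).

```python
import math
import random

def min_p_filter(logits, min_p=0.1):
    """min_p sampling filter: keep tokens whose probability is at least
    min_p times the top token's probability, then renormalize."""
    # Softmax over raw logits (subtract the max for numeric stability).
    m = max(logits.values())
    exps = {tok: math.exp(l - m) for tok, l in logits.items()}
    z = sum(exps.values())
    probs = {tok: e / z for tok, e in exps.items()}
    # Threshold scales with the distribution's peak, unlike top_k/top_p.
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    z2 = sum(kept.values())
    return {tok: p / z2 for tok, p in kept.items()}

def sample(filtered, rng=random.random):
    """Draw one token from the renormalized distribution."""
    r, acc = rng(), 0.0
    for tok, p in filtered.items():
        acc += p
        if r < acc:
            return tok
    return tok  # guard against floating-point rounding at the tail

# Toy vocabulary: a confident head plus a junk tail that can cause loops
# or degeneration if it is ever sampled.
logits = {"the": 5.0, "a": 4.5, "cat": 2.0, "xyzzy": -1.0}
dist = min_p_filter(logits, min_p=0.1)  # drops "cat" and "xyzzy"
```

Because the threshold is relative, a sharply peaked distribution prunes aggressively while a flat one keeps many candidates, which is the degeneration/looping argument made in the thread.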

Local vs remote models and connectivity

  • Some view fully local inference on laptops as a fun proof‑of‑concept but prefer running big models on a home/server GPU and accessing them via VPN, tmux/mosh, etc., for better thermals and battery.
  • Hybrid routing is recommended: local models for easy tasks, cloud/frontier (or powerful remote self‑hosted) models for hard, multi‑step work.
  • With in‑flight internet (e.g., Starlink), some argue offline local models are increasingly unnecessary, though others object to particular providers and still value offline/privacy.
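The hybrid-routing recommendation could look something like the sketch below, assuming both models sit behind OpenAI-compatible endpoints (a local llama.cpp/vLLM server plus a home GPU box reached over VPN). The URLs, keyword heuristics, and length cutoff are all illustrative stand-ins, not a recommendation of specific values.

```python
# Hypothetical endpoints: a local inference server and a remote GPU box
# reached over VPN. Both URLs are placeholders.
LOCAL_URL = "http://127.0.0.1:8080/v1/chat/completions"
REMOTE_URL = "http://10.0.0.2:8000/v1/chat/completions"

# Crude markers of multi-step or "agentic" work; tune to your own tasks.
HARD_HINTS = ("multi-file", "refactor the whole", "agent", "plan", "debug across")

def choose_endpoint(prompt: str, max_local_chars: int = 4000) -> str:
    """Route easy, well-scoped prompts to the local model and long or
    multi-step prompts to the bigger remote model."""
    lowered = prompt.lower()
    if len(prompt) > max_local_chars or any(h in lowered for h in HARD_HINTS):
        return REMOTE_URL
    return LOCAL_URL
```

A real router would also fall back to `LOCAL_URL` when the VPN is unreachable (the offline-on-a-plane case), which is a one-line try/except around the remote call.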

Hardware, heat, and power on planes

  • Many report that laptops (Macs and PCs) quickly spin up fans, run very hot, and drain batteries fast under sustained LLM loads; some worry about hardware lifespan or even safety.
  • There is discussion of power limits on in‑seat outlets, throttling, and fan‑control or cooling hacks.

Ergonomics, comfort, and attitudes to work while flying

  • Several find using a 14–16" laptop in economy claustrophobic; concerns include seat recline crushing screens and “T‑rex arms” posture.
  • Workarounds: Bluetooth keyboards on the lap, AR glasses (Xreal) as virtual displays, though reading/code quality is mixed.
  • Opinions diverge on whether one should work with LLMs on flights vs. just read or sleep; some lament the erosion of downtime.

Social and cost tangents

  • Debate over the relatability of using a €6k laptop for this use case; others note that cost is small relative to developer salaries.
  • A heated side thread argues about obese or large passengers in cramped economy seating and who should bear accommodation costs.

Other notes

  • Some emphasize privacy as a key reason to prefer local models.
  • Benchmarks are viewed with suspicion; self‑defined, realistic task suites are preferred.
  • In response to a question about live web search, commenters note that paid services (e.g., Exa) exist; whether a free, Google‑style integration is available remains unclear.
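The "self-defined, realistic task suites" preference above is easy to operationalize: a minimal sketch, assuming only that you can wrap any model (local or cloud) as a prompt-to-text function. The specific tasks and string checks here are illustrative placeholders; real suites would use tasks drawn from your own work.

```python
from typing import Callable

# Each task pairs a realistic prompt with a programmatic check of the
# model's answer. Both tasks below are illustrative stand-ins.
TASKS = [
    ("Write a Python expression that reverses a string s.",
     lambda out: "s[::-1]" in out),
    ("What is 17 * 23?",
     lambda out: "391" in out),
]

def run_suite(generate: Callable[[str], str]) -> float:
    """Score a model on the personal task suite.
    `generate` maps a prompt to the model's text response;
    returns the fraction of tasks whose check passed."""
    passed = sum(1 for prompt, check in TASKS if check(generate(prompt)))
    return passed / len(TASKS)

# Smoke-test with a canned "model" that answers from a lookup table.
canned = {TASKS[0][0]: "s[::-1]", TASKS[1][0]: "17 * 23 = 391"}
score = run_suite(lambda p: canned.get(p, ""))  # -> 1.0
```

Running the same suite against a local model and a frontier model gives the apples-to-apples comparison that public benchmarks, per the thread, do not.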