Can I run AI locally?

Scope & Purpose of the Site

  • The tool estimates which LLMs can run locally, and at roughly what tokens/second, based largely on VRAM, RAM, and memory bandwidth.
  • Many find the idea very useful, especially for buying decisions and as a quick “can I run X?” reference.
  • Several say it’s reminiscent of old “Can You Run It?” PC game requirement checkers.
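
The bandwidth-based estimate the tool seems to rely on can be sketched as a rule of thumb (this is an illustration, not the site's actual method; the bandwidth and quantization figures are assumed): for a memory-bound model, each generated token reads every weight once, so tokens/second is roughly bandwidth divided by model size in bytes.

```python
def estimate_tokens_per_sec(bandwidth_gb_s: float,
                            params_billions: float,
                            bytes_per_param: float = 2.0) -> float:
    """Rough upper bound for generation speed: each token reads all
    weights once, so tokens/s ~ memory bandwidth / model size in bytes."""
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb

# Hypothetical example: an 8B model at 4-bit (~0.5 bytes/param)
# on a GPU with ~1000 GB/s of memory bandwidth.
print(round(estimate_tokens_per_sec(1000, 8, 0.5)))  # → 250
```

Real throughput also depends on prefill speed, context length, and KV-cache size, which is part of why commenters find single-number estimates misleading.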

Accuracy, Data Quality & Gaps

  • Multiple reports that estimates are significantly off: models marked “can’t run” or “slow” actually run much faster in practice (e.g., Qwen 3.5 35B, GPT-OSS 120B, big MoE models).
  • Site appears to conflate prefill and generation speeds and may overstate Apple Silicon performance; some call this “nonsense” or “LLM‑generated.”
  • MoE models: the calculator appears to use total rather than active parameters, underestimating their speed.
  • Quantization, mmap, KV-cache offloading, and unified/shared memory (Apple/AMD/Intel iGPUs) are mostly ignored, so many real‑world configurations aren’t captured.
  • The hardware list is incomplete or incorrect for many users: missing RTX Pro 6000, A4000, 4050/5060 Ti, some Teslas, mobile GPUs, Google Tensor chips, Strix Halo, and various AMD/Intel SKUs; RAM caps for the M3 Ultra are wrong; it lists a non‑existent “M4 Ultra.”
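
The MoE complaint can be made concrete with the same memory-bound rule of thumb: only the active-expert weights are read per token, so a bandwidth-based estimate should divide by active, not total, parameters. A minimal sketch (the 400 GB/s bandwidth is assumed, and the GPT-OSS 120B figures of ~117B total / ~5.1B active parameters are approximate):

```python
def moe_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_billions: float,
                       bytes_per_param: float = 2.0) -> float:
    """Memory-bound estimate using only the parameters read per token."""
    return bandwidth_gb_s / (active_params_billions * bytes_per_param)

total, active = 117.0, 5.1  # approximate GPT-OSS 120B parameter counts
wrong = moe_tokens_per_sec(400, total)   # mistake: dividing by total params
right = moe_tokens_per_sec(400, active)  # only active experts are read
print(f"{wrong:.1f} vs {right:.1f} tok/s")
```

Using total parameters makes the estimate roughly 23x too pessimistic here, which matches reports of big MoE models running far faster than the site predicts.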

UX & Feature Requests

  • Requests for:
    • Ability to choose a model first and see performance across hardware.
    • Filters by task (coding, extraction, vision, embeddings) and by model quality, not just speed.
    • Clearer explanation of ratings (S/A/B/…) and metrics like latency/time‑to‑first‑token.
    • Better handling of quant levels, context sizes, and tool‑use behavior.
    • Higher-contrast, larger UI text; better mobile layout.
  • Some want crowdsourced, benchmark‑style data instead of pure estimation.

Privacy & Hardware Detection

  • Site uses browser APIs/WebGL/WebGPU as a heuristic for hardware; some are surprised their GPU specs are visible to websites and see fingerprinting risks.
  • Others note that the detection is often wrong (e.g., misreporting VRAM or the GPU model).

Local vs Cloud Tradeoffs

  • Several argue economics and quality still favor cloud (Groq, frontier APIs), with huge speed/quality gaps.
  • Others prioritize privacy, offline access, experimentation freedom, and narrow local tasks (OCR, STT, embeddings, small coding helpers) despite slower, weaker models.