Llama.cpp supports Vulkan. Why doesn't Ollama?

Vulkan support and iGPU usage

  • llama.cpp has supported Vulkan for roughly a year; an Ollama Vulkan backend PR has been open for about seven months with almost no maintainer feedback, which many see as neglect.
  • Supporters of Vulkan stress it’s “existential” for consumer hardware, especially AMD/Intel iGPUs, and report big speedups over ROCm in some setups.
  • Some argue the Ollama integration work is small (mostly build flags and VRAM detection) since llama.cpp has already done the hard work; others counter that Vulkan in Ollama would still add ongoing maintenance and configuration complexity (backend selection, GPU layer allocation).
  • A forked Ollama with Vulkan and iGPU support already exists; others note separate Intel GPU support via ipex-llm.
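For context on the "build flags" claim: enabling Vulkan in llama.cpp itself is a single CMake option (per llama.cpp's build documentation); the contested work in Ollama is wiring that backend into its own device selection and VRAM detection. A minimal build sketch, assuming the Vulkan SDK is installed:

```shell
# Build llama.cpp with its Vulkan backend enabled
# (flag per llama.cpp's docs; needs Vulkan SDK + a capable driver).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Sanity-check that the system exposes a Vulkan device at all
# (vulkaninfo ships with the Vulkan SDK).
vulkaninfo --summary
```

This is the llama.cpp side only; it says nothing about how much glue Ollama would need on top.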

Why people use Ollama vs llama.cpp

  • Ollama is repeatedly praised for frictionless UX: one-command install, built‑in model library, simple ollama run semantics, auto GPU detection, on‑demand loading, and a unified HTTP API.
  • Many describe llama.cpp as powerful but intimidating: docs that foreground build instructions over prebuilt binaries, manual Hugging Face downloads, choosing among quantizations, configuring GPU layer offload, and a server positioned as a demo.
  • Several say they are technically capable but still choose Ollama because it “just works” and avoids time spent on configuration; others argue that for CLI users, installing llama.cpp is comparably easy and now supports direct model URLs.
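The "unified HTTP API" point refers to Ollama's local REST endpoint, which listens on port 11434 by default. A minimal sketch of calling its /api/generate endpoint from the Python standard library; the model name is illustrative, and actually running the final block assumes a local Ollama daemon with that model pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint.

    stream=False asks for a single JSON object instead of the
    default newline-delimited streaming chunks.
    """
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires a running Ollama daemon and a pulled model.
    req = build_generate_request("llama3", "Why is the sky blue?")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

The same endpoint serves every model Ollama manages, which is a large part of the "just works" appeal compared with standing up llama-server per model.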

Criticism of Ollama’s behavior and governance

  • Strong sentiment that Ollama “launders” llama.cpp features as its own and under‑credits upstream, despite MIT licensing permitting this.
  • Multiple examples of long‑lived PRs (Vulkan, KV cache quantization) being ignored for months, leading to frustration and talk of forking.
  • Some users report negative experiences: opaque motives for a for‑profit company, aggressive Discord moderation, dismissive comments about community feedback, and vague website messaging.
  • Several run Ollama in VMs due to a general sense of “sketchiness,” though others question why it’s considered less trustworthy than other MIT‑licensed OSS.

Models, naming, and storage decisions

  • Ollama rehosts models in its own registry and stores them split into layers; this prevents straightforward weight sharing with other tools and is perceived by some as lock‑in.
  • DeepSeek-R1 naming is a flashpoint: Ollama’s deepseek-r1:* tags default to distilled/Qwen-based variants, which many find misleading and confusing for newcomers.
  • Others argue defaulting to full 671B R1 would be impractical, but agree that clearer labeling and communication are needed.
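On the storage point: Ollama keeps pulled models in its own directory (typically ~/.ollama/models on Linux/macOS) as manifests plus sha256-named blobs, rather than as single GGUF files that other tools can load directly. A sketch that tallies those blobs, assuming that layout (the path and structure may vary by version and OS):

```python
from pathlib import Path

def summarize_blobs(models_dir: Path) -> dict[str, int]:
    """Map each blob filename (sha256-...) under models_dir/blobs to its
    size in bytes; these are the 'layers' Ollama stores instead of one
    directly shareable weights file."""
    blobs = models_dir / "blobs"
    if not blobs.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in blobs.iterdir() if p.is_file()}

if __name__ == "__main__":
    # Default location on Linux/macOS; Windows uses a different path.
    for name, size in sorted(summarize_blobs(Path.home() / ".ollama" / "models").items()):
        print(f"{name}\t{size / 1e9:.2f} GB")
```

The largest blob for a model is usually the weights themselves, which is why some users symlink or convert rather than re-download for other tools.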

Alternatives and competition

  • Several alternatives are discussed: RamaLama (containerized llama.cpp wrapper), LM Studio (GUI), llamafile (single‑binary models), kobold.cpp, OpenWebUI, and emerging projects like cortex and icebreaker.
  • There is broad agreement that Ollama has been valuable for accessibility, but many want a comparable, more community‑aligned competitor to keep it in check.