Llama.cpp supports Vulkan. Why doesn't Ollama?
Vulkan support and iGPU usage
- llama.cpp has had Vulkan for ~1 year; an Ollama Vulkan backend PR has been open ~7 months with almost no maintainer feedback, which many see as neglectful.
- Supporters of Vulkan stress it’s “existential” for consumer hardware, especially AMD/Intel iGPUs, and report big speedups over ROCm in some setups.
- Some argue the Ollama integration work is small (mostly build flags and VRAM detection) since llama.cpp already did the hard work; others note Vulkan in Ollama would still add maintenance and configuration complexity (backend selection, GPU layer allocation).
- A forked Ollama with Vulkan and iGPU support already exists; others note separate Intel GPU support via ipex-llm.
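To give a sense of the "mostly build flags" claim: enabling the Vulkan backend in llama.cpp itself is a CMake option. A minimal sketch (flag and binary names as used by llama.cpp's build system; the model path is a placeholder, and the Vulkan SDK/headers must already be installed):

```shell
# Configure llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Offload layers to the Vulkan device (-ngl = number of GPU layers)
./build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello"
```

The remaining Ollama-side work discussed in the thread is the part this doesn't show: selecting Vulkan among multiple compiled backends and detecting usable VRAM (tricky on iGPUs with shared memory) to pick the layer count automatically.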
Why people use Ollama vs llama.cpp
- Ollama is repeatedly praised for frictionless UX: one-command install, built‑in model library, simple `ollama run` semantics, auto GPU detection, on‑demand loading, and a unified HTTP API.
- Many describe llama.cpp as powerful but intimidating: build instructions foregrounded over prebuilt binaries, manual Hugging Face downloads, picking quantizations, configuring GPU layers, and only a demo server.
- Several say they are technically capable but still choose Ollama because it “just works” and avoids time spent on configuration; others argue that for CLI users, installing llama.cpp is comparably easy and now supports direct model URLs.
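The "unified HTTP API" praised above refers to Ollama's local REST endpoint. A minimal sketch of building a request body for its `/api/generate` endpoint (the endpoint and JSON fields match Ollama's documented API; the model name is just an example, and no request is actually sent here):

```python
import json

# Ollama's default local endpoint
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate_request(model: str, prompt: str, stream: bool = False) -> bytes:
    """Build the JSON body Ollama expects for a one-shot generation."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

body = generate_request("llama3", "Why is the sky blue?")
print(json.loads(body)["model"])  # → llama3
```

With a running server, this body would be POSTed to `OLLAMA_URL` (e.g. via `urllib.request`); the same endpoint works for every installed model, which is the point commenters contrast with wiring up llama.cpp's demo server per model.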
Criticism of Ollama’s behavior and governance
- Strong sentiment that Ollama “launders” llama.cpp features as its own and under‑credits upstream, despite MIT licensing permitting this.
- Multiple examples of long‑lived PRs (Vulkan, KV cache quantization) being ignored for months, leading to frustration and talk of forking.
- Some users report negative experiences: opaque motives for a for‑profit company, aggressive Discord moderation, dismissive comments about community feedback, and vague website messaging.
- Several run Ollama in VMs due to a general sense of “sketchiness,” though others question why it’s considered less trustworthy than other MIT‑licensed OSS.
Models, naming, and storage decisions
- Ollama rehosts models in its own registry and stores them split into layers; this prevents straightforward weight sharing with other tools and is perceived by some as lock‑in.
- DeepSeek-R1 naming is a flashpoint: Ollama's `deepseek-r1:*` tags default to distilled/Qwen-based variants, which many find misleading and confusing for newcomers.
- Others argue defaulting to the full 671B R1 would be impractical, but agree that clearer labeling and communication are needed.
Alternatives and competition
- Several alternatives are discussed: RamaLama (containerized llama.cpp wrapper), LM Studio (GUI), llamafile (single‑binary models), kobold.cpp, OpenWebUI, and emerging projects like cortex and icebreaker.
- There is broad agreement that Ollama has been valuable for accessibility, but many want a comparable, more community‑aligned competitor to keep it in check.