Llama.cpp supports Vulkan. Why doesn't Ollama?

Vulkan support and iGPU usage

  • llama.cpp has supported Vulkan for roughly a year; an Ollama Vulkan backend PR has been open for about seven months with almost no maintainer feedback, which many see as neglect.
  • Supporters of Vulkan stress it’s “existential” for consumer hardware, especially AMD/Intel iGPUs, and report big speedups over ROCm in some setups.
  • Some argue the Ollama integration work is small (mostly build flags and VRAM detection) since llama.cpp has already done the hard work; others counter that Vulkan in Ollama would still add ongoing maintenance and configuration complexity (backend selection, GPU layer allocation).
  • A forked Ollama with Vulkan and iGPU support already exists; others note separate Intel GPU support via ipex-llm.
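For context on the "build flags" claim: enabling Vulkan in llama.cpp itself is a single CMake option (per llama.cpp's build documentation); the contested work in Ollama is wiring that backend into its own device selection and VRAM detection. A minimal build sketch, assuming the Vulkan SDK is installed:

```shell
# Build llama.cpp with its Vulkan backend enabled
# (flag per llama.cpp's docs; needs Vulkan SDK + a capable driver).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Sanity-check that the system exposes a Vulkan device at all
# (vulkaninfo ships with the Vulkan SDK).
vulkaninfo --summary
```

This is the llama.cpp side only; it says nothing about how much glue Ollama would need on top.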

Why people use Ollama vs llama.cpp

  • Ollama is repeatedly praised for frictionless UX: one-command install, built‑in model library, simple ollama run semantics, auto GPU detection, on‑demand loading, and a unified HTTP API.
  • Many describe llama.cpp as powerful but intimidating: docs that foreground build instructions over prebuilt binaries, manual Hugging Face downloads, choosing among quantizations, configuring GPU layer offload, and a server positioned as a demo.
  • Several say they are technically capable but still choose Ollama because it “just works” and avoids time spent on configuration; others argue that for CLI users, installing llama.cpp is comparably easy and now supports direct model URLs.
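The "unified HTTP API" point refers to Ollama's local REST endpoint, which listens on port 11434 by default. A minimal sketch of calling its /api/generate endpoint from the Python standard library; the model name is illustrative, and actually running the final block assumes a local Ollama daemon with that model pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint.

    stream=False asks for a single JSON object instead of the
    default newline-delimited streaming chunks.
    """
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires a running Ollama daemon and a pulled model.
    req = build_generate_request("llama3", "Why is the sky blue?")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

The same endpoint serves every model Ollama manages, which is a large part of the "just works" appeal compared with standing up llama-server per model.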

Criticism of Ollama’s behavior and governance

  • Strong sentiment that Ollama “launders” llama.cpp features as its own and under‑credits upstream, despite MIT licensing permitting this.
  • Multiple examples of long‑lived PRs (Vulkan, KV cache quantization) being ignored for months, leading to frustration and talk of forking.
  • Some users report negative experiences: opaque motives for a for‑profit company, aggressive Discord moderation, dismissive comments about community feedback, and vague website messaging.
  • Several run Ollama in VMs due to a general sense of “sketchiness,” though others question why it’s considered less trustworthy than other MIT‑licensed OSS.

Models, naming, and storage decisions

  • Ollama rehosts models in its own registry and stores them split into layers; this prevents straightforward weight sharing with other tools and is perceived by some as lock‑in.
  • DeepSeek-R1 naming is a flashpoint: Ollama’s deepseek-r1:* tags default to distilled/Qwen-based variants, which many find misleading and confusing for newcomers.
  • Others argue defaulting to full 671B R1 would be impractical, but agree that clearer labeling and communication are needed.
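On the storage point: Ollama keeps pulled models in its own directory (typically ~/.ollama/models on Linux/macOS) as manifests plus sha256-named blobs, rather than as single GGUF files that other tools can load directly. A sketch that tallies those blobs, assuming that layout (the path and structure may vary by version and OS):

```python
from pathlib import Path

def summarize_blobs(models_dir: Path) -> dict[str, int]:
    """Map each blob filename (sha256-...) under models_dir/blobs to its
    size in bytes; these are the 'layers' Ollama stores instead of one
    directly shareable weights file."""
    blobs = models_dir / "blobs"
    if not blobs.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in blobs.iterdir() if p.is_file()}

if __name__ == "__main__":
    # Default location on Linux/macOS; Windows uses a different path.
    for name, size in sorted(summarize_blobs(Path.home() / ".ollama" / "models").items()):
        print(f"{name}\t{size / 1e9:.2f} GB")
```

The largest blob for a model is usually the weights themselves, which is why some users symlink or convert rather than re-download for other tools.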

Alternatives and competition

  • Several alternatives are discussed: RamaLama (containerized llama.cpp wrapper), LM Studio (GUI), llamafile (single‑binary models), kobold.cpp, OpenWebUI, and emerging projects like cortex and icebreaker.
  • There is broad agreement that Ollama has been valuable for accessibility, but many want a comparable, more community‑aligned competitor to keep it in check.