LibreCUDA – Launch CUDA code on Nvidia GPUs without the proprietary runtime

Purpose and Motivation

  • The project offers a minimal, open-source CUDA runtime that talks directly to Nvidia’s low-level RM (Resource Manager) interface via ioctls, bypassing the proprietary CUDA user-space stack.
  • Goals cited:
    • Learn how the stack works and have a “simple, transparent” reference implementation.
    • Enable lighter, more debuggable environments and eventually help port CUDA-like APIs to other platforms (e.g., *BSD).
    • Challenge Nvidia’s dominance and licensing constraints, and provide an open stack for research and verification.
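
As a rough illustration of what "talking to the driver via ioctls" involves, the sketch below packs a Linux ioctl request number the way the kernel's generic `_IOC` macro does on x86 and ARM. The helper name `ioc` and the example arguments (magic byte `'F'`, command `0x2A`, a 32-byte payload) are placeholders chosen for illustration, not values taken from LibreCUDA; the real numbers and parameter structs would have to come from the Nvidia kernel module headers.

```python
# Bit layout of the asm-generic _IOC() macro (as used on x86 and ARM).
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30

IOC_WRITE = 1  # user space writes data to the driver
IOC_READ = 2   # user space reads data back from the driver


def ioc(direction: int, magic: int, nr: int, size: int) -> int:
    """Pack direction, magic byte, command number and payload size
    into a single 32-bit ioctl request number."""
    if not 0 <= nr < (1 << IOC_NRBITS):
        raise ValueError("command number does not fit in 8 bits")
    if not 0 <= magic < (1 << IOC_TYPEBITS):
        raise ValueError("magic byte does not fit in 8 bits")
    if not 0 <= size < (1 << IOC_SIZEBITS):
        raise ValueError("payload size does not fit in 14 bits")
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (magic << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


if __name__ == "__main__":
    # Hypothetical read/write request: magic 'F', command 0x2A, 32-byte struct.
    request = ioc(IOC_READ | IOC_WRITE, ord("F"), 0x2A, 32)
    print(hex(request))
```

A real runtime would pass such a request number, together with a packed parameter struct, to `fcntl.ioctl` on an open driver device node; that step is left out here because it needs the driver and hardware to be present.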

Technical Scope and Limitations

  • Still very early; only a small fraction of the CUDA API is implemented. Several commenters say it must grow ~100× in coverage to be generally usable.
  • It still requires Nvidia’s kernel driver (either the proprietary one or the newer open kernel modules). The GPU firmware, including the GSP firmware, remains proprietary.
  • Precompiled CUDA binaries (ELF files containing SASS) are still needed as input; replacing ptxas (the proprietary PTX→SASS compiler) is described as “highly non-trivial” because Nvidia’s ISAs and instruction latencies are undocumented and instruction scheduling is complex.
  • Some view it as a reference implementation that only needs to run simple workloads, which would help tell driver bugs apart from compiler bugs and hardware bugs.
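
To make the "ELF with SASS" point concrete, here is a minimal sketch of how a loader might sanity-check that a blob is a CUDA ELF image before handing it to the GPU. It relies on `EM_CUDA = 190`, the machine type assigned to NVIDIA CUDA in `elf.h`; the function name `looks_like_cubin` is hypothetical, and the check only inspects the ELF header, so it says nothing about whether the SASS inside matches the target GPU architecture.

```python
import struct

EM_CUDA = 190  # e_machine value assigned to NVIDIA CUDA in elf.h


def looks_like_cubin(data: bytes) -> bool:
    """Return True if `data` starts with an ELF header whose
    e_machine field identifies it as a CUDA binary."""
    # Need the 16-byte e_ident, e_type (2 bytes) and e_machine (2 bytes).
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return False
    # e_ident[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
    if data[5] == 1:
        endian = "<"
    elif data[5] == 2:
        endian = ">"
    else:
        return False
    # e_machine sits at offset 18 in both 32-bit and 64-bit ELF headers.
    (e_machine,) = struct.unpack_from(endian + "H", data, 18)
    return e_machine == EM_CUDA
```

Reading the first few dozen bytes of a `.cubin` file and passing them to `looks_like_cubin` should be enough for this check, since only the header is examined.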

Legal, Licensing, and Trademark Issues

  • Concern that using “CUDA” in the project name and API prefixes invites trademark trouble; discussion of likelihood-of-confusion tests and how far trademarks can reach across industries.
  • Debate over whether this can be used to bypass Nvidia’s “no GeForce in datacenters” EULA clause by pairing consumer GPUs with the open kernel driver and AOT-compiled kernels. Applicability of overlapping Nvidia licenses is described as murky and potentially riskier for companies than individuals.

Alternatives and Broader Ecosystem

  • Mention of related efforts: ZLUDA (CUDA on non-Nvidia hardware), tinygrad’s direct-ioctl runtimes for AMD and Nvidia, and other compiler stacks (Triton, Numba, Julia, JAX, etc.).
  • Some argue Vulkan compute or OpenCL could have been the open standard, but Vulkan/SPIR-V semantics and tooling are seen as less suitable than CUDA, and OpenCL is viewed as effectively abandoned.

Debate Over Value

  • Supporters: open stacks reduce lock-in, enable unsupported platforms, avoid vendor whims, and are useful even if used only for correctness testing.
  • Skeptics: as long as it only runs on Nvidia hardware and is incomplete, the practical benefit is limited; the real win would be a mature CUDA-like API on non-Nvidia GPUs.