Rust running on every GPU

Overall reaction & goals

  • Many commenters are impressed that ordinary no_std Rust crates can run on GPUs with little or no change, seeing this as unlocking cross‑CPU/GPU libraries and easier GPU adoption for “CPU programmers.”
  • Core goal as understood: write a single Rust implementation (host + kernels) and run it across CPUs, CUDA, Vulkan/Metal/D3D, WebGPU, etc., with conditional compilation and runtime backend selection.
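The conditional‑compilation half of that goal can be sketched in a few lines of Rust: one shared function body, with the host entry point gated on the compilation target. The `spirv` target_arch value is what the rust‑gpu backend compiles for; the exact attribute details here are illustrative, not the project's documented API.

```rust
// One shared implementation, compiled for both CPU and GPU targets.
// When built for a GPU target the crate drops std; on the host it keeps it.
#![cfg_attr(target_arch = "spirv", no_std)]

/// Plain no_std-compatible Rust: callable from a GPU kernel or a CPU test.
pub fn saxpy(a: f32, x: f32, y: f32) -> f32 {
    a * x + y
}

// Host-only entry point. In a GPU build this would instead be a kernel
// function carrying the backend's entry-point attribute.
#[cfg(not(target_arch = "spirv"))]
fn main() {
    assert_eq!(saxpy(2.0, 3.0, 1.0), 7.0);
    println!("saxpy on the host: {}", saxpy(2.0, 3.0, 1.0));
}
```

The point of the sketch is that the shared function is ordinary Rust, so the same code path can be unit‑tested on the CPU and compiled unchanged for a device target.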

Abstraction vs performance

  • Performance‑focused participants distrust high‑level GPU abstractions; they want direct control over low‑level differences between vendors and architectures.
  • Concern that a portable Rust layer becomes a “jack of all trades,” incapable of exploiting vendor‑specific features (warp reductions, tensor cores, ray tracing, bindless, etc.).
  • Others argue all GPU APIs (including CUDA) are abstractions; you pick a trade‑off between engineering capacity and peak performance, and can still drop down to PTX/ISA where needed.
  • Some expect backend‑specific optimization passes and intrinsic support to improve over time, but acknowledge that truly optimal code may still require per‑architecture kernels.
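The "drop down to PTX/ISA where needed" escape hatch can be expressed with complementary cfg items: a portable fallback plus a per‑architecture specialization behind the same signature. The NVPTX body below is only a placeholder; a real version would use warp intrinsics or inline PTX, and the function names here are made up for illustration.

```rust
// Portable fallback: plain Rust, used on every target except NVIDIA.
#[cfg(not(target_arch = "nvptx64"))]
pub fn reduce_sum(data: &[f32]) -> f32 {
    data.iter().sum()
}

// NVIDIA specialization: this is where a warp-shuffle reduction or inline
// PTX (via asm!) would live. It is never compiled on the host, so the body
// is only a marker.
#[cfg(target_arch = "nvptx64")]
pub fn reduce_sum(data: &[f32]) -> f32 {
    unimplemented!("warp-level reduction via intrinsics/PTX goes here")
}

fn main() {
    assert_eq!(reduce_sum(&[1.0, 2.0, 3.0]), 6.0);
    println!("sum = {}", reduce_sum(&[1.0, 2.0, 3.0]));
}
```

Callers see one portable API; the per‑architecture kernels the commenters anticipate slot in behind it without changing call sites.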

Toolchain, compilation targets, and layers

  • For NVIDIA, Rust code is compiled to NVVM IR and then to PTX; for Vulkan/WebGPU it is compiled to SPIR‑V. The PTX can be embedded into CPU binaries or loaded at runtime.
  • Several people note the long chain of abstractions: Rust → GPU backend → wgpu/ash/etc. → Vulkan/Metal/DX/OpenGL → drivers → hardware, raising concerns about debugging and how well performance‑critical details survive.
  • Proponents reply that similar layering exists on CPUs and in game engines, and that Rust’s “zero‑cost abstractions” plus good codegen can keep overhead acceptable.
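The "PTX embedded into CPU binaries" step can be made concrete with a small sketch. The PTX below is a hand‑written no‑op standing in for what the NVVM backend would emit (normally pulled in with `include_str!` from a build artifact), and the loading call in the comment is how a driver‑API wrapper crate such as cust typically JIT‑compiles such text; both are assumptions for illustration.

```rust
// A trivial, hand-written PTX module standing in for compiler output.
// In a real pipeline this constant would be `include_str!` of the file
// the NVVM backend produced at build time.
const KERNEL_PTX: &str = r#".version 7.0
.target sm_70
.address_size 64
.visible .entry noop() { ret; }
"#;

fn main() {
    // At runtime a driver-API wrapper would JIT this for the local GPU,
    // e.g. (with the cust crate) something like Module::from_ptx(...).
    // Here we only demonstrate that the kernel text ships in the binary.
    assert!(KERNEL_PTX.contains(".entry noop"));
    println!("embedded {} bytes of PTX", KERNEL_PTX.len());
}
```

Because PTX is a textual, forward‑compatible IR, shipping it in the host binary lets the driver compile for whatever GPU is actually present, which is the runtime‑selection half of the pipeline described above.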

Relation to existing APIs and languages

  • CUDA: skeptics ask why use Rust instead of CUDA C++; supporters cite safer language, reuse of Rust crates, and integration with existing CUDA libraries via NVVM output.
  • WebGPU: discussed as a portable graphics/compute API, but with constraints (security verification, missing advanced features, a roughly decade‑old feature baseline). It doesn’t solve “same language on CPU and GPU,” since shaders must still be written in WGSL.
  • Other languages: Zig, Nim, and LLVM IR can also target SPIR‑V. Julia and Mojo are praised for numerical computing and GPU support (including JIT‑compiled kernels and native Metal paths); Rust is seen as less numerics‑centric but more “systems”‑oriented.

Ecosystem gaps, CUDA moat, and vendor politics

  • Commenters highlight the large CUDA library ecosystem (cuBLAS, cuDNN, nvCOMP, etc.) and note many are missing or incomplete on the Rust side; some bindings exist but replicating NVIDIA’s decade of engineering is seen as a huge task.
  • NVIDIA’s CUDA EULA and proprietary stance are cited as part of its moat; attempts at open standards (OpenCL, SPIR‑V, OpenMP) and Khronos politics are mentioned as only partial successes.
  • There is recurring desire for a common GPU ISA or a SPIR‑V‑like unification, but skepticism that dominant vendors will cooperate.

Portability, demand, and future directions

  • Some doubt there is strong demand among experienced CUDA developers to switch to Rust; others counter that Rust dramatically expanded the systems‑programming audience and could do the same for GPU programming even at modest initial performance.
  • Rust GPU contributors mention forming a GPU working group, integrating GPU concepts into Rust’s language/cfg() model, and building traits/APIs that lower‑level crates (like wgpu) can use.
  • Several see this as an early but significant step toward portable ML and heterogeneous compute in Rust, while acknowledging that vendor‑specific tuning and ecosystem maturity remain open challenges.
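As a rough sketch of what "traits/APIs that lower‑level crates can use" might look like, here is a hypothetical runtime backend‑selection trait. None of these names come from an actual crate, and the probe in this sketch always falls back to the CPU.

```rust
// Hypothetical trait that a portable compute layer could define and that
// backend crates (CUDA, Vulkan via wgpu, CPU, ...) could implement.
trait ComputeBackend {
    fn name(&self) -> &'static str;
    /// x[i] = a * x[i] + y[i]
    fn dispatch_saxpy(&self, a: f32, x: &mut [f32], y: &[f32]);
}

/// Reference implementation: runs the kernel logic on the host.
struct CpuBackend;

impl ComputeBackend for CpuBackend {
    fn name(&self) -> &'static str {
        "cpu"
    }
    fn dispatch_saxpy(&self, a: f32, x: &mut [f32], y: &[f32]) {
        for (xi, yi) in x.iter_mut().zip(y) {
            *xi = a * *xi + *yi;
        }
    }
}

/// Runtime selection point: a real implementation would probe for
/// CUDA/Vulkan/Metal here; this sketch always takes the CPU fallback.
fn pick_backend() -> Box<dyn ComputeBackend> {
    Box::new(CpuBackend)
}

fn main() {
    let backend = pick_backend();
    let mut x = [1.0_f32, 2.0, 3.0];
    backend.dispatch_saxpy(2.0, &mut x, &[1.0, 1.0, 1.0]);
    assert_eq!(x, [3.0, 5.0, 7.0]);
    println!("{}: {:?}", backend.name(), x);
}
```

The design choice the thread circles around is visible even in this toy: the trait fixes a portable contract, while each backend is free to implement it with vendor‑specific kernels underneath.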