BarraCUDA: an open-source CUDA compiler targeting AMD GPUs
Project & Technical Approach
- BarraCUDA is a from-scratch CUDA compiler written in C99, targeting AMD GPUs (currently GFX11 / RDNA3).
- It parses and compiles the subset of C++ features that CUDA actually uses, not full C++.
- The toolchain is intentionally minimal: plain C, a simple Makefile, no external compiler frameworks, and no HIP translation layer. It outputs HSACO binaries that run with only the AMD kernel driver (no ROCm required).
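The "subset of C++ that CUDA actually uses" is essentially CUDA C. A generic kernel like the following (an illustrative example, not taken from the project's test suite) is the kind of input such a compiler must handle: `__global__` entry points, built-in thread-index variables, and plain C expressions, with no templates or classes.

```cuda
// Illustrative CUDA C kernel (SAXPY): straightforward device code with
// no templates, classes, or other heavyweight C++ features.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // Each thread computes one element of the result.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```

A compiler covering kernels of this shape already handles a large share of real-world CUDA code, which is what makes the "subset, not full C++" scoping plausible.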
LLVM, HIP, ZLUDA, Tinygrad & Alternatives
- The author explicitly avoids LLVM, doing their own instruction encoding “to stay simple and targeted,” at the cost of not inheriting LLVM optimizations.
- Some commenters note LLVM’s AMD backend (via ROCm) is mature and production-used; others emphasize its size/complexity and difficulty of patching.
- HIP/hipify is cited as AMD’s official CUDA porting route; some say it “mostly works now” on recent hardware, others dismiss it as incomplete, Linux‑biased, and non–drop-in.
- ZLUDA is repeatedly mentioned as the more practical “drop-in CUDA on AMD” effort today.
- Tinygrad (and ML compilers like TorchInductor/OpenXLA) are framed as a different layer: high‑level tensor/ML abstraction vs BarraCUDA’s general CUDA C compiler.
Scope, Hardware Support & Viability
- Current target is RDNA3; author plans to support older (e.g., GFX10/RDNA1) and potentially other architectures but notes painful ISA-level differences.
- Commenters stress that without CUDA ecosystem libraries (BLAS/DNN/etc.) and heavy optimization work, this is more an impressive “build a GPU compiler” project than a production CUDA alternative.
- Some worry it won’t touch AMD’s enterprise/datacenter line (CDNA), so it’s not a “CUDA moat killer” yet.
AMD vs Nvidia Strategy & Market Effects
- Debate over whether AMD "couldn't" or "wouldn't" support CUDA directly:
  - One side: not supporting CUDA avoids strengthening Nvidia's moat.
  - Other side: AMD is losing the market anyway; a serious CUDA compatibility push (even with billions invested) could pay off.
- Instinct vs consumer GPUs and fragmented software stacks are cited as reasons AMD still lags in AI despite hardware.
- Some fear that the success of such projects will drive up AMD GPU prices by pulling them into the AI gold rush, hurting gamers and hobbyists.
Legal/IP & Naming
- Some see using “CUDA” in the name as trademark-risky and suggest a rename.
- There’s speculation about potential Nvidia IP/legal action against full CUDA compatibility layers; others counter that compatibility layers are generally legal but lawsuits could be long and costly.
AI/LLM Use & Community Reactions
- A major subthread comes from confusion between LLVM and LLM, spawning accusations of “AI slop.”
- Several commenters inspect commits and writing style, inferring likely LLM assistance; others defend the project and decry reflexive “AI slop” accusations.
- The author clarifies:
  - The code is largely hand-written; LLMs (Ollama/ChatGPT) were used only for limited tasks (ASCII art, test summarization, some boilerplate and test CUDA code).
  - They discourage "vibe coding" with LLMs on ISA-critical parts where bit-level correctness matters.
- Broader discussion emerges about whether using LLMs for code is acceptable “power tools” use vs undermining perceived craftsmanship.
Ecosystem & Standards Discussion
- Some wish for a generalized, open CUDA-like standard (or better OpenCL‑successor) to end single‑vendor lock‑in; skepticism remains due to vendor fragmentation and misaligned incentives.
- SCALE and ChipStar are mentioned as other “run CUDA elsewhere” efforts; OpenCL is recalled as an unrealized “write once, run anywhere” promise.
Reception
- Many commenters are enthusiastic about the project’s ambition, minimalism, and educational value.
- Others repeatedly temper expectations: today it’s a very cool, non‑production, hobby‑grade compiler that highlights what’s possible rather than a drop‑in replacement for CUDA’s ecosystem.