ROCm Device Support Wishlist
Perceived ROCm Device and Lifecycle Problems
- Many commenters say ROCm support for consumer Radeon cards is narrow, late, and short‑lived.
- Reports of cards being dropped 1–2 years after purchase; RX 7800 XT cited as supported for ~15 months.
- Complaints that only a few RX 7900 and some Pro W7xxx cards are clearly “officially supported” on Linux, with older high‑end parts (e.g., Radeon VII, Pro VII, MI50/MI60) marked deprecated despite capable hardware.
- Users argue this makes AMD GPUs a risky investment for ML/compute, especially compared to NVIDIA’s broad CUDA coverage.
Workarounds and Mixed User Experiences
- Debian and Ubuntu ROCm packages are reported to work on many “unsupported” discrete GPUs since Vega, with CI testing across architectures, but they can lag behind upstream ROCm in performance optimizations.
- Some users successfully run Stable Diffusion and LLaMA derivatives on 6xxx/7xxx cards; others hit driver crashes, hard lockups, or CPU fallback.
- Integrated GPUs (e.g., 780M) are particularly fragile; ROCm sometimes works until stress exposes bugs.
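A workaround frequently reported for cards outside the official matrix is spoofing a supported gfx target so the runtime loads kernels built for a nearby architecture. This is a community trick, not an official AMD mechanism; a minimal sketch, assuming an RDNA2 card and a ROCm install that provides `rocminfo`:

```shell
# HSA_OVERRIDE_GFX_VERSION makes the ROCm runtime treat the GPU as the given
# gfx target. 10.3.0 (gfx1030, RX 6800/6900 class) is the value commonly
# reported to work on other RDNA2 parts; RDNA3 users typically report 11.0.0.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Check what the runtime now reports, if ROCm is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | grep -i 'gfx' || true
fi
```

Whether this works depends on the specific card and ROCm release; lockups under load (as reported above for iGPUs like the 780M) are a known failure mode of running off-matrix.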
Documentation, Packaging, and Tooling Issues
- Strong criticism of confusing, contradictory compatibility matrices between AMD’s main docs and Radeon‑specific pages; hard to know which exact GPUs are supported.
- Complaints about ROCm installation being brittle, poorly documented, and far behind NVIDIA’s “install CUDA and it works” experience.
- Go bindings for AMD SMI are cited as under‑documented, with dead links and missing distro packages.
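For basic device telemetry, the ROCm CLI tools tend to be more dependable than the language bindings. A sketch using `rocm-smi` (which ships with ROCm; `amd-smi` is its newer replacement), guarded so it is a no-op on machines without ROCm:

```shell
# rocm-smi reads GPU state via sysfs and often works even on cards the
# support matrix does not list. Flags shown exist in recent ROCm releases.
if command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi --showtemp --showuse --showmeminfo vram || true
fi
```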
Comparison with NVIDIA and Ecosystem Concerns
- Many see NVIDIA as “software‑first”: mature CUDA, libraries, tooling, and wider hardware support.
- AMD is perceived as “hardware‑first,” under‑resourced in software, and focused on hyperscalers and Instinct accelerators rather than consumers.
- Several argue that rapid deprecation and unclear guarantees make ROCm unsuitable for enterprises that need predictable lifecycles.
AMD Engagement and Strategic Debates
- An AMD representative appears in the thread, asking for feedback and prioritization guidance, acknowledging gaps but avoiding hard guarantees (“will try hard”).
- Some welcome this as progress; others view it as repetition of old promises and demand explicit commitments to support “all recent GPUs” for longer periods.
Alternatives and Architectural Critiques
- ROCm’s per‑architecture code generation (gfx10xx, etc.) is criticized in contrast with CUDA’s PTX‑style virtual ISA, which lets the driver JIT‑compile a single binary for GPUs released after it was built.
- Multiple commenters advocate focusing on open standards (Vulkan compute, SYCL/oneAPI, OpenXLA/IREE) instead of chasing CUDA parity with a proprietary stack.
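The codegen critique above can be made concrete with build invocations (illustrative only; `kernel.cpp` and `kernel.cu` are hypothetical source files):

```shell
# ROCm/HIP: code objects are tied to concrete gfx ISAs, so every target the
# binary will ever run on must be named at build time (a "fat binary");
# a GPU family not listed here cannot run the result.
hipcc --offload-arch=gfx1030 --offload-arch=gfx1100 kernel.cpp -o app_amd

# CUDA: alongside native SASS for known GPUs, nvcc can embed PTX (a virtual
# ISA) that the driver JIT-compiles for GPUs released after the build.
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 kernel.cu -o app_nv
```

This difference is why old CUDA binaries often run on new NVIDIA hardware, while ROCm binaries must be rebuilt (or shipped fatter) for each new gfx generation.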