ROCm Device Support Wishlist

Perceived ROCm Device and Lifecycle Problems

  • Many commenters say ROCm support for consumer Radeon cards is narrow, late, and short‑lived.
  • Reports of cards being dropped 1–2 years after purchase; RX 7800 XT cited as supported for ~15 months.
  • Complaints that only a few RX 7900 and some Pro W7xxx cards are clearly “officially supported” on Linux, with older high‑end parts (e.g., Radeon VII, Pro VII, MI50/MI60) marked deprecated despite capable hardware.
  • Users argue this makes AMD GPUs a risky investment for ML/compute, especially compared to NVIDIA’s broad CUDA coverage.

Workarounds and Mixed User Experiences

  • Debian and Ubuntu ROCm packages are reported to work on many “unsupported” discrete GPUs going back to Vega, with CI testing across architectures, though they can lag behind AMD’s releases in performance optimizations.
  • Some users successfully run Stable Diffusion and LLaMA derivatives on 6xxx/7xxx cards; others hit driver crashes, hard lockups, or CPU fallback.
  • Integrated GPUs (e.g., the Radeon 780M) are particularly fragile; ROCm may appear to work under light use until sustained load exposes driver bugs.
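A workaround commonly cited in these threads (not an official AMD recommendation) is to check which gfx target the card reports and, for officially unsupported parts, spoof a nearby supported architecture via the `HSA_OVERRIDE_GFX_VERSION` environment variable. A sketch; the 780M/gfx1103 mapping is the frequently reported example, and results vary by card and ROCm version:

```shell
# List ROCm-visible agents and their gfx targets (requires ROCm installed).
rocminfo | grep -E "Name:\s+gfx"

# Example: a Radeon 780M reports gfx1103, which has no official ROCm
# support. Overriding it to a close, supported target (gfx1100 -> 11.0.0)
# often lets workloads run, but it is unsupported and may crash or hard
# lock under stress, matching the reports above.
export HSA_OVERRIDE_GFX_VERSION=11.0.0

# PyTorch's ROCm build reuses the torch.cuda API, so this is the usual
# smoke test after the override.
python -c "import torch; print(torch.cuda.is_available())"
```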

Documentation, Packaging, and Tooling Issues

  • Strong criticism of confusing, contradictory compatibility matrices between AMD’s main ROCm docs and the Radeon‑specific pages, making it hard to determine exactly which GPUs are supported.
  • Complaints about ROCm installation being brittle, poorly documented, and far behind NVIDIA’s “install CUDA and it works” experience.
  • Go bindings for AMD SMI are cited as under‑documented, with dead links and missing distro packages.

Comparison with NVIDIA and Ecosystem Concerns

  • Many see NVIDIA as “software‑first”: mature CUDA, libraries, tooling, and wider hardware support.
  • AMD is perceived as “hardware‑first,” under‑resourced in software, and focused on hyperscalers and Instinct accelerators rather than consumers.
  • Several argue that rapid deprecation and unclear guarantees make ROCm unsuitable for enterprises that need predictable lifecycles.

AMD Engagement and Strategic Debates

  • An AMD representative appears in the thread, asking for feedback and prioritization guidance, acknowledging gaps but avoiding hard guarantees (“will try hard”).
  • Some welcome this as progress; others view it as repetition of old promises and demand explicit commitments to support “all recent GPUs” for longer periods.

Alternatives and Architectural Critiques

  • ROCm’s per‑architecture codegen (separate binaries for gfx10xx, gfx11xx, etc.) is criticized versus CUDA’s PTX‑style virtual ISA, which lets one binary be JIT‑compiled by the driver for hardware that did not exist at build time.
  • Multiple commenters advocate focusing on open standards (Vulkan compute, SYCL/oneAPI, OpenXLA/IREE) instead of chasing CUDA parity with a proprietary stack.
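The codegen contrast above is visible in the compiler invocations themselves. A sketch (the flags are real, the file names are placeholders):

```shell
# ROCm/HIP: one code object per physical ISA. A gfx1030 binary will not
# load on gfx1100 hardware, so every supported card must be enumerated
# at build time.
hipcc --offload-arch=gfx1030 --offload-arch=gfx1100 kernel.hip -o kernel

# CUDA: embedding PTX (code=compute_80, the virtual ISA) lets the driver
# JIT-compile the kernel for future architectures at load time.
nvcc -gencode arch=compute_80,code=compute_80 kernel.cu -o kernel
```

This is why, as commenters note, old CUDA binaries often keep running on new NVIDIA GPUs, while ROCm support hinges on AMD shipping builds for each specific gfx target.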