FP8 is ~100 TFLOPS faster when the kernel name has "cutlass" in it

Context of the “cutlass” FP8 behavior

  • A Triton pull request shows a conditional that, for FP8 (float8e5) kernels, prepends "cutlass_" to the kernel name with a comment like “Up to 150 TFLOPS faster for fp8!”.
  • People note that libnvidia-nvvm.so contains the string "cutlass" near memory-dependence analysis, suggesting NVIDIA’s compiler applies special optimizations when it detects that substring in a kernel name.
  • The observed gain (~100 TFLOPS) is said to be only ~5–10% of overall throughput, but still financially meaningful when trying to max out GPU utilization.
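
The name-prefixing conditional described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the function kernel_name, its parameters, and the string comparison on the dtype are hypothetical stand-ins, and only the "cutlass_" prefix and the float8e5 type name come from the discussion.

```python
def kernel_name(base_name: str, dtype: str) -> str:
    """Return the name to emit for a compiled kernel.

    Hypothetical helper: the real PR applies the same idea inside
    Triton's own code path, whose details are not reproduced here.
    """
    # Assumption: the dtype is available as a plain string such as "float8e5".
    is_fp8 = dtype == "float8e5"

    # Only FP8 kernels are renamed; avoid stacking the prefix twice.
    if is_fp8 and not base_name.startswith("cutlass_"):
        return "cutlass_" + base_name
    return base_name


if __name__ == "__main__":
    print(kernel_name("matmul_kernel", "float8e5"))  # FP8 kernel gets the prefix
    print(kernel_name("matmul_kernel", "float16"))   # other dtypes are unchanged
```

Nothing about the kernel body changes in this sketch; the only difference is the emitted name, which is why commenters attribute the speedup to the compiler reacting to the substring.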

Is this cheating or a legitimate optimization?

  • Some see this as NVIDIA “cheating” or following an “emissions testing / Volkswagen” model: detecting known patterns and giving them better treatment.
  • Others suggest a more charitable view: an internal or experimental optimization path, originally meant for NVIDIA’s CUTLASS library, accidentally exposed via name matching.
  • There’s concern that relying on names for unsafe assumptions is sloppy or even a bug unless clearly documented.

Historical precedents for name‑based tricks

  • Comparisons to Intel’s “GenuineIntel” behavior: Intel compilers and MKL historically dispatched slower code paths on non‑Intel CPUs, based on the CPUID vendor string, unless that check was patched.
  • References to the “Quack III” / Quake III era, where GPU drivers detected specific game or benchmark executable names and changed behavior (e.g., lowering texture quality, inserting clip planes) to improve scores.
  • Commenters note this is still common: game- and app-specific driver “fixes and optimizations” based on executable detection.

Names, contracts, and technical debt

  • Several note that compilers and large systems often rely heavily on names and informal “contracts” (types, patterns), making accidental name-dependent behavior plausible.
  • Parallel examples: browser User-Agent strings (still carrying legacy tokens), web frameworks repeatedly sanitizing inputs, legacy API misuse “fixed” in drivers for particular games.
  • Some argue these hacks create long‑term technical debt and distort APIs; others counter that much of this debt simply dies with the product and is invisible to consumers.

Debate over tweet context and significance

  • One side claims the tweet misrepresented the PR by lifting a single sentence out of broader context.
  • Others point to the explicit code snippet and comments as clear evidence that the name hack is intentional and performance-relevant.