FP8 is ~100 tflops faster when the kernel name has "cutlass" in it
Context of the “cutlass” FP8 behavior
- A Triton pull request shows a conditional that, for FP8 (float8e5) kernels, prepends "cutlass_" to the kernel name with a comment like “Up to 150 TFLOPS faster for fp8!”
- People note that libnvidia-nvvm.so contains the string "cutlass" near memory-dependence analysis, suggesting NVIDIA’s compiler applies special optimizations when it detects that substring in a kernel name.
- The observed gain (~100 TFLOPS) is said to be only ~5–10% in context but still financially meaningful when trying to max out GPU utilization.
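For readers who have not seen the PR, the renaming conditional described above could look roughly like the sketch below. This is not the actual Triton code: the helper `kernel_name_for`, the base name `matmul_kernel`, and the use of a plain string for the dtype are all hypothetical, chosen only to show the shape of the trick.

```python
def kernel_name_for(base_name: str, dtype: str) -> str:
    """Hypothetical helper: pick the name a kernel is compiled under."""
    # Assumption: "float8e5" is the FP8 dtype tag mentioned in the discussion.
    # Only FP8 kernels get the prefix; every other dtype keeps its name.
    if dtype == "float8e5":
        return "cutlass_" + base_name
    return base_name


if __name__ == "__main__":
    print(kernel_name_for("matmul_kernel", "float8e5"))
    print(kernel_name_for("matmul_kernel", "float16"))
```

The point is that nothing about the kernel body changes; only the string handed to the compiler differs, which is why commenters attribute the speedup to name matching downstream.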
Is this cheating or a legitimate optimization?
- Some see this as NVIDIA “cheating” or following an “emissions testing / Volkswagen” model: detecting known patterns and giving them better treatment.
- Others suggest a more charitable view: an internal or experimental optimization path, originally meant for NVIDIA’s CUTLASS library, accidentally exposed via name matching.
- There’s concern that letting a compiler make unsafe assumptions based on nothing more than a kernel’s name is sloppy, or even a bug, unless clearly documented.
Historical precedents for name‑based tricks
- Comparisons to Intel’s “GenuineIntel” behavior: Intel compilers and MKL historically dispatched slower code on non‑Intel CPUs unless CPUID was patched.
- References to the “Quack III” / Quake III era, where GPU drivers detected specific game or benchmark executable names and changed behavior (e.g., lowering texture quality, inserting clip planes) to improve scores.
- Commenters note this is still common: game- and app-specific driver “fixes and optimizations” based on executable detection.
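The pattern these precedents share is dispatch on an identifying string rather than on actual capability. A minimal sketch of the “GenuineIntel” style of check follows; the function `pick_code_path` and the path labels are hypothetical, and this is an illustration of the idea, not any vendor’s real dispatcher.

```python
def pick_code_path(vendor_id: str) -> str:
    """Hypothetical dispatcher keyed on a CPUID-style vendor string."""
    # Assumption: the fast path is gated on the vendor string alone,
    # not on which instruction sets the CPU actually supports.
    if vendor_id == "GenuineIntel":
        return "optimized"
    return "baseline"


if __name__ == "__main__":
    print(pick_code_path("GenuineIntel"))
    print(pick_code_path("AuthenticAMD"))
```

Under this kind of check, changing only the reported string flips the result, which matches what commenters describe for patched CPUID values and renamed executables.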
Names, contracts, and technical debt
- Several note that compilers and large systems often rely heavily on names and informal “contracts” (types, patterns), making accidental name-dependent behavior plausible.
- Parallel examples: browser User-Agent strings (still carrying legacy tokens), web frameworks repeatedly sanitizing inputs, legacy API misuse “fixed” in drivers for particular games.
- Some argue these hacks create long‑term technical debt and distort APIs; others counter that much of this debt simply dies with the product and is invisible to consumers.
Debate over tweet context and significance
- One side claims the tweet misrepresented the PR by lifting a single sentence out of broader context.
- Others point to the explicit code snippet and comments as clear evidence that the name hack is intentional and performance-relevant.