2025-07-11

FP8 is ~100 TFLOPS faster when the kernel name has "cutlass" in it

Context of the “cutlass” FP8 behavior

A Triton pull request shows a conditional that, for FP8 (float8e5) kernels, prepends "cutlass_" to the kernel name, with a comment like “Up to 150 TFLOPS faster for fp8!”.
People note that libnvidia-nvvm.so contains the string "cutlass" close to text relating to memory-dependence analysis, suggesting NVIDIA’s compiler applies special optimizations when it detects that substring in a kernel name.
The observed gain (~100 TFLOPS) is said to amount to only a ~5–10% improvement, but is still financially meaningful when trying to max out GPU utilization.
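
The conditional described above can be sketched in a few lines of Python. This is only an illustration of the name-prefixing idea, not the code from the pull request: the helper name kernel_name, the sample kernel name matmul_kernel, and the use of a plain string for the dtype are all assumptions made for the example.

```python
def kernel_name(base_name: str, dtype: str) -> str:
    """Hypothetical helper: pick the name a kernel is compiled under."""
    # Assumption: the dtype is a plain string here. The actual pull request
    # works with Triton's own dtype objects, which are not reproduced.
    if dtype == "float8e5":
        # Prefix the name so that it contains the substring "cutlass".
        return "cutlass_" + base_name
    return base_name


if __name__ == "__main__":
    print(kernel_name("matmul_kernel", "float8e5"))
    print(kernel_name("matmul_kernel", "float16"))
```

Only the name changes; the kernel body is untouched, which is why commenters attribute any speedup to how the compiler treats the name.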

Is this cheating or a legitimate optimization?

Some see this as NVIDIA “cheating” or following an “emissions testing / Volkswagen” model: detecting known patterns and giving them better treatment.
Others suggest a more charitable view: an internal or experimental optimization path, originally meant for NVIDIA’s CUTLASS library, accidentally exposed via name matching.
There’s concern that relying on names for unsafe assumptions is sloppy or even a bug unless clearly documented.

Historical precedents for name‑based tricks

Comparisons to Intel’s “GenuineIntel” behavior: Intel compilers and MKL historically dispatched slower code on non‑Intel CPUs unless CPUID was patched.
References to the “Quack III” / Quake III era, where GPU drivers detected specific game or benchmark executable names and changed behavior (e.g., lowering texture quality, inserting clip planes) to improve scores.
Commenters note this is still common: game- and app-specific driver “fixes and optimizations” based on executable detection.

Names, contracts, and technical debt

Several note that compilers and large systems often rely heavily on names and informal “contracts” (types, patterns), making accidental name-dependent behavior plausible.
Parallel examples: browser User-Agent strings (still carrying legacy tokens), web frameworks repeatedly sanitizing inputs, legacy API misuse “fixed” in drivers for particular games.
Some argue these hacks create long‑term technical debt and distort APIs; others counter that much of this debt simply dies with the product and is invisible to consumers.

Debate over tweet context and significance

One side claims the tweet misrepresented the PR by lifting a single sentence out of broader context.
Others point to the explicit code snippet and comments as clear evidence that the name hack is intentional and performance-relevant.
