FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it

Kernel-name-based optimization behavior

  • Disassembly of NVIDIA’s ptxas shows logic like strstr(kernel_name, "cutlass"): FP8 kernels whose names contain that substring run roughly 100 TFLOPS faster.
  • Commenters note this is probably an unstable, experimental optimization that can break correctness on general code, so NVIDIA limits it to “known good” kernels.
  • Some see this as pragmatic: GPU compilers struggle to find optimizations that never regress performance; aggressive passes often help some kernels and hurt others.
  • Others argue it’s fragile and exclusionary: a hidden name-based gate can create accidental failures and barriers for non-blessed libraries.
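
The name-based gate discussed above can be illustrated with a minimal Python sketch. This is not ptxas code: the function name `uses_experimental_passes` and the sample kernel names are hypothetical, and the substring test merely stands in for the strstr-style check that commenters describe.

```python
# Hypothetical illustration of a substring gate; not NVIDIA's implementation.
MAGIC_SUBSTRING = "cutlass"


def uses_experimental_passes(kernel_name: str) -> bool:
    """Return True when the kernel name contains the magic substring.

    Mirrors a C check such as strstr(kernel_name, "cutlass") != NULL,
    which is case-sensitive and matches anywhere in the name.
    """
    return MAGIC_SUBSTRING in kernel_name


# A kernel from a "known good" library takes the gated path.
print(uses_experimental_passes("cutlass_fp8_gemm"))
# The same kind of kernel under another name does not.
print(uses_experimental_passes("my_fp8_gemm"))
# An unrelated kernel that happens to contain the substring opts in by accident.
print(uses_experimental_passes("image_cutlass_filter"))
```

The third call shows the fragility raised in the thread: a substring match cannot tell a deliberately named kernel from one that contains the string by coincidence.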

Flags vs hidden heuristics

  • Several people argue this should be a documented, opt‑in compiler/driver flag rather than a hidden heuristic on kernel names.
  • Pushback centers on long‑term support: once a flag is public, users rely on it, making it hard to remove even if it becomes obsolete or risky.
  • There’s debate over whether that support burden justifies opaque mechanisms that third parties eventually reverse‑engineer and depend on anyway.
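
To make the two positions concrete, here is a rough sketch contrasting a hidden name heuristic with a documented opt-in flag. Everything in it is assumed for illustration: `CompileOptions`, `enable_experimental_fp8`, and the pass names are invented and do not correspond to any real ptxas option.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompileOptions:
    # Hypothetical opt-in flag; no real compiler option is implied.
    enable_experimental_fp8: bool = False


def select_passes_hidden(kernel_name: str) -> List[str]:
    """Hidden heuristic: the pass list depends on the kernel's name."""
    passes = ["baseline"]
    if "cutlass" in kernel_name:
        passes.append("experimental_fp8")
    return passes


def select_passes_flag(kernel_name: str, options: CompileOptions) -> List[str]:
    """Opt-in flag: the pass list depends only on what the caller requested."""
    passes = ["baseline"]
    if options.enable_experimental_fp8:
        passes.append("experimental_fp8")
    return passes


# Renaming a kernel changes behavior under the heuristic...
print(select_passes_hidden("my_gemm"))
print(select_passes_hidden("cutlass_gemm"))
# ...but not under the flag, where the name is irrelevant.
print(select_passes_flag("my_gemm", CompileOptions(enable_experimental_fp8=True)))
print(select_passes_flag("cutlass_gemm", CompileOptions()))
```

The flag version makes the contract explicit, which is also what creates the support burden: once `enable_experimental_fp8` is public, removing it breaks callers.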

Is this “cheating”? Comparisons with past scandals

  • Multiple historical examples are raised: ATI’s Quake III “quack” optimizations, Intel’s ICC “GenuineIntel” path, NVIDIA/3DMark, SPEC invalidating Intel results, phone SoC benchmark tricks, VW emissions, etc.
  • Some see NVIDIA’s behavior as qualitatively different: it speeds up NVIDIA’s own hardware, seemingly without degrading output or disadvantaging competitors, and is likely about safety, not benchmarks.
  • Others respond that special‑casing by name is the same structural pattern and still erodes trust, even if the motive is stability rather than deception.

Compiler and driver pragmatics

  • Compiler engineers note that name/signature‑based special cases are common in real systems when front‑ends don’t expose richer semantics.
  • Graphics drivers (including open ones) routinely have app‑specific workarounds and optimizations keyed on application identity; this is seen as normal practice for large games.
  • Concern remains that such techniques are opaque, brittle, and can surprise uninvolved developers who accidentally reuse “magic” names.
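
The driver-side pattern mentioned above is often a lookup table keyed on application identity. The sketch below assumes a table keyed on executable name; the executable names, override keys, and `lookup_profile` helper are all invented for illustration and are not taken from any real driver.

```python
# Hypothetical per-application quirk table; all entries are made up.
APP_PROFILES = {
    "example_game.exe": {"disable_fast_clear": True},
    "example_benchmark.exe": {"force_shader_variant": "v2"},
}


def lookup_profile(executable_name: str) -> dict:
    """Return a copy of the overrides for a known application.

    Unknown applications get an empty dict, meaning default behavior.
    Matching is exact after lowercasing, so a renamed binary loses its profile.
    """
    return dict(APP_PROFILES.get(executable_name.lower(), {}))


print(lookup_profile("Example_Game.exe"))  # known app: overrides apply
print(lookup_profile("renamed_game.exe"))  # same binary renamed: defaults
```

Because behavior hinges on identity rather than on what the program does, a rename or a name collision silently changes which code path runs.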

Meta: commit messages, AI tools, and workflow

  • A large subthread critiques the PR’s many “wip”/“x” commits; others defend small, messy local commits plus later squashing or rebasing.
  • There’s extensive debate over:
    • Value of clean, meaningful commit history vs speed under deadlines.
    • Squash‑merging vs preserving granular commits for git bisect.
    • AI‑generated commit messages: sometimes detailed but often missing the crucial “why” and occasionally hallucinating tests or results.