FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it
Kernel-name-based optimization behavior
- Disassembly of NVIDIA’s ptxas shows logic like strstr(kernel_name, "cutlass"), giving FP8 kernels a huge speed boost when named accordingly.
- Commenters note this is probably an unstable, experimental optimization that can break correctness on general code, so NVIDIA limits it to “known good” kernels.
- Some see this as pragmatic: GPU compilers struggle to find optimizations that never regress performance; aggressive passes often help some kernels and hurt others.
- Others argue it’s fragile and exclusionary: a hidden name-based gate can create accidental failures and barriers for non-blessed libraries.
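The name-based gate discussed above can be sketched in a few lines. This is a hypothetical illustration, not NVIDIA's code: the function names and schedule labels are invented, and only the substring test on "cutlass" comes from the thread.

```python
# Hypothetical sketch of a name-gated optimization; not ptxas's actual logic.
# Only the substring test on "cutlass" is taken from the discussion.

MAGIC_SUBSTRING = "cutlass"


def uses_experimental_path(kernel_name: str) -> bool:
    """Return True when the kernel name contains the magic substring.

    This mirrors a C-style `strstr(kernel_name, "cutlass") != NULL` check.
    """
    return MAGIC_SUBSTRING in kernel_name


def compile_kernel(kernel_name: str) -> str:
    """Pick a (made-up) schedule label based solely on the kernel's name."""
    if uses_experimental_path(kernel_name):
        return "experimental-fp8-schedule"
    return "default-schedule"


if __name__ == "__main__":
    for name in ("cutlass_fp8_gemm", "my_fp8_gemm", "not_cutlass_at_all"):
        print(f"{name}: {compile_kernel(name)}")
```

The third name illustrates the fragility concern: a substring match fires for any kernel whose name happens to contain the string, whether or not it belongs to the intended library.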
Flags vs hidden heuristics
- Several people argue this should be a documented, opt‑in compiler/driver flag rather than a hidden heuristic on kernel names.
- Pushback centers on long‑term support: once a flag is public, users rely on it, making it hard to remove even if it becomes obsolete or risky.
- There’s debate over whether that support burden justifies opaque mechanisms that third parties eventually reverse‑engineer and depend on anyway.
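For contrast, the opt-in alternative that several commenters ask for might look like the sketch below. Everything here is assumed for illustration: the tool name toy-compiler and the flag --experimental-fp8-schedule are invented and do not correspond to any real ptxas or nvcc option.

```python
# Hypothetical sketch: an explicit opt-in flag instead of a name heuristic.
# The program name and flag are invented for illustration only.
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="toy-compiler")
    parser.add_argument(
        "--experimental-fp8-schedule",
        action="store_true",
        help="Opt in to an unstable FP8 scheduling pass (may miscompile).",
    )
    parser.add_argument("kernel_name")
    return parser


def choose_schedule(argv: List[str]) -> str:
    args = build_parser().parse_args(argv)
    # The kernel name no longer influences the decision; only the flag does.
    if args.experimental_fp8_schedule:
        return "experimental-fp8-schedule"
    return "default-schedule"
```

With this design, choose_schedule(["cutlass_fp8_gemm"]) stays on the default path unless the caller passes the flag explicitly. The trade-off raised in the thread still applies: once such a flag ships, users may depend on it indefinitely.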
Is this “cheating”? Comparisons with past scandals
- Multiple historical examples are raised: ATI’s Quake III “quack” optimizations, Intel’s ICC “GenuineIntel” path, NVIDIA/3DMark, SPEC invalidating Intel results, phone SoC benchmark tricks, VW emissions, etc.
- Some see NVIDIA’s behavior as qualitatively different: it speeds up its own hardware, seemingly without degrading output or disadvantaging competitors, and is likely about safety, not benchmarks.
- Others respond that special‑casing by name is the same structural pattern and still erodes trust, even if the motive is stability rather than deception.
Compiler and driver pragmatics
- Compiler engineers note that name/signature‑based special cases are common in real systems when front‑ends don’t expose richer semantics.
- Graphics drivers (including open ones) routinely have app‑specific workarounds and optimizations keyed on application identity; this is seen as normalized for large games.
- Concern remains that such techniques are opaque, brittle, and can surprise uninvolved developers who accidentally reuse “magic” names.
Meta: commit messages, AI tools, and workflow
- A large subthread critiques the PR’s many “wip”/“x” commits; others defend small, messy local commits plus later squashing or rebasing.
- There’s extensive debate over:
  - Value of clean, meaningful commit history vs speed under deadlines.
  - Squash‑merging vs preserving granular commits for git bisect.
  - AI‑generated commit messages: sometimes detailed but often missing the crucial “why” and occasionally hallucinating tests or results.