FFmpeg School of Assembly Language
Architecture focus: x86 vs RISC‑V/ARM
- Some dislike the tutorial’s x86 focus, arguing RISC‑V will eventually dominate; others counter that x86 and ARM are still far ahead of RISC‑V in both shipped hardware and performance.
- Several say ARM would have been a more pragmatic non‑x86 choice due to current market share.
- There is curiosity (and some skepticism) about how RISC‑V’s vector model maps to workloads like ffmpeg.
Handwritten assembly vs C/intrinsics/compilers
- Strong debate over the claim that asm can be “10x faster than C”:
  - Many say modern compilers are excellent and that comparing naive C with expert asm is unfair.
  - Others insist that in tight DSP and codec kernels, especially with SIMD, 2–10x gains over naive scalar C are real.
- Several argue you can often get close enough with C + intrinsics or better memory layouts, and that algorithmic and cache-level improvements dwarf micro-optimizations for typical software.
SIMD, codecs, and performance-critical use cases
- Codecs and signal processing are highlighted as prime SIMD targets: “do this to every sample/pixel” maps perfectly to vectorization.
- Examples of large wins are given (e.g., 30% less CPU for audio metering, big gains in AV1 decoder/encoder hot paths).
- For extremely hot loops run trillions of times, even 10–50% differences between compiler SIMD and handwritten SIMD are considered worth the effort.
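
To make the "do this to every sample/pixel" point concrete, here is a minimal sketch of one such kernel: a saturating add of two 8-bit pixel rows, written once as scalar C and once with SSE2 intrinsics. The operation and the function names (`add_sat_u8_c`, `add_sat_u8_simd`) are illustrative assumptions, not code from FFmpeg or from the thread. On targets without SSE2 the second function simply falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Scalar reference: one pixel per iteration, clamped to 255. */
void add_sat_u8_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Hypothetical SIMD variant: 16 pixels per iteration where SSE2 is available. */
void add_sat_u8_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst && a && b));
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One instruction performs 16 saturating byte additions. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Leftover pixels (or everything, without SSE2) use the scalar loop. */
    add_sat_u8_c(dst + i, a + i, b + i, n - i);
}
```

The clamp that costs a compare and select per pixel in the scalar loop is a single instruction per 16 pixels in the SIMD loop, which is the kind of mapping commenters have in mind. How much faster it is in practice depends on the compiler and CPU, since a compiler may vectorize the scalar version on its own.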
Portability, intrinsics, and tooling
- Major downside of asm: every ISA needs its own implementation, and sometimes several to cover different extension levels or microarchitectures. Projects maintain C fallbacks plus many asm variants (x86 SSSE3/AVX2/AVX‑512, ARM NEON, etc.).
- ffmpeg is described as philosophically preferring pure asm over intrinsics: better control and fewer surprises from compilers, but at the cost of readability and portability.
- Some criticize heavy NASM macro abuse in ffmpeg’s asm as obscuring what the code actually does.
- Portable SIMD libraries (Highway, Eigen, SIMDe, language‑level SIMD in Rust/Zig/C#) are discussed; the consensus is that they’re useful but can leave performance on the table for heavily tuned projects like ffmpeg/dav1d.
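
The "C fallback plus many variants" arrangement is commonly built on a table of function pointers that is filled in once at startup. The sketch below shows that shape under loud assumptions: `DSPContext`, `dsp_init`, and `CPU_FLAG_DEMO_SIMD` are hypothetical names invented for this example, and the "optimized" routine is an unrolled C loop standing in for where a per-ISA asm routine would be plugged in. It is not FFmpeg's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; a real project would query the CPU. */
#define CPU_FLAG_DEMO_SIMD 1u

typedef unsigned (*sad_fn)(const uint8_t *a, const uint8_t *b, size_t n);

/* Hypothetical per-project table of kernel entry points. */
typedef struct DSPContext {
    sad_fn sad;   /* sum of absolute differences */
} DSPContext;

/* Portable C fallback: always available, defines the expected result. */
static unsigned sad_c(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (unsigned)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return acc;
}

/* Stand-in for an ISA-specific routine (here just 4x unrolled C). */
static unsigned sad_unrolled4(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned acc = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            acc += (unsigned)(a[i + k] > b[i + k] ? a[i + k] - b[i + k]
                                                  : b[i + k] - a[i + k]);
    return acc + sad_c(a + i, b + i, n - i);
}

/* Start from the fallback, then override with the best supported variant. */
void dsp_init(DSPContext *c, unsigned cpu_flags)
{
    assert(c != NULL);
    c->sad = sad_c;
    if (cpu_flags & CPU_FLAG_DEMO_SIMD)
        c->sad = sad_unrolled4;
}
```

Callers only ever invoke `ctx.sad(...)`, so adding a variant for a new ISA means adding one more override in `dsp_init`. The C fallback doubles as the reference against which each variant can be tested, which is part of the maintenance cost the thread describes.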
Learning, enjoyment, and pedagogy
- Many find the tutorial unusually approachable and welcome a focused SIMD/x86 resource.
- Multiple comments describe assembly as fun, enlightening for understanding hardware and compiler behavior, and valuable for debugging—even if rarely written professionally.
FFmpeg, docs, and ecosystem
- Some praise ffmpeg’s capabilities and hardware acceleration support; others recount painful experiences with its C API and documentation (dated examples, deprecations, confusing build system).
- GStreamer, used from Rust, is mentioned as a more modern framework, but not as a drop‑in ffmpeg replacement.
Auto‑vectorization, superoptimization, and LLMs
- Mixed experiences: some report compilers now auto‑vectorize well if code is written “compiler‑friendly”; others show simple cases where compilers still miss ideal SIMD idioms.
- Superoptimization and search‑based tuning are mentioned as promising for tiny kernels; LLMs are generally seen as not yet trustworthy for correct, optimal asm.
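
As an illustration of what "compiler‑friendly" usually means in practice, here is a minimal sketch of a loop shaped to be easy to auto‑vectorize: a simple counted loop, unit‑stride accesses, no branches in the body, and `restrict` to promise the compiler that the two arrays do not overlap. The function name `scale_add` is an assumption made for this example. Whether a given compiler actually vectorizes it depends on version and flags, so it is worth checking the report (for example `-fopt-info-vec` with GCC or `-Rpass=loop-vectorize` with Clang) rather than assuming.

```c
#include <assert.h>
#include <stddef.h>

/*
 * y[i] += a * x[i] for n elements.
 * `restrict` states that x and y never alias, which removes the main
 * obstacle that otherwise forces the compiler to emit runtime overlap
 * checks or to keep the loop scalar.
 */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    assert(n == 0 || (x && y));
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The same loop written with possibly aliasing pointers, an early exit, or a data‑dependent stride is the sort of "simple case" where commenters report compilers still falling back to scalar code.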