21 GB/s CSV Parsing Using SIMD on AMD 9950X
Benchmark validity and “3x improvement” claim
- Several commenters object to calling it a ~3x improvement when the main comparison jumps from a 5950X (Zen 3) to a 9950X (Zen 5); they see that as conflating hardware and software gains.
- Others note the author did rerun version 0.9.0 on the new CPU, showing ~17% software improvement there; scaling that back to the old hardware yields ~2.1x over 0.1.0, which is viewed as more honest.
- Some complain the graph conflates whole‑CPU throughput with per‑core throughput; the headline figure works out to roughly 1.3 GB/s per thread, which looks far less impressive.
- There’s criticism that the blog doesn’t clearly define the CSV dialect or workload (e.g., proper quoting/escaping, what data is parsed), making “21 GB/s” ambiguous.
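The whole‑CPU vs. per‑thread distinction is easy to sanity‑check. A 9950X has 16 physical cores (the one‑thread‑per‑core assumption below is mine; the summary doesn't state the blog's actual thread count):

```python
# Rough consistency check between the headline whole-CPU number and the
# per-thread figure quoted in the discussion. Assumes 16 parsing threads
# (one per physical core on a 9950X) -- an assumption, not a stated fact.
per_thread_gbps = 1.3
threads = 16
whole_cpu_gbps = per_thread_gbps * threads
print(whole_cpu_gbps)  # ~20.8 GB/s, consistent with the ~21 GB/s headline
```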
Meaningfulness of CSV GB/s numbers
- A strong thread argues that quoting bytes/sec for CSV is close to meaningless without specifying:
  - Whether RFC 4180 features (quoted commas, newlines in fields) are supported.
  - Whether actual type parsing (floats/ints) is done or just delimiter splitting.
- One commenter claims the library’s default mode skips quoting/escaping, making benchmark results “heavily misleading” for real-world CSV. Another notes properly handling quoted newlines generally forces more complex, slower strategies.
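To make the RFC 4180 point concrete, here is a small Python sketch (illustrative only, not the library under discussion) showing how a quoted field containing a comma and a newline defeats naive delimiter splitting:

```python
import csv
import io

# One CSV record whose quoted field contains both a comma and a newline
# (both legal under RFC 4180).
data = 'id,note\n1,"hello, world\nsecond line"\n'

# Naive splitting treats every newline as a record boundary and every
# comma as a field boundary, so it miscounts both.
naive_rows = [line.split(',') for line in data.strip().split('\n')]

# A conforming parser tracks quote state across delimiters and newlines.
real_rows = list(csv.reader(io.StringIO(data)))

print(len(naive_rows))  # 3 "rows" -- the quoted newline was split in two
print(len(real_rows))   # 2 rows (header + one record)
print(real_rows[1])     # ['1', 'hello, world\nsecond line']
```

This is why benchmarks that only split on delimiters can post much higher numbers than fully RFC 4180‑conformant parsing of the same file.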
Use cases and persistence of CSV
- Skeptical “who needs this?” comments are countered with concrete workloads: finance, telco CDRs, Netflow‑like pipelines, huge historical datasets, and enterprise ETL flows that must ingest decades of CSV or high‑volume exports from proprietary systems.
- CSV is defended as the de facto file‑based tabular interchange format: trivial to produce (“printf”), readable in Excel, and supported by every stack, even if many implementations are buggy.
- Alternatives discussed: JSON/XML (better-structured but poor for tabular data), protobuf/Cap’n Proto/MessagePack (efficient but higher friction and dependency overhead), Parquet/HDF5 (better for analytics and floating‑point data but not what spreadsheets export).
Implementation, .NET SIMD, and AVX-512 discussion
- Many are impressed this is pure C# using .NET’s SIMD intrinsics, noting .NET’s strong hardware‑intrinsic support.
- There’s a short technical discussion of SIMD tricks (multiple compares vs. shuffle/ternary logic), with mixed results in this case.
- The AVX2 vs AVX‑512 speedup here is small (18 → 20 → 21 GB/s), reinforcing views that this workload is memory‑bandwidth‑bound and that AVX‑512’s practical benefit over AVX2 can be marginal.
- This segues into a broader debate over Intel’s removal of AVX‑512 from consumer chips, trade‑offs versus more E‑cores, and general frustration with Intel’s feature segmentation and past product “rug pulls” (e.g., Optane).
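The “multiple compares” trick from the SIMD discussion above can be sketched scalar‑style (a Python simulation of the idea, not the C#/.NET intrinsics the library actually uses): compare a block of bytes against each special character, OR the results into one bitmask, then walk the set bits.

```python
def delimiter_mask(block: bytes) -> int:
    """Simulate SIMD compare + movemask: one compare per special byte,
    OR'd into a single bitmask (bit i set => block[i] is special)."""
    mask = 0
    for special in (ord(','), ord('\n'), ord('"')):
        # In real SIMD code this inner loop is a single vector compare.
        for i, b in enumerate(block):
            if b == special:
                mask |= 1 << i
    return mask

block = b'a,b,"c"\n'
m = delimiter_mask(block)

# Iterate set bits in ascending order, as a SIMD parser would do with
# tzcnt (find lowest set bit) and blsr (clear lowest set bit).
positions = []
while m:
    positions.append((m & -m).bit_length() - 1)
    m &= m - 1
print(positions)  # [1, 3, 4, 6, 7]
```

Because each vector compare touches every input byte, adding width (AVX2 → AVX‑512) mainly helps until the loop saturates memory bandwidth, which matches the small 18 → 20 → 21 GB/s deltas reported.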