21 GB/s CSV Parsing Using SIMD on AMD 9950X
Benchmark validity and “3x improvement” claim
- Several commenters object to calling it a ~3x improvement when the main comparison jumps from a 5950X (Zen 3) to a 9950X (Zen 5); they see that as conflating hardware and software gains.
- Others note the author did rerun version 0.9.0 on the new CPU, showing ~17% software improvement there; scaling that back to the old hardware yields ~2.1x over 0.1.0, which is viewed as more honest.
- Some complain the graph conflates whole‑CPU throughput with per‑core throughput; the headline figure works out to roughly 1.3 GB/s per thread, which looks far less impressive.
- There’s criticism that the blog doesn’t clearly define the CSV dialect or workload (e.g., proper quoting/escaping, what data is parsed), making “21 GB/s” ambiguous.
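The whole‑CPU vs. per‑thread distinction is easy to sanity‑check. A 9950X has 16 physical cores (the one‑thread‑per‑core assumption below is mine; the summary doesn't state the blog's actual thread count):

```python
# Rough consistency check between the headline whole-CPU number and the
# per-thread figure quoted in the discussion. Assumes 16 parsing threads
# (one per physical core on a 9950X) -- an assumption, not a stated fact.
per_thread_gbps = 1.3
threads = 16
whole_cpu_gbps = per_thread_gbps * threads
print(whole_cpu_gbps)  # ~20.8 GB/s, consistent with the ~21 GB/s headline
```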
Meaningfulness of CSV GB/s numbers
- A strong thread argues that quoting bytes/sec for CSV is close to meaningless without specifying:
  - Whether RFC 4180 features (quoted commas, newlines in fields) are supported.
  - Whether actual type parsing (floats/ints) is done or just delimiter splitting.
- One commenter claims the library’s default mode skips quoting/escaping, making benchmark results “heavily misleading” for real-world CSV. Another notes properly handling quoted newlines generally forces more complex, slower strategies.
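To make the RFC 4180 point concrete, here is a small Python sketch (illustrative only, not the library under discussion) showing how a quoted field containing a comma and a newline defeats naive delimiter splitting:

```python
import csv
import io

# One CSV record whose quoted field contains both a comma and a newline
# (both legal under RFC 4180).
data = 'id,note\n1,"hello, world\nsecond line"\n'

# Naive splitting treats every newline as a record boundary and every
# comma as a field boundary, so it miscounts both.
naive_rows = [line.split(',') for line in data.strip().split('\n')]

# A conforming parser tracks quote state across delimiters and newlines.
real_rows = list(csv.reader(io.StringIO(data)))

print(len(naive_rows))  # 3 "rows" -- the quoted newline was split in two
print(len(real_rows))   # 2 rows (header + one record)
print(real_rows[1])     # ['1', 'hello, world\nsecond line']
```

This is why benchmarks that only split on delimiters can post much higher numbers than fully RFC 4180‑conformant parsing of the same file.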
Use cases and persistence of CSV
- Skeptical “who needs this?” comments are countered with concrete workloads: finance, telco CDRs, Netflow‑like pipelines, huge historical datasets, and enterprise ETL flows that must ingest decades of CSV or high‑volume exports from proprietary systems.
- CSV is defended as the de facto file‑based tabular interchange format: trivial to produce (“printf”), readable in Excel, and supported by every stack, even if many implementations are buggy.
- Alternatives discussed: JSON/XML (better-structured but poor for tabular data), protobuf/Cap’n Proto/MessagePack (efficient but higher friction and dependency overhead), Parquet/HDF5 (better for analytics and floating‑point data but not what spreadsheets export).
Implementation, .NET SIMD, and AVX-512 discussion
- Many are impressed this is pure C# using .NET’s SIMD intrinsics, noting .NET’s strong hardware‑intrinsic support.
- There’s a short technical discussion of SIMD tricks (multiple compares vs. shuffle/ternary logic), with mixed results in this case.
- The AVX2 vs AVX‑512 speedup here is small (18 → 20 → 21 GB/s), reinforcing views that this workload is memory‑bandwidth‑bound and that AVX‑512’s practical benefit over AVX2 can be marginal.
- This segues into a broader debate over Intel’s removal of AVX‑512 from consumer chips, trade‑offs versus more E‑cores, and general frustration with Intel’s feature segmentation and past product “rug pulls” (e.g., Optane).
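The “multiple compares” trick from the SIMD discussion above can be sketched scalar‑style (a Python simulation of the idea, not the C#/.NET intrinsics the library actually uses): compare a block of bytes against each special character, OR the results into one bitmask, then walk the set bits.

```python
def delimiter_mask(block: bytes) -> int:
    """Simulate SIMD compare + movemask: one compare per special byte,
    OR'd into a single bitmask (bit i set => block[i] is special)."""
    mask = 0
    for special in (ord(','), ord('\n'), ord('"')):
        # In real SIMD code this inner loop is a single vector compare.
        for i, b in enumerate(block):
            if b == special:
                mask |= 1 << i
    return mask

block = b'a,b,"c"\n'
m = delimiter_mask(block)

# Iterate set bits in ascending order, as a SIMD parser would do with
# tzcnt (find lowest set bit) and blsr (clear lowest set bit).
positions = []
while m:
    positions.append((m & -m).bit_length() - 1)
    m &= m - 1
print(positions)  # [1, 3, 4, 6, 7]
```

Because each vector compare touches every input byte, adding width (AVX2 → AVX‑512) mainly helps until the loop saturates memory bandwidth, which matches the small 18 → 20 → 21 GB/s deltas reported.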