Why CSV is still king
CSV’s Enduring Appeal
- Seen as the “VHS/Markdown of data formats”: crude but ubiquitous and “good enough.”
- Strengths cited: human-readable, easy to eyeball/edit by hand, trivial to generate, and supported out of the box by spreadsheets and databases.
- Many tools and ecosystems (Python, Rust, spreadsheets, databases, CLI tools) already handle CSV well, which reinforces its dominance.
- Inertia and compatibility are repeatedly mentioned as the primary reasons it “remains king.”
Limitations, Bugs, and Real-World Pain
- Escaping rules (especially quotes and newlines) are a recurring source of bugs; many tools mishandle them.
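- Newlines inside quoted fields make parallel parsing particularly tricky: a reader that starts a chunk at an arbitrary line break cannot reliably tell whether it has landed inside a quoted field. Some modern engines still break in these edge cases.
- Large CSVs (1GB+) are common and painful: slow to parse and fragile when even one row is malformed, yet still used because “you get what you get.”
- Excel-specific issues: locale-dependent separators (e.g., semicolons in locales that use the comma as the decimal mark), silent data corruption (IDs and dates reinterpreted, extra sheets dropped on export), and encoding quirks.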
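The quoting and embedded-newline problems above are easy to reproduce. The sketch below uses Python's standard csv module purely for illustration (the discussion does not prescribe any library), and the sample record is made up: one field holds a comma and double quotes, another holds a line break. It compares a naive "split lines, then split on commas" approach with a real CSV reader.

```python
import csv
import io

# One made-up record: a comma and double quotes in one field, a line break in another.
row = ["42", 'He said "hi", then left', "line one\nline two"]

# The default "excel" dialect quotes fields that contain the delimiter, the
# quote character, or a line break, and doubles any quote inside a field.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
text = buf.getvalue()
print(repr(text))

# Naive parsing: split into lines, then split each line on commas.
naive = [line.split(",") for line in text.splitlines()]
print(len(naive))  # 2: the single record has been torn in half

# A real CSV reader tracks quote state across the line break.
parsed = list(csv.reader(io.StringIO(text, newline="")))
assert parsed == [row]
```

Because this one record spans two physical lines, a parallel parser that began a chunk at the second line would start in the middle of a quoted field with no local way to know it.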
Lack of a True Standard
- RFC 4180 (an informational RFC published in 2005) exists but is seen as late, incomplete, and non-binding; many implementations diverge from it.
- Some treat “RFC 4180 + UTF‑8” as the de facto standard and consider non-conforming tools broken; others argue CSV is inherently multi-variant.
- Wishes expressed: a real standard with test suites and clearer examples, or possibly a new format instead of retrofitting CSV.
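A minimal sketch of the “RFC 4180 + UTF‑8” convention, assuming Python's csv module: its default dialect (comma delimiter, double-quote quoting with doubled inner quotes, CRLF line endings) is close to RFC 4180 but is not a certified implementation of it. The helpers to_rfc4180_bytes and parse_strict are hypothetical names. Passing strict=True makes the reader reject malformed quoting instead of silently repairing it, which mirrors the view that non-conforming input should be treated as broken.

```python
import csv
import io

def to_rfc4180_bytes(rows):
    """Hypothetical helper: comma-delimited, quoted only when needed,
    CRLF-terminated text encoded as UTF-8 (no byte-order mark)."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    return buf.getvalue().encode("utf-8")

def parse_strict(data):
    """Hypothetical helper: decode as UTF-8 (UnicodeDecodeError on bad bytes),
    then parse, raising csv.Error on malformed quoting."""
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline=""), strict=True))

rows = [["name", "city"], ["Zoë", "Zürich"]]
assert parse_strict(to_rfc4180_bytes(rows)) == rows

# A stray character after a closing quote is not valid RFC 4180.
bad = 'a,"b"x\r\n'.encode("utf-8")

# The default (lenient) reader silently glues it onto the field...
lenient = list(csv.reader(io.StringIO(bad.decode("utf-8"), newline="")))
print(lenient)  # [['a', 'bx']]

# ...while the strict reader refuses the input.
try:
    parse_strict(bad)
except csv.Error as exc:
    print("rejected:", exc)
```

The same malformed bytes yield a row from one reader and an error from the other, which is exactly the kind of divergence a shared test suite would pin down.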
Alternatives and Variants
- TSV, pipe-delimited, and semicolon-delimited files are popular in some domains to reduce quoting.
- ASCII separator characters (FS/GS/RS/US) are viewed as technically superior, but adoption is held back because they are awkward to type and display in ordinary keyboards and editors; a few niche tools and formats use them.
- JSON/NDJSON/JSONL: better for sparse or complex data and more regular to parse, but files are larger because keys repeat on every record unless rows are written as arrays under a single header. How widely this array-with-header pattern is adopted is unclear.
- Columnar/binary formats (Parquet, Arrow, ORC) are praised for typed data, metadata, sharding, and efficient querying; considered better for data lakes and analytics.
- Some advocate SQLite or Parquet as interchange formats, or typed CSV layers that remain backward compatible.
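To make the trade-offs between two of these alternatives concrete, the sketch below encodes the same rows with the ASCII Unit and Record Separators (US, 0x1F, and RS, 0x1E) and as NDJSON with a header array followed by one array per row. All helper names are hypothetical, and the separator scheme here deliberately has no escaping, so it simply refuses fields that contain the separator characters themselves.

```python
import json

US = "\x1f"  # ASCII Unit Separator: between fields
RS = "\x1e"  # ASCII Record Separator: between records

def to_usv(rows):
    """Hypothetical helper: join fields with US and records with RS, no escaping."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError(f"separator character inside field: {field!r}")
    return RS.join(US.join(row) for row in rows)

def from_usv(text):
    """Inverse of to_usv (an empty string decodes to no rows)."""
    return [record.split(US) for record in text.split(RS)] if text else []

def to_ndjson_arrays(header, rows):
    """Hypothetical helper: a header array on the first line, then one array per row."""
    lines = [json.dumps(header)] + [json.dumps(row) for row in rows]
    return "\n".join(lines) + "\n"

def from_ndjson_arrays(text):
    """Inverse of to_ndjson_arrays: returns (header, rows)."""
    header, *rows = [json.loads(line) for line in text.splitlines() if line]
    return header, rows

rows = [
    ["id", "note"],
    ["1", 'says "hi", twice'],
    ["2", "line one\nline two"],
]

# Commas, quotes, and newlines pass through the separator format untouched.
usv = to_usv(rows)
assert from_usv(usv) == rows

# Keys are written once, in the header line, rather than on every record.
nd = to_ndjson_arrays(rows[0], rows[1:])
print(nd)
assert from_ndjson_arrays(nd) == (rows[0], rows[1:])
```

The separator version needs no quoting for commas, quotes, or newlines but cannot carry the separator bytes themselves. The JSON version keeps each record on one physical line because json.dumps escapes embedded newlines, at the cost of quoting every string.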
Meta and Tools
- Multiple GUI tools for viewing/querying CSVs (including SQL-based) are mentioned; licensing models (subscription vs perpetual) are debated.
- Overall sentiment: CSV is overused, but its ubiquity and simplicity make replacing it across legacy systems nearly impossible.