2024-11-10

ASCII Delimited Text – Not CSV or Tab Delimited Text

Scope of the Debate

Thread discusses using ASCII control characters (FS/GS/RS/US: 0x1C–0x1F) as field/record separators instead of commas/tabs.
Focus is on tradeoffs: escaping, human usability, tooling, and real‑world interoperability.
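
For orientation, here is a minimal Python sketch of the idea under discussion, assuming US (0x1F) separates fields and RS (0x1E) separates records. The helper names dump_records and load_records are hypothetical and do not come from the thread or from any library.

```python
US = "\x1f"  # Unit Separator: placed between fields
RS = "\x1e"  # Record Separator: placed between records


def dump_records(rows):
    """Join fields with US and records with RS; no quoting or escaping."""
    return RS.join(US.join(fields) for fields in rows)


def load_records(text):
    """Split on RS, then on US. Assumes the data never contains either."""
    if text == "":
        return []
    return [record.split(US) for record in text.split(RS)]


rows = [
    ["name", "note"],
    ["Ada", 'said "hi", then left\non Tuesday'],
]
encoded = dump_records(rows)
decoded = load_records(encoded)
```

Commas, quotes, and newlines pass through untouched because none of them is a delimiter here. The sketch is only lossless as long as no field contains 0x1E or 0x1F.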

CSV/TSV vs ASCII Control Delimiters

Many argue CSV already solves the quoted article’s “shortcomings” via quoting and escaping, including newlines and quotes inside fields.
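
A short sketch of that argument using Python's standard csv module: a field containing a comma, double quotes, and a newline survives a write and read cycle. The sample data is made up for illustration.

```python
import csv
import io

rows = [
    ["id", "comment"],
    ["1", 'She said "hi",\nthen left'],
]

# Write with the default "excel" dialect; newline="" avoids newline translation.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Read it back; the quoted field keeps its comma, quotes, and newline.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

The writer wraps the awkward field in double quotes and doubles the embedded quotes, which is the quoting convention the commenters refer to.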
Others note CSV is messy in practice: many incompatible dialects, culture-specific separators (e.g., semicolons where comma is decimal separator), and ad‑hoc parsers.
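
The dialect problem can be illustrated with a small sketch, again using Python's csv module. The sample file and the parse helper are hypothetical; the point is only that the same bytes yield different tables depending on which separator the reader assumes.

```python
import csv
import io

# A file as it might be exported in a locale where comma is the decimal separator.
text = "name;price\r\nWidget;1,50\r\n"


def parse(text, delimiter):
    """Parse text with the given single-character field delimiter."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


as_comma = parse(text, ",")      # wrong guess: splits the price in two
as_semicolon = parse(text, ";")  # right guess: two clean columns
```

Neither parse raises an error, which is part of why dialect mismatches tend to go unnoticed until the data is already wrong.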
Several commenters say ASCII separators reduce the need for escaping because those characters almost never appear in ordinary text, but:
- You still must escape or forbid them to truly handle arbitrary text.
- Once a format becomes common, those characters may start appearing in data (copy/paste, nested formats), reintroducing the problem.
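
A sketch of the "forbid" option, assuming a writer that rejects any field containing one of the four separators. The names SEPARATORS and check_field are hypothetical.

```python
SEPARATORS = {
    "\x1c": "FS",
    "\x1d": "GS",
    "\x1e": "RS",
    "\x1f": "US",
}


def check_field(value):
    """Return value unchanged, or raise if it contains a separator character."""
    for index, ch in enumerate(value):
        if ch in SEPARATORS:
            raise ValueError(
                f"field contains {SEPARATORS[ch]} (0x{ord(ch):02X}) at index {index}"
            )
    return value


ok = check_field("plain text, with commas\nand newlines")

try:
    check_field("pasted\x1fdata")
    rejected = False
except ValueError:
    rejected = True
```

This keeps the parser trivial, at the cost that some inputs simply cannot be stored.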

Human Factors & Tooling

Major objection: control characters are hard to type, see, and reason about.
- CSV/TSV files can be “cat”-ed, grepped, diffed, edited by hand; control-char formats lose that simplicity unless editors gain special support.
Some argue it’s “just a tooling problem” that editors could fix by rendering control characters with visible symbols and shortcuts.
Others respond that if you require special tooling, you’ve undermined the main advantage of a plain text format.
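
As a sketch of what such tooling support might look like, the snippet below maps the four separators to their glyphs in the Unicode Control Pictures block (U+241C to U+241F). The function name show_controls is hypothetical, and the thread does not say which symbols an editor should use.

```python
def show_controls(text):
    """Replace FS/GS/RS/US with visible Unicode Control Pictures glyphs."""
    return "".join(
        # Control Pictures sit at U+2400 plus the control character's code point.
        chr(0x2400 + ord(ch)) if 0x1C <= ord(ch) <= 0x1F else ch
        for ch in text
    )


raw = "name\x1fnote\x1eAda\x1fhello"
visible = show_controls(raw)
```

This only addresses viewing. Typing the characters and making grep, diff, and similar tools treat RS as a line boundary would need separate support.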

Nesting, Escaping, and “Arbitrary Text”

CSV fields often embed CSV, JSON, or other structured text; escaping/quoting is unavoidable in such cases.
ASCII-delimited formats either:
- Forbid the separator characters in data (simpler but lossy for arbitrary text), or
- Need an escape mechanism, losing their main selling point.
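
A sketch of the second option, assuming ESC (0x1B) is chosen as the escape character. That choice and the helper names are illustrative assumptions, not a convention cited in the thread.

```python
ESC, US, RS = "\x1b", "\x1f", "\x1e"
SPECIAL = {ESC, US, RS}


def escape_field(value):
    """Prefix every ESC, US, or RS inside a field with ESC."""
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in value)


def encode(rows):
    return RS.join(US.join(escape_field(f) for f in row) for row in rows)


def decode(text):
    """Scan character by character; a plain split() would break on escaped separators."""
    rows, row, field = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == ESC:
            field.append(next(chars, ""))  # take the following character literally
        elif ch == US:
            row.append("".join(field))
            field = []
        elif ch == RS:
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
    row.append("".join(field))
    rows.append(row)
    return rows


rows = [["a\x1fb", "c\x1ed"], ["\x1b", ""]]
round_trip = decode(encode(rows))
```

The decoder is now a small state machine rather than two calls to split, which is roughly the complexity CSV quoting already has.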
Some suggest multi-level use of FS/GS/RS/US to represent nested structures, but this adds complexity and is largely hypothetical.
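
A sketch of the multi-level idea, assuming the conventional ordering FS > GS > RS > US and a fixed nesting depth of four. The names LEVELS, join_nested, and split_nested are hypothetical.

```python
# Outermost to innermost: file, group, record, unit separators.
LEVELS = ["\x1c", "\x1d", "\x1e", "\x1f"]


def join_nested(node, levels=LEVELS):
    """Serialize a list nested exactly len(levels) deep into one string."""
    if not levels:
        return node  # leaf: a plain string
    head, *rest = levels
    return head.join(join_nested(child, rest) for child in node)


def split_nested(text, levels=LEVELS):
    """Inverse of join_nested, assuming leaves contain no separator characters."""
    if not levels:
        return text
    head, *rest = levels
    return [split_nested(part, rest) for part in text.split(head)]


data = [
    [[["a", "b"], ["c", "d"]], [["e"]]],
    [[["f"]]],
]
restored = split_nested(join_nested(data))
```

The depth is capped at four and every value must sit at the same level, so this does not cover the irregular nesting that JSON inside a CSV field can express.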

Alternative Approaches & Niche Uses

Length-prefixed formats (netstrings, canonical S-expressions, Hollerith-style strings, some CMS/WordPress internals) are mentioned as robust but unpleasant for manual editing.
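
For comparison, a minimal sketch of netstrings, where each item is written as its byte length in decimal, a colon, the payload, and a trailing comma. The function names are hypothetical, and the decoder does only basic validation.

```python
def encode_netstring(payload):
    """Encode bytes as b'<length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstrings(data):
    """Decode a concatenation of netstrings into a list of payloads."""
    items, pos = [], 0
    while pos < len(data):
        colon = data.index(b":", pos)
        length = int(data[pos:colon].decode("ascii"))
        start, end = colon + 1, colon + 1 + length
        if data[end:end + 1] != b",":
            raise ValueError("missing trailing comma")
        items.append(data[start:end])  # payload is taken by length, never scanned
        pos = end + 1
    return items


payloads = [b"hello world!", b"a,b:c", b"\x1e\x1f", b""]
stream = b"".join(encode_netstring(p) for p in payloads)
decoded_items = decode_netstrings(stream)
```

Because the reader skips ahead by length, the payload may contain any byte, including commas, colons, and control characters. The cost is the one the thread names: editing a value by hand means recomputing its length.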
Some report successful niche uses of control characters or rare Unicode symbols as internal delimiters (e.g., in web apps, Bible data, finance, or M2M transfers).
A recurring theme: CSV/TSV “won” because they balance human readability, keyboard accessibility, and “good enough” machine parsing, despite their flaws.
