ASCII Delimited Text – Not CSV or Tab Delimited Text
Scope of the Debate
- The thread discusses using the ASCII control characters FS, GS, RS, and US (file, group, record, and unit separators; 0x1C–0x1F) as field and record separators instead of commas or tabs.
- Focus is on tradeoffs: escaping, human usability, tooling, and real‑world interoperability.
CSV/TSV vs ASCII Control Delimiters
- Many argue CSV already solves the quoted article’s “shortcomings” via quoting and escaping, including newlines and quotes inside fields.
- Others note CSV is messy in practice: many incompatible dialects, culture-specific separators (e.g., semicolons where comma is decimal separator), and ad‑hoc parsers.
- Several commenters say ASCII separators reduce the need for escaping because those characters almost never appear in ordinary text, but:
  - You still must escape or forbid them to truly handle arbitrary text.
  - Once a format becomes common, those characters may start appearing in data (copy/paste, nested formats), reintroducing the problem.
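To make the tradeoff above concrete, here is a minimal Python sketch that round-trips the same awkward rows through the standard `csv` module and through a hand-rolled US/RS encoding. The helper names (`csv_encode`, `ascii_encode`, and so on) are hypothetical, and using US between fields and RS between records is an assumption based on the conventional meaning of those characters, not something the thread specifies.

```python
import csv
import io

US = "\x1f"  # unit separator, used here between fields (assumption)
RS = "\x1e"  # record separator, used here between records (assumption)

# A field containing a quote, a comma, and a newline.
rows = [["id", "comment"], ["1", 'She said "hi",\nthen left']]


def csv_encode(rows):
    # newline="" stops Python from translating the writer's \r\n terminators.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def csv_decode(text):
    return list(csv.reader(io.StringIO(text, newline="")))


def ascii_encode(rows):
    # No escaping at all: fields holding a separator are rejected outright.
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def ascii_decode(text):
    return [record.split(US) for record in text.split(RS)]


# Both formats survive quotes, commas, and newlines inside a field.
assert csv_decode(csv_encode(rows)) == rows
assert ascii_decode(ascii_encode(rows)) == rows
```

The CSV path depends on the quoting rules built into the `csv` module, while the control-character path is plain `join` and `split`. The cost shows up in `ascii_encode`: it has to refuse any field that already contains 0x1E or 0x1F, which is the "escape or forbid" choice the commenters point out.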
Human Factors & Tooling
- Major objection: control characters are hard to type, see, and reason about.
- CSV/TSV files can be “cat”-ed, grepped, diffed, edited by hand; control-char formats lose that simplicity unless editors gain special support.
- Some argue it’s “just a tooling problem” that editors could fix by rendering control characters with visible symbols and shortcuts.
- Others respond that if you require special tooling, you’ve undermined the main advantage of a plain text format.
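As a rough illustration of the "tooling could fix this" position, the sketch below maps the four separators to the glyphs in the Unicode Control Pictures block (U+241C to U+241F) so they can be seen when printed. The function name `show` is hypothetical, and whether a terminal or editor font actually renders these glyphs is an assumption.

```python
# Map FS/GS/RS/US (0x1C-0x1F) to their Unicode "control picture" glyphs,
# which sit at U+2400 plus the control code.
VISIBLE = {code: 0x2400 + code for code in range(0x1C, 0x20)}


def show(text):
    """Return text with the four ASCII separators replaced by visible symbols."""
    return text.translate(VISIBLE)


print(show("alice\x1f30\x1ebob\x1f25"))
```

This is display-only: the mapping is not safely reversible if the data itself contains the picture glyphs, so it helps with viewing a file but not with typing or editing one by hand.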
Nesting, Escaping, and “Arbitrary Text”
- CSV fields often embed CSV, JSON, or other structured text; escaping/quoting is unavoidable in such cases.
- ASCII-delimited formats either:
  - Forbid the separator characters in data (simpler but lossy for arbitrary text), or
  - Need an escape mechanism, losing their main selling point.
- Some suggest multi-level use of FS/GS/RS/US to represent nested structures, but this adds complexity and is largely hypothetical.
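The escape-mechanism option can be sketched as follows. This is one possible scheme, not one proposed in the thread: it assumes ESC (0x1B) as a prefix meaning "take the next character literally", and all function names are hypothetical.

```python
ESC = "\x1b"  # assumed escape prefix: the next character is literal
US = "\x1f"   # between fields
RS = "\x1e"   # between records
SPECIAL = {ESC, US, RS}


def escape_field(field):
    # Prefix every special character, including ESC itself.
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in field)


def encode(rows):
    return RS.join(US.join(escape_field(f) for f in row) for row in rows)


def decode(text):
    rows, row, field = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == ESC:
            # Consume the escaped character as data, whatever it is.
            field.append(next(chars, ""))
        elif ch == US:
            row.append("".join(field))
            field = []
        elif ch == RS:
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
    # Flush the final field and record (there is no trailing RS).
    row.append("".join(field))
    rows.append(row)
    return rows


rows = [["a\x1fb", "c\x1ed"], ["e\x1bf", ""]]
assert decode(encode(rows)) == rows
```

Arbitrary text now round-trips, but decoding is no longer a plain `split`: it needs a character-by-character scanner, which is roughly the complexity a CSV parser already carries.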
Alternative Approaches & Niche Uses
- Length-delimited formats (netstrings, S-expressions, Hollerith-style, some CMS/WordPress internals) are mentioned as robust but unpleasant for manual editing.
- Some report successful niche uses of control characters or rare Unicode symbols as internal delimiters (e.g., in web apps, Bible data, finance, or machine-to-machine transfers).
- A recurring theme: CSV/TSV “won” because they balance human readability, keyboard accessibility, and “good enough” machine parsing, despite their flaws.
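For the length-delimited approach mentioned above, a netstring writes each item as its byte length in decimal, a colon, the raw bytes, and a trailing comma. The sketch below is a minimal encoder and decoder with hypothetical function names; it skips some strictness checks, such as rejecting leading zeros in the length.

```python
def encode_netstring(data):
    """Encode bytes as <length>:<bytes>, with no escaping of the payload."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstrings(buf):
    """Decode a concatenation of netstrings into a list of byte strings."""
    items = []
    pos = 0
    while pos < len(buf):
        colon = buf.index(b":", pos)
        length = int(buf[pos:colon].decode("ascii"))
        start = colon + 1
        end = start + length
        if buf[end:end + 1] != b",":
            raise ValueError("missing trailing comma")
        items.append(buf[start:end])
        pos = end + 1
    return items


fields = [b"hello", b"a,b:c", b"x\x1fy", b""]
packed = b"".join(encode_netstring(f) for f in fields)
assert decode_netstrings(packed) == fields
```

Because the decoder reads by length and never searches the payload for a delimiter, any byte can appear in the data. The downside the commenters raise is also visible: changing a field by hand means recomputing its length prefix.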