ASCII Delimited Text – Not CSV or Tab Delimited Text

Scope of the Debate

  • The thread discusses using the ASCII control characters FS, GS, RS, and US (File, Group, Record, and Unit Separator; 0x1C–0x1F) as field and record separators instead of commas or tabs.
  • Focus is on tradeoffs: escaping, human usability, tooling, and real‑world interoperability.

CSV/TSV vs ASCII Control Delimiters

  • Many argue CSV already solves the quoted article’s “shortcomings” via quoting and escaping, including newlines and quotes inside fields.
  • Others note CSV is messy in practice: many incompatible dialects, culture-specific separators (e.g., semicolons where comma is decimal separator), and ad‑hoc parsers.
  • Several commenters say ASCII separators reduce the need for escaping because those characters almost never appear in ordinary text, but:
    • You still must escape or forbid them to truly handle arbitrary text.
    • Once a format becomes common, those characters may start appearing in data (copy/paste, nested formats), reintroducing the problem.
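To make the comparison above concrete, here is a minimal Python sketch. It assumes the standard library csv module for the CSV side and a naive join/split scheme for the ASCII-delimited side. The helper names dump_ascii and load_ascii are hypothetical and do not come from the thread.

```python
import csv
import io

US, RS = "\x1f", "\x1e"  # Unit Separator, Record Separator

rows = [["id", "note"], ["1", 'said "hi", then left\nnext line']]

# CSV: the writer quotes any field containing a comma, quote, or newline,
# and the reader undoes that quoting.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()
csv_back = list(csv.reader(io.StringIO(csv_text, newline="")))

# ASCII-delimited (hypothetical helpers): no quoting, just join and split.
def dump_ascii(records):
    return RS.join(US.join(fields) for fields in records)

def load_ascii(text):
    return [record.split(US) for record in text.split(RS)]

ascii_back = load_ascii(dump_ascii(rows))

# The naive scheme silently breaks once a field contains a separator:
# one field comes back as two.
tricky = [["a" + US + "b", "c"]]
broken = load_ascii(dump_ascii(tricky))
```

Both round trips preserve the comma, quote, and newline in the second row, but the last case shows why the separators still have to be forbidden or escaped if arbitrary text is allowed.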

Human Factors & Tooling

  • Major objection: control characters are hard to type, see, and reason about.
    • CSV/TSV files can be “cat”-ed, grepped, diffed, edited by hand; control-char formats lose that simplicity unless editors gain special support.
  • Some argue it’s “just a tooling problem”: editors could render control characters as visible symbols and offer shortcuts for typing them.
  • Others respond that if you require special tooling, you’ve undermined the main advantage of a plain text format.
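As a rough illustration of the "visible symbols" idea, the sketch below maps FS/GS/RS/US to the matching code points in the Unicode Control Pictures block (U+241C–U+241F) so that a file can be inspected in an ordinary terminal. The function name show_controls is hypothetical, and breaking the line after each record separator is an assumption made only for readability.

```python
# Map FS/GS/RS/US (0x1C-0x1F) to Unicode Control Pictures (U+241C-U+241F).
VISIBLE = {code: 0x2400 + code for code in range(0x1C, 0x20)}

def show_controls(text: str) -> str:
    """Return text with the four separators replaced by visible symbols."""
    out = text.translate(VISIBLE)
    # Start a new display line after each record separator symbol.
    return out.replace("\u241e", "\u241e\n")

if __name__ == "__main__":
    print(show_controls("id\x1fname\x1e1\x1fAda\x1e2\x1fGrace"))
```

This is exactly the kind of helper the objection is about: it works, but the file is no longer readable without it.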

Nesting, Escaping, and “Arbitrary Text”

  • CSV fields often embed CSV, JSON, or other structured text; escaping/quoting is unavoidable in such cases.
  • ASCII-delimited formats either:
    • Forbid the separator characters in data (simpler but lossy for arbitrary text), or
    • Need an escape mechanism, losing their main selling point.
  • Some suggest multi-level use of FS/GS/RS/US to represent nested structures, but this adds complexity and is largely hypothetical.
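The second option above (adding an escape mechanism) can be sketched as follows. The choice of ESC (0x1B) as the escape character is an assumption for illustration, not a convention anyone in the thread proposed, and the function names are hypothetical. Note how the decoder has to scan character by character instead of calling split, which is the simplicity that gets lost.

```python
ESC, US, RS = "\x1b", "\x1f", "\x1e"
SPECIAL = {ESC, US, RS}

def escape_field(field: str) -> str:
    # Prefix every special character with ESC so it is read back literally.
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in field)

def encode(rows):
    return RS.join(US.join(escape_field(f) for f in row) for row in rows)

def decode(text):
    rows, row, field = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == ESC:
            # The character after ESC is data, whatever it is.
            field.append(next(chars, ""))
        elif ch == US:
            row.append("".join(field))
            field = []
        elif ch == RS:
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
        else:
            field.append(ch)
    # Flush the final field and record (there is no trailing RS).
    row.append("".join(field))
    rows.append(row)
    return rows
```

The first option (forbidding the separators) needs no decoder changes at all, only a check at write time that rejects any field containing one of the separators.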

Alternative Approaches & Niche Uses

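  • Length-delimited formats (netstrings, S-expressions, Hollerith-style counted strings, some CMS/WordPress internals) are mentioned as robust, since the payload never needs escaping, but unpleasant to edit by hand, since every length prefix must be kept in sync with its data.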
  • Some report successful niche uses of control characters or rare Unicode symbols as internal delimiters (e.g., in web apps, Bible data, finance, or machine-to-machine transfers).
  • A recurring theme: CSV/TSV “won” because they balance human readability, keyboard accessibility, and “good enough” machine parsing, despite their flaws.
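For readers unfamiliar with the length-delimited alternative mentioned above, here is a minimal sketch of netstrings, which frame each item as a decimal byte count, a colon, the raw bytes, and a trailing comma. The function names are hypothetical and error handling is kept to the bare minimum.

```python
def encode_netstring(payload: bytes) -> bytes:
    # <decimal length>:<payload>,
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_netstrings(data: bytes) -> list:
    """Decode a concatenation of netstrings into a list of payloads."""
    items = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)  # raises ValueError if no colon
        length = int(data[pos:colon].decode("ascii"))
        start = colon + 1
        end = start + length
        if data[end:end + 1] != b",":
            raise ValueError("netstring is missing its trailing comma")
        items.append(data[start:end])
        pos = end + 1
    return items

if __name__ == "__main__":
    framed = encode_netstring(b"hello, world") + encode_netstring(b"a:b\n")
    print(framed)
    print(decode_netstrings(framed))
```

Because the decoder jumps ahead by the declared length, the payload may contain commas, colons, newlines, or control characters without any escaping. The cost is the one raised in the thread: changing a value by hand means recomputing its length prefix.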