Whence '\n'?

Escape sequence \n and compiler bootstrapping

  • Central point: where does the mapping '\n' → byte 10 actually live when a compiler is self-hosting and its source only ever says '\n', not 10?
  • Discussion emphasizes that in Rust’s current sources, this mapping is not explicitly numeric; the knowledge is “inherited” transitively from earlier compilers, echoing classic “trusting trust” concerns.
  • Some argue this shows the compiler is not rebuildable “from scratch” purely from its current sources; it depends on implicit knowledge embedded in an ancestor binary.
  • Others stress that physically there has always been a 0x0A somewhere in the binary; '\n' is just a human-facing alias.

Related prior work and inspirations

  • Several commenters link similar explorations: self-hosting C compiler diaries, classic lectures on compilers inserting hidden behavior, and prior blog posts about escape sequences.
  • There’s interest in how different compilers (GCC, Clang) do it; code excerpts show they hardcode numeric values for escapes, with ASCII/EBCDIC branches.

Character escapes in other languages

  • OCaml’s decimal \010-style escapes are discussed; some find them more “primitive” than symbolic escapes like '\n', others note they still rely on numeric parsing that itself must be grounded somewhere.
  • Backslash-decimal escapes are noted as rare but present in OCaml, Lua, DNS.
  • Python’s \N{UNICODE NAME} is mentioned as a more self-documenting mechanism.
  • CSV-style \N as NULL and Python’s \N lead to some confusion over the HN title.

ASCII, control codes, and encodings

  • Several comments review ASCII control codes, caret notation (^C, ^M), and how control characters are mapped by terminals.
  • EBCDIC is raised as a complicating case: early C existed on non-ASCII systems, with different newline codes and even separate NEL vs LF.
  • There’s debate over whether the article misses this wider encoding context.

Meta reactions, alternatives, and tangents

  • Some readers find the article poetic and mind-expanding about compilers as inputs; others dismiss it as a “nothingburger.”
  • Speculation on alternatives if ASCII had no escapes: constants like PHP_EOL, font-level visible control glyphs, or APIs for cursor movement.
  • Extended humorous tangent on the “USB rule” and connector confusion.
  • Brief note that some newer languages deliberately postpone self-hosting, possibly to avoid such bootstrapping subtleties.