Rust for tokenising and parsing

  • Several commenters agree Rust is pleasant for writing lexers/parsers, especially with algebraic data types and pattern matching.
  • Rust’s zero-cost abstractions and ownership model enable efficient, zero-copy parsing, which is valued for high-throughput or embedded use cases.
  • Others find Rust fine for tokenising/AST building but painful for later phases (interpreting, typechecking) because of the borrow checker.
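
As a rough illustration of the zero-copy point above, here is a minimal sketch of a tokeniser whose tokens borrow slices of the input instead of allocating strings. The `Token` variants, `tokenize`, and `scan` are hypothetical names invented for this example and do not come from the article or the thread.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Hypothetical token type: every variant borrows from the source text,
/// so lexing never copies string data.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token<'a> {
    Ident(&'a str),
    Number(&'a str),
    Symbol(char),
}

/// Split `src` into tokens; each slice points back into `src`.
pub fn tokenize(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip whitespace
        } else if c.is_ascii_alphabetic() || c == '_' {
            let end = scan(&mut chars, src.len(), |c| c.is_ascii_alphanumeric() || c == '_');
            tokens.push(Token::Ident(&src[start..end]));
        } else if c.is_ascii_digit() {
            let end = scan(&mut chars, src.len(), |c| c.is_ascii_digit());
            tokens.push(Token::Number(&src[start..end]));
        } else {
            chars.next(); // any other character becomes a one-char symbol
            tokens.push(Token::Symbol(c));
        }
    }
    tokens
}

/// Advance while `keep` holds; return the byte offset where the run ends.
fn scan(
    chars: &mut Peekable<CharIndices<'_>>,
    len: usize,
    keep: impl Fn(char) -> bool,
) -> usize {
    while let Some(&(i, c)) = chars.peek() {
        if !keep(c) {
            return i;
        }
        chars.next();
    }
    len
}
```

Because `Token<'a>` ties each token to the lifetime of the input, the compiler rejects any use of the tokens after the source buffer is dropped. That is the ownership guarantee commenters credit for making this style safe.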

Macros, AST design, and type systems

  • Some see the article’s heavy macro use as a sign of inexperience with algebraic data types; they argue simple enums/structs would suffice for an SQLite grammar.
  • Others describe sophisticated macro-based AST hierarchies using enums, Rc, and PhantomData to model up/down-casting without dynamic dispatch.
  • Debugging macros is typically done via cargo expand or IDE macro expansion; declarative macros are considered more maintainable than proc macros.
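
To make the "simple enums/structs would suffice" argument concrete, the sketch below models a tiny, hypothetical subset of a SELECT statement with plain algebraic data types and no macros. The `Expr`, `BinOp`, and `Select` types and the rendering functions are assumptions made for illustration; they are not the article's AST and cover nowhere near the full SQLite grammar.

```rust
/// Hypothetical expression node for a small SQL subset.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Integer(i64),
    Binary { op: BinOp, lhs: Box<Expr>, rhs: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BinOp {
    Eq,
    Lt,
    And,
}

/// Hypothetical SELECT statement: columns, one table, optional WHERE clause.
#[derive(Debug, Clone, PartialEq)]
pub struct Select {
    pub columns: Vec<String>,
    pub table: String,
    pub filter: Option<Expr>,
}

/// Render an expression back to text, parenthesising every binary node.
pub fn render_expr(e: &Expr) -> String {
    match e {
        Expr::Column(name) => name.clone(),
        Expr::Integer(n) => n.to_string(),
        Expr::Binary { op, lhs, rhs } => {
            let sym = match op {
                BinOp::Eq => "=",
                BinOp::Lt => "<",
                BinOp::And => "AND",
            };
            format!("({} {} {})", render_expr(lhs), sym, render_expr(rhs))
        }
    }
}

/// Render a whole statement.
pub fn render_select(s: &Select) -> String {
    let mut out = format!("SELECT {} FROM {}", s.columns.join(", "), s.table);
    if let Some(filter) = &s.filter {
        out.push_str(" WHERE ");
        out.push_str(&render_expr(filter));
    }
    out
}
```

The compiler checks every `match` for exhaustiveness, so adding a variant to `Expr` or `BinOp` flags each function that has not yet handled it. Commenters who favour this style argue that this covers much of what the macros were used for.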

Parser libraries and tools mentioned

  • Rust crates:
    • winnow, nom for parser combinators, including examples of context-sensitive parsing (e.g., aⁿbⁿcⁿ).
    • logos for lexing.
    • pest (PEG-based), praised for its ergonomics and online grammar editor, with stronger typing planned for pest 3.
  • Other ecosystems:
    • Haskell libraries such as Megaparsec/Attoparsec held up as extremely expressive, almost BNF-like.
    • Ragel, Lemon (used by SQLite), ANTLR, tree-sitter, and LR(1)/LALR grammars discussed for SQL/SQLite.
  • Some argue mature generator-based stacks (e.g., C + Ragel + EBNF) reach a “saturation point” where only the grammar matters.
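
The context-sensitive aⁿbⁿcⁿ example mentioned for winnow and nom relies on one idea: the result of an earlier parser (the count of a's) parameterises the later ones. The sketch below shows that idea with hand-written, std-only helper functions rather than either crate's API. `many`, `exactly`, and `parse_anbncn` are hypothetical names for this illustration.

```rust
/// Consume a run of `ch`; return its length and the remaining input.
fn many(input: &str, ch: char) -> (usize, &str) {
    let n = input.chars().take_while(|&c| c == ch).count();
    (n, &input[n * ch.len_utf8()..])
}

/// Consume exactly `n` copies of `ch`, or fail with `None`.
fn exactly(input: &str, ch: char, n: usize) -> Option<&str> {
    let mut rest = input;
    for _ in 0..n {
        rest = rest.strip_prefix(ch)?;
    }
    Some(rest)
}

/// Accept strings of the form a^n b^n c^n and return n.
/// The count from the first run decides what the next two parsers expect,
/// which is what a context-free grammar cannot express.
pub fn parse_anbncn(input: &str) -> Option<usize> {
    let (n, rest) = many(input, 'a');
    let rest = exactly(rest, 'b', n)?;
    let rest = exactly(rest, 'c', n)?;
    if rest.is_empty() {
        Some(n)
    } else {
        None
    }
}
```

Under these definitions `parse_anbncn("aabbcc")` yields `Some(2)`, while `"aabbc"` and `"aabbbcc"` are rejected. Combinator libraries generally offer a way to express the same dependency by feeding one parser's output into the construction of the next.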

Borrow checker and language choices for PL work

  • A recurring theme: Rust’s borrow checker can dominate one’s attention when building interpreters and typecheckers, pushing some toward OCaml or Haskell for PL research or prototyping.
  • Suggested Rust patterns to ease this:
    • Flat, ID-indexed ASTs (vectors plus typed indices) instead of nested Rc/Weak.
    • Avoid storing &str in AST nodes; use indices, interning, or static strings.
    • Arena allocators for long-lived trees.
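
The patterns above can be sketched together: nodes live in one vector and refer to each other by typed index, and identifiers are interned to small integer symbols so no node holds a `&str`. All names here (`ExprId`, `Symbol`, `Interner`, `Ast`, `eval`) are hypothetical and chosen for this example. A production design would likely use an arena or interning crate instead of these hand-rolled versions.

```rust
use std::collections::HashMap;

/// Typed index into `Ast::nodes`; Copy, so it can be passed around freely.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprId(u32);

/// Interned identifier; stands in for a `&str` inside AST nodes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

#[derive(Debug)]
pub enum Expr {
    Int(i64),
    Var(Symbol),
    Add(ExprId, ExprId),
}

#[derive(Default)]
pub struct Interner {
    map: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    /// Return the existing symbol for `name`, or allocate a new one.
    pub fn intern(&mut self, name: &str) -> Symbol {
        if let Some(&sym) = self.map.get(name) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.names.push(name.to_owned());
        self.map.insert(name.to_owned(), sym);
        sym
    }

    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

/// Flat AST: one vector owns every node.
#[derive(Default)]
pub struct Ast {
    nodes: Vec<Expr>,
}

impl Ast {
    pub fn alloc(&mut self, e: Expr) -> ExprId {
        let id = ExprId(self.nodes.len() as u32);
        self.nodes.push(e);
        id
    }

    pub fn get(&self, id: ExprId) -> &Expr {
        &self.nodes[id.0 as usize]
    }
}

/// Evaluate by following indices; returns None for an unbound variable.
pub fn eval(ast: &Ast, env: &HashMap<Symbol, i64>, id: ExprId) -> Option<i64> {
    match ast.get(id) {
        Expr::Int(n) => Some(*n),
        Expr::Var(s) => env.get(s).copied(),
        Expr::Add(l, r) => Some(eval(ast, env, *l)? + eval(ast, env, *r)?),
    }
}
```

Since `ExprId` and `Symbol` are plain `Copy` integers, later phases can keep them in side tables (types, scopes, spans) without holding any borrow on the tree. That is the main reason commenters suggest this layout for interpreters and typecheckers.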

Comparisons with other languages

  • Haskell is often seen as the most elegant for parser combinators; some love it, others dislike its error messages and ecosystem.
  • OCaml and F# are recommended as Rust-like but GC’d options for PL work.
  • Go is widely criticized as a poor fit for parsers (lack of sum types, verbose error handling) but defended as a simple, tooling-strong language for servers.
  • C/Ragel are cited as simple and performant but less ergonomic than Rust’s ADTs and type system.