Rust for tokenising and parsing
- Several commenters agree Rust is pleasant for writing lexers/parsers, especially with algebraic data types and pattern matching.
- Rust’s zero-cost abstractions and ownership model enable efficient, zero-copy parsing, which is valued for high-throughput or embedded use cases.
- Others find Rust fine for tokenising/AST building but painful for later phases (interpreting, typechecking) because of the borrow checker.
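As a rough illustration of the ADT and zero-copy points, here is a minimal hand-written lexer sketch, assuming a toy SQL-like token set (the Token enum and lex function are hypothetical, not taken from the thread). Each token kind is an enum variant, and identifier and number tokens borrow &str slices of the input rather than allocating new Strings.

```rust
/// Tokens for a toy SQL-like language (hypothetical). Identifier and
/// number tokens borrow slices of the input instead of allocating Strings.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Ident(&'a str),
    Number(&'a str),
    Comma,
    Star,
}

/// Splits `src` into tokens. On failure, returns the byte offset of the
/// first character the lexer does not recognise.
fn lex<'a>(src: &'a str) -> Result<Vec<Token<'a>>, usize> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b' ' | b'\t' | b'\r' | b'\n' => i += 1,
            b',' => {
                tokens.push(Token::Comma);
                i += 1;
            }
            b'*' => {
                tokens.push(Token::Star);
                i += 1;
            }
            b'0'..=b'9' => {
                let start = i;
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                // Zero-copy: the token is a view into `src`.
                tokens.push(Token::Number(&src[start..i]));
            }
            c if c.is_ascii_alphabetic() || c == b'_' => {
                let start = i;
                while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                    i += 1;
                }
                tokens.push(Token::Ident(&src[start..i]));
            }
            // Anything else (including non-ASCII) is rejected at its first byte.
            _ => return Err(i),
        }
    }
    Ok(tokens)
}

fn main() {
    let src = "SELECT a, b FROM t";
    match lex(src) {
        Ok(tokens) => println!("{:?}", tokens),
        Err(pos) => println!("unexpected character at byte {}", pos),
    }
}
```

Because Token<'a> carries the input's lifetime, the compiler guarantees the source string outlives every token. That is cheap to satisfy in a lexer, but it is the same kind of constraint some commenters found harder to live with in later phases.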
Macros, AST design, and type systems
- Some see the article’s heavy macro use as a sign of inexperience with algebraic data types; they argue simple enums/structs would suffice for an SQLite grammar.
- Others describe sophisticated macro-based AST hierarchies using enums, Rc, and PhantomData to model up/down-casting without dynamic dispatch.
- Debugging macros is typically done via cargo expand or IDE macro expansion; declarative macros are considered more maintainable than proc macros.
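A deliberately small sketch of the declarative-macro approach, assuming a hypothetical ast_enum! macro and two toy node structs: the macro generates an enum with one variant per node type, a From impl for up-casting into the enum, and a TryFrom impl for down-casting back out, all resolved with a plain match rather than trait objects. It does not reproduce the Rc/PhantomData hierarchies described in the thread.

```rust
use std::convert::TryFrom;

/// Hypothetical macro: for each listed struct, adds an enum variant of the
/// same name, a `From` impl (up-cast) and a `TryFrom` impl (down-cast).
macro_rules! ast_enum {
    ($name:ident { $($variant:ident),+ }) => {
        #[derive(Debug, Clone, PartialEq)]
        enum $name {
            $($variant($variant)),+
        }

        $(
            impl From<$variant> for $name {
                fn from(node: $variant) -> Self {
                    $name::$variant(node)
                }
            }

            impl ::std::convert::TryFrom<$name> for $variant {
                // On a mismatch the original enum value is handed back.
                type Error = $name;
                fn try_from(node: $name) -> Result<Self, Self::Error> {
                    match node {
                        $name::$variant(inner) => Ok(inner),
                        other => Err(other),
                    }
                }
            }
        )+
    };
}

#[derive(Debug, Clone, PartialEq)]
struct Literal {
    value: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
}

ast_enum!(Expr { Literal, Column });

fn main() {
    // Up-cast: a concrete node becomes the general Expr enum.
    let e: Expr = Literal { value: 1 }.into();
    // Down-cast: succeeds only if the variant matches.
    let as_literal = Literal::try_from(e.clone());
    let as_column = Column::try_from(e);
    println!("{:?} {:?}", as_literal, as_column);
}
```

Running cargo expand on a crate containing this macro would show the generated enum and impl blocks, which is the debugging workflow mentioned above.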
Parser libraries and tools mentioned
- Rust crates:
  - winnow and nom for parser combinators, including examples of context-sensitive parsing (e.g., aⁿbⁿcⁿ).
  - logos for lexing.
  - pest (PEG-based), praised for its ergonomics and online grammar editor, with plans for stronger typing in Pest 3.
- Other ecosystems:
  - Haskell libraries such as Megaparsec/Attoparsec, held up as extremely expressive, almost BNF-like.
  - Ragel, Lemon (used by SQLite), ANTLR, tree-sitter, and LR(1)/LALR grammars, discussed for SQL/SQLite.
  - Some argue mature generator-based stacks (e.g., C + Ragel + eBNF) reach a “saturation point” where only the grammar matters.
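The crates above are external dependencies, so the following is a std-only, hand-rolled sketch in the same combinator style rather than winnow's or nom's actual API; the names PResult, count_many1, exactly and anbncn are hypothetical. It shows what makes aⁿbⁿcⁿ context-sensitive: the count returned by the first parser is fed into the parsers that follow, something no context-free grammar can express.

```rust
/// Result of a parser: the unconsumed input plus an output, or None on failure.
/// Hand-rolled for illustration; this is not nom's or winnow's API.
type PResult<'a, T> = Option<(&'a str, T)>;

/// Consumes one or more copies of `c` and returns how many it consumed.
fn count_many1<'a>(c: char, input: &'a str) -> PResult<'a, usize> {
    let rest = input.trim_start_matches(c);
    let n = (input.len() - rest.len()) / c.len_utf8();
    if n == 0 {
        None
    } else {
        Some((rest, n))
    }
}

/// Consumes exactly `n` copies of `c`.
fn exactly<'a>(c: char, n: usize, input: &'a str) -> PResult<'a, ()> {
    let mut rest = input;
    for _ in 0..n {
        rest = rest.strip_prefix(c)?;
    }
    Some((rest, ()))
}

/// Recognises a^n b^n c^n for n >= 1. The count produced by the first parser
/// parameterises the two that follow: this is the context-sensitive step.
fn anbncn(input: &str) -> bool {
    let parsed = count_many1('a', input).and_then(|(rest, n)| {
        let (rest, ()) = exactly('b', n, rest)?;
        let (rest, ()) = exactly('c', n, rest)?;
        Some((rest, n))
    });
    // Accept only if the whole input was consumed.
    matches!(parsed, Some(("", _)))
}

fn main() {
    for s in &["abc", "aabbcc", "aabbc", "aabbccc"] {
        println!("{:?} -> {}", s, anbncn(s));
    }
}
```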
Borrow checker and language choices for PL work
- One recurring theme: Rust’s borrow checker can come to dominate one’s thinking when building interpreters/typecheckers, pushing some toward OCaml or Haskell for PL research or prototyping.
- Suggested Rust patterns to ease this:
  - Flat, ID-indexed ASTs (vectors plus typed indices) instead of nested Rc/Weak.
  - Avoid storing &str in AST nodes; use indices, interning, or static strings.
  - Arena allocators for long-lived trees.
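A std-only sketch of these patterns together, assuming a toy expression language: nodes live in a single Vec acting as a simple arena, children are referenced through a typed Id<T> (PhantomData keeps an Id<Expr> from being mixed up with indices of other node types), and identifiers are interned to Symbol integers instead of being stored as &str. All names are hypothetical; dedicated arena and interning crates exist, but none are used here.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// Typed index into an arena; PhantomData ties it to T without storing a T.
struct Id<T> {
    index: u32,
    _marker: PhantomData<T>,
}
// Manual impls: #[derive(Clone, Copy)] would needlessly require T: Copy.
impl<T> Clone for Id<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Id<T> {}

/// Interned identifier: a small integer instead of a borrowed &str.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `name`, or allocates a new one.
    fn intern(&mut self, name: &str) -> Symbol {
        if let Some(&sym) = self.map.get(name) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.names.push(name.to_string());
        self.map.insert(name.to_string(), sym);
        sym
    }

    fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

/// Children are referenced by Id, not by Box, Rc or references.
enum Expr {
    Column(Symbol),
    Int(i64),
    Add(Id<Expr>, Id<Expr>),
}

/// All expression nodes live in one Vec, which acts as the arena.
#[derive(Default)]
struct Ast {
    exprs: Vec<Expr>,
}

impl Ast {
    fn push(&mut self, e: Expr) -> Id<Expr> {
        self.exprs.push(e);
        Id { index: (self.exprs.len() - 1) as u32, _marker: PhantomData }
    }

    fn get(&self, id: Id<Expr>) -> &Expr {
        &self.exprs[id.index as usize]
    }
}

/// Evaluates an expression; columns are looked up by interned symbol.
fn eval(ast: &Ast, id: Id<Expr>, row: &HashMap<Symbol, i64>) -> Option<i64> {
    match *ast.get(id) {
        Expr::Column(sym) => row.get(&sym).copied(),
        Expr::Int(n) => Some(n),
        Expr::Add(l, r) => Some(eval(ast, l, row)? + eval(ast, r, row)?),
    }
}

fn main() {
    let mut interner = Interner::default();
    let mut ast = Ast::default();
    // Build `price + 1` with no lifetimes or Rc on the AST.
    let price = interner.intern("price");
    let col = ast.push(Expr::Column(price));
    let one = ast.push(Expr::Int(1));
    let sum = ast.push(Expr::Add(col, one));

    let mut row = HashMap::new();
    row.insert(price, 41);
    // prints "price + 1 = Some(42)"
    println!("{} + 1 = {:?}", interner.resolve(price), eval(&ast, sum, &row));
}
```

Because the AST holds no references and no Rc, it needs no lifetime parameter and can be mutated or passed around freely. The trade-off is that the compiler no longer checks that an Id belongs to the Ast it is used with; a mismatched Id either indexes the wrong node or panics on the bounds check.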
Comparisons with other languages
- Haskell is often seen as the most elegant for parser combinators; some love it, others dislike its error messages and ecosystem.
- OCaml and F# are recommended as Rust-like but GC’d options for PL work.
- Go is widely criticized as a poor fit for parsers (lack of sum types, verbose error handling) but defended as a simple, tooling-strong language for servers.
- C/Ragel are cited as simple and performant but less ergonomic than Rust’s ADTs and type system.