Show HN: Transductive regular expressions for text editing

Clarifying semantics & examples

  • Several readers were confused by the README examples, especially the c:da:ot:g and cat/dog transformations, and by early typos and inconsistent outputs (cats vs. dogs, infinite-loop examples).
  • Discussion revealed that:
    • Concatenation is implicit (like regex), with an “invisible” operator between characters.
    • : is a transduction operator with higher precedence than concatenation, so c:da:ot:g currently parses as c:d ~ a:o ~ t:g (writing ~ for the implicit concatenation), not (cat):(dog).
    • Empty (epsilon) operands are implicitly injected on one side of : when omitted, which also affects parsing.
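
To make the precedence rule concrete, here is a minimal Python sketch of the reading described above. It is not trre's parser: it handles single literal characters only, and the names parse_pairs and sides are made up for illustration. It assumes that : binds tighter than concatenation, that a missing operand is epsilon, and that a bare character is copied through unchanged.

```python
EPS = ""  # the empty string stands in for epsilon


def parse_pairs(expr):
    """Split a literal-only expression into (input, output) character pairs.

    Assumption: ':' binds tighter than concatenation, and an omitted
    operand on either side of ':' is treated as epsilon.
    """
    pairs = []
    i = 0
    while i < len(expr):
        # Left operand: a literal, or epsilon if the term starts with ':'
        if expr[i] == ":":
            left = EPS
        else:
            left = expr[i]
            i += 1
        # Right operand: only present if a ':' follows the left operand
        if i < len(expr) and expr[i] == ":":
            i += 1
            if i < len(expr) and expr[i] != ":":
                right = expr[i]
                i += 1
            else:
                right = EPS
        else:
            right = left  # plain character: identity mapping
        pairs.append((left, right))
    return pairs


def sides(expr):
    """Return the concatenated input side and output side of expr."""
    pairs = parse_pairs(expr)
    return "".join(l for l, _ in pairs), "".join(r for _, r in pairs)
```

Under these assumptions, sides("c:da:ot:g") gives ("cat", "dog"), while sides("cat:dog") gives ("catog", "cadog"), which is the ca(t:d)og reading that surprised readers. Inputs with adjacent or trailing colons get whatever reading falls out of the loop; the sketch does not try to settle those cases.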

Operator precedence, grammar, and ambiguity

  • Multiple commenters found the current precedence (colon binding tighter than concatenation) unintuitive; many expect cat:dog to mean (cat):(dog), not ca(t:d)og.
  • The grammar in the docs is acknowledged to be underspecified and potentially misleading, especially around:
    • Where epsilon can appear.
    • How multiple : operators (e.g. :a:) are parsed.
  • Suggestions included:
    • Making : lower precedence than concatenation.
    • Making epsilon explicit in the grammar rather than “injected”.
    • Possible alternative syntaxes (e.g. <regex>generator, or character-class mapping styles).
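
The first two suggestions can be sketched together. The Python below is a hypothetical illustration, not proposed trre syntax: it reads : as binding looser than concatenation and requires epsilon to be written out, here with an invented "ε" symbol. The function name split_low_precedence is likewise an assumption.

```python
EPSILON = "ε"  # hypothetical explicit epsilon symbol, not trre syntax


def split_low_precedence(expr):
    """Read 'lhs:rhs' with ':' binding looser than concatenation."""
    if expr.count(":") != 1:
        raise ValueError("expected exactly one ':' at top level")
    lhs, rhs = expr.split(":")
    if lhs == "" or rhs == "":
        # No implicit injection: an empty side must be spelled out
        raise ValueError("write epsilon explicitly, e.g. 'ε:a'")
    # Map the explicit epsilon symbol to the empty string
    lhs = "" if lhs == EPSILON else lhs
    rhs = "" if rhs == EPSILON else rhs
    return lhs, rhs
```

With this reading, split_low_precedence("cat:dog") returns ("cat", "dog"), and an input such as ":a:" is rejected outright instead of being resolved by injected epsilons.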

Capabilities vs. traditional regex/sed

  • Proponents see trre as:
    • A more “literal” search/replace syntax, particularly when doing contextual replacements without backreferences.
    • A small, direct implementation of finite-state transducers, enabling deterministic compilation, generation of matching strings, and tricks like Levenshtein-1 edits and simple spell-checking.
  • Skeptics argue:
    • It doesn’t provide fundamentally new capabilities beyond sed/regex substitutions and sometimes looks more verbose.
    • Lack of backreferences and structural transformations makes it weaker for some complex substitutions (reordering captured parts, more tree-like rewrites).
    • Adding : and extra escaping may make easy tasks (simple replacements) harder.
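
For readers unsure what the backreference point refers to, here is a small Python example using the standard re module (not trre). It reorders two captured parts, the kind of substitution skeptics say a purely in-place transduction cannot express directly. The function name swap_name and the sample input are made up for illustration.

```python
import re


def swap_name(text):
    """Turn 'Last, First' into 'First Last' using backreferences."""
    # \2 and \1 refer to the second and first captured groups,
    # so the replacement emits them in swapped order.
    return re.sub(r"(\w+), (\w+)", r"\2 \1", text)
```

Here swap_name("Lovelace, Ada") returns "Ada Lovelace". The output reuses arbitrary matched text in a new position, which is what goes beyond a character-by-character input/output pairing.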

Use cases, limitations, and infinite generators

  • The current model is mostly “replace in place”; some worry it’s insufficient for more structural edits.
  • Right-side repetition (*, +) can cause infinite loops; the author now leans toward disabling it, though infinite generators are seen as potentially interesting.
  • trre can also be used as a generator: given :(regex) it can enumerate strings in the regular language (with flags like -m -a), which some find compelling.
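
The generator idea, and the way an unbounded language can still be consumed safely, can be sketched in Python. This is not how trre implements it: the hand-written DFA for (ab)*c and the function name enumerate_language are assumptions for illustration. Strings are produced lazily, shortest first, and the caller decides how many to take.

```python
from collections import deque
from itertools import islice

# Hypothetical DFA for the regular language (ab)*c
DFA = {
    "start": 0,
    "accept": {2},
    "edges": {0: {"a": 1, "c": 2}, 1: {"b": 0}},
}


def enumerate_language(dfa):
    """Yield accepted strings in breadth-first (shortest-first) order."""
    queue = deque([(dfa["start"], "")])
    while queue:
        state, prefix = queue.popleft()
        if state in dfa["accept"]:
            yield prefix
        # Follow outgoing edges in sorted order for reproducible output
        for symbol, target in sorted(dfa["edges"].get(state, {}).items()):
            queue.append((target, prefix + symbol))


# The language is infinite, so take only a bounded number of results
first_three = list(islice(enumerate_language(DFA), 3))
```

For this DFA, first_three is ["c", "abc", "ababc"]. Because the generator is lazy, the infinite language never causes a hang as long as the consumer stops; a language with cycles that never reach an accepting state would need an explicit length bound as well.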

Implementation, ecosystem, and related work

  • The tool is praised for being small, readable C with a clear automata-theoretic foundation (finite-state transducers, FSTs).
  • Many pointers to related FST toolkits and languages (XFST, FOMA, HFST, OpenFST, Pynini, Carmel, Rosie Pattern Language), and to prior work in morphology, speech recognition, and linguistics.
  • Some suggest this could fit well into editors lacking good regex-based replacement, though it’s still “raw” and evolving.