Show HN: Transductive regular expressions for text editing
Clarifying semantics & examples
- Several readers were confused by README examples (especially the `c:da:ot:g` and cat/dog transformations) and initial typos/inconsistent outputs (cats vs dogs, infinite-loop examples).
- Discussion revealed that:
  - Concatenation is implicit (like regex), with an “invisible” operator between characters.
  - `:` is a transduction operator with higher precedence than concatenation, so `c:da:ot:g` currently parses as `c:d ~ a:o ~ t:g`, not `(cat):(dog)`.
  - Empty (epsilon) operands are implicitly injected on one side of `:` when omitted, which also affects parsing.
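
To make the parse concrete, here is a minimal Python sketch of the precedence described above. It is not trre's parser: it handles only literal characters and `:`, the names `parse_pairs` and `transduce` are hypothetical, and the epsilon rule (a missing operand becomes the empty string) is an assumption made for illustration.

```python
def parse_pairs(expr):
    """Split a literal-only expression into (input, output) character pairs.

    ':' binds tighter than concatenation, so each ':' joins only the single
    characters next to it. A character with no ':' maps to itself.
    Assumption: a missing operand is treated as epsilon ('').
    """
    pairs = []
    i, n = 0, len(expr)
    while i < n:
        if expr[i] == ":":
            left = ""  # nothing before ':' -> epsilon on the input side
            i += 1
        else:
            left = expr[i]
            i += 1
            if i >= n or expr[i] != ":":
                pairs.append((left, left))  # plain literal: copied through
                continue
            i += 1  # consume ':'
        # We are just past a ':'; take one output character if there is one.
        if i < n and expr[i] != ":":
            right = expr[i]
            i += 1
        else:
            right = ""  # nothing after ':' -> epsilon on the output side
        pairs.append((left, right))
    return pairs


def transduce(pairs, text):
    """Return the output if `text` equals the whole input side, else None."""
    if "".join(a for a, _ in pairs) != text:
        return None
    return "".join(b for _, b in pairs)


print(parse_pairs("c:da:ot:g"))                     # [('c', 'd'), ('a', 'o'), ('t', 'g')]
print(transduce(parse_pairs("c:da:ot:g"), "cat"))   # dog
print(transduce(parse_pairs("cat:dog"), "catog"))   # cadog
```

The last line shows the surprise readers ran into: under this precedence `cat:dog` accepts `catog`, not `cat`.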
Operator precedence, grammar, and ambiguity
- Multiple commenters found the current precedence (colon stronger than concatenation) unintuitive; many expect `cat:dog` to mean `(cat):(dog)`, not `ca(t:d)og`.
- Grammar in the docs is acknowledged as underspecified and potentially misleading, especially around:
  - Where epsilon can appear.
  - How multiple `:` operators (e.g. `:a:`) are parsed.
- Suggestions included:
  - Making `:` lower precedence than concatenation.
  - Making epsilon explicit in the grammar rather than “injected”.
  - Possible alternative syntaxes (e.g. <regex>generator, or character-class mapping styles).
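
The two readings of `cat:dog` debated in this thread can be compared side by side. The Python sketch below is illustrative only: `tight` and `loose` are hypothetical names, the expression must be literal-only with exactly one `:`, and an empty side is assumed to mean epsilon. Each function returns the (input, output) pair of strings that the expression would relate.

```python
def _split(expr):
    """Split a literal-only expression at its single ':'."""
    if expr.count(":") != 1:
        raise ValueError("this sketch supports exactly one ':'")
    left, _, right = expr.partition(":")
    return left, right


def loose(expr):
    """':' binds weaker than concatenation: cat:dog means (cat):(dog)."""
    return _split(expr)


def tight(expr):
    """':' binds stronger than concatenation: cat:dog means ca(t:d)og.

    Only the character just before ':' and the one just after it are paired;
    everything else is copied from input to output unchanged.
    """
    left, right = _split(expr)
    return left + right[1:], left[:-1] + right


print(loose("cat:dog"))  # ('cat', 'dog')
print(tight("cat:dog"))  # ('catog', 'cadog')
```

With a single-character operand on each side, such as `a:b`, both readings agree, which may be why the difference only shows up in longer examples.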
Capabilities vs. traditional regex/sed
- Proponents see trre as:
  - A more “literal” search/replace syntax, particularly when doing contextual replacements without backreferences.
  - A small, direct implementation of finite-state transducers, enabling deterministic compilation, generation of matching strings, and tricks like Levenshtein-1 edits and simple spell-checking.
- Skeptics argue:
  - It doesn’t provide fundamentally new capabilities beyond `sed`/regex substitutions and sometimes looks more verbose.
  - Lack of backreferences and structural transformations makes it weaker for some complex substitutions (reordering captured parts, more tree-like rewrites).
  - Adding `:` and extra escaping may make easy tasks (simple replacements) harder.
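
For readers unfamiliar with the Levenshtein-1 trick that proponents mention, the relation itself is easy to spell out. The Python sketch below enumerates it directly instead of building a transducer, so it shows what such a transducer computes, not how trre expresses or compiles it. The names `within_one_edit` and `suggest`, the lowercase ASCII alphabet, and the sample dictionary are all assumptions.

```python
import string


def within_one_edit(word, alphabet=string.ascii_lowercase):
    """Return every string reachable from `word` with at most one
    insertion, deletion, or substitution (Levenshtein distance <= 1)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return {word} | deletes | substitutes | inserts


def suggest(word, dictionary):
    """Simple spell-check: dictionary words within one edit of `word`."""
    return sorted(within_one_edit(word) & set(dictionary))


words = {"cat", "cart", "coat", "dog"}  # hypothetical dictionary
print(suggest("caat", words))  # ['cart', 'cat', 'coat']
```

A transducer encodes the same relation as a small automaton that can be composed with a dictionary automaton, which avoids materializing the full candidate set the way this sketch does.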
Use cases, limitations, and infinite generators
- Current model is mostly “replace in place”; some worry it’s insufficient for more structural edits.
- Right-side repetition (`*`, `+`) can cause infinite loops; author now leans toward disabling this, though infinite generators are seen as potentially interesting.
- trre can also be used as a generator: given `:(regex)` it can enumerate strings in the regular language (with flags like `-m -a`), which some find compelling.
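
The generator idea, and the reason unbounded repetition never terminates by itself, can be sketched in a few lines of Python. This is not how trre enumerates strings: the tuple-based expression format and the names `strings_of_length` and `enumerate_language` are hypothetical, and the sketch enumerates by increasing length so that an infinite language can still be consumed lazily.

```python
from itertools import count, islice


def strings_of_length(node, n):
    """Set of strings of exactly length n in the language of `node`.

    Nodes: ("lit", s), ("alt", a, b), ("cat", a, b), ("star", a).
    """
    kind = node[0]
    if kind == "lit":
        return {node[1]} if len(node[1]) == n else set()
    if kind == "alt":
        return strings_of_length(node[1], n) | strings_of_length(node[2], n)
    if kind == "cat":
        return {x + y
                for k in range(n + 1)
                for x in strings_of_length(node[1], k)
                for y in strings_of_length(node[2], n - k)}
    if kind == "star":
        if n == 0:
            return {""}
        # The first repetition consumes at least one character (k >= 1),
        # so the recursion always shrinks n and terminates.
        return {x + y
                for k in range(1, n + 1)
                for x in strings_of_length(node[1], k)
                for y in strings_of_length(node, n - k)}
    raise ValueError(f"unknown node kind: {kind!r}")


def enumerate_language(node, max_len=None):
    """Yield strings shortest first. With max_len=None this never stops,
    even for a finite language, so bound it or slice the result."""
    lengths = count() if max_len is None else range(max_len + 1)
    for n in lengths:
        yield from sorted(strings_of_length(node, n))


ab_star = ("star", ("alt", ("lit", "a"), ("lit", "b")))  # (a|b)*
print(list(islice(enumerate_language(ab_star), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Calling `list(enumerate_language(ab_star))` without a bound would run forever, which is the same hazard the thread describes for repetition on the output side.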
Implementation, ecosystem, and related work
- Tool is praised for being small and readable C, with a clear automata-theoretic foundation (FSTs).
- Many pointers to related FST toolkits and languages (XFST, FOMA, HFST, OpenFST, Pynini, Carmel, Rosie Pattern Language), and to prior work in morphology, speech recognition, and linguistics.
- Some suggest this could fit well into editors lacking good regex-based replacement, though it’s still “raw” and evolving.