Wc2: Investigates optimizing 'wc', the Unix word count program
State machines, types, and correctness
- Many praise finite-state machines (FSMs) as a clean way to express parsers and avoid “partially initialized object” bugs.
- Others argue that sum types / discriminated unions (e.g., Rust enums) are a better way to “make invalid states unrepresentable” than hand-written FSMs.
- Discussion compares Go (always has a default, often unwanted) vs C++ vs Rust (opt‑in default, move semantics, Result instead of exceptions).
- Some note that coroutines can hide an explicit state machine inside compiler-generated code.
How wc2’s approach differs
- Core benefit seen as: one pass, fixed work per byte, using a DFA table instead of branches and heavyweight multibyte / locale routines like
mbrtowc+iswspace. - Table encodes states like “in word”, “between words”, “after newline”; transitions depend on character class (word, space, newline).
- This yields throughput that mostly depends only on input size, not content mix, unlike implementations with complex branching and library calls.
- Several want better documentation of the state tables and generation process; existing comments are seen as opaque and “magic-number-like”.
SIMD, GPUs, and performance limits
- Multiple comments highlight SIMD
wcimplementations that achieve 4–10× speedups over byte-at-a-time code on ASCII, sometimes disk- or SSD-limited. - Unicode-aware SIMD is seen as much harder; ASCII-only is called “cheating” by some.
- One GPU prototype shows modest gains; others doubt GPU can beat a good SIMD CPU version once data transfer is considered.
“Asynchronous state-machine” terminology
- Several point out the code is just a DFA over a byte stream, not truly asynchronous I/O.
- Some argue “asynchronous” was misused; others interpret it loosely as “streaming, no full buffering”. Overall, meaning is considered unclear.
Unicode, locales, and POSIX compliance
- wc2’s speedup is partly attributed to assuming UTF‑8 and bypassing locale‑aware C APIs; critics note this breaks correctness in non‑UTF‑8 locales and is not a drop-in POSIX
wc. - Others counter that modern tools restricted to ASCII/UTF‑8 are acceptable in many domains, but agree that’s incompatible with being a full
wcreplacement.
Comparisons to existing wc implementations
- Benchmarks show BSD
wccan already be very fast; gains from wc2 are input- and option-dependent (e.g., line-only vs full counts). - Historical and alternative
wcimplementations (Unix, Plan 9, BusyBox, BSD, GNU, C++ versions) are discussed in terms of complexity, POSIX options, and readability. - Several note that unless these optimizations land in coreutils/BSD, most shell users will never benefit.