Wc2: Investigates optimizing 'wc', the Unix word count program

State machines, types, and correctness

  • Many praise finite-state machines (FSMs) as a clean way to express parsers and avoid “partially initialized object” bugs.
  • Others argue that sum types / discriminated unions (e.g., Rust enums) are a better way to “make invalid states unrepresentable” than hand-written FSMs.
  • Discussion compares Go (always has a default, often unwanted) vs C++ vs Rust (opt‑in default, move semantics, Result instead of exceptions).
  • Some note that coroutines can hide an explicit state machine inside compiler-generated code.

How wc2’s approach differs

  • Core benefit seen as: one pass, fixed work per byte, using a DFA table instead of branches and heavyweight multibyte / locale routines like mbrtowc + iswspace.
  • Table encodes states like “in word”, “between words”, “after newline”; transitions depend on character class (word, space, newline).
  • This yields throughput that mostly depends only on input size, not content mix, unlike implementations with complex branching and library calls.
  • Several want better documentation of the state tables and generation process; existing comments are seen as opaque and “magic-number-like”.

SIMD, GPUs, and performance limits

  • Multiple comments highlight SIMD wc implementations that achieve 4–10× speedups over byte-at-a-time code on ASCII, sometimes disk- or SSD-limited.
  • Unicode-aware SIMD is seen as much harder; ASCII-only is called “cheating” by some.
  • One GPU prototype shows modest gains; others doubt GPU can beat a good SIMD CPU version once data transfer is considered.

“Asynchronous state-machine” terminology

  • Several point out the code is just a DFA over a byte stream, not truly asynchronous I/O.
  • Some argue “asynchronous” was misused; others interpret it loosely as “streaming, no full buffering”. Overall, meaning is considered unclear.

Unicode, locales, and POSIX compliance

  • wc2’s speedup is partly attributed to assuming UTF‑8 and bypassing locale‑aware C APIs; critics note this breaks correctness in non‑UTF‑8 locales and is not a drop-in POSIX wc.
  • Others counter that modern tools restricted to ASCII/UTF‑8 are acceptable in many domains, but agree that’s incompatible with being a full wc replacement.

Comparisons to existing wc implementations

  • Benchmarks show BSD wc can already be very fast; gains from wc2 are input- and option-dependent (e.g., line-only vs full counts).
  • Historical and alternative wc implementations (Unix, Plan 9, BusyBox, BSD, GNU, C++ versions) are discussed in terms of complexity, POSIX options, and readability.
  • Several note that unless these optimizations land in coreutils/BSD, most shell users will never benefit.