A spellchecker used to be a major feat of software engineering (2008)

Historical constraints and ingenuity

  • Several comments recall the early PC and 8‑bit days, when having any spellchecker felt miraculous. Separate programs, floppy swaps, and no suggestions were common.
  • The original article’s core challenge (200K words on ~256KB machines, sometimes floppy‑only) is emphasized as the real feat, not just the lookup logic.
  • References are made to early Unix spell, classic Programming Pearls columns, and later writeups explaining how these systems fit into 64KB and similar limits.

Algorithms, data structures, and compression

  • People wish the article went deeper into techniques; they speculate and link to:
    • Tries, compressed DAGs, and Bloom filters as dictionary representations.
    • Levenshtein distance (often limited, e.g. edit distance 1) for suggestions.
    • Disk‑based lookup with custom indexing and caching hot words.
  • The need to store dictionaries at well under 1 byte/word pushes toward probabilistic or highly compressed schemes rather than simple tries.
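
As a rough illustration of the probabilistic route, here is a minimal Bloom filter sketch in Python. The class name, the SHA-256 double-hashing scheme, and the four-word list are assumptions made for this example; nothing here is taken from the thread or from any historical spellchecker.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str):
        # Derive k bit positions from two halves of one digest (double hashing).
        # SHA-256 is an illustrative choice, not a claim about old spellcheckers.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


def false_positive_rate(bits_per_word: float, num_hashes: int) -> float:
    """Standard approximation: (1 - e^(-k / bits_per_word)) ** k."""
    return (1.0 - math.exp(-num_hashes / bits_per_word)) ** num_hashes


# Hypothetical word list; a budget of 8 bits (1 byte) per word.
words = ["from", "form", "public", "spell"]
bf = BloomFilter(num_bits=8 * len(words), num_hashes=6)
for w in words:
    bf.add(w)

print(all(w in bf for w in words))   # inserted words are always found
print("frum" in bf)                  # usually False, but may be a false positive
print(false_positive_rate(8, 6))     # expected error rate at 1 byte/word
```

Under this approximation, 8 bits per word with 6 hash functions accepts roughly 2% of misspellings as valid. Going below 1 byte/word with this structure means tolerating a higher error rate, which may be part of why commenters also point to heavily compressed exact representations.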

Checker vs. corrector, and why it’s still hard

  • Multiple commenters distinguish:
    • Spell checking: is this token in the word list? Conceptually trivial with basic data structures.
    • Spell correction: generate high‑quality suggestions, especially in context. This is much harder.
  • Examples show how valid but wrong words (“form” vs “from”, “pubic” vs “public”) defeat naive systems.
  • Some argue writing a basic checker is undergraduate‑level; others insist a truly good spellchecker is still non‑trivial without libraries and corpora.
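
The checker/corrector split can be sketched in a few lines of Python. The word list and its frequency counts are invented for illustration, and ranking candidates by raw frequency is only one simple assumption about how suggestions might be ordered.

```python
import string

# Hypothetical toy word list with made-up frequency counts.
WORD_COUNTS = {
    "from": 900, "form": 300, "public": 250,
    "pubic": 5, "spell": 40, "letter": 120,
}


def is_known(word: str) -> bool:
    """Spell checking: plain membership in the word list."""
    return word.lower() in WORD_COUNTS


def edits1(word: str) -> set[str]:
    """All strings at Levenshtein distance <= 1 (delete, replace, insert)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)


def suggest(word: str, limit: int = 3) -> list[str]:
    """Spell correction: known words one edit away, most frequent first."""
    candidates = {w for w in edits1(word.lower()) if w in WORD_COUNTS}
    return sorted(candidates, key=lambda w: (-WORD_COUNTS[w], w))[:limit]


print(is_known("publc"))   # False: the checker flags it
print(suggest("publc"))    # ['public', 'pubic']: ranking decides which comes first
print(is_known("form"))    # True: a valid word, even where "from" was intended
```

The last line is the hard case from the thread: membership alone cannot tell that "form" is wrong in "a letter form my friend". Note also that under plain Levenshtein distance a transposition such as "form"/"from" counts as two edits, so a distance-1 limit would not even propose the fix.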

Modern spellcheckers: widespread frustration

  • Many complain that today’s cloud‑scale systems (Gmail/Chrome, Android, iOS) often feel worse than older desktop tools.
  • Suspected causes include:
    • Backend shifts toward more generic ML/LLM systems hurting quality.
    • Poor or missing use of context.
    • Overweighting the first letters of a word, or overly aggressive profanity filters.
  • iPhone and Android keyboards are singled out for erratic suggestions, over‑eager autocorrect, and annoying UI/UX decisions (e.g., period vs space behavior).

Social and educational angles

  • Historical parallels are drawn with anxiety over calculators and grammar checkers “dumbing down” users.
  • Some say spellcheck improved their spelling through constant feedback; others argue people still can’t spell, so the benefits are limited.
  • There’s interest in tools that log personal errors, integrate full dictionaries/thesauri, and explicitly help users learn.

LLMs and new possibilities

  • Commenters note that even weak LLMs can now do spell/grammar/style checking and more, via prompts rather than bespoke algorithms.
  • Ideas include using masked‑token models or LLM “surprise” scores to heat‑map awkward or likely‑wrong tokens, addressing the “valid but wrong word” problem.
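
The "surprise" idea can be shown without a real language model. The sketch below substitutes a toy bigram model with add-one smoothing for the LLM, which is a deliberate simplification; the four-sentence corpus and all function names are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical training text; a real system would use an LLM or a large corpus.
CORPUS = (
    "i got a letter from my friend . "
    "she came from the city . "
    "please fill in the form . "
    "the form is on the desk ."
).split()

unigrams = Counter(CORPUS)
bigrams = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(unigrams)


def surprisal(prev: str, word: str) -> float:
    """Bits of surprise for `word` given the previous token (add-one smoothing)."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE)
    return -math.log2(p)


def heat_map(sentence: str) -> list[tuple[str, float]]:
    """Pair every token after the first with its surprisal in context."""
    tokens = sentence.split()
    return [(w, surprisal(p, w)) for p, w in zip(tokens, tokens[1:])]


# Both words are in the dictionary, but one fits the context better.
print(surprisal("letter", "from"))
print(surprisal("letter", "form"))
for token, bits in heat_map("a letter form my friend"):
    print(f"{token:8s} {bits:.2f}")
```

In this toy model "form" after "letter" scores exactly one bit more surprise than "from". With so little data the word following the error also scores high, so a usable heat map would need a much stronger model and some thresholding.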

Beyond English: CJK and input methods

  • Spellchecker‑like technology is noted as essential for Chinese, Japanese, and Korean input, mapping phonetic or partial codes to characters amid high ambiguity.
  • Historical Chinese input hardware and the challenge of large glyph sets are mentioned as related feats of engineering.
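
To make the ambiguity concrete, here is a minimal sketch of a candidate lookup for a phonetic input method, using toneless pinyin as the example. The table, its frequency weights, and the function names are hypothetical; real input methods use far larger tables and context, and Japanese and Korean input work differently in detail.

```python
# Hypothetical toy table: toneless pinyin -> candidate characters,
# each with a made-up frequency weight.
CANDIDATES = {
    "shi": {"是": 100, "时": 60, "事": 50, "十": 40},
    "ma": {"吗": 50, "妈": 30, "马": 20},
}


def candidates_for(code: str, limit: int = 5) -> list[str]:
    """Characters matching a full code, most frequent first."""
    table = CANDIDATES.get(code, {})
    return sorted(table, key=lambda ch: -table[ch])[:limit]


def complete(prefix: str) -> list[str]:
    """Partial codes: every full code that starts with the typed prefix."""
    return sorted(code for code in CANDIDATES if code.startswith(prefix))


print(candidates_for("shi"))   # one typed code, several plausible characters
print(complete("s"))           # a partial code narrows to possible full codes
```

The structure mirrors spelling suggestion: one typed key sequence yields several candidates, and ranking decides what the user sees first.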