2025-08-09

A spellchecker used to be a major feat of software engineering (2008)

Historical constraints and ingenuity

Several commenters recall the early PC and 8‑bit era, when having any spellchecker at all felt miraculous: checkers shipped as separate programs, required floppy swaps, and often offered no suggestions.
The original article's core challenge, fitting 200K words onto machines with ~256KB of memory that were sometimes floppy‑only, is emphasized as the real feat, rather than the lookup logic itself.
References are made to early Unix spell, classic Programming Pearls columns, and later writeups explaining how these systems fit into 64KB and similar limits.

Algorithms, data structures, and compression

Commenters wish the article had gone deeper into techniques; they speculate about, and link to, the following:
- Tries, compressed DAGs, and Bloom filters as dictionary representations.
- Levenshtein distance (often limited, e.g. edit distance 1) for suggestions.
- Disk‑based lookup with custom indexing and caching hot words.
The need to store dictionaries at well under 1 byte/word pushes toward probabilistic or highly compressed schemes rather than simple tries.
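
As a rough illustration of the probabilistic option mentioned above, here is a minimal Bloom filter sketch. It is not how any particular historical spellchecker worked; the class name, the SHA-256 double-hashing scheme, and the toy word list are all assumptions made for the example. The helper uses the standard approximation for the false-positive rate, which at 8 bits (1 byte) per word and 6 hashes comes out to roughly 2%. Going well below a byte per word raises that rate, which is the trade-off such schemes accept.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: no false negatives, a tunable rate of false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str):
        # Derive k bit positions from one SHA-256 digest via double hashing
        # (an illustrative choice, not a requirement of the technique).
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        # True means "probably in the word list"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


def expected_false_positive_rate(bits_per_word: float, num_hashes: int) -> float:
    """Standard approximation: (1 - e^(-k/b))^k for b bits per stored word."""
    return (1.0 - math.exp(-num_hashes / bits_per_word)) ** num_hashes


words = ["spell", "checker", "floppy", "dictionary"]  # hypothetical word list
bloom = BloomFilter(num_bits=8 * len(words), num_hashes=6)  # 1 byte per word
for w in words:
    bloom.add(w)
```

Every stored word is always reported as present, while a misspelling is occasionally accepted by mistake. For a checker that only flags unknown tokens, that means a small fraction of typos slip through, in exchange for never storing the words themselves.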

Checker vs. corrector, and why it’s still hard

Multiple commenters distinguish two problems:
- Spell checking: is this token in the word list? Conceptually trivial with basic data structures.
- Spell correction: generate high‑quality suggestions, especially in context. This is much harder.
Examples show how valid but wrong words (“form” vs “from”, “pubic” vs “public”) defeat naive systems.
Some argue writing a basic checker is undergraduate‑level; others insist a truly good spellchecker is still non‑trivial without libraries and corpora.
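
The checker/corrector split can be sketched in a few lines. This is a toy under stated assumptions: the five-word `WORDS` set and the function names are made up for illustration, and the candidate generator includes transpositions, so it is closer to Damerau-Levenshtein distance 1 than to plain Levenshtein.

```python
import string

WORDS = {"from", "form", "public", "pubic", "spell"}  # toy word list (assumption)


def is_known(token: str) -> bool:
    """Spell checking: a plain membership test against the word list."""
    return token.lower() in WORDS


def edits1(word: str) -> set[str]:
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(token: str) -> list[str]:
    """Naive spell correction: known words within one edit, with no ranking."""
    return sorted(w for w in edits1(token.lower()) if w in WORDS)
```

With this word list, `suggest("frm")` returns `['form', 'from']` and `suggest("publc")` returns `['pubic', 'public']`, with nothing to say which candidate the writer meant. Meanwhile `is_known("form")` is true, so a sentence like "a letter form the office" is never flagged at all. Ranking candidates and catching valid-but-wrong words both need frequency or context information that a bare word list does not contain.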

Modern spellcheckers: widespread frustration

Many complain that today’s cloud‑scale systems (Gmail/Chrome, Android, iOS) often feel worse than older desktop tools.
Suspected causes include:
- Backend shifts toward more generic ML/LLM systems that hurt quality.
- Poor or missing use of context.
- Overweighting of first letters, or overzealous profanity filters.
iPhone and Android keyboards are singled out for erratic suggestions, over‑eager autocorrect, and annoying UI/UX decisions (e.g., period vs space behavior).

Social and educational angles

Historical parallels are drawn with anxiety over calculators and grammar checkers “dumbing down” users.
Some say spellcheck improved their spelling through constant feedback; others argue that people still can't spell, so the benefits are limited.
There’s interest in tools that log personal errors, integrate full dictionaries/thesauri, and explicitly help users learn.

LLMs and new possibilities

Commenters note that even weak LLMs can now do spell/grammar/style checking and more, via prompts rather than bespoke algorithms.
Ideas include using masked‑token models or LLM “surprise” scores to heat‑map awkward or likely‑wrong tokens, addressing the “valid but wrong word” problem.
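
To make the "surprise" idea concrete without a real model, the sketch below swaps in a tiny bigram count model with add-one smoothing as a stand-in for an LLM or masked-token model. The corpus, function names, and smoothing choice are all assumptions for illustration; only the scoring principle (negative log-probability of a token given its context) carries over.

```python
import math
from collections import Counter

# Tiny made-up corpus standing in for a language model's training data.
CORPUS = (
    "the letter came from the office . "
    "she filled in the form . "
    "he came from the city . "
    "please sign the form ."
).split()

unigrams = Counter(CORPUS)
bigrams = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(unigrams)


def surprisal(prev: str, token: str) -> float:
    """-log2 P(token | prev), estimated from bigram counts with add-one smoothing."""
    prob = (bigrams[(prev, token)] + 1) / (unigrams[prev] + VOCAB_SIZE)
    return -math.log2(prob)


def heat_map(sentence: str) -> list[tuple[str, float]]:
    """Score each token after the first by how unexpected it is given the previous token."""
    tokens = sentence.lower().split()
    return [
        (tok, round(surprisal(prev, tok), 2))
        for prev, tok in zip(tokens, tokens[1:])
    ]
```

In `heat_map("he came form the city")`, the token "form" scores higher than "from" does in the correctly spelled sentence, even though both are valid dictionary words. A real system would use a far stronger model and more context than one preceding token, but the flagging logic would follow the same shape.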

Beyond English: CJK and input methods

Spellchecker‑like technology is noted as essential for Chinese, Japanese, and Korean input, mapping phonetic or partial codes to characters amid high ambiguity.
Historical Chinese input hardware and the challenge of large glyph sets are mentioned as related feats of engineering.
