A spellchecker used to be a major feat of software engineering (2008)
Historical constraints and ingenuity
- Several commenters recall the early PC and 8‑bit days, when having any spellchecker felt miraculous. Checkers that ran as separate programs, required floppy swaps, and offered no suggestions were common.
- The original article’s core challenge (fitting 200K words on ~256KB machines, sometimes floppy‑only) is emphasized as the real feat, rather than the lookup logic itself.
- References are made to early Unix spell, classic Programming Pearls columns, and later writeups explaining how these systems fit into 64KB and similar limits.
Algorithms, data structures, and compression
- People wish the article went deeper into techniques; they speculate and link to:
  - Tries, compressed DAGs, and Bloom filters as dictionary representations.
  - Levenshtein distance (often limited, e.g. edit distance 1) for suggestions.
  - Disk‑based lookup with custom indexing and caching hot words.
- The need to store dictionaries at well under 1 byte/word pushes toward probabilistic or highly compressed schemes rather than simple tries.
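
To make the probabilistic option concrete, here is a minimal Bloom filter sketch in Python. It is only an illustration of the general technique, not the scheme any historical spellchecker is known to have used. The class name BloomFilter, the budget of 8 bits per word, the choice of 6 hash positions, the SHA-256-based double hashing, and the sample words are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: never misses a stored word, but may accept a few others."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, word: str):
        # Derive all bit positions from one digest via double hashing.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


# Hypothetical sample word list; a real dictionary would hold ~200K entries.
words = ["from", "form", "public", "spell", "checker"]

# Assumed budget: 8 bits (one byte) of filter per stored word.
bloom = BloomFilter(num_bits=8 * len(words), num_hashes=6)
for w in words:
    bloom.add(w)

print(all(w in bloom for w in words))  # stored words are always reported present
print(len(bloom.bits), "bytes for", len(words), "words")
```

The trade-off is that the words themselves are not stored at all, so the filter can answer "is this probably a word?" but cannot produce suggestions. At about 8 bits per word with a well-chosen number of hashes, the theoretical false-positive rate is roughly 2%, meaning a small share of misspellings would be silently accepted.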
Checker vs. corrector, and why it’s still hard
- Multiple commenters distinguish:
  - Spell checking: is this token in the word list? Conceptually trivial with basic data structures.
  - Spell correction: generate high‑quality suggestions, especially in context. This is much harder.
- Examples show how valid but wrong words (“form” vs “from”, “pubic” vs “public”) defeat naive systems.
- Some argue writing a basic checker is undergraduate‑level; others insist a truly good spellchecker is still non‑trivial without libraries and corpora.
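
The checker/corrector split can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any shipping product: the function names check, edits1, and suggest, and the five-word dictionary, are hypothetical. The candidate generator enumerates every string one edit away (deletion, adjacent transposition, substitution, insertion), a commonly used restricted form of edit distance.

```python
import string


def check(word: str, dictionary: set[str]) -> bool:
    """Spell checking: plain membership in the word list."""
    return word in dictionary


def edits1(word: str) -> set[str]:
    """All strings exactly one simple edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word: str, dictionary: set[str]) -> list[str]:
    """Spell correction (naive): dictionary words one edit away, unranked."""
    return sorted((edits1(word) & dictionary) - {word})


# Hypothetical toy dictionary for illustration only.
dictionary = {"from", "form", "farm", "public", "pubic"}

print(check("fomr", dictionary), suggest("fomr", dictionary))
print(check("form", dictionary), suggest("form", dictionary))
```

Note what the second call shows: "form" passes the check even if the writer meant "from", and nothing in this approach can tell which neighbour was intended. Ranking candidates and catching valid-but-wrong words both require frequency data or context, which is where the real difficulty lies.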
Modern spellcheckers: widespread frustration
- Many complain that today’s cloud‑scale systems (Gmail/Chrome, Android, iOS) often feel worse than older desktop tools.
- Suspected causes include:
  - Backend shifts toward more generic ML/LLM systems hurting quality.
  - Poor or missing use of context.
  - Overweighting first letters or profanity filters.
- iPhone and Android keyboards are singled out for erratic suggestions, over‑eager autocorrect, and annoying UI/UX decisions (e.g., period vs space behavior).
Social and educational angles
- Historical parallels are drawn with anxiety over calculators and grammar checkers “dumbing down” users.
- Some say spellcheck improved their spelling through constant feedback; others argue that people still can’t spell, so the benefits are limited.
- There’s interest in tools that log personal errors, integrate full dictionaries/thesauri, and explicitly help users learn.
LLMs and new possibilities
- Commenters note that even weak LLMs can now do spell/grammar/style checking and more, via prompts rather than bespoke algorithms.
- Ideas include using masked‑token models or LLM “surprise” scores to heat‑map awkward or likely‑wrong tokens, addressing the “valid but wrong word” problem.
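
The "surprise" idea can be illustrated without a neural model. The sketch below swaps in a tiny bigram model with add-one smoothing as a stand-in for a masked-token model or LLM; the scoring principle (surprisal = -log2 of the probability of a token given its context) is the same, but the quality is not comparable. The function names train and surprisal and the four-sentence corpus are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def train(sentences):
    """Count bigrams and their left contexts over a whitespace-tokenized corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])            # tokens that precede something
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def surprisal(tokens, model):
    """Return (token, bits) pairs; more bits means less expected in context."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens
    scores = []
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing so unseen bigrams get a small non-zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        scores.append((cur, -math.log2(p)))
    return scores


# Hypothetical toy corpus; a real system would train on far more text.
corpus = [
    "the letter came from the office",
    "she heard from the public",
    "fill in the form today",
    "the form is on the desk",
]
model = train(corpus)

for token, bits in surprisal("the letter came form the office".split(), model):
    print(f"{token:8s} {bits:5.2f} bits")
```

Both "form" and "from" are valid dictionary words, yet after "came" the model assigns "form" a higher surprisal than it would assign "from", which is the signal a heat-map would colour. With a model this small, the token following an unexpected word tends to score high as well, so the output is best read as a rough indicator rather than a verdict.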
Beyond English: CJK and input methods
- Spellchecker‑like technology is noted as essential for Chinese, Japanese, and Korean input, mapping phonetic or partial codes to characters amid high ambiguity.
- Historical Chinese input hardware and the challenge of large glyph sets are mentioned as related feats of engineering.