2026-06-19

How many of the 170k English words do you know?

Overall Impressions

Many found the quiz fun, ego-boosting, and a pleasant diversion, especially as words became more obscure near the end.
Others got bored or quit early due to the basic early levels and sluggish interaction; some felt it never became truly challenging.

UX and Interaction Design

Major complaint: too many clicks per word (choose → check → continue), with buttons far apart and sometimes off-screen on mobile.
Strong recurring requests:
- Single-click flow (answer immediately advances, maybe with an undo).
- Keyboard shortcuts (1–4 to choose, Enter to submit/advance).
- Adaptive difficulty so strong users skip long “basics” phases.
Layout jank and scrolling issues (especially mobile) were noted as annoying.

Test Design, Guessability, and Validity

Many noticed patterns that made guessing easy:
- Correct answer often the longest definition.
- Often a “correct + opposite + 2 random” structure.
- Sometimes two directly opposite choices, with the answer always one of them.
Several people tested “always pick the longest answer” and reported high accuracy.
Lack of an “I don’t know” button was widely criticized; forced guessing inflates scores.
Multiple commenters argued the test measures guessing strategies and etymological deduction more than true vocabulary size.
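
The score inflation from forced guessing can be made concrete with the standard correction-for-guessing formula. The sketch below assumes four options per question (consistent with the "1–4" shortcut request and the "correct + opposite + 2 random" pattern) and purely random guessing on unknown words; the function names are illustrative and not taken from the quiz.

```python
def expected_score(known_fraction: float, n_options: int = 4) -> float:
    """Expected fraction correct when every unknown word is guessed at random."""
    return known_fraction + (1.0 - known_fraction) / n_options


def corrected_known_fraction(observed_score: float, n_options: int = 4) -> float:
    """Invert expected_score: estimate the fraction of words actually known.

    Assumes blind random guessing; the result is clamped to [0, 1].
    """
    chance = 1.0 / n_options
    corrected = (observed_score - chance) / (1.0 - chance)
    return max(0.0, min(1.0, corrected))


if __name__ == "__main__":
    for known in (0.0, 0.4, 0.8):
        observed = expected_score(known)
        recovered = corrected_known_fraction(observed)
        print(f"knows {known:.0%} -> scores {observed:.0%} -> corrected {recovered:.0%}")
```

Under these assumptions, someone who knows 40% of the words would be expected to score about 55%. Because commenters reported that patterns such as "pick the longest answer" beat random guessing, the real inflation is likely larger than this correction accounts for.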

Word Choice, Difficulty, and Bias

Word list seen as skewed:
- Heavy on Latin/French/Greek-derived words and meta-words about language, verbosity, and rhetoric.
- Late levels mix genuinely obscure items with fairly common ones (“obfuscate,” “zeitgeist,” “kerfuffle”).
Non-native English speakers whose first language is Romance or Germanic often scored very high by leveraging cognates and roots.
Some objected to fictional, jokey, or hyper-rare words and to including proper names.

Definitions and “Science” Claims

Multiple users flagged definitions as shallow, circular, misleading, or incomplete (e.g., “lethargic,” “verbose,” “candid,” “complacent,” “frugal,” “obsequious”).
Several criticized apparent LLM-generated distractors and wording as “AI slop.”
The described “stratified sampling” algorithm impressed some, but others noted:
- Band sizes only sum to ~85k vs the ~171k cited, so even a perfect 100/100 yields an estimate of only ~50% of the total.
- Word bands don’t consistently match actual difficulty or frequency.
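
The band-size objection can be illustrated with a minimal stratified estimate. The band names, the individual band sizes, and the 25-questions-per-band split below are hypothetical, chosen only so that the bands sum to the ~85k reported by commenters; the quiz's actual bands and formula are not given in the discussion.

```python
CITED_TOTAL = 171_000  # approximate total word count cited in the discussion

# Hypothetical band sizes; only their sum (~85k) comes from the comments.
BAND_SIZES = {
    "common": 5_000,
    "intermediate": 10_000,
    "advanced": 20_000,
    "obscure": 50_000,
}


def estimate_known_words(correct: dict, asked: dict, band_sizes: dict) -> float:
    """Stratified estimate: scale each band's hit rate by that band's size."""
    total = 0.0
    for band, size in band_sizes.items():
        if asked.get(band, 0) > 0:
            total += size * correct.get(band, 0) / asked[band]
    return total


if __name__ == "__main__":
    # A perfect run: 25 questions per band, all answered correctly (100/100).
    asked = {band: 25 for band in BAND_SIZES}
    correct = dict(asked)
    estimate = estimate_known_words(correct, asked, BAND_SIZES)
    print(f"estimate: {estimate:,.0f} words ({estimate / CITED_TOTAL:.0%} of cited total)")
```

With any band sizes summing to ~85k, a perfect score cannot produce an estimate above ~85k words, which is roughly half of the cited total.
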
Alternatives were suggested (adaptive/IRT-style tests, Elo-like models, other existing vocab tests).
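
As a rough idea of what an Elo-like adaptive test could look like, the sketch below rates both the test-taker and each word on one scale and serves the unseen word closest to the current ability estimate. The constants (scale 400, K = 32), the word ratings, and all names are assumptions borrowed from chess-style Elo for illustration; no commenter specified an implementation.

```python
def expected_correct(ability: float, difficulty: float, scale: float = 400.0) -> float:
    """Logistic (Elo-style) probability that the test-taker knows the word."""
    return 1.0 / (1.0 + 10 ** ((difficulty - ability) / scale))


def update_ability(ability: float, difficulty: float, correct: bool, k: float = 32.0) -> float:
    """Move the ability estimate toward the observed outcome."""
    outcome = 1.0 if correct else 0.0
    return ability + k * (outcome - expected_correct(ability, difficulty))


def pick_next_word(ability: float, word_difficulty: dict, seen: set) -> str:
    """Choose the unseen word whose difficulty is closest to the current ability."""
    candidates = [w for w in word_difficulty if w not in seen]
    return min(candidates, key=lambda w: abs(word_difficulty[w] - ability))


if __name__ == "__main__":
    # Hypothetical difficulty ratings, not taken from the quiz.
    words = {"house": 800.0, "candid": 1300.0, "obsequious": 1700.0}
    ability, seen = 1500.0, set()
    word = pick_next_word(ability, words, seen)
    ability = update_ability(ability, words[word], correct=True)
    print(word, round(ability, 1))
```

A design like this would let strong users leave the easy words quickly, which also addresses the request for adaptive difficulty.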
