How many of the 170k English words do you know?
Overall Impressions
- Many found the quiz fun, ego-boosting, and a pleasant diversion, especially as words became more obscure near the end.
- Others got bored or quit early due to the basic early levels and sluggish interaction; some felt it never became truly challenging.
UX and Interaction Design
- Major complaint: too many clicks per word (choose → check → continue), with buttons far apart and sometimes off-screen on mobile.
- Strong recurring requests:
  - Single-click flow (answer immediately advances, maybe with an undo).
  - Keyboard shortcuts (1–4 to choose, Enter to submit/advance).
  - Adaptive difficulty so strong users skip long “basics” phases.
- Layout jank and scrolling issues (especially mobile) were noted as annoying.
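
To make the single-click request concrete, here is a minimal sketch of the interaction logic commenters described: picking an option records the answer and advances in one step, and an undo steps back. The class name `QuizSession` and its methods are hypothetical, invented for illustration; nothing here reflects how the quiz is actually built.

```python
class QuizSession:
    """Hypothetical single-click quiz flow: answering advances immediately."""

    def __init__(self, questions):
        # questions: list of (prompt, options, correct_index) tuples
        self.questions = questions
        self.position = 0
        self.answers = []  # chosen option index for each answered question

    def answer(self, choice):
        """Record the choice and advance at once (no separate check/continue)."""
        if self.position >= len(self.questions):
            raise IndexError("quiz already finished")
        self.answers.append(choice)
        self.position += 1

    def undo(self):
        """Step back one question and discard its answer, if there is one."""
        if self.answers:
            self.answers.pop()
            self.position -= 1

    def score(self):
        """Count answers that match the correct index."""
        return sum(
            1
            for (_, _, correct), choice in zip(self.questions, self.answers)
            if choice == correct
        )


questions = [
    ("obfuscate", ["clarify", "make unclear", "repeat", "decorate"], 1),
    ("frugal", ["thrifty", "lavish", "angry", "tired"], 0),
]
session = QuizSession(questions)
session.answer(0)   # wrong pick, advances straight away
session.undo()      # one step back, answer discarded
session.answer(1)   # corrected pick
session.answer(0)
```

A keyboard handler would map keys 1–4 to `answer(0)` through `answer(3)`, which covers the shortcut request with the same three methods.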
Test Design, Guessability, and Validity
- Many noticed patterns that made guessing easy:
  - Correct answer often the longest definition.
  - Often a “correct + opposite + 2 random” structure.
  - Sometimes two directly opposite choices, with the answer always one of them.
- Several people tested “always pick the longest answer” and reported high accuracy.
- Lack of an “I don’t know” button was widely criticized; forced guessing inflates scores.
- Multiple commenters argued the test measures guessing strategies and etymological deduction more than true vocabulary size.
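
The two guessing concerns above can be illustrated with a short sketch. `pick_longest` is the "always pick the longest answer" heuristic commenters reported trying, and `corrected_score` applies the classical correction for guessing (right minus wrong divided by options minus one), which is one standard way to discount forced guesses. Both function names are hypothetical, and the four-option default is an assumption based on the "correct + opposite + 2 random" structure described in the thread; the quiz itself is not known to apply any such correction.

```python
def pick_longest(options):
    """Return the index of the longest option text (the heuristic under test)."""
    return max(range(len(options)), key=lambda i: len(options[i]))


def corrected_score(right, wrong, n_options=4):
    """Classical correction for guessing: R - W / (k - 1).

    Under pure random guessing with k options, the expected corrected
    score is zero, so forced guesses no longer inflate the result.
    """
    if n_options < 2:
        raise ValueError("need at least two options")
    return right - wrong / (n_options - 1)


# A pure guesser on 100 four-option items expects 25 right and 75 wrong.
pure_guess = corrected_score(25, 75)

# Someone with 70 right and 30 wrong keeps most of the raw score.
partial_knowledge = corrected_score(70, 30)
```

An "I don't know" button fits this scheme naturally: skipped items count as neither right nor wrong, so they cost nothing.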
Word Choice, Difficulty, and Bias
- Word list seen as skewed:
  - Heavy on Latin/French/Greek-derived words and meta-words about language, verbosity, and rhetoric.
  - Late levels mix genuinely obscure items with fairly common ones (“obfuscate,” “zeitgeist,” “kerfuffle”).
- Non-native speakers of Romance or Germanic languages often scored very high by leveraging cognates and roots.
- Some objected to fictional, jokey, or hyper-rare words and to including proper names.
Definitions and “Science” Claims
- Multiple users flagged definitions as shallow, circular, misleading, or incomplete (e.g., “lethargic,” “verbose,” “candid,” “complacent,” “frugal,” “obsequious”).
- Several criticized apparent LLM-generated distractors and wording as “AI slop.”
- The described “stratified sampling” algorithm impressed some, but others noted:
  - Band sizes sum to only ~85k words versus the ~171k cited, so even a perfect 100/100 score yields an estimate of only ~50% of the total.
  - Word bands don’t consistently match actual difficulty or frequency.
- Alternatives were suggested (adaptive/IRT-style tests, Elo-like models, other existing vocab tests).
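
The ceiling arithmetic behind the band-size objection can be sketched as follows. A stratified estimate scales each band's size by the fraction answered correctly in that band, so the result can never exceed the sum of the band sizes. The individual band sizes and question counts below are hypothetical, chosen only so that they total the ~85k words and 100 questions mentioned by commenters; the quiz's real bands are not given in the thread.

```python
def estimate_vocabulary(bands):
    """Stratified estimate: sum of band_size * (correct / asked) per band.

    bands: list of (band_size, n_correct, n_asked) tuples.
    """
    return sum(size * correct / asked for size, correct, asked in bands)


# Hypothetical bands (illustrative only): sizes total 85,000 words,
# question counts total 100, and every question is answered correctly.
perfect_run = [
    (5_000, 20, 20),
    (10_000, 20, 20),
    (20_000, 20, 20),
    (50_000, 40, 40),
]

CITED_TOTAL = 171_000  # the ~171k figure cited for the full word list

ceiling = estimate_vocabulary(perfect_run)
ceiling_share = ceiling / CITED_TOTAL  # just under one half
```

With these numbers a flawless run tops out at 85,000 words, a little under half of the cited total, which matches the ~50% figure raised in the comments.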