How many of the 170k English words do you know?

Overall Impressions

  • Many found the quiz fun, ego-boosting, and a pleasant diversion, especially as words became more obscure near the end.
  • Others got bored or quit early due to the basic early levels and sluggish interaction; some felt it never became truly challenging.

UX and Interaction Design

  • Major complaint: too many clicks per word (choose → check → continue), with buttons far apart and sometimes off-screen on mobile.
  • Strong recurring requests:
    • Single-click flow (answer immediately advances, maybe with an undo).
    • Keyboard shortcuts (1–4 to choose, Enter to submit/advance).
    • Adaptive difficulty so strong users skip long “basics” phases.
  • Layout jank and scrolling issues, especially on mobile, were a common annoyance.
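
The adaptive-difficulty request can be illustrated with a simple staircase rule. This is only a sketch of one way it might work, not how the quiz behaves: the function name `next_level`, the 10-level scale, and the "3 correct in a row" threshold are all assumptions made for illustration.

```python
def next_level(level: int, was_correct: bool, streak: int,
               max_level: int = 10, streak_to_advance: int = 3) -> tuple[int, int]:
    """Return (new_level, new_streak) after one answer.

    Hypothetical staircase rule: a short run of correct answers moves the
    user up a level early, and a miss moves them down one level.
    """
    if not was_correct:
        # A miss resets the streak and steps down (never below level 1).
        return max(level - 1, 1), 0
    streak += 1
    if streak >= streak_to_advance:
        # Enough consecutive hits: advance and start a fresh streak.
        return min(level + 1, max_level), 0
    return level, streak


# A user who answers six words correctly in a row climbs two levels
# instead of sitting through every word in the early levels.
level, streak = 1, 0
for _ in range(6):
    level, streak = next_level(level, True, streak)
print(level)
```

Under these assumed settings a strong user spends three words per level rather than the full set, which is the behavior commenters were asking for.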

Test Design, Guessability, and Validity

  • Many noticed patterns that made guessing easy:
    • Correct answer often the longest definition.
    • Often a “correct + opposite + 2 random” structure.
    • Sometimes two directly opposite choices, with the answer always one of them.
  • Several people tested “always pick the longest answer” and reported high accuracy.
  • Lack of an “I don’t know” button was widely criticized; forced guessing inflates scores.
  • Multiple commenters argued the test measures guessing strategies and etymological deduction more than true vocabulary size.
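
Two of the points above lend themselves to a quick check. The sketch below measures how often the longest option is the correct one, and applies the classical correction-for-guessing formula (right minus wrong divided by choices minus one) to show how forced guessing can inflate a raw score. The sample items and the function names are made up for illustration; they are not taken from the quiz.

```python
def longest_option_hit_rate(items: list[tuple[list[str], int]]) -> float:
    """Fraction of items where picking the longest option is correct.

    Each item is (options, index_of_correct_option). Ties go to the
    first of the longest options.
    """
    hits = 0
    for options, correct_index in items:
        longest = max(range(len(options)), key=lambda i: len(options[i]))
        hits += longest == correct_index
    return hits / len(items)


def corrected_score(num_correct: int, num_wrong: int, num_choices: int = 4) -> float:
    """Classical formula score: subtract the hits expected from blind guessing."""
    return num_correct - num_wrong / (num_choices - 1)


# Hypothetical items, written only to exercise the functions.
sample_items = [
    (["to make unclear or confusing", "to clarify", "a bird", "very fast"], 0),
    (["a loud noise", "the spirit or mood of a particular period", "to sleep", "green"], 1),
    (["a fuss", "an extremely large mountain range", "a hat", "to run"], 0),
]
print(longest_option_hit_rate(sample_items))
print(corrected_score(25, 75))  # pure guessing on 100 four-choice items
print(corrected_score(80, 20))
```

With four choices, a raw 25/100 from blind guessing corrects to zero, which is the gap an "I don't know" button would make visible.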

Word Choice, Difficulty, and Bias

  • Word list seen as skewed:
    • Heavy on Latin/French/Greek-derived words and meta-words about language, verbosity, and rhetoric.
    • Late levels mix genuinely obscure items with fairly common ones (“obfuscate,” “zeitgeist,” “kerfuffle”).
  • Non-native English speakers whose first language is Romance or Germanic often scored very high by leveraging cognates and roots.
  • Some objected to fictional, jokey, or hyper-rare words and to including proper names.

Definitions and “Science” Claims

  • Multiple users flagged definitions as shallow, circular, misleading, or incomplete (e.g., “lethargic,” “verbose,” “candid,” “complacent,” “frugal,” “obsequious”).
  • Several criticized apparent LLM-generated distractors and wording as “AI slop.”
  • The described “stratified sampling” algorithm impressed some, but others noted:
    • Band sizes sum to only ~85k vs the ~171k cited, so even a perfect 100/100 score yields an estimate of only ~50% of the total.
    • Word bands don’t consistently match actual difficulty or frequency.
  • Alternatives were suggested (adaptive/IRT-style tests, Elo-like models, other existing vocab tests).
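
The band-size arithmetic can be reproduced with a small stratified estimate. The five band sizes and the 20 questions per band below are invented so that the bands sum to 85,000; the thread gives only the ~85k and ~171k totals, not the real breakdown.

```python
def estimate_known_words(bands: list[tuple[int, int, int]]) -> float:
    """Stratified estimate: scale each band's hit rate by the band's size.

    Each band is (band_size, num_correct, num_asked).
    """
    return sum(size * correct / asked for size, correct, asked in bands)


# Hypothetical bands summing to 85,000 words, 20 questions each (100 total).
band_sizes = [5_000, 10_000, 15_000, 25_000, 30_000]
perfect_run = [(size, 20, 20) for size in band_sizes]

estimate = estimate_known_words(perfect_run)
share_of_total = estimate / 171_000  # ~171k is the total cited in the thread

print(estimate)
print(round(share_of_total, 3))
```

Because the estimate can never exceed the sum of the band sizes, a perfect score under these assumptions tops out near half of the cited total, matching the commenters' objection.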