List animals until failure

Potential LLM / cognition benchmark

  • Several comments suggest using the game as an LLM benchmark: how many unique animals a model can list without repetition or invalid entries.
  • Ideas for harder variants: require no token reuse at all, or enforce patterns (e.g., each token is an anagram of one N steps back) to test planning over long context windows.
  • People speculate that “thinking” models might adopt strategies like alphabetical order or calling tools to track past outputs.
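
A minimal sketch of how such a benchmark might be scored, assuming the simplest rule set: the run ends at the first repeated or unrecognized name. The function name `score_listing` and the tiny `VALID` set are hypothetical stand-ins, not anything taken from the game or the thread.

```python
def score_listing(answers, valid_animals):
    """Count accepted answers until the first repeat or invalid entry."""
    seen = set()
    for raw in answers:
        name = raw.strip().lower()  # compare case-insensitively
        if name not in valid_animals or name in seen:
            break  # the run fails here; later answers do not count
        seen.add(name)
    return len(seen)


# Hypothetical stand-in for a real list of accepted animal names.
VALID = {"cat", "dog", "tardigrade", "dingo"}

if __name__ == "__main__":
    # "cat" is repeated as the third answer, so the run ends there.
    print(score_listing(["Cat", "dog", "cat", "dingo"], VALID))
```

A harder variant, such as the no-token-reuse idea, would only change the condition inside the loop.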

Implementation and data source

  • The game is explicitly non-LLM: basic text parsing plus key–value tables, with main maps for lowercased titles and a taxonomy tree.
  • Data ultimately comes from Wikidata, which explains deep coverage (e.g., tardigrades, obscure insects, dinosaurs) and oddities (joke entries like “drop bear”).
  • There are extra tables for easter eggs and special responses; some users inspect the hashes stored there and work out which strings they correspond to.
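
The table-driven design described above could look roughly like the following. This is a sketch under stated assumptions: the table names (`TITLE_TO_ID`, `PARENT`, `EASTER_EGGS`), the node ids, and the toy entries are all hypothetical, and the real game may structure its data quite differently.

```python
import re

# Hypothetical tables; the real ones are reportedly derived from Wikidata.
TITLE_TO_ID = {"dingo": "dingo", "dog": "dog", "wolf": "wolf"}
PARENT = {"dingo": "dog", "dog": "canid", "wolf": "canid", "canid": None}
EASTER_EGGS = {"dingo": "Are you Australian?"}


def normalize(text):
    """Lowercase the input and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def lineage(node):
    """Walk parent links from a node up to the root of its tree."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def lookup(text):
    """Resolve typed text to a node, its lineage, and any special message."""
    key = normalize(text)
    node = TITLE_TO_ID.get(key)
    if node is None:
        return None  # not a recognized animal
    return {"id": node, "lineage": lineage(node), "message": EASTER_EGGS.get(key)}


if __name__ == "__main__":
    print(lookup("  Dingo "))
    print(lookup("car"))
```

Because every step is a dictionary lookup or a short walk up the tree, no model inference is needed at any point.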

Easter eggs and personality

  • Numerous special responses delight players: “Are you Australian?” for dingoes, special handling of “human,” jokes for unicorn, haggis, Obama, car, etc.
  • Visual touches (background shifts, title color changes, clown and animal emojis) and a playful “JS disabled” message make it feel hand-crafted and personal.

Gameplay strategies and user experience

  • Players report a wide range of scores (tens to a few hundred) and strong mental fatigue under the timer.
  • Common strategies: working through the alphabet (A–Z), grouping by biome (sea/forest/jungle), going through taxonomic groups (reptiles, birds, insects), listing extinct animals, or even using Pokémon as cues.
  • Some use it as language practice; others note mobile input and UI lag as main difficulties.

Taxonomy, semantics, and inaccuracies

  • Heated debates arise over equivalences: chipmunks vs squirrels, pigeon vs dove, frogs vs toads, buffalo vs bison, elk vs deer, dingo vs dog, parrot vs budgie, jellyfish vs Portuguese man o’ war.
  • The system often treats general common names as parents of more specific ones, sometimes in ways users find wrong or unintuitive (e.g., “panther,” jellyfish vs siphonophore).
  • This leads to semantic arguments about common vs scientific names, what counts as a “vegetable,” and whether colonies of zooids are “one animal.”
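
To make the parent/child disputes concrete, here is one plausible overlap rule, assumed purely for illustration: a new answer is rejected if it is the same node as, an ancestor of, or a descendant of something already said. The thread does not spell out the game's exact rule, and the toy `PARENT` tree below is hypothetical.

```python
# Illustrative toy tree: child -> parent (None marks a root).
PARENT = {
    "chipmunk": "squirrel",
    "squirrel": "rodent",
    "budgie": "parrot",
    "parrot": "bird",
    "rodent": None,
    "bird": None,
}


def ancestors(node, parent):
    """Return the set of all nodes strictly above `node`."""
    found = set()
    node = parent.get(node)
    while node is not None:
        found.add(node)
        node = parent.get(node)
    return found


def conflicts(candidate, said, parent):
    """Return an earlier answer that overlaps with `candidate`, or None."""
    above_candidate = ancestors(candidate, parent)
    for prev in said:
        if prev == candidate:
            return prev  # exact repeat
        if prev in above_candidate:
            return prev  # a broader term was already said
        if candidate in ancestors(prev, parent):
            return prev  # a narrower term was already said
    return None


if __name__ == "__main__":
    print(conflicts("squirrel", {"chipmunk"}, PARENT))
    print(conflicts("parrot", {"chipmunk"}, PARENT))
```

Under a rule like this, whether "chipmunk" and "squirrel" can both score depends entirely on where the data places each name in the tree, which is exactly what the arguments are about.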

Reverse engineering and maximum score

  • One commenter fully analyzes the internal dataset and rules: deduplications, “too specific” species, unreachable entries, and parent–child relationships.
  • They compute a theoretical maximum score (~322k animals) and show that a custom script with data-structure optimizations can reach it in seconds, though the in-game timer would still run for weeks.
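
A rough sketch of how a maximum of this kind might be computed, under two loudly stated assumptions: several titles can alias the same entry (so aliases are deduplicated), and no two scored answers may be ancestor and descendant of each other. Under those assumptions the best achievable set is the listed entries that have no listed descendant. This is not the commenter's actual method, it ignores the "too specific" and unreachable cases they describe, and the function name and toy tables are hypothetical.

```python
def max_score_under_assumed_rule(title_to_id, parent):
    """Count listed entries that have no listed descendant."""
    ids = set(title_to_id.values())  # deduplicate aliases of the same entry
    has_listed_descendant = set()
    for node in ids:
        above = parent.get(node)
        while above is not None:
            if above in ids:
                has_listed_descendant.add(above)
            above = parent.get(above)
    return len(ids - has_listed_descendant)


# Toy data: "dog" and "domestic dog" are aliases of one entry.
TITLES = {
    "dog": "dog",
    "domestic dog": "dog",
    "dingo": "dingo",
    "wolf": "wolf",
    "canid": "canid",
}
PARENT = {"dingo": "dog", "dog": "canid", "wolf": "canid", "canid": None}

if __name__ == "__main__":
    # Only "dingo" and "wolf" have no listed descendant in this toy tree.
    print(max_score_under_assumed_rule(TITLES, PARENT))
```

The walk up the tree is repeated for every entry here; on a dataset of hundreds of thousands of entries, caching visited ancestors would be one of the obvious data-structure optimizations.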