Bag of words, have mercy on us

Metaphors and Mental Models

  • Many object to “bag of words” as a metaphor: it’s already a specific NLP term (see the sketch after this list), sounds trivial, and doesn’t match how people actually use LLMs.
  • Alternatives proposed: “superpowered autocomplete,” “glorified/luxury autocomplete,” “search engine that can remix results,” “spoken query language,” or “Library of Babel with compression and artifacts.”
  • Some defend “bag of words” (or “word-hoard”) as deliberately anti-personal: a corrective to “silicon homunculus” metaphors, not a technical description.
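
In NLP, “bag of words” names a representation that counts tokens and throws away their order, which is part of why commenters find it a poor fit for models whose whole job is order-sensitive prediction. A minimal sketch of the classic technique, using only the Python standard library:

    from collections import Counter

    def bag_of_words(text: str) -> Counter:
        """Classic bag-of-words: count tokens and discard all ordering information."""
        return Counter(text.lower().split())

    # Two sentences with opposite meanings collapse to the same bag, which is
    # the heart of the objection: an LLM's output depends heavily on order.
    a = bag_of_words("the model predicts the user")
    b = bag_of_words("the user predicts the model")
    print(a == b)  # True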

Anthropomorphism and Interfaces

  • Commenters report seeing people treat LLMs as thinking, feeling agents, even after repeated explanations that they’re predictors.
  • Chat-style UIs, system prompts, memory, tool use, and human-like tone are seen as major anthropomorphizing scaffolding that hides the underlying mechanics.
  • Some argue a less chatty, more “complete this text / call this tool” interface would reduce misplaced trust and quasi-religious attitudes.
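
To illustrate the scaffolding point: a chat interface is typically a template wrapped around a plain text-completion model. The function names and turn markers below are made up for illustration and don’t correspond to any particular vendor’s format.

    # Hypothetical sketch: a "chat" is a prompt template fed to a completion
    # model; the markers and the complete() callable are illustrative only.
    def render_chat(system: str, turns: list[tuple[str, str]]) -> str:
        """Flatten a chat history into a single text for the model to continue."""
        lines = [f"<|system|>{system}"]
        for role, content in turns:
            lines.append(f"<|{role}|>{content}")
        lines.append("<|assistant|>")  # the model simply continues from here
        return "\n".join(lines)

    def chat_reply(complete, system: str, turns: list[tuple[str, str]]) -> str:
        # `complete` is any text-in, text-out continuation function.
        return complete(render_chat(system, turns))

Stripped to this form, the “assistant” is a template plus a stop sequence, which is roughly the less chatty interface some commenters are asking for.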

Capabilities vs. “Just Autocomplete”

  • Disagreement over whether “just prediction” is dismissive (a bare decoding loop is sketched after this list):
    • Critics: next-token prediction on text ≠ modeling the physical world or doing reliable reasoning; models lack stable world models, meta-knowledge, and consistent self-critique.
    • Defenders: prediction is central to human cognition too; given scale, tool use, feedback loops and agents, prediction-plus-scaffolding may cross into genuine problem solving.
  • Examples cited on both sides: impressive performance on math competitions and code generation for novel ISAs on one hand; brittle reasoning, hallucinations, and inconsistency under minor prompt changes on the other.
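
To make “next-token prediction” concrete, here is a bare greedy decoding loop. It assumes the Hugging Face transformers and torch packages and the small gpt2 checkpoint; real deployments add sampling, KV caching, stop sequences, chat templates, and tool calls on top of this same loop.

    # Greedy next-token decoding: score the vocabulary, append the most
    # likely token, repeat. (Assumes `transformers`, `torch`, and "gpt2".)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The bag of words metaphor", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(20):
            logits = model(ids).logits           # scores over the whole vocabulary
            next_id = logits[0, -1].argmax()     # greedily take the top token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tok.decode(ids[0]))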

Human Cognition Comparisons

  • Long subthread on whether all thinking is prediction: references to predictive processing / free-energy ideas vs. objections that this redefines “thinking” so broadly it loses usefulness.
  • Some argue we don’t understand human thought or consciousness well enough to assert that LLMs categorically “don’t think”; others say the lack of learning at inference time, of motivation, and of embodiment marks a decisive difference.

Ethics, Risk, and Social Roles

  • Underestimating LLMs risks missed opportunities; overestimating them risks delusion, over-delegation in high-stakes domains, and possible moral misclassification (either of humans or models).
  • Economic concern: many “word-only” roles may be replaceable if a “magic bag of words” is good enough for employers.
  • Creative concern: several insist they value works because humans made them, invoking the “forklift at the gym” analogy (using AI defeats the purpose, the way a forklift defeats lifting weights); others see AI as acceptable when the goal is the output, not personal growth.

Interpretability and Inner Structure

  • Interpretability work (e.g., concept neurons, cross-lingual features, confidence/introspection signals) is cited as evidence of internal structure beyond a naive bag of words (a toy probing sketch follows this list).
  • Skeptics counter that much of this research is unreviewed, commercially motivated, and doesn’t yet demonstrate human-like understanding or robust world models.
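
As a rough illustration of what “internal structure” claims rest on, here is a toy version of activation probing: fit a linear classifier on a model’s hidden states and check whether a concept is linearly decodable. It assumes transformers, torch, and scikit-learn; the sentences and labels are made-up placeholders, so it shows the method, not evidence either way.

    # Toy activation probe: does a linear classifier separate "geography"
    # from "arithmetic" sentences using gpt2's last hidden layer?
    # (Assumes `transformers`, `torch`, and `scikit-learn`; data is made up.)
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")

    sentences = ["Paris is in France", "Berlin is in Germany",       # label 1
                 "Two plus two is four", "Seven is a prime number"]  # label 0
    labels = [1, 1, 0, 0]

    feats = []
    with torch.no_grad():
        for s in sentences:
            out = model(**tok(s, return_tensors="pt"))
            # mean-pool the final hidden states into one vector per sentence
            feats.append(out.last_hidden_state.mean(dim=1).squeeze(0).numpy())

    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(probe.score(feats, labels))  # accuracy on this tiny toy set

Fitting a probe this way proves nothing on its own, which is roughly the skeptics’ point about how much weight such results can bear.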