Large models of what? Mistaking engineering achievements for linguistic agency

Embodiment, “Languaging,” and the Paper’s Core Claim

  • Paper argues LLMs lack embodiment, interaction with the real world, and “linguistic agency” (multiple simultaneous goals in communication).
  • Some see this as basically correct but obvious and somewhat tautological: it restates what skeptics already believe.
  • Others say it’s dated: ignores multimodal models and extensive interactive post‑training, so the video‑game / “brain in a vat” analogy is too rigid.

Training, Feedback, and Post‑Training Methods

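  • Discussion of RLHF, RLAIF, and newer methods like DPO; all rely on preference data (human‑labeled for RLHF, AI‑generated for RLAIF) but differ in how reward is modeled: RLHF and RLAIF fit an explicit reward model, while DPO optimizes the policy directly on preference pairs.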
  • User–model conversations likely feed future training; this blurs the paper’s framing of LLMs as trained only on static corpora.
  • Some note that long chains of interactive correction and retraining don’t yet have a standard name; it’s just “how training works.”
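
To make the "differ in how reward is modeled" point concrete, here is a minimal sketch of the two pairwise objectives as they are usually written. It is not taken from the paper or the thread: the function names and the scalar inputs are illustrative assumptions, and real training works on batches of token-level log-probabilities rather than single floats.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss for an explicit reward model.

    RLHF/RLAIF-style pipelines typically fit a reward model with this loss,
    then optimize the policy against that model in a separate RL step.
    Inputs are the scalar rewards assigned to the preferred and the
    rejected response (hypothetical values here).
    """
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; beta is a tunable hyperparameter
) -> float:
    """DPO pairwise loss: no separate reward model.

    The implicit reward of a response is beta * (policy log-prob minus
    reference log-prob), so the policy is trained directly on the pair.
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

Both losses have the same shape, a negative log-sigmoid of a margin between the preferred and rejected response. The difference is where the margin comes from: a learned reward model in one case, the policy's own log-probabilities relative to a frozen reference in the other.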

Capabilities vs. Limitations

  • Several commenters report long, coherent dialogues with recent models, contradicting the paper’s example where the model “loses the thread” quickly.
  • LLMs can often perform abstract reasoning on novel, symbolic problems, especially in idealized textbook forms.
  • Critics counter that failures at simple arithmetic and brittle reasoning show a lack of underlying concepts; successes are attributed to pattern matching on recurring forms.
  • There’s dispute over whether next‑token prediction inherently precludes internal world models; some argue any computable process can be cast as such, others insist current systems are just high‑dimensional curve fits.
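
The "any computable process can be cast as next‑token prediction" argument can be illustrated with a toy sketch. The names `next_token` and `generate` are hypothetical, and the example says nothing about what trained LLMs actually compute internally; it only shows that the next‑token interface by itself does not rule out an exact internal procedure.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker


def next_token(text: str) -> str:
    """Return the next character for a prompt such as '12+30=' or '12+30=4'.

    Internally this runs exact integer addition, but from the outside it
    looks like any other next-token predictor: text in, one token out.
    """
    expr, _, emitted = text.partition("=")
    left, _, right = expr.partition("+")
    answer = str(int(left) + int(right))
    if len(emitted) < len(answer):
        return answer[len(emitted)]
    return EOS


def generate(prompt: str, max_tokens: int = 32) -> str:
    """Autoregressive loop: append one predicted token at a time."""
    text = prompt
    for _ in range(max_tokens):
        token = next_token(text)
        if token == EOS:
            break
        text += token
    return text


print(generate("12+30="))  # prints "12+30=42"
```

The skeptics' reply in the thread still stands as an empirical question: whether a model trained by gradient descent on text ends up with such a procedure, or with a curve fit that approximates it, is not settled by the interface.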

Intelligence, AGI, and Definitions

  • Repeated theme: we lack precise, agreed definitions of intelligence, consciousness, and AGI, making “LLMs can’t be AGI” or “LLMs think” claims hard to settle.
  • One camp: sufficiently advanced behavioral mimicry just is the thing (language, intelligence) under physicalism.
  • Other camp: embodiment, stakes, and non‑linguistic experience are essential; text‑only models can at best approximate.
  • Some suggest using consensus and obviousness (as with recognizing “flight”) as a pragmatic criterion for intelligence; others point out historical failures to recognize the intelligence of animals or other human groups.

Hype, Value, and Research Trajectory

  • Practitioners describe concrete but narrow wins: using LLMs for text structuring and data engineering vs. over‑engineered “agents” without clear business needs.
  • Disagreement over scaling laws: some think more data/parameters will eventually hit hard limits; others expect continued gains with better training and hybrid architectures.
  • Overall, thread balances excitement about practical capabilities with skepticism about strong claims of understanding, agency, or inevitable AGI.