2024-07-16

Large models of what? Mistaking engineering achievements for linguistic agency

Embodiment, “Languaging,” and the Paper’s Core Claim

The paper argues that LLMs lack embodiment, interaction with the real world, and “linguistic agency” (pursuing multiple goals at once in communication).
Some commenters find this basically correct but obvious, even somewhat tautological: it restates what skeptics already believe.
Others call it dated: it ignores multimodal models and extensive interactive post‑training, so its video‑game / “brain in a vat” analogy is too rigid.

Training, Feedback, and Post‑Training Methods

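Discussion of RLHF, RLAIF, and newer methods like DPO: all learn from pairwise preference data (collected from humans in RLHF, generated by an AI labeler in RLAIF), but they differ in how reward is modeled. RLHF and RLAIF fit an explicit reward model and then optimize the policy against it, while DPO skips the reward model and optimizes the policy directly on preference pairs.
User–model conversations likely feed into future training, which blurs the paper’s framing of LLMs as trained only on static corpora.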
Some note that long chains of interactive correction and retraining don’t yet have a standard name; it’s just “how training works.”
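To make the “how reward is modeled” distinction concrete, here is a minimal Python sketch, assuming the standard Bradley–Terry formulation of pairwise preferences. The function names, the per‑pair scalar inputs, and the default `beta=0.1` are illustrative assumptions, not taken from the paper or the thread. An RLHF/RLAIF reward model is trained with the first loss on its own scalar scores; DPO applies the same loss shape directly to the policy, using beta times the policy/reference log‑probability ratio as an implicit reward.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss for an explicit reward model (RLHF/RLAIF style).

    r_chosen / r_rejected are the reward model's scalar scores for the
    preferred and dispreferred responses.
    Returns -log(sigmoid(r_chosen - r_rejected)).
    """
    return softplus(-(r_chosen - r_rejected))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative default, an assumption
) -> float:
    """DPO loss for one preference pair, with no separate reward model.

    Inputs are summed token log-probabilities of each full response under
    the policy being trained (logp_*) and a frozen reference model (ref_logp_*).
    """
    # Implicit reward: beta times the policy/reference log-probability ratio.
    implicit_chosen = beta * (logp_chosen - ref_logp_chosen)
    implicit_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Same Bradley-Terry form as above, applied to the implicit rewards.
    return reward_model_loss(implicit_chosen, implicit_rejected)
```

When the policy still equals the reference model, both implicit rewards are zero and `dpo_loss` returns log 2 (about 0.693); the loss falls as the policy raises the preferred response’s likelihood, relative to the reference, more than the rejected one’s.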

Capabilities vs. Limitations

Several commenters report long, coherent dialogues with recent models, contradicting the paper’s example where the model “loses the thread” quickly.
Supporters point out that LLMs can often perform abstract reasoning on novel symbolic problems, especially when these are posed in idealized textbook form.
Critics counter that failures at simple arithmetic and brittle reasoning show a lack of underlying concepts, and they attribute the successes to pattern matching on recurring forms.
There’s dispute over whether next‑token prediction inherently precludes internal world models: some argue that any computable process can be cast as next‑token prediction; others insist current systems are just high‑dimensional curve fits.

Intelligence, AGI, and Definitions

A recurring theme: there are no precise, agreed‑upon definitions of intelligence, consciousness, or AGI, which makes claims like “LLMs can’t be AGI” or “LLMs think” hard to settle.
One camp holds that, under physicalism, sufficiently advanced behavioral mimicry of language or intelligence simply is language or intelligence.
The other camp holds that embodiment, stakes, and non‑linguistic experience are essential, so text‑only models can at best approximate them.
Some suggest using consensus and obviousness (as with recognizing “flight”) as a pragmatic criterion for intelligence; others point out historical failures to recognize the intelligence of animals or other human groups.

Hype, Value, and Research Trajectory

Practitioners describe concrete but narrow wins, such as using LLMs for text structuring and data engineering, and contrast them with over‑engineered “agents” built without a clear business need.
Disagreement over scaling laws: some think adding data and parameters will eventually hit hard limits; others expect continued gains from better training and hybrid architectures.
Overall, the thread balances excitement about practical capabilities with skepticism toward strong claims of understanding, agency, or inevitable AGI.
