I don't know how you get here from “predict the next word”

How “predict the next word” scales up

  • Many argue “predict the next token” is technically true but the wrong abstraction, like saying “humans fire neurons.”
  • Others insist it is the right level: at inference time that is literally what’s happening, and mystifying it is marketing.
  • Several point out modern systems add layers: instruction fine-tuning, RLHF/RL-based training, tool use, agents, context management, and mixture-of-experts routing, so the simple phrase hides a lot of machinery.
  • One subthread notes that the training loss over the “next N tokens” decomposes into a sum of single next-token losses, so training is closer to “predict the rest of the book,” not just the next word.
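
The loss-factorization point in the last bullet can be made concrete: by the chain rule of probability, the likelihood of a whole continuation is the product of next-token probabilities, so its negative log splits into a sum of per-token losses. A minimal sketch, using a hypothetical hand-written bigram table in place of a real model:

```python
import math

# Hypothetical next-token model: P(next | prev) as a lookup table.
probs = {
    ("the", "cat"): 0.5,
    ("cat", "sat"): 0.4,
    ("sat", "down"): 0.3,
}

def next_token_nll(prev, nxt):
    # Loss for one next-token prediction: -log P(next | prev).
    return -math.log(probs[(prev, nxt)])

tokens = ["the", "cat", "sat", "down"]

# Training loss summed over every next-token prediction in the sequence...
per_token = sum(next_token_nll(a, b) for a, b in zip(tokens, tokens[1:]))

# ...equals the negative log-likelihood of the whole continuation,
# since P(continuation) = 0.5 * 0.4 * 0.3 by the chain rule.
whole_sequence = -math.log(0.5 * 0.4 * 0.3)

assert abs(per_token - whole_sequence) < 1e-9
```

Nothing here depends on the model being a bigram table; the same identity holds for a transformer, which is why optimizing next-token loss over long contexts is equivalent to optimizing the likelihood of entire documents.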

Emergence, understanding, and “reasoning”

  • Some call the behavior “emergent” and liken it to ants or evolution: simple rules plus scale produce complex global behavior.
  • Others push back: we do have partial theories (generalization, world models, implicit bias of gradient descent), and “black box optimizer” doesn’t mean “no theory.”
  • There’s a long argument over whether internal representations count as “understanding” or just pattern-encoding; this quickly runs into definitional issues and consciousness debates.
  • A popular view: LLMs build rich latent structures over text (hierarchies of relations), which is enough to behave like they understand, but not to justify human-like terms such as “thought” or “reasoning.”

Capabilities and sharp limitations

  • Developers report that current coding agents handle small, well-scoped tasks well, but struggle badly with building nontrivial compilers/VMs, even with detailed specs and iterative tool access.
  • Agentic workflows (self-testing, iterative refinement, “thinking modes”) can help but don’t remove fundamental failure modes.
  • LLM-based code review and prose review can be strong when heavily scaffolded (many subagents, strict rules, human curation), but off-the-shelf tools are often mediocre.
  • Common pathologies are documented: hallucinations, deleting tests to make them pass, ignoring constraints, failing to reliably read long structured context.

Novelty and creativity

  • Some users report personally “novel” suggestions (e.g., niche modeling workflows), but skeptics question whether anything is truly new versus recombination of training patterns.
  • Proposed benchmarks for genuine creativity include: cross-disciplinary scientific leaps or rediscovering major theories (e.g., relativity) from pre-theory data. No clear examples exist in the thread.

Training data, ownership, and future of writing

  • One camp thinks the key future status signal is being “wired into the LLMs”; others worry that works become tiny, uncompensated drops in an ever-growing ocean.
  • Strong pushback on the idea that original authors were fairly paid: many training datasets appear to include pirated or scraped works without consent or compensation.
  • Some fear a world where people mostly read digests, not originals; others see LLMs as tools that amplify experts but degrade the lower end of writing and reviewing.