I don’t know how you get here from “predict the next word”
How “predict the next word” scales up
- Many argue “predict the next token” is technically true but the wrong abstraction, like saying “humans fire neurons.”
- Others insist it is the right level: at inference time that is literally what’s happening, and mystifying it is marketing.
- Several point out that modern systems add layers: instruction fine-tuning, RLHF/RL-based training, tool use, agents, context management, mixture-of-experts routing—so the simple phrase hides a lot of machinery.
- One subthread notes that, because the autoregressive loss factorizes by the chain rule, the loss for “predict the next N tokens” is effectively the same as summing “predict the next token” at each position—so training is closer to “predict the rest of the book,” not just the next word.
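The equivalence that subthread leans on is just the chain rule of probability: the negative log-likelihood of a whole sequence equals the sum of per-position next-token losses under teacher forcing. A minimal numpy sketch, with a toy model and distributions invented purely for illustration:

```python
import numpy as np

# Hypothetical toy autoregressive model over a 3-symbol vocabulary.
# The conditional rule is made up; only the identity below matters.
def cond_probs(prefix):
    # Probability mass shifts toward the most recent symbol seen.
    base = np.array([0.2, 0.3, 0.5])
    if prefix:
        base = base.copy()
        base[prefix[-1]] += 1.0
    return base / base.sum()

sequence = [2, 0, 1, 1]

# Per-position next-token cross-entropy: what "predict the next token" trains.
per_token_nll = [-np.log(cond_probs(sequence[:t])[sequence[t]])
                 for t in range(len(sequence))]

# Negative log-likelihood of the entire sequence at once
# ("predict the rest of the book").
seq_prob = 1.0
for t in range(len(sequence)):
    seq_prob *= cond_probs(sequence[:t])[sequence[t]]
whole_seq_nll = -np.log(seq_prob)

# Chain rule: the two quantities are identical.
assert np.isclose(sum(per_token_nll), whole_seq_nll)
```

Nothing here depends on the toy rule: for any autoregressive model, minimizing per-token loss and minimizing whole-sequence loss are the same optimization target.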
Emergence, understanding, and “reasoning”
- Some call the behavior “emergent” and liken it to ants or evolution: simple rules plus scale produce complex global behavior.
- Others push back: we do have partial theories (generalization, world models, implicit bias of gradient descent), and “black box optimizer” doesn’t mean “no theory.”
- There’s a long argument over whether internal representations count as “understanding” or just pattern-encoding; this quickly runs into definitional issues and consciousness debates.
- A popular view: LLMs build rich latent structures over text (hierarchies of relations), which is enough to behave like they understand, but not to justify human-like terms such as “thought” or “reasoning.”
Capabilities and sharp limitations
- Developers report that current coding agents handle small, well-scoped tasks well, but struggle badly with building nontrivial compilers/VMs, even with detailed specs and iterative tool access.
- Agentic workflows (self-testing, iterative refinement, “thinking modes”) can help, but they don’t remove the fundamental failure modes.
- LLM-based code review and prose review can be strong when heavily scaffolded (many subagents, strict rules, human curation), but off-the-shelf tools are often mediocre.
- Common pathologies are documented: hallucinations, deleting tests to make them pass, ignoring constraints, failing to reliably read long structured context.
Novelty and creativity
- Some users report personally “novel” suggestions (e.g., niche modeling workflows), but skeptics question whether anything is truly new versus recombination of training patterns.
- Proposed benchmarks for genuine creativity include making a cross-disciplinary scientific leap or rediscovering a major theory (e.g., relativity) from only pre-theory data; the thread offers no clear examples of either.
Training data, ownership, and future of writing
- One camp thinks the key future status signal is being “wired into the LLMs”; others worry that works become tiny, uncompensated drops in an ever-growing ocean.
- Strong pushback on the idea that original authors were fairly paid: many training datasets appear to include pirated or scraped works without consent or compensation.
- Some fear a world where people mostly read digests, not originals; others see LLMs as tools that amplify experts but degrade the lower end of writing and reviewing.