I don't know how you get here from “predict the next word”

How “predict the next word” scales up

  • Many argue “predict the next token” is technically true but the wrong abstraction, like saying “humans fire neurons.”
  • Others insist it is the right level: at inference time that is literally what’s happening, and mystifying it is marketing.
  • Several point out modern systems add layers: instruction fine-tuning, RLHF/RL-based training, tool use, agents, context management, and mixture-of-experts routing, so the simple phrase hides a lot of machinery.
  • One subthread notes that the training loss over the “next N tokens” decomposes into a sum of single next-token losses, so training is closer to “predict the rest of the book,” not just the next word.
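
The loss-factorization point in the last bullet can be made concrete: by the chain rule of probability, the likelihood of a whole continuation is the product of next-token probabilities, so its negative log splits into a sum of per-token losses. A minimal sketch, using a hypothetical hand-written bigram table in place of a real model:

```python
import math

# Hypothetical next-token model: P(next | prev) as a lookup table.
probs = {
    ("the", "cat"): 0.5,
    ("cat", "sat"): 0.4,
    ("sat", "down"): 0.3,
}

def next_token_nll(prev, nxt):
    # Loss for one next-token prediction: -log P(next | prev).
    return -math.log(probs[(prev, nxt)])

tokens = ["the", "cat", "sat", "down"]

# Training loss summed over every next-token prediction in the sequence...
per_token = sum(next_token_nll(a, b) for a, b in zip(tokens, tokens[1:]))

# ...equals the negative log-likelihood of the whole continuation,
# since P(continuation) = 0.5 * 0.4 * 0.3 by the chain rule.
whole_sequence = -math.log(0.5 * 0.4 * 0.3)

assert abs(per_token - whole_sequence) < 1e-9
```

Nothing here depends on the model being a bigram table; the same identity holds for a transformer, which is why optimizing next-token loss over long contexts is equivalent to optimizing the likelihood of entire documents.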

Emergence, understanding, and “reasoning”

  • Some call the behavior “emergent” and liken it to ants or evolution: simple rules plus scale produce complex global behavior.
  • Others push back: we do have partial theories (generalization, world models, implicit bias of gradient descent), and “black box optimizer” doesn’t mean “no theory.”
  • There’s a long argument over whether internal representations count as “understanding” or just pattern-encoding; this quickly runs into definitional issues and consciousness debates.
  • A popular view: LLMs build rich latent structures over text (hierarchies of relations), which is enough to behave like they understand, but not to justify human-like terms such as “thought” or “reasoning.”

Capabilities and sharp limitations

  • Developers report that current coding agents handle small, well-scoped tasks well, but struggle badly with building nontrivial compilers/VMs, even with detailed specs and iterative tool access.
  • Agentic workflows (self-testing, iterative refinement, “thinking modes”) can help but don’t remove fundamental failure modes.
  • LLM-based code review and prose review can be strong when heavily scaffolded (many subagents, strict rules, human curation), but off-the-shelf tools are often mediocre.
  • Common pathologies are documented: hallucinations, deleting tests to make them pass, ignoring constraints, failing to reliably read long structured context.

Novelty and creativity

  • Some users report personally “novel” suggestions (e.g., niche modeling workflows), but skeptics question whether anything is truly new versus recombination of training patterns.
  • Proposed benchmarks for genuine creativity include: cross-disciplinary scientific leaps or rediscovering major theories (e.g., relativity) from pre-theory data. No clear examples exist in the thread.

Training data, ownership, and future of writing

  • One camp thinks the key future status signal is being “wired into the LLMs”; others worry that works become tiny, uncompensated drops in an ever-growing ocean.
  • Strong pushback on the idea that original authors were fairly paid: many training datasets appear to include pirated or scraped works without consent or compensation.
  • Some fear a world where people mostly read digests, not originals; others see LLMs as tools that amplify experts but degrade the lower end of writing and reviewing.