When ChatGPT broke the field of NLP: An oral history

What Is “Intelligence” Here?

  • Long thread debating whether LLMs count as genuinely intelligent or as merely advanced pattern matchers.
  • One view: iterated next-token prediction (a Markov-style forecasting process) is enough to produce intelligent behavior; linguistic structure “emerges.”
  • Counterview: LLMs only remix human-produced data and lack foundational mechanisms; “intelligence requires agency,” goal-directed behavior, and often embodiment.
  • Some argue machines may become indistinguishable from humans in chat yet still be “philosophical zombies” with no inner life. Others reply that we lack any scientific test for subjective experience, so this is mostly a philosophical or theological issue, not an engineering one.

Consciousness, Simulation, and Subjectivity

  • Analogy: simulating digestion vs actually digesting vs experiencing digestion; people warn against jumping from “simulates” to “does” to “has qualia.”
  • Several note we already assume other humans are conscious on faith; nothing like an empirical test exists, so claims about machine consciousness are speculative.

LLMs vs Humans: Capabilities and Limits

  • LLMs are criticized as “artificial stupidity”: fluent but prone to hallucinations, ignoring negations like “not,” brittle reasoning, no embodiment (e.g., can’t navigate like a cockroach).
  • Others counter that in many economic tasks they already outperform typical humans, exposing how little curiosity or rigor most people display.
  • Some urge avoiding anthropomorphism: the model’s “world” is just its training corpus, full of contradictions; its lies are misalignments between that statistical world and ours.

NLP as a Field: Obsolescence and the “Bitter Lesson”

  • Strong consensus that large LMs abruptly obsoleted decades of “traditional” NLP (parsing, word-sense disambiguation, symbolic semantics, phrase-based MT, etc.) as practical technologies.
  • Several researchers describe the experience of spending years on machine translation or structured parsing only to be leapfrogged by end-to-end transformers trained at massive scale.
  • The field feels “short-circuited”: many intermediate tasks turned out not to be necessary for building useful systems.

Linguistics vs Probabilistic Methods

  • Ongoing Chomsky-style vs Norvig-style tension: explicit grammar/structure vs big-data statistics.
  • Some argue classical linguistic models remain uniquely explanatory about human language, whereas LLMs are powerful but opaque.
  • Others reply that probabilistic methods “just work” and that insisting they aren’t “real” language understanding is an instance of the AI effect.
  • Debate over universal grammar and word order: one side stresses usage-based redundancy and bag-of-words success; the other insists edge cases and constituent structure show word order is deeply informative.
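The word-order dispute has a classic minimal pair: a pure bag-of-words representation collapses sentences that differ only in constituent order, even when their meanings are opposite. A one-line demonstration:

```python
from collections import Counter

# The classic counterexample: identical word counts, opposite meanings.
a = "man bites dog"
b = "dog bites man"
assert Counter(a.split()) == Counter(b.split())  # bag-of-words cannot tell them apart
```

This is the edge case the pro-structure side points to; the usage-based side replies that such pairs are rare enough in practice that bag-of-words models still perform surprisingly well.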

Survival of Traditional NLP Techniques

  • Traditional tools (e.g., dictionary-based sentiment like VADER) are still used because they’re cheap, transparent, and can’t hallucinate or be jailbroken.
  • Critics call reliance on such methods “malpractice” when more modern, lightweight transformers (e.g., DistilBERT) can run on CPUs and perform far better with modest cost.
  • Practical pattern some describe: use GPT-class models for prototyping, data extraction, and synthetic labeling; then train smaller specialized models for large-scale or cost-sensitive workloads.
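The appeal of dictionary-based methods is easiest to see in code: every prediction traces back to an explicit lexicon entry, so the system cannot hallucinate or be prompt-injected. A minimal sketch in the spirit of VADER (the lexicon weights and negation rule here are illustrative toys, not VADER's actual values or heuristics):

```python
# Toy dictionary-based sentiment scorer. Lexicon weights are made up
# for illustration; real VADER ships a large curated lexicon plus
# heuristics for punctuation, capitalization, and degree modifiers.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -3.4, "okay": 0.9}
NEGATORS = {"not", "never", "no"}

def score(text: str) -> float:
    """Sum lexicon weights, flipping the sign of a word that follows a negator."""
    total, negate = 0.0, False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            negate = True
            continue
        if word in LEXICON:
            total += -LEXICON[word] if negate else LEXICON[word]
        negate = False
    return total
```

The transparency cuts both ways: `score("not good")` is handled, but anything outside the lexicon scores zero, which is exactly the gap the DistilBERT-on-CPU camp argues a small fine-tuned transformer closes at modest cost.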

Resource Barriers and Academic Fallout

  • Many academics note they simply cannot train or even run models near the frontier; serious NLP research now demands industrial-scale compute.
  • Some compare this to the LHC: the game is determined by who controls the giant machine. Toy models or prompt “studies” on closed APIs feel unscientific or second-class.
  • Tenure provides individual safety, but there is a palpable sense of grief: entire research agendas and skill sets feel like “zombie fields” that persist institutionally but are no longer central.

Future Directions and Hybrid Approaches

  • Several speculate that next steps will involve:
    • Integrating formal methods and proof systems to verify and “lock in” correct LLM-derived knowledge.
    • Extracting structured representations (logic, law corpora, compiler-like formalisms) on top of LLMs, not instead of them.
    • Introducing stronger inductive biases or linguistically informed structure to get smaller, more efficient models—especially where compute is scarce.
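The “verify and lock in” idea above can be sketched as a pipeline in which LLM output is only committed to a knowledge base after passing a deterministic check. Everything here is hypothetical scaffolding: `fake_llm_extract` stands in for a real structured-extraction model call, and the sanity rule is a toy example of a formal gate.

```python
# Sketch of verify-then-commit: model output is treated as a proposal,
# and only deterministically validated facts are "locked in".
def fake_llm_extract(sentence: str) -> dict:
    # Stand-in for a real LLM structured-extraction call.
    return {"subject": "water", "relation": "boils_at_celsius", "value": 100}

def verify(fact: dict) -> bool:
    """Deterministic gate: schema check plus a domain-specific sanity rule."""
    if set(fact) != {"subject", "relation", "value"}:
        return False
    if fact["relation"] == "boils_at_celsius":
        return isinstance(fact["value"], (int, float)) and -273.15 < fact["value"] < 6000
    return False  # unknown relations are rejected, not trusted

knowledge_base = []
fact = fake_llm_extract("Water boils at 100 degrees Celsius.")
if verify(fact):
    knowledge_base.append(fact)  # committed only after verification
```

The proposals in the thread go much further (proof assistants, compiler-like formalisms), but the architectural shape is the same: the LLM proposes, a formal system disposes.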

Broader Reflections on AI and Society

  • Commenters connect the upheaval in NLP to wider anxieties: AI undercutting “mental work” across professions, destabilizing education, and shifting scientific progress into corporations.
  • Some see humans as reluctant to accept superhuman intelligence; others note that belief in gods and spirits shows the opposite—people eagerly posit higher beings, real or imagined.
  • Overall tone mixes awe at “actually good NLP” finally arriving with bitterness that it came in a way that bypassed much of the field’s prior intellectual scaffolding.