Has LLM killed traditional NLP?

What “traditional NLP” means in the thread

  • Several people note “traditional NLP” has already shifted: rule‑based → classical ML → deep learning.
  • Current debate is mostly “task‑specific smaller models / pipelines” vs “general LLM calls / prompting,” not rules vs AI.
  • Some argue that embedding models and BERT‑like models are themselves now “traditional” compared to chat-style LLMs.

Cost, scale, and latency trade‑offs

  • Strong disagreement around feasibility of using LLMs for millions of classifications.
  • One side: LLMs are too slow and expensive at 10M+ items; small bespoke models or classic ML (Naive Bayes, SVM, DistilBERT) are far more efficient and easier to scale to sub‑ms latency.
  • Other side: with current prices, a 10M‑item binary classification can cost single‑ or low double‑digit dollars and run fast with batching/parallelism or cheap cloud APIs; hiring specialist NLP engineers can cost more than LLM usage.
  • Some emphasize evolving hardware, quantization, and falling inference costs; others stress energy/carbon cost and the risk of building on tech that may be obsolete soon.
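The cost debate above is easy to sanity-check with back-of-envelope arithmetic. The sketch below compares an LLM API bill against local-model throughput for a 10M-item job; every price and token count here is an illustrative assumption, not a quoted rate.

```python
# Back-of-envelope comparison for 10M short-text binary classifications.
# All prices and token counts are illustrative assumptions, not real quotes.

N_ITEMS = 10_000_000
TOKENS_PER_ITEM = 150          # assumed prompt + completion tokens per item
PRICE_PER_1M_TOKENS = 0.15     # assumed USD price for a small hosted model

llm_cost = N_ITEMS * TOKENS_PER_ITEM / 1_000_000 * PRICE_PER_1M_TOKENS
print(f"LLM API cost: ~${llm_cost:,.0f}")

# Throughput side: a small local model at ~1 ms/item on a single core.
items_per_hour = 3600 / 0.001
print(f"Local model: ~{items_per_hour:,.0f} items/hour/core")
```

Under these assumptions the API bill lands in the low hundreds of dollars, which is consistent with the "cheaper than hiring a specialist" argument, while the sub-ms local model wins decisively on raw throughput.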

Where LLMs are favored

  • Rapid prototyping and “casual” NLP: classification, intent extraction, and data labeling without learning ML toolchains.
  • Replacing older dialog/NLU stacks (e.g., call centers and Lex‑style intent systems) with prompt‑based flows.
  • Generating training data and bootstrapping smaller local models.
  • Semantic chunking, clustering, and labeling representative documents for RAG or categorization.
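The "bootstrap a smaller model" pattern above can be sketched end to end: have an LLM label a sample, then fit a cheap model on those labels. `call_llm` below is a hypothetical stand-in for any chat-completion API (here stubbed with a keyword rule), and the keyword classifier stands in for the Naive Bayes or DistilBERT model you would actually train.

```python
# Sketch of bootstrapping: LLM-label a sample, then train a lightweight
# bag-of-words model on those labels. `call_llm` is a hypothetical stub.
from collections import Counter

def call_llm(prompt: str) -> str:
    # Hypothetical: in practice this calls an LLM API and parses the reply.
    return "complaint" if "refund" in prompt or "broken" in prompt else "other"

docs = [
    "I want a refund for this broken headset",
    "What are your opening hours?",
    "The screen arrived broken, please help",
    "Do you ship to Canada?",
]
labels = [call_llm(f"Classify as complaint/other: {d}") for d in docs]

# Fit a trivial keyword model on the LLM's labels (stand-in for training
# a real small classifier on the bootstrapped dataset).
word_label_counts = Counter()
for doc, label in zip(docs, labels):
    for word in doc.lower().split():
        word_label_counts[(word, label)] += 1

def classify(text: str) -> str:
    scores = Counter()
    for word in text.lower().split():
        for label in ("complaint", "other"):
            scores[label] += word_label_counts[(word, label)]
    return scores.most_common(1)[0][0]

print(classify("my order arrived broken"))  # "complaint"
```

The small model then serves the high-volume traffic; the LLM is only on the hook for the one-time labeling pass.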

Where traditional NLP is still strong

  • Massive batch workloads needing extreme throughput and low cost/latency.
  • NER and relation extraction, where specialized models (e.g., GLiNER) are lighter, produce structured output, and are competitive with or better than LLMs on quality.
  • Highly deterministic or formatting‑sensitive tasks (e.g., curling straight quotes into typographic quotes) where LLMs still make subtle errors.
  • Similarity search over large corpora where embeddings + simple classifiers are standard.
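The embeddings-plus-simple-classifier pattern in the last bullet reduces to nearest-neighbor search under cosine similarity. A minimal sketch, using hand-made toy vectors in place of real embedding-model output (which would come from, e.g., a sentence-transformer):

```python
# Minimal nearest-neighbor search over toy "embeddings" with cosine
# similarity. Vectors are hand-made stand-ins for embedding-model output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

corpus = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "account login":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back"

best = max(corpus, key=lambda doc: cosine(query, corpus[doc]))
print(best)  # "refund policy"
```

At production scale the exhaustive `max` is replaced by an approximate-nearest-neighbor index, but the scoring logic is the same.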

Quality, validation, and reliability

  • Evaluation methods (labeled test sets, precision/recall, sampling) are similar for LLM and non‑LLM models.
  • Concerns about hallucinations, difficulty enforcing strict structure, and error tails for high‑stakes tasks.
  • Some see LLMs eventually dominating most NLP; others expect a long‑term coexistence, with traditional methods becoming more niche but still important.
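The point about shared evaluation methods is concrete: precision and recall on a labeled test set are computed identically whether predictions come from an LLM prompt or a classic classifier; only the predict step differs. A minimal sketch with made-up illustration labels:

```python
# Precision/recall against a labeled test set. The gold and predicted
# labels below are made-up illustration data.
gold = ["spam", "ham", "spam", "spam", "ham", "spam"]
pred = ["spam", "spam", "spam", "ham", "ham", "spam"]

tp = sum(g == p == "spam" for g, p in zip(gold, pred))  # true positives
fp = sum(g == "ham" and p == "spam" for g, p in zip(gold, pred))
fn = sum(g == "spam" and p == "ham" for g, p in zip(gold, pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sampling-based spot checks work the same way: draw a random subset of predictions, label them by hand, and compute the same metrics on the sample.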