Has LLM killed traditional NLP?

What “traditional NLP” means in the thread

  • Several people note “traditional NLP” has already shifted: rule‑based → classical ML → deep learning.
  • Current debate is mostly “task‑specific smaller models / pipelines” vs “general LLM calls / prompting,” not rules vs AI.
  • Some argue that embedding models and BERT‑like models are themselves now “traditional” compared to chat-style LLMs.

Cost, scale, and latency trade‑offs

  • Strong disagreement around feasibility of using LLMs for millions of classifications.
  • One side: LLMs are too slow and expensive at 10M+ items; small bespoke models or classic ML (Naive Bayes, SVM, DistilBERT) are far more efficient and easier to scale to sub‑ms latency.
  • Other side: with current prices, a 10M‑item binary classification can cost single‑ or low double‑digit dollars and run fast with batching/parallelism or cheap cloud APIs; hiring specialist NLP engineers can cost more than LLM usage.
  • Some emphasize evolving hardware, quantization, and falling inference costs; others stress energy/carbon cost and the risk of building on tech that may be obsolete soon.
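The cost debate above is easy to sanity-check with back-of-envelope arithmetic. The sketch below compares an LLM API bill against local-model throughput for a 10M-item job; every price and token count here is an illustrative assumption, not a quoted rate.

```python
# Back-of-envelope comparison for 10M short-text binary classifications.
# All prices and token counts are illustrative assumptions, not real quotes.

N_ITEMS = 10_000_000
TOKENS_PER_ITEM = 150          # assumed prompt + completion tokens per item
PRICE_PER_1M_TOKENS = 0.15     # assumed USD price for a small hosted model

llm_cost = N_ITEMS * TOKENS_PER_ITEM / 1_000_000 * PRICE_PER_1M_TOKENS
print(f"LLM API cost: ~${llm_cost:,.0f}")

# Throughput side: a small local model at ~1 ms/item on a single core.
items_per_hour = 3600 / 0.001
print(f"Local model: ~{items_per_hour:,.0f} items/hour/core")
```

Under these assumptions the API bill lands in the low hundreds of dollars, which is consistent with the "cheaper than hiring a specialist" argument, while the sub-ms local model wins decisively on raw throughput.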

Where LLMs are favored

  • Rapid prototyping and “casual” NLP: classification, intent extraction, and data labeling without learning ML toolchains.
  • Replacing older dialog/NLU stacks (e.g., call centers and Lex‑style intent systems) with prompt‑based flows.
  • Generating training data and bootstrapping smaller local models.
  • Semantic chunking, clustering, and labeling representative documents for RAG or categorization.
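The "bootstrap a smaller model" pattern above can be sketched end to end: have an LLM label a sample, then fit a cheap model on those labels. `call_llm` below is a hypothetical stand-in for any chat-completion API (here stubbed with a keyword rule), and the keyword classifier stands in for the Naive Bayes or DistilBERT model you would actually train.

```python
# Sketch of bootstrapping: LLM-label a sample, then train a lightweight
# bag-of-words model on those labels. `call_llm` is a hypothetical stub.
from collections import Counter

def call_llm(prompt: str) -> str:
    # Hypothetical: in practice this calls an LLM API and parses the reply.
    return "complaint" if "refund" in prompt or "broken" in prompt else "other"

docs = [
    "I want a refund for this broken headset",
    "What are your opening hours?",
    "The screen arrived broken, please help",
    "Do you ship to Canada?",
]
labels = [call_llm(f"Classify as complaint/other: {d}") for d in docs]

# Fit a trivial keyword model on the LLM's labels (stand-in for training
# a real small classifier on the bootstrapped dataset).
word_label_counts = Counter()
for doc, label in zip(docs, labels):
    for word in doc.lower().split():
        word_label_counts[(word, label)] += 1

def classify(text: str) -> str:
    scores = Counter()
    for word in text.lower().split():
        for label in ("complaint", "other"):
            scores[label] += word_label_counts[(word, label)]
    return scores.most_common(1)[0][0]

print(classify("my order arrived broken"))  # "complaint"
```

The small model then serves the high-volume traffic; the LLM is only on the hook for the one-time labeling pass.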

Where traditional NLP is still strong

  • Massive batch workloads needing extreme throughput and low cost/latency.
  • NER and relation extraction, where specialized models (e.g., GLiNER) are lighter, produce structured output, and are competitive with or better than LLMs on quality.
  • Highly deterministic or formatting‑sensitive tasks (e.g., curling straight quotes into typographic quotes) where LLMs still make subtle errors.
  • Similarity search over large corpora where embeddings + simple classifiers are standard.
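The embeddings-plus-simple-classifier pattern in the last bullet reduces to nearest-neighbor search under cosine similarity. A minimal sketch, using hand-made toy vectors in place of real embedding-model output (which would come from, e.g., a sentence-transformer):

```python
# Minimal nearest-neighbor search over toy "embeddings" with cosine
# similarity. Vectors are hand-made stand-ins for embedding-model output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

corpus = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "account login":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back"

best = max(corpus, key=lambda doc: cosine(query, corpus[doc]))
print(best)  # "refund policy"
```

At production scale the exhaustive `max` is replaced by an approximate-nearest-neighbor index, but the scoring logic is the same.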

Quality, validation, and reliability

  • Evaluation methods (labeled test sets, precision/recall, sampling) are similar for LLM and non‑LLM models.
  • Concerns about hallucinations, difficulty enforcing strict structure, and error tails for high‑stakes tasks.
  • Some see LLMs eventually dominating most NLP; others expect a long‑term coexistence, with traditional methods becoming more niche but still important.
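The point about shared evaluation methods is concrete: precision and recall on a labeled test set are computed identically whether predictions come from an LLM prompt or a classic classifier; only the predict step differs. A minimal sketch with made-up illustration labels:

```python
# Precision/recall against a labeled test set. The gold and predicted
# labels below are made-up illustration data.
gold = ["spam", "ham", "spam", "spam", "ham", "spam"]
pred = ["spam", "spam", "spam", "ham", "ham", "spam"]

tp = sum(g == p == "spam" for g, p in zip(gold, pred))  # true positives
fp = sum(g == "ham" and p == "spam" for g, p in zip(gold, pred))
fn = sum(g == "spam" and p == "ham" for g, p in zip(gold, pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sampling-based spot checks work the same way: draw a random subset of predictions, label them by hand, and compute the same metrics on the sample.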