Has LLM killed traditional NLP?
What “traditional NLP” means in the thread
- Several people note “traditional NLP” has already shifted: rule‑based → classical ML → deep learning.
- Current debate is mostly “task‑specific smaller models / pipelines” vs “general LLM calls / prompting,” not rules vs AI.
- Some argue that embedding models and BERT‑like models are themselves now “traditional” compared to chat-style LLMs.
Cost, scale, and latency trade‑offs
- Strong disagreement over the feasibility of using LLMs for millions of classifications.
- One side: LLMs are too slow and expensive at 10M+ items; small bespoke models or classic ML (Naive Bayes, SVM, DistilBERT) are far more efficient and easier to scale to sub‑ms latency.
- Other side: with current prices, a 10M‑item binary classification can cost single‑ or low double‑digit dollars and run fast with batching/parallelism or cheap cloud APIs; hiring specialist NLP engineers can cost more than LLM usage.
- Some emphasize evolving hardware, quantization, and falling inference costs; others stress energy/carbon cost and the risk of building on tech that may be obsolete soon.
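The cost side of this argument is easy to sanity-check with back-of-envelope arithmetic. The sketch below reproduces the "10M binary classifications for low double-digit dollars" claim; the per-token prices and token counts are illustrative assumptions, not quotes from any provider, so plug in current rates before trusting the number.

```python
# Back-of-envelope cost estimate for bulk LLM classification.
# All prices and token counts below are assumed for illustration.
items = 10_000_000
input_tokens_per_item = 40    # short text + terse prompt (assumed)
output_tokens_per_item = 1    # single-token label (assumed)

price_in_per_m = 0.05         # $ per 1M input tokens (assumed)
price_out_per_m = 0.40        # $ per 1M output tokens (assumed)

cost = (items * input_tokens_per_item / 1e6) * price_in_per_m \
     + (items * output_tokens_per_item / 1e6) * price_out_per_m
print(f"${cost:.2f}")  # ~ $24 under these assumptions
```

With longer inputs or pricier models the total grows linearly, which is why the two sides of the thread can both be right: the answer depends almost entirely on tokens per item and the chosen model.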
Where LLMs are favored
- Rapid prototyping and “casual” NLP: classification, intent extraction, and data labeling without learning ML toolchains.
- Replacing older dialog/NLU stacks (e.g., call centers and Lex‑style intent systems) with prompt‑based flows.
- Generating training data and bootstrapping smaller local models.
- Semantic chunking, clustering, and labeling representative documents for RAG or categorization.
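The "bootstrap a smaller local model from LLM labels" pattern above can be sketched with a tiny multinomial Naive Bayes classifier, stdlib only. The training pairs stand in for LLM-produced labels and are invented for illustration; a real pipeline would use far more data and a proper library.

```python
# Minimal multinomial Naive Bayes with add-one smoothing -- the kind of
# small bespoke classifier one might distill from LLM-labeled data.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns a simple model tuple."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    priors = {l: math.log(c / len(docs)) for l, c in label_counts.items()}
    totals = {l: sum(word_counts[l].values()) for l in label_counts}
    return priors, word_counts, vocab, totals

def predict_nb(model, text):
    priors, word_counts, vocab, totals = model
    V = len(vocab)
    scores = {}
    for label, prior in priors.items():
        score = prior
        for w in text.lower().split():
            # Laplace (add-one) smoothing over the label's word counts
            score += math.log((word_counts[label][w] + 1) / (totals[label] + V))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy labels such as an LLM might produce while bootstrapping (invented):
train = [
    ("refund my order now", "billing"),
    ("charged twice on my card", "billing"),
    ("app crashes on startup", "bug"),
    ("error screen after the update", "bug"),
]
model = train_nb(train)
print(predict_nb(model, "double charged card"))  # -> billing
```

Once trained, a model like this classifies at sub-millisecond latency on a CPU, which is the throughput argument made on the "traditional" side of the thread.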
Where traditional NLP is still strong
- Massive batch workloads needing extreme throughput and low cost/latency.
- NER and relationship extraction, where specialized models (e.g., GLiNER) are lighter, produce structured output, and match or beat LLMs on quality.
- Highly deterministic or formatting‑sensitive tasks (e.g., quote curling, i.e., converting straight quotes to typographic curly quotes) where LLMs still make subtle errors.
- Similarity search over large corpora where embeddings + simple classifiers are standard.
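The embeddings-based similarity search mentioned above reduces to ranking a corpus by cosine similarity to a query vector. The sketch below uses stdlib math and toy 3-d vectors as stand-ins; real systems get high-dimensional vectors from an embedding model and use an approximate-nearest-neighbour index at scale.

```python
# Nearest-neighbour similarity search over toy embedding vectors.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "document embeddings" (invented, 3-d for readability):
corpus = {
    "doc_pricing":  [0.9, 0.1, 0.0],
    "doc_outage":   [0.1, 0.8, 0.2],
    "doc_security": [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    """Return the k corpus doc ids most similar to the query vector."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]),
                    reverse=True)
    return ranked[:k]

print(top_k([0.85, 0.15, 0.05]))  # -> ['doc_pricing', 'doc_outage']
```

A "simple classifier" on top of this is often just a similarity threshold or a nearest-centroid rule over the same vectors.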
Quality, validation, and reliability
- Evaluation methods (labeled test sets, precision/recall, sampling) are similar for LLM and non‑LLM models.
- Concerns about hallucinations, difficulty enforcing strict structure, and error tails for high‑stakes tasks.
- Some see LLMs eventually dominating most NLP; others expect a long‑term coexistence, with traditional methods becoming more niche but still important.
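The shared-evaluation point above is concrete: precision and recall are computed the same way whether predictions come from an LLM or a classic model. A minimal stdlib sketch, with toy binary labels invented for illustration:

```python
# Precision/recall over a labeled test set -- model-agnostic evaluation.
def precision_recall(gold, pred, positive=1):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy gold labels and predictions (invented):
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 1, 0, 1, 0]
p, r = precision_recall(gold, pred)
print(p, r)  # 0.75 0.75
```

Sampling-based spot checks work the same way: draw a random subset, label it by hand, and run the same metrics, regardless of which system produced the predictions.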