When ChatGPT summarises, it does nothing of the kind
Nature of LLM “Summaries”
- Many commenters say current LLMs mostly “shorten” text rather than summarize it, often missing critical or novel points, especially conclusions or minority arguments.
- Others report that for many articles, GPT‑4‑class models do capture what those commenters themselves consider the “main points,” which highlights that what counts as “key” is subjective.
- Some argue a summary’s goal is just to help decide whether to read the full text; others want summaries that can safely replace reading.
Prompting, Methodology, and System Design
- Several criticize the article for omitting details: model version, prompt, number of runs, and the exact errors observed.
- Multiple people say “just call summarize()” is inadequate. They describe more elaborate pipelines:
  - Chunking text, embedding + clustering, extracting key quotes, verifying against source, then having the LLM rewrite in prose.
  - Multi-step prompts with explicit instructions to include niche or rarely mentioned points.
- Behavior through the API may differ from the web UI, which adds its own helpers; accuracy is also said to degrade with long-context usage.
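
The "verifying against source" step in the pipelines described above could be sketched roughly as follows. This is a minimal illustration and not any commenter's actual code: the function names `normalize` and `verify_quotes` are hypothetical, and it assumes the quotes were extracted verbatim, so a simple normalized substring check is enough to flag invented ones.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(source: str, quotes: list[str]) -> dict[str, bool]:
    """Map each candidate quote to True if it occurs in the source after normalization.

    Assumption: quotes are meant to be verbatim extracts. Paraphrases would
    need a fuzzier comparison than this substring test.
    """
    haystack = normalize(source)
    return {quote: normalize(quote) in haystack for quote in quotes}


if __name__ == "__main__":
    source_text = "The model  shortens text.\nIt misses conclusions."
    candidates = ["shortens text", "It misses   conclusions", "adds new insight"]
    for quote, found in verify_quotes(source_text, candidates).items():
        print(f"{'OK     ' if found else 'MISSING'} {quote!r}")
```

Quotes flagged as missing can be dropped or sent back for re-extraction before the LLM rewrites the remaining material in prose.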
Use Cases, Reliability, and Benchmarks
- Experiences vary widely: some find LLMs excellent for condensing their own writing, grant applications, meeting notes, or HN threads; others find them frequently wrong or overconfident.
- Error tolerance is seen as use‑case dependent: acceptable for blogs or “fluff,” not for medical records or high‑stakes domains.
- Several call for objective summarization benchmarks; others note this is an active research area.
Context Windows, RAG, and Technical Limits
- Long context windows and sliding-window attention are blamed for “content drift” and skipped details; splitting the text into smaller overlapping chunks is often recommended.
- Opinions on RAG diverge: some call it overhyped and hallucination‑prone; others find simple vector search plus light LLM summarization effective.
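
The overlapping-chunk splitting recommended above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `overlapping_chunks` is hypothetical, and sizes are counted in whitespace-separated words for simplicity, whereas a real pipeline would more likely count model tokens.

```python
def overlapping_chunks(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a word list into windows of `size` words that share `overlap` words.

    Assumption: word counts stand in for token counts. The overlap keeps a
    sentence that straddles a boundary visible in both neighbouring chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    for chunk in overlapping_chunks(text.split(), size=4, overlap=1):
        print(" ".join(chunk))
```

Each chunk can then be summarized separately and the partial summaries combined, which keeps every individual request well below the context lengths where commenters report drift.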
AI Hype, Skepticism, and “Understanding”
- The thread reflects both strong skepticism (“toy,” “dangerous,” overhyped like the metaverse or NFTs) and strong optimism (LLMs as major productivity tools, part of a broader ML trend).
- There is debate over whether LLMs genuinely “understand” text or are sophisticated pattern matchers; failures on math and niche tasks are cited against “understanding.”
- Trust is a recurring concern: if a summary must always be checked against the original, its practical value is questioned.