When ChatGPT summarises, it does nothing of the kind

Nature of LLM “Summaries”

  • Many commenters say current LLMs mostly “shorten” text rather than summarize it, often missing critical or novel points, especially conclusions or minority arguments.
  • Others report that, for many articles, GPT‑4‑class models do capture what the readers themselves consider the “main points,” which highlights that what counts as “key” is subjective.
  • Some argue a summary’s goal is just to help decide whether to read the full text; others want summaries that can safely replace reading.

Prompting, Methodology, and System Design

  • Several criticize the article for omitting key details: model version, prompt, number of runs, and the exact errors observed.
  • Multiple people say “just call summarize()” is inadequate. They describe more elaborate pipelines:
    • Chunking the text, embedding and clustering the chunks, extracting key quotes, verifying them against the source, then having the LLM rewrite the result as prose.
    • Multi-step prompts with explicit instructions to include niche or rarely mentioned points.
  • Behavior may differ between the API and the web UI, which can add its own helpers; several report that accuracy degrades with long-context usage.
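The “verifying against source” step in those pipelines can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the candidate quotes are assumed to come from an earlier LLM extraction step (not shown), and `normalize` and `verify_quotes` are hypothetical helper names, not taken from any library or from the thread. The sketch keeps only quotes that appear verbatim in the source after lowercasing and collapsing whitespace.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace runs so line breaks don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(source: str, quotes: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate quotes (assumed LLM output) into found / not found in the source."""
    norm_source = normalize(source)
    verified, rejected = [], []
    for quote in quotes:
        norm_quote = normalize(quote)
        # Empty quotes are rejected; otherwise require an exact substring match.
        if norm_quote and norm_quote in norm_source:
            verified.append(quote)
        else:
            rejected.append(quote)
    return verified, rejected


source = """The trial enrolled 120 patients.
Side effects were rare, but two patients withdrew."""
ok, bad = verify_quotes(source, ["two patients withdrew", "no patients withdrew"])
print(ok)   # ['two patients withdrew']
print(bad)  # ['no patients withdrew']
```

Exact substring matching rejects paraphrases as well as fabrications, which suits a step meant to catch invented or altered quotes; a fuzzier match would trade some of that strictness for tolerance of minor rewording.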

Use Cases, Reliability, and Benchmarks

  • Experiences vary widely: some find LLMs excellent for condensing their own writing, grant applications, meeting notes, or HN threads; others find them frequently wrong or overconfident.
  • Error tolerance is seen as use‑case dependent: errors are acceptable for blogs or “fluff,” but not for medical records or other high‑stakes domains.
  • Several call for objective summarization benchmarks; others note this is an active research area.
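As an illustration of what an automatic summarization metric can look like (the thread does not name one), here is a simplified ROUGE-1 recall: the fraction of words in a human-written reference summary that also appear in the model's summary. This is a sketch, not a reference implementation; `rouge1_recall` is a hypothetical name, and it only lowercases and splits on whitespace, whereas standard ROUGE tooling applies its own tokenization and, optionally, stemming.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams (counted with multiplicity) found in the candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    if not ref:
        return 0.0
    # Each reference word is credited at most as many times as it occurs in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


score = rouge1_recall("the cat sat on the mat", "the cat lay on a mat")
print(round(score, 3))  # 0.667
```

Word overlap of this kind rewards a summary that reuses the reference's vocabulary, so on its own it says little about whether a conclusion or a minority argument survived, which is exactly the failure commenters complained about.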

Context Windows, RAG, and Technical Limits

  • Long context windows and sliding-window attention are blamed for “content drift” and skipped details; splitting the input into smaller, overlapping chunks is often recommended.
  • Opinions on retrieval-augmented generation (RAG) diverge: some call it overhyped and hallucination‑prone; others find simple vector search plus light LLM summarization effective.
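The chunking advice and the “simple vector search” approach can be combined in a short sketch. Assumptions: chunk size and overlap are counted in whitespace-separated words rather than model tokens, the defaults (200 and 50) are arbitrary illustrative values, and a plain term-frequency vector compared by cosine similarity stands in for a learned embedding model. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks


def vectorize(text: str) -> Counter:
    """Term-frequency vector: a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


print(chunk_words(" ".join(str(i) for i in range(10)), size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
print(top_k("why do cats purr", ["cats purr softly", "dogs bark loudly"], k=1))
# ['cats purr softly']
```

In a real setup the retrieved chunks would then be handed to the LLM for the light summarization step. The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk, provided it is no longer than the overlap.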

AI Hype, Skepticism, and “Understanding”

  • The thread reflects both strong skepticism (“toy,” “dangerous,” overhyped like the metaverse or NFTs) and strong optimism (LLMs as major productivity tools, part of a broader ML trend).
  • There is debate over whether LLMs genuinely “understand” text or are sophisticated pattern matchers; failures on math and niche tasks are cited as evidence against “understanding.”
  • Trust is a recurring concern: if a summary must always be checked against the original, its practical value is questioned.