The Deep Research problem

Perceived usefulness & concrete use cases

  • Some users find Deep Research and similar tools nearly worthless in domains they know well (e.g., game dev, B2B sales modeling), calling results shallow, wrong, or spammy.
  • Others report strong practical value when:
    • Doing broad, annoying data collection (e.g., public salary comparisons across many municipalities).
    • Getting “good enough” qualitative overviews, structure, and first drafts to beat blank-page or analysis paralysis.
    • Using it as a “hazmat suit” for today’s SEO-poisoned web: it suffers from the same sources, but at least handles the clicking and skimming.

Accuracy, trust, and verification burden

  • Central tension: if you must verify every fact and number, does it really save time over doing your own research?
  • Users emphasize that LLMs:
    • Hallucinate, misquote, and even misread specific tables/PDFs.
    • Present partial or 60%-correct output as if it were 100% reliable.
  • For tabular or quantitative work, several commenters say they wouldn’t trust it at all; qualitative synthesis is seen as safer.

Comparison to humans, search, and “interns”

  • Supporters argue it’s still an upgrade over ad-driven, SEO-gamed web search and low-quality social media.
  • Many compare Deep Research to an unreliable intern: useful if you already know the domain and can critically review everything; dangerous if you don’t.
  • Debate over whether LLM “lies” are comparable to human error:
    • One side: humans misremember but don’t routinely invent entities the way LLMs do.
    • Other side: functionally, both produce wrong answers that must be checked.

Workflows, multi-LLM strategies, and domain scoping

  • Several people describe elaborate multi-model workflows: run the same query across multiple LLMs, discard the 60–75% of output judged “slop,” then have another model synthesize the remainder.
  • Others rely on tools that operate only over curated, user-provided sources to avoid SEO-driven junk.
  • Suggested mitigations: domain-specific profiles curated by experts; stronger source control; visible context and inline citations; explicit uncertainty ratings.
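The fan-out/filter/synthesize workflow described above can be sketched as a small pipeline. This is a hypothetical illustration, not any commenter's actual setup: the model functions and the `naive_slop_check` heuristic are stand-ins (in practice each model call would wrap a different provider's API, and the slop filter would be a human or model-based review).

```python
# Sketch of a multi-LLM "fan out, filter, synthesize" pipeline.
# All model functions below are illustrative stubs, not real APIs.

def fan_out(query, models):
    """Run the same query against every model and collect the answers."""
    return [model(query) for model in models]

def filter_slop(answers, is_slop):
    """Discard answers the caller's quality heuristic flags as slop."""
    return [a for a in answers if not is_slop(a)]

def synthesize(answers, synthesizer):
    """Hand the surviving answers to one model for a final merged draft."""
    return synthesizer(answers)

# --- Illustrative stubs (assumptions for the sketch) ---
def model_a(q): return "claim with citation [1]"
def model_b(q): return "unsupported claim"
def model_c(q): return "second claim with citation [2]"

def naive_slop_check(answer):
    # Toy heuristic: treat any uncited answer as slop.
    return "[" not in answer

def naive_synthesizer(answers):
    return " / ".join(answers)

kept = filter_slop(fan_out("q", [model_a, model_b, model_c]), naive_slop_check)
result = synthesize(kept, naive_synthesizer)
```

The design mirrors the commenters' point: the pipeline cannot make any single model trustworthy; it only raises the odds that obvious junk is discarded before synthesis, and the final output still needs human verification.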

Marketing, terminology, and future trajectory

  • Strong criticism of the “deep research” branding, especially coming from an organization that positions itself as doing “research.”
  • Many accept that current systems are “intern level”: impressive but not trustworthy for high-stakes research, especially in academia or medicine.
  • Disagreement on future: some expect dramatic improvement akin to coding assistants; others argue structural limits (source quality, incentives, bias, SEO gaming) mean error-free “research” is unlikely.