The Deep Research problem

Perceived usefulness & concrete use cases

  • Some users find Deep Research and similar tools nearly worthless in domains they know well (e.g., game dev, B2B sales modeling), calling results shallow, wrong, or spammy.
  • Others report strong practical value when:
    • Doing broad, annoying data collection (e.g., public salary comparisons across many municipalities).
    • Getting “good enough” qualitative overviews, structure, and first drafts to beat blank-page or analysis paralysis.
    • Using it as a “hazmat suit” for today’s SEO-poisoned web: it suffers from the same sources, but at least handles the clicking and skimming.

Accuracy, trust, and verification burden

  • Central tension: if you must verify every fact and number, does it really save time over doing your own research?
  • Users emphasize that LLMs:
    • Hallucinate, misquote, and even misread specific tables/PDFs.
    • Present partial or 60%-correct output as if it were 100% reliable.
  • For tabular or quantitative work, several commenters say they wouldn’t trust it at all; qualitative synthesis is seen as safer.

Comparison to humans, search, and “interns”

  • Supporters argue it’s still an upgrade over ad-driven, SEO-gamed web search and low-quality social media.
  • Many compare Deep Research to an unreliable intern: useful if you already know the domain and can critically review everything; dangerous if you don’t.
  • Debate over whether LLM “lies” are comparable to human error:
    • One side: humans misremember but don’t routinely invent entities the way LLMs do.
    • Other side: functionally, both produce wrong answers that must be checked.

Workflows, multi-LLM strategies, and domain scoping

  • Several people describe elaborate multi-model workflows: run the same query across multiple LLMs, discard the 60–75% of output judged “slop,” then have another model synthesize the remainder.
  • Others rely on tools that operate only over curated, user-provided sources to avoid SEO-driven junk.
  • Suggested mitigations: domain-specific profiles curated by experts; stronger source control; visible context and inline citations; explicit uncertainty ratings.
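The fan-out/filter/synthesize workflow described above can be sketched as a small pipeline. This is a hypothetical illustration, not any commenter's actual setup: the model functions and the `naive_slop_check` heuristic are stand-ins (in practice each model call would wrap a different provider's API, and the slop filter would be a human or model-based review).

```python
# Sketch of a multi-LLM "fan out, filter, synthesize" pipeline.
# All model functions below are illustrative stubs, not real APIs.

def fan_out(query, models):
    """Run the same query against every model and collect the answers."""
    return [model(query) for model in models]

def filter_slop(answers, is_slop):
    """Discard answers the caller's quality heuristic flags as slop."""
    return [a for a in answers if not is_slop(a)]

def synthesize(answers, synthesizer):
    """Hand the surviving answers to one model for a final merged draft."""
    return synthesizer(answers)

# --- Illustrative stubs (assumptions for the sketch) ---
def model_a(q): return "claim with citation [1]"
def model_b(q): return "unsupported claim"
def model_c(q): return "second claim with citation [2]"

def naive_slop_check(answer):
    # Toy heuristic: treat any uncited answer as slop.
    return "[" not in answer

def naive_synthesizer(answers):
    return " / ".join(answers)

kept = filter_slop(fan_out("q", [model_a, model_b, model_c]), naive_slop_check)
result = synthesize(kept, naive_synthesizer)
```

The design mirrors the commenters' point: the pipeline cannot make any single model trustworthy; it only raises the odds that obvious junk is discarded before synthesis, and the final output still needs human verification.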

Marketing, terminology, and future trajectory

  • Strong criticism of the “deep research” branding, especially coming from an organization that positions itself as doing “research.”
  • Many accept that current systems are “intern level”: impressive but not trustworthy for high-stakes research, especially in academia or medicine.
  • Disagreement on future: some expect dramatic improvement akin to coding assistants; others argue structural limits (source quality, incentives, bias, SEO gaming) mean error-free “research” is unlikely.