The Deep Research problem
Perceived usefulness & concrete use cases
- Some users find Deep Research and similar tools nearly worthless in domains they know well (e.g., game dev, B2B sales modeling), calling results shallow, wrong, or spammy.
- Others report strong practical value when:
  - Doing broad, tedious data collection (e.g., public salary comparisons across many municipalities).
  - Getting “good enough” qualitative overviews, structure, and first drafts to beat blank-page or analysis paralysis.
  - Using it as a “hazmat suit” for today’s SEO-poisoned web: it pulls from the same polluted sources, but at least it handles the clicking and skimming.
Accuracy, trust, and verification burden
- Central tension: if you must verify every fact and number, does it really save time over doing your own research?
- Users emphasize that LLMs:
  - Hallucinate, misquote, and even misread specific tables/PDFs.
  - Present partial or 60%-correct output as if it were 100% reliable.
- For tabular or quantitative work, several commenters say they wouldn’t trust it at all; qualitative synthesis is seen as safer.
Comparison to humans, search, and “interns”
- Supporters argue it’s still an upgrade over ad-driven, SEO-gamed web search and low-quality social media.
- Many compare Deep Research to an unreliable intern: useful if you already know the domain and can critically review everything; dangerous if you don’t.
- Debate over whether LLM “lies” are comparable to human error:
  - One side: humans misremember, but they don’t routinely invent entities the way LLMs do.
  - Other side: functionally, both produce wrong answers that must be checked.
Workflows, multi‑LLM strategies, and domain scoping
- Several people describe elaborate multi-model workflows: run the same query across multiple LLMs, discard the 60–75% judged to be “slop,” then have another model synthesize the remainder.
- Others rely on tools that operate only over curated, user-provided sources to avoid SEO-driven junk.
- Suggested mitigations: domain-specific profiles curated by experts; stronger source control; visible context and inline citations; explicit uncertainty ratings.
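The multi-model workflow described above (fan out, filter slop, synthesize) can be sketched as a small pipeline. This is a minimal illustration, not any real API: the model callables, the slop heuristic, and the synthesizer here are all hypothetical stand-ins.

```python
from typing import Callable

def run_across_models(query: str,
                      models: list[Callable[[str], str]],
                      is_slop: Callable[[str], bool],
                      synthesize: Callable[[list[str]], str]) -> str:
    """Fan the same query out to several models, drop low-quality
    answers, and let one synthesizer model merge what remains."""
    answers = [model(query) for model in models]     # same query, every model
    kept = [a for a in answers if not is_slop(a)]    # discard the "slop"
    return synthesize(kept)                          # merge the survivors

# Toy stand-ins so the sketch runs without any network calls.
model_a = lambda q: "ANSWER: detailed response with sources"
model_b = lambda q: "slop"
model_c = lambda q: "ANSWER: second detailed response"

result = run_across_models(
    "compare municipal salaries",
    [model_a, model_b, model_c],
    is_slop=lambda a: len(a) < 20,                   # crude length heuristic
    synthesize=lambda kept: " | ".join(kept),
)
print(result)
```

In practice the slop filter and the synthesis step would themselves be model calls (or a human review pass), which is exactly why commenters note the 60–75% discard rate dominates the cost of this approach.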
Marketing, terminology, and future trajectory
- Strong criticism of the “deep research” branding, given that it comes from an organization that positions itself as doing “research.”
- Many accept that current systems are “intern level”: impressive but not trustworthy for high-stakes research, especially in academia or medicine.
- Disagreement on future: some expect dramatic improvement akin to coding assistants; others argue structural limits (source quality, incentives, bias, SEO gaming) mean error-free “research” is unlikely.