Introducing deep research

Competitive positioning & “copying” debate

  • Many see Deep Research as OpenAI’s response to DeepSeek and Google’s Gemini “Deep Research,” with some arguing the name and timing are intended to muddy search results and the competitive narrative.
  • Others stress it’s closer to Google’s product (long-running agent that searches, calls tools, and synthesizes a report) than to “open-weight” models like DeepSeek or Llama.
  • Some commenters claim this is just what Perplexity, You.com, Kagi lenses, or simple “Bing + LLM” agents already do; others argue the non-trivial part is reliability at scale, not the loop itself.

IP, fair use, and “stealing from thieves”

  • One line of argument: OpenAI scraped copyrighted web content, so they have no moral high ground if their own outputs or APIs are mined by competitors.
  • Counter-argument: web scraping for training may be protected by fair use, whereas violating OpenAI’s terms to train DeepSeek is framed as a contract and trade-secret issue.
  • There’s disagreement over whether ToS violations are “illegal,” and whether non-human-generated outputs can be “intellectual property” at all.

Models, benchmarks, and technical questions

  • Deep Research is described as powered by a specialized upcoming o3 variant, optimized for browsing and data analysis; only o3‑mini is publicly available.
  • Benchmarks (e.g., ~26.6% on Humanity’s Last Exam, ~72% on GAIA) impress some, but others note that a ~20% pass rate on internal “expert” tasks amounts to “mostly wrong,” with cited failures ranging from deep category theory to tricky fact-chains.
  • Debate over how much of the gains come from better reasoning versus simple access to tools and the web; some speculate about multi-model orchestration, while others say current frontends show little evidence of it.

Accuracy, hallucinations, and verification burden

  • OpenAI’s own limitations section (hallucinations, poor confidence calibration, difficulty judging authority) is repeatedly cited as a core problem.
  • Critics argue that for any task where correctness matters, you must redo enough verification that the time savings may evaporate; they dismiss the tools as “slop generators” for slide decks and corporate box-ticking.
  • Supporters respond that:
    • These tasks are genuinely hard (often beyond typical human expertise).
    • Doing a day’s research in 30 minutes, even if you spend another hour verifying, can be a net win.
    • Many real-world uses tolerate some error or already involve imperfect human research.

Use cases, ethics, and impact on the web

  • Suggested uses: technical and legal research, academic surveys, sports analytics, industry and product analysis, and enterprise “deep search” over private corpora.
  • Concern that these tools “exploit” open-knowledge creators and CC BY‑NC content without compensation; defenders note humans already do this via search engines.
  • Worries that web content will be increasingly polluted by AI-generated text, making future research and RAG less trustworthy; some foresee an arms race over crawler blocking, paywalls, and bot evasion.

Access, pricing, and user impressions

  • Many Pro subscribers initially reported no access despite the announcement, fueling claims of a rushed, PR-driven launch and “existential crisis” narratives; others dismiss this as overblown.
  • Pricing ($200/month tier first) is widely criticized, especially compared with much cheaper DeepSeek APIs and Gemini’s inclusion of deep research on lower-cost plans.
  • Early hands-on reports: notably strong synthesis and breadth, but non-trivial factual mistakes even in modest biographies or industry overviews, reinforcing the “powerful but untrustworthy without checking” consensus.