Introducing deep research
Competitive positioning & “copying” debate
- Many see Deep Research as OpenAI’s response to DeepSeek and to Google’s Gemini “Deep Research,” with some arguing that the identical name and the timing are meant to muddy search results and the competitive narrative.
- Others stress that it’s closer to Google’s product (a long-running agent that searches, calls tools, and synthesizes a report) than to open-weight model releases like DeepSeek or Llama.
- Some commenters claim this is just what Perplexity, You.com, Kagi’s lenses, or a simple “Bing + LLM” agent already do; others argue that the non-trivial part is reliability at scale, not the search-and-synthesize loop itself.
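The “loop itself” that commenters call trivial is indeed easy to sketch. Below is a minimal, hypothetical version in Python; `search_web`, `llm`, `propose_followups`, and the stopping rule are all stand-in assumptions, not OpenAI’s implementation:

```python
# Minimal research-agent loop: search, extract notes, decide whether to
# keep digging, then synthesize a report. All components are stubs.

def search_web(query: str) -> list[str]:
    """Stub for a search API; a real agent would call Bing/Google here."""
    return [f"snippet about {query}"]

def llm(prompt: str) -> str:
    """Stub for a language-model call."""
    return f"summary of: {prompt[:60]}"

def propose_followups(notes: list[str]) -> list[str]:
    """Stub: a real agent would ask the LLM for follow-up queries."""
    return []  # no follow-ups, so the loop terminates after one round

def deep_research(question: str, max_rounds: int = 5) -> str:
    notes: list[str] = []
    queries = [question]
    for _ in range(max_rounds):
        if not queries:
            break
        next_queries: list[str] = []
        for q in queries:
            for snippet in search_web(q):
                notes.append(llm(f"Extract facts for {question!r}: {snippet}"))
            next_queries.extend(propose_followups(notes))
        queries = next_queries
    # Final synthesis pass over the accumulated notes.
    return llm(f"Write a report answering {question!r} from notes: {notes}")

report = deep_research("Who invented the transistor?")
```

The skeptics’ point survives the sketch: the loop is the easy part, while the stubs here (query generation, judging source authority, knowing when to stop) are exactly where reliability at scale lives.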
IP, fair use, and “stealing from thieves”
- One line of argument: OpenAI scraped copyrighted web content, so they have no moral high ground if their own outputs or APIs are mined by competitors.
- Counter-argument: web scraping for training may be protected by fair use, whereas violating OpenAI’s terms to train DeepSeek is framed as a contract and trade-secret issue.
- There’s disagreement over whether ToS violations are “illegal” in any meaningful sense, and whether machine-generated outputs can be “intellectual property” at all.
Models, benchmarks, and technical questions
- Deep Research is described as powered by a specialized, as-yet-unreleased o3 variant optimized for browsing and data analysis; only o3‑mini is publicly available.
- Benchmark results (e.g., ~26.6% on Humanity’s Last Exam, ~72% on GAIA) impress some, but others note that a ~20% pass rate on internal “expert” tasks still means “mostly wrong,” with failure examples ranging from deep category theory to tricky fact-chains.
- There is debate over how much of the gains come from better reasoning versus simple access to tools and the web; some speculate about multi-model orchestration, while others say current frontends show little evidence of it.
Accuracy, hallucinations, and verification burden
- OpenAI’s own limitations section (hallucinations, poor confidence calibration, difficulty judging source authority) is repeatedly cited as a core problem.
- Critics argue that for any task where correctness matters, you must redo enough of the verification that the time savings may evaporate; they view such tools as “slop generators” for slide decks and corporate box-ticking.
- Supporters respond that:
- These tasks are genuinely hard (often beyond typical human expertise).
- Doing a day’s research in 30 minutes, even if you spend another hour verifying, can be a net win.
- Many real-world uses tolerate some error or already involve imperfect human research.
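The supporters’ time-savings claim can be made concrete with back-of-envelope arithmetic; the eight-hour baseline, 30-minute run, and verification figures below are illustrative assumptions, not numbers from the thread:

```python
# Hypothetical time budget: does "agent run + verification" beat manual work?
manual_hours = 8.0   # assumed: a full day of human research
agent_hours = 0.5    # assumed: a ~30-minute Deep Research run
verify_hours = 1.0   # assumed: an hour spot-checking the report

assisted_hours = agent_hours + verify_hours
savings = manual_hours - assisted_hours
print(f"assisted: {assisted_hours}h, saved: {savings}h")  # assisted: 1.5h, saved: 6.5h

# The critics' scenario: verification grows until it swallows the savings.
heavy_verify_hours = 7.0  # assumed: near-complete re-verification
print(manual_hours - (agent_hours + heavy_verify_hours))  # 0.5
```

The disagreement between the two camps reduces to which verification figure is realistic for a given task.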
Use cases, ethics, and impact on the web
- Suggested uses: technical and legal research, academic surveys, sports analytics, industry and product analysis, and enterprise “deep search” over private corpora.
- Concern that these tools “exploit” open-knowledge creators and CC BY‑NC content without compensation; defenders note humans already do this via search engines.
- Worries that web content will be increasingly polluted by AI-generated text, making future research and RAG less trustworthy; some foresee an arms race over crawler blocking, paywalls, and bot evasion.
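The enterprise “deep search over private corpora” use case above is essentially retrieval-augmented generation: retrieve candidate documents, then hand them to a model for synthesis. A toy sketch of the retrieval half, using keyword overlap as the scoring function (the corpus, scoring method, and names are all illustrative assumptions):

```python
# Toy "deep search over a private corpus": score documents by keyword
# overlap with the query, return the top-k for downstream synthesis.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    q = tokenize(query)
    # Rank document IDs by descending overlap with the query's tokens.
    ranked = sorted(corpus, key=lambda doc_id: -len(q & tokenize(corpus[doc_id])))
    return ranked[:k]

corpus = {
    "memo-1": "Q3 revenue grew on enterprise contracts",
    "memo-2": "hiring plan for the research team",
    "memo-3": "enterprise churn fell after the pricing change",
}
hits = retrieve("enterprise revenue trends", corpus)  # ["memo-1", "memo-3"]
```

A production system would swap the overlap score for embeddings or BM25, but the shape is the same, which is why the pollution concern above matters: if the corpus (or the open web feeding it) fills with AI-generated text, every stage downstream of `retrieve` inherits the noise.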
Access, pricing, and user impressions
- Many Pro subscribers initially reported having no access despite the announcement, fueling claims of a rushed, PR-driven launch and “existential crisis” narratives; others dismiss this as overblown.
- Pricing (initially limited to the $200/month Pro tier) is widely criticized, especially compared with much cheaper DeepSeek APIs and Gemini’s inclusion of its deep-research feature on lower-cost plans.
- Early hands-on reports note notably strong synthesis and breadth, but also non-trivial factual mistakes even in modest tasks such as biographies or industry overviews, reinforcing the “powerful but untrustworthy without checking” consensus.