Introducing deep research

Competitive positioning & “copying” debate

  • Many see Deep Research as OpenAI’s response to DeepSeek and Google’s Gemini “Deep Research,” with some arguing the name and timing are intended to muddy search results and the competitive narrative.
  • Others stress it’s closer to Google’s product (long-running agent that searches, calls tools, and synthesizes a report) than to “open-weight” models like DeepSeek or Llama.
  • Some commenters claim this is just what Perplexity, You.com, Kagi lenses, or simple “Bing + LLM” agents already do; others argue the non-trivial part is reliability at scale, not the loop itself.

IP, fair use, and “stealing from thieves”

  • One line of argument: OpenAI scraped copyrighted web content, so they have no moral high ground if their own outputs or APIs are mined by competitors.
  • Counter-argument: web scraping for training may be protected by fair use, whereas violating OpenAI’s terms to train DeepSeek is framed as a contract and trade-secret issue.
  • There’s disagreement over whether ToS violations are “illegal,” and whether non-human-generated outputs can be “intellectual property” at all.

Models, benchmarks, and technical questions

  • Deep Research is described as powered by a specialized upcoming o3 variant, optimized for browsing and data analysis; only o3‑mini is publicly available.
  • Benchmarks (e.g., ~26.6% on Humanity’s Last Exam, ~72% on GAIA) impress some, but others note that a ~20% pass rate on internal “expert” tasks amounts to “mostly wrong,” with cited failures ranging from deep category theory to tricky fact-chains.
  • Debate over how much of the gains come from better reasoning versus simple access to tools and the web; some speculate about multi-model orchestration, while others say current frontends show little evidence of it.

Accuracy, hallucinations, and verification burden

  • OpenAI’s own limitations section (hallucinations, poor confidence calibration, difficulty judging authority) is repeatedly cited as a core problem.
  • Critics argue that for any task where correctness matters, you must redo enough verification that the time savings may evaporate; they dismiss the tools as “slop generators” for slide decks and corporate box-ticking.
  • Supporters respond that:
    • These tasks are genuinely hard (often beyond typical human expertise).
    • Doing a day’s research in 30 minutes, even if you spend another hour verifying, can be a net win.
    • Many real-world uses tolerate some error or already involve imperfect human research.

Use cases, ethics, and impact on the web

  • Suggested uses: technical and legal research, academic surveys, sports analytics, industry and product analysis, and enterprise “deep search” over private corpora.
  • Concern that these tools “exploit” open-knowledge creators and CC BY‑NC content without compensation; defenders note humans already do this via search engines.
  • Worries that web content will be increasingly polluted by AI-generated text, making future research and RAG less trustworthy; some foresee an arms race over crawler blocking, paywalls, and bot evasion.

Access, pricing, and user impressions

  • Many Pro subscribers initially reported no access despite the announcement, fueling claims of a rushed, PR-driven launch and “existential crisis” narratives; others dismiss this as overblown.
  • Pricing ($200/month tier first) is widely criticized, especially compared with much cheaper DeepSeek APIs and Gemini’s inclusion of deep research on lower-cost plans.
  • Early hands-on reports: notably strong synthesis and breadth, but non-trivial factual mistakes even in modest biographies or industry overviews, reinforcing the “powerful but untrustworthy without checking” consensus.