GPT-5 Thinking in ChatGPT (a.k.a. Research Goblin) is good at search

Capabilities of GPT‑5 “Thinking” for Search

  • Many commenters find GPT‑5 Thinking + web search markedly better than earlier ChatGPT search: it runs multiple queries, evaluates sources, keeps digging when results look weak, and often surfaces niche documents (e.g., product datasheets, planning applications, obscure trivia).
  • Seen as ideal for “mild curiosities” and multi-step lookups users wouldn’t manually research, and for stitching together scattered information (e.g., podcast revenue, floor plans, car troubleshooting, book influences).
  • Several say it’s more useful than OpenAI’s own Deep Research for many tasks, and competitive with or better than Gemini’s Deep Research in quality, though slower.

Comparisons with Traditional Search & Other LLMs

  • Some experiments comparing GPT‑5 Thinking against manual Google searching (often with the udm=14 parameter, which restricts Google to its plain “Web” results) show:
    • Simple, factoid-like tasks are faster and perfectly adequate with manual Google + Wikipedia or Google Lens.
    • For harder, multi-hop or messy queries, GPT‑5 can reduce user effort by aggregating and cross-referencing.
  • Concerns persist that LLMs often summarize the top‑N search results and repeat marketing copy or forum speculation; answer quality remains strongly tied to web SEO.
  • Mixed views on competitors: Gemini Deep Research praised for car/technical work but criticized for boilerplate “consultant report” style and hallucinations; Kagi Assistant liked for filters and transparent citations; some miss “heavy” non-search models with richer internal knowledge.
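The udm=14 trick mentioned above refers to a Google query-string parameter that switches results to the plain “Web” tab, stripping AI overviews and rich widgets. A minimal sketch of building such a URL (the helper name is illustrative, not from any library):

```python
from urllib.parse import urlencode

def google_web_url(query: str) -> str:
    """Build a Google search URL with udm=14, which restricts
    results to the plain "Web" tab of Google search."""
    params = {"q": query, "udm": "14"}
    return "https://www.google.com/search?" + urlencode(params)

url = google_web_url("GPT-5 Thinking web search")
print(url)
```

This is what commenters mean by comparing the model against “Google with udm=14”: a baseline of classic blue-link results rather than Google’s own AI-augmented page.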

Reliability, Hallucinations, and Limits

  • Multiple reports of subtle errors: shallow Wikipedia-like answers, missed primary sources in historical topics, wrong or fabricated details despite authoritative sources being online.
  • OCR and image understanding: GPT‑5 often hallucinates text/manufacturers in invoices; Gemini 2.5 is said to be much stronger on images and OCR.
  • Users emphasize verifying links, pushing models to compare/contrast evidence, and arguing back to expose weaknesses; some note models will agree with almost any asserted “truth” if steered.

Pedagogy, Cheating, and Skills

  • Educators worry about student reliance on such tools; suggestions include:
    • Socratic questioning to force students to explain and critique AI‑derived answers.
    • Assignments that require showing reasoning, not just polished output.
  • Some fear research skills and patience for “manual grind” will atrophy; others argue AI lets them be more ambitious and curious overall.

Meta: Article, Hype, and HN Dynamics

  • Reactions to the article itself are split:
    • Supporters appreciate everyday, “non‑heroic” examples and the “Research Goblin” framing as honest, evolutionary progress.
    • Critics see it as overlong, anecdotal, and breathless for something many already do with LLMs; some complain about reposts and personality-driven upvoting.
  • Broader unease about energy/token costs of “unreasonable” deep searches and about calling these features “research” rather than assisted evidence gathering.