GPT-5 Thinking in ChatGPT (a.k.a. Research Goblin) is good at search
Capabilities of GPT‑5 “Thinking” for Search
- Many commenters find GPT‑5 Thinking + web search markedly better than earlier ChatGPT search: it runs multiple queries, evaluates sources, keeps going when results look weak, and often surfaces niche documents (e.g., product datasheets, planning applications, obscure trivia).
- Seen as ideal for “mild curiosities” and multi-step lookups users wouldn’t manually research, and for stitching together scattered information (e.g., podcast revenue, floor plans, car troubleshooting, book influences).
- Several say it is more useful than OpenAI’s own Deep Research for many tasks, and competitive with or better than Gemini’s Deep Research on quality, though slower.
Comparisons with Traditional Search & Other LLMs
- Some experiments comparing GPT‑5 Thinking vs. Google (often with udm=14, which forces Google’s classic “Web”-only results view; see the sketch after this list) show:
  - Simple, factoid-like tasks are faster and perfectly adequate with manual Google + Wikipedia or Google Lens.
  - For harder, multi-hop, or messy queries, GPT‑5 can reduce user effort by aggregating and cross-referencing sources.
- Persistent concerns that LLMs often just summarize the top‑N search results and repeat marketing copy or forum speculation; answer quality is strongly tied to web SEO.
- Mixed views on competitors: Gemini Deep Research praised for car/technical work but criticized for boilerplate “consultant report” style and hallucinations; Kagi Assistant liked for filters and transparent citations; some miss “heavy” non-search models with richer internal knowledge.
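
For context on the comparisons above: udm=14 is a real Google query parameter that switches results to the plain “Web” tab, skipping AI Overviews and other blended modules. A minimal sketch of building such a URL (the helper name is my own):

```python
from urllib.parse import urlencode

def classic_web_search_url(query: str) -> str:
    # udm=14 requests Google's plain "Web" results tab,
    # without AI Overviews or other blended result modules.
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})

# The kind of manual baseline commenters pitted GPT-5 Thinking against:
print(classic_web_search_url("GPT-5 Thinking web search"))
```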
Reliability, Hallucinations, and Limits
- Multiple reports of subtle errors: shallow Wikipedia-like answers, missed primary sources in historical topics, wrong or fabricated details despite authoritative sources being online.
- OCR and image understanding: GPT‑5 often hallucinates text or manufacturer names when reading invoices; Gemini 2.5 is said to be much stronger on images and OCR.
- Users emphasize verifying links, pushing models to compare/contrast evidence, and arguing back to expose weaknesses; some note models will agree with almost any asserted “truth” if steered.
Pedagogy, Cheating, and Skills
- Educators worry about student reliance on such tools; suggestions include:
- Socratic questioning to force students to explain and critique AI‑derived answers.
- Assignments that require showing reasoning, not just polished output.
- Some fear research skills and patience for “manual grind” will atrophy; others argue AI lets them be more ambitious and curious overall.
Meta: Article, Hype, and HN Dynamics
- Reactions to the article itself are split:
- Supporters appreciate everyday, “non‑heroic” examples and the “Research Goblin” framing as honest, evolutionary progress.
- Critics see it as overlong, anecdotal, and breathless for something many already do with LLMs; some complain about reposts and personality-driven upvoting.
- Broader unease about energy/token costs of “unreasonable” deep searches and about calling these features “research” rather than assisted evidence gathering.