GPT-5 Thinking in ChatGPT (a.k.a. Research Goblin) is good at search
Capabilities of GPT‑5 “Thinking” for Search
- Many commenters find GPT‑5 Thinking + web search markedly better than earlier ChatGPT search: it runs multiple queries, evaluates sources, keeps going when results look weak, and often surfaces niche documents (e.g., product datasheets, planning applications, obscure trivia).
- Seen as ideal for “mild curiosities” and multi-step lookups users wouldn’t manually research, and for stitching together scattered information (e.g., podcast revenue, floor plans, car troubleshooting, book influences).
- Several say it is more useful than OpenAI’s own Deep Research for many tasks, and competitive with or better than Gemini’s Deep Research on quality, though slower.
Comparisons with Traditional Search & Other LLMs
- Some experiments comparing GPT‑5 Thinking vs. Google (often with udm=14, which forces Google’s classic “Web”-only results view; see the sketch after this list) show:
  - Simple, factoid-like tasks are faster and perfectly adequate with manual Google + Wikipedia or Google Lens.
  - For harder, multi-hop, or messy queries, GPT‑5 can reduce user effort by aggregating and cross-referencing sources.
- Persistent concerns that LLMs often just summarize the top‑N search results and repeat marketing copy or forum speculation; answer quality is strongly tied to web SEO.
- Mixed views on competitors: Gemini Deep Research praised for car/technical work but criticized for boilerplate “consultant report” style and hallucinations; Kagi Assistant liked for filters and transparent citations; some miss “heavy” non-search models with richer internal knowledge.
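
For context on the comparisons above: udm=14 is a real Google query parameter that switches results to the plain “Web” tab, skipping AI Overviews and other blended modules. A minimal sketch of building such a URL (the helper name is my own):

```python
from urllib.parse import urlencode

def classic_web_search_url(query: str) -> str:
    # udm=14 requests Google's plain "Web" results tab,
    # without AI Overviews or other blended result modules.
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})

# The kind of manual baseline commenters pitted GPT-5 Thinking against:
print(classic_web_search_url("GPT-5 Thinking web search"))
```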
Reliability, Hallucinations, and Limits
- Multiple reports of subtle errors: shallow Wikipedia-like answers, missed primary sources in historical topics, wrong or fabricated details despite authoritative sources being online.
- OCR and image understanding: GPT‑5 often hallucinates text or manufacturer names when reading invoices; Gemini 2.5 is said to be much stronger on images and OCR.
- Users emphasize verifying links, pushing models to compare/contrast evidence, and arguing back to expose weaknesses; some note models will agree with almost any asserted “truth” if steered.
Pedagogy, Cheating, and Skills
- Educators worry about student reliance on such tools; suggestions include:
- Socratic questioning to force students to explain and critique AI‑derived answers.
- Assignments that require showing reasoning, not just polished output.
- Some fear research skills and patience for “manual grind” will atrophy; others argue AI lets them be more ambitious and curious overall.
Meta: Article, Hype, and HN Dynamics
- Reactions to the article itself are split:
- Supporters appreciate everyday, “non‑heroic” examples and the “Research Goblin” framing as honest, evolutionary progress.
- Critics see it as overlong, anecdotal, and breathless for something many already do with LLMs; some complain about reposts and personality-driven upvoting.
- Broader unease about energy/token costs of “unreasonable” deep searches and about calling these features “research” rather than assisted evidence gathering.