AI-assisted search-based research works now
Recent improvements in AI-assisted research
- Several commenters report a clear step up from newer models (o3/o4-mini, Gemini 2.5 Pro) for:
  - Multi-step, search-backed “deep research”
  - Reasoning over long contexts (e.g., understanding/upgrading large codebases)
  - Automatically adapting to breaking API/package changes by reading docs and release notes
- Deep research agents can now reliably:
  - Run many queries, vary keywords, and aggregate sources better than most humans would
  - Find obscure local news and archival material that users struggle to surface via manual search
  - Perform investigative-style tasks (e.g., geolocating photos, proposing new lines of inquiry)
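The query-variation-and-aggregation loop described above can be sketched in a few lines. The `stub_search` backend and all names here are illustrative stand-ins, not any vendor's API; a real agent would call an actual search service and add an LLM step to propose new query phrasings:

```python
from collections import Counter

def stub_search(query):
    # Stand-in for a real search API; returns (url, snippet) pairs.
    corpus = {
        "local archive 1998 fire": [("news.example/a", "archival report")],
        "1998 fire archive newspaper": [("news.example/a", "archival report"),
                                        ("paper.example/b", "follow-up story")],
    }
    return corpus.get(query, [])

def research(query_variants, search=stub_search):
    """Run many query phrasings, aggregate sources, rank by recurrence."""
    hits = Counter()
    snippets = {}
    for q in query_variants:
        for url, snippet in search(q):
            hits[url] += 1          # sources found by several phrasings rank higher
            snippets[url] = snippet
    return [(url, snippets[url]) for url, _ in hits.most_common()]

results = research(["local archive 1998 fire", "1998 fire archive newspaper"])
# news.example/a recurs across both phrasings, so it ranks first
```

The ranking-by-recurrence heuristic is one plausible aggregation strategy; production agents layer deduplication, source-quality scoring, and LLM summarization on top of a loop like this.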
Limits, failure modes, and verification
- Strong criticism centers on:
  - “Tunnel vision” and overconfidence: agents often fail to adjust goals when new constraints appear (e.g., insurance coverage, availability).
  - Inability to handle precise counting/aggregation tasks (the NFL roster example), even when a human could script them in an hour.
  - Weak performance on niche product searches and local service comparisons.
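For contrast, the exact-counting task that trips agents up is a short deterministic script. This toy sketch uses invented roster data; a real version would download and parse each team's roster page:

```python
import csv
import io
from collections import Counter

# Toy stand-in for scraped roster data (teams/players are placeholders).
ROSTER_CSV = """team,player,position
A,P1,QB
A,P2,WR
B,P3,QB
B,P4,QB
"""

def count_by(rows, field):
    """Exact aggregation: the deterministic count an LLM tends to fumble."""
    return Counter(row[field] for row in rows)

rows = list(csv.DictReader(io.StringIO(ROSTER_CSV)))
print(count_by(rows, "position"))  # Counter({'QB': 3, 'WR': 1})
```

The point of the commenters' complaint is exactly this gap: the script's answer is exhaustive and reproducible, while an agent sampling web pages gives a plausible-sounding estimate.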
- Many argue LLMs are “narrative tools” whose value depends on:
  - Domain experts verifying outputs via testing, replication, or cross-checking
  - Good prompts plus strong habits in manual/automated testing and code review
- Concern about “skill drift”: experts relying on lossy summaries instead of primary sources.
Data, paywalls, and business models
- High-value research data in many fields remains behind paywalls (journals, professional archives, industry databases).
- Expectation that a major business model will be selling LLM access on top of these archives.
- arXiv is useful but limited to a few disciplines; many foundational works remain paywalled.
- A distinction was drawn between:
  - Deep Research: iterated search plus tool use over the unstructured web
  - Deep Analytics: database-style pipelines for exact counts and exhaustive queries
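A minimal sketch of the Deep Analytics side, assuming records have already been extracted into structured storage (the schema and rows here are invented for illustration):

```python
import sqlite3

# "Deep Analytics": load structured records once, then answer
# exhaustive-count questions exactly rather than by sampling the web.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, venue TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO papers (venue, year) VALUES (?, ?)",
    [("NeurIPS", 2023), ("NeurIPS", 2024), ("ICML", 2024)],
)
(count,) = conn.execute(
    "SELECT COUNT(*) FROM papers WHERE year = 2024"
).fetchone()
print(count)  # 2: an exact, exhaustive answer over everything ingested
```

The hard part in practice is the extraction step that fills the table, which is where iterated Deep Research and exact Deep Analytics would have to meet.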
Tools and integration
- Notable tools mentioned: Kagi Assistant, Gemini Deep Research, OpenAI’s Deep Research, Perplexity, you.com, Grok, GPT Researcher, custom agentic workflows.
- Debate over specialized formats/protocols (MCP, context7-style “devdocs for LLMs”) vs just writing good human-readable docs.
- Some have built multi-model, domain-specific research agents that outperform generic systems in their niche.
Search, web economics, and trust
- Many report Google usage dropping in favor of LLM search; Google’s AI Overviews are widely criticized as unreliable summaries of SEO spam.
- Concern that ad-driven AI answers will become inseparable from genuine information.
- Mixed views on trust:
  - For programming and some diagnostics, users find LLMs already as good as or better than generic professionals.
  - For health, law, and history, commenters warn they are “fancy snake oil” without strong human verification.
- Ethical worries about AI’s main “real-world” deployments (warfare, surveillance) versus more benign productivity uses.