AI-assisted search-based research works now

Recent improvements in AI-assisted research

  • Several commenters report a clear step up from newer models (o3/o4‑mini, Gemini 2.5 Pro) for:
    • Multi-step, search-backed “deep research”
    • Reasoning over long contexts (e.g., understanding/upgrading large codebases)
    • Automatically adapting to API/package breaking changes by reading docs and release notes
  • Deep research agents can now reliably:
    • Run many queries, vary keywords, and aggregate sources better than most humans would
    • Find obscure local news / archival material that users struggle to surface via manual search
    • Perform investigative-style tasks (e.g., geolocating photos, proposing new lines of inquiry)
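The iterate-and-aggregate pattern behind these agents can be sketched in a few lines. This is a hedged illustration, not any vendor's implementation: the `search` function below is a hard-coded stub standing in for a real web-search API, and all queries and results are invented.

```python
# Sketch of an iterated-search aggregation loop, the core pattern behind
# "deep research" agents: run many query variants, deduplicate sources,
# and aggregate findings. The search backend is a stub; a real agent
# would call a search API and use an LLM to generate keyword variations.

def search(query):
    # Stub returning (url, snippet) pairs; data is invented for illustration.
    corpus = {
        "llm deep research": [("a.example", "agents run many queries")],
        "llm search agents": [("b.example", "keyword variation helps recall")],
        "ai research tools": [("a.example", "agents run many queries"),
                              ("c.example", "aggregate and cite sources")],
    }
    return corpus.get(query, [])

def deep_research(seed_queries):
    # Run every query variant, deduplicate by URL, aggregate snippets.
    seen, findings = set(), []
    for q in seed_queries:
        for url, snippet in search(q):
            if url not in seen:
                seen.add(url)
                findings.append((url, snippet))
    return findings

results = deep_research(["llm deep research", "llm search agents",
                         "ai research tools"])
for url, snippet in results:
    print(url, "->", snippet)
```

A real agent adds an LLM step that reads each batch of results and proposes the next round of queries; the dedup-and-aggregate skeleton stays the same.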

Limits, failure modes, and verification

  • Strong criticism around:
    • “Tunnel vision” and overconfidence: agents often don’t adjust goals when new constraints appear (e.g., insurance coverage, availability).
    • Inability to handle precise counting/aggregation tasks (NFL roster example) even when a human could script it in an hour.
    • Weak performance on niche product searches and local service comparisons.
  • Many argue LLMs are “narrative tools” whose value depends on:
    • Domain experts verifying outputs via testing, replication, or cross-checking
    • Good prompts plus strong habits in manual/automated testing and code review
  • Concern about “skill drift”: experts relying on lossy summaries instead of primary sources.
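The counting/aggregation failure mode above is worth contrasting with the scripted approach commenters describe: exact aggregation over structured data is a few lines of ordinary code. The roster rows below are invented sample data, not real NFL data.

```python
# Exact counting over structured data: the kind of task commenters report
# LLM agents getting wrong, but which a short script answers exactly and
# reproducibly. Roster rows are invented sample data for illustration.
import csv, io
from collections import Counter

roster_csv = """team,player,position
Eagles,Alice,QB
Eagles,Bob,WR
Jets,Carol,QB
Jets,Dan,WR
Jets,Erin,WR
"""

rows = list(csv.DictReader(io.StringIO(roster_csv)))
by_position = Counter(row["position"] for row in rows)
print(by_position)  # exact counts, every row included
```

Unlike an LLM answer, the script's result can be re-run and audited line by line, which is why the "a human could script it in an hour" criticism lands.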

Data, paywalls, and business models

  • High-value research data in many fields remains behind paywalls (journals, professional archives, industry databases).
  • Expectation that a major business model will be selling LLM access on top of these archives.
  • arXiv is useful but limited to a few disciplines; many foundational works remain paywalled.
  • Distinction raised between:
    • Deep Research (iterated search + tool use over unstructured web)
    • Deep Analytics (database-style pipelines for exact counts and exhaustive queries).
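The "Deep Analytics" side of that distinction is essentially conventional database work rather than LLM inference. A minimal sketch with Python's built-in sqlite3, using an invented schema and rows:

```python
# "Deep Analytics": exact counts and exhaustive queries belong in a
# database pipeline, not an LLM's context window. The schema and rows
# here are invented sample data for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers (id INTEGER PRIMARY KEY, field TEXT, paywalled INTEGER)"
)
conn.executemany(
    "INSERT INTO papers (field, paywalled) VALUES (?, ?)",
    [("cs", 0), ("cs", 0), ("bio", 1), ("bio", 1), ("law", 1)],
)

# Exhaustive, exact aggregation: every row is counted, nothing is
# sampled or summarized.
counts = dict(conn.execute(
    "SELECT field, COUNT(*) FROM papers WHERE paywalled = 1 GROUP BY field"
).fetchall())
print(counts)
```

Deep Research (iterated search over unstructured text) can feed such a pipeline by extracting structured rows, but the final count should come from the query, not from the model.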

Tools and integration

  • Notable tools mentioned: Kagi Assistant, Gemini Deep Research, OpenAI’s Deep Research, Perplexity, you.com, Grok, GPT Researcher, custom agentic workflows.
  • Debate over specialized formats/protocols (MCP, context7-style “devdocs for LLMs”) vs just writing good human-readable docs.
  • Some have built multi-model, domain-specific research agents that outperform generic systems in their niche.

Search, web economics, and trust

  • Many report Google usage dropping in favor of LLM search; Google’s AI Overviews are widely criticized as unreliable summaries of SEO spam.
  • Concern that ad-driven AI answers will become inseparable from genuine information.
  • Mixed views on trust:
    • For programming and some diagnostics, users find LLMs already as good as or better than generic professionals.
    • For health, law, and history, commenters warn they are “fancy snake oil” without strong human verification.
  • Ethical worries about AI’s main “real-world” deployments (warfare, surveillance) versus more benign productivity uses.