Is the doc bot docs, or not?

Reliability vs. Traditional Docs

  • Many comments argue that a “doc bot” isn’t documentation if it can hallucinate; incorrect docs are already bad, but non‑deterministic, sometimes-wrong answers are worse.
  • Others counter that written docs are often wrong, outdated, or misleading by omission; everything has an error rate, and faster-but-imperfect answers can be acceptable if users are expected to test them.
  • A recurring theme: doc bots should clearly be framed as helpers or community-like Q&A, not as canonical documentation.

Shopify Example & Practical Frustrations

  • The original case (Shopify Collective tag detection in Liquid emails) is seen as exactly the kind of subtle, timing- and implementation-dependent behavior where LLMs struggle without real platform experience.
  • Some dispute that the showcased “wrong” answer was actually wrong, suggesting the outcome depended on tagging specifics and that the author may be exhibiting confirmation bias.
  • Others emphasize: if two users get different answers to the same official question, the tool fails as “docs.”
  • There’s also frustration with needing real credit cards to run tests, and with docs that are fragmented or sales-oriented rather than technical.

RAG, Context, and Architecture

  • Multiple participants explain that building robust doc bots (RAG systems) is harder than it looks: chunking, retrieval, GraphQL schemas, and context size all affect answer quality (a minimal retrieval sketch follows this list).
  • Debate over “just stuff all docs in the context” vs. selective retrieval: the former is simpler but expensive, doesn’t scale, and degrades accuracy; the latter is cheaper but complex to engineer.
  • Some describe advanced setups: knowledge graphs, multi-agent summarization, and document summaries to improve retrieval.
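
Several of these trade-offs are easier to see in code. Below is a minimal Python sketch of the selective-retrieval side, using a bag-of-words cosine scorer as a stand-in for a real embedding model; the function names (chunk_docs, retrieve) and parameters are illustrative, not any particular product’s API.

      # Minimal selective-retrieval sketch: chunk the docs, score each chunk
      # against the question with bag-of-words cosine similarity, keep top k.
      import math
      from collections import Counter

      def chunk_docs(text: str, max_words: int = 120) -> list[str]:
          # Split documentation into roughly fixed-size chunks on paragraph breaks.
          chunks, current = [], []
          for para in text.split("\n\n"):
              current.append(para)
              if sum(len(p.split()) for p in current) >= max_words:
                  chunks.append("\n\n".join(current))
                  current = []
          if current:
              chunks.append("\n\n".join(current))
          return chunks

      def _cosine(a: Counter, b: Counter) -> float:
          dot = sum(a[t] * b[t] for t in a)
          norm = (math.sqrt(sum(v * v for v in a.values()))
                  * math.sqrt(sum(v * v for v in b.values())))
          return dot / norm if norm else 0.0

      def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
          # Return the k chunks most lexically similar to the question.
          q = Counter(question.lower().split())
          scored = [(_cosine(q, Counter(c.lower().split())), c) for c in chunks]
          scored.sort(key=lambda pair: pair[0], reverse=True)
          return [c for score, c in scored[:k] if score > 0]

A production system would swap the scorer for embeddings and add reranking, which is exactly where the engineering complexity the commenters describe comes in.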

Non‑Determinism, “I Don’t Know,” and Trust

  • Long subthread on whether “non-determinism” is the real issue; several argue the core problem is probabilistic, prompt-sensitive behavior rather than non-determinism in the strict computer-science sense.
  • Humans also give wrong answers, but they can admit ignorance and escalate; most deployed LLM bots are tuned to always answer, which undermines trust.
  • A few report success with prompts and systems where models do say “I don’t know” when the docs don’t cover something, but others haven’t seen this work reliably in production tools (one common prompt pattern is sketched below).
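
The commonly reported pattern behind the “I don’t know” behavior is an answer-or-abstain prompt wrapped around the retrieved excerpts. The Python sketch below shows one such prompt; the wording is an assumption, not any vendor’s documented recipe, and as the thread notes, models do not follow it reliably.

      # Answer-or-abstain prompt: the model is told to answer only from the
      # retrieved excerpts and to abstain otherwise. Wording is illustrative;
      # models do not follow instructions like this with certainty.
      def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
          context = "\n---\n".join(retrieved_chunks)
          return (
              "You are a documentation assistant. Answer ONLY from the excerpts below.\n"
              "If the excerpts do not cover the question, reply exactly: I don't know.\n\n"
              f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
          )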

Role of Doc Bots Today

  • Many see doc bots as akin to asking a semi-informed colleague: sometimes helpful, sometimes confidently wrong, never authoritative.
  • Some teams report evaluation runs where ~60% of answers are good, ~20% neutral, and ~20% actively harmful, a ratio they consider insufficient to expose directly to customers (a toy version of such a grading run follows this list).
  • Suggested better uses: surfacing feature gaps (questions the bot can’t answer), augmenting human support, or powering smarter search, not replacing official documentation.
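
For context, the ~60/20/20 split reported above comes from grading each answer and tallying the labels; a toy version of the bookkeeping looks like the Python below (the labels and data are illustrative).

      # Toy grading run: count labeled answers and report the split.
      # Labels are assumed to come from human review or an LLM judge.
      from collections import Counter

      def grade_report(labels: list[str]) -> dict[str, float]:
          counts = Counter(labels)
          total = len(labels) or 1  # guard against an empty run
          return {label: counts[label] / total
                  for label in ("good", "neutral", "harmful")}

      # grade_report(["good"] * 6 + ["neutral"] * 2 + ["harmful"] * 2)
      # -> {'good': 0.6, 'neutral': 0.2, 'harmful': 0.2}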