It's the end of observability as we know it (and I feel fine)

Cost, Data Volume, and Architecture

  • Many see the proposed “AI-first” observability model as a cost amplifier: unified sub‑second stores, anomaly detection, and constant analysis imply huge telemetry and compute bills.
  • Several argue that LLMs don’t remove the need for graphs, alerts, or careful logging strategy; they just sit on top of an already-expensive stack.
  • There’s concern that using LLMs to proactively scan all telemetry for issues would be far more expensive than traditional threshold-based alerting (a back-of-the-envelope sketch follows this list).
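
To make the cost argument concrete, here is a rough sketch in Python. Every constant (log volume, bytes per token, per-token price) is an illustrative assumption, not a figure from the thread or any vendor's price list:

```python
# Rough comparison: LLM-scanning all telemetry vs. a threshold check.
# All constants are illustrative assumptions, not real prices or volumes.

LOG_VOLUME_GB_PER_DAY = 100          # assumed daily log volume
BYTES_PER_TOKEN = 4                  # rough average for log/English text
USD_PER_MILLION_INPUT_TOKENS = 0.50  # assumed LLM input price

tokens_per_day = LOG_VOLUME_GB_PER_DAY * 1e9 / BYTES_PER_TOKEN
llm_scan_cost = tokens_per_day / 1e6 * USD_PER_MILLION_INPUT_TOKENS
print(f"LLM full-scan: ~${llm_scan_cost:,.0f}/day")  # ~$12,500/day here

# A threshold alert, by contrast, is one comparison per data point and
# effectively free to evaluate at any scale:
def threshold_alert(metric_value: float, limit: float) -> bool:
    """Fire when a metric crosses a fixed limit."""
    return metric_value > limit
```

Under these (debatable) assumptions, continuous LLM scanning costs five figures per day before a single incident is found, which is the gap commenters are pointing at.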

What LLMs Actually Add

  • Strong support for using LLMs to accelerate root cause analysis once you know something is wrong: given a starting signal (alert, spike), an agent can traverse logs/metrics/traces, test hypotheses, and propose narratives (a minimal sketch of such a loop follows this list).
  • Others note the blog’s demo was closer to “LLM as smart pivoting UI” than a full agentic workflow; the human still framed the question and knew where to look.
  • Some see LLMs as valuable integrators across disparate tools (traces, logs, metrics) without deep product-level integration.
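
As a sketch of what that agentic RCA loop might look like: given a starting alert, the model repeatedly chooses the next telemetry query until it commits to a narrative. The tool functions and the `llm` call are hypothetical placeholders, not any real product's API:

```python
# Minimal sketch of an LLM-driven RCA loop. Every function here is a
# hypothetical placeholder; no real observability or LLM API is implied.
import json

def llm(prompt: str) -> str:
    """Placeholder for a call to some LLM completion endpoint."""
    raise NotImplementedError

def query_logs(q: str) -> str: ...     # hypothetical log search
def query_metrics(q: str) -> str: ...  # hypothetical metrics query

TOOLS = {"logs": query_logs, "metrics": query_metrics}

def investigate(alert: str, max_steps: int = 5) -> str:
    """Starting from an alert, gather evidence until a narrative emerges."""
    evidence = [f"ALERT: {alert}"]
    for _ in range(max_steps):
        step = json.loads(llm(
            'Given the evidence so far, either reply with '
            '{"tool": "logs"|"metrics", "query": "..."} to gather more '
            'evidence, or {"conclusion": "..."} with a root-cause story.\n'
            + "\n".join(evidence)
        ))
        if "conclusion" in step:
            return step["conclusion"]
        result = TOOLS[step["tool"]](step["query"])
        evidence.append(f"{step['tool']}({step['query']}) -> {result}")
    return "No confident root cause; evidence:\n" + "\n".join(evidence)
```

Note the loop only works because the alert frames the question, which is exactly the commenters' point: the LLM accelerates the traversal, it does not decide what is worth investigating.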

Skepticism, Hype, and Marketing

  • Many call the post a thinly veiled (or not veiled at all) product pitch, with grandiose language (“end of observability”, “speed of AI”) that doesn’t match the incremental reality.
  • Critics stress that anomaly detection and RCA remain intrinsically hard; framing AI as paradigm-ending is seen as overselling.

Reliability, Determinism, and Correlation Traps

  • A recurring theme: nondeterministic systems that are occasionally confidently wrong are dangerous for RCA. People want tools that surface hypotheses but also quantify uncertainty or actively try to disprove themselves.
  • Several warn about spurious correlations in time series and “AI that correlates everything with everything”; statistical metrics (r², p‑values) are easily abused by humans and LLMs alike (see the sketch after this list).
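
The spurious-correlation trap is easy to reproduce: two completely independent random walks (think: two unrelated, drifting metrics) routinely show a large r² and a near-zero p-value, because trending series violate the independence assumptions behind those statistics. A small demonstration, assuming numpy and scipy are available:

```python
import numpy as np
from scipy import stats

for seed in range(5):
    rng = np.random.default_rng(seed)
    # Two independent random walks, e.g. two unrelated drifting metrics.
    a = np.cumsum(rng.normal(size=1_000))
    b = np.cumsum(rng.normal(size=1_000))

    r, p = stats.pearsonr(a, b)
    # Differencing removes the trend; any real relationship would survive.
    r_diff, _ = stats.pearsonr(np.diff(a), np.diff(b))
    print(f"raw: r^2={r**2:.2f}, p={p:.1e}  |  differenced: r^2={r_diff**2:.3f}")
```

The raw r² varies run to run but is frequently large, with p-values near zero; the differenced series show r² close to zero. A human or an LLM reading only the raw statistics would "discover" a relationship that does not exist.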

Skills, Responsibility, and Over-Reliance

  • Debate over whether AI will help people learn or encourage shallow, copy‑paste understanding; concern that less-expert staff plus AI will be “good enough” for management.
  • Strong view that humans must remain accountable for decisions; AI is best treated like a powerful but error-prone intern.
  • Some see real upside for small SRE/IT teams and SMBs: LLMs can lower the bar to “big-league” observability setups and faster incident triage, without staffing large expert teams.

Tooling and UX Frustrations

  • Multiple comments say that if you need an LLM to pivot between traces, logs, and metrics, the observability product probably has UX/feature gaps.
  • Others counter that most observability UIs are bad enough that a natural-language layer is a net win, even if it doesn’t replace graphs (a sketch of such a layer follows).
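
One way such a natural-language layer could work is to translate a question into a constrained, validated query rather than letting the model run free-form searches. The query schema and the `llm` stub below are illustrative assumptions, not a description of any shipping product:

```python
# Sketch of a natural-language layer over an observability query API.
# The JSON schema and llm() stub are illustrative assumptions.
import json

ALLOWED_SIGNALS = {"logs", "metrics", "traces"}

def llm(prompt: str) -> str:
    """Placeholder for a call to some LLM completion endpoint."""
    raise NotImplementedError

def nl_to_query(question: str) -> dict:
    raw = llm(
        'Translate into JSON {"signal": "logs"|"metrics"|"traces", '
        '"filter": "...", "window_minutes": <int>}: ' + question
    )
    query = json.loads(raw)
    # Validate before execution: reject anything outside the schema, so a
    # hallucinated query fails loudly instead of returning wrong data.
    if query.get("signal") not in ALLOWED_SIGNALS:
        raise ValueError(f"unsupported signal: {query.get('signal')!r}")
    if not isinstance(query.get("window_minutes"), int):
        raise ValueError("window_minutes must be an integer")
    return query
```

Keeping the model on the translation side of a validated boundary addresses both camps: the graphs and queries stay deterministic, and the natural-language layer only lowers the cost of reaching them.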