2025-01-10

I got OpenTelemetry to work. But why was it so complicated?

Overall sentiment

Many commenters find OpenTelemetry (OTel) powerful in theory but unnecessarily complex in practice.
Strong split: some see it as the essential future standard for observability; others consider it over‑engineered “enterprise” machinery not yet ready for broad production use.

Complexity, ergonomics, and docs

Common complaints: high conceptual “floor”, lots of abstraction/indirection, vague and inconsistent terminology (e.g., what “trace” means), and poor discoverability in docs.
SDKs are seen as especially heavy in Go, JS, Rust, and C++, with multiple packages/crates and sparse examples.
Auto‑instrumentation works “magically” when supported, but often breaks with newer framework versions or nonstandard setups; fallback to manual wiring is painful.
Docs tend to jump straight to full multi‑signal, multi‑service stacks (often on Kubernetes), instead of starting with “send one metric/trace.”

Maturity and performance

Tracing is widely viewed as the most mature and compelling part of OTel; metrics and logs are described as young, rough, or “garbage” in some languages.
Performance concerns: significantly higher CPU overhead compared to StatsD/Prometheus in some Node/JS setups; agents for proprietary APMs can introduce latency and heisenbugs.
Some report backend or vendor limits (e.g., span count caps, throttling), and note silent failures and unclear error feedback.

Vendors, lock‑in, and interoperability

Goal: one open spec and data model so instrumentation isn’t tied to a single vendor SDK, making it easier to switch backends (Datadog, New Relic, Grafana, Honeycomb, AWS X-Ray, etc.).
Skeptics argue this portability is overstated (likened to “standard SQL” portability) and that most teams have more urgent problems than hypothetical future migrations.
Some vendors ingest OTel but still push proprietary agents/SDKs, maintaining lock‑in on the sending side.

Tooling, deployment, and local dev

Collectors and full stacks (collector + Tempo/Jaeger + Prometheus/Grafana) are seen as heavy, especially for small apps and local development.
Others highlight easier paths: single‑binary or single‑compose backends (e.g., all‑in‑one OTel backends, Jaeger, OpenObserve, SigNoz), Kubernetes operators, and language‑specific scaffolding or frameworks (.NET Aspire, custom starter repos).
A recurring pattern: people end up using OTel mainly for traces, keeping existing solutions for metrics/logs.

Alternatives and partial adoption

Many suggest simpler stacks for small or brownfield systems: Prometheus for metrics, Tempo/Jaeger/Sentry/SkyWalking for traces, Loki/syslog for logs, or even homegrown metrics via logs.
Several report abandoning full OTel after struggling, but some are happy with a constrained subset: manual traces only, one backend, carefully‑chosen tooling.
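
As a rough illustration of the "homegrown metrics via logs" option mentioned above, the sketch below writes one JSON record per log line and sums values per metric name afterwards. It uses only the Python standard library. The record shape (`metric`, `value`, `tags`) and the helper names `make_metric_logger`, `emit_metric`, `aggregate`, and `demo` are assumptions made up for this example; none of them come from the discussion or from any OTel API.

```python
import io
import json
import logging
from collections import defaultdict


def make_metric_logger(stream):
    """Return a logger that writes one bare message per line to `stream`."""
    logger = logging.getLogger("metrics_sketch")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def emit_metric(logger, name, value, **tags):
    """Log a single metric sample as a JSON object (assumed record shape)."""
    logger.info(json.dumps({"metric": name, "value": value, "tags": tags}, sort_keys=True))


def aggregate(lines):
    """Sum values per metric name, skipping lines that are not metric records."""
    totals = defaultdict(float)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ordinary log line, not a metric record
        if isinstance(record, dict) and "metric" in record:
            totals[record["metric"]] += record.get("value", 0)
    return dict(totals)


def demo():
    """Emit a few samples into an in-memory stream and aggregate them."""
    buf = io.StringIO()
    logger = make_metric_logger(buf)
    emit_metric(logger, "http_requests", 1, route="/login")
    emit_metric(logger, "http_requests", 1, route="/home")
    emit_metric(logger, "request_ms", 12.5, route="/home")
    return aggregate(buf.getvalue().splitlines())
```

In a real setup the stream would presumably be stdout or a log file shipped to something like Loki or syslog, with aggregation done on the log backend rather than in process.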
