I got OpenTelemetry to work. But why was it so complicated?
Overall sentiment
- Many commenters find OpenTelemetry (OTel) powerful in theory but unnecessarily complex in practice.
- Strong split: some see it as the essential future standard for observability; others consider it over‑engineered “enterprise” machinery not yet ready for broad production use.
Complexity, ergonomics, and docs
- Common complaints: high conceptual “floor”, lots of abstraction/indirection, vague and inconsistent terminology (e.g., what “trace” means), and poor discoverability in docs.
- SDKs are seen as especially heavy in Go, JS, Rust, and C++, which require multiple packages/crates and offer sparse examples.
- Auto‑instrumentation works “magically” when supported, but often breaks with newer framework versions or nonstandard setups, and falling back to manual wiring is painful.
- Docs tend to jump straight to full multi‑signal, multi‑service stacks (often on Kubernetes), instead of starting with “send one metric/trace.”
Maturity and performance
- Tracing is widely viewed as the most mature and compelling part of OTel; metrics and logs are described as young, rough, or “garbage” in some languages.
- Performance concerns: significantly higher CPU overhead compared to StatsD/Prometheus in some Node/JS setups; agents for proprietary APMs can introduce latency and heisenbugs.
- Some report backend or vendor limits (e.g., caps on span counts, throttling), along with silent failures and unclear error feedback.
Vendors, lock‑in, and interoperability
- Goal: one open spec and data model so instrumentation isn’t tied to a single vendor SDK, making it easier to switch backends (Datadog, New Relic, Grafana, Honeycomb, AWS X-Ray, etc.).
- Skeptics argue this portability is overstated (likened to “standard SQL” portability) and that most teams have more urgent problems than hypothetical future migrations.
- Some vendors ingest OTel but still push proprietary agents/SDKs, maintaining lock‑in on the sending side.
Tooling, deployment, and local dev
- Collectors and full stacks (collector + Tempo/Jaeger + Prometheus/Grafana) are seen as heavy, especially for small apps and local development.
- Others highlight easier paths: single‑binary or single‑compose‑file backends (e.g., all‑in‑one OTel backends, Jaeger, OpenObserve, SigNoz), Kubernetes operators, and language‑specific scaffolding or frameworks (.NET Aspire, custom starter repos).
- A recurring pattern: people end up using OTel mainly for traces, keeping existing solutions for metrics/logs.
Alternatives and partial adoption
- Many suggest simpler stacks for small or brownfield systems: Prometheus for metrics, Tempo/Jaeger/Sentry/SkyWalking for traces, Loki/syslog for logs, or even homegrown metrics via logs.
- Several report abandoning full OTel after struggling, but some are happy with a constrained subset: manual traces only, one backend, carefully chosen tooling.
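“Homegrown metrics via logs” usually means writing each metric sample as a structured log line and aggregating later. The sketch below shows one way to do that with only the Python standard library. The JSON field names (`ts`, `metric`, `value`, `tags`) and the helper names are hypothetical choices for illustration, not a format any commenter specified.

```python
import json
import logging
import time
from collections import defaultdict

logger = logging.getLogger("metrics")


def emit_metric(name: str, value: float, **tags: str) -> str:
    """Write one metric sample as a single JSON log line and return it.

    The field names are a hypothetical schema chosen for this sketch.
    """
    line = json.dumps(
        {"ts": time.time(), "metric": name, "value": value, "tags": tags},
        sort_keys=True,
    )
    logger.info(line)
    return line


def aggregate(lines):
    """Sum values per metric name, skipping lines that are not metric records."""
    totals = defaultdict(float)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ordinary log lines are ignored
        if isinstance(record, dict) and "metric" in record:
            totals[record["metric"]] += float(record["value"])
    return dict(totals)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    lines = [
        emit_metric("http.requests", 1, route="/"),
        emit_metric("http.requests", 1, route="/login"),
        "plain log line",
    ]
    print(aggregate(lines))  # {'http.requests': 2.0}
```

The trade-off matches the thread's point: there is nothing new to deploy, but querying, retention, and aggregation all become the log pipeline's job.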