Log by time, not by count
Logs vs Metrics: Definitions and Roles
- Many commenters say the post is really about metrics, not logs: “logging by time” is essentially emitting metrics at a fixed interval.
- Common framing:
  - Logs = discrete, human-readable events for diagnostics and postmortem analysis.
  - Metrics = quantitative measurements over time, usually aggregated, used for dashboards, alerting, capacity planning.
- Several note that at scale logs should be structured (JSON/logfmt) so they can be filtered and partially treated like metrics, but the conceptual goals differ.
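To make the structured-logging point concrete, here is a minimal sketch using only the Python standard library. The logger name `ingest`, the `fields` key passed through `extra=`, and the sample values are assumptions made for illustration, not anything taken from the thread.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `fields` is a hypothetical attribute supplied via `extra=`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("ingest")  # illustrative name
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
log.info("batch processed", extra={"fields": {"items": 1200, "duration_ms": 340}})

# A backend can now filter or aggregate on `items` without parsing free text.
event = json.loads(buffer.getvalue())
```

The record is still a log line meant to explain one event, but its numeric fields can be queried, which is the "partially treated like metrics" property commenters describe.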
Time-Based vs Count-Based Logging
- Support for the post’s intuition: count-based “every N items” logs can overwhelm readers and backends; time-based summaries are often what humans actually want.
- Critiques: if you want periodic summaries, that’s a metric; use a metrics system instead of repurposing logs.
- Some point out a subtle bug: if the elapsed-time check only runs when an item is processed, a loop that blocks while there’s no work emits nothing during the stall, so “log every T seconds” does not actually give a consistent log rate.
- Others argue time-based throttling is useful in multithreaded code because it avoids a globally contended counter.
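The time-based approach discussed above can be sketched as follows. This is one possible shape, not the post's implementation: the class name `TimedProgress`, its parameters, and the injectable `clock` are assumptions for illustration, and the sketch is single-threaded (it takes no locks).

```python
import time
from typing import Callable, List


class TimedProgress:
    """Emit a progress summary at most once per `interval_s` seconds."""

    def __init__(
        self,
        interval_s: float,
        emit: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval_s = interval_s
        self.emit = emit
        self.clock = clock
        self.last_emit = clock()
        self.count_since = 0
        self.total = 0

    def record(self, n: int = 1) -> None:
        """Count `n` processed items, then emit if the interval has elapsed."""
        self.count_since += n
        self.total += n
        self.maybe_emit()

    def maybe_emit(self) -> None:
        """Emit a summary if at least `interval_s` has passed since the last one."""
        now = self.clock()
        elapsed = now - self.last_emit
        if elapsed >= self.interval_s:
            rate = self.count_since / elapsed
            self.emit(
                f"processed {self.total} items "
                f"({rate:.1f}/s over last {elapsed:.1f}s)"
            )
            self.last_emit = now
            self.count_since = 0


# Demo with a fake clock so the behavior is deterministic.
fake_now = [0.0]
lines: List[str] = []
progress = TimedProgress(10.0, lines.append, clock=lambda: fake_now[0])
for _ in range(100):
    fake_now[0] += 0.5  # pretend each item takes half a second
    progress.record()
```

In this demo one summary appears per 10 simulated seconds regardless of how many items were processed in between. To address the blocking concern, a loop that waits for work could wait with a timeout (for example `queue.Queue.get(timeout=...)`, which raises `queue.Empty`) and call `maybe_emit()` after each timeout, so summaries keep appearing while idle.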
Observability Practices and Tooling
- Strong SRE/ops sentiment:
  - Logs are for “why,” metrics are for “is it healthy,” and tracing is for following a request across services.
  - Do not rely on logs for health checks or alerting; use dedicated metrics (Prometheus, Datadog, etc.) and health endpoints.
  - Modern observability stacks ingest structured events, then derive metrics and traces later (OpenTelemetry, columnar backends, “wide events”).
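As a rough illustration of deriving metrics from structured events after the fact, the sketch below groups a handful of made-up "wide events" by route. The field names (`route`, `status`, `duration_ms`) and the sample values are hypothetical; a real backend would run an equivalent query over stored events rather than a Python loop.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical wide events: one record per request, many fields each.
events: List[dict] = [
    {"route": "/checkout", "status": 200, "duration_ms": 120},
    {"route": "/checkout", "status": 500, "duration_ms": 950},
    {"route": "/search", "status": 200, "duration_ms": 40},
    {"route": "/search", "status": 200, "duration_ms": 60},
]


def derive_metrics(rows: List[dict]) -> Dict[str, Dict[str, float]]:
    """Compute per-route request count, error rate, and max duration."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["route"]].append(row)

    metrics: Dict[str, Dict[str, float]] = {}
    for route, group in grouped.items():
        errors = sum(1 for r in group if r["status"] >= 500)
        metrics[route] = {
            "requests": len(group),
            "error_rate": errors / len(group),
            "max_duration_ms": max(r["duration_ms"] for r in group),
        }
    return metrics


metrics = derive_metrics(events)
```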
Volume, Sampling, and Aggregation
- At high volume you cannot log everything:
  - Metrics aggregate (counts, sums, max, etc.).
  - Logs are sampled or throttled (by time or probability).
  - Traces are sampled at the “request/span” level.
- Several emphasize “filter and aggregate after ingestion, not in application code,” if storage allows.
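The difference between aggregating metrics and sampling logs can be sketched side by side. Both classes below (`LatencyStats`, `SampledLog`) are hypothetical names invented for this example; the 1% sampling probability and the synthetic latencies are arbitrary.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class LatencyStats:
    """Metric-style aggregation: constant memory however many events arrive."""

    count: int = 0
    total_ms: float = 0.0
    max_ms: float = 0.0

    def observe(self, ms: float) -> None:
        self.count += 1
        self.total_ms += ms
        self.max_ms = max(self.max_ms, ms)


@dataclass
class SampledLog:
    """Log-style sampling: keep each message with a fixed probability."""

    probability: float
    rng: random.Random
    kept: List[str] = field(default_factory=list)

    def log(self, message: str) -> bool:
        if self.rng.random() < self.probability:
            self.kept.append(message)
            return True
        return False


stats = LatencyStats()
sampled = SampledLog(probability=0.01, rng=random.Random(0))

for i in range(10_000):
    latency_ms = float(i % 100)  # synthetic latency, for illustration only
    stats.observe(latency_ms)    # every event contributes to the aggregate
    sampled.log(f"request {i} took {latency_ms} ms")  # only about 1% are kept
```

The aggregate sees every event but loses per-event detail; the sampled log keeps full detail for a small fraction of events. That trade-off is why commenters treat the two as complementary rather than interchangeable.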
Logging Best Practices and Pitfalls
- Recommended patterns: per-request IDs, log important branches and errors, dynamically adjustable log levels (even per-user), structured logs.
- Warnings against: using log search as a metrics system, unbounded verbose logging, and letting consumers depend on the log stream’s format or behavior, which turns it into a production interface that’s hard to change later.
- Commenters highlight an often-overlooked distinction between program logs, audit logs (e.g. flight data recorders), write-ahead logs, and event-sourcing streams.
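One of the recommended patterns above, per-request IDs, can be sketched with the standard library's `contextvars` and `logging.Filter`. The names `request_id_var`, `RequestIdFilter`, `handle_request`, and the logger name `api` are assumptions for this example, not part of any framework.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside a request).
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def handle_request(logger: logging.Logger, payload: bytes) -> None:
    """Hypothetical handler: logs the important branches of one request."""
    token = request_id_var.set(uuid.uuid4().hex[:8])
    try:
        logger.info("received %d bytes", len(payload))
        if not payload:
            logger.warning("empty payload, skipping")
            return
        logger.info("done")
    finally:
        request_id_var.reset(token)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
)
handler.addFilter(RequestIdFilter())

logger = logging.getLogger("api")  # illustrative name
logger.handlers.clear()
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(handler)

handle_request(logger, b"hello")
handle_request(logger, b"")
output_lines = stream.getvalue().splitlines()
```

Every line written during one request carries the same `request_id`, so a postmortem can pull all lines for a single request with one filter instead of correlating by timestamp.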