2025-07-21

Log by time, not by count

Logs vs Metrics: Definitions and Roles

Many commenters say the post is really about metrics, not logs: “logging by time” is essentially emitting metrics at a fixed interval.
Common framing:
- Logs = discrete, human-readable events for diagnostics and postmortem analysis.
- Metrics = quantitative measurements over time, usually aggregated, used for dashboards, alerting, capacity planning.
Several note that at scale logs should be structured (JSON/logfmt) so they can be filtered and partially treated like metrics, but the conceptual goals differ.
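
To illustrate the structured-logging point, here is a minimal sketch using only Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `make_logger` helper, and the `fields` convention for passing extra key/value pairs are hypothetical names chosen for this example, not something the thread prescribes.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `log.info("batch done", extra={"fields": {"items": 500}})` then produces a line that a backend can filter on `items` without parsing free text.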

Time-Based vs Count-Based Logging

Support for the post’s intuition: count-based “every N items” logs can overwhelm readers and backends; time-based summaries are often what humans actually want.
Critiques: if you want periodic summaries, that’s a metric; use a metrics system instead of repurposing logs.
Some point out a subtle bug: if your processing loop blocks while waiting for work, the elapsed-time check never runs during idle periods, so “log every T seconds” may not actually give a consistent log rate.
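
One possible way around the blocking problem, sketched here under assumptions: wait for work only until the next summary is due, so the time check still runs when the queue is empty. The `consume` function, its `emit` callback, and the `stop` predicate are hypothetical; the increment stands in for real processing.

```python
import queue
import time


def consume(work, interval_s, emit, clock=time.monotonic, stop=lambda: False):
    """Process items and emit one summary per interval, even when idle."""
    processed = 0
    next_report = clock() + interval_s
    while not stop():
        # Wait for work only until the next report is due, never indefinitely.
        timeout = max(0.0, next_report - clock())
        try:
            work.get(timeout=timeout)
        except queue.Empty:
            pass  # idle: fall through to the time check below
        else:
            processed += 1  # stand-in for real processing of the item
        if clock() >= next_report:
            emit(f"processed {processed} items in the last {interval_s:g}s")
            processed = 0
            next_report += interval_s
```

An idle interval then shows up as an explicit "processed 0 items" line instead of silence, which makes a stalled producer distinguishable from a dead consumer.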
Others argue time-based throttling is useful in multithreaded code because it avoids global contended counters.
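
A rough sketch of the multithreaded argument, assuming each thread keeps its own "last logged" timestamp in thread-local storage so that no shared counter or lock is needed. `PerThreadThrottle` is a hypothetical helper, and this is only one interpretation of the commenters' point.

```python
import threading
import time


class PerThreadThrottle:
    """Allow at most one log call per interval per thread, with no shared counter."""

    def __init__(self, interval_s, clock=time.monotonic):
        self._interval_s = interval_s
        self._clock = clock
        self._local = threading.local()  # each thread sees its own attributes

    def ready(self) -> bool:
        """Return True if the calling thread may log now."""
        now = self._clock()
        last = getattr(self._local, "last", None)
        if last is None or now - last >= self._interval_s:
            self._local.last = now
            return True
        return False
```

The trade-off is that the limit applies per thread, so the total log rate grows with the number of threads.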

Observability Practices and Tooling

Strong SRE/ops sentiment:
- Logs are for “why,” metrics are for “is it healthy,” and tracing is for following a request across services.
- Do not rely on logs for health checks or alerting; use dedicated metrics (Prometheus, Datadog, etc.) and health endpoints.
Modern observability stacks ingest structured events, then derive metrics and traces later (OpenTelemetry, columnar backends, “wide events”).
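
As a toy illustration of deriving metrics from structured events after the fact, the sketch below aggregates a list of event dictionaries by route. The field names (`route`, `status`, `duration_ms`) and the `derive_metrics` function are assumptions made for this example; real backends do this with queries, not application code.

```python
from collections import defaultdict


def derive_metrics(events):
    """Aggregate per-route request count, error rate, and max latency."""
    totals = defaultdict(lambda: {"count": 0, "errors": 0, "max_ms": 0.0})
    for event in events:
        m = totals[event["route"]]
        m["count"] += 1
        # Treat any 5xx status as an error (an assumption for this sketch).
        m["errors"] += 1 if event["status"] >= 500 else 0
        m["max_ms"] = max(m["max_ms"], event["duration_ms"])
    return {
        route: {**m, "error_rate": m["errors"] / m["count"]}
        for route, m in totals.items()
    }
```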

Volume, Sampling, and Aggregation

At high volume you cannot log everything:
- Metrics aggregate (counts, sums, max, etc.).
- Logs are sampled or throttled (by time or probability).
- Traces are sampled at the “request/span” level.
Several emphasize “filter and aggregate after ingestion, not in application code,” if storage allows.
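
The two volume-control strategies above, probabilistic sampling for logs and running aggregation for metrics, can be sketched as follows. `Sampler` and `Aggregate` are hypothetical helper classes, not part of any library mentioned in the thread.

```python
import random


class Sampler:
    """Keep roughly `rate` of events; each decision is independent."""

    def __init__(self, rate, rng=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate
        self._rng = rng or random.Random()

    def keep(self) -> bool:
        return self._rng.random() < self.rate


class Aggregate:
    """Running count, sum, and max: the metric-style alternative to logging each value."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = None

    def observe(self, value) -> None:
        self.count += 1
        self.total += value
        self.maximum = value if self.maximum is None else max(self.maximum, value)
```

Sampling keeps individual examples at a fraction of the volume, while the aggregate keeps exact totals and discards the individual events.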

Logging Best Practices and Pitfalls

Recommended patterns: per-request IDs, logging of important branches and errors, dynamically adjustable log levels (even per user), and structured logs.
Warnings against: using log search as a metrics system, unbounded verbose logging, and letting log stream behavior become a production interface that’s hard to change later.
Several highlight the often-overlooked distinction between program logs, audit logs (e.g. flight data recorders), write-ahead logs, and event-sourcing streams.
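
The per-request ID pattern from the recommendations above could look roughly like this in Python, assuming the ID is carried in a `contextvars.ContextVar` and attached to records by a logging filter. `request_id_var`, `RequestIdFilter`, and `handle_request` are illustrative names only.

```python
import contextvars
import logging
import uuid

# Hypothetical context variable holding the ID of the request being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record so a formatter can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def handle_request(logger: logging.Logger, payload: str) -> str:
    """Handle one request, tagging every log line emitted along the way."""
    request_id = uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("handling payload of %d bytes", len(payload))
        return request_id
    finally:
        request_id_var.reset(token)  # restore the previous value
```

With the filter attached to a handler whose format string includes `%(request_id)s`, all lines from one request can be found with a single search.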
