Debugging Hetzner: Uncovering failures with powerstat, sensors, and dmidecode

Caution with new hardware and software

  • Many commenters endorse waiting months before adopting new server models or software releases, especially for production or stability-critical systems.
  • Suggested practices include staying 1–2 versions behind, or “burning in” new hardware for weeks under non-critical workloads to catch latent faults.

Hetzner motherboard failures and reliability

  • The thread confirms widespread issues with certain Hetzner AX-series servers (AX42/52/102/162) due to faulty motherboards; Hetzner is running a large-scale replacement program.
  • Several users report hard-to-diagnose crashes that began months after deployment and disappeared only after mainboard swaps; diagnostics often reported “no issue.”
  • There is some confusion over which vendor made the affected boards (ASRock vs. Dell board IDs), and the exact electrical or board-level root cause remains unclear.
  • Opinions on Hetzner’s reliability are split: some say “cheap and fine if you know what you’re doing,” others highlight recurrent hardware issues and lack of proactive monitoring.
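
Because the thread shows confusion over board vendors and IDs, it can help to read the baseboard record directly. The following is a minimal sketch, assuming the usual "Key: Value" layout printed by `dmidecode -t baseboard`; the sample text and the vendor and product names in it are hypothetical placeholders, not values from the thread.

```python
# Hypothetical sample of `dmidecode -t baseboard` output (placeholder values).
SAMPLE = """\
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
\tManufacturer: ExampleVendor
\tProduct Name: EXAMPLE-BOARD
\tVersion: 1.00

Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
\tManufacturer: OtherVendor
"""


def parse_baseboard(text: str) -> dict[str, str]:
    """Return the key/value fields of the first 'Base Board Information' block."""
    fields: dict[str, str] = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Base Board Information":
            in_section = True
            continue
        if in_section:
            if not stripped:
                break  # a blank line ends the record
            key, sep, value = stripped.partition(":")
            if sep:  # skip lines that carry no "Key: Value" pair
                fields[key.strip()] = value.strip()
    return fields


board = parse_baseboard(SAMPLE)
print(board.get("Manufacturer"), board.get("Product Name"))
```

On a real machine the input would come from running dmidecode as root; recording the result per host makes it easier to tell which servers share a board model when a bad batch is suspected.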

Power limiting and potential hardware degradation

  • A central debate is whether datacenter power capping can damage components.
  • Multiple electronics engineers and power-management specialists argue that standard server power limiting (e.g., via CPU throttling at constant voltage) is safe and should extend lifetime due to lower heat.
  • Others speculate about failure modes involving undervoltage, current limiting, VRM stress, or reduced fan speeds causing localized hotspots, but evidence in the thread is sparse. Several participants explicitly say the article’s claim is not technically convincing.
  • Consensus: servers are normally power-limited by frequency/clock control, not by starving voltage; any “degradation from power caps” mechanism is unresolved and likely mischaracterized.
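
To check whether a frequency cap is visible from inside the operating system, one option is to compare the hardware maximum with the current policy maximum in the Linux cpufreq sysfs files. This is a rough sketch under assumptions: the helper names are made up for illustration, only cpu0 is inspected, and a cap enforced by firmware or a management controller may not show up in these files at all.

```python
from pathlib import Path

# Standard Linux cpufreq directory for the first CPU; values are in kHz.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_khz(name: str, base: Path = CPUFREQ_DIR) -> int:
    """Read one cpufreq attribute, e.g. 'cpuinfo_max_freq' or 'scaling_max_freq'."""
    return int((base / name).read_text().strip())


def describe_cap(hw_max_khz: int, policy_max_khz: int) -> str:
    """Compare the hardware maximum with the maximum allowed by the cpufreq policy."""
    if policy_max_khz < hw_max_khz:
        pct = 100 * policy_max_khz / hw_max_khz
        return f"capped at {pct:.0f}% of hardware maximum"
    return "no frequency cap in cpufreq policy"


# On a Linux host:
# print(describe_cap(read_khz("cpuinfo_max_freq"), read_khz("scaling_max_freq")))
```

A result of "no frequency cap" only means the kernel policy is not limiting clocks; it does not rule out limits applied below the operating system.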

CPU governors, performance, and energy

  • Commenters warn that “powersave” or eco governors on rented servers can dramatically reduce peak performance and introduce latency jitter for short, bursty workloads.
  • Benchmarks shared in the thread show noticeable latency differences between powersave and performance modes, especially for high-QPS database workloads.
  • Others stress that power-saving modes can yield significant energy savings with minimal impact on non-latency-sensitive workloads and should be the default in many datacenters. Customers, however, expect full performance when they pay for cores.
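
A quick way to see which governor a rented server is actually using is to read `scaling_governor` for each CPU. The sketch below assumes the standard Linux sysfs layout; the function names are illustrative, and which governor counts as "wanted" depends on the workload.

```python
from pathlib import Path


def read_governors(root: Path = Path("/sys/devices/system/cpu")) -> dict[str, str]:
    """Map each CPU directory name (e.g. 'cpu3') to its active cpufreq governor."""
    governors: dict[str, str] = {}
    for path in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        # path ends in .../cpuN/cpufreq/scaling_governor, so parts[-3] is 'cpuN'
        governors[path.parts[-3]] = path.read_text().strip()
    return governors


def cpus_not_in(governors: dict[str, str], wanted: str = "performance") -> list[str]:
    """Return the CPUs whose governor differs from the wanted one (sorted by name)."""
    return sorted(cpu for cpu, gov in governors.items() if gov != wanted)


# On a Linux host:
# print(cpus_not_in(read_governors()))
```

For a latency-sensitive database host, a non-empty list is a hint to review the governor setting before blaming the hardware for jitter.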

Monitoring and operational responsibility

  • Multiple anecdotes across providers (Hetzner, big clouds, Dell, others) describe fan failures, bogus PROCHOT signals, and random throttling or crashes that are hard to detect remotely.
  • There is strong consensus that with “unmanaged” bare metal, customers must run their own robust monitoring, covering hardware health, temperatures, clock speeds, and reboot causes; the low pricing implicitly assumes this.
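
As one possible starting point for such monitoring, the sketch below scans the JSON produced by `sensors -j` (lm-sensors) for hot temperature readings and fans reporting 0 RPM. The nested chip/label/reading layout is an assumption about that JSON output, the sample data is hypothetical, and the 85 °C threshold is an arbitrary illustration rather than a recommended limit.

```python
import json

# Hypothetical sample shaped like `sensors -j` output (placeholder chip and labels).
SAMPLE = json.dumps({
    "example-chip-0": {
        "Adapter": "ISA adapter",
        "CPU": {"temp1_input": 91.0, "temp1_max": 95.0},
        "fan1": {"fan1_input": 0.0},
        "fan2": {"fan2_input": 1200.0},
    }
})


def find_alerts(sensors_json: str, max_temp_c: float = 85.0) -> list[str]:
    """Return one message per temperature above max_temp_c or fan reading 0 RPM."""
    alerts: list[str] = []
    for chip, features in json.loads(sensors_json).items():
        for label, readings in features.items():
            if not isinstance(readings, dict):
                continue  # e.g. the "Adapter" entry is a plain string
            for name, value in readings.items():
                if not name.endswith("_input"):
                    continue  # only look at live readings, not limits
                if name.startswith("temp") and value > max_temp_c:
                    alerts.append(f"{chip}/{label}: {value:.1f} C exceeds {max_temp_c:.1f} C")
                elif name.startswith("fan") and value == 0:
                    alerts.append(f"{chip}/{label}: fan reports 0 RPM")
    return alerts


for message in find_alerts(SAMPLE):
    print(message)
```

In practice the input would come from running `sensors -j` on a schedule and forwarding any alerts to whatever alerting system is already in place. Some boards legitimately report 0 RPM for unpopulated fan headers, so the fan check needs per-host tuning.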

Ubicloud’s hosting strategy

  • Commenters discuss why Ubicloud rents from Hetzner instead of owning hardware: early-stage company, limited capital, and desire to focus on software rather than building a datacenter operation.
  • Several note that even owning hardware wouldn’t necessarily have avoided a bad motherboard batch; the advantage would mainly be more control, not immunity to component defects.