My 71 TiB ZFS NAS After 10 Years and Zero Drive Failures

Drive longevity & power‑cycling

  • The thread debates whether powering disks off extends their life or increases failure risk.
  • Some argue continuous running avoids wear from start/stop cycles, stiction, bearing issues, and inrush current.
  • Others note many consumer/NAS drives already spin down frequently and are rated for large load/unload counts; for homelabs electricity savings may outweigh marginal wear.
  • Several anecdotes of:
    • Old “stiction” problems and drives that die after sitting powered off for years.
    • Bearings failing more on always‑on systems vs rarely on systems that spin down.
  • A statistical back‑of‑envelope using Backblaze annualized failure rates (AFRs) suggests 24 drives lasting 10 years without a failure is “lucky but not extraordinary,” especially once early failures are past.
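That back‑of‑envelope can be sketched in a few lines. The AFR values below are illustrative assumptions roughly spanning the range Backblaze has published across drive models, not figures from the thread, and the model assumes a constant AFR with independent failures (real drives follow a bathtub curve):

```python
# Chance that 24 drives all survive 10 years, assuming a constant
# annualized failure rate (AFR) and independent failures -- both
# simplifications. AFR values are illustrative, not from the thread.

def survival_probability(afr: float, drives: int, years: int) -> float:
    """P(zero failures) = (1 - AFR)^(drives * years)."""
    return (1 - afr) ** (drives * years)

for afr in (0.005, 0.010, 0.014):
    p = survival_probability(afr, drives=24, years=10)
    print(f"AFR {afr:.1%}: P(24 drives, 10 yrs, no failures) = {p:.1%}")
```

Under these assumptions the zero‑failure probability ranges from a few percent (1.4% AFR) to around 30% (0.5% AFR), which matches the “lucky but not extraordinary” reading.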

Use cases for large home storage

  • Common uses: media libraries (Plex/Jellyfin), photography/video (terabytes per project), ML datasets and models, torrents, Docker, personal archiving of web content, social media art, and conference talks.
  • Some systems are mostly cold storage: backups or archives powered on only for sync or access.

ZFS, data integrity & ECC

  • Many emphasize ZFS scrubs with block‑level checksums as key for detecting bit rot; scrubs are easy to schedule.
  • ZFS checksums are per record/block, not file‑level cryptographic hashes; some layer file hashes on top.
  • ECC RAM is repeatedly described as important for serious data integrity; others note ECC can be hard/expensive to deploy on consumer hardware.
  • Some have personal horror stories of silent corruption on non‑checksummed filesystems, motivating ZFS.
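Layering file‑level hashes on top of ZFS’s per‑block checksums, as some commenters describe, can be as simple as maintaining a manifest. A minimal sketch (the manifest format and function names are my own, not a tool from the thread):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large media files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root: Path, manifest: Path) -> None:
    """Record a file-level hash for every regular file under `root`.
    ZFS already checksums blocks; a manifest additionally catches
    corruption introduced before the data ever reached the pool."""
    lines = sorted(
        f"{sha256_of(p)}  {p.relative_to(root)}"
        for p in root.rglob("*") if p.is_file()
    )
    manifest.write_text("\n".join(lines) + "\n")
```

Re‑hashing against the manifest later detects changes end‑to‑end, independent of the filesystem.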

RAID levels, mirrors, and backups

  • Strong reminder: RAID/ZFS ≠ backup. Still need offline/air‑gapped or off‑site copies to handle user error, ransomware, or catastrophic failures.
  • Several argue parity RAID (RAID5/6, RAIDZ) is overused at home:
    • Slow, risky rebuilds on large drives; correlated failures in same‑batch disks.
    • Mirrored vdevs or simple volumes plus good backups are seen as simpler, safer, and more expandable.
  • Others defend RAID6/RAIDZ2 for larger arrays, but stress drive diversity and rotation.
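The “risky rebuilds on large drives” argument is usually made with unrecoverable read error (URE) math. A sketch under the common, pessimistic assumptions of a flat 1‑in‑10^14 bit error rate (a typical consumer spec‑sheet figure) and independent errors; real drives usually beat spec, so treat this as a worst‑case illustration:

```python
import math

URE_RATE = 1e-14          # errors per bit read (consumer spec-sheet figure)
BITS_PER_TB = 8e12

def expected_ures(tb_read: float) -> float:
    """Expected unrecoverable read errors over a given volume of reads."""
    return tb_read * BITS_PER_TB * URE_RATE

def p_clean_read(tb_read: float) -> float:
    """P(zero UREs) under a Poisson model of the error count."""
    return math.exp(-expected_ures(tb_read))

# Rebuilding a hypothetical RAIDZ1 vdev of five 20 TB drives after one
# failure means reading the four survivors in full: 80 TB.
print(f"{p_clean_read(80):.1%} chance of a URE-free rebuild read")
```

At spec rates the clean‑rebuild probability is well under 1%, which is the case for double parity (RAIDZ2) or mirrors plus backups on large disks.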

Power, noise, cooling, and UPS

  • Power‑off strategy can save thousands in electricity over a decade for a 200W‑idle NAS, especially in high‑tariff regions.
  • Large, slow fans and good fan control (PID loops) significantly reduce noise and fan power draw.
  • UPSes are valued not just for clean shutdowns but also for smoothing brownouts and voltage spikes; some consider skipping a UPS an unjustified risk, while others accept it for home use.
  • Offline powered‑down backups are also used as ransomware protection.
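The electricity claim is easy to sanity‑check. The tariff below is a placeholder for a high‑cost region, and the “mostly off” duty cycle is an assumed sync schedule, not numbers from the thread:

```python
# Ten-year electricity cost of a 200 W always-on NAS vs one powered on
# only ~6 h/week for syncs. Tariff is an assumed high-cost-region price.
IDLE_WATTS = 200
TARIFF = 0.35             # currency units per kWh (assumption)
HOURS_PER_YEAR = 8766     # average year, including leap years

def decade_cost(hours_per_year: float) -> float:
    """Cost over 10 years for the given annual powered-on hours."""
    kwh = IDLE_WATTS / 1000 * hours_per_year * 10
    return kwh * TARIFF

always_on = decade_cost(HOURS_PER_YEAR)
mostly_off = decade_cost(6 * 52)
print(f"always-on: {always_on:,.0f}  mostly-off: {mostly_off:,.0f}")
```

Always‑on comes to roughly 6,000 currency units over the decade versus a few hundred for the mostly‑off strategy, consistent with the “thousands saved” claim.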
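The PID fan control mentioned above is only a few lines in its simplest form. The gains and temperature setpoint here are made‑up illustrative values; real controllers need per‑chassis tuning:

```python
class PIDFanController:
    """Map drive temperature to a fan duty cycle (0-100%).
    Gains and setpoint are illustrative, not tuned values from the
    thread; a real loop would be tuned against the actual chassis."""

    def __init__(self, setpoint=38.0, kp=4.0, ki=0.1, kd=1.0):
        self.setpoint, self.kp, self.ki, self.kd = setpoint, kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, temp_c: float, dt: float = 1.0) -> float:
        error = temp_c - self.setpoint            # positive when too hot
        self.integral = max(0.0, self.integral + error * dt)  # no negative windup
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        duty = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(100.0, max(0.0, duty))         # clamp to valid PWM range
```

Large, slow fans let the loop settle at a low duty cycle, which is where most of the noise and power savings come from.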

Filesystem alternatives & experimental tech

  • btrfs: mixed reputation; some report past data loss, others long‑term stable use when avoiding its RAID layer and using only snapshots/compression/checksums.
  • bcachefs: seen as promising (checksums, flexible caching) but currently marked experimental; kernel maintainer concerns and early breakages make people cautious about production data.
  • General sentiment: for long‑lived important data, ZFS (or at least a mature checksumming FS) on well‑understood hardware is still the conservative choice.