My 71 TiB ZFS NAS After 10 Years and Zero Drive Failures
Drive longevity & power‑cycling
- Thread debates whether powering disks off extends life or increases risk.
- Some argue continuous running avoids wear from start/stop cycles, stiction, bearing issues, and inrush current.
- Others note many consumer/NAS drives already spin down frequently and are rated for large load/unload counts; for homelabs electricity savings may outweigh marginal wear.
- Several anecdotes of:
  - Old “stiction” problems, and drives that die after sitting powered off for years.
  - Bearings failing more often on always‑on systems vs. rarely on systems that spin down.
- A statistical back‑of‑envelope estimate using Backblaze annualized failure rates (AFRs) suggests 24 drives lasting 10 years with zero failures is “lucky but not extraordinary,” especially once early infant‑mortality failures are past.
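The back‑of‑envelope above is easy to reproduce; the 1.4% AFR below is an assumed figure roughly in line with published Backblaze fleet averages, not a number from the thread:

```python
# Probability that an array of identical drives survives N years with zero
# failures, assuming a constant annualized failure rate (AFR) and
# independent failures. AFR of 1.4% is an assumed, Backblaze-like value.

def zero_failure_probability(drives: int, years: int, afr: float) -> float:
    """P(no failures) = (1 - AFR)^(drives * years)."""
    return (1.0 - afr) ** (drives * years)

p = zero_failure_probability(drives=24, years=10, afr=0.014)
print(f"P(24 drives, 10 years, zero failures) = {p:.1%}")  # ~3.4%
```

Note that real failures are not independent (the thread itself mentions correlated failures in same‑batch disks), so this simple model tends to be optimistic; it still shows that a zero‑failure decade is unusual but far from impossible.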
Use cases for large home storage
- Common uses: media libraries (Plex/Jellyfin), photography/video (terabytes per project), ML datasets and models, torrents, Docker, personal archiving of web content, social media art, and conference talks.
- Some systems are mostly cold storage: backups or archives powered on only for sync or access.
ZFS, data integrity & ECC
- Many emphasize ZFS scrubs with block‑level checksums as key for detecting bit rot; scrubs are easy to schedule.
- ZFS checksums are per record/block, not whole‑file cryptographic hashes; some commenters layer file‑level hashes on top for end‑to‑end verification.
- ECC RAM is repeatedly described as important for serious data integrity; others note ECC can be hard/expensive to deploy on consumer hardware.
- Some have personal horror stories of silent corruption on non‑checksummed filesystems, motivating ZFS.
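Layering file‑level hashes on top of ZFS block checksums, as some commenters describe, could be sketched like this (the manifest format and the choice of SHA‑256 are assumptions for illustration, not details from the thread):

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large media files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map relative file paths under `root` to their SHA-256 digests."""
    return {
        str(p.relative_to(root)): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Comparing manifests from two machines (e.g. the NAS and an off‑site copy) catches corruption that each side's block checksums cannot see across systems, which is the point of adding a file‑level layer.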
RAID levels, mirrors, and backups
- Strong reminder: RAID/ZFS ≠ backup. Still need offline/air‑gapped or off‑site copies to handle user error, ransomware, or catastrophic failures.
- Several argue parity RAID (RAID5/6, RAIDZ) is overused at home:
  - Slow, risky rebuilds on large drives; correlated failures among same‑batch disks.
  - Mirrored vdevs or simple volumes plus good backups are seen as simpler, safer, and easier to expand.
- Others defend RAID6/RAIDZ2 for larger arrays, but stress drive diversity and rotation.
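The mirror‑vs‑parity trade‑off reduces to simple arithmetic; the 24‑drive layouts and 4 TB drive size below are illustrative assumptions, not configurations from the article:

```python
# Usable capacity and per-vdev fault tolerance for two illustrative
# 24-drive ZFS layouts (drive count and size are assumed examples).

DRIVES = 24
SIZE_TB = 4  # assumed per-drive capacity

# Twelve 2-way mirror vdevs: half the raw capacity goes to redundancy,
# but each vdev only tolerates one failure.
mirror_usable = (DRIVES // 2) * SIZE_TB

# Two 12-drive RAIDZ2 vdevs: two parity drives per vdev, so more usable
# space and two-failure tolerance per vdev, at the cost of slower rebuilds.
raidz2_usable = 2 * (12 - 2) * SIZE_TB

print(f"mirrors: {mirror_usable} TB usable, 1 failure tolerated per vdev")
print(f"raidz2:  {raidz2_usable} TB usable, 2 failures tolerated per vdev")
```

The numbers make the debate concrete: mirrors give up a third of the usable space of this RAIDZ2 layout in exchange for much faster, lower‑risk resilvers and easy expansion two drives at a time.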
Power, noise, cooling, and UPS
- Power‑off strategy can save thousands in electricity over a decade for a 200W‑idle NAS, especially in high‑tariff regions.
- Large, slow fans and good fan control (PID loops) significantly reduce noise and fan power draw.
- UPSes are valued not just for enabling clean shutdowns but also for smoothing brownouts and voltage spikes; some consider skipping a UPS an unjustified risk, while others accept it for home use.
- Offline powered‑down backups are also used as ransomware protection.
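The “thousands over a decade” claim is easy to sanity‑check. The 200 W idle draw comes from the discussion; the 4 h/day duty cycle and €0.30/kWh tariff below are assumed example values:

```python
# Back-of-envelope electricity savings from powering a NAS down.
# 200 W idle is from the discussion; the duty cycle and tariff are
# assumed example values for a high-tariff region.

IDLE_KW = 0.200         # idle draw in kilowatts
HOURS_OFF_PER_DAY = 20  # assumed: NAS only runs ~4 h/day
TARIFF_EUR_KWH = 0.30   # assumed high-tariff electricity price
YEARS = 10

saved_kwh = IDLE_KW * HOURS_OFF_PER_DAY * 365 * YEARS
saved_eur = saved_kwh * TARIFF_EUR_KWH
print(f"~{saved_kwh:.0f} kWh avoided, ~EUR {saved_eur:.0f} over {YEARS} years")
```

Under these assumptions the power‑off strategy avoids roughly 14,600 kWh, on the order of €4,000 over the decade, consistent with the thread's claim.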
Filesystem alternatives & experimental tech
- btrfs: mixed reputation; some report past data loss, while others report years of stable use when avoiding its built‑in RAID layer and sticking to snapshots, compression, and checksums.
- bcachefs: seen as promising (checksums, flexible caching) but currently marked experimental; kernel maintainer concerns and early breakages make people cautious about production data.
- General sentiment: for long‑lived important data, ZFS (or at least a mature checksumming FS) on well‑understood hardware is still the conservative choice.