OpenZFS deduplication is good now and you shouldn't use it
Where ZFS dedup helps vs. where it doesn’t
- Strong wins reported for:
  - Many similar VMs / templates on shared storage (classic enterprise use; also some home labs).
  - Highly duplicated build inputs or archives (build pools, personal “dumping ground” archives, the Nix store, Flatpak/OSTree-like setups).
- Some users see ~3–8x space savings in these narrow workloads, sometimes enough to make NVMe storage economically viable.
- Many commenters confirm that “general purpose” desktop/laptop or mixed file server workloads show little benefit.
- Logs and text usually benefit far more from compression than from dedup.
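Since the benefit hinges on how much of the data is actually duplicated, it is worth measuring before enabling anything. A minimal sketch of a whole-file duplication survey using only standard tools; the directory and file names are made-up examples, and real data would be sampled in place:

```shell
# Hedged sketch: sample a directory tree for whole-file duplicates before
# deciding whether dedup is worth enabling. Files below are hypothetical.
dir=$(mktemp -d)
printf 'same contents\n'   > "$dir/a.img"
printf 'same contents\n'   > "$dir/b.img"   # whole-file duplicate of a.img
printf 'unique contents\n' > "$dir/c.img"

# Hash every file; files sharing a hash are whole-file duplicates.
# -w64 makes uniq compare only the 64-hex-char SHA-256 field.
find "$dir" -type f -exec sha256sum {} + | sort | uniq -c -w64 | sort -rn

# Size of the largest duplicate group (2 here: a.img and b.img).
largest=$(find "$dir" -type f -exec sha256sum {} + | sort | uniq -c -w64 | sort -rn | awk 'NR==1{print $1}')
echo "largest duplicate group: $largest files"
rm -rf "$dir"
```

This only catches identical whole files, not shared blocks inside differing files, so it understates what block-level dedup could find; but if even this shows little duplication, dedup is unlikely to pay off.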
Cost, RAM, and performance concerns
- Traditional ZFS inline dedup requires a large in-RAM dedup table (DDT); a widely cited rule of thumb is several GB of RAM per TB of data, depending on block size.
- If the table spills to disk, performance can collapse “to nearly zero.”
- Every write/free triggers table lookups and updates, even when there is no duplicate, so random or mostly-unique data pays persistent overhead.
- Block-level, fixed-size dedup means partial overlaps or misaligned repeated assets are missed.
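The rule of thumb can be sanity-checked with back-of-envelope arithmetic. A sketch assuming a hypothetical 10 TiB of unique data, the 128 KiB default recordsize, and a rough ~320 bytes per in-core DDT entry (an often-quoted figure, not an OpenZFS constant to rely on):

```shell
# Back-of-envelope DDT sizing. All numbers are illustrative assumptions.
pool_bytes=$(( 10 * 1024 * 1024 * 1024 * 1024 ))  # hypothetical 10 TiB of unique data
recordsize=$(( 128 * 1024 ))                       # default 128 KiB records
entry_bytes=320                                    # rough per-entry in-core cost

entries=$(( pool_bytes / recordsize ))             # one DDT entry per block
ddt_bytes=$(( entries * entry_bytes ))
echo "$entries DDT entries, ~$(( ddt_bytes / 1024 / 1024 / 1024 )) GiB of RAM"
```

At these assumed figures that is roughly 2.5 GiB of RAM per TiB of data; halving the block size doubles the table, so an 8 KiB zvol volblocksize would need about 16x more, which is why small-block workloads are where the table spills to disk first.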
Desire for offline / lazy dedup
- Several people want “lazy” or scrub-time dedup to avoid write-path penalties.
- Others note this would require block pointer rewrite across snapshots, which ZFS’ Merkle-tree design effectively forbids.
- Workarounds discussed:
- Separate datasets: write to non-dedup dataset, later move to dedup-enabled one.
- Userspace “offline dedup” with hardlinks or reflinks (rdfind, jdupes, duperemove) once ZFS exposes the right syscalls.
- Planned/desired tools that scan for identical file ranges and convert them to cloned blocks.
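The hardlink-based workaround can be illustrated in a few lines of shell, in the spirit of rdfind/jdupes but heavily simplified. This is a toy sketch on temporary files; real tools also verify duplicates byte-for-byte and handle permissions, xattrs, and concurrent modification:

```shell
# Minimal userspace "offline dedup" sketch using hardlinks. Toy data only.
dir=$(mktemp -d)
printf 'payload\n' > "$dir/one"
printf 'payload\n' > "$dir/two"     # duplicate; will become a hardlink to "one"
printf 'other\n'   > "$dir/three"

# Group files by SHA-256; hardlink every later file in a group to the first.
find "$dir" -type f -exec sha256sum {} + | sort | \
while read -r hash path; do
    if [ "$hash" = "$prev_hash" ]; then
        ln -f "$prev_path" "$path"          # replace duplicate with a hardlink
    else
        prev_hash=$hash prev_path=$path     # start a new group
    fi
done

links=$(stat -c '%h' "$dir/one")    # link count is now 2
echo "link count of one: $links"
rm -rf "$dir"
```

Note the limitation the thread points out: hardlinked files share one identity, so modifying one modifies "both"; reflink-based dedup (once the filesystem exposes it) avoids that by copy-on-write.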
Reflinks, block cloning, and alternatives
- Many argue modern block cloning / reflinks (BRT, copy_file_range, cp --reflink=auto) provide most of the practical benefit:
  - Cheap, instantaneous “copies” when the system knows an operation is a copy (VM templates, file copies, containers, Flatpak).
  - No global dedup table; overhead is proportional to actual clones.
- Consensus: enable ZFS compression almost everywhere; consider dedup only for very specific, proven-high-duplication workloads.
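The reflink path is easy to try from the command line. A sketch using GNU cp; --reflink=auto requests a block clone where the filesystem supports it (Btrfs, XFS, OpenZFS with block cloning enabled) and silently falls back to an ordinary copy elsewhere, so this runs anywhere with GNU coreutils:

```shell
# Reflink-style copy sketch: a clone where supported, a plain copy otherwise.
src=$(mktemp)
head -c 1048576 /dev/urandom > "$src"   # 1 MiB of example data
cp --reflink=auto "$src" "$src.clone"   # block clone on CoW filesystems
cmp -s "$src" "$src.clone" && identical=yes
echo "clone matches source: $identical"
rm -f "$src" "$src.clone"
```

On a filesystem that honors the reflink, the "copy" completes without duplicating the data blocks, which is exactly the VM-template and container case where inline dedup was traditionally pitched.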
Enterprise arrays vs. filesystems
- Some report 3:1–6:1+ savings with enterprise arrays (Pure, Dell/EMC, Nimble) and Windows Server dedup.
- Others point out:
- Arrays often use smaller blocks, offline or background dedup, and different economics (power, rack space, controller cost).
- Filesystem-level inline dedup is harder to make generally cheap and safe.
Other themes
- Security: concern about cross-tenant information leaks via dedup (timing/side channels), echoing earlier attacks on memory deduplication.
- Snapshots: dedup or clone changes don’t reclaim space until old snapshots referencing blocks are removed.
- Encryption: stacking ZFS on dm-crypt/LUKS avoids ZFS’s own encryption quirks, but precludes dedup below the encryption layer, since encrypted blocks are effectively unique.