OpenZFS deduplication is good now and you shouldn't use it

Where ZFS dedup helps vs. where it doesn’t

  • Strong wins reported for:
    • Many similar VMs / templates on shared storage (classic enterprise use; also some home labs).
    • Highly duplicated build inputs or archives (build pools, personal “dumping ground” archives, the Nix store, Flatpak/OSTree-like setups).
    • Some users see ~3–8x space savings in these narrow workloads, sometimes making NVMe storage economically viable.
  • Many commenters report that “general purpose” desktop/laptop or mixed file-server workloads show little benefit.
  • Logs and text usually benefit far more from compression than from dedup (a quick demonstration follows).
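
A hedged illustration of that last point (Python’s zlib stands in for ZFS’s LZ4/zstd, and the log line is invented; the principle is the same):

    import zlib

    # Highly repetitive text, as in typical logs.
    log = b"2024-05-01T12:00:00Z INFO request served status=200 dur=12ms\n" * 10_000

    compressed = zlib.compress(log, level=6)
    print(f"raw: {len(log):,} B, compressed: {len(compressed):,} B, "
          f"ratio: {len(log) / len(compressed):.0f}x")

Dedup could only collapse identical whole blocks of this data; a stream compressor exploits the redundancy directly, with no global table to maintain.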

Cost, RAM, and performance concerns

  • Traditional ZFS inline dedup requires a large in-RAM dedup table (DDT); the widely cited rule of thumb is several GB of RAM per TB of data, depending on average block size (see the estimate after this list).
  • If the table spills to disk, performance can collapse “to nearly zero.”
  • Every write/free triggers table lookups and updates, even when there is no duplicate, so random or mostly-unique data pays persistent overhead.
  • Dedup matches only whole, fixed-size blocks, so shifted, misaligned, or partially overlapping copies of the same data go undetected.
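
A back-of-the-envelope sketch of that RAM rule of thumb, assuming one DDT entry per unique block and the commonly cited ~320 bytes per in-core entry (real usage varies by pool layout and OpenZFS version):

    # Estimate classic in-RAM dedup-table (DDT) size for a given data size.
    def ddt_ram_bytes(data_bytes: int, avg_block_bytes: int,
                      entry_bytes: int = 320) -> int:
        n_entries = data_bytes // avg_block_bytes  # one entry per unique block
        return n_entries * entry_bytes

    TiB = 1024 ** 4
    for block_kib in (128, 64, 16):
        gib = ddt_ram_bytes(TiB, block_kib * 1024) / 1024 ** 3
        print(f"1 TiB at {block_kib} KiB blocks -> ~{gib:.1f} GiB of DDT")

At the default 128 KiB recordsize this lands around 2.5 GiB per TiB; small-block workloads (databases, zvols) push it far higher, which is why quoted figures span such a wide range.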

Desire for offline / lazy dedup

  • Several people want “lazy” or scrub-time dedup to avoid write-path penalties.
  • Others note this would require block-pointer rewrite across snapshots, which ZFS’s Merkle-tree design effectively forbids: every block pointer embeds the checksum of the block it points to, so rewriting one block would cascade up the tree and into every snapshot that references it.
  • Workarounds discussed:
    • Separate datasets: write to non-dedup dataset, later move to dedup-enabled one.
    • Userspace “offline dedup” with hardlinks or reflinks (rdfind, jdupes, duperemove) once ZFS exposes the right syscalls (a minimal scanner is sketched after this list).
    • Planned/desired tools that scan for identical file ranges and convert them to cloned blocks.
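
A minimal sketch of the userspace approach, roughly what rdfind/jdupes do: group files by size, confirm with a content hash, and report duplicate groups. A real tool would then hardlink, reflink, or issue a dedup ioctl on the matches; this sketch only prints them, and the path handling is illustrative:

    import hashlib
    import os
    import sys
    from collections import defaultdict

    def sha256_of(path: str, bufsize: int = 1 << 20) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def duplicate_groups(root: str):
        by_size = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                p = os.path.join(dirpath, name)
                if os.path.isfile(p) and not os.path.islink(p):
                    by_size[os.path.getsize(p)].append(p)
        # Hash only files whose sizes collide; most files are unique by size.
        for size, paths in by_size.items():
            if size == 0 or len(paths) < 2:
                continue
            by_hash = defaultdict(list)
            for p in paths:
                by_hash[sha256_of(p)].append(p)
            yield from (g for g in by_hash.values() if len(g) > 1)

    for group in duplicate_groups(sys.argv[1]):
        print("duplicates:", *group)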

Reflinks, block cloning, and alternatives

  • Many argue that modern block cloning / reflinks (ZFS’s Block Reference Table (BRT), copy_file_range, cp --reflink=auto) provide most of the practical benefit (see the copy sketch after this list):
    • Cheap, instantaneous “copies” when the system knows an operation is a copy (VM templates, file copies, containers, Flatpak).
    • No global dedup table; overhead is proportional to actual clones.
  • Consensus: enable ZFS compression almost everywhere; consider dedup only for very specific, proven-high-duplication workloads.
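
A minimal sketch of a clone-friendly copy, assuming Linux and Python 3.8+ (os.copy_file_range lets the kernel and filesystem satisfy the request however they can; on OpenZFS releases with block cloning, that can mean a near-instant clone instead of a byte-for-byte copy; filenames are illustrative):

    import os

    def clone_friendly_copy(src: str, dst: str) -> None:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            remaining = os.fstat(fsrc.fileno()).st_size
            offset = 0
            while remaining > 0:
                # The filesystem may clone these blocks rather than copy them.
                n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), remaining,
                                       offset_src=offset, offset_dst=offset)
                if n == 0:
                    break
                offset += n
                remaining -= n

    clone_friendly_copy("vm-template.img", "vm-new.img")

cp --reflink=auto does the same in practice, silently falling back to a plain copy when cloning isn’t supported.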

Enterprise arrays vs. filesystems

  • Some report 3:1–6:1+ savings with enterprise arrays (Pure Storage, Dell/EMC, Nimble) and with Windows Server’s built-in dedup.
  • Others point out:
    • Arrays often use smaller blocks, offline or background dedup, and different economics (power, rack space, controller cost).
    • Filesystem-level inline dedup is harder to make generally cheap and safe.

Other themes

  • Security: concern about cross-tenant information leaks via dedup (timing side channels that reveal whether a tenant’s data already exists on the system), echoing earlier attacks on memory-page deduplication.
  • Snapshots: deduplicating or cloning data doesn’t reclaim space until the older snapshots that still reference the original blocks are destroyed.
  • Encryption: stacking ZFS on dm-crypt/LUKS avoids ZFS’s own encryption quirks, but layers below the encryption see only ciphertext, so block-level dedup there is impossible.