Show HN: Dut – a fast Linux disk usage calculator
Performance & Implementation Details
- Tool is designed as a fast, non-interactive `du` replacement, optimized with raw Linux syscalls (`getdents`, `statx`) and multi-threading.
- Profiling shows most time is spent in kernel directory/inode cache management rather than in syscall overhead; the author saw little benefit in switching from `fstatat` to `statx` in later tests.
- `io_uring` was considered but rejected: it doesn’t support `getdents`, mixing models is complex, and perf suggested little syscall overhead. Some commenters argue `io_uring` might still help and reference experiments showing speedups.
- Kernel parameter `vfs_cache_pressure` can dramatically change behavior on huge trees (millions of inodes), affecting the usefulness of the tool’s threading.
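
For readers who want to see the general shape of a multi-threaded scan, here is a minimal Python sketch. It is not the tool's implementation: the tool is written in C against raw `getdents`/`statx`, while this uses `os.scandir` and a thread pool. The names `scan_dir` and `disk_usage` are made up for illustration, and hardlinked files are counted once per link here.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def scan_dir(path):
    """Return (allocated_bytes, subdirectories) for one directory, without recursing."""
    total, subdirs = 0, []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    st = entry.stat(follow_symlinks=False)  # lstat-like, no symlink following
                except OSError:
                    continue  # entry vanished or is unreadable
                total += st.st_blocks * 512  # st_blocks is counted in 512-byte units
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
    except OSError:
        pass  # directory unreadable: count nothing below it
    return total, subdirs


def disk_usage(root, workers=4):
    """Sum allocated bytes under root, scanning directories on a thread pool."""
    total = os.lstat(root).st_blocks * 512
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(scan_dir, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                size, subdirs = fut.result()
                total += size
                # Each newly discovered directory becomes an independent task.
                pending |= {pool.submit(scan_dir, d) for d in subdirs}
    return total
```

Each task handles exactly one directory and hands its subdirectories back to the coordinating loop, so no worker ever blocks waiting on another worker.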
Filesystem Semantics (Reflinks, Sparse Files, Directory Accounting)
- Reflinks (e.g., `cp --reflink` on Btrfs) are not handled specially; sizes come directly from `statx`. This may double-count shared data.
- Sparse file disk usage is also read directly from `stat` family calls.
- Several comments wish filesystems stored per-directory usage, but others highlight the complexity: extra writes, contention on hot directories, hardlink correctness, crash recovery, and special virtual filesystems. Some distributed/cluster FSs (e.g., CephFS) reportedly do maintain approximate per-dir stats.
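
The gap between a file's apparent size and its allocated size is what the `stat` family reports for sparse files. The Python sketch below illustrates it with `os.stat`; the helper name `sizes` is hypothetical, and the exact allocated figure depends on the filesystem.

```python
import os


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) as reported by the stat family."""
    st = os.stat(path, follow_symlinks=False)
    # st_size is the logical length; st_blocks counts 512-byte units actually allocated.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        with open(path, "wb") as f:
            f.truncate(1 << 20)  # 1 MiB hole, no data written
        apparent, allocated = sizes(path)
        print(f"apparent={apparent} allocated={allocated}")
```

The same per-file view explains the reflink caveat: a tool that sums per-file block counts has no information about extents shared between files, so shared data can be counted once per file.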
Approximate / Sampling Approaches
- Some users want faster, approximate “who’s biggest” scans.
- Tool author argues you must visit all leaves to avoid missing huge files; others suggest sampling or partial stats, especially leveraging `getdents` data.
- Btrfs-specific tool `btdu` is cited as a successful sampling-based approach (random disk sampling → file paths), with known FS-specific limitations.
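
The statistical idea behind offset sampling can be shown with a toy model. This is not how `btdu` is implemented: a real tool asks the filesystem which file owns a sampled disk offset, whereas the hypothetical `sample_usage` below looks offsets up in an in-memory list of `(path, size)` pairs that stands in for the on-disk layout. It assumes a non-empty list with a nonzero total size.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate


def sample_usage(files, samples=10000, seed=0):
    """Estimate bytes per top-level directory by sampling random byte offsets.

    files: list of (path, size_in_bytes); a stand-in for the real disk layout.
    """
    rng = random.Random(seed)
    ends = list(accumulate(size for _, size in files))  # cumulative end offsets
    total = ends[-1]
    hits = Counter()
    for _ in range(samples):
        offset = rng.randrange(total)
        # File i owns the offsets in [ends[i-1], ends[i]).
        path, _ = files[bisect.bisect_right(ends, offset)]
        hits[path.split("/")[0]] += 1
    # A directory's share of the hits approximates its share of the bytes.
    return {top: total * n / samples for top, n in hits.items()}
```

Large consumers show up after few samples, while small files may never be hit. That is the trade-off the thread describes: a fast "who's biggest" answer in exchange for exact totals.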
UI, Output Format, and Features
- Not interactive; multiple users still prefer ncdu-/gdu-style TUIs with delete commands, treemaps, or flamegraph-like views.
- The “tree that grows upward” sorting (children above parents, sorted by size) confuses some; comparisons are made to `dust`, which uses similar logic but clearer ASCII art.
- Hidden files are included if permissions allow; symlinks are not followed. A surprising case is that a symlinked directory with trailing `/` is not traversed (unlike `du`).
- Requests for diffing between runs, caching to avoid full rescans, and various graphical/treemap front-ends (CLI and GUI) appear frequently.
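
To make the "children above parents" ordering concrete, here is a small Python sketch of one way such a listing could be produced. The function `render`, the tuple-based tree, and the choice of ascending order (largest child nearest its parent) are assumptions for illustration, not the tool's actual output format.

```python
def render(name, node, depth=0):
    """Yield lines for a (size, children) tree, printing children above their parent."""
    size, children = node
    # Smallest child first, so the largest ends up directly above the parent line.
    for child_name, child in sorted(children.items(), key=lambda kv: kv[1][0]):
        yield from render(child_name, child, depth + 1)
    yield f"{size:>8} {'  ' * depth}{name}"


if __name__ == "__main__":
    tree = (300, {"a": (200, {"x": (150, {}), "y": (50, {})}), "b": (100, {})})
    print("\n".join(render("root", tree)))
```

Read bottom-up, the root's total comes last and the biggest consumers sit closest to it, so they stay visible at the end of terminal output.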
Platform, Build, and Language Choices
- Linux-only due to Linux-specific syscalls; does not work on macOS.
- Build issues noted around missing `-pthread`/`-lpthread`.
- Choice of C over Rust/Zig triggers debate: performance, syscall ease, flexible array members vs. Rust’s DST limitations, and differing views on security vs. practicality for a local tool.
Comparisons & Alternatives
- Benchmarks (per README) show it significantly faster than GNU `du` and faster than or comparable to tools like `dua`, but users also praise `ncdu`, `gdu`, `diskonaut`, and various shell/`du`/`sort` scripts.
- Some Windows users compare it conceptually to NTFS tools like WizTree/WinDirStat, which read FS tables directly; applicability of that model to Linux filesystems is debated.