Show HN: Dut – a fast Linux disk usage calculator

Performance & Implementation Details

  • The tool is designed as a fast, non-interactive du replacement, optimized with raw Linux syscalls (getdents, statx) and multi-threading.
  • Profiling shows most of the time is spent in kernel directory/inode cache management rather than in syscall overhead; in later tests the author saw little benefit from switching from fstatat to statx.
  • io_uring was considered but rejected: it doesn’t support getdents, mixing io_uring with ordinary blocking syscalls is complex, and perf suggested little syscall overhead. Some commenters argue io_uring might still help and reference experiments showing speedups.
  • The kernel parameter vfs_cache_pressure can dramatically change behavior on huge trees (millions of inodes), which affects how useful the tool’s threading is.
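
The general idea of a threaded, non-interactive scan can be sketched in a few lines of Python. This is not the tool's implementation (which is C on raw syscalls); it is a minimal illustration that assumes one worker per top-level subdirectory is good enough, uses os.scandir in place of getdents, and counts allocated bytes as st_blocks * 512. The names tree_usage and parallel_usage are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def tree_usage(path):
    """Sum allocated bytes of everything below path (path itself excluded).

    Symlinks are not followed. Hardlinked files are counted once per link,
    which a real tool would have to deduplicate.
    """
    total = 0
    stack = [path]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    total += st.st_blocks * 512  # st_blocks is in 512-byte units
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # directory unreadable; skip it
    return total


def parallel_usage(root, workers=4):
    """Scan each top-level subdirectory of root in its own worker thread."""
    total = os.lstat(root).st_blocks * 512
    subdirs = []
    with os.scandir(root) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            total += st.st_blocks * 512
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total += sum(pool.map(tree_usage, subdirs))
    return total
```

Splitting only at the top level keeps the sketch free of deadlocks but balances poorly when one subdirectory dominates; whether threads help at all depends on how much of the tree is already in the kernel's caches, as the profiling notes above suggest.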

Filesystem Semantics (Reflinks, Sparse Files, Directory Accounting)

  • Reflinks (e.g., cp --reflink on Btrfs) are not handled specially; sizes come directly from statx. This may double-count shared data.
  • Sparse-file disk usage is likewise read directly from the stat-family calls.
  • Several comments wish filesystems stored per-directory usage, but others highlight complexity: extra writes, contention on hot directories, hardlink correctness, crash recovery, and special virtual filesystems. Some distributed/cluster FSs (e.g., CephFS) reportedly do maintain approximate per-dir stats.
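
To make the accounting points concrete, here is a small Python sketch of what stat-family calls report. It assumes Linux semantics, where st_blocks counts 512-byte units; the helper names are hypothetical and this is not the tool's code. The second function shows the usual hardlink fix (deduplicating on device and inode number). That trick does not catch reflinks, because reflinked copies are separate inodes that happen to share extents.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for one path, not following symlinks."""
    st = os.lstat(path)
    # A sparse file can have a large st_size while st_blocks stays small.
    return st.st_size, st.st_blocks * 512


def unique_allocated(paths):
    """Sum allocated bytes, counting each (device, inode) pair only once."""
    seen = set()
    total = 0
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another hardlink to an inode already counted
        seen.add(key)
        total += st.st_blocks * 512
    return total
```

A file created with truncate to 1 MiB and never written reports an apparent size of 1 MiB, while its allocated size depends on the filesystem and is typically far smaller.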

Approximate / Sampling Approaches

  • Some users want faster, approximate “who’s biggest” scans.
  • The tool’s author argues that you must visit every leaf to avoid missing huge files; others suggest sampling or partial stats, especially by leveraging getdents data.
  • Btrfs-specific tool btdu is cited as a successful sampling-based approach (random disk sampling → file paths), with known FS-specific limitations.
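
The tension between the two positions can be illustrated with a toy estimator. The sketch below is an assumption-laden illustration, not btdu's method (btdu samples disk locations and maps them back to paths) and not anything the tool does. It models a tree as nested dicts with integer file sizes and estimates the total by a random descent: pick one child at each level and scale by the number of siblings. The names exact_total and sampled_total are hypothetical.

```python
import random


def exact_total(node):
    """Visit every leaf: node is an int (file size) or a dict of children."""
    if isinstance(node, int):
        return node
    return sum(exact_total(child) for child in node.values())


def sampled_total(node, rng):
    """Estimate the total from a single random root-to-leaf descent.

    At each directory one child is chosen uniformly and its estimate is
    multiplied by the number of children, so the estimate is correct on
    average but a single run can be far off.
    """
    if isinstance(node, int):
        return node
    if not node:
        return 0
    children = list(node.values())
    return len(children) * sampled_total(rng.choice(children), rng)


# One huge file hidden among 999 tiny ones: most single samples miss it.
skewed = {f"small{i}": 1 for i in range(999)}
skewed["huge"] = 10**9
estimate = sampled_total(skewed, random.Random(0))
```

On the skewed directory a single sample returns either 1000 (the huge file was missed) or 10**12 (it was hit), while the exact total is 1000000999. Averaging many samples narrows the gap, which is the trade-off the commenters are debating.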

UI, Output Format, and Features

  • Not interactive; multiple users still prefer ncdu-/gdu-style TUIs with delete commands, treemaps, or flamegraph-like views.
  • The “tree that grows upward” sorting (children above parents, sorted by size) confuses some; comparisons are made to dust, which uses similar logic but clearer ASCII art.
  • Hidden files are included if permissions allow; symlinks are not followed. One surprise: a symlinked directory given with a trailing / is still not traversed, unlike with du.
  • Requests for diffing between runs, caching to avoid full rescans, and various graphical/treemap front-ends (CLI and GUI) appear frequently.
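
For readers puzzled by the upward-growing layout, the ordering can be sketched as a post-order walk: every child is printed before its parent, so the root ends up on the last line. The sketch below works on an in-memory tree of nested dicts and assumes ascending order within a directory, so the largest child sits closest to its parent; that ordering, the indentation, and the name render_upward are illustrative assumptions rather than the tool's actual output format.

```python
def total(node):
    """Size of a node: an int is a file, a dict is a directory of children."""
    if isinstance(node, int):
        return node
    return sum(total(child) for child in node.values())


def render_upward(name, node, depth=0):
    """Return output lines with children above their parent, smallest first."""
    lines = []
    if isinstance(node, dict):
        ordered = sorted(node.items(), key=lambda item: total(item[1]))
        for child_name, child in ordered:
            lines.extend(render_upward(child_name, child, depth + 1))
    # The parent's own line comes after all of its children.
    lines.append(f"{total(node):>6} {'  ' * depth}{name}")
    return lines


tree = {"src": {"main.c": 40, "util.c": 10}, "README": 5}
for line in render_upward(".", tree):
    print(line)
```

With this sample tree the names appear in the order README, util.c, main.c, src, and finally the root, which is why the biggest items and the grand total stay visible at the bottom of a terminal after a long scan.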

Platform, Build, and Language Choices

  • Linux-only due to Linux-specific syscalls; does not work on macOS.
  • Build issues noted around missing -pthread/-lpthread.
  • Choice of C over Rust/Zig triggers debate: performance, syscall ease, flexible array members vs. Rust’s DST limitations, and differing views on security vs. practicality for a local tool.

Comparisons & Alternatives

  • Benchmarks in the README show it significantly faster than GNU du and faster than or comparable to tools like dua. Users nonetheless praise ncdu, gdu, diskonaut, and various shell/du/sort scripts.
  • Some Windows users compare it conceptually to NTFS tools like WizTree/WinDirStat, which read FS tables directly; applicability of that model to Linux filesystems is debated.