Entropy, a CLI that scans files to find high entropy lines (might be secrets)

Overall reception and use cases

  • Many find the CLI useful as a quick audit step on inherited or legacy code to gauge “how much pain” to expect from secret leaks.
  • Several see it as a “last line of defense” rather than primary protection; it should complement, not replace, strong secret-management and credential rotation.
  • Some worry such tools could give a false sense of security, but others argue any extra layer helps given how low the baseline often is.

Comparison to existing tools

  • Multiple alternatives are mentioned: tartufo, trufflehog, detect-secrets, semgrep-secrets, PyWhat, noseyparker, gitleaks, ggshield.
  • Some commenters think specialized, battle-tested secret scanners outperform naive entropy-based tools, though an entropy check is more general because it does not depend on known secret formats.

How “high entropy” is calculated and limits

  • The tool appears to estimate entropy from per-line character frequency (Shannon-style). High-entropy lines (hard to compress) often indicate random-looking tokens and secrets.
  • Human-memorable passwords and multi-word passphrases may evade detection: their character-level entropy is low, even when the passphrase as a whole is strong.
  • Several criticize the phrase “entropy of a known string” as mathematically loose, since entropy is a property of a distribution rather than of a single string; what is really being approximated is Kolmogorov complexity, via compressibility.
  • There’s discussion of better methods: dictionaries tuned to natural language/source code, shared compression dictionaries, or statistical randomness tests.
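
  The per-line character-frequency estimate described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the names shannon_entropy and flag_lines, the 4.5 bits-per-character threshold, and the 16-character minimum length are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(line: str) -> float:
    """Estimate bits per character from the character frequencies of one line."""
    if not line:
        return 0.0
    counts = Counter(line)
    total = len(line)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flag_lines(lines, threshold=4.5, min_length=16):
    """Yield (line_number, entropy, text) for lines at or above the threshold.

    threshold and min_length are hypothetical defaults, not values from the tool.
    """
    for number, text in enumerate(lines, start=1):
        stripped = text.strip()
        if len(stripped) < min_length:
            continue  # very short lines give unreliable frequency estimates
        entropy = shannon_entropy(stripped)
        if entropy >= threshold:
            yield number, entropy, stripped
```

  With these assumed settings, a line such as password = "correct horse battery staple" is not flagged, because it uses only 17 distinct characters and so cannot exceed about 4.1 bits per character. That is the passphrase blind spot noted above.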

Alternative detection strategies

  • Some propose using file or repo-level compression ratios (gzip, zstd, xz) as a proxy for entropy instead of per-line character counts.
  • Others suggest language-model–based approaches that flag tokens or spans that are highly “surprising” in context, which could distinguish true secrets from common constants like Base62 alphabets.
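
  The compression-ratio idea can be sketched with Python's standard zlib module (the DEFLATE algorithm that gzip uses). The function name compression_ratio is an assumption for illustration; zstd and xz would follow the same pattern with their own libraries.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size.

    Values near (or above) 1.0 mean the data barely compresses, which is
    what random-looking content such as keys and tokens tends to produce.
    """
    if not data:
        return 0.0
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    repetitive = b"config_value = true\n" * 200
    random_like = os.urandom(4000)
    print(f"repetitive text: {compression_ratio(repetitive):.3f}")
    print(f"random bytes:    {compression_ratio(random_like):.3f}")
```

  One reason this fits file-level or repo-level checks better than per-line ones: the compressed stream carries a fixed header and checksum, so very short inputs can show ratios above 1.0 regardless of their content.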

Security, distribution, and performance concerns

  • Some are skeptical of running unvetted precompiled binaries for a security tool and prefer building from source or running inside containers.
  • Discussion covers packaging (Homebrew taps, Docker images, static Go binaries) and whether to ignore or specially handle compressed archives.

Feature ideas and limitations

  • Requests for: reading .gitignore, scanning full git history, and more sophisticated strategies (e.g., complexity-based metrics).
  • Acknowledgment that binary/compressed files must be ignored or specially handled, or results will be meaningless.
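
  One possible way to skip binary and compressed files before measuring entropy is sketched below. It is a heuristic under stated assumptions, not how the tool works: should_skip and COMPRESSED_MAGIC are hypothetical names, the magic-number list covers only four formats, and the NUL-byte check is a common rule of thumb for "looks binary".

```python
# Leading magic numbers for a few common compressed formats.
COMPRESSED_MAGIC = (
    b"\x1f\x8b",          # gzip
    b"PK\x03\x04",        # zip
    b"\xfd7zXZ\x00",      # xz
    b"\x28\xb5\x2f\xfd",  # zstd
)


def should_skip(head: bytes) -> bool:
    """Return True if the first bytes of a file look compressed or binary."""
    if head.startswith(COMPRESSED_MAGIC):
        return True
    # Text files almost never contain NUL bytes; most binary formats do.
    return b"\x00" in head
```

  A caller would read the first few kilobytes of each file, pass them to should_skip, and only run the entropy check on files that return False.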