2024-06-04

Entropy, a CLI that scans files to find high entropy lines (might be secrets)

Overall reception and use cases

Many find the CLI useful as a quick audit step on inherited or legacy code to gauge “how much pain” to expect from secret leaks.
Several see it as a “last line of defense” rather than primary protection; it should complement, not replace, strong secret-management and credential rotation.
Some worry such tools could give a false sense of security, but others argue any extra layer helps given how low the baseline often is.

Comparison to existing tools

Multiple alternatives are mentioned: tartufo, trufflehog, detect-secrets, semgrep-secrets, PyWhat, noseyparker, gitleaks, ggshield.
Some commenters think specialized, battle-tested secret scanners outperform naive entropy-based tools, though an entropy check is more general because it does not depend on known token formats.

How “high entropy” is calculated and limits

The tool appears to estimate entropy from per-line character frequency (Shannon-style). High-entropy lines (hard to compress) often indicate random-looking tokens and secrets.
Human-chosen passwords and multi-word passphrases may evade detection: their character-level entropy is low, even when a long passphrase is strong overall.
Several commenters call the phrase “entropy of a known string” mathematically loose, since entropy is a property of a distribution rather than of a single string; they suggest that what is really being approximated is Kolmogorov complexity, via compressibility.
There’s discussion of better methods: dictionaries tuned to natural language/source code, shared compression dictionaries, or statistical randomness tests.
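As a rough illustration of the per-line, character-frequency estimate described in this section, here is a minimal Python sketch. It is not the tool's actual code: the function names, the 4.5 bits-per-character threshold, and the 16-character minimum length are all hypothetical values chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(line: str) -> float:
    """Bits per character, estimated from character frequencies within the line."""
    if not line:
        return 0.0
    counts = Counter(line)
    total = len(line)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flag_lines(lines, threshold=4.5, min_length=16):
    """Yield (line_number, entropy, text) for lines at or above the threshold.

    threshold and min_length are hypothetical defaults, not values from the tool.
    """
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if len(stripped) < min_length:
            continue  # very short lines give unreliable frequency estimates
        score = shannon_entropy(stripped)
        if score >= threshold:
            yield number, score, stripped
```

With these assumed settings, a line such as `key = 9f8aK2xLpQ7mZ3vT1bN6yR4cW0sD5eHj` is flagged, while `password = 'correct horse battery staple'` is not, which is the passphrase blind spot noted above.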

Alternative detection strategies

Some propose using file or repo-level compression ratios (gzip, zstd, xz) as a proxy for entropy instead of per-line character counts.
Others suggest language-model–based approaches that flag tokens or spans that are highly “surprising” in context, which could distinguish true secrets from common constants like Base62 alphabets.
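The compression-ratio proxy can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the standard-library `zlib` (DEFLATE) rather than zstd or xz, the function names are hypothetical, and the optional preset dictionary is one possible reading of the "shared compression dictionaries" idea from the thread.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size; near or above 1.0 means incompressible."""
    if not data:
        return 0.0
    return len(zlib.compress(data, level)) / len(data)


def ratio_with_dictionary(data: bytes, dictionary: bytes, level: int = 9) -> float:
    """Same ratio, but the compressor is primed with a preset dictionary.

    The dictionary would hold text typical of the corpus (hypothetically,
    common source-code boilerplate), so ordinary lines compress well and
    random-looking tokens stand out.
    """
    if not data:
        return 0.0
    compressor = zlib.compressobj(level=level, zdict=dictionary)
    compressed = compressor.compress(data) + compressor.flush()
    return len(compressed) / len(data)
```

One caveat with this approach: the zlib container adds a fixed header and checksum, so very short inputs can score above 1.0 regardless of content. That overhead is one reason to measure at file or repository level rather than per line.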

Security, distribution, and performance concerns

Some are skeptical about running random precompiled binaries for a security tool; they prefer building from source or running it inside a container.
Packaging is discussed (Homebrew taps, Docker images, static Go binaries), as is whether the scanner should ignore or specially handle compressed archives.

Feature ideas and limitations

Requests for: reading .gitignore, scanning full git history, and more sophisticated strategies (e.g., complexity-based metrics).
Acknowledgment that binary/compressed files must be ignored or specially handled, or results will be meaningless.
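One way a scanner might skip binary and compressed files is to sniff the first bytes of each file before scoring it. The sketch below is an assumption about how this could be done, not a description of the tool: the function names and the 8192-byte sniff size are hypothetical, and the check combines a few well-known magic numbers with the common "contains a NUL byte" heuristic.

```python
# Leading magic numbers for a few common compressed formats.
COMPRESSED_MAGIC = (
    b"\x1f\x8b",                  # gzip
    b"PK\x03\x04",                # zip
    b"\xfd\x37\x7a\x58\x5a\x00",  # xz
    b"\x28\xb5\x2f\xfd",          # zstd
)


def looks_binary(head: bytes) -> bool:
    """Guess from a file's leading bytes whether it is binary or compressed."""
    if head.startswith(COMPRESSED_MAGIC):
        return True
    # Text files rarely contain NUL bytes; most binary formats do.
    return b"\x00" in head


def should_scan(path, sniff_bytes: int = 8192) -> bool:
    """Return True if the file at path looks like text worth scoring."""
    with open(path, "rb") as handle:
        return not looks_binary(handle.read(sniff_bytes))
```

The heuristic is deliberately cheap and will misclassify some files (for example, UTF-16 text contains NUL bytes), so a real scanner would likely pair it with an explicit include or exclude list.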
