Entropy, a CLI that scans files to find high entropy lines (might be secrets)
Overall reception and use cases
- Many find the CLI useful as a quick audit step on inherited or legacy code to gauge “how much pain” to expect from secret leaks.
- Several see it as a “last line of defense” rather than primary protection; it should complement, not replace, strong secret-management and credential rotation.
- Some worry such tools could give a false sense of security, but others argue any extra layer helps given how low the baseline often is.
Comparison to existing tools
- Multiple alternatives are mentioned: tartufo, trufflehog, detect-secrets, semgrep-secrets, PyWhat, noseyparker, gitleaks, ggshield.
- Some commenters think specialized, battle-tested secret scanners outperform naive entropy-based tools, though entropy is more general.
How “high entropy” is calculated and limits
- The tool appears to estimate entropy from per-line character frequency (Shannon-style). High-entropy lines (hard to compress) often indicate random-looking tokens and secrets.
- Human-memorable passwords and multi-word passphrases may evade detection: their character-level entropy is low, even when the passphrase as a whole is strong.
- Several criticize treating “entropy of a known string” as mathematically loose, suggesting what’s really approximated is Kolmogorov complexity via compressibility.
- There’s discussion of better methods: dictionaries tuned to natural language/source code, shared compression dictionaries, or statistical randomness tests.
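The per-line, character-frequency estimate described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the 4.5 bits-per-character threshold, and the 16-character minimum length are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(line: str) -> float:
    """Estimate bits per character from character frequencies within one line."""
    if not line:
        return 0.0
    counts = Counter(line)
    total = len(line)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_lines(text: str, threshold: float = 4.5, min_length: int = 16):
    """Yield (line_number, score, line) for lines at or above the threshold.

    Both threshold and min_length are hypothetical defaults for this sketch.
    Very short lines are skipped because their frequency counts are too
    sparse to say anything useful.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if len(stripped) < min_length:
            continue
        score = shannon_entropy(stripped)
        if score >= threshold:
            yield number, score, stripped


if __name__ == "__main__":
    sample = (
        'password = "hunter2"\n'
        'api_key = "q7Zt2LxV9pRf4KdW8sYb3NcH6mJg5TaE"\n'
        'phrase = "correct horse battery staple"\n'
    )
    for number, score, line in high_entropy_lines(sample):
        print(f"line {number}: {score:.2f} bits/char: {line}")
```

In this sample only the random-looking API key line is flagged. The passphrase line scores well below the threshold, which illustrates the limitation raised above: a character-frequency measure cannot tell a strong multi-word passphrase from ordinary prose.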
Alternative detection strategies
- Some propose using file or repo-level compression ratios (gzip, zstd, xz) as a proxy for entropy instead of per-line character counts.
- Others suggest language-model–based approaches that flag tokens or spans that are highly “surprising” in context, which could distinguish true secrets from common constants like Base62 alphabets.
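The compression-ratio idea can be sketched with Python's standard `zlib` module, which stands in here for gzip-style compression (the DEFLATE algorithm is the same). The function name and the sample inputs are assumptions for illustration, and no particular cutoff is implied by the discussion.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size.

    Values near (or above) 1.0 mean the data barely compresses, which is
    what random-looking content such as keys and tokens tends to do.
    Values near 0.0 mean the data is highly repetitive.
    """
    if not data:
        return 0.0
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    source_like = b"for i in range(10):\n    print(i)\n" * 200
    random_like = os.urandom(4096)  # stand-in for dense secret material
    print(f"repetitive source: {compression_ratio(source_like):.3f}")
    print(f"random bytes:      {compression_ratio(random_like):.3f}")
```

One caveat with this approach: compressors add fixed overhead, so a single short line can come out larger than its input regardless of content. That is one reason the proposal is framed at file or repo level rather than per line.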
Security, distribution, and performance concerns
- Skepticism about running random precompiled binaries for a security tool; some prefer building from source or running inside containers.
- Discussion about packaging: Homebrew taps, Docker images, static Go binaries, and ignoring or handling compressed archives.
Feature ideas and limitations
- Requests for: reading .gitignore, scanning full git history, and more sophisticated strategies (e.g., complexity-based metrics).
- Acknowledgment that binary/compressed files must be ignored or specially handled, or results will be meaningless.
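One common way to skip binary and compressed files is to sniff the start of each file for NUL bytes before scanning it. The sketch below assumes that heuristic; it is not taken from the tool, and the function names and the 8000-byte sniff size are hypothetical choices for the example.

```python
def looks_binary(chunk: bytes) -> bool:
    """Heuristic: treat content containing a NUL byte as binary.

    Plain UTF-8 or ASCII text almost never contains NUL, while compressed
    archives, images, and executables almost always do early on.
    """
    return b"\x00" in chunk


def should_scan(path: str, sniff_bytes: int = 8000) -> bool:
    """Return True if the file at path looks like text worth scanning.

    Only the first sniff_bytes bytes are read, so large files stay cheap
    to classify. The default size is an assumption for this sketch.
    """
    with open(path, "rb") as handle:
        return not looks_binary(handle.read(sniff_bytes))
```

The heuristic has known blind spots: UTF-16 encoded text contains NUL bytes and would be skipped, so a scanner relying on it would need separate handling for such files.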