Why not parse `ls` and what to do instead
Why parsing ls is discouraged
- Core objection: `ls` is a human-oriented formatter; turning its output back into machine data is fragile and needless extra work.
- Typical anti-pattern: `for f in $(ls)` converts a list of filenames into a single string and then re-splits it. This mangles names that contain spaces, tabs, newlines, or other unusual characters. Because the substitution is unquoted, the pieces are also subject to glob expansion.
- The shell already exposes directory data structurally via globs (`for f in *`) and tools like `find`, so parsing `ls` is considered a “wrong obvious solution” that beginners gravitate to.
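A minimal bash sketch of the difference follows. It assumes a scratch directory from `mktemp -d` and two hypothetical files, `a b.txt` and `c.txt`. The `$(ls ...)` loop splits `a b.txt` into two items, while the glob loop sees exactly two files.

```bash
#!/usr/bin/env bash
# Scratch directory holding one name with a space and one without.
dir=$(mktemp -d)
touch "$dir/a b.txt" "$dir/c.txt"

# Anti-pattern: the unquoted $(ls ...) is word-split on whitespace,
# so "a b.txt" turns into two loop items: "a" and "b.txt".
ls_count=0
for f in $(ls "$dir"); do
  ls_count=$((ls_count + 1))
done

# Glob: the shell expands the pattern to one word per filename,
# with no re-splitting, so "a b.txt" stays a single item.
glob_count=0
for f in "$dir"/*; do
  glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count items, glob loop: $glob_count items"
rm -rf "$dir"
```

This prints `ls loop: 3 items, glob loop: 2 items`. Inside the loop body, `"$f"` still needs quoting wherever it is used.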
Preferred Unix-side alternatives
- Use shell globs with options like `nullglob`/`failglob` to handle missing matches robustly.
- Use `find` for recursion, filtering, and safe execution: `-exec`, `-print0 | xargs -0`, `-printf` for custom formats, and sometimes `-regex`.
- For tricky pipelines, `while IFS= read -r -d ''` loops fed by `find -print0` are recommended (an empty `-d` argument makes bash's `read` stop at NUL). Commenters note these quickly become complex and easy to get subtly wrong.
- Some argue for “always use `find` rather than `for` + glob” in serious scripts.
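A bash sketch of the glob-plus-`nullglob` and `find -print0` patterns above. The directory layout, the `*.log` and `*.txt` patterns, and the use of `wc -c` as the `-exec` target are hypothetical choices for illustration.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir -p "$dir/sub"
touch "$dir/one.txt" "$dir/sub/two words.txt"

# nullglob: a pattern with no matches expands to nothing
# instead of being left as the literal string "$dir/*.log".
shopt -s nullglob
logs=("$dir"/*.log)
shopt -u nullglob
echo "matched ${#logs[@]} .log files"

# find -print0 + NUL-delimited read: NUL cannot occur in a filename,
# so every record is exactly one path, whatever characters it contains.
txt_files=()
while IFS= read -r -d '' path; do
  txt_files+=("$path")
done < <(find "$dir" -type f -name '*.txt' -print0)
echo "found ${#txt_files[@]} .txt files recursively"

# The same traversal can hand batches of paths straight to a command.
find "$dir" -type f -name '*.txt' -exec wc -c {} +

rm -rf "$dir"
```

The `< <(...)` process substitution keeps the loop in the current shell, so `txt_files` is still populated after the loop. A `find ... | while read` pipeline would run the loop in a subshell instead.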
Structured/modern shells and higher-level languages
- Several posters advocate PowerShell, Nushell, or object-stream shells (e.g., Python-based ones) where `ls` returns structured objects instead of text, eliminating many parsing issues.
- Enthusiasts highlight easier filtering, sorting, and type-aware pipelines; critics say these shells create separate ecosystems, lack traditional features, and aren’t installed everywhere.
- Strong pro-Python contingent: shell is seen as a brittle glue language; Python (or similar) is preferred once scripts stop being trivial. Others counter that shell is faster to start, ubiquitous, and often “good enough”.
Filename edge cases and robustness
- Debate centers on how much to care about filenames with spaces, newlines, control chars, or odd Unicode.
- Some insist robust tools must handle all legal bytes except `/` and NUL; others argue such names are rare in their domains and not worth the complexity.
- There is interest in restricting filename character sets where possible (e.g., internal systems), but also recognition that networked filesystems and other platforms prevent relying on that globally.
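To make the edge case concrete, here is a bash sketch. It assumes a scratch directory and a hypothetical file whose name contains a newline. Newline-delimited processing counts that one file as two entries, while NUL-delimited processing counts it correctly.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
# $'...' lets bash embed a literal newline in the filename.
touch "$dir"/$'first\nsecond.txt' "$dir/plain.txt"

# Newline-delimited: the embedded newline splits one name into two lines.
# tr strips the padding some wc implementations add.
newline_count=$(find "$dir" -type f | wc -l | tr -d ' ')

# NUL-delimited: each record is one complete filename.
nul_count=0
while IFS= read -r -d '' _; do
  nul_count=$((nul_count + 1))
done < <(find "$dir" -type f -print0)

echo "newline-delimited: $newline_count, NUL-delimited: $nul_count"
rm -rf "$dir"
```

With two files on disk, the newline-delimited count comes out as 3 and the NUL-delimited count as 2.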
Standardized machine-readable output and teaching
- Multiple people want a uniform `--json` (or CSV-like) output mode for core utilities (`ls`, `find`, `df`, `mount`, `stat`, etc.) to simplify safe parsing; others respond that real programs should call APIs like `readdir` instead.
- Skeptics doubt that a coordinated JSON standard across all Unix tools is realistically achievable.
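Short of a `--json` mode, `find -printf` can already emit delimiter-safe records. The sketch below assumes GNU findutils, since `-printf` is not in POSIX, and uses hypothetical file names. Each file becomes a size/path pair with a NUL ending each field, and the pairs are read back with two NUL-delimited `read` calls.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'hello' > "$dir/five bytes.txt"   # 5 bytes, name contains a space
: > "$dir/empty.txt"                     # 0 bytes

# %s = size in bytes, %p = path; \0 terminates each field with NUL.
total=0
count=0
while IFS= read -r -d '' size && IFS= read -r -d '' path; do
  printf '%s bytes\t%s\n' "$size" "${path##*/}"
  total=$((total + size))
  count=$((count + 1))
done < <(find "$dir" -type f -printf '%s\0%p\0')

echo "$count files, $total bytes"
rm -rf "$dir"
```

This covers a single tool with a hand-picked format. It illustrates the gap the `--json` proponents describe rather than a uniform standard.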
- For teaching, `ls | ...` is acknowledged as a natural early pattern; later, instructors must “unteach” parsing `ls` and introduce globs, `find`, and safer patterns.