Why not parse `ls` and what to do instead

Why parsing ls is discouraged

  • Core objection: ls is a human-oriented formatter; turning its output back into machine data is fragile and needless extra work.
  • Typical anti-pattern: for f in $(ls) flattens the list of filenames into a single string and then re-splits it on whitespace, which breaks names containing spaces or newlines and mishandles control characters and glob characters.
  • The shell already exposes directory data structurally via globs (for f in *) and tools like find, so parsing ls is considered a “wrong obvious solution” that beginners gravitate to.
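
The difference between the two loops can be seen in a small sketch. The temporary directory and the file names are made up for illustration, and the default IFS is assumed.

```shell
#!/usr/bin/env bash
# Sketch: count loop iterations when splitting `ls` output vs. using a glob.
# demo_dir and the file names are hypothetical, created only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/two words.txt"

ls_count=0
for f in $(ls "$demo_dir"); do    # unquoted substitution: split on whitespace
    ls_count=$((ls_count + 1))
done

glob_count=0
for f in "$demo_dir"/*; do        # each match stays a single word
    glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count items, glob loop: $glob_count items"
# prints "ls loop: 3 items, glob loop: 2 items"

rm -rf "$demo_dir"
```

The ls loop sees "two" and "words.txt" as separate items, while the glob loop sees the two real files.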

Preferred Unix-side alternatives

  • Use shell globs, with Bash options such as nullglob or failglob to handle patterns that match nothing.
  • Use find for recursion, filtering, and safe execution: -exec, -print0 | xargs -0, -printf for custom formats, and sometimes -regex.
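  • For tricky pipelines, while IFS= read -r -d '' loops fed by find -print0 are recommended (in Bash the delimiter must be written as '', since '\0' would make the backslash the delimiter), though commenters note these quickly become complex and easy to get subtly wrong.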
  • Some argue “always use find rather than for + glob” for serious scripts.
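
A minimal sketch of these alternatives follows. It assumes Bash and a find that supports -print0 (GNU and BSD find both do). The directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: nullglob, a NUL-delimited read loop, and find -exec.
# demo_dir and its files are made up for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.log" "$demo_dir/b c.log" "$demo_dir/with"$'\n'"newline.log"

# 1. Glob with nullglob: an unmatched pattern expands to nothing
#    instead of staying in the list as a literal string.
shopt -s nullglob
tmp_files=("$demo_dir"/*.tmp)
log_files=("$demo_dir"/*.log)
shopt -u nullglob
echo "tmp: ${#tmp_files[@]}, log: ${#log_files[@]}"   # prints "tmp: 0, log: 3"

# 2. find -print0 feeding a NUL-delimited read loop.
#    IFS= keeps leading/trailing whitespace, -r keeps backslashes,
#    -d '' makes read stop at each NUL byte.
found=()
while IFS= read -r -d '' path; do
    found+=("$path")
done < <(find "$demo_dir" -type f -name '*.log' -print0)
echo "find: ${#found[@]}"                             # prints "find: 3"

# 3. find -exec passes names straight to the command as arguments,
#    so no text parsing happens at all.
find "$demo_dir" -type f -name '*.log' -exec wc -c {} +

rm -rf "$demo_dir"
```

The loop reads from process substitution rather than a pipe, so the found array is filled in the current shell and is still available after the loop.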

Structured/modern shells and higher-level languages

  • Several posters advocate PowerShell, Nushell, or object-stream shells (e.g. Python-based) where ls returns structured objects instead of text, eliminating many parsing issues.
  • Enthusiasts highlight easier filtering, sorting, and type-aware pipelines; critics say these shells create separate ecosystems, lack traditional features, and aren’t installed everywhere.
  • Strong pro-Python contingent: shell is seen as a brittle glue language; Python (or similar) is preferred once scripts stop being trivial. Others counter that shell is faster to start, ubiquitous, and often “good enough”.

Filename edge cases and robustness

  • Debate centers on how much to care about filenames with spaces, newlines, control characters, or odd Unicode.
  • Some insist robust tools must handle all legal bytes except / and NUL; others argue such names are rare in their domains and not worth the complexity.
  • There is interest in restricting filename character sets where possible (e.g., internal systems) but recognition that networked filesystems and other platforms prevent relying on that globally.
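
Where a restricted character set can be enforced, a check along these lines is one option. This is a sketch only: the function name is_portable_name is hypothetical, and the rule used here is the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore, hyphen) with no leading hyphen, which is an assumption rather than a policy anyone in the discussion proposed.

```shell
#!/usr/bin/env bash
# Sketch: accept only names made of the POSIX portable filename characters.
# is_portable_name is a hypothetical helper, not a standard utility.
is_portable_name() {
    local LC_ALL=C                     # make A-Z, a-z, 0-9 plain ASCII ranges
    case $1 in
        '')                 return 1 ;;  # empty name
        -*)                 return 1 ;;  # leading hyphen looks like an option
        *[!A-Za-z0-9._-]*)  return 1 ;;  # any character outside the set
        *)                  return 0 ;;
    esac
}

for name in "report_2024.txt" "two words.txt" "-rf" $'line\nbreak'; do
    if is_portable_name "$name"; then
        printf 'ok:       %q\n' "$name"
    else
        printf 'rejected: %q\n' "$name"
    fi
done
```

Only the first name passes. Such a check helps at the point where files are created or accepted, and it does not remove the need for safe handling of names that arrive from elsewhere.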

Standardized machine-readable output and teaching

  • Multiple people want a uniform --json (or CSV-like) output mode for core utilities (ls, find, df, mount, stat, etc.) to simplify safe parsing; others respond that real programs should call APIs like readdir instead.
  • Skepticism that a coordinated JSON standard across all Unix tools is realistically achievable.
  • For teaching, ls | ... is acknowledged as a natural early pattern; later, instructors must “unteach” parsing ls and introduce globs, find, and safer patterns.