YAML document from hell (2023)

YAML Footguns and Implicit Typing

  • Many comments focus on the “Norway problem” (unquoted no, on, off, etc. being coerced to booleans, so Norway's country code NO becomes false) and sexagesimal parsing (22:22 read as the base-60 number 1342 rather than a string), calling these choices “too clever” and overfitted to niche use cases.
  • Kubernetes still effectively uses YAML 1.1, so these traps still bite in practice; a long-standing issue about upgrading to YAML 1.2 remains unresolved.
  • People highlight confusion from unquoted strings, non-string keys, and invisible structure (indentation as punctuation), making large files hard to reason about.
  • Anchors/aliases are seen as powerful by some (e.g. deduplicating Kubernetes/Helm configs) and as confusing or unreadable by others.

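To make the first two traps concrete, here is a minimal Python sketch of how a YAML 1.1 resolver might type an unquoted scalar. It assumes the boolean spellings and base-60 integer pattern from the YAML 1.1 type repository and handles only those plus plain decimal integers (octal, floats, nulls and timestamps are left out). `resolve_plain_scalar` is a hypothetical helper, not any parser's API, and real parsers differ in details (some omit the single-letter y/n forms, for example).

```python
import re

# Boolean spellings from the YAML 1.1 "bool" type.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

# Base-60 ("sexagesimal") integers from the YAML 1.1 "int" type, e.g. 22:22 or 1:30:00.
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")
# Plain decimal integers only (octal, hex, floats, nulls and timestamps are omitted here).
DECIMAL = re.compile(r"[-+]?(?:0|[1-9][0-9_]*)")


def resolve_plain_scalar(text: str):
    """Type a scalar roughly as a YAML 1.1 resolver would (hypothetical helper, subset only)."""
    # Quoted scalars are never reinterpreted: the "quote everything" escape hatch.
    # (Escape sequences inside quotes are ignored in this sketch.)
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    if SEXAGESIMAL.fullmatch(text):
        sign = -1 if text.startswith("-") else 1
        value = 0
        # Each colon-separated group is one base-60 digit.
        for digit in text.lstrip("+-").replace("_", "").split(":"):
            value = value * 60 + int(digit)
        return sign * value
    if DECIMAL.fullmatch(text):
        return int(text.replace("_", ""))
    return text


print(resolve_plain_scalar("no"))     # False (the Norway problem)
print(resolve_plain_scalar('"no"'))   # no    (quoted, so it stays a string)
print(resolve_plain_scalar("22:22"))  # 1342  (22 * 60 + 22)
print(resolve_plain_scalar("NO2"))    # NO2   (not an exact match, so it stays a string)
```

The point of the sketch is that the type depends on the exact spelling of an unquoted value, which is why the same key can silently change type between two otherwise similar files.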
Workarounds and Linting

  • A recurring recommendation is to “quote everything” (values, and often keys) plus use linting tools (e.g. yamllint) to avoid most traps.
  • Others argue that once you require pervasive quoting, YAML’s main advantages (terse, delimiter-free syntax with heuristic parsing) are undermined.
  • Type-aware deserialization and strict type checking are suggested as safer than auto-fixing or custom YAML dialects baked into parsers.

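Type-aware deserialization can be sketched with a standard-library dataclass: the schema declares what each field should be, and loading fails loudly instead of accepting whatever the YAML resolver produced. `ServerConfig` and `load_strict` are hypothetical names, and the input dicts are hand-written stand-ins for parser output.

```python
from dataclasses import dataclass, fields


@dataclass
class ServerConfig:
    # Hypothetical schema: every field's expected type is declared up front.
    country: str
    release: str
    port: int


def load_strict(raw: dict) -> ServerConfig:
    """Build a ServerConfig from parsed data, rejecting unknown, missing or mistyped keys."""
    expected = {f.name: f.type for f in fields(ServerConfig)}
    unknown = set(raw) - set(expected)
    if unknown:
        # repr() because YAML 1.1 keys may not even be strings (e.g. an unquoted "on" key).
        raise KeyError(f"unknown keys: {sorted(map(repr, unknown))}")
    for name, typ in expected.items():
        if name not in raw:
            raise KeyError(f"missing key: {name}")
        value = raw[name]
        # bool is a subclass of int in Python, so a coerced yes/no must be caught explicitly.
        if not isinstance(value, typ) or (isinstance(value, bool) and typ is not bool):
            raise TypeError(f"{name}: expected {typ.__name__}, got {type(value).__name__} {value!r}")
    return ServerConfig(**raw)


print(load_strict({"country": "no", "release": "22:22", "port": 8080}))
# What an unquoted YAML 1.1 file may yield for the same settings:
try:
    load_strict({"country": False, "release": 1342, "port": 8080})
except TypeError as err:
    print(err)  # country: expected str, got bool False
```

Rejecting the value, rather than converting False back to "no", matches the argument above: the loader cannot know the original spelling (no, No, NO, off...), so guessing would just be another heuristic.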
Tooling Experiences (Ansible, Kubernetes, etc.)

  • Ansible is praised for low-friction onboarding (no agents, easy to start) but criticized for:
    • Poor scalability and performance on large fleets.
    • Painful debugging and whitespace/templating complexity (Jinja2 + YAML).
  • Some use Ansible only to bootstrap more scalable tools (Puppet, Terraform); others report good results with techniques like job slicing.
  • YAML-heavy ecosystems like Kubernetes and GitLab are cited as places where these quirks regularly surface.

Alternatives and Subsets

  • Many alternatives are mentioned, none clearly dominant:
    • JSON and its comment-friendly variants (JSON5, JSONC, HuJSON), TOML, INI, XML, S-expressions.
    • Config DSLs: HCL, Nix, CUE, Dhall, Jsonnet, Starlark, KDL, Pkl, Lua tables, PHP arrays.
    • YAML subsets / replacements: StrictYAML, HUML, and the author’s own RCL.
  • Trade-offs:
    • JSON is stable, ubiquitous, and good for interchange but poor for hand-editing without comments.
    • TOML is seen as good for small configs but awkward for deep nesting, and is criticized for allowing multiple equivalent ways to write the same data.
    • HCL/CUE/Dhall/Nix-style languages add types and logic but are heavier and less widely supported.
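Python's standard `json` module shows both sides of the JSON trade-off: types come from explicit syntax, so "no" and "22:22" stay strings and a bare word is a syntax error, but there is no comment syntax for hand-written notes. `is_standard_json` is a small hypothetical helper for the demonstration.

```python
import json


def is_standard_json(text: str) -> bool:
    """True if the stdlib json module accepts the text as-is (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Types are explicit: quoted values stay strings, and true is the only "yes" spelling.
print(json.loads('{"country": "no", "release": "22:22", "enabled": true}'))
# {'country': 'no', 'release': '22:22', 'enabled': True}

# A bare word is rejected instead of being silently coerced...
print(is_standard_json('{"country": no}'))                # False
# ...but so is a comment, which is what makes plain JSON tedious to hand-edit.
print(is_standard_json('{"country": "no"  // Norway\n}'))  # False
```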

Why YAML Persisted

  • Explanations include:
    • Human readability and easy hand-editing, especially for configs.
    • Support for comments (unlike standard JSON).
    • Greater expressiveness (anchors, complex structures) than simple INI-like formats.
  • Several commenters still consider YAML “irreparably broken” and advocate migration toward JSON(+comments) or simpler, stricter formats, but acknowledge ecosystem inertia (Kubernetes, existing configs) makes that difficult.