YAML: The Norway Problem (2022)

Root cause and YAML versions

  • The “Norway problem” comes from YAML 1.1’s implicit typing rules (e.g., an unquoted NO is read as boolean false), and from libraries like PyYAML using the full 1.1 schema by default.
  • Workarounds mentioned:
    • Use loaders that treat every scalar as a string (e.g., PyYAML’s BaseLoader; its safe loader still applies the 1.1 implicit typing).
    • Switch to parsers that support YAML 1.2, where implicit typing is drastically reduced and conversion is configurable.
  • YAML 1.2 (published 2009) largely fixes this: its core schema recognizes only true/false as booleans, and schemas are selectable, down to a failsafe schema in which every scalar is a string. Many popular libraries (libyaml, PyYAML) still implement 1.1.
  • Some newer implementations (e.g., libfyaml) and tools like StrictYAML follow a string‑first + schema approach.
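
The difference between the two versions can be sketched in plain Python. The patterns below follow the boolean forms listed in the YAML 1.1 type repository and the YAML 1.2 core schema; the helper name resolve_plain_scalar is hypothetical, and the sketch models booleans only, not a full resolver.

```python
import re

# Boolean forms from the YAML 1.1 type repository and the YAML 1.2 core schema.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML12_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

TRUE_WORDS = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text, version="1.1"):
    """Resolve an unquoted scalar to bool or str (hypothetical helper, booleans only)."""
    pattern = YAML11_BOOL if version == "1.1" else YAML12_CORE_BOOL
    if pattern.fullmatch(text):
        return text.lower() in TRUE_WORDS
    return text


countries = ["SE", "DK", "NO", "FI"]
as_yaml11 = [resolve_plain_scalar(c, "1.1") for c in countries]
as_yaml12 = [resolve_plain_scalar(c, "1.2") for c in countries]
print(as_yaml11)  # ['SE', 'DK', False, 'FI']
print(as_yaml12)  # ['SE', 'DK', 'NO', 'FI']
```

Real parsers differ in detail; PyYAML, for instance, does not treat the single letters y and n as booleans.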

How often and where it bites

  • Many commenters have never hit the issue in years of YAML use; others saw it repeatedly in:
    • Country lists (geo IP whitelists, signup allowlists).
    • OpenAPI specs and cross‑team configs.
    • Ansible playbooks (countries, file modes, booleans).
    • Environment variables (e.g., platform auto‑coercing “true”).
  • It’s described as a “scissor bug”: invisible for most, catastrophic for the unlucky subset (Norway, “NA”, certain MACs, etc.).
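
Why only some MAC addresses are affected can be illustrated with the base-60 integer form from the YAML 1.1 int type. This is a minimal sketch, not a full resolver; the function name resolve_sexagesimal is hypothetical.

```python
import re

# Base-60 ("sexagesimal") integer pattern from the YAML 1.1 int type.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def resolve_sexagesimal(text):
    """Return the base-60 integer for a matching plain scalar, else the text unchanged."""
    if not SEXAGESIMAL_INT.fullmatch(text):
        return text
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value


print(resolve_sexagesimal("1:30:00"))            # 5400
print(resolve_sexagesimal("12:34:56:12:34:56"))  # an integer, no longer a MAC address
print(resolve_sexagesimal("00:1a:2b:3c:4d:5e"))  # unchanged: hex letters, leading zero
```

An address is at risk only when every group happens to be decimal and below 60 and the first group does not start with 0, which is why most users never see the problem.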

Mitigations and alternatives

  • Common advice:
    • Quote all nontrivial strings (country codes, IDs, hashes, MACs, dates, IPs, names).
    • Use explicit YAML tags (e.g., !!str to force a string, !!bool for a boolean) or schemas where supported.
    • Run linters (yamllint, ansible-lint) to flag truthy/typing pitfalls.
  • Some argue this shows “too much YAML”: configs should be small; large manifests should be generated from real languages (Python, Dhall, Terraform, etc.) into JSON/YAML.
  • Several participants favor JSON as safer: it is stricter and requires quotes around every string, though it lacks comments, and it is often accepted anywhere YAML is.
  • Other alternatives mentioned: NestedText, protobuf text format, custom config formats (e.g., conl.dev).
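
The "generate it from a real language" suggestion might look like the following sketch, which builds a config in Python and emits JSON with the standard library. The keys and values are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical allowlist config, built in Python rather than hand-written YAML.
config = {
    "allowed_countries": ["SE", "DK", "NO", "FI"],
    "signup_enabled": True,
    "file_mode": "0644",
}

# JSON always quotes strings, so "NO" and "0644" cannot be re-typed on the way back in.
text = json.dumps(config, indent=2, sort_keys=True)
roundtrip = json.loads(text)
print(text)
```

Because the types are fixed in the Python source, the string "NO" and the boolean True stay distinct through serialization and parsing.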

Design criticism and robustness debate

  • YAML is widely criticized as overcomplicated, “too clever,” with many ways to express the same thing and surprising coercions (Norway, hashes that parse as exponent-notation floats, sexagesimal parsing that turns all-decimal MACs into integers).
  • Some tie this to misapplied “be liberal in what you accept”; others argue the issue is simply poor spec design and ambiguous implicit typing, not real robustness.
  • Consensus in the thread leans toward: parsers should be stricter, conversion rules explicit, and humans shielded from magical auto-typing in configuration files.
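
The "strings first, explicit conversion" position can be sketched as follows. This is an illustration of the general idea only, not the API of StrictYAML or any other library; SCHEMA, strict_bool and apply_schema are hypothetical names, and the raw mapping stands in for the output of a loader that returns every scalar as a string.

```python
def strict_bool(text):
    """Accept only the literal strings 'true' and 'false'; reject everything else."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {text!r}")


# Hypothetical schema: each key names the converter applied to its raw string.
SCHEMA = {"country": str, "enabled": strict_bool, "retries": int}


def apply_schema(raw, schema):
    """Convert a mapping of raw strings using explicit per-key converters."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise KeyError(f"unexpected keys: {sorted(unknown)}")
    return {key: schema[key](value) for key, value in raw.items()}


raw = {"country": "NO", "enabled": "true", "retries": "3"}
settings = apply_schema(raw, SCHEMA)
print(settings)  # {'country': 'NO', 'enabled': True, 'retries': 3}
```

With this shape, a value like yes in a boolean field fails loudly instead of being silently coerced, and a country code is never touched.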