YAML: The Norway Problem (2022)
Root cause and YAML versions
- The “Norway problem” comes from YAML 1.1’s implicit typing rules (e.g., NO → boolean), and from libraries like PyYAML using the full 1.1 schema by default.
- Workarounds mentioned:
- Use loaders that treat everything as strings (e.g., PyYAML’s BaseLoader; its SafeLoader blocks arbitrary object construction but still applies the implicit typing rules).
- Switch to parsers that support YAML 1.2, where implicit typing is drastically reduced and conversion is configurable.
- YAML 1.2 (published in 2009) largely fixes this: its core schema recognizes only spellings of true/false as booleans, its failsafe schema keeps every scalar a string, and applications may define their own schemas. Many popular libs (libyaml, PyYAML) are still stuck on 1.1.
- Some newer implementations (e.g., libfyaml) and tools like StrictYAML follow a string‑first + schema approach.
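The difference between the schemas can be sketched in a few lines of standard-library Python. This is only a model of boolean resolution, not a parser: the function name resolve_plain_scalar and the schema labels are made up for illustration, and the 1.1 pattern follows the YAML 1.1 bool type (PyYAML's own resolver is understood to use the same set minus the single letters y/Y/n/N).

```python
import re

# Plain (unquoted) scalars accepted by the YAML 1.1 bool type.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# The YAML 1.2 core schema only recognizes spellings of true/false.
YAML12_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

BOOL_PATTERNS = {"1.1": YAML11_BOOL, "1.2-core": YAML12_CORE_BOOL}
TRUTHY = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text, schema):
    """Return the value a plain scalar would get under the named schema.

    Hypothetical helper: only boolean resolution is modelled, so anything
    that is not a boolean stays a string.
    """
    if schema == "failsafe":
        return text  # failsafe: every scalar is a string
    if BOOL_PATTERNS[schema].fullmatch(text):
        return text.lower() in TRUTHY
    return text


for code in ["SE", "NO", "DK"]:
    row = {s: resolve_plain_scalar(code, s) for s in ("1.1", "1.2-core", "failsafe")}
    print(code, row)
```

In this model only the 1.1 rules turn NO into False; under the 1.2 core and failsafe rules the country code survives as a string.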
How often and where it bites
- Many commenters have never hit the issue in years of YAML use; others saw it repeatedly in:
- Country lists (geo IP whitelists, signup allowlists).
- OpenAPI specs and cross‑team configs.
- Ansible playbooks (countries, file modes, booleans).
- Environment variables (e.g., platform auto‑coercing “true”).
- It’s described as a “scissor bug”: invisible for most, catastrophic for the unlucky subset (Norway, “NA”, certain MACs, etc.).
Mitigations and alternatives
- Common advice:
- Quote all nontrivial strings (country codes, IDs, hashes, MACs, dates, IPs, names).
- Use YAML tags (e.g., !!str to force a string, !!bool for a boolean) or schemas where supported.
- Run linters (yamllint, ansible-lint) to flag truthy/typing pitfalls.
- Some argue this shows “too much YAML”: configs should be small; large manifests should be generated from real languages (Python, Dhall, Terraform, etc.) into JSON/YAML.
- JSON is favored by several participants as safer (stricter, with mandatory quotes, though without comments) and is often accepted wherever YAML is.
- Other alternatives mentioned: NestedText, protobuf text format, custom config formats (e.g., conl.dev).
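The "generate it from a real language" and "quote everything" suggestions can be combined. The sketch below is a minimal illustration, assuming a hypothetical allowlist config (the keys allowed_countries, file_mode and build_id are invented here): the data lives in Python and is emitted as JSON, so every string is double-quoted in the output.

```python
import json


def render_config(allowed_countries):
    """Serialize a hypothetical allowlist config as JSON text."""
    config = {
        "allowed_countries": allowed_countries,  # may include "NO" for Norway
        "file_mode": "0644",  # kept as a string, not read as an octal int
        "build_id": "1e10",   # hash-like ID, not read as a float
    }
    return json.dumps(config, indent=2)


rendered = render_config(["SE", "NO", "DK"])
print(rendered)

# Round-trip check: the values come back exactly as written.
assert json.loads(rendered)["allowed_countries"] == ["SE", "NO", "DK"]
```

Because double-quoted scalars are not subject to implicit typing, output like this should also load unchanged in tools that expect YAML, though that depends on the consumer accepting JSON-style syntax.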
Design criticism and robustness debate
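- YAML is widely criticized as overcomplicated, “too clever,” with many ways to express the same thing and surprising coercions (Norway, hashes that happen to look like exponent notation being read as floats, sexagesimal (base-60) number parsing mangling MAC addresses made up only of decimal digits).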
- Some tie this to misapplied “be liberal in what you accept”; others argue the issue is simply poor spec design and ambiguous implicit typing, not real robustness.
- Consensus in the thread leans toward: parsers should be stricter, conversion rules explicit, and humans shielded from magical auto-typing in configuration files.
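The sexagesimal coercion is a good example of why commenters want conversion rules to be explicit. A minimal sketch, assuming the base-60 pattern from the YAML 1.1 int type and a hypothetical helper name yaml11_sexagesimal, shows which colon-separated values such a resolver would turn into integers:

```python
import re

# Base-60 integer form from the YAML 1.1 int type, e.g. 1:30 meaning 90.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def yaml11_sexagesimal(text):
    """Return the int a YAML 1.1 resolver would produce, or None.

    Hypothetical helper: None means the scalar does not match the
    base-60 form and would not be coerced by this particular rule.
    """
    if not SEXAGESIMAL_INT.fullmatch(text):
        return None
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value


print(yaml11_sexagesimal("1:30"))               # 90
print(yaml11_sexagesimal("12:34:56"))           # 45296
print(yaml11_sexagesimal("12:34:56:ab:cd:ef"))  # None (hex digits)
print(yaml11_sexagesimal("12:34:56:12:34:56") is not None)  # True
```

A MAC address is only affected when every group is a decimal number below 60 and the first group does not start with 0, which is why the problem shows up for a small, unlucky subset of values.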