JSON vs YAML for Configuration: Which Format Should You Use?
A practical comparison of JSON and YAML for configuration files, CI/CD pipelines, and API schemas — with concrete guidance on when each format creates more problems than it solves.
The same data in both formats: what you are actually choosing between
JSON and YAML can represent the same data structures. The choice between them is a choice about syntax, tooling, and, critically, the failure modes you are willing to accept. YAML 1.2 was specified as a superset of JSON, so almost any valid JSON document is also valid YAML, but the features that make YAML more human-writable are the same features behind some of its best-known configuration bugs.
This is not a theoretical debate. The Norway problem, the coercion of version numbers into floats, and denial-of-service attacks through alias expansion are real issues that teams encounter in production CI/CD pipelines, Kubernetes configurations, and application settings. Understanding them is what lets you make an informed choice rather than a fashionable one.
The same configuration in both formats
# YAML
server:
  host: localhost
  port: 8080
  tls: true
database:
  url: postgres://user:pass@db/myapp
  pool_size: 10
features:
  - analytics
  - billing
  - notifications
---
// JSON
{
  "server": {
    "host": "localhost",
    "port": 8080,
    "tls": true
  },
  "database": {
    "url": "postgres://user:pass@db/myapp",
    "pool_size": 10
  },
  "features": ["analytics", "billing", "notifications"]
}
YAML is clearly more concise: no quotes around strings, no commas, no curly braces for nested objects. For configurations written and maintained by humans, this conciseness is genuinely useful. But the parser doing the work behind YAML's clean surface is significantly more complex than JSON's parser, and that complexity introduces surprising behavior.
YAML's type coercion traps
YAML attempts to infer types from unquoted values. Most of the time this is convenient. Sometimes it is catastrophic.
# YAML type coercion surprises (YAML 1.1 rules)
version: 1.10        # parsed as float 1.1, not string "1.10"
country_code: NO     # parsed as boolean false (Norway problem)
port: 8080           # parsed as integer, usually fine
octal_port: 0777     # parsed as integer 511 in YAML 1.1
is_active: yes       # parsed as boolean true
is_active: on        # also parsed as boolean true
api_key: 0xDEAD      # parsed as integer 57005
# The Norway problem: among ISO 3166-1 alpha-2 country codes,
# NO (Norway) parses as false
# YAML 1.2 dropped these boolean spellings, but many parsers still follow 1.1
# Safe: always quote strings that could be misinterpreted
version: "1.10"
country_code: "NO"
api_key: "0xDEAD"
The version number problem is especially common in CI/CD configurations. A Node.js version specified as 16.10 gets parsed as 16.1 (float). A Docker image tag of 1.20 becomes 1.2. These bugs are hard to spot because the YAML file looks correct — it is only when you inspect the parsed value in code that you discover the coercion happened.
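One way to catch these values before they reach a parser is a small pre-commit style check. The sketch below is a simplified approximation of YAML 1.1 implicit typing, not a real resolver; the function name coercion_risk and its patterns are assumptions made for illustration, and plain decimal integers are deliberately left unflagged because that coercion is usually intended.

```python
import re
from typing import Optional

# Conservative approximation of YAML 1.1 implicit typing (illustrative only).
_BOOL_WORDS = {"y", "n", "yes", "no", "on", "off", "true", "false"}
_HEX = re.compile(r"[-+]?0x[0-9a-fA-F_]+")
_OCTAL = re.compile(r"[-+]?0[0-7_]+")
_FLOAT = re.compile(r"[-+]?(\d[\d_]*)?\.\d+([eE][-+]?\d+)?")


def coercion_risk(scalar: str) -> Optional[str]:
    """Return the type an unquoted scalar may be coerced to, or None."""
    text = scalar.strip()
    if text.lower() in _BOOL_WORDS:
        return "boolean"
    if _HEX.fullmatch(text):
        return "hexadecimal integer"
    if _OCTAL.fullmatch(text):
        return "octal integer"
    if _FLOAT.fullmatch(text):
        return "float"
    return None


# Values taken from the examples above.
for value in ["1.10", "NO", "0777", "0xDEAD", "localhost"]:
    print(value, "->", coercion_risk(value))
```

Any value the check flags is a candidate for quoting; a value it passes is not guaranteed safe, since real parsers recognise more forms than this sketch does.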
YAML anchors: power and footgun
YAML supports anchors (&) and aliases (*) that allow you to define a value once and reference it multiple times. This is a genuinely useful feature for CI configurations where multiple jobs share the same environment setup.
# GitHub Actions-style workflow: shared steps via YAML anchors
# An alias to a whole list would be inserted as one nested item,
# so each shared step gets its own anchor.
jobs:
  test:
    steps:
      - &checkout
        uses: actions/checkout@v4
      - &setup-node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - &install
        run: npm ci
      - run: npm test
  build:
    steps:
      - *checkout
      - *setup-node
      - *install
      - run: npm run build
The risk: YAML anchors create hidden dependencies between sections of a file. A change to the anchored block silently affects every alias. Some YAML parsers, particularly in security-sensitive contexts, deliberately disable or limit aliases because deeply nested or recursive aliases can cause billion-laughs-style denial-of-service attacks against the parser or the code that consumes its output. JSON has no equivalent feature, which is a feature, not a limitation.
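The scale of the alias-expansion problem is easier to appreciate with numbers. The sketch below does not use a YAML library; it mimics what a parser holds in memory after resolving nested aliases (shared references) and counts how many leaves appear once the structure is fully expanded, for example by serializing it. The names build_nested_aliases and expanded_size are hypothetical helpers for this illustration.

```python
from typing import Dict, Optional


def build_nested_aliases(levels: int, fanout: int) -> object:
    """Mimic a document where each level aliases the previous one `fanout` times."""
    node: object = "lol"
    for _ in range(levels):
        # Every element is the same shared object, as with a resolved alias.
        node = [node] * fanout
    return node


def expanded_size(node: object, cache: Optional[Dict[int, int]] = None) -> int:
    """Count leaves after full expansion without materializing the expansion."""
    if cache is None:
        cache = {}
    if not isinstance(node, list):
        return 1
    if id(node) not in cache:
        cache[id(node)] = sum(expanded_size(child, cache) for child in node)
    return cache[id(node)]


doc = build_nested_aliases(levels=9, fanout=10)
# Nine small lists in memory, but ten to the ninth power leaves once expanded.
print(expanded_size(doc))
```

A few hundred bytes of input is enough to describe this structure, which is why limits on alias count or expansion depth are a common defence.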
JSON's advantages: strictness and tooling
JSON's verbosity makes it more tedious to write by hand, but its strictness makes it much easier to validate, diff, and process programmatically. JSON Schema gives it a schema language with wide tooling support. JSON parsers are simpler and more consistent across languages. A JSON syntax error (missing comma, unclosed bracket) produces an immediate parse error, whereas a YAML type coercion bug may parse successfully and silently produce wrong behavior.
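The fail-fast behavior described above can be seen with Python's standard json module. This is a minimal sketch; the wrapper name load_json_config and the rule that the root must be an object are assumptions for the example, not requirements of JSON itself.

```python
import json


def load_json_config(text: str) -> dict:
    """Parse a JSON config, reporting syntax errors with their location."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(config, dict):
        raise ValueError("config root must be a JSON object")
    return config


# Strings stay strings because JSON requires the quotes.
good = load_json_config('{"version": "1.10", "port": 8080}')
print(type(good["version"]).__name__, type(good["port"]).__name__)

# A missing comma is rejected immediately instead of being guessed at.
try:
    load_json_config('{"host": "localhost" "port": 8080}')
except ValueError as err:
    print(err)
```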
JSON also wins for machine-generated configuration. If your application writes configuration files, generates OpenAPI specs, exports settings, or communicates via REST APIs, JSON is the natural choice. No program should generate YAML by string concatenation: the indentation rules, quoting requirements, and type coercion interactions are too subtle to get right without a proper YAML library.
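The same advice applies to JSON: let a serializer produce the text. A minimal sketch with Python's standard json module follows; the function name render_config and the sample settings are made up for illustration.

```python
import json


def render_config(settings: dict) -> str:
    """Serialize settings deterministically so diffs stay small and stable."""
    return json.dumps(settings, indent=2, sort_keys=True) + "\n"


settings = {
    "version": "1.10",
    "country_code": "NO",
    "motd": 'say "hi"\nthen leave',  # quotes and newlines are escaped for us
}

text = render_config(settings)
# Round-tripping returns exactly what was written, types included.
assert json.loads(text) == settings
print(text)
```

Sorting keys and fixing the indent are design choices for readable diffs, not requirements; the important part is that escaping and quoting are handled by the library rather than by hand.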
When to use each format
Use JSON when: the configuration is machine-generated or machine-read, you need JSON Schema validation, the format is used in an API response or request body, you want the strongest guarantee of consistent parsing across environments, or the configuration is simple enough that YAML's extra features offer no real benefit.
Use YAML when: humans will write and maintain the file frequently, the configuration is complex enough that comments and anchors meaningfully reduce duplication (CI/CD pipelines, Kubernetes manifests, Ansible playbooks), and your team is aware of the type coercion pitfalls and quotes strings defensively. YAML is the standard for Docker Compose, Kubernetes, GitHub Actions, and GitLab CI, so in those ecosystems the choice is already made for you.
Consider TOML as a third option for application configuration that humans write but that does not need the complexity of YAML. TOML has explicit types (strings must be quoted, integers are integers), no type coercion surprises, and is the default for Rust's Cargo, Python's packaging (pyproject.toml), and Hugo. It occupies the readability niche of YAML without the footguns.
Defensive practices for YAML
If you are writing YAML, quote all string values that could plausibly be misinterpreted: version numbers, country codes, ON/OFF flags, hex strings, values starting with digits. Run a YAML linter such as yamllint, or a formatter such as Prettier (which supports YAML out of the box), in CI to catch formatting issues before they reach production. Validate YAML configuration files against a schema where possible; Kubernetes does this via OpenAPI, and tools like cfn-lint do it for CloudFormation. When in doubt, explicitly test what your YAML parser produces: paste a suspicious value into a quick test script and print the type and value before relying on it in production.
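The "quick test script" can be as small as the sketch below. It takes the parser as a parameter so it runs with only the standard library; describe_parsed is a hypothetical helper name, and json.loads stands in here for whichever loader your project actually uses.

```python
import json
from typing import Any, Callable, List, Tuple


def describe_parsed(
    text: str, load: Callable[[str], Any]
) -> List[Tuple[str, str, str]]:
    """Return (key repr, value type name, value repr) for each top-level key."""
    data = load(text)
    return [
        (repr(key), type(value).__name__, repr(value))
        for key, value in data.items()
    ]


# Demonstrated with the JSON parser; swap in your YAML loader to inspect YAML.
for row in describe_parsed('{"version": "1.10", "port": 8080}', json.loads):
    print(row)
```

Assuming PyYAML is installed, passing `yaml.safe_load` as the loader with the text `version: 1.10` and `country_code: NO` should report a float and a bool, since PyYAML follows YAML 1.1 typing for those values. Keys are shown with repr because an unquoted key such as `on` can itself be parsed as a boolean.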