2025-08-05

Spotting base64 encoded JSON, certificates, and private keys

Recognizing Base64 Patterns

Many commenters say that after enough exposure they start “seeing” structure in base64, especially in JWTs, X.509 certificates, keys, and Kubernetes Secrets.
Common telltale prefixes:
- eyJ / eyJhbG → JSON object / JWT header: {" followed by a letter-initial key, typically "alg".
- LS0 / tLS → runs of dashes (-----), as in PEM headers/footers or a YAML --- document separator.
- MI / MII → an ASN.1 DER SEQUENCE with a long-form length (certificates, keys, CRLs).
- AQAB → the RSA public exponent 65537 (bytes 01 00 01).
- Also listed: R0lGOD (GIF), iVBOR (PNG), /9j/ (JPEG), PD94 (XML).
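A minimal Python sketch of the prefix heuristic, assuming standard base64 where the encoded data starts at the first character. The PREFIXES table and the sniff() helper are illustrative names, not a tool from the thread. "tLS" and "AQAB" are left out of the table because they usually appear mid-string, and a match is only a hint, since unrelated data can begin with the same characters.

```python
import base64
from typing import Optional

# Illustrative table (not from the thread): start-of-string prefixes mapped to
# what they usually decode to. "tLS" and "AQAB" are omitted; they appear mid-string.
PREFIXES = {
    "eyJhbG": "JWT header",
    "eyJ": "JSON object",
    "LS0": "run of dashes (PEM armor or YAML ---)",
    "MII": "DER SEQUENCE, 2-byte length (cert/key/CRL)",
    "MI": "DER SEQUENCE, long-form length",
    "R0lGOD": "GIF image",
    "iVBOR": "PNG image",
    "/9j/": "JPEG image",
    "PD94": "XML declaration",
}


def sniff(text: str) -> Optional[str]:
    """Hypothetical helper: best-guess label for a base64 string, or None."""
    # Longer prefixes first, so "eyJhbG" beats "eyJ" and "MII" beats "MI".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if text.startswith(prefix):
            return PREFIXES[prefix]
    return None


if __name__ == "__main__":
    samples = [
        b'{"alg":"HS256","typ":"JWT"}',
        b"-----BEGIN CERTIFICATE-----",
        bytes([0x30, 0x82, 0x03, 0x00]),  # SEQUENCE tag + 2-byte long-form length
        b"GIF89a",
        b'<?xml version="1.0"?>',
    ]
    for raw in samples:
        encoded = base64.b64encode(raw).decode()
        print(f"{encoded[:12]:<14} -> {sniff(encoded)}")
    # 65537 as three big-endian bytes encodes to exactly four characters.
    print(base64.b64encode((65537).to_bytes(3, "big")).decode())
```

The last line prints AQAB: three bytes map to exactly four base64 characters with no padding, which is why the exponent shows up verbatim in RSA JWKs and often near the end of PEM-encoded RSA public keys.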
Some note quasi-fixed points and “self-similar” base64 strings, and explain the bit-level mechanics behind {" → ey.
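To make the {" → ey mechanics concrete, here is a small Python sketch that splits bytes into 6-bit groups by hand and maps each group onto the standard base64 alphabet. It also shows why the third character varies: it takes the low 4 bits of " plus the top 2 bits of the next byte, so it is J when the first key starts with a letter (top bits 01) and I when it starts with a digit (top bits 00). The helper names six_bit_groups and iterate_base64 are illustrative; the seed and round count in the fixed-point demo are arbitrary choices.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def six_bit_groups(data: bytes) -> list:
    """Illustrative helper: split bytes into 6-bit groups (zero-filling the tail)
    and map each group to its base64 character. Padding '=' is not emitted."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)  # pad to a whole number of 6-bit groups
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    return [(g, ALPHABET[int(g, 2)]) for g in groups]


def iterate_base64(seed: bytes, rounds: int) -> bytes:
    """Illustrative helper: base64-encode the previous output `rounds` times."""
    out = seed
    for _ in range(rounds):
        out = base64.b64encode(out)
    return out


if __name__ == "__main__":
    # '{' = 0x7B = 01111011, '"' = 0x22 = 00100010, 'a' = 0x61 = 01100001
    # 011110 -> 'e', 110010 -> 'y', 001001 -> 'J' (0010 from '"', 01 from 'a')
    for group, char in six_bit_groups(b'{"a'):
        print(group, char)
    # A key starting with a digit contributes top bits 00, giving 'I' instead of 'J'.
    print(base64.b64encode(b'{"1').decode())
    # Re-encoding an already-encoded string: the leading characters stop changing.
    s = iterate_base64(b"hello", 30)
    print(s[:16], base64.b64encode(s)[:16])
```

The last demo illustrates the quasi-fixed-point remark: each output character depends only on earlier input bytes, so once a prefix agrees between two successive rounds, the agreeing prefix grows by roughly 4/3 per round. Starting from any input, repeated encoding settles on a prefix beginning Vm0wd2Qy.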

Wastefulness of JSON + Base64 (Especially in JWTs)

Strong criticism of stacking JSON + base64 (often twice) + HTTP headers:
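- Base64 adds about 33% per encoding (4 output characters per 3 input bytes); encoding twice gives (4/3)² ≈ 1.78×, i.e. about 78% overhead before even counting JSON's own verbosity.
- For security tokens, this bloat is paid in the headers of every request, or at least once per HTTP/2 connection.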
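The overhead figures follow directly from the 4-characters-per-3-bytes rule. A quick Python check, assuming padded standard base64 (unpadded base64url, as used in JWTs, is at most two characters shorter per encoding); b64_len and overhead are illustrative helper names.

```python
import base64


def b64_len(n: int) -> int:
    """Padded base64 output length for n input bytes: 4 chars per started 3-byte group."""
    return 4 * ((n + 2) // 3)


def overhead(n: int, times: int) -> float:
    """Fractional size increase after base64-encoding n bytes `times` times."""
    if n <= 0:
        raise ValueError("n must be positive")
    size = n
    for _ in range(times):
        size = b64_len(size)
    return size / n - 1


if __name__ == "__main__":
    for n in (300, 3000):
        once = base64.b64encode(bytes(n))
        twice = base64.b64encode(once)
        # The formula agrees with the real encoder.
        assert len(once) == b64_len(n) and len(twice) == b64_len(len(once))
        print(n, f"once: {overhead(n, 1):.1%}", f"twice: {overhead(n, 2):.1%}")
```

For large inputs the double-encoding figure approaches (4/3)² − 1 = 7/9 ≈ 77.8%; small inputs come out slightly higher because of rounding up to whole 3-byte groups.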
Example: a handful of fixed-size fields could fit in a compact binary TLV (type-length-value) block instead of a kilobyte-scale JWT-like blob.
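To illustrate the TLV point, here is a minimal Python sketch that packs a few integer fields as type-length-value records (1-byte type, 1-byte length, 8-byte big-endian value) and compares the result with the same fields as base64url-encoded JSON, the way a JWT payload segment is built. The field names and type codes are hypothetical, chosen only for the comparison, and a real token would also need a signature, which is left out here.

```python
import base64
import json
import struct

# Hypothetical type codes for a toy token format (not a real standard).
FIELDS = {"user_id": 1, "issued_at": 2, "expires_at": 3, "role": 4}


def tlv_encode(values: dict) -> bytes:
    """Encode {name: int} as type (1 byte) | length (1 byte) | 8-byte big-endian value."""
    out = bytearray()
    for name, value in values.items():
        payload = struct.pack(">Q", value)
        out += struct.pack(">BB", FIELDS[name], len(payload)) + payload
    return bytes(out)


def tlv_decode(blob: bytes) -> dict:
    """Inverse of tlv_encode; records with unknown type codes are skipped using their length."""
    names = {code: name for name, code in FIELDS.items()}
    values, pos = {}, 0
    while pos < len(blob):
        typ, length = struct.unpack_from(">BB", blob, pos)
        pos += 2
        if typ in names:
            values[names[typ]] = int.from_bytes(blob[pos:pos + length], "big")
        pos += length
    return values


def jwt_segment(values: dict) -> bytes:
    """The same fields as compact JSON, base64url-encoded without padding (JWT style)."""
    raw = json.dumps(values, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=")


if __name__ == "__main__":
    claims = {"user_id": 123456, "issued_at": 1700000000, "expires_at": 1700003600, "role": 2}
    print("TLV bytes:", len(tlv_encode(claims)))
    print("base64url(JSON) chars:", len(jwt_segment(claims)))
```

The length byte is what makes TLV forward-compatible: a reader can skip a record type it does not know without understanding its contents.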
Some call embedding base64 inside JSON that is itself base64-encoded “laughable,” or compare it to Russian nesting dolls.

Alternatives to JSON/Base64 for Structured/Binary Data

Suggestions:
- MessagePack, CBOR, BSON: JSON-like data models, but binary, with native support for raw byte strings.
- Simple TLV / IFF-style chunk formats (as in AIFF, RIFF, PNG): easy to implement, efficient, and schemaless.
- ASN.1 and protobuf for structured data, at the cost of defining and maintaining a schema.
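For the IFF-style option, a minimal Python sketch of chunked encoding: each chunk is a 4-byte ASCII tag, a 4-byte big-endian length, and the raw payload, so binary data travels without any base64. The tags are made up, and this simplified layout omits details of the real formats (EA IFF and RIFF pad odd-sized chunks to an even length, RIFF uses little-endian lengths, and PNG puts the length before the tag and appends a CRC).

```python
import struct
from typing import Iterator, Tuple


def write_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialize one chunk: 4-byte tag, 4-byte big-endian length, then the payload."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    return tag + struct.pack(">I", len(payload)) + payload


def read_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (tag, payload) pairs; callers can ignore tags they do not understand."""
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        tag = data[pos:pos + 4]
        (length,) = struct.unpack_from(">I", data, pos + 4)
        end = pos + 8 + length
        if end > len(data):
            raise ValueError("truncated chunk payload")
        yield tag, data[pos + 8:end]
        pos = end


if __name__ == "__main__":
    # Hypothetical tags: a UTF-8 name next to an arbitrary binary blob.
    doc = write_chunk(b"NAME", "café".encode()) + write_chunk(b"BLOB", bytes(range(256)))
    for tag, payload in read_chunks(doc):
        print(tag, len(payload))
```

Each chunk costs a fixed 8 bytes of framing regardless of payload size, compared with the proportional 33% that base64 adds to every binary field.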
Several argue binary formats are underrated and far faster to parse than JSON.

Security and Misuse of Base64

Repeated reminder: base64 is an encoding, not encryption or obfuscation.
Storing secrets base64-encoded in repos or in JWT payloads is unsafe unless they are separately encrypted.
Some suggest that light obfuscation (even ROT13-level) keeps leaked secrets from being obvious at a glance, though others implicitly treat that as weak “security by obscurity.”
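To underline that neither layer is protection, a short Python sketch: the payload of a signed JWT decodes with nothing but base64url once the stripped padding is restored, and ROT13 undoes itself. The token is a made-up example built in the code (its signature segment is a placeholder that is never checked), and the helper names are illustrative.

```python
import base64
import codecs
import json


def b64url_encode(data: bytes) -> str:
    """base64url without '=' padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode base64url."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def peek_jwt_payload(token: str) -> dict:
    """Read a signed JWT's claims without any key: the payload is encoded, not encrypted."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


if __name__ == "__main__":
    # Made-up token; "signature" is a placeholder and is never verified here.
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps({"sub": "alice", "db_password": "hunter2"}).encode())
    token = f"{header}.{payload}.signature"
    print(peek_jwt_payload(token))  # the "secret" is readable by anyone holding the token

    # ROT13 is its own inverse, so it only hides text from people not looking closely.
    hidden = codecs.encode("hunter2", "rot13")
    print(hidden, codecs.decode(hidden, "rot13"))
```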

Experience, “Obviousness,” and Curiosity

Split reactions: some say these patterns are “obvious” to anyone who’s handled certs/JWTs; others appreciate the post as a new, useful heuristic.
Anecdotes about reading ASCII from hex, EBCDIC from logs, or sendmail.cf / core dumps highlight how pattern recognition grows with experience.
A minor debate asks whether the author should have explained why the patterns arise, and whether the omission reflects a broader “incuriosity” in modern CS education.
