Spotting base64 encoded JSON, certificates, and private keys

Recognizing Base64 Patterns

  • Many commenters relate to “seeing” structures in base64 after enough exposure, especially JWTs, X.509 certs, keys, and Kubernetes secrets.
  • Common telltale prefixes:
    • eyJ / eyJhbG → JSON object / JWT header (eyJ encodes {" and eyJhbG is the start of {"alg).
    • LS0 / tLS → sequences of ----- (PEM headers/footers, YAML ---).
    • MI / MII → ASN.1 DER SEQUENCE with a long-form length, e.g. the bytes 0x30 0x82 (certs, keys, CRLs).
    • AQAB → RSA exponent 65537.
    • Also listed: R0lGOD (GIF), iVBOR (PNG), /9j/ (JPEG), PD94 (XML).
  • Some note quasi-fixed points and “self-similar” base64 strings, and explain the bit-level mechanics of why {" encodes to eyJ.
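
The prefixes listed above can be checked directly. The following is a minimal sketch in Python using only the standard library; the helper names b64_prefix and guess_kind and the KNOWN_PREFIXES table are illustrative, not something from the thread. A prefix match is only a heuristic, since unrelated binary data can begin with the same bytes.

```python
import base64

def b64_prefix(raw: bytes, n: int) -> str:
    """Return the first n characters of the standard base64 encoding of raw."""
    return base64.b64encode(raw).decode("ascii")[:n]

# Prefix table built from the list in the summary (hypothetical helper data).
KNOWN_PREFIXES = [
    ("eyJ", "JSON object"),
    ("LS0t", "dashes (PEM or YAML)"),
    ("MI", "ASN.1 DER SEQUENCE"),
    ("R0lGOD", "GIF"),
    ("iVBOR", "PNG"),
    ("/9j/", "JPEG"),
    ("PD94", "XML"),
]

def guess_kind(b64_text: str):
    """Return a best-guess label for a base64 string, or None if no prefix matches."""
    for prefix, kind in KNOWN_PREFIXES:
        if b64_text.startswith(prefix):
            return kind
    return None

# Where each prefix comes from: encode the leading bytes of each format.
print(b64_prefix(b'{"alg":"HS256"}', 6))         # JWT-style header start
print(b64_prefix(b"-----BEGIN", 4))              # PEM header dashes
print(b64_prefix(bytes([0x30, 0x82, 0x03]), 3))  # DER SEQUENCE, two-byte length
print(b64_prefix((65537).to_bytes(3, "big"), 4)) # RSA public exponent 0x010001
```

The three-character granularity falls out of the encoding itself: every 3 input bytes map to exactly 4 output characters, so a fixed leading byte sequence produces a fixed leading character sequence.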

Wastefulness of JSON + Base64 (Especially in JWTs)

  • Strong criticism of stacking JSON + base64 (often twice) + HTTP headers:
    • Base64 adds ~33% per pass; two passes compound to ≈ 78% overhead (1.33² ≈ 1.78), before counting JSON's own overhead.
    • For security tokens, this bloat is resent in the headers of every request, or at least once per HTTP/2 connection.
  • Example: a few fixed-size fields could be a compact binary TLV block, instead of kilobyte-scale JWT-like blobs.
  • Some call embedding base64 inside JSON that’s itself base64-encoded “laughable” and “Russian nesting dolls.”
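
The overhead figures quoted above can be reproduced with a few lines of Python. This is a rough sketch; the 768-byte sample payload and the overhead_percent helper are arbitrary choices for illustration, and the exact percentage shifts slightly with padding for other input sizes.

```python
import base64

def overhead_percent(encoded_len: int, raw_len: int) -> float:
    """Size increase of the encoded form relative to the raw form, in percent."""
    return (encoded_len / raw_len - 1) * 100

raw = bytes(range(256)) * 3        # 768 bytes of arbitrary binary data
once = base64.b64encode(raw)       # one base64 pass
twice = base64.b64encode(once)     # base64 applied to the already-encoded text

print(len(raw), len(once), len(twice))
print(round(overhead_percent(len(once), len(raw))))   # about 33
print(round(overhead_percent(len(twice), len(raw))))  # about 78
```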

Alternatives to JSON/Base64 for Structured/Binary Data

  • Suggestions:
    • MessagePack, CBOR, BSON: JSON-like binary formats with native support for binary blobs.
    • Simple TLV / IFF-style formats (AIFF/RIFF/PNG-like) as easy, efficient, schemaless encodings.
    • ASN.1 and protobuf for structured data, albeit with schema overhead.
  • Several argue binary formats are underrated and far faster to parse than JSON.
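
To make the TLV suggestion concrete, here is a minimal sketch of a tag-length-value codec in Python, compared against the same fields sent as JSON with base64 applied twice (once for the binary field, once for the whole document). The layout (1-byte tag, 2-byte big-endian length), the tag numbers, and the field names uid, exp, and mac are all assumptions made for illustration; the thread does not specify a format.

```python
import base64
import json
import struct

def tlv_encode(fields):
    """Encode (tag, value) pairs as: 1-byte tag, 2-byte big-endian length, value."""
    out = bytearray()
    for tag, value in fields:
        out += struct.pack(">BH", tag, len(value))
        out += value
    return bytes(out)

def tlv_decode(blob):
    """Decode a TLV blob back into a list of (tag, value) pairs."""
    fields = []
    pos = 0
    while pos < len(blob):
        if pos + 3 > len(blob):
            raise ValueError("truncated TLV header")
        tag, length = struct.unpack_from(">BH", blob, pos)
        pos += 3
        if pos + length > len(blob):
            raise ValueError("truncated TLV value")
        fields.append((tag, blob[pos:pos + length]))
        pos += length
    return fields

# Hypothetical token contents: a user id, an expiry timestamp, a 32-byte MAC.
uid, exp, mac = 1234567890123, 1700000000, bytes(32)

tlv_token = tlv_encode([
    (1, struct.pack(">Q", uid)),   # 8-byte unsigned user id
    (2, struct.pack(">I", exp)),   # 4-byte unsigned expiry
    (3, mac),                      # raw MAC bytes, no text encoding needed
])

json_token = base64.b64encode(json.dumps({
    "uid": uid,
    "exp": exp,
    "mac": base64.b64encode(mac).decode("ascii"),  # binary inside JSON needs base64
}).encode("utf-8"))

print(len(tlv_token), len(json_token))
```

The comparison ignores compression and any signature envelope, so it only illustrates the direction of the size difference, not a benchmark.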

Security and Misuse of Base64

  • Repeated reminder: base64 is an encoding, not encryption or obfuscation.
  • Storing secrets base64-encoded in repos or JWT payloads is unsafe unless separately encrypted.
  • Some suggest light obfuscation (even ROT13-level) can reduce obvious leak visibility, but others implicitly see that as weak “security by obscurity.”
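
The "encoding, not encryption" point is easy to demonstrate: a signed JWT-style token can be read by anyone, with no key involved. The sketch below builds an HS256-style token with the Python standard library and then recovers the payload from the token text alone. The helper names, the key, and the payload contents are made up for the example, and this is not a complete or validated JWT implementation.

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(raw: bytes) -> str:
    """URL-safe base64 without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode URL-safe base64."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def make_token(payload: dict, key: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, payload)
    ]
    signing_input = ".".join(segments)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)

def read_payload_without_key(token: str) -> dict:
    """Decode the middle segment; no secret is needed because nothing is encrypted."""
    return json.loads(b64url_decode(token.split(".")[1]))

token = make_token({"sub": "alice", "db_password": "hunter2"}, key=b"server-side-secret")
print(token[:10])                       # begins with the familiar eyJ prefix
print(read_payload_without_key(token))  # the "secret" field is plainly visible
```

The signature only protects integrity: it stops a holder from altering the payload undetected, but it does nothing to hide the payload from whoever holds the token.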

Experience, “Obviousness,” and Curiosity

  • Split reactions: some say these patterns are “obvious” to anyone who’s handled certs/JWTs; others appreciate the post as a new, useful heuristic.
  • Anecdotes about reading ASCII from hex, EBCDIC from logs, or sendmail.cf / core dumps highlight how pattern recognition grows with experience.
  • Minor debate about whether the author should have explained why the patterns arise, and whether this reflects broader “incuriosity” in modern CS education.