A good day to trie-hard: saving compute 1% at a time
Header handling strategy & risks
- Many are surprised Cloudflare uses a “list of header names that are internal” instead of structural separation or strict prefixes.
- Concerns raised: name collisions with user headers, inconsistent lists across services, sanitization bugs, and issues with `Connection` header semantics.
- Some argue this pattern is common in large enterprises and edge proxies; others say that doesn’t make it less fragile.
- Several suggest prefixing all internal headers (CFInt, X-CF-) and stripping by prefix, but others note legacy systems, early headers, third‑party appliances, and acquisitions make global renaming hard.
- It’s stated that a longer‑term plan is to stop using HTTP headers for internal IPC entirely; some worry that makes the trie work a short‑lived stopgap.
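The prefix-stripping idea above can be sketched in a few lines. This is a hypothetical illustration, not Cloudflare's code: the prefixes and header names are assumptions standing in for whatever internal convention a proxy might adopt.

```python
# Hypothetical prefix-based stripping: if every internal header shared an
# agreed prefix, sanitization would not need a per-service name list.
INTERNAL_PREFIXES = ("cfint-", "x-cf-")  # assumed prefixes, for illustration

def strip_internal(headers: dict) -> dict:
    """Drop any header whose name (case-insensitive) starts with an internal prefix."""
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(INTERNAL_PREFIXES)
    }
```

The appeal is that the check is a single prefix comparison per header; the objection in the thread is that it only works if every producer of internal headers can be migrated to the convention.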
Alternative designs proposed
- Separate metadata channel: dedicated internal protocol (e.g., Protobufs or custom encapsulation) instead of overloading HTTP headers.
- Structural approaches: maintain a list of “allowed to exit” headers instead of “internal to strip” (deny‑by‑default), or record original inbound headers and only emit those.
- Data‑structure tweaks: force internal headers to the front and remove first N; use a header-count sentinel; or tag headers as internal at creation rather than inferring later.
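The deny-by-default variant proposed above inverts the problem: instead of enumerating internal headers to strip, enumerate what may leave. A minimal sketch, with an illustrative allowlist that is purely an assumption:

```python
# Hypothetical allowlist ("allowed to exit") filter: anything not explicitly
# permitted is dropped, so a forgotten internal header fails closed.
ALLOWED_TO_EXIT = {"accept", "content-type", "user-agent", "cookie"}  # illustrative

def filter_outbound(headers: dict) -> dict:
    """Emit only headers on the allowlist; unknown names are silently removed."""
    return {n: v for n, v in headers.items() if n.lower() in ALLOWED_TO_EXIT}
```

The trade-off raised in the discussion: a denylist leaks when incomplete, while an allowlist breaks legitimate user headers when incomplete, which is why some prefer recording the original inbound names and emitting only those.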
Tries vs hashes vs regex
- Discussion on why a trie beats hash tables here: hashing strings requires touching every byte; tries often reject on the first character, and most lookups are misses.
- Alternatives floated: custom fast hash functions, perfect hashing, Bloom/binary-fuse filters, hardware CRC32, or specialized hash maps; others point out these still need substantial hashing work.
- Regex/Aho‑Corasick and DFAs are mentioned as conceptually similar; regex libraries carry general-purpose overhead, DFAs can be faster but use more memory and build time.
- Some critique the article’s Big‑O characterizations, arguing they blur key factors like cache behavior versus comparison counts.
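The trie-vs-hash argument above can be made concrete with a toy trie: a miss can bail out at the first character with no matching child, while a hash table must consume every byte of the key before it can even probe. This is a minimal Python sketch of the general technique, not the `trie-hard` crate's implementation, and the sample header names are assumptions:

```python
# Minimal trie sketch: membership misses usually terminate after one or
# two character lookups, whereas hashing touches the full string first.
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # char -> TrieNode
        self.terminal = False  # True if a stored word ends here

def build(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root

def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:  # early exit: most misses stop within a few chars
            return False
    return node.terminal
```

Since most header-name lookups in this workload are misses, the early exit is where the savings come from; the counterpoint in the thread is that cache behavior, not comparison counts, often dominates in practice.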
Performance impact & ROI debate
- The function optimized is in an extremely hot path, and small per‑request wins aggregate across tens of millions of requests per second.
- Some see saving hundreds of cores as modest relative to the overall fleet and question the engineering ROI compared with tackling larger architectural issues.
- Others counter that recurring CPU, power, and capacity savings, plus improved headroom, justify micro‑optimizations in hot code, and that the write‑up also has marketing and educational value.