Malloc broke Serenity's JPGLoader, or: how to win the lottery (2021)
Hash tables, ordering, and randomness
- Core issue: someone stored inherently ordered data in a hash table and then relied on iteration order, which later changed and broke things.
- Many implementations either:
- Randomize hash table iteration to flush out accidental order dependence and mitigate hash DoS attacks, or
- Guarantee a stable iteration order (often insertion order).
- Several commenters argue that leaving iteration order “unspecified but predictable” invites bugs (Hyrum’s Law).
Language and library behavior
- JavaScript and Python both evolved from unspecified map order to de facto insertion-ordered, then standardized it.
- Rust offers multiple map types (unordered, key-ordered, insertion-ordered) with different memory and performance characteristics; dynamic JSON deserialization with insertion-ordered maps can significantly bloat memory.
- PHP arrays and some custom C++ containers also preserve insertion order.
- Techniques to preserve order include:
- Hash table + doubly linked list, or
- Sparse array of indices into a dense vector of entries.
Determinism vs performance and randomness
- Some prefer ordered/deterministic maps to avoid non-deterministic bugs and improve reproducibility.
- Others prefer explicit choice: use unordered maps unless there is a clear ordering requirement.
- There is disagreement over randomized hashing:
- Pro: protects against hash-based DoS and exposes hidden order dependencies.
- Con: harms debuggability when the randomness is not seedable/reproducible.
- Subtle bugs can arise around per-process vs per-interpreter randomization, especially across
fork.
Anecdotes and professionalism
- One commenter admits to intentionally writing code that depended on undocumented map insertion order as petty revenge after giving notice; others push back as unprofessional and harmful to colleagues.
CPU performance and build experience
- Debate over how much faster modern CPUs are than a 2011 Sandy Bridge laptop:
- Some claim “not that much” per core; others point to large gains from more cores, SIMD width, memory bandwidth, and cache.
- Comparison to 1992–2007 CPU evolution shows earlier decades had more dramatic single-socket performance jumps.
- Several report that even 2011-era machines can still handle modern dev workloads, though with slower builds and some UI lag.
Bug cause, debugging style, and tooling
- Thread agrees: the malloc change only exposed the underlying misuse of a hash table for ordered data.
- Some think additional targeted logging might have been faster than long bisection, though bisection was ultimately effective.
- Mention that future C++ may gain something like
malloc_good_size.