2024-07-07

Malloc broke Serenity's JPGLoader, or: how to win the lottery (2021)

Hash tables, ordering, and randomness

Core issue: someone stored inherently ordered data in a hash table and then relied on iteration order, which later changed and broke things.
Many implementations either:
- Randomize hash table iteration to flush out accidental order dependence and mitigate hash DoS attacks, or
- Guarantee a stable iteration order (often insertion order).
Several commenters argue that leaving iteration order “unspecified but predictable” invites bugs (Hyrum’s Law).

Language and library behavior

JavaScript and Python both evolved from unspecified map order to de facto insertion-ordered, then standardized it.
Rust offers multiple map types (unordered, key-ordered, insertion-ordered) with different memory and performance characteristics; dynamic JSON deserialization with insertion-ordered maps can significantly bloat memory.
PHP arrays and some custom C++ containers also preserve insertion order.
Techniques to preserve order include:
- Hash table + doubly linked list, or
- Sparse array of indices into a dense vector of entries.

Determinism vs performance and randomness

Some prefer ordered/deterministic maps to avoid non-deterministic bugs and improve reproducibility.
Others prefer explicit choice: use unordered maps unless there is a clear ordering requirement.
There is disagreement over randomized hashing:
- Pro: protects against hash-based DoS and exposes hidden order dependencies.
- Con: harms debuggability when the randomness is not seedable/reproducible.
Subtle bugs can arise around per-process vs per-interpreter randomization, especially across fork.

Anecdotes and professionalism

One commenter admits to intentionally writing code that depended on undocumented map insertion order as petty revenge after giving notice; others push back as unprofessional and harmful to colleagues.

CPU performance and build experience

Debate over how much faster modern CPUs are than a 2011 Sandy Bridge laptop:
- Some claim “not that much” per core; others point to large gains from more cores, SIMD width, memory bandwidth, and cache.
- Comparison to 1992–2007 CPU evolution shows earlier decades had more dramatic single-socket performance jumps.
Several report that even 2011-era machines can still handle modern dev workloads, though with slower builds and some UI lag.

Bug cause, debugging style, and tooling

Thread agrees: the malloc change only exposed the underlying misuse of a hash table for ordered data.
Some think additional targeted logging might have been faster than long bisection, though bisection was ultimately effective.
Mention that future C++ may gain something like malloc_good_size.

Related topics