Malloc broke Serenity's JPGLoader, or: how to win the lottery (2021)

Hash tables, ordering, and randomness

  • Core issue: someone stored inherently ordered data in a hash table and then relied on iteration order, which later changed and broke things.
  • Many implementations either:
    • Randomize hash table iteration to flush out accidental order dependence and mitigate hash DoS attacks, or
    • Guarantee a stable iteration order (often insertion order).
  • Several commenters argue that leaving iteration order “unspecified but predictable” invites bugs (Hyrum’s Law).

Language and library behavior

  • JavaScript and Python both evolved from unspecified map order to de facto insertion-ordered, then standardized it.
  • Rust offers multiple map types (unordered, key-ordered, insertion-ordered) with different memory and performance characteristics; dynamic JSON deserialization with insertion-ordered maps can significantly bloat memory.
  • PHP arrays and some custom C++ containers also preserve insertion order.
  • Techniques to preserve order include:
    • Hash table + doubly linked list, or
    • Sparse array of indices into a dense vector of entries.

Determinism vs performance and randomness

  • Some prefer ordered/deterministic maps to avoid non-deterministic bugs and improve reproducibility.
  • Others prefer explicit choice: use unordered maps unless there is a clear ordering requirement.
  • There is disagreement over randomized hashing:
    • Pro: protects against hash-based DoS and exposes hidden order dependencies.
    • Con: harms debuggability when the randomness is not seedable/reproducible.
  • Subtle bugs can arise around per-process vs per-interpreter randomization, especially across fork.

Anecdotes and professionalism

  • One commenter admits to intentionally writing code that depended on undocumented map insertion order as petty revenge after giving notice; others push back as unprofessional and harmful to colleagues.

CPU performance and build experience

  • Debate over how much faster modern CPUs are than a 2011 Sandy Bridge laptop:
    • Some claim “not that much” per core; others point to large gains from more cores, SIMD width, memory bandwidth, and cache.
    • Comparison to 1992–2007 CPU evolution shows earlier decades had more dramatic single-socket performance jumps.
  • Several report that even 2011-era machines can still handle modern dev workloads, though with slower builds and some UI lag.

Bug cause, debugging style, and tooling

  • Thread agrees: the malloc change only exposed the underlying misuse of a hash table for ordered data.
  • Some think additional targeted logging might have been faster than long bisection, though bisection was ultimately effective.
  • Mention that future C++ may gain something like malloc_good_size.