2024-05-14

Fast linked lists

Alternative list structures

Several commenters discuss hybrids like unrolled linked lists and “ropes” (trees of arrays) as ways to reduce pointer chasing and improve indexing.
B‑trees / B+ trees and persistent vectors are highlighted as superior for large collections, offering logarithmic indexing and good iteration when leaves are linked.
Intrusive linked lists (pointers stored inside user objects) are proposed to cut allocations, shrink memory footprint, and improve locality.

Performance & Cache Locality

A recurring theme: classic linked lists are slow mostly due to poor cache locality and scattered allocations.
Arena/bump allocation or GC compaction can pack nodes contiguously, which significantly improves traversal speed.
Some Lisps and runtimes try to allocate cons cells sequentially and then compact/move lists to restore locality.
Others argue that in many runtimes, arrays of pointers don’t give much more locality than intrusive lists, since data is still elsewhere.

Use Cases in Systems vs High-Level Code

One side claims linked lists are rarely needed in everyday application/web work; vectors and hash maps dominate.
Another side emphasizes that kernels, drivers, embedded systems, intrusive free lists, message queues, job queues, and LRU lists still rely heavily on linked lists, especially when allocations must be tightly controlled or non‑blocking.
There’s debate over how representative those low‑level domains are relative to all programmers.

JSON Validation & Data Representation

Multiple comments note that in the article’s context, the real bottleneck is JSON parsing, especially via generic serde_json::Value.
Suggestions include:
- Using a compact, schema‑aware AST or token stream instead of full Value.
- Encoding token type and position into small fixed structs or integers to maximize bytes/second throughput.
- Leveraging serde’s Visitor API to validate during deserialization, avoiding full materialization.
- Considering faster JSON libraries or lazy parsing.

Benchmarks, Methodology, and Critique

Some criticize the article’s “linked lists faster than Vec” framing as misleading:
- The Vec baseline creates a new vector per validation, incurring repeated allocations.
- Alternatives like reusing a Vec<&str>, pre‑reserving capacity, or storing raw bytes could be both simpler and faster.
- Pulling in a persistent vector library for this narrow problem is seen as overkill relative to a mutating stack‑like Vec.
The author’s incremental, “try naive things then refine” process is acknowledged, but critics stress that benchmark design should not unfairly handicap Vec.

Hardware and Compiler Considerations

Some propose hardware help (marking pointers for prefetch). Others respond that naive pointer‑based prefetching can thrash caches and often yields little benefit.
Apple Silicon’s speculative pointer chasing is mentioned as both an optimization for pointer‑heavy code and a source of side‑channel risk.
There’s discussion of loop unrolling for linked‑list traversal and of compilers potentially representing “list‑like” APIs with non‑list internal layouts when provably safe.