Ask HN: Fast data structures for disjoint intervals?

Problem framing

  • Data: many disjoint, often adjacent time intervals (usually “free” slots) with day‑level granularity over tens of years.
  • Core query: find the nearest interval (often in either direction) of at least duration X from a given time.
  • Scale: thousands of intervals; hundreds of thousands of queries in a real‑time, client‑side, interactive scheduler.
  • Current baseline: an ordered map (Rust BTreeMap<u32, u32>) that coalesces adjacent intervals; already “very fast” but still a bottleneck for larger instances.

Baseline structures and trade‑offs

  • Ordered map / sorted vector:
    • Many commenters argue a sorted array or ordered map is hard to beat for a few thousand intervals due to cache locality.
    • OP’s benchmarks: Vec<(u32,u32)> with binary search is roughly on par or slightly worse than BTreeMap.
  • Interval trees, R‑trees, KD‑trees, BVHs:
    • Often suggested, but multiple experiments (1D R‑tree, BVH‑like structures) failed to outperform a simple ordered map for this workload, especially with updates.
    • Interval trees are noted as better for overlapping intervals; here intervals are disjoint.

Structures exploiting extra properties

  • Priority search trees and range trees:
    • Proposed to handle queries with a bound in one dimension (time) and inequality in another (duration).
    • Range trees with per‑node “max gap” are highlighted as a way to avoid linear scans over many too‑small intervals and still get O(log n) queries.
  • Augmented trees / monoid‑cached trees:
    • Idea: store additional metadata per node (e.g., longest free/occupied block under that node) to jump directly to intervals long enough for duration X.
    • Requires custom tree or hacking a B‑tree; no ready‑made Rust library identified.

Bitsets, roaring bitmaps, and SIMD

  • Dense bitsets and roaring bitmaps are discussed as allocator‑like representations.
  • OP’s tests: dense bitsets underperformed the BTreeMap due to time spent searching within words; roaring may help in some sparsity regimes, but API and run‑length needs are a concern.
  • Several bit‑hack/SIMD schemes are proposed to find long runs of 1s, possibly aided by multiresolution summaries.

Higher‑level design ideas

  • Analogies to malloc and job‑shop/RCPSP scheduling: problem is structurally hard; data‑specific tuning may matter more than asymptotics.
  • Suggestions include:
    • Multi‑bucket free lists by size class (e.g., ≥2^n).
    • Coarse “summary” arrays or trees over time buckets to skip hopeless regions.
    • Separate structures for free vs used intervals or connected “clusters” of use.
  • Some skepticism that anything can significantly beat a well‑tuned B‑tree without custom, domain‑specific augmentation.