2025-01-11

Nearly all binary searches and mergesorts are broken (2006)

Scope of the Original Bug (Binary Search Overflow)

Core issue: mid = (low + high) / 2 can overflow when low + high > INT_MAX, leading to wrong indices or exceptions.
Safer patterns discussed: low + (high - low) / 2, using wider types (e.g., 64‑bit indices), or unsigned + logical shifts.
Some argue this is a clear bug because it fails for valid inputs of the declared type; others see it as acceptable if the intended scale is smaller and documented.

Array Sizes, Architectures, and Practicality

Many comments contrast 32‑bit vs 64‑bit eras: when the article appeared (2006), arrays approaching 2³⁰ elements were rare; now large data is more plausible.
Java arrays are limited by int length; some find this “sad,” others say if you get near 2³¹ elements, many other constraints break first.
Debate over whether using huge arrays is itself a “code smell” vs. entirely valid in domains like databases and scientific computing.

“Is It Really Broken?” vs Engineering Tradeoffs

One camp: if a function’s type signature suggests it supports the full integer range, silently failing or throwing unexpected exceptions on large inputs is a real bug.
Counterpoint: engineering targets realistic scales with safety margins; handling astronomically large edge cases can be unnecessary overhead, akin to overdesigning a building for meteor strikes.
Agreement that if limits exist, they should be explicit and ideally enforced (e.g., argument checks, clear exceptions).

Languages, Types, and Tools

Bounded integers are framed as a pervasive hazard; ints behave like modular rings, not mathematical integers.
Languages with checked or arbitrary‑precision arithmetic (or better integer models) can avoid this specific failure, though memory remains finite.
Examples mentioned: size_t, ssize_t, Java’s constraints, JavaScript’s differing array and integer limits, Rust’s checked math and lints, Go’s standard library comment, C++20’s std::midpoint.
Some advocate formal verification and property/property-based testing to catch overflow and bounds issues; others note experience and code review often suffice.

Meta: Age of Article and HN Mechanics

Multiple comments stress the article is from 2006 and request the year in the title to avoid presenting it as a new discovery.
Discussion about automating year detection and duplicate submission handling, possibly with simple crawlers or metadata, with or without AI.

Related topics