Nearly all binary searches and mergesorts are broken (2006)
Scope of the Original Bug (Binary Search Overflow)
- Core issue:
mid = (low + high) / 2can overflow whenlow + high > INT_MAX, leading to wrong indices or exceptions. - Safer patterns discussed:
low + (high - low) / 2, using wider types (e.g., 64‑bit indices), or unsigned + logical shifts. - Some argue this is a clear bug because it fails for valid inputs of the declared type; others see it as acceptable if the intended scale is smaller and documented.
Array Sizes, Architectures, and Practicality
- Many comments contrast 32‑bit vs 64‑bit eras: when the article appeared (2006), arrays approaching 2³⁰ elements were rare; now large data is more plausible.
- Java arrays are limited by
intlength; some find this “sad,” others say if you get near 2³¹ elements, many other constraints break first. - Debate over whether using huge arrays is itself a “code smell” vs. entirely valid in domains like databases and scientific computing.
“Is It Really Broken?” vs Engineering Tradeoffs
- One camp: if a function’s type signature suggests it supports the full integer range, silently failing or throwing unexpected exceptions on large inputs is a real bug.
- Counterpoint: engineering targets realistic scales with safety margins; handling astronomically large edge cases can be unnecessary overhead, akin to overdesigning a building for meteor strikes.
- Agreement that if limits exist, they should be explicit and ideally enforced (e.g., argument checks, clear exceptions).
Languages, Types, and Tools
- Bounded integers are framed as a pervasive hazard; ints behave like modular rings, not mathematical integers.
- Languages with checked or arbitrary‑precision arithmetic (or better integer models) can avoid this specific failure, though memory remains finite.
- Examples mentioned:
size_t,ssize_t, Java’s constraints, JavaScript’s differing array and integer limits, Rust’s checked math and lints, Go’s standard library comment, C++20’sstd::midpoint. - Some advocate formal verification and property/property-based testing to catch overflow and bounds issues; others note experience and code review often suffice.
Meta: Age of Article and HN Mechanics
- Multiple comments stress the article is from 2006 and request the year in the title to avoid presenting it as a new discovery.
- Discussion about automating year detection and duplicate submission handling, possibly with simple crawlers or metadata, with or without AI.