Unsigned sizes: A five year mistake

Signed vs unsigned for sizes and indices

  • Many argue sizes/indices should be signed: subtraction on them is common, and a negative result should be representable or at least clearly erroneous.
  • With unsigned indices, underflow silently wraps, making bugs hard to spot (e.g., reverse loops, “index before this one”).
  • Others strongly prefer unsigned for sizes: sizes are inherently non-negative, and using signed wastes half the range and can limit addressable space or force larger types.
  • Some see signed vs unsigned as less important than having good bounds checks and clear overflow semantics.
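The reverse-loop and "index before this one" cases mentioned above can be sketched in C as follows. This is only an illustration, not code from the discussion; the function names (naive_prev, checked_prev, reverse_copy) are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive "index before this one": for i == 0 the result wraps to SIZE_MAX
   instead of signalling an error. */
size_t naive_prev(size_t i) {
    return i - 1;
}

/* Checked variant: reports failure instead of wrapping.
   *out is left untouched when there is no previous index. */
bool checked_prev(size_t i, size_t *out) {
    if (i == 0) {
        return false;
    }
    *out = i - 1;
    return true;
}

/* Reverse iteration with an unsigned index.
   The tempting form `for (size_t i = n - 1; i >= 0; i--)` is broken:
   `i >= 0` is always true for an unsigned type, and n == 0 starts the
   loop at SIZE_MAX. Testing before decrementing avoids both problems. */
void reverse_copy(const int *src, int *dst, size_t n) {
    size_t out = 0;
    for (size_t i = n; i-- > 0; ) {
        dst[out++] = src[i];
    }
}
```

With a signed index the naive loop condition `i >= 0` would work as written, which is the core of the "sizes should be signed" argument; the unsigned version needs the less obvious `i-- > 0` idiom.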

Unsigned semantics: modular vs “non‑negative”

  • Several comments stress that “unsigned” in C-like languages means modular arithmetic (values are residues mod 2ⁿ), not “cannot be negative”.
  • This mismatch between intuition (“non-negative”) and reality (wraparound) is cited as a core footgun.
  • Some suggest languages should add true non‑negative integer types distinct from modular/bitfield types.
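A small C sketch of the difference between the two readings of "unsigned". The first function shows the modular behavior C actually specifies; the second is a hypothetical helper (checked_sub_u32 is not a standard function) approximating what a "true non-negative" type might do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular reading: the result is the residue of a - b modulo 2^32,
   so 3 - 5 yields 4294967294, never a negative number. */
uint32_t diff_u32(uint32_t a, uint32_t b) {
    return a - b;
}

/* "Non-negative" reading (hypothetical helper): a result below zero
   is reported as an error instead of being wrapped. */
bool checked_sub_u32(uint32_t a, uint32_t b, uint32_t *out) {
    if (a < b) {
        return false;
    }
    *out = a - b;
    return true;
}
```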

Overflow, undefined behavior, and diagnostics

  • Signed overflow being undefined in C/C++ is seen as both a feature (sanitizers/traps can catch bugs) and a hazard (optimizers can delete or mangle code).
  • Unsigned wraparound is well-defined but therefore harder to detect as an error automatically.
  • Some rely on sanitizers and strict warnings (e.g., -Wsign-conversion, plus overflow trapping via -ftrapv or -fsanitize=signed-integer-overflow in GCC/Clang) to make signed arithmetic safer than unsigned.
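Where tooling is not available, overflow can also be checked by hand. The sketch below is one portable way to do it, with illustrative function names: the signed check must happen before the addition (performing the overflowing add would already be undefined), while the unsigned check can inspect the wrapped result afterwards.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed: test the operands first, so the overflowing addition is
   never executed. */
bool checked_add_int(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}

/* Unsigned: wraparound is well-defined, so it can be detected after
   the fact. A wrapped sum is always smaller than either operand. */
bool checked_add_uint(unsigned a, unsigned b, unsigned *out) {
    unsigned sum = a + b;
    if (sum < a) {
        return false;
    }
    *out = sum;
    return true;
}
```

The unsigned version shows why automatic detection is harder: nothing in the language marks the wrapped sum as an error, so the caller has to opt in to the check.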

Language design comparisons

  • C/C++: criticized for dangerous implicit conversions and unsigned defaults for sizes; defended as pragmatic and close to hardware.
  • Rust: uses unsigned for sizes but forces explicit casts and has safe wrappers (checked_*, wrapping_*, saturating_*), reducing silent bugs.
  • Go: len returns a signed int; integer overflow is defined to wrap rather than being undefined; bounds checks apply regardless of index type, making signed vs unsigned largely a non-issue.
  • Zig: distinguishes wrapping vs non-wrapping operations and enforces explicitness on modulo/overflow behavior.
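  • Java: primitives are signed apart from char; unsigned operations are exposed through library methods (e.g., Integer.toUnsignedLong, Integer.divideUnsigned); some miss native unsigned types for bit-level work.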
  • Pascal/Ada: cited as examples with range types and true non-negative integers.

Use cases favoring unsigned/modular types

  • Low-level and performance-sensitive domains (HPC, graphics, simulation, bioinformatics, compression, succinct data structures) use unsigned to:
    • Exploit full bit-width for indices and counters.
    • Map cleanly onto bit patterns and modular algebra.
    • Implement ring buffers, sequence numbers, and bit packing.
  • Counterpoint: sizes of data structures and general program logic are argued to rarely need the full unsigned range, and a value above the signed maximum is more often the result of a logic error than a legitimate size.
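As an illustration of the ring-buffer and sequence-number use case, here is a minimal sketch that relies on modular arithmetic on purpose. The type and function names and the capacity RING_CAP are assumptions made for the example; the capacity is assumed to be a power of two so that slot positions stay continuous when the counters wrap.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 8u /* hypothetical capacity; must be a power of two */

typedef struct {
    uint32_t head;        /* total items ever pushed (free-running) */
    uint32_t tail;        /* total items ever popped (free-running) */
    int slots[RING_CAP];
} ring;

/* head - tail is computed modulo 2^32, so the length stays correct
   even after head has wrapped past UINT32_MAX and tail has not. */
uint32_t ring_len(const ring *r) {
    return r->head - r->tail;
}

bool ring_push(ring *r, int value) {
    if (ring_len(r) == RING_CAP) {
        return false; /* full */
    }
    r->slots[r->head % RING_CAP] = value;
    r->head++;
    return true;
}

bool ring_pop(ring *r, int *out) {
    if (ring_len(r) == 0) {
        return false; /* empty */
    }
    *out = r->slots[r->tail % RING_CAP];
    r->tail++;
    return true;
}
```

Here wraparound is the intended behavior rather than a bug, which is the distinction the modular-vs-non-negative argument turns on.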

Higher-level abstractions for indexing

  • Some suggest indices should be treated as opaque handles or custom types, not raw integers.
  • Proposals include per-array index types, opaque structs that callers cannot do math on, or iterator-based patterns to avoid numeric indexing altogether.
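One way to picture the opaque-index idea in C is sketched below. All names (item_idx, int_view, view_first, view_next, view_get) are hypothetical. C cannot stop callers from reading the wrapped field, so the opacity here is by convention only; languages with private fields can enforce it.

```c
#include <stdbool.h>
#include <stddef.h>

/* An index wrapped in a struct: `idx - 1` or `idx + n` will not compile,
   so callers have to go through the functions below. */
typedef struct {
    size_t raw;
} item_idx;

typedef struct {
    const int *data;
    size_t len;
} int_view;

/* Position of the first element, or false if the view is empty. */
bool view_first(const int_view *v, item_idx *out) {
    if (v->len == 0) {
        return false;
    }
    out->raw = 0;
    return true;
}

/* Advance to the next element, or return false at the end. */
bool view_next(const int_view *v, item_idx cur, item_idx *out) {
    if (v->len == 0 || cur.raw >= v->len - 1) {
        return false;
    }
    out->raw = cur.raw + 1;
    return true;
}

/* Bounds-checked read through the opaque index. */
bool view_get(const int_view *v, item_idx i, int *out) {
    if (i.raw >= v->len) {
        return false;
    }
    *out = v->data[i.raw];
    return true;
}
```

Because all arithmetic on the index lives inside these three functions, the signed-vs-unsigned choice becomes an implementation detail that callers never see.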

Debate over C’s historical intent and consistency

  • There is disagreement over whether C’s unsigned was always specified as modular and whether later standards introduced inconsistencies with sizeof and conversions.
  • One side claims the standard is internally inconsistent; another cites early documentation that explicitly defines modulo-2ⁿ behavior.

Miscellaneous

  • A few comments criticize the blog’s light-grey-on-white typography; others note it becomes darker when JavaScript runs.