Unsigned sizes: A five year mistake
Signed vs unsigned for sizes and indices
- Many argue sizes/indices should be signed: subtraction is common, negative results should be representable or at least clearly erroneous.
- With unsigned indices, underflow silently wraps, making bugs hard to spot (e.g., reverse loops, “index before this one”).
- Others strongly prefer unsigned for sizes: sizes are inherently non-negative, and using signed wastes half the range and can limit addressable space or force larger types.
- Some see signed vs unsigned as less important than having good bounds checks and clear overflow semantics.
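
The reverse-loop and "index before this one" cases above can be sketched in C. This is a minimal illustration, not code from the discussion; the names `find_last` and `index_before` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the last element equal to needle, or SIZE_MAX if absent.
 *
 * The tempting form
 *     for (size_t i = n - 1; i >= 0; --i)
 * cannot terminate on its own: `i >= 0` is always true for an unsigned i,
 * and for n == 0 the start value is already SIZE_MAX.
 *
 * Testing `i-- > 0` compares first and decrements second, so the loop
 * body only ever sees i in the range [0, n - 1]. */
size_t find_last(const int *a, size_t n, int needle)
{
    assert(a != NULL || n == 0);
    for (size_t i = n; i-- > 0; ) {
        if (a[i] == needle)
            return i;
    }
    return SIZE_MAX;
}

/* "Index before this one" with an unsigned index: for i == 0 the result
 * silently wraps to SIZE_MAX instead of becoming -1. */
size_t index_before(size_t i)
{
    return i - 1;
}
```

With a signed index the naive loop condition `i >= 0` would work as written, which is the core of the pro-signed argument; the unsigned version needs the idiom shown in `find_last`.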
Unsigned semantics: modular vs “non‑negative”
- Several comments stress that “unsigned” in C-like languages means modular arithmetic (values are residues mod 2ⁿ), not “cannot be negative”.
- This mismatch between intuition (“non-negative”) and reality (wraparound) is cited as a core footgun.
- Some suggest languages should add true non‑negative integer types distinct from modular/bitfield types.
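
A small C sketch of the gap between the two readings of "unsigned". The helper names `sub_mod32` and `sub_nonneg` are assumptions made for illustration; `sub_nonneg` only approximates what a true non-negative type might enforce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modular reading: the result is the residue of a - b modulo 2^32,
 * so it is always in [0, 2^32 - 1], even when a < b. */
uint32_t sub_mod32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b);
}

/* "Non-negative" reading (hypothetical): refuse to produce a value when
 * the mathematical result would be negative. Returns 1 on success and
 * stores the difference in *out; returns 0 and leaves *out untouched
 * otherwise. */
int sub_nonneg(uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    if (a < b)
        return 0;
    *out = a - b;
    return 1;
}
```

Under the modular reading, `sub_mod32(3, 5)` is a perfectly valid residue and adding 5 back yields 3 again; under the non-negative reading the same subtraction is an error that the caller must handle.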
Overflow, undefined behavior, and diagnostics
- Signed overflow being undefined in C/C++ is seen as both a feature (sanitizers/traps can catch bugs) and a hazard (optimizers can delete or mangle code).
- Unsigned wraparound is well-defined but therefore harder to detect as an error automatically.
- Some rely on sanitizers and strict warnings (`-Wsign-conversion`, traps on overflow) to make signed arithmetic safer than unsigned.
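
The asymmetry between the two kinds of overflow can be shown with hand-written checks. This is a portable sketch that does not rely on any sanitizer or compiler builtin; `checked_add_int` and `checked_add_uint` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Signed: overflow is undefined behavior, so the range test must run
 * BEFORE the addition. Returns 1 and stores the sum when it fits in an
 * int, otherwise returns 0 without evaluating a + b. */
int checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

/* Unsigned: wraparound is well-defined, so the addition can be done
 * first and the wrap detected afterwards (the sum is smaller than an
 * operand exactly when it wrapped). */
int checked_add_uint(unsigned a, unsigned b, unsigned *out)
{
    assert(out != NULL);
    unsigned sum = a + b;
    if (sum < a)
        return 0;
    *out = sum;
    return 1;
}
```

Because the unsigned case is legal C, a tool cannot tell an intentional wrap from a bug without extra information, whereas a signed overflow is always a defect that a trap or sanitizer may report.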
Language design comparisons
- C/C++: criticized for dangerous implicit conversions and unsigned defaults for sizes; defended as pragmatic and close to hardware.
- Rust: uses unsigned for sizes but forces explicit casts and has safe wrappers (`checked_*`, `wrapping_*`, `saturating_*`), reducing silent bugs.
- Go: `len` is signed; arithmetic is defined; bounds checks apply regardless of index type, making signed vs unsigned largely a non-issue.
- Zig: distinguishes wrapping vs non-wrapping operations and enforces explicitness on modulo/overflow behavior.
- Java: mostly signed primitives; unsigned exposed via APIs; some miss native unsigned for bit-level work.
- Pascal/Ada: cited as examples with range types and true non-negative integers.
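
The implicit-conversion complaint about C/C++ can be made concrete with a mixed signed/unsigned comparison. The sketch below writes the conversion out explicitly so that it compiles without sign-compare warnings; the function names are hypothetical.

```c
#include <assert.h>

/* What `i < n` means in C when i is int and n is unsigned: i is first
 * converted to unsigned, so a negative i becomes a very large value.
 * The cast only makes that implicit conversion visible. */
int less_as_c_compares(int i, unsigned n)
{
    return (unsigned)i < n;
}

/* The comparison most readers intend: any negative i is below every
 * unsigned n; otherwise the conversion is value-preserving. */
int less_mathematically(int i, unsigned n)
{
    return i < 0 || (unsigned)i < n;
}

/* Small self-check showing where the two disagree. */
void conversion_demo(void)
{
    assert(less_as_c_compares(-1, 1u) == 0);   /* -1 became UINT_MAX */
    assert(less_mathematically(-1, 1u) == 1);
    assert(less_as_c_compares(0, 1u) == less_mathematically(0, 1u));
}
```

Languages that require an explicit cast at this point force the author to pick one of the two functions deliberately, rather than getting the first one by default.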
Use cases favoring unsigned/modular types
- Low-level and performance-sensitive domains (HPC, graphics, simulation, bioinformatics, compression, succinct data structures) use unsigned to:
  - Exploit full bit-width for indices and counters.
  - Map cleanly onto bit patterns and modular algebra.
  - Implement ring buffers, sequence numbers, and bit packing.
- Counterpoint: sizes of data structures and general program logic are argued to rarely need full unsigned range, and logic errors above signed max are common.
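
Ring buffers and sequence numbers are the cases where wraparound is the intended behavior. A minimal sketch follows, assuming a hypothetical fixed capacity `RING_CAP` that is a power of two; none of these names come from the discussion.

```c
#include <assert.h>
#include <stdint.h>

#define RING_CAP 8u   /* hypothetical capacity; must be a power of two */

typedef struct {
    uint32_t head;             /* items ever pushed, free-running */
    uint32_t tail;             /* items ever popped, free-running */
    int      slots[RING_CAP];
} ring;

/* Number of stored items. The subtraction is modulo 2^32, so the count
 * stays correct after head has wrapped past tail. */
uint32_t ring_count(const ring *r)
{
    return (uint32_t)(r->head - r->tail);
}

/* Returns 1 on success, 0 when the buffer is full. */
int ring_push(ring *r, int v)
{
    if (ring_count(r) == RING_CAP)
        return 0;
    r->slots[r->head % RING_CAP] = v;
    r->head++;
    return 1;
}

/* Returns 1 and stores the oldest item in *out, or 0 when empty. */
int ring_pop(ring *r, int *out)
{
    if (ring_count(r) == 0)
        return 0;
    *out = r->slots[r->tail % RING_CAP];
    r->tail++;
    return 1;
}

/* Sequence-number ordering: a precedes b when the forward distance
 * from a to b, taken modulo 2^32, is non-zero and below 2^31. */
int seq_before(uint32_t a, uint32_t b)
{
    uint32_t d = (uint32_t)(b - a);
    return d != 0 && d < 0x80000000u;
}
```

The counters are never reset; correctness depends on the modular subtraction and on the capacity dividing 2^32, which is why this pattern is usually written with unsigned types.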
Higher-level abstractions for indexing
- Some suggest indices should be treated as opaque handles or custom types, not raw integers.
- Proposals include per-array index types, opaque structs that callers cannot do math on, or iterator-based patterns to avoid numeric indexing altogether.
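
One way to approximate the opaque-handle idea in C is to wrap the index in a struct, since arithmetic operators do not apply to struct values. This is only a sketch: `item_id`, `item_table`, `table_add`, and `table_get` are hypothetical, and C cannot fully hide the `raw` field unless the type is left incomplete in the public header.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_CAP 4   /* hypothetical fixed capacity */

/* Handle type: callers pass it around but cannot write id + 1 or id - 1. */
typedef struct { size_t raw; } item_id;

typedef struct {
    int    items[TABLE_CAP];
    size_t len;
} item_table;

/* Appends value and stores its handle in *out.
 * Returns 1 on success, 0 when the table is full. */
int table_add(item_table *t, int value, item_id *out)
{
    assert(t != NULL && out != NULL);
    if (t->len == TABLE_CAP)
        return 0;
    t->items[t->len] = value;
    out->raw = t->len;
    t->len++;
    return 1;
}

/* Bounds-checked lookup: returns NULL for a handle that does not refer
 * to a stored item. */
int *table_get(item_table *t, item_id id)
{
    assert(t != NULL);
    if (id.raw >= t->len)
        return NULL;
    return &t->items[id.raw];
}
```

All index arithmetic stays inside the two functions, so the signed-versus-unsigned choice for `raw` becomes an implementation detail rather than part of the interface.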
Debate over C’s historical intent and consistency
- There is disagreement over whether C’s `unsigned` was always specified as modular and whether later standards introduced inconsistencies with `sizeof` and conversions.
- One side claims the standard is internally inconsistent; another cites early documentation that explicitly defines modulo-2ⁿ behavior.
Miscellaneous
- A few comments criticize the blog’s light-grey-on-white typography; others note it becomes darker when JavaScript runs.