No way to parse integers in C (2022)
State of the C standard library
- Many commenters see C’s stdlib, especially string and number functions, as fundamentally unsafe or poorly designed (lack of bounds checking, locale issues, ambiguous errors).
- Others argue the library is weak but acceptable if wrapped; “C is not its standard library,” and serious C projects often build their own safer utility layers.
- There’s regret that C never got a widely adopted “Boost-like” common library or a single dominant package manager, leading to every shop reinventing utilities.
Integer parsing pitfalls in C
- Built-ins like `atoi`/`atol`, `strtol`/`strtoul`/`strtoull`, and `sscanf` are criticized for:
  - Silent truncation or overflow, or returning maximum values (e.g., `ULONG_MAX`) as error sentinels.
  - Accepting negative input for unsigned parses and wrapping instead of reporting an error.
  - Stopping at the first invalid character and returning a partial value (e.g., `"123timmy"`).
  - Legacy behaviors such as interpreting a leading `0` as octal (with base 0 or `%i`).
- A concrete example shows `strtoull` turning large negative literals into small positive numbers by wraparound, which many consider simply “the wrong answer.”
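To make these pitfalls concrete, here is a small sketch that prints what the standard functions return for a few inputs from the discussion. The helper names `show_strtol` and `show_strtoull` are hypothetical, made up for illustration. The point is that none of these inputs causes an error to be reported through `errno`.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Report what strtol does with an input: the value, the text it left
   unparsed, and errno. None of this says "not a clean integer". */
static void show_strtol(const char *input, int base) {
    char *end;
    errno = 0;
    long v = strtol(input, &end, base);
    printf("strtol(\"%s\", base %d) = %ld, rest \"%s\", errno %d\n",
           input, base, v, end, errno);
}

/* Same for strtoull, which accepts a leading '-' and negates the
   result in unsigned arithmetic instead of rejecting it. */
static void show_strtoull(const char *input) {
    char *end;
    errno = 0;
    unsigned long long v = strtoull(input, &end, 10);
    printf("strtoull(\"%s\") = %llu, rest \"%s\", errno %d\n",
           input, v, end, errno);
}
```

With a 64-bit `unsigned long long`, `show_strtol("123timmy", 10)` reports 123 with `"timmy"` left over, `show_strtol("010", 0)` reports 8, and `show_strtoull("-1")` and `show_strtoull("-18446744073709551615")` report 18446744073709551615 and 1 respectively, all with `errno` still 0.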
Workarounds and alternatives
- Common patterns: write your own parser, wrap the stdlib functions, use APIs that return an error code and deliver the value through an output parameter, or signal errors via `errno`, negative return codes, or aborting.
- Some propose pre-validating with a regex or a round-trip check (format the parsed value back into a string and compare it with the input), though that is seen as ugly or inefficient.
- OpenBSD’s `strtonum` is noted as better but limited (its whitespace handling, and support for signed `long long` only).
- Example custom parsers are shared; even those have subtle UB bugs pointed out (e.g., negating `INT64_MIN`).
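One way to combine the wrap-the-stdlib and return-code-plus-output-parameter patterns is sketched below. The names `parse_ll`, `parse_ull`, and the `parse_err` codes are hypothetical, and the policy (reject empty input, leading whitespace, trailing characters, and any `-` for unsigned input) is one possible choice, not a standard API. It assumes the default `"C"` locale, since `strtoll` may accept extra locale-specific forms in other locales.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch. */
enum parse_err { PARSE_OK = 0, PARSE_EMPTY, PARSE_JUNK, PARSE_RANGE };

/* Parse all of `s` as a base-10 signed integer into *out.
   *out is written only on success. */
static enum parse_err parse_ll(const char *s, long long *out) {
    if (s == NULL || *s == '\0') return PARSE_EMPTY;
    /* strtoll silently skips leading whitespace; refuse it up front. */
    if (isspace((unsigned char)*s)) return PARSE_JUNK;

    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);   /* base 10: no octal surprise */
    if (end == s) return PARSE_JUNK;       /* no digits, e.g. "-" */
    if (errno == ERANGE) return PARSE_RANGE;
    if (*end != '\0') return PARSE_JUNK;   /* e.g. "123timmy" */
    *out = v;
    return PARSE_OK;
}

/* Unsigned variant: strtoull would wrap "-1" to ULLONG_MAX, so a
   leading '-' is rejected before strtoull ever sees it. */
static enum parse_err parse_ull(const char *s, unsigned long long *out) {
    if (s == NULL || *s == '\0') return PARSE_EMPTY;
    if (isspace((unsigned char)*s) || *s == '-') return PARSE_JUNK;

    char *end;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (end == s) return PARSE_JUNK;
    if (errno == ERANGE) return PARSE_RANGE;
    if (*end != '\0') return PARSE_JUNK;
    *out = v;
    return PARSE_OK;
}
```

Fixing the base at 10 sidesteps the octal reading of a leading `0`, checking `end` catches both "no digits" and trailing junk, and `errno == ERANGE` replaces guessing whether `LLONG_MAX` or `ULLONG_MAX` was a real value or a sentinel.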
Language design, UB, and portability
- Strong criticism of UB: because compilers may assume UB never happens, they can legally remove checks (e.g., null checks after a dereference, or signed-overflow checks), leading to surprising crashes.
- Debate over C’s “portable assembly” role; some argue its flexible integer sizes undermine true portability, others say efficiency justified the design historically.
- One view: standard functions are lexeme scanners optimized for unbounded Unix text streams, not full validators; proper parsing should be a layer above.
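The UB concerns above show up directly in hand-written parsers. A check such as `if (acc * 10 + d < acc)` can only fire after signed overflow has already happened, which is undefined, so a compiler may treat it as always false and delete it. Likewise, building the magnitude of `INT64_MIN` as a positive number and then negating it overflows. The sketch below (`parse_i64` is a hypothetical name; it accepts only an optional `-` followed by decimal digits) avoids both by accumulating toward negative values and range-checking before each step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Parse an optional '-' followed by one or more decimal digits into
   *out. Returns false on empty input, stray characters, or overflow.
   Digits are accumulated as a non-positive value because INT64_MIN
   has no positive counterpart. Each step is range-checked before the
   arithmetic, so no overflowing expression is ever evaluated. */
static bool parse_i64(const char *s, int64_t *out) {
    if (s == NULL) return false;
    bool neg = false;
    if (*s == '-') { neg = true; s++; }
    if (*s == '\0') return false;            /* "" or "-" */

    int64_t acc = 0;                          /* invariant: acc <= 0 */
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9') return false;
        int d = *s - '0';
        /* Require acc * 10 - d >= INT64_MIN before computing it. */
        if (acc < INT64_MIN / 10 ||
            (acc == INT64_MIN / 10 && d > -(INT64_MIN % 10)))
            return false;
        acc = acc * 10 - d;
    }
    if (!neg) {
        if (acc == INT64_MIN) return false;   /* +9223372036854775808 */
        acc = -acc;                           /* safe: acc > INT64_MIN */
    }
    *out = acc;
    return true;
}
```

Accumulating negatively works because the negative range of `int64_t` is one value larger than the positive range, so every valid input fits before the final sign flip; only an unsigned `"9223372036854775808"` needs the explicit rejection at the end.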
Teaching and philosophy
- Anecdotes: courses assigning “parse integers correctly” as a semester-long exercise to expose edge cases.
- Split perspectives: some say the article nitpicks edge cases; others insist correct, unambiguous parsing is a baseline requirement, not perfectionism.