No way to parse integers in C (2022)

State of the C standard library

  • Many commenters see C’s stdlib, especially string and number functions, as fundamentally unsafe or poorly designed (lack of bounds checking, locale issues, ambiguous errors).
  • Others argue the library is weak but acceptable if wrapped; “C is not its standard library,” and serious C projects often build their own safer utility layers.
  • There’s regret that C never got a widely adopted “Boost-like” common library or a single dominant package manager, leading to every shop reinventing utilities.

Integer parsing pitfalls in C

  • Built-ins like atoi/atol, strtol/strtoul/strtoull, and sscanf are criticized for:
    • Silent truncation or overflow, or returning in-range values (e.g., ULONG_MAX) as error sentinels, so a failure can only be told apart from a valid result by checking errno.
    • Accepting negative input for unsigned parses and wrapping instead of erroring.
    • Stopping at the first invalid character and returning a partial value (e.g., "123timmy" yields 123) unless the caller inspects the end pointer.
    • Legacy behaviors such as interpreting a leading 0 as octal when the base is auto-detected.
  • A concrete example shows strtoull on large negative literals yielding small positive numbers by wraparound, which many consider simply “the wrong answer.”
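
The pitfalls listed above can be reproduced with a few lines of C. The following is a minimal sketch, not code from the article or the thread: naive_parse is a hypothetical helper that calls strtoull the way a casual caller might, with base 0, and hands back the value, the unparsed tail, and errno. It assumes a platform where unsigned long long is 64 bits wide.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not a standard function): calls strtoull with
   base 0 and reports everything the caller would have to inspect.
   *rest receives a pointer to the first unparsed character,
   *err receives the errno value left behind by strtoull. */
unsigned long long naive_parse(const char *s, const char **rest, int *err)
{
    char *end = NULL;

    assert(s != NULL && rest != NULL && err != NULL);

    errno = 0;                                 /* strtoull never clears errno itself */
    unsigned long long v = strtoull(s, &end, 0); /* base 0: auto-detect 0x / leading 0 */
    *err = errno;
    *rest = end;
    return v;
}
```

Under those assumptions, naive_parse("-1", ...) should return the maximum unsigned value with errno left at 0, "-18446744073709551615" should return 1, "123timmy" should return 123 with the tail pointing at "timmy", and "010" should return 8. Only an input too large in magnitude to fit sets errno to ERANGE.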

Workarounds and alternatives

  • Common patterns: write your own parser, wrap the stdlib functions, or expose a return-code-plus-output-parameter API; errors are signaled via errno, negative return codes, or abort().
  • Some propose pre-validating with regex or string comparison round-trips, though that’s seen as ugly or inefficient.
  • OpenBSD’s strtonum is noted as better but limited: it still skips leading whitespace and only handles values that fit in a signed long long.
  • Example custom parsers are shared; even those have subtle undefined-behavior bugs pointed out (e.g., negating INT64_MIN).
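
One way the "wrap the stdlib, return a code, write through an output parameter" pattern might look is sketched below. The names parse_ll_strict and the PARSE_* codes are hypothetical, invented for this illustration; the policy choices (base 10 only, no leading whitespace, no leading '+', no trailing characters) are assumptions, not something the thread agreed on. Delegating the arithmetic to strtoll also sidesteps the INT64_MIN negation trap, because the wrapper never negates anything itself.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum parse_status { PARSE_OK = 0, PARSE_INVALID = -1, PARSE_RANGE = -2 };

/* Hypothetical strict wrapper around strtoll.
   Accepts only: optional '-', then one or more decimal digits, then end
   of string. On success stores the value in *out and returns PARSE_OK;
   on failure *out is left untouched. */
int parse_ll_strict(const char *s, long long *out)
{
    char *end = NULL;

    assert(s != NULL && out != NULL);

    /* strtoll would skip leading whitespace and accept '+', so reject
       anything that does not start with a digit or with '-' plus a digit. */
    if (!(isdigit((unsigned char)s[0]) ||
          (s[0] == '-' && isdigit((unsigned char)s[1]))))
        return PARSE_INVALID;

    errno = 0;
    long long v = strtoll(s, &end, 10);   /* fixed base 10: no octal surprise */

    if (errno == ERANGE)
        return PARSE_RANGE;               /* value does not fit in long long */
    if (end == s || *end != '\0')
        return PARSE_INVALID;             /* nothing parsed, or trailing junk */

    *out = v;
    return PARSE_OK;
}
```

A caller would write something like `long long n; if (parse_ll_strict(arg, &n) != PARSE_OK) { /* reject input */ }`, which keeps the error channel separate from the value and avoids sentinel ambiguity.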

Language design, UB, and portability

  • Strong criticism of undefined behavior (UB): compilers can legally drop checks (e.g., null checks, signed-overflow checks), leading to surprising crashes.
  • Debate over C’s “portable assembly” role; some argue its flexible integer sizes undermine true portability, others say efficiency justified the design historically.
  • One view: standard functions are lexeme scanners optimized for unbounded Unix text streams, not full validators; proper parsing should be a layer above.

Teaching and philosophy

  • Anecdotes: some courses assign “parse integers correctly” as a semester-long exercise to expose the edge cases.
  • Split perspectives: some say the article nitpicks edge cases; others insist correct, unambiguous parsing is a baseline requirement, not perfectionism.