Linux eliminates the strncpy API after six years of work, 360 patches
Null-Terminated Strings vs Length-Prefixed / “Real” String Types
- Many argue NUL-terminated C strings are one of computing’s worst design choices: easy to overflow, hard to reason about, and prone to performance traps (e.g., repeated strlen calls turning O(N) into O(N²)).
- Others note they were a pragmatic trade-off on tiny 1970s machines; sentinel-terminated sequences (strings, pointer arrays, linked lists) were cheap and matched assembly practice.
- Pascal-style or length-prefixed strings avoid missing-terminator bugs but bring issues: fixed-size length fields (historically 255 chars), confusion between bytes/codepoints/glyphs, and difficulty doing substrings without copying unless you move to “fat pointers” (pointer+length).
- Modern variants discussed: D’s struct { size_t length; T* ptr; }, BSTRs and Pascal/Delphi/Free Pascal headers, dynamic/variable-length length encodings, and “span”/slice types. Trade-offs: memory overhead vs speed vs zero-copy slicing.
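The "fat pointer" idea from the list above can be sketched in plain C. This is a minimal illustration, not code from the thread: the names str_slice, slice_from_cstr, slice_sub, and slice_equal are hypothetical, and the view is borrowed (it does not own or free the bytes it points at).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "fat pointer": a borrowed view, not an owning string. */
struct str_slice {
    const char *ptr;   /* first byte of the view; need not be NUL-terminated */
    size_t length;     /* number of bytes in the view */
};

/* Wrap a NUL-terminated C string; strlen is paid for exactly once here. */
static struct str_slice slice_from_cstr(const char *s)
{
    assert(s != NULL);
    struct str_slice out = { s, strlen(s) };
    return out;
}

/* Zero-copy substring [start, start + count), clamped to the source bounds. */
static struct str_slice slice_sub(struct str_slice s, size_t start, size_t count)
{
    if (start > s.length)
        start = s.length;
    if (count > s.length - start)
        count = s.length - start;
    struct str_slice out = { s.ptr + start, count };
    return out;
}

/* Byte-wise equality; lengths are known, so no scan for a terminator. */
static int slice_equal(struct str_slice a, struct str_slice b)
{
    return a.length == b.length && memcmp(a.ptr, b.ptr, a.length) == 0;
}
```

Taking slice_sub(slice_from_cstr("hello, world"), 7, 5) yields a view of "world" that points into the original buffer. The cost is an extra size_t per reference, which is the memory-versus-speed trade-off the thread mentions.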
NULL, NUL, and Option Types
- Thread repeatedly distinguishes NUL (byte 0 terminator) from NULL (invalid pointer).
- Some see NULL as fine and necessary; the real problem is not being able to declare non-null pointers.
- Others emphasize modern “option”/sum types (e.g., Option<T>) as the right way to represent “unset” values, enforced by the type system at compile time.
- Debate over whether low-level environments can or should replace NULL with such constructs versus using them as a higher-level abstraction that compiles down to null-able representations.
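To make the "option" idea concrete, here is a rough C approximation. The names opt_size, opt_some, opt_none, find_byte, and opt_unwrap_or are hypothetical. C has no sum types, so unlike a real Option<T> nothing here forces the caller to check has_value at compile time; the sketch only shows the shape of an explicit "no value" result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical optional value: the flag says whether `value` is meaningful. */
struct opt_size {
    bool has_value;
    size_t value;
};

static struct opt_size opt_none(void)
{
    struct opt_size o = { false, 0 };
    return o;
}

static struct opt_size opt_some(size_t v)
{
    struct opt_size o = { true, v };
    return o;
}

/* Index of the first byte equal to c in buf[0..len). "Not found" is an
 * explicit empty option rather than a NULL pointer or a -1 sentinel. */
static struct opt_size find_byte(const char *buf, size_t len, char c)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            return opt_some(i);
    return opt_none();
}

/* Return the contained value, or `fallback` when the option is empty. */
static size_t opt_unwrap_or(struct opt_size o, size_t fallback)
{
    return o.has_value ? o.value : fallback;
}
```

A language with real sum types can often store such a value in the same space as a nullable pointer, which is the "compiles down to null-able representations" point raised in the debate.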
Historical and Standards Context
- Several comments stress C’s origins: tiny RAM (tens of KiB), single-pass compilers, PDP-11 addressing limits, and reuse of existing idioms. In that context, null-terminated strings and pointer/array decay were seen as clever compromises, not mistakes.
- Later proposals like fat pointers/slices and safer APIs (e.g., strlcpy) are cited as missed opportunities; committees are criticized for adding complex features (VLAs, _Generic) while leaving the core C string model and stdlib essentially frozen.
strncpy, Kernel APIs, and Safety
- strncpy is described as widely misused and counterintuitive: it pads with zeros, may not NUL-terminate when truncating, and was meant originally for fixed-size, padded fields (e.g., old directory entries), not as a “safe strcpy”.
- The kernel’s replacement functions (strscpy, strscpy_pad, strtomem_pad, memcpy_and_pad, memcpy) encode clearer intent (terminated vs non-terminated, padding vs raw copy), trading a larger API surface for better safety and performance signaling.
- Some find this proliferation “convoluted”; others argue a single “Swiss army knife” would be slower and blur intent.
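The strncpy behaviors described above are easy to reproduce in user space. The sketch below is not the kernel's code: copy_terminated is a hypothetical helper written only in the spirit of an always-terminating copy, and its -1 return on truncation is an assumption made for this example (the kernel's strscpy has its own error convention).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Demonstrates the two strncpy behaviors the thread complains about. */
static void strncpy_pitfalls(char small[4], char wide[8])
{
    strncpy(small, "abcdef", 4); /* truncates: 'a','b','c','d' and no NUL */
    strncpy(wide, "ab", 8);      /* short source: remaining 6 bytes zero-filled */
}

/* Returns 1 if buf[0..size) contains a NUL byte, else 0. */
static int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/*
 * Hypothetical always-terminating copy (an assumption, not a kernel API).
 * Copies at most size - 1 bytes and writes a NUL whenever size > 0.
 * Returns the number of bytes copied (excluding the NUL), or -1 if the
 * source did not fit or size was 0.
 */
static long copy_terminated(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    if (size == 0)
        return -1;
    size_t i;
    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return src[i] == '\0' ? (long)i : -1;
}
```

After strncpy_pitfalls, the 4-byte buffer is not a valid C string, while the 8-byte buffer has had every byte written, which is wasted work when padding is not wanted. A helper like copy_terminated separates "terminated copy, tell me about truncation" from "padded fixed-width field", which is the intent split that the kernel's replacement functions are said to encode.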
AI and Automated Refactoring
- One line of discussion asks whether LLM-based coders could have automated most of the strncpy removal work.
- Supporters report good experiences using LLMs to refactor many C programs and argue six years is too long for essentially mechanical changes.
- Skeptics counter that in a kernel-scale project, the main bottlenecks are review, coordination, and testing, not just editing; subtle downstream behavior and reliance on old semantics must be carefully audited.
- There is visible fatigue with AI being injected into every programming discussion, alongside curiosity about its potential for large-scale refactors.