2025-07-30

Writing memory efficient C structs

Alignment, padding, and portability

Several commenters say the article’s “CPU needs 4-byte alignment” framing is oversimplified. Each primitive type has its own alignment; struct alignment is typically the max of its members, but all of this is implementation-defined.
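
To make the per-type alignment point concrete, here is a minimal C11 sketch. The struct name `example`, its members, and the helper `example_padding` are hypothetical, and every number the compiler produces for them is implementation-defined; the comments describe what is typical, not what is guaranteed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical struct used only to illustrate padding. */
struct example {
    char   tag;    /* alignment 1 */
    double value;  /* often alignment 8, so padding usually follows tag */
    short  count;  /* often alignment 2, usually followed by tail padding */
};

/* Bytes inside the struct that do not belong to any member. */
static inline size_t example_padding(void)
{
    return sizeof(struct example)
         - (sizeof(char) + sizeof(double) + sizeof(short));
}

/* Relationships that follow from the language rules rather than from
   any particular ABI. */
static_assert(alignof(struct example) >= alignof(double),
              "struct is at least as aligned as its strictest member");
static_assert(sizeof(struct example) % alignof(struct example) == 0,
              "size is a multiple of the struct's alignment");
static_assert(offsetof(struct example, value) % alignof(double) == 0,
              "members sit at suitably aligned offsets");
```

On a typical x86-64 Linux build this struct occupies 24 bytes, 13 of which are padding, but other ABIs may differ.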
Real-world examples show big variation: some CPUs/compilers enforce strict 4- or 8-byte alignment and fault on misaligned access; x86 is more tolerant; some old/mainframe/embedded platforms have surprising size vs alignment relationships.
ABIs usually define alignment so different compilers can interoperate, but niche platforms sometimes have only one idiosyncratic compiler.
There’s disagreement on how much alignment still matters for performance on modern CPUs. Some claim misalignment is largely irrelevant as long as an access stays within a cache line; others cite measurements showing small or no gains from aligning, while still noting edge cases (e.g., accesses that cross cache-line boundaries, GPUs, special alignments).

Tools and language features

pahole (from the dwarves package) is highlighted as a “standard” tool (e.g., in kernel work) for inspecting struct layout; newer clangd versions can show padding inline.
Other references include Beej’s guide and older struct-packing writeups.
Newer half-precision types, such as C’s _Float16 and C++23’s std::float16_t and std::bfloat16_t, are mentioned as additional levers for shrinking fields.

Bitfields vs bitmasks

Multiple comments stress that relying on bitfields to fill padding or have a specific layout is non-portable: packing, ordering, alignment, and even signedness are implementation-defined and sometimes ABI-specific.
Safer pattern suggested: use integer flag fields plus explicit masks (flags & CAN_FLY) when layout matters.
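
The mask pattern might look like the sketch below. Only `CAN_FLY` comes from the discussion; the other flag names, the `creature` struct, and the helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability flags; each mask names one explicit bit position. */
#define CAN_FLY   (UINT8_C(1) << 0)
#define CAN_SWIM  (UINT8_C(1) << 1)   /* hypothetical */
#define CAN_CLIMB (UINT8_C(1) << 2)   /* hypothetical */

static_assert((CAN_FLY & CAN_SWIM) == 0 && (CAN_SWIM & CAN_CLIMB) == 0,
              "masks must not overlap");

struct creature {
    uint8_t flags;  /* bit layout is fixed by the masks, not the compiler */
};

static inline void creature_set(struct creature *c, uint8_t mask)
{
    c->flags |= mask;
}

static inline void creature_clear(struct creature *c, uint8_t mask)
{
    c->flags &= (uint8_t)~mask;
}

static inline bool creature_has(const struct creature *c, uint8_t mask)
{
    return (c->flags & mask) != 0;
}
```

Because the bit positions are spelled out in the masks, the in-memory value of `flags` is the same on any compiler, which is what makes this form preferable when layout matters. These read-modify-write helpers are not atomic either.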
Bitfields are still used in some niches (embedded, memory-mapped I/O, binary protocols), usually with packed attributes, but people warn about:
- Non-atomic updates.
- Interaction with atomics and concurrency.
- Difficulty reasoning about exact bit positions.

Cache behavior, layout strategies, and ECS

Several argue the real win is often cache efficiency, not raw byte count. Smaller structs may help, but bitfields and very small types can add overhead (extra masking, shifting, or widening) when values are loaded into registers. Profiling is recommended.
Common advice:
- Group frequently accessed fields together (hot vs cold data).
- Sort fields by decreasing alignment, and cluster same-typed members to reduce padding.
- Consider struct-of-arrays (SoA) / columnar layouts instead of array-of-structs (AoS), especially when iterating over one field across many objects (e.g., all health values).
This naturally connects to Entity Component Systems and broader data-oriented design, which several commenters reference.
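
The field-ordering and SoA advice above can be sketched in C as follows. All type, field, and function names here are hypothetical, `MAX_MONSTERS` is an arbitrary capacity, and the size comparison reflects mainstream ABIs rather than a language guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entity declared in an order that invites padding. */
struct monster_unsorted {
    uint8_t  level;
    double   x;
    uint8_t  is_alive;
    uint32_t health;
};

/* Same members, sorted by decreasing alignment. */
struct monster_sorted {
    double   x;
    uint32_t health;
    uint8_t  level;
    uint8_t  is_alive;
};

/* Expected on mainstream ABIs (24 vs 16 bytes on typical x86-64). */
static_assert(sizeof(struct monster_sorted) <= sizeof(struct monster_unsorted),
              "sorting by alignment should not increase the size");

/* Struct-of-arrays: one contiguous array per field. */
#define MAX_MONSTERS 1024
struct monsters_soa {
    double   x[MAX_MONSTERS];
    uint32_t health[MAX_MONSTERS];
    uint8_t  level[MAX_MONSTERS];
    uint8_t  is_alive[MAX_MONSTERS];
    size_t   count;
};

/* Walks only the health array, so the cache lines fetched for this
   loop hold health values and nothing else. */
static inline uint64_t total_health(const struct monsters_soa *m)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < m->count; i++)
        sum += m->health[i];
    return sum;
}
```

With an array of `struct monster_sorted`, the same loop would also pull every monster's `x`, `level`, and `is_alive` through the cache. Whether that difference matters is something to measure rather than assume.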

Safety, unsigned types, and concurrency

There’s a debate about using unsigned types for quantities like health/speed. One side cites C++ guidelines recommending signed types for arithmetic to avoid wraparound surprises (e.g., subtracting past zero yields a huge positive value); others say the choice should be case-specific.
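
A small sketch of the surprise at issue in the unsigned-health debate, alongside one possible mitigation. The function names are hypothetical, and clamping is just one option; it is not presented as the approach any commenter endorsed.

```c
#include <assert.h>
#include <stdint.h>

/* Naive subtraction: wraps to a huge value when damage > health. */
static inline uint32_t damage_wrapping(uint32_t health, uint32_t damage)
{
    return health - damage;
}

/* Clamped subtraction: health stops at zero instead of wrapping. */
static inline uint32_t damage_saturating(uint32_t health, uint32_t damage)
{
    return damage >= health ? 0 : health - damage;
}

/* Usage: 25 damage applied to 10 health. */
static inline void damage_demo(void)
{
    assert(damage_wrapping(10, 25) == UINT32_MAX - 14);  /* wrapped */
    assert(damage_saturating(10, 25) == 0);              /* clamped */
}
```

Unsigned wraparound is well-defined in C, so the first function is not undefined behavior; it is just rarely what a health counter should do.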
Packed structs and tightly packed bitfields can worsen false sharing on multicore systems; explicit alignment or padding to cache-line size (or C++’s interference-size constants, such as std::hardware_destructive_interference_size) is suggested when concurrency is a concern.
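
Padding to a cache line could be sketched in C11 like this. The 64-byte line size is an assumption (common, but not universal), and `padded_counter`, `counters`, and `NUM_WORKERS` are hypothetical names.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache-line size; check the target hardware before relying on it. */
#define CACHE_LINE_SIZE 64

/* Hypothetical per-thread counter. Aligning the member forces the whole
   struct to be aligned to, and padded out to, CACHE_LINE_SIZE bytes. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) uint64_t value;
};

static_assert(alignof(struct padded_counter) == CACHE_LINE_SIZE,
              "struct inherits the member's alignment");
static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "struct is padded to one full line");

/* One slot per worker thread; if the assumed line size is right,
   adjacent slots never share a cache line. */
#define NUM_WORKERS 4
static struct padded_counter counters[NUM_WORKERS];
```

This deliberately spends 56 bytes of padding per counter, which runs against the size-minimizing advice elsewhere in the thread. It only pays off when different threads actually write to neighboring slots.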

Education and misc.

Some are surprised such basic struct-layout material makes the front page; others note many developers are self-taught and never saw this in a course.
Various small corrections are noted: miscomputed padding, wrong powers-of-two text, and minor typos in the article.
