G2Labs Grzegorz Grzęda
Understanding Data Sizes in C: Insights for the Embedded Programmer
March 12, 2023
In the world of programming, especially when delving into the realms of C and embedded systems, an understanding of how much space your data occupies in memory is not just beneficial – it’s essential. This is particularly true in embedded systems where you’re often working with limited resources, and every byte counts.
Common Data Sizes in GCC (x64)
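With GCC on a 64-bit (x86-64) Linux system, we’re accustomed to certain data sizes: char is 1 byte, short is 2, int and float are 4, and long, long long, double, and pointers are 8.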
This understanding forms the foundation for memory allocation in standard applications.
A Real-World Scenario: MSP430 CPU
Consider my experience compiling code for the MSP430, a low-power microcontroller from Texas Instruments. I attempted to fit an array of 5000 characters into a variant of this CPU with 4 kB of RAM. Despite char typically being 1 byte, I consistently encountered linker errors. The RAM sections reserved for static data (.data for initialized variables and .bss for zero-initialized ones; neither of them is the stack) couldn’t accommodate my data.
The Misconception about sizeof
Here lies a crucial misunderstanding: sizeof does not necessarily return a size in 8-bit bytes. As per the C standard, sizeof measures data size in “char-sized” storage units, which the standard itself calls bytes. While sizeof(char) is guaranteed to be 1, a char is the smallest addressable unit of the implementation. Its width is CHAR_BIT bits (defined in <limits.h>), which is at least 8 but not always exactly 8.
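In my MSP430 scenario, I blamed the 16-bit architecture: if the smallest addressable unit were 16 bits, declaring char t[5000] would mean fitting 10000 octets into 4 kB of RAM. That reasoning applies to word-addressed targets such as TI’s C28x DSPs, where CHAR_BIT is 16. The MSP430 itself is byte-addressable with an 8-bit char, so there the explanation is simpler: 5000 bytes already exceed the 4096 bytes of a 4 kB RAM, before the stack and any other variables are counted. Either way it was a clear impossibility, and the lesson stands: check CHAR_BIT instead of assuming it.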
What Does the ISO C Standard Say?
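The ISO C standard (C99) leaves the sizes of most types implementation-defined. sizeof(char) is always 1, but the results of sizeof for data types like int, float, etc., can vary based on the compiler and the hardware architecture. The standard guarantees only minimum widths: char is at least 8 bits, short and int at least 16, long at least 32, and long long at least 64.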
Navigating This Ambiguity
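Use Fixed-Width Integer Types: Replace traditional data types with uint8_t, uint32_t, etc., from <stdint.h>, introduced in C99. Note that uint8_t exists only where an 8-bit type is available; on targets with CHAR_BIT greater than 8, use uint_least8_t instead.
Size and Limit Checks: Use the macros in <limits.h> and <stdint.h> (CHAR_BIT, INT_MAX, UINT32_MAX, and so on) to check sizes and limits of data types. Because these are constants, the checks can run at compile time instead of at runtime.
Custom Type Definitions: Create a types.h header file where you define custom types, tailored for each platform you work with.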
Dealing with Structures and Pointers
Structures can be padded by the compiler to satisfy the alignment requirements of their members, leading to unexpected size increases. For instance, a struct holding a 1-byte char followed by a 4-byte uint32_t typically occupies 8 bytes rather than 5 on targets where uint32_t is aligned to 4-byte boundaries.
You can use __attribute__((packed)) in GCC to prevent padding, but this might impact performance (unaligned accesses can be slow, or fault outright on some CPUs) and compatibility with existing code.
Pointers in Different Architectures
The size of a pointer is determined by the target’s address space and the compiler’s memory model. For instance, in x64 architecture, it’s 8 bytes, while for MSP430 and 8-bit AVR microcontrollers, it’s typically 16 bits (2 bytes).
Conclusion
In C programming, especially in embedded systems, understanding the true size of your data types is crucial. It’s not just about memory efficiency but about being aware of how your choice of data types can significantly impact your application’s functionality and compatibility across different architectures. Remember, in the world of low-level programming, not every char is just a byte.