G2Labs Grzegorz Grzęda

Understanding Data Sizes in C: Insights for the Embedded Programmer

March 12, 2023

In the world of programming, especially when delving into the realms of C and embedded systems, an understanding of how much space your data occupies in memory is not just beneficial – it’s essential. This is particularly true in embedded systems where you’re often working with limited resources, and every byte counts.

Common Data Sizes in GCC (x64)

In the GCC compiler for a 64-bit system, we’re accustomed to certain data sizes:

sizeof(char);          // 1 byte
sizeof(short);         // 2 bytes
sizeof(int);           // 4 bytes
sizeof(long long int); // 8 bytes

This understanding forms the foundation for memory allocation in standard applications.

A Real-World Scenario: MSP430 CPU

Consider my experience compiling code for the MSP430, a low-power microcontroller from Texas Instruments. I tried to fit an array of 5000 characters into a variant of this CPU with 4 kB of RAM. Even though I was counting on char being 1 byte, I consistently got linker errors. The section holding static data (.data, which shares the RAM with the stack) couldn’t accommodate my array.

The Misconception about sizeof

Here lies a crucial misunderstanding: sizeof does not necessarily return a size in 8-bit bytes. As per the C standard, sizeof measures data size in “char-sized” storage units. sizeof(char) is guaranteed to be 1, but a char is the smallest addressable unit the implementation provides. Its width is CHAR_BIT bits (defined in <limits.h>), which is at least 8 but not always exactly 8.
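A minimal sketch of that distinction, assuming a hosted C99 compiler: sizeof reports a count of char-sized units, and multiplying by CHAR_BIT from <limits.h> converts it to bits. The helper names bits_in and print_sizes are hypothetical, chosen only for this illustration.

```c
#include <limits.h>  /* CHAR_BIT: number of bits in one char */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */

/* Convert a sizeof result (counted in chars) into a number of bits. */
size_t bits_in(size_t size_in_chars)
{
    return size_in_chars * (size_t)CHAR_BIT;
}

/* Print the size of a few basic types in char units and in bits. */
void print_sizes(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("char      = %zu unit(s), %zu bits\n",
           sizeof(char), bits_in(sizeof(char)));
    printf("short     = %zu unit(s), %zu bits\n",
           sizeof(short), bits_in(sizeof(short)));
    printf("int       = %zu unit(s), %zu bits\n",
           sizeof(int), bits_in(sizeof(int)));
    printf("long long = %zu unit(s), %zu bits\n",
           sizeof(long long), bits_in(sizeof(long long)));
}
```

Calling print_sizes() from main on a given toolchain shows at a glance whether its "1 unit" really means 8 bits.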

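On a target whose smallest addressable unit is 16 bits (some TI C2000 DSPs, for example), declaring char t[5000] reserves 10000 octets, which can never fit into 4 kB (4096 octets) of RAM. The MSP430 itself, although a 16-bit architecture, is byte-addressable with an 8-bit char, so in my scenario the array needed 5000 octets. That still exceeds 4096 before the stack and any other variables are counted. Either way, the declaration was a clear impossibility.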
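The arithmetic behind that mismatch can be sketched as follows. The function names, and the assumption that 4 kB means 4096 octets, are mine and serve only as illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Octets (8-bit bytes) occupied by `count` chars when one char is
 * `bits_per_char` bits wide. Assumes bits_per_char is a multiple of 8. */
size_t octets_needed(size_t count, unsigned bits_per_char)
{
    return count * (bits_per_char / 8u);
}

/* True when the array fits into `ram_octets` of RAM, ignoring the stack
 * and every other variable that has to share the same memory. */
bool fits_in_ram(size_t count, unsigned bits_per_char, size_t ram_octets)
{
    return octets_needed(count, bits_per_char) <= ram_octets;
}
```

With 16-bit storage units, octets_needed(5000, 16) is 10000; with 8-bit units it is 5000. Neither fits into 4096 octets, so fits_in_ram returns false in both cases.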

What Does the ISO C Standard Say?

The ISO C standard (C99) leaves the sizes of most types implementation-defined. Apart from sizeof(char), which is always 1, the result of sizeof for types like int, float, etc., can vary based on the compiler and the hardware architecture. The standard only guarantees minimum ranges.

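Sizes of data types (minimum widths guaranteed by the standard):

         char >= 8 bits
    short int >= 16 bits
          int >= 16 bits
     long int >= 32 bits
long long int >= 64 bits
        float : no fixed width, at least 6 decimal digits of precision (typically 32 bits)
       double : no fixed width, at least 10 decimal digits of precision (typically 64 bits)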
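
These minimums can be verified when the code is built. The sketch below assumes a C11 compiler (for static_assert from <assert.h>) and uses only the standard limit macros from <limits.h>. The function min_widths_hold is a hypothetical helper that repeats the same checks at run time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT and the *_MAX limit macros */
#include <stdbool.h>

/* Compile-time checks: the build stops if an assumption does not hold. */
static_assert(CHAR_BIT >= 8, "char must be at least 8 bits");
static_assert(USHRT_MAX >= 65535u, "short must be at least 16 bits");
static_assert(UINT_MAX >= 65535u, "int must be at least 16 bits");
static_assert(ULONG_MAX >= 4294967295ul, "long must be at least 32 bits");
static_assert(ULLONG_MAX >= 18446744073709551615ull,
              "long long must be at least 64 bits");

/* The same conditions as an ordinary function, e.g. for a start-up self-test. */
bool min_widths_hold(void)
{
    return CHAR_BIT >= 8
        && USHRT_MAX >= 65535u
        && UINT_MAX >= 65535u
        && ULONG_MAX >= 4294967295ul
        && ULLONG_MAX >= 18446744073709551615ull;
}
```

On a conforming compiler every check passes. The checks become more useful once they encode a project's own assumptions, such as CHAR_BIT == 8.
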
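
To guard against these differences, a few practices help:

  1. Use Fixed-Width Integer Types: Replace traditional data types with uint8_t, uint32_t, etc., from <stdint.h>, introduced in C99. Note that the exact-width types are optional: uint8_t does not exist on a platform where char is wider than 8 bits, and uint_least8_t is the portable fallback there.

  2. Size and Limit Checks: Use the macros in <limits.h> and <stdint.h> (CHAR_BIT, INT_MAX, UINT32_MAX, and so on) to check sizes and limits of data types, preferably at compile time with #if or static assertions rather than at runtime.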

  3. Custom Type Definitions: Create a types.h header file where you define custom types, tailored for each platform you work with.
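The three practices above might be combined in one small header. This is only a sketch: the file name types.h comes from the list, while the type names u8, u16, u32 and the helper sum_u8 are hypothetical. It assumes a C11 compiler and the common idiom that <stdint.h> defines UINT8_MAX only when uint8_t exists.

```c
/* types.h (hypothetical): project-wide integer types */
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t, uint_least8_t, ... */

#ifdef UINT8_MAX
typedef uint8_t u8;          /* exactly 8 bits */
#else
typedef uint_least8_t u8;    /* smallest type with at least 8 bits */
#endif
typedef uint_least16_t u16;  /* at least 16 bits */
typedef uint_least32_t u32;  /* at least 32 bits */

/* Fail the build if a type is narrower than its name promises. */
static_assert(sizeof(u8) * CHAR_BIT >= 8, "u8 is too narrow");
static_assert(sizeof(u16) * CHAR_BIT >= 16, "u16 is too narrow");
static_assert(sizeof(u32) * CHAR_BIT >= 32, "u32 is too narrow");

/* Sum a buffer of u8 values. Each element is masked to 8 bits so the
 * result is the same even where u8 is wider than 8 bits. */
u32 sum_u8(const u8 *data, size_t len)
{
    u32 total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += (u32)(data[i] & 0xFFu);
    }
    return total;
}
```

The least-width fallback keeps the code compiling on targets without an 8-bit type, at the cost of having to mask values explicitly wherever exactly 8 bits are expected.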

Dealing with Structures and Pointers

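Structures can be padded by the compiler to keep members aligned, leading to unexpected size increases. For instance, a struct holding a 3-byte array followed by a 4-byte int typically occupies 8 bytes rather than 7, because the int is aligned to a 4-byte boundary. (A struct containing only the 3-byte array stays at 3 bytes with GCC on x64, since a char array has no alignment requirement.)

// Example in PC x64
typedef struct
{
    char aTable[3];
    int  value;      // aligned to 4 bytes, so 1 padding byte precedes it
} SomeStruct;

sizeof(SomeStruct); // 8 with GCC on x64: 3 + 1 (padding) + 4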

You can use __attribute__((packed)) in GCC to prevent padding, but unaligned members can be slower to access (or even fault on some architectures), and the changed layout may break compatibility with existing code.
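To see padding and packing side by side, the following sketch compares two otherwise identical structs. It assumes GCC or Clang (for __attribute__((packed))), 8-bit chars, and a platform where uint32_t is 4-byte aligned, as on x64. The struct and function names are hypothetical.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* uint32_t */

/* Natural layout: the compiler may insert padding before `value`. */
typedef struct {
    char     tag[3];
    uint32_t value;
} PaddedRecord;

/* Packed layout (GCC/Clang extension): no padding between members. */
typedef struct {
    char     tag[3];
    uint32_t value;
} __attribute__((packed)) PackedRecord;

/* Number of padding bytes the compiler added in front of `value`. */
size_t padding_before_value(void)
{
    return offsetof(PaddedRecord, value) - offsetof(PackedRecord, value);
}
```

With GCC on x64, sizeof(PaddedRecord) is 8, sizeof(PackedRecord) is 7, and padding_before_value() returns 1.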

Pointers in Different Architectures

The size of a pointer is determined by the address space the CPU and its ABI expose. For instance, in x64 architecture, it’s 8 bytes, while for MSP430 and AVR 8-bit microcontrollers, it’s typically 16 bits (2 bytes).
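A quick way to check this on a given toolchain is to print the pointer width. The sketch assumes a hosted C99 environment, and the function names are hypothetical. C does not require data pointers and function pointers to have the same size, so they are measured separately.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */

/* Width of an object (data) pointer in bits on the current target. */
size_t data_pointer_bits(void)
{
    return sizeof(void *) * (size_t)CHAR_BIT;
}

/* Width of a function pointer in bits; it may differ from data pointers. */
size_t function_pointer_bits(void)
{
    return sizeof(void (*)(void)) * (size_t)CHAR_BIT;
}

/* Print both widths for the target the code was compiled for. */
void print_pointer_sizes(void)
{
    printf("data pointer:     %zu bits\n", data_pointer_bits());
    printf("function pointer: %zu bits\n", function_pointer_bits());
}
```

Running print_pointer_sizes() on each target of a project documents the pointer width without relying on assumptions about the architecture.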

Conclusion

In C programming, especially in embedded systems, understanding the true size of your data types is crucial. It’s not just about memory efficiency but about being aware of how your choice of data types can significantly impact your application’s functionality and compatibility across different architectures. Remember, in the world of low-level programming, not every char is just a byte.

