Print the byte representation of C objects
Figure 2.4 of "Computer Systems: A Programmer's Perspective" shows how to print the byte representations of ints, floats, and a pointer. Partly because the book shows code before it explains it, and partly because I wanted to check my understanding of each line, this article goes into detail about everything in the figure.
Statement of Figure 2.4
Here's what Figure 2.4 says,
 1: #include <stdio.h>
 2:
 3: typedef unsigned char *byte_pointer;
 4:
 5: void show_bytes(byte_pointer start, size_t len) {
 6:     int i;
 7:     for (i = 0; i < len; i++)
 8:         printf(" %.2x", start[i]);
 9:     printf("\n");
10: }
11:
12: void show_int(int x) {
13:     show_bytes((byte_pointer) &x, sizeof(int));
14: }
15:
16: void show_float(float x) {
17:     show_bytes((byte_pointer) &x, sizeof(float));
18: }
19:
20: void show_pointer(void *x) {
21:     show_bytes((byte_pointer) &x, sizeof(void *));
22: }
Figure 2.4 Code to print the byte representation of program objects. This code uses casting to circumvent the type system. Similar functions are easily defined for other data types.
What's going on here, exactly?
First, let's run it.
What does it do?
There are three functions defined, one each to show the byte representation of an int, a float, and a void pointer. There isn't a main, so we make our own, which creates automatic[1] variables for an int, a float, and a void pointer. We then call the functions on each of these values.
#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    int i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_int(int x) {
    show_bytes((byte_pointer) &x, sizeof(int));
}

void show_float(float x) {
    show_bytes((byte_pointer) &x, sizeof(float));
}

void show_pointer(void *x) {
    show_bytes((byte_pointer) &x, sizeof(void *));
}

int main() {
    int a;
    float f;
    void *v;

    a = 5;
    f = 3.0;

    show_int(a);
    show_float(f);
    show_pointer(v);
}
05 00 00 00 00 00 40 40 00 10 00 00 00 00 00 00
And it runs!
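(If you want to run it yourself: I saved the listing to a file, which I'll assume here is called show_bytes.c, and built it with gcc.)

gcc -o show_bytes show_bytes.c
./show_bytes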
What does the output mean?
The book goes on to explain. The authors run each of the functions on different platforms and CPU architectures. The results show how byte order is swapped on different devices (big-endian versus little-endian). They pass in 12,345, which is (3)16^3 + (0)16^2 + (3)16^1 + (9)16^0 = 0x00003039. Some of the results appear as 3039 while others are 3930. You wouldn't get that from just looking at my results.
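You can reproduce the book's experiment by feeding 12,345 to show_int. Here's a minimal sketch, assuming the Figure 2.4 definitions are in scope:

int main() {
    /* 12,345 == 0x00003039. A little-endian machine stores the least
       significant byte first, so this prints " 39 30 00 00". */
    show_int(12345);
    return 0;
}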
How does it work?
Let's go through the code.
The byte_pointer typedef
The byte_pointer type is a user-defined alias for a pointer to unsigned char. The char type is always the smallest addressable unit on the host platform[2].
For me, the size of char is:
printf("The sizeof 'char' is %d byte(s).", sizeof(char));
The sizeof 'char' is 1 byte(s).
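Relatedly, <limits.h> defines CHAR_BIT, the number of bits in a char. A quick sketch to print it:

#include <limits.h>
#include <stdio.h>

int main() {
    /* CHAR_BIT is 8 on virtually every hosted platform. */
    printf("A char is %d bit(s) wide.\n", CHAR_BIT);
    return 0;
}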
Instead of thinking of byte_pointer as pointing to a char, we can think of it as pointing to a byte (hence the name).
What about signed versus unsigned? We ultimately want the hex representation of the object. Said differently, we want to know how the value is stored in memory. Recall that hex is used as a shorthand for binary. We could also do this using binary.
Since the figure is illustrating endianness, it's simpler to use positive numbers (no need to get into two's complement). A byte is 8 bits, which can represent 2^8 = 256 unsigned values or, with one bit reserved for the sign, values from -128 to 127[3]. The type byte_pointer points to unsigned bytes, so the values it reads range from [0, 255]. Remember, byte_pointer is a pointer. As seen above, pointers (on my system) are an 8-byte value, an address, that gives the location in memory of a value. Anyway, byte_pointer is unsigned because we've decided to work with positive values.
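There's a practical side effect of choosing unsigned char worth seeing. When a char is passed to printf, it's promoted to int, and a signed char with its high bit set gets sign-extended along the way. A minimal sketch of the difference (the variable names are mine):

#include <stdio.h>

int main() {
    signed char s = -1;     /* stored as the byte 0xff */
    unsigned char u = 0xff;

    /* The signed char is promoted to int with sign extension,
       so %x sees 0xffffffff instead of 0xff. */
    printf("signed:   %x\n", s);  /* prints ffffffff */
    printf("unsigned: %x\n", u);  /* prints ff */
    return 0;
}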
So, where were we?
We've unpacked the first interesting line of code:
3: typedef unsigned char *byte_pointer;
It defines an alias for a pointer to a byte.
Understanding the show_bytes arguments
The next lines define a function called show_bytes. It takes a byte_pointer to some start location and a len of type size_t.
What's size_t? K & R might have said it best,
"The type size_t is the unsigned integral type produced by the sizeof operator."
–K & R 2nd Ed., pg. 242
That may be the best way to say it…but what does it mean? For starters, size_t hasn't always been in C. It was introduced at some point for portability[4]. The size_t type is unsigned, which means no negatives allowed. It's also an integer, so it's good for counting. This makes sense, since sizeof is used to count bytes. It turns out that there's no "one size fits all" type to best describe the size of an object[5]. This is why C provides size_t, which can be defined by the implementation.
"Each Standard C implementation is supposed to choose the unsigned integer that's big enough–but no bigger than needed–to represent the size of the largest possible object on the target platform.6"
You might also see people say that the size of any object is limited by SIZE_MAX[6]. That doesn't tell us anything about the type, though.
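If you're curious, SIZE_MAX lives in <stdint.h> (it arrived with C99), and you can print it. A quick sketch:

#include <stdint.h>
#include <stdio.h>

int main() {
    /* SIZE_MAX is the largest value a size_t can hold on this
       implementation. */
    printf("SIZE_MAX = %zu\n", (size_t)SIZE_MAX);
    return 0;
}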
Just for kicks, let's dig a little deeper. The size_t type is defined in several places, including <stddef.h>. We can use sizeof(size_t) to get its size. For me, it's

#include <stddef.h>
printf("The sizeof 'size_t' is %zu byte(s).", sizeof(size_t));
The sizeof 'size_t' is 8 byte(s).
To be even more pedantic,
echo | gcc -E -xc -include 'stddef.h' - | grep size_t
typedef long unsigned int size_t;
How show_bytes works
Anyway, the show_bytes function takes a pointer to a byte and a length. Here's the function:
 1: #include <stdio.h>
 2:
 3: typedef unsigned char *byte_pointer;
 4:
 5: void show_bytes(byte_pointer start, size_t len) {
 6:     int i;
 7:     for (i = 0; i < len; i++)
 8:         printf(" %.2x", start[i]);
 9:     printf("\n");
10: }
The function walks along memory in byte-size increments, beginning at the byte pointed to by start and ending after len steps. At each step, it prints the value as hex ("x" is the string format code for hexadecimal[7]). Recall that array notation pa[i] is shorthand for *(pa + i), where pa is a pointer to the first element of an array (a contiguous block of memory). Pointers are constrained to pointing to a specific data type. For this reason, incrementing a pointer moves it "element-wise", per the type's size, through memory.
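That element-wise movement is easy to see by comparing an int pointer with a byte pointer into the same array. A small sketch (the names are mine):

#include <stdio.h>

int main() {
    int a[2] = {1, 2};
    int *ip = a;
    unsigned char *bp = (unsigned char *)a;

    /* ip + 1 advances sizeof(int) bytes (typically 4);
       bp + 1 advances exactly one byte. */
    printf("ip: %p -> ip + 1: %p\n", (void *)ip, (void *)(ip + 1));
    printf("bp: %p -> bp + 1: %p\n", (void *)bp, (void *)(bp + 1));
    return 0;
}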
Calling the show functions: address of
The last part of the Figure 2.4 code defines several functions that call show_bytes:
12: void show_int(int x) {
13:     show_bytes((byte_pointer) &x, sizeof(int));
14: }
15:
16: void show_float(float x) {
17:     show_bytes((byte_pointer) &x, sizeof(float));
18: }
19:
20: void show_pointer(void *x) {
21:     show_bytes((byte_pointer) &x, sizeof(void *));
22: }
The arguments could use some explanation. The first argument is the start location, which must be a byte_pointer. Memory addresses are obtained using the unary address-of operator, "&". The K & R definition says,
"The unary & operator takes the address of its operand."
–K & R 2nd Ed., pg. 203
I find the use of the word "takes" confusing. Often we use "takes" in terms of arguments to mean "requires". Here, I think it's just bad wording. I believe they mean "gets", as in "The unary & operator gets the address of its operand." For example,
int x = 1;
/* Quick and dirty; the portable way to print a pointer is %p with
   a cast to (void *). */
printf("The address of an int 'x' is: 0x%X", &x);
The address of an int 'x' is: 0xCD8576FC
The definition goes on to say,
If the type of the operand is T, the type of the result [of &] is "pointer to T."
–K & R 2nd Ed., pg. 203
In the previous example, since x was of type int, the result of &x is of type int*. In the case of Figure 2.4, the result of &x will be a pointer to whatever type was passed into the function (i.e. int, float, or void pointer).
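Spelled out for each of the show functions (a quick sketch; the variable names are mine):

int i = 5;
float f = 3.0f;
void *v = 0;

int *pi = &i;      /* &i has type int*   */
float *pf = &f;    /* &f has type float* */
void **pv = &v;    /* &v has type void** */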
Calling the show functions: casting
The result of the address-of operator is a pointer to whatever type the operand is. However, the first argument of show_bytes must be of type byte_pointer.
The authors do an explicit cast from the parameter's type to a byte_pointer. A cast is done by putting the type-name in parentheses to the left of the expression. In this case, the expression is &x and the type we want to convert to is byte_pointer:
12: void show_int(int x) {
13:     show_bytes((byte_pointer) &x, sizeof(int));
14: }
It turns out that the conversion happens even without the explicit cast (though, as we'll see, not without complaint from the compiler). To quote K & R,
"If arguments are declared by a function prototype, as they normally should be, the declaration causes the automatic coercion of any arguments when the function is called."
–K & R 2nd Ed., pg. 45
We can verify that this is the case by rerunning the code we originally used to test it, but this time removing the explicit casts.
#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    int i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_int(int x) {
    show_bytes(&x, sizeof(int));
}

void show_float(float x) {
    show_bytes(&x, sizeof(float));
}

void show_pointer(void *x) {
    show_bytes(&x, sizeof(void *));
}

int main() {
    int a;
    float f;
    void *v;

    a = 5;
    f = 3.0;

    show_int(a);
    show_float(f);
    show_pointer(v);
}
05 00 00 00 00 00 40 40 00 10 00 00 00 00 00 00
The results are basically the same. Only the pointer differs, and that's because the pointer was never assigned an address; its contents are whatever happened to be in that memory location when the space was allocated. One caveat: while the program still compiles and runs, my version of GCC emits an incompatible-pointer-type warning for each call, since types like int* and unsigned char* aren't compatible. The explicit casts in Figure 2.4 silence those warnings.
Conclusion
I think I beat that one to death. We got a mini-lesson on what size_t is used for and some of its properties. We reviewed how arrays and pointers are related. Finally, we found some stylistic wiggle room (modulo compiler warnings) in whether or not to cast the arguments of functions.
I hope you learned something and had as much fun reading it as I did writing it. Was there anything I missed or got wrong? Let me know.
Footnotes:
1. "automatic" is C speak for "existing only within scope".
2. The smallest addressable unit may or may not be the word size. Wikipedia says the smallest addressable unit is at least 8 bits, but that's not really true. There are microcontrollers with bit-level addressing and, as far as I know, you could write code in C and compile for those systems.
3. Signed means we took one bit and are using it to indicate positive or negative.
4. I saw somewhere that size_t wasn't in the first edition of K & R from 1978. I have a 2nd edition, which does contain size_t.
5. Dan Saks gives a great explanation of why size_t matters. He considers the memcpy function, which copies bytes from an object beginning at one memory address into the object beginning at another. Depending on the type defining the length of bytes to copy, as well as the system architecture the function runs on, using a fixed type would result in less performant code or limitations on what can be copied.
6. SIZE_MAX is not in K & R. This is because the famed 2nd edition was published in 1989, well before C99.
7. Seeing binary is more complicated. The printf function doesn't have a converter for binary like it does for hex. It looks like you need to implement something yourself. That sounds like a fun idea for an article (but not this one :).
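For a taste of what implementing it yourself might look like, here's a minimal sketch (the function name show_binary_byte is mine):

#include <stdio.h>

/* Print one byte as eight binary digits, most significant bit first. */
void show_binary_byte(unsigned char b) {
    int i;
    for (i = 7; i >= 0; i--)
        printf("%d", (b >> i) & 1);
    printf("\n");
}

int main() {
    show_binary_byte(0x39);   /* prints 00111001 */
    return 0;
}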