Print the byte representation of C objects
Figure 2.4 of "Computer Systems: A Programmer's Perspective" shows how to print the byte representations of ints, floats, and a pointer. Partly because the book shows code before it explains it, and partly because I wanted to check my understanding of each line, this article goes into detail about everything in the figure.
Statement of Figure 2.4
Here's what Figure 2.4 says,
 1: #include <stdio.h>
 2:
 3: typedef unsigned char *byte_pointer;
 4:
 5: void show_bytes(byte_pointer start, size_t len) {
 6:     int i;
 7:     for (i = 0; i < len; i++)
 8:         printf(" %.2x", start[i]);
 9:     printf("\n");
10: }
11:
12: void show_int(int x) {
13:     show_bytes((byte_pointer) &x, sizeof(int));
14: }
15:
16: void show_float(float x) {
17:     show_bytes((byte_pointer) &x, sizeof(float));
18: }
19:
20: void show_pointer(void *x) {
21:     show_bytes((byte_pointer) &x, sizeof(void *));
22: }
Figure 2.4 Code to print the byte representation of program objects. This code uses casting to circumvent the type system. Similar functions are easily defined for other data types.
What's going on here, exactly?
First, let's run it.
What does it do?
There are three functions defined, one each to show the byte representation of an int, a float, and a void pointer. There isn't a main, so we make our own, which creates automatic[1] variables for an int, a float, and a void pointer. We then call the functions on each of these values.
#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    int i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_int(int x) {
    show_bytes((byte_pointer) &x, sizeof(int));
}

void show_float(float x) {
    show_bytes((byte_pointer) &x, sizeof(float));
}

void show_pointer(void *x) {
    show_bytes((byte_pointer) &x, sizeof(void *));
}

int main() {
    int a;
    float f;
    void *v;

    a = 5;
    f = 3.0;

    show_int(a);
    show_float(f);
    show_pointer(v);
}
05 00 00 00 00 00 40 40 00 10 00 00 00 00 00 00
And it runs!
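(If you want to run it yourself: I saved the listing to a file, which I'll assume here is called show_bytes.c, and built it with gcc.)

gcc -o show_bytes show_bytes.c
./show_bytes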
What does the output mean?
The book goes on to explain. The authors run each of the functions on different platforms and CPU architectures. The results show how byte order is swapped on different devices (big-endian versus little-endian). They pass in 12,345, which is (3)16^3 + (0)16^2 + (3)16^1 + (9)16^0 = 0x00003039. Some of the results appear as 3039 while others are 3930. You wouldn't get that from just looking at my results.
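You can reproduce the book's experiment by feeding 12,345 to show_int. Here's a minimal sketch, assuming the Figure 2.4 definitions are in scope:

int main() {
    /* 12,345 == 0x00003039. A little-endian machine stores the least
       significant byte first, so this prints " 39 30 00 00". */
    show_int(12345);
    return 0;
}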
How does it work?
Let's go through the code.
The byte_pointer typedef
The byte_pointer type is a user-defined alias for a pointer to unsigned char. The char type is always the smallest addressable unit on the host platform[2].
For me, the size of char is:
printf("The sizeof 'char' is %d byte(s).", sizeof(char));
The sizeof 'char' is 1 byte(s).
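Relatedly, <limits.h> defines CHAR_BIT, the number of bits in a char. A quick sketch to print it:

#include <limits.h>
#include <stdio.h>

int main() {
    /* CHAR_BIT is 8 on virtually every hosted platform. */
    printf("A char is %d bit(s) wide.\n", CHAR_BIT);
    return 0;
}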
Instead of thinking of byte_pointer as pointing to a char, we can think of it as pointing to a byte (hence the name).
What about signed versus unsigned? We ultimately want the hex representation of the object. Said differently, we want to know how the value is stored in memory. Recall that hex is used as a shorthand for binary. We could also do this using binary.
Since the figure is illustrating endianness, it's simpler to use positive numbers (no need to get into two's complement). A byte is 8 bits, which can represent 2^8 = 256 unsigned values or, with one bit reserved for the sign, values from -128 to 127[3]. The type byte_pointer points to unsigned bytes, so the values it reads range from [0, 255]. Remember, byte_pointer is a pointer. As seen above, pointers (on my system) are an 8-byte value, an address, that gives the location in memory of a value. Anyway, byte_pointer is unsigned because we've decided to work with positive values.
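There's a practical side effect of choosing unsigned char worth seeing. When a char is passed to printf, it's promoted to int, and a signed char with its high bit set gets sign-extended along the way. A minimal sketch of the difference (the variable names are mine):

#include <stdio.h>

int main() {
    signed char s = -1;     /* stored as the byte 0xff */
    unsigned char u = 0xff;

    /* The signed char is promoted to int with sign extension,
       so %x sees 0xffffffff instead of 0xff. */
    printf("signed:   %x\n", s);  /* prints ffffffff */
    printf("unsigned: %x\n", u);  /* prints ff */
    return 0;
}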
So, where were we?
We've unpacked the first interesting line of code:
3: typedef unsigned char *byte_pointer;
It defines an alias for a pointer to a byte.
Understanding the show_bytes arguments
The next lines define a function called show_bytes. It takes a byte_pointer to some start location and a len of type size_t.
What's size_t? K & R might have said it best,
"The type size_t is the unsigned integral type produced by the sizeof operator."
–K & R 2nd Ed., pg. 242
That may be the best way to say it…but what does it mean? For starters, size_t hasn't always been in C. It was introduced at some point for portability[4]. The size_t type is unsigned, which means no negatives allowed. It's also an integer, so it's good for counting. This makes sense, since sizeof is used to count bytes. It turns out that there's no "one size fits all" type to best describe the size of an object[5]. This is why C provides size_t, which can be defined by the implementation.
"Each Standard C implementation is supposed to choose the unsigned integer that's big enough–but no bigger than needed–to represent the size of the largest possible object on the target platform.6"
You might also see people say that the size of any object is limited by SIZE_MAX[6]. That doesn't tell us anything about the type, though.
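If you're curious, SIZE_MAX lives in <stdint.h> (it arrived with C99), and you can print it. A quick sketch:

#include <stdint.h>
#include <stdio.h>

int main() {
    /* SIZE_MAX is the largest value a size_t can hold on this
       implementation. */
    printf("SIZE_MAX = %zu\n", (size_t)SIZE_MAX);
    return 0;
}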
Just for kicks, let's dig a little deeper. The size_t type is defined in several places, including <stddef.h>. We can use sizeof(size_t) to get its size. For me, it's

#include <stddef.h>
printf("The sizeof 'size_t' is %zu byte(s).", sizeof(size_t));
The sizeof 'size_t' is 8 byte(s).
To be even more pedantic,
echo | gcc -E -xc -include 'stddef.h' - | grep size_t
typedef long unsigned int size_t;
How show_bytes works
Anyway, the show_bytes function takes a pointer to a byte and a length. Here's the function:
 1: #include <stdio.h>
 2:
 3: typedef unsigned char *byte_pointer;
 4:
 5: void show_bytes(byte_pointer start, size_t len) {
 6:     int i;
 7:     for (i = 0; i < len; i++)
 8:         printf(" %.2x", start[i]);
 9:     printf("\n");
10: }
The function walks along memory in byte-size increments, beginning at the byte pointed to by start and ending after len steps. At each step, it prints the value as hex ("x" is the string format code for hexadecimal[7]). Recall that array notation pa[i] is shorthand for *(pa + i), where pa is a pointer to the first element of an array (a contiguous block of memory). Pointers are constrained to pointing to a specific data type. For this reason, incrementing a pointer moves it "element-wise", per the type's size, through memory.
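That element-wise movement is easy to see by comparing an int pointer with a byte pointer into the same array. A small sketch (the names are mine):

#include <stdio.h>

int main() {
    int a[2] = {1, 2};
    int *ip = a;
    unsigned char *bp = (unsigned char *)a;

    /* ip + 1 advances sizeof(int) bytes (typically 4);
       bp + 1 advances exactly one byte. */
    printf("ip: %p -> ip + 1: %p\n", (void *)ip, (void *)(ip + 1));
    printf("bp: %p -> bp + 1: %p\n", (void *)bp, (void *)(bp + 1));
    return 0;
}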
Calling the show functions: address of
The last part of the Figure 2.4 code defines several functions that call show_bytes:
12: void show_int(int x) {
13:     show_bytes((byte_pointer) &x, sizeof(int));
14: }
15:
16: void show_float(float x) {
17:     show_bytes((byte_pointer) &x, sizeof(float));
18: }
19:
20: void show_pointer(void *x) {
21:     show_bytes((byte_pointer) &x, sizeof(void *));
22: }
The arguments could use some explanation. The first argument is the start location, which must be a byte_pointer. Memory addresses are obtained using the unary address-of operator, "&". The K & R definition says,
"The unary & operator takes the address of its operand."
–K & R 2nd Ed., pg. 203
I find the use of the word "takes" confusing. Often we use "takes" in terms of arguments to mean "requires". Here, I think it's just bad wording. I believe they mean "gets", as in "The unary & operator gets the address of its operand." For example,
int x = 1;
/* Quick and dirty; the portable way to print a pointer is %p with
   a cast to (void *). */
printf("The address of an int 'x' is: 0x%X", &x);
The address of an int 'x' is: 0xCD8576FC
The definition goes on to say,
If the type of the operand is T, the type of the result [of &] is "pointer to T."
–K & R 2nd Ed., pg. 203
In the previous example, since x was of type int, the result of &x is of type int*. In the case of Figure 2.4, the result of &x will be a pointer to whatever type was passed into the function (i.e. int, float, or void pointer).
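Spelled out for each of the show functions (a quick sketch; the variable names are mine):

int i = 5;
float f = 3.0f;
void *v = 0;

int *pi = &i;      /* &i has type int*   */
float *pf = &f;    /* &f has type float* */
void **pv = &v;    /* &v has type void** */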
Calling the show functions: casting
The result of the address-of operator is a pointer to whatever type the operand is. However, the first argument of show_bytes must be of type byte_pointer.
The authors do an explicit cast from the parameter's type to a byte_pointer. A cast is done by putting the type-name in parentheses to the left of the expression. In this case, the expression is &x and the type we want to convert to is byte_pointer:
12: void show_int(int x) {
13:     show_bytes((byte_pointer) &x, sizeof(int));
14: }
It turns out that the conversion happens even without the explicit cast (though, as we'll see, not without complaint from the compiler). To quote K & R,
"If arguments are declared by a function prototype, as they normally should be, the declaration causes the automatic coercion of any arguments when the function is called."
–K & R 2nd Ed., pg. 45
We can verify that this is the case by rerunning the code we originally used to test it, but this time removing the explicit casts.
#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    int i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_int(int x) {
    show_bytes(&x, sizeof(int));
}

void show_float(float x) {
    show_bytes(&x, sizeof(float));
}

void show_pointer(void *x) {
    show_bytes(&x, sizeof(void *));
}

int main() {
    int a;
    float f;
    void *v;

    a = 5;
    f = 3.0;

    show_int(a);
    show_float(f);
    show_pointer(v);
}
05 00 00 00 00 00 40 40 00 10 00 00 00 00 00 00
The results are basically the same. Only the pointer differs, and that's because the pointer was never assigned an address; its contents are whatever happened to be in that memory location when the space was allocated. One caveat: while the program still compiles and runs, my version of GCC emits an incompatible-pointer-type warning for each call, since types like int* and unsigned char* aren't compatible. The explicit casts in Figure 2.4 silence those warnings.
Conclusion
I think I beat that one to death. We got a mini-lesson on what size_t is used for and some of its properties. We reviewed how arrays and pointers are related. Finally, we found some stylistic wiggle room (modulo compiler warnings) in whether or not to cast the arguments of functions.
I hope you learned something and had as much fun reading it as I did writing it. Was there anything I missed or got wrong? Let me know.
Footnotes:
1. "automatic" is C speak for "existing only within scope".
2. The smallest addressable unit may or may not be the word size. Wikipedia says the smallest addressable unit is at least 8 bits, but that's not really true. There are microcontrollers with bit-level addressing and, as far as I know, you could write code in C and compile for those systems.
3. Signed means we took one bit and are using it to indicate positive or negative.
4. I saw somewhere that size_t wasn't in the first edition of K & R from 1978. I have a 2nd edition, which does contain size_t.
5. Dan Saks gives a great explanation of why size_t matters. He considers the memcpy function, which copies bytes from an object beginning at one memory address into the object beginning at another. Depending on the type defining the length of bytes to copy, as well as the system architecture the function runs on, using a fixed type would result in less performant code or limitations on what can be copied.
6. SIZE_MAX is not in K & R. This is because the famed 2nd edition was published in 1989, well before C99.
7. Seeing binary is more complicated. The printf function doesn't have a converter for binary like it does for hex. It looks like you need to implement something yourself. That sounds like a fun idea for an article (but not this one :).
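For a taste of what implementing it yourself might look like, here's a minimal sketch (the function name show_binary_byte is mine):

#include <stdio.h>

/* Print one byte as eight binary digits, most significant bit first. */
void show_binary_byte(unsigned char b) {
    int i;
    for (i = 7; i >= 0; i--)
        printf("%d", (b >> i) & 1);
    printf("\n");
}

int main() {
    show_binary_byte(0x39);   /* prints 00111001 */
    return 0;
}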