Different strlen depending on other variable assignment
Statement of problem
I'm using strlen to get the length of a char array, sentence. When I run the following, the sentence's length is output as 12, despite being only 10 bytes wide:
    /* mre.c */
    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
      int length, i, beg;
      char sentence[10] = "this is 10";

      length = strlen (sentence);
      printf ("length: %d\n", length);

      for (i = 0; i < length; i++)
        {
          beg = 0;
        }

      return 0;
    }
length: 12
When beg = 0; is removed, it returns the expected result:
    /* mre.c */
    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
      int length, i, beg;
      char sentence[10] = "this is 10";

      length = strlen (sentence);
      printf ("length: %d\n", length);

      for (i = 0; i < length; i++)
        {
          /* beg = 0; */
        }

      return 0;
    }
length: 10
I notice that if I print the sentence a char at a time within a shell within Emacs, I see two extra chars:
    /* mre.c */
    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
      int length, i, beg;
      char sentence[10] = "this is 10";

      length = strlen (sentence);
      printf ("length: %d\n", length);

      for (i = 0; i < length; i++)
        {
          beg = 0;
          printf ("%c", sentence[i]);
        }

      return 0;
    }

length: 12
this is 10^@^@
I'm at a loss for how to explain this.
Conclusion
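The strange behavior happens because of undefined behavior. "this is 10" is 10 characters long, so the 10-byte array sentence has no room for the null terminator. C accepts this initialization but silently drops the terminator, so sentence is not a valid string. The strlen function has no way to know where the array ends, so it keeps reading past it until it happens to hit a zero byte. What lies in memory after sentence depends on how the compiler arranges everything else, which is why an unrelated change such as removing beg = 0; can change the result.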
C strings
String constant
Quoting the String Constants section of (gnu-c-manual), which is almost verbatim from K & R:
A string constant is a sequence of zero or more characters, digits, and escape sequences enclosed within double quotation marks. A string constant is of type "array of characters". All string constants contain a null termination character ('\0') as their last character.
It goes on to say,
The null termination character lets string-processing functions know where the string ends.
Without a null terminator there is no way to tell where a string ends, and therefore no way to tell its length. Library functions expecting a string expect a null terminator!
There are two common practices:
- allocate enough space to include the null terminator
    /* one more than the number of characters */
    char str[6] = "hello";

    /* alternatively, written as follows to make the extra byte explicit */
    char str[5+1] = "hello";
- let the compiler allocate memory
When written without specifying the length, the compiler will automatically allocate the right amount.
    /* compiler automatically allocates 6 bytes */
    char str[] = "hello";
It's advised to "ask the compiler": let the compiler allocate the space, then use sizeof to get the size, which for a char array includes the null terminator.
Two kinds of quotes
Single-quotes
Single quotes define character constants.
The constant has type int, and its value is the character code of that character. [Although] the character constant’s value has type int …the character code is treated initially as a char value, which is then converted to int. If the character code is greater than 127 (0177 in octal), the resulting int may be negative on a platform where the type char is 8 bits long and signed.
Double-quotes – string constant (a.k.a. string literal).
A string constant has type "array of characters" (char[N], where N counts the null terminator; in most expressions the array is converted to a char * pointing at its first character) and static storage duration, which means it exists for the whole run of the program and persists after the block it appears in exits. Its value should not be changed (the array is probably stored in a read-only area of memory). Modifying a string literal is undefined behavior: it may result in SIGSEGV or, if the literal is not in read-only memory, cause unexpected results. Double quotes tell the compiler to append a null terminator automatically.
The null terminator is sometimes rendered as ^@ (caret notation for the zero byte, which is how Emacs displays it); in C source it's written as '\0' or simply 0.
    #include <stdio.h>

    int main (void)
    {
      char str[5+1] = "hello";

      printf ("%c", str[4]);  /* 'o' */
      printf ("%c", str[5]);  /* the null terminator, shown as ^@ */
      return 0;
    }
o^@
strlen
The strlen function returns the offset of the terminating null byte within the array, that is, the number of characters that precede it.
When no terminating null byte exists, you get undefined behavior. The function will continue scanning memory until something stops it: a null byte, a memory access violation (segfault), or something else. It's impossible to tell what will happen.