Understanding while(*s++ = *t++)
Consider the following implementation of strcpy:
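As a reference point, here is a minimal sketch of the pointer version of the copy, modeled on the one K & R give in section 5.5. The name my_strcpy is an assumption, chosen only to avoid clashing with the library strcpy. The const on t and the doubled parentheses around the assignment are also additions; the parentheses do not change the expression and only keep compilers from warning about = used as a condition.

```c
/* Copy the string t into s, including the terminating '\0'.
   The caller must make sure s has room for the whole string. */
void my_strcpy(char *s, const char *t)
{
    while ((*s++ = *t++))
        ;   /* empty body: all the work happens in the condition */
}
```

Calling my_strcpy(dst, src) with a writable dst that is at least as large as src leaves dst holding a copy of src.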
K & R says the idiom of strcpy should be mastered. So does Joel [1].
Indeed, understanding it requires a surprisingly deep knowledge of C.
Order of evaluation
The actual order of evaluation is:
- postfix increment
- dereference
- assignment
The effective order is:
- dereference
- assignment
- postfix increment
The first thing evaluated is the while expression. Appendix A9.5 of K & R says that the substatement will execute repeatedly so long as the expression remains unequal to 0 [2]. Since the substatement is empty, everything here happens in the expression.
The expression is *s++ = *t++. It involves two variables, s and t,
and three operators: dereference (*), increment (++), and
assignment (=). The C spec fully specifies the precedence and
associativity of operators, that is, how operators group with their
operands. The evaluation order of operands is largely undefined [3].
Fortunately, since dereference and increment are unary operators
(they have one operand), this consideration only applies to the
assignment.
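To make the grouping concrete, the sketch below contrasts *p++, which groups as *(p++), with (*q)++, which increments the character instead of the pointer. The function name grouping_demo and the variables a, b, p, q, c1 and c2 are all assumptions made for this illustration.

```c
#include <stdio.h>

/* Shows how *p++ groups: as *(p++), not as (*p)++.
   Returns 1 if both forms behaved as the comments describe. */
int grouping_demo(void)
{
    char a[] = "ab";
    char b[] = "ab";
    char *p = a;
    char *q = b;

    char c1 = *p++;    /* groups as *(p++): yields 'a', p ends up at a[1] */
    char c2 = (*q)++;  /* yields 'a', then b[0] becomes 'b'; q does not move */

    printf("*p++   read '%c', p now points at '%c', a is \"%s\"\n", c1, *p, a);
    printf("(*q)++ read '%c', q still points at b[0], b is \"%s\"\n", c2, b);

    return c1 == 'a' && p == a + 1 && c2 == 'a' && q == b && b[0] == 'b';
}
```

Precedence decides which operand each operator applies to; it is the definition of postfix ++ that makes the old pointer value the one that gets dereferenced.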
How postfix increment operates in this context is a little tricky. At
first glance, it seems that by immediately incrementing s and t
we would not copy the first character. Consider the following [6]:
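A sketch of what such an experiment could look like, using prefix increment. The strings are the ones that appear in the article's later output; the function name prefix_demo, the pointers sp and tp, and the out buffer are assumptions added for illustration.

```c
#include <stdio.h>

/* Prints the first char of each string, then the chars reached by
   prefix increment. The four chars are also stored in out, which
   must have room for 5 bytes. */
void prefix_demo(char out[5])
{
    char s[] = "first sentence";
    char t[] = "the second one";
    char *sp = s;   /* the array names themselves cannot be incremented */
    char *tp = t;

    out[0] = *sp;
    out[1] = *tp;
    out[2] = *++sp; /* increment first, then dereference */
    out[3] = *++tp; /* likewise */
    out[4] = '\0';

    printf("%c %c %c %c\n", out[0], out[1], out[2], out[3]);
}
```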
f t i h
Here, we incremented the value and then dereferenced. This is
fundamentally different from how *s++ = *t++ works! Postfix
increment yields the operand's current value; the increment itself
takes effect afterwards. So, it is the current value of s and t that
is dereferenced [8]. The assignment then occurs, putting the value of
*t into *s, and the new value of *s (which is the character just
copied) is returned [9]. Finally, the increment of s and t happens.
Since s and t are pointers, incrementing moves them according to
their type. As pointers to char, they point at the next respective
char in their strings.
The process repeats. while checks the expression, the result of the
assignment. If it's the end of the string, \0 or 0, the loop
terminates.
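The steps described above can be spelled out one statement at a time. This is only a sketch: copy_expanded and the temporary c are names invented here, and the increments are written after the assignment to mirror the effective order. A compiler is free to schedule them differently as long as the observable result is the same.

```c
/* The same copy with each step of *s++ = *t++ written out. */
void copy_expanded(char *s, const char *t)
{
    for (;;) {
        char c = *t;    /* dereference the current t */
        *s = c;         /* assign through the current s */
        s++;            /* move both pointers to the next char */
        t++;
        if (c == '\0')  /* the value of the assignment was 0: stop */
            break;
    }
}
```

Note that the terminating \0 is copied before the loop stops, which is why the destination ends up as a proper string.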
Let's see it all in action:
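A small driver along these lines should reproduce the output shown below. It is a sketch under assumptions: the helper names copy and show_copy and the exact printf layout are guesses, not the original listing.

```c
#include <stdio.h>

/* The idiom under discussion; extra parentheses only silence warnings. */
static void copy(char *s, const char *t)
{
    while ((*s++ = *t++))
        ;
}

/* Print both strings, copy t over s, then print them again.
   s must be writable and at least as large as t. */
void show_copy(char *s, const char *t)
{
    printf("Before copy: %s %s\n", s, t);
    copy(s, t);
    printf("After copy: %s %s\n", s, t);
}
```

It would be called with two arrays, for example char s[] = "first sentence"; char t[] = "the second one"; show_copy(s, t);. Both strings are 14 characters long, so the copy fits exactly.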
Before copy: first sentence the second one
After copy: the second one the second one
A9.5 Iteration Statements
Iteration statements specify looping.
iteration-statement:
    while (expression) statement
    do statement while (expression);
    for (expression_opt; expression_opt; expression_opt) statement
In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains unequal to 0; the expression must have arithmetic or pointer type. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.
The precedence and associativity of operators is fully specified, but the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation.
The precedence of expression operators is the same as the order of the major subsections of this section, highest precedence first.
Dereference is also called "indirection" and that's how it's referred to in the specification. The relevant sections are:
- A7.3 Postfix Expressions, A7.3.4 Postfix Incrementation
- A7.4 Unary Operators, A7.4.3 Indirection Operator
- A7.17 Assignment Expressions
It's not obvious why things need to be defined this way. First, since we want to modify the arrays (by copying one into the other), we must define them using array syntax, as in char s[] = "first sentence".
This creates an array which is initialized using the string literal "first sentence".
Second, the name of the array, s, is not a variable. Even though
it decays to a pointer to the array's initial element, it's not a
modifiable lvalue [7]. We can't operate on the array name as though it
were a pointer. Things like s++ violate a constraint of the language,
so the compiler rejects them rather than producing a program that
misbehaves at run time. It's not entirely clear why C is specified
this way. The reason is likely historical.
To modify the array, we need to use either array indexing or a pointer
to the array. In the case of our function, we can use the names s and
t directly: when an array is passed to a function, it decays to a
pointer, the pointer is copied into the parameter, and that copy is a
modifiable lvalue.
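The difference between an array name and a pointer parameter can be seen in a short sketch. The function count_chars is hypothetical, written only to show that incrementing the parameter leaves the caller's array alone.

```c
#include <stddef.h>

/* s is the function's own copy of the pointer, so s++ is allowed here.
   Writing arr++ on an array named arr in the caller would not compile. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (*s++)    /* advances the local copy, not the caller's array */
        n++;
    return n;
}
```

After char arr[] = "first sentence"; size_t n = count_chars(arr);, arr still names the start of the array, because only the copy inside the function moved.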
We may be tempted to declare the string explicitly using a pointer,
as in char *s = "first sentence".
This creates a pointer to a string literal. However, string literals
are static: their values are retained across block exits, and the
behavior of a program that attempts to alter one is undefined (A2.6).
This means we can't safely modify the object the pointer references.
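One way to stay out of trouble is sketched below: point at the literal through a const pointer, and copy it into an array before changing anything. The function name capitalize_copy and its parameters are assumptions made for this example.

```c
#include <string.h>

/* Copies a literal into caller-supplied storage, then modifies the copy.
   Does nothing if dst is too small. */
void capitalize_copy(char *dst, size_t dstsize)
{
    /* const makes an accidental write through lit a compile-time error */
    const char *lit = "first sentence";

    if (dstsize < strlen(lit) + 1)
        return;         /* not enough room; leave dst untouched */
    strcpy(dst, lit);
    dst[0] = 'F';       /* fine: dst is ordinary writable storage */
}
```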
An lvalue is an expression that refers to an object. A modifiable lvalue is one that can appear on the left side of an assignment. An rvalue is anything else.
It's a little confusing because the spec says that the result of
postfix incrementation is not an lvalue (A7.3.4). It's unclear
what's meant by "result". Is it the value of the operand, such as
s or t, which are objects and lvalues? Or, is it the incremented
thing, whatever that is?
Indirection doesn't explicitly require an lvalue. Instead, indirection results in an lvalue if the operand is a pointer to an object of arithmetic, structure, union, or pointer type (A7.4.3). Again, it's unclear if indirection results in an lvalue.
Assignment requires an lvalue as its left operand. So, the result of the indirection must be an lvalue. This implies that the operand of the indirection is a pointer to an object of one of the listed types. Which one?
A7.17 Assignment Expressions
There are several assignment operators; all group right-to-left.
assignment-expression:
    conditional-expression
    unary-expression assignment-operator assignment-expression
assignment-operator: one of
    = *= /= %= += -= <<= >>= &= ^= |=
All require an lvalue as left operand, and the lvalue must be modifiable…
The type of an assignment expression is that of its left operand, and the value is the value stored in the left operand after the assignment has taken place.
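That last sentence is what lets while test the copy directly. A tiny sketch, with the hypothetical helper assign_value, shows that the expression's value is the character that was stored:

```c
/* Stores src through dst and returns the value of the assignment
   expression, which is the value now held in *dst. */
int assign_value(char *dst, char src)
{
    return (*dst = src);
}
```

Storing any ordinary character yields a nonzero value, so the loop keeps going; storing '\0' yields 0, which ends it.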