Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at uunet!aussie!rex or aussie!rex@uunet.uu.net.
Every useful C program contains statements which, in turn, contain expressions. According to ANSI C, "An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof." As such, the C programs you write are full of operators and their operands.
In this issue, I'll look at some of the more interesting aspects of C's operators and the operator precedence table.
The Operator Precedence Table
In many languages such things as type conversion, procedure invocation, and array subscripting are achieved via keywords, punctuators, or intrinsic functions. Not so with C, however. One of the most elegant aspects of C is that all operations are performed via operators. So by mastering the operator precedence table you can understand how to build expressions, then statements, and finally, programs.
Operators in C are given precedence "values" according to their position in Table 1. (It is impossible to represent this table accurately without using footnotes or other such comments, so these follow below.)
Each operator is shown only once in the table. However, some operator symbols are overloaded and therefore appear once for each of their meanings. (For example, the unary indirection operator * is in the second row while the binary multiplication operator * is in the third row.) Note also that the postfix versions of ++ and -- have higher precedence than their prefix counterparts, with the former being in row one and the latter in row two.
Let's see how to read the table correctly by using the following expression:
a = b + c * d
This expression contains the three operators: assignment, addition, and multiplication. According to the precedence table, multiplication has the highest precedence (it's in the row closest to the top, row three), followed by addition in row four, and finally by assignment in row 14. The table indicates the precedence of these operators in the absence of grouping parentheses. Let's rewrite this expression explicitly showing the grouping dictated by the table.
(a = (b + (c * d)))
Of course, you can always use grouping parentheses yourself to either document or change the default precedence. For example:
((a = (b + c)) * d)
produces an entirely different result.
Since most table rows contain more than one operator, it is possible to have an expression containing operators at the same precedence. To resolve this, we must look at the associativity column. This column tells us whether the operators in that row associate left-to-right or right-to-left in the expression as we have written it, not as the operators exist in the table's row. For example:
x * y / z
a = b = c
can be rewritten as:
((x * y) / z)
(a = (b = c))
Even though multiplication and division have the same precedence, we generally say that multiplication has higher precedence in this expression since these operators associate left-to-right.
The precedence table is used by the compiler to construct an expression tree and has nothing whatsoever to do with order of evaluation. It is a very common mistake to say "order of evaluation" when you really mean "precedence." Consider the following example:
a = b + c * d
The precedence of these operators is clear from the table. However, the order of evaluation of these operators is unspecified by the language. In this case it doesn't matter since the only side effect is the assignment and that can only be done after all the other expressions are evaluated. Let's use a different version of this same expression to show the problem.
((*(a())) = (b() + (c() * d())))
What is the order of evaluation of the function call side effects? That is, in what order are the four functions called? The order is unspecified and cannot be made explicit via grouping parentheses.
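The difference between grouping and order of evaluation can be sketched as follows. This is only an illustration: the helper names (note, last_call_order, run_expression, default_grouping, forced_grouping) and the values returned by a(), b(), c(), and d() are assumptions made for the example. The stored result is fixed by the grouping, while the recorded call order may differ from one compiler to another.

```c
#include <string.h>

static char call_log[8];   /* order in which a(), b(), c(), d() actually ran */
static int target;         /* object designated by *a() */

/* Appends one letter to the log (hypothetical helper). */
static void note(char name)
{
    size_t n = strlen(call_log);
    call_log[n] = name;
    call_log[n + 1] = '\0';
}

static int *a(void) { note('a'); return &target; }
static int b(void)  { note('b'); return 2; }
static int c(void)  { note('c'); return 3; }
static int d(void)  { note('d'); return 4; }

/* Evaluates *a() = b() + c() * d() and returns the value stored. */
int run_expression(void)
{
    call_log[0] = '\0';
    *a() = b() + c() * d();
    return target;
}

/* Returns the call order recorded by the last run_expression(). */
const char *last_call_order(void)
{
    return call_log;
}

/* a = b + c * d with the grouping dictated by the table. */
int default_grouping(int b_val, int c_val, int d_val)
{
    int a_val;
    a_val = b_val + c_val * d_val;
    return a_val;
}

/* ((a = (b + c)) * d): the product is computed and then discarded. */
int forced_grouping(int b_val, int c_val, int d_val)
{
    int a_val;
    (void)((a_val = (b_val + c_val)) * d_val);
    return a_val;
}
```

Calling run_expression() always yields 14, but the string returned by last_call_order() is whatever order the compiler happened to choose; no arrangement of parentheses in the expression changes that.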
Postfix Operators
The first row of the table contains the six postfix operators. C uses an operator to call a function. It's an unusual operator in that it consists of a postfix set of parentheses () which contain a possibly empty set of comma separated expressions having an object type. To call a function one simply names it and follows it with the function-call operator and an argument list. Since function calling is achieved via a run-time operator the operands' values need only be known at runtime. Specifically, the name of the function is not needed at compile time. What is needed is an expression that designates a function. This then permits a function to be called indirectly using a pointer to a function. For example:
(*jmptable[i])()
calls the function pointed to by the ith element in an array of function pointers called jmptable. The function actually called depends on the run-time value of the index i. This approach (and its corresponding flexibility) is not possible in most high-level languages.
Similarly, the subscript operator [] is much more flexible in C than most other languages. It requires that one of its operands have an object pointer type (so that excludes pointers to incomplete and function types) and the other have an integer type. It does not require that the name of an array be present. Since subscripting is nothing more than dereferencing an object at some integer offset from a base address, only a base pointer is needed. As a consequence, you may arbitrarily subscript a pointer expression to one level. (Of course, whether the resulting expression produces defined behavior or not, depends on whether you go outside the bounds of the object to which you are pointing.)
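Both points can be seen in one small sketch. The functions add, sub, and mul, the typedef binop, and the wrappers dispatch and dispatch_default are names invented for this illustration; only jmptable comes from the expression above. Note that dispatch receives a plain pointer, not an array name, and subscripts it all the same.

```c
/* Pointer to a function taking two ints and returning int (assumed signature). */
typedef int (*binop)(int, int);

static int add(int x, int y) { return x + y; }
static int sub(int x, int y) { return x - y; }
static int mul(int x, int y) { return x * y; }

/* An array of function pointers, as in the text. */
static binop jmptable[] = { add, sub, mul };

/* table is only a base pointer; [] needs nothing more than that.
   The caller must keep i within the bounds of the table. */
int dispatch(binop *table, int i, int x, int y)
{
    return (*table[i])(x, y);   /* function chosen by the run-time value of i */
}

/* Convenience wrapper using the three-entry jmptable above. */
int dispatch_default(int i, int x, int y)
{
    return dispatch(jmptable, i, x, y);
}
```

With this table, dispatch_default(0, 6, 3) calls add, while an index of 1 or 2 selects sub or mul.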
The ability to subscript a pointer gives rise to the identity
a[i] is equivalent to *(a + i)
It also allows expressions such as "abcd"[i] and f()[j]. One of the biggest advantages of subscripting pointers is that space allocated by malloc and friends can be treated the same as that allocated statically or automatically. Subscripting pointers also permits multidimensional arrays to be referenced with less than the maximum number of dimensions. For example, given
int i[5][4][6];
i
i[2]
i[2][3]
i[2][3][4]
are all valid expressions.
Note that the operands of [] may be written in either order. That is, a[i] and i[a] are equivalent. This is not to suggest that you should write 2[i] instead of i[2]; however, both are valid expressions under K&R as well as ANSI C. Some mainstream compilers will not accept 2[i] and, while this in itself is not a big problem, it can make you suspicious about other possible (and illegal) shortcuts the compiler writers may have made. With [] being commutative, some other interesting expressions are possible. For example:
a[i][j][k]
j[i[a]][k]
k[j[i[a]]]
are all equivalent.
The member selection operators -> and . can always be written in terms of each other. For example, s.m is the same as (&s)->m and ps->m is equivalent to (*ps).m. The dot operator requires its left operand to designate a structure although not necessarily by name. This makes expressions like f().m and (*g()).m possible.
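The equivalences above can be checked mechanically. The following is a minimal sketch; the function name check_identities, the structure rec, and the sample values are assumptions chosen for the illustration.

```c
#include <stdlib.h>

struct rec { int m; };

/* Returns 1 if every equivalence named in the text holds for this data. */
int check_identities(void)
{
    int a[4] = { 10, 20, 30, 40 };
    static int grid[5][4][6];               /* zero-initialized */
    int *row = grid[2][3];                  /* fewer subscripts than dimensions */
    int *heap = malloc(4 * sizeof *heap);   /* malloc'd space, subscripted like an array */
    struct rec s = { 9 };
    struct rec *ps = &s;
    int i = 2;
    int ok;

    if (heap == NULL)
        return 0;
    heap[i] = 5;
    grid[2][3][4] = 7;

    ok = a[i] == *(a + i)
      && a[i] == i[a]                       /* [] is commutative */
      && "abcd"[i] == 'c'                   /* subscripting a string literal */
      && heap[i] == *(heap + i)
      && row[4] == 7
      && grid[2][3][4] == 3[2[grid]][4]     /* j[i[a]][k] */
      && grid[2][3][4] == 4[3[2[grid]]]     /* k[j[i[a]]] */
      && s.m == (&s)->m
      && ps->m == (*ps).m;

    free(heap);
    return ok;
}
```

The reversed subscript forms are shown only to confirm that they are accepted; they are not a recommended style.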
ANSI C clearly indicates that the postfix versions of ++ and -- have different precedence than their prefix counterparts. Historically, both prefix and postfix versions were combined in the second row. However, this gave rise to problems when faced with expressions such as:
ps++->m
If the -> has higher precedence than ++, this expression is ill-formed. However, many compilers treated it like
(ps++)->m
By making postfix ++ and -- the same precedence as ->, this problem was resolved since postfix operators associate left-to-right giving the same grouping as was previously assumed by these compilers. The promotion of postfix ++ and -- from row two to row one should break no existing code. It either now sanctions what your compiler might already be doing or it allows expressions previously rejected by your compiler.
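A typical use of this grouping is stepping through an array of structures. In the sketch below, the structure item and the function sum_members are hypothetical names used only for illustration.

```c
struct item { int m; };

/* Sums the m members of n consecutive items. */
int sum_members(struct item *ps, int n)
{
    int total = 0;

    while (n-- > 0)
        total += ps++->m;   /* groups as (ps++)->m: fetch m, then advance ps */
    return total;
}
```

Given an array of three items whose m members are 1, 2, and 3, sum_members returns 6.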
Unary Operators
All of the unary operators are in row two of the table. They are all prefix operators.
The unary plus was an ANSI C invention. Originally, it had special grouping semantics but these were removed in a later draft version of the standard. The result of the unary plus operator is the value of its operand. The integral promotion is performed on the operand, and the result has the promoted type.
An expression such as -32768 consists of two source tokens: the unary minus operator and the integer constant 32768. Note there is no such thing as a negative constant in C. The constant is non-negative and it is preceded by a unary minus operator. An interesting situation exists on 16-bit twos-complement machines where -32768 is the smallest value that can be stored in an int. It so happens that the type of -32768 when written in this form is not int; it's long int, but that's another story.
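The same effect can be observed on machines with a 32-bit int by using 2147483648 in place of 32768. The sketch below makes several assumptions: the function name is invented, the compiler follows C99 or later rules for the type of an integer constant (or provides a long wider than int), and int is either 16 or 32 bits wide. It is also the reason headers commonly define INT_MIN as (-INT_MAX - 1) rather than as a literal.

```c
#include <limits.h>

/* Returns 1 when the literal spelling of the most negative int has a type
   wider than int, 0 when it does not, and -1 when this sketch cannot tell. */
int literal_min_is_wider_than_int(void)
{
#if INT_MAX == 2147483647
    /* 2147483648 does not fit in int, so the constant gets a wider type
       before the unary minus is ever applied. */
    return sizeof(-2147483648) > sizeof(int);
#elif INT_MAX == 32767
    return sizeof(-32768) > sizeof(int);
#else
    return -1;   /* unusual int width: no claim made */
#endif
}

/* INT_MIN itself is an int-typed expression. */
int int_min_has_int_size(void)
{
    return sizeof(INT_MIN) == sizeof(int);
}
```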
ANSI C has now made it possible to take the address of an array by using something like &arrayname. Historically, compilers treated this as though you had written &arrayname[0], although some rejected it outright. (For a detailed discussion on pointers to arrays, refer to my column in the May 1990 issue of The C Users Journal.)
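The distinction between &arrayname and &arrayname[0] shows up in the type of the result, and therefore in pointer arithmetic. The following sketch uses invented function names and a ten-element int array purely as an example.

```c
#include <stddef.h>

/* Bytes skipped by adding 1 to a pointer to the whole array. */
size_t step_whole_array(void)
{
    int arr[10];
    int (*pa)[10] = &arr;          /* type: pointer to array of 10 int */

    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Bytes skipped by adding 1 to a pointer to the first element. */
size_t step_first_element(void)
{
    int arr[10];
    int *pe = &arr[0];             /* type: pointer to int */

    return (size_t)((char *)(pe + 1) - (char *)pe);
}

/* Both pointers refer to the same address; only their types differ. */
int same_address(void)
{
    int arr[10];

    return (void *)&arr == (void *)&arr[0];
}
```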
In the early days of C, structures and unions were always passed by address just like arrays. However, once structure and union passing by value was introduced, an explicit & was needed to construct the address of a structure. Some compilers issue a warning message if you pass a structure or union by value since they "believe" you might have omitted the & by mistake.
To call a function pointed to by a function pointer, you use the syntax:
(*funptr)(arg-list)
ANSI C also permits this to be written as:
funptr(arg-list)
so it looks like a "regular" function call. This is quite reasonable since an expression that designates a function is converted to a pointer to that function and as a result, the following expressions are equivalent.
printf("Hi there\n");
(*printf)("Hi there\n");
(**printf)("Hi there\n");
(***printf)("Hi there\n");
The sizeof operator is rather unusual in that it uses a keyword rather than a symbol and it is evaluated at compile time. A sizeof expression produces a value of type size_t, an unsigned integer type defined in the standard headers stddef.h, stdio.h, stdlib.h, string.h, and time.h. This value represents the size in bytes of an object having the type specified. sizeof can only be applied to expressions having an object type. (It cannot be used with expressions having incomplete or function type.) sizeof has two forms:
sizeof expression
sizeof (type)
Programmers almost always use parentheses even when they are not required. For example, in sizeof(i), the parentheses are redundant grouping parentheses whereas in sizeof(int) they are a necessary part of the syntax. (For a detailed discussion on sizeof, refer to my column in the February 1988 issue of CUJ.)
The cast operator can be used to convert an expression of one type to another with the following restrictions. You cannot cast to or from a structure or union type. You cannot cast to an array or function type. When you cast from an array or function type, the operand is first converted to a pointer. You may cast an expression to its own type and you may also cast an expression to type void. The latter explicitly discards the value of the expression and its use is mostly limited to function-like macros that replace void functions.
Strictly speaking, the cast operator has precedence lower than the unary operators but higher than the multiplicative operators. However, when shown in tabular form, it is always written along with the unary operators.
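Three points from this section lend themselves to a quick check: the equivalent function call forms, the two forms of sizeof, and the cast binding more tightly than a multiplicative operator. The sketch below uses a hypothetical function twice in place of printf, and all the other names are likewise invented for the example.

```c
#include <stddef.h>

static int twice(int n) { return 2 * n; }

/* Every call form below designates the same function. */
int call_forms_agree(void)
{
    int (*funptr)(int) = twice;

    return twice(5) == 10
        && (*twice)(5) == 10
        && (***twice)(5) == 10
        && (*funptr)(5) == 10
        && funptr(5) == 10;
}

/* sizeof expression versus sizeof (type). */
int sizeof_forms_agree(void)
{
    int i = 0;

    return sizeof i == sizeof(int)      /* no parentheses needed around an expression */
        && sizeof(i) == sizeof(int);    /* redundant grouping parentheses */
}

/* The cast is applied before the division. */
double cast_then_divide(void)
{
    int seven = 7;

    return (double)seven / 2;           /* groups as ((double)seven) / 2 */
}
```

Had the division been grouped first, integer division would have produced 3 before the conversion; with the cast applied first the result is 3.5.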
Other Operators
The rest of the operators are quite straightforward; however, a few comments are in order.
ANSI C states that expressions involving operators that are both associative and commutative are still grouped according to the precedence table. For example:
a * b * c
must be treated as being grouped like:
(a * b) * c
The rules in K&R permitted such expressions to be arbitrarily rearranged. Of course, different orderings might cause integer overflow (which may or may not prove fatal or erroneous). With floating-point operands, the results can be significantly different. On implementations where integer overflow is silent and recoverable, a compiler is still permitted to rearrange the grouping since you cannot tell the difference. Likewise for the bit operators |, &, and ^.
In the early days of C, the compound assignment operators were written in the reverse order. For example, += and >>= were written as =+ and =>>, respectively. Unfortunately, this caused problems with expressions like i =-5. While you probably wanted to assign -5 to i, you were, in fact, subtracting 5 from the current value of i. These old operators were declared archaic in the first edition of K&R and are only supported by a handful of mainstream compilers. They are not part of ANSI C.
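To see how much the grouping can matter with floating-point operands, consider the following sketch. It assumes IEEE 754 double arithmetic, in which a product that overflows becomes infinity; the function names are invented, and the volatile temporaries are there only to force each intermediate product to be rounded to double.

```c
#include <float.h>

/* (a * b) * c: the grouping ANSI C requires for a * b * c. */
double product_left(double a, double b, double c)
{
    volatile double t = a * b;
    return t * c;
}

/* a * (b * c): a regrouping a K&R compiler was free to choose. */
double product_right(double a, double b, double c)
{
    volatile double t = b * c;
    return a * t;
}

/* Returns 1 if the two groupings give different answers for DBL_MAX * 2.0 * 0.5. */
int groupings_differ(void)
{
    double left  = product_left(DBL_MAX, 2.0, 0.5);   /* overflows in the first product */
    double right = product_right(DBL_MAX, 2.0, 0.5);  /* 2.0 * 0.5 is exactly 1.0 */

    return left > DBL_MAX && right == DBL_MAX;
}
```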
The comma operator is rather special and powerful and is discussed in my columns in the August 1988 and the November 1989 issues of CUJ. It is typically used only inside macros or in the first and third expressions of a for construct.
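As an illustration of the for construct usage, here is a minimal sketch of an in-place string reversal in which the comma operator initializes and updates two indices. The function name reverse is chosen for the example only.

```c
#include <string.h>

/* Reverses the string s in place. */
void reverse(char *s)
{
    size_t i, j;
    size_t n = strlen(s);

    if (n == 0)
        return;                 /* avoid n - 1 wrapping around for an empty string */

    /* Commas in the first and third expressions are comma operators. */
    for (i = 0, j = n - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```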
Miscellaneous Issues
Only five operators give any guarantee about the order of evaluation. They are: &&, ||, ?:, (), and the comma operator. (For a detailed discussion on order of evaluation and sequence points, refer to my column in the July 1989 issue.)
Only four operators are able to produce an lvalue expression. They are: *, [], ->, and dot. (For a detailed discussion on lvalues, refer to my column in the August 1989 issue.)
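Both observations can be sketched briefly. The structure node and the two function names below are assumptions made for the illustration. The first function relies on && evaluating its left operand first and skipping the right operand when the left is zero; the second assigns through each of the four lvalue-producing operators.

```c
#include <stddef.h>

struct node { int value; };

/* Safe for a null pointer: p->value is evaluated only if p != NULL holds. */
int is_positive(const struct node *p)
{
    return p != NULL && p->value > 0;
}

/* Stores through *, [], ->, and dot, then returns the sum of what was stored. */
int assign_through_lvalues(void)
{
    int arr[2] = { 0, 0 };
    int *p = arr;
    struct node n = { 0 };
    struct node *pn = &n;

    *p = 1;           /* indirection */
    arr[1] = 2;       /* subscript */
    pn->value = 3;    /* arrow */
    n.value += 4;     /* dot: n.value is now 7 */

    return arr[0] + arr[1] + n.value;
}
```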