Standard C

Quiet Changes, Part II

P.J. Plauger


P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee.

Last month, I began the process of describing all of the quiet changes in Standard C. (See "Quiet Changes, Part I," CUJ February '90.) A quiet change is a change in the meaning of Standard C versus some earlier (and presumably popular) dialect of C. It is a change that converts an acceptable program with one behavior to an acceptable program with different behavior. You get no diagnostic to warn you that your program may have to be altered to keep its old behavior.

Needless to say, quiet changes are not nice. Committee X3J11 did its best to minimize them. It considered each such change very carefully. And it documented all the quiet changes it felt compelled to make in the rationale that accompanies the standard.

My goal with this column and the previous one is to show you all the quiet changes. With each one, I give an example of code that might be affected. I also endeavor to explain how the committee was led to introduce the change. Last issue, I covered about half of the quiet changes. Here are the rest.

More Quiet Changes

"A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This is considered the most serious semantic change made by the committee to a widespread current practice."

For example,

unsigned char uc = digit;

if (uc - '0' > 9)
     printf("not a digit\n");
The message is no longer printed for all cases where digit is out of range. Some older implementations always performed an unsigned compare and hence cleverly got the test right for all cases.

The fact that I devoted a whole column to this issue should tell you that it has a lot of ramifications. (See "Standard C Promotes Types According to Value Preserving Rules," CUJ August '88.) Here I will simply summarize the issue. It required the committee to choose between two divergent classes of dialects. Neither could be dropped without causing quiet changes in programs written for the other dialect.

The divergence occurred when C acquired the additional unsigned types besides unsigned int. These new types required additional rules for promoting integer types, such as when you subtract an unsigned char and an int. The decision inside Bell Labs was to preserve unsignedness. That is, if either operand is an unsigned type, both operands are promoted to the "cheapest" computational unsigned type that is at least as wide as the wider operand. (The computational types are the signed and unsigned versions of int and long.) So unsigned char minus int yields a result of type unsigned int.

What I and some other implementors chose to do was different. We chose to promote both operands to the cheapest computational type that would represent all the values of each of the two operand types. So unsigned char minus int yields a result of type int, so long as unsigned char has a narrower representation than int.

Unlike the "unsigned preserving" promotion rules, the result type is different for different target machines. On the other hand, the result more often has a type that correctly reports a negative value as a negative signed result, not some huge unsigned result. That is why the second set of promotion rules has been dubbed "value preserving."
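
As a hedged sketch of the difference, the two functions below apply the digit test from the earlier example both ways. The function names are hypothetical. The sketch assumes that unsigned char is narrower than int and that the character set is ASCII-like, so that '/' immediately precedes '0'. The first function accepts whatever promotion the translator performs, which under value preserving rules is to int. The second forces the old unsigned comparison with a cast.

```c
/* Value preserving: uc promotes to int, so uc - '0' can be negative. */
static int above_nine_promoted(unsigned char uc)
{
    return uc - '0' > 9;
}

/* Forced unsigned arithmetic: a negative difference wraps to a huge value. */
static int above_nine_unsigned(unsigned char uc)
{
    return (unsigned int)uc - '0' > 9;
}
```

Under Standard C, above_nine_promoted('/') should yield 0 while above_nine_unsigned('/') yields 1. The two agree for every character at or above '0'. Writing the cast makes the intent explicit under either set of promotion rules.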

After much heated debate, the value preserving faction won. The most convincing argument was that value preserving promotions produced surprising results less often. The most convincing counterargument was that UNIX, and lots of other important code, was written using unsigned preserving promotions. Groups had successfully ported UNIX, however, using a compiler with value preserving promotions. That quelled enough fears for the committee to reach consensus.

My belief is that this was a tempest in a teapot. (Naturally, it didn't feel that way to those of us in the teapot at the time.) It's hard to contrive realistic examples that quietly change along with these promotion rules. I gave the most compelling one I could think of above, in the spirit of fair play. If you're still worried, however, go back and read the full diatribe in my original column on the subject.

"Expressions with float operands may now be computed at lower precision. The Base Document specified that all floating point operations be done in double."

For example,

float x, y;
x = x - y;
The subtraction can now be performed to float precision, which could retain less significance than in the past.

C has traditionally performed all floating point arithmetic in double. This minimized type mismatches for floating point arguments back before there were function prototypes. It also happened to model the behavior of the PDP-11 floating point hardware used in the first implementation of C.

Unfortunately, that traditional behavior has been one of the principal reasons why more FORTRAN programs have not been converted to C. The performance penalty can be substantial. What you gain in retained precision is often not worth the cost, particularly to programmers skilled in juggling precisions.

The committee had little trouble changing the promotion rules to match FORTRAN more closely. We felt that few programs depend critically on the higher intermediate precision promised in the past when adding operands of type float.
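
One hedged illustration of where the precision of intermediate results can matter: the two hypothetical helpers below add three float values, once rounding to float after each step and once carrying the intermediate sum in double. The volatile qualifier is an assumption made only to force each partial sum to be stored as a float, even on a translator that keeps wider intermediates. The sketch also assumes IEEE 754 style formats with a 24-bit float significand and round-to-nearest arithmetic.

```c
/* Round to float after every addition. */
static float sum_in_float(float a, float b, float c)
{
    volatile float t = a + b;   /* volatile forces a store at float precision */
    t = t + c;
    return t;
}

/* Carry the intermediate sum in double, as traditional C did. */
static float sum_in_double(float a, float b, float c)
{
    double t = (double)a + (double)b;
    t = t + (double)c;
    return (float)t;
}
```

With a equal to 16777216.0f (2 to the 24th power) and b and c both 1.0f, the float version should return 16777216.0f, because each added 1 is lost to rounding, while the double version should return 16777218.0f. A program that needs the old behavior can ask for it with explicit casts to double, as the second function does.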

"A program that uses #if expressions to determine properties of the execution environment may now get different answers."

For example,

#if -1U/2+1 == 1U<<31
/* int is 32-bits, 2's-complement */
The comment is not necessarily true for the target environment.

Some C programs determined properties of the execution environment by testing how the C preprocessor performed integer arithmetic. The assumption was that preprocessor arithmetic was the same as for the target environment. That happens not to be true for most cross compilers. It is also not true for many compilers designed to support numerous target environments.

The committee decided that target-independent preprocessors were a Good Thing. To be sure, compile time arithmetic must retain at least the same precision and range as arithmetic on the target. But it need not slavishly match all of the foibles of the target.

Instead of writing clever (and unreadable) expressions in #if directives, programmers are now urged to test the values of appropriate macros. The standard header <limits.h> defines macros that tell you all sorts of interesting things about the representation of integers on the target. The standard header <float.h> defines macros that tell you more than you're likely ever to want to know about the representation of floating point types. The above example should be changed to

#include <limits.h>
#if INT_MAX == 2147483647 && \
    INT_MIN == -2147483648
/* int is 32-bits, 2's-complement */
An implementation can still promise to model the target arithmetic in the preprocessor. In that case, the clever programs need not change. If you want to port them to another implementation, however, you'd probably have to change the #if expressions anyway.
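
As a sketch of the same idea put to work, the fragment below picks an integer type with at least 32 bits by testing INT_MAX rather than by probing preprocessor arithmetic. The type name int32ish is hypothetical. The macros INT_MAX and CHAR_BIT come from <limits.h>.

```c
#include <limits.h>

/* Choose the cheapest type that holds at least 32 bits. */
#if INT_MAX >= 2147483647
typedef int int32ish;      /* int is wide enough on this target */
#else
typedef long int32ish;     /* long is guaranteed to have at least 32 bits */
#endif
```

The test compares against a limit that the header reports for the target, so it does not depend on how the translator's preprocessor happens to do its own arithmetic.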

"The empty declaration struct x; is no longer innocuous.

For example,

f() {
    struct x; /* special meaning */
    struct y {
        struct x *px;
        ..... };
    struct x {
        struct y *py;
        ..... };
The first declaration now assures that the two structures point at each other, regardless of any outer context.

Just as block structure does not mix well with external linkage, it collides at times with forward references as well. C lets you declare a structure tag before you declare the contents of the structure. (It is an incomplete type.) You need to make such a forward reference when you declare two structures each of which contains a pointer to the other.

Unfortunately, C had no way to shield a forward reference from a structure tag definition visible in an outer block. Should you wish to plunk down a patch of code containing forward references to tags in an arbitrary code environment, you ran the risk of having the patch misbehave in some contexts. This defeats much of the purpose of block structuring to protect name spaces.

This is an esoteric problem. You've probably never encountered it and you probably never will. Nevertheless, the committee decided to solve it by giving special meaning to an otherwise empty declaration such as struct x;. You can contrive a program that breaks because this esoteric problem has been fixed. You will have trouble convincing me that it is a program worth writing.
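
A small sketch of the shielding effect, with hypothetical names throughout: a struct x is already declared at file scope, and the function wants its own pair of mutually referencing structures. The empty declaration struct x; tells a Standard C translator that a new struct x begins in this block, so the pointer inside struct y refers to the inner type.

```c
struct x { int outer_only; };       /* tag already visible at file scope */

static int inner_pair_sum(void)
{
    struct x;                       /* begins a new struct x in this block */
    struct y { struct x *px; int id; };
    struct x { struct y *py; int id; };

    struct x a;
    struct y b;

    a.py = &b;
    a.id = 1;
    b.px = &a;                      /* px points at the inner struct x */
    b.id = 2;

    /* Follow the pointers around the loop: b -> a -> b. */
    return b.px->py->id + b.px->id;
}
```

Without the empty declaration, px would point at the outer struct x, and the assignment b.px = &a would mix incompatible pointer types.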

"Code which relies on a bottom-up parse of aggregate initializers with partially elided braces will not yield the expected initialized object."

You can, of course, initialize structures containing structures, or arrays of arrays, or arrays of structures. For any complex initializer involving aggregates, you write braces around the stuff for each aggregate to set it off. Unfortunately, C has a long tradition of letting you omit all but the outermost set of braces. For a compiler writer, this is a nightmare.

When implementors on the committee started comparing nightmares, matters got even worse. It seems that people had settled on at least two different ways to parse aggregate initializers with partially elided braces. In terms of parsing theory, the two general camps can be characterized as "top-down" and "bottom-up." I will not bore you with detailed examples to illustrate the subtle differences.

What you need to know is that the committee eventually endorsed the top-down approach to parsing initializers. You also need to know that omitting braces is a great way to confuse yourself and future maintainers. Never mind that Standard C translators are all supposed to guess the same way. If an initializer is sufficiently complex, you are asking for trouble if you omit any of the internal braces.

Any program that suffers from this quiet change already faced portability problems in the past.
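
The following is a hedged sketch of the safe style next to the risky one. The type and object names are hypothetical. The first array writes braces around every aggregate. The second elides all inner braces, and the third elides only some of them, which is the case where readers and older translators were most likely to disagree. The comments describe how a Standard C translator is expected to read each initializer.

```c
struct point { int x, y; };
struct segment { struct point from, to; };

/* Every aggregate gets its own braces: nothing to guess. */
static const struct segment braced[2] = {
    { { 0, 1 }, { 2, 3 } },
    { { 4, 5 }, { 6, 7 } }
};

/* All inner braces elided: values fill members in declaration order. */
static const struct segment elided[2] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* Partially elided: { 1 } initializes only grid[0]; 2 and 3 begin grid[1]. */
static const int grid[2][3] = { { 1 }, 2, 3 };
```

Under Standard C the first two arrays should hold identical values, and grid should hold 1, 0, 0 in its first row and 2, 3, 0 in its second. The fully braced form says the same thing without asking anyone to work that out.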

"Type long expressions and constants in switch statements are no longer truncated to int."

For example, on an implementation where type int occupies 16 bits,

long lo = 0x20001;
switch (lo) {
case 0x0001:   /* no longer matches */
The switch comparisons are now performed with long arithmetic, so the first case does not match a truncated value.

The committee entertained proposals for all sorts of improvements to switch statements. We decided against permitting floating point or pointer expressions to control switch statements. On the other hand, we found little justification for continuing to rule out the other integer computational types besides int. A switch statement whose control expression is of type long will now do all its comparisons against case values with long arithmetic.

Conceivably, an existing program contains a switch expression of type long. Conceivably, the program depends upon the value being altered when it is converted to type int for the comparisons. If so, then the program will quietly change its behavior. The likelihood of such an occurrence is reasonably remote.
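
For code that really did depend on the truncation, one hedged remedy is to make the narrowing explicit rather than rely on the translator. The function names below are hypothetical. Masking to the low 16 bits is an assumption standing in for the old conversion to a 16-bit int. It matches for the value in the example, though not for values that the old conversion would have made negative.

```c
/* Standard C: the comparison uses the full long value. */
static int matches_full(long lo)
{
    switch (lo) {
    case 0x0001:
        return 1;
    default:
        return 0;
    }
}

/* Explicit narrowing for code that relied on the old truncation. */
static int matches_low16(long lo)
{
    switch ((int)(lo & 0xFFFFL)) {
    case 0x0001:
        return 1;
    default:
        return 0;
    }
}
```

Called with 0x20001L, matches_full should return 0 and matches_low16 should return 1, whatever the width of int on the target.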

"Functions that depend on char or short parameter types being widened to int, or float to double, may behave differently."

For example,

f(x, y)
    char x;
    {
    if (y == 0)
        x = 500;
In Standard C, x remains type char. Hence, the stored value would probably be truncated. Many past implementations would promote x to type int.

One school of thought in the past was that a parameter of type char was silly. Everyone knew that the argument value was actually passed as type int, so why not just rewrite the type of the parameter to match the passed value?

The alternate school was that the programmer's wishes should be obeyed. However the argument value was passed, it should be stored in a data object of the declared type. You get fewer surprises that way. Besides, for most integer parameters, you can make the conversion "free" just by picking the right piece of the argument value as the parameter data object.

The second school of thought prevailed in the end. That causes trouble for programs written for translators that rewrite parameter types. (Again, the trouble was already there for people who tried to move such programs between implementations.)

Look for places where you declare a parameter as other than the "widened" type (the type actually passed). If the function stores values too large for the declared type, you will have a quiet change. If the function takes the address of that parameter, you may well have a quiet change. Surprisingly, however, the change often has no effect.
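
A sketch of the first case follows, written with prototypes so that it also compiles under current translators. The function names are hypothetical, and unsigned char is used in place of char as an assumption, so that the narrowing is well defined. The first function stores into the narrow parameter itself. The second copies the parameter to an int before storing.

```c
#include <limits.h>

/* The parameter keeps its declared type, so a large store is reduced. */
static int narrow_param(unsigned char x, int y)
{
    if (y == 0)
        x = (unsigned char)500;   /* 500 reduced modulo UCHAR_MAX + 1 */
    return x;
}

/* An explicit int copy gives the behavior the old code relied on. */
static int widened_copy(unsigned char x, int y)
{
    int wide = x;
    if (y == 0)
        wide = 500;
    return wide;
}
```

Where unsigned char has eight bits, narrow_param('a', 0) should return 244 and widened_copy('a', 0) should return 500. When y is nonzero, both simply return the argument.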

"A macro that relies on formal parameter substitution within a string literal will produce different results."

For example,

#define pr(msg) printf("msg\n")
Standard C will not alter the string when it expands the macro. Some earlier implementations would do so.

The folks at Berkeley decided that macros should be able to generate tailored string literals. Consequently, they adopted the convention that a macro parameter name within a string literal should be replaced by the actual parameter. (This happens only for string literals written as part of the expansion text of a macro, never outside macro expansions.)

There was general support for such a mechanism among members of the committee. Some of us, however, objected to this particular approach. (I was the loudest of the objectors.) It would mean greatly complicating the lexical description of string literals. They were already complicated enough with escape sequences and string concatenation. Adding the concept of embedded identifiers was appalling.

Instead, the committee adopted a new convention. Any parameter name you precede with a # gets turned into a string literal. With string concatenation, you can paste the "stringized" parameter into a larger string. The capability is the same; only the machinery differs. The example above must be changed to

#define pr(msg) printf(#msg "\n")
Nevertheless, any program that depends upon the older practice of substituting parameters within string literals will suffer a quiet change.
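
A slightly fuller sketch of the Standard C machinery, under stated assumptions: the helper macro MSG_LINE is hypothetical, and fputs replaces printf here only so that a % sign in the message is not taken as a conversion specification.

```c
#include <stdio.h>

/* #msg turns the argument into a string literal; the adjacent
   literals are then concatenated into one. */
#define MSG_LINE(msg) #msg "\n"

/* Write the stringized argument, followed by a newline, to stdout. */
#define pr(msg) fputs(MSG_LINE(msg), stdout)
```

Writing pr(not a digit); should then send the text not a digit, followed by a newline, to the standard output.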

"A program which relies on size-0 allocation requests returning a non-null pointer will behave differently."

For example,

scanf("%u", &size);
p = malloc(size);
If size is zero, the behavior is now undefined. The value stored in p may be a null pointer, or the program may abort, for example.

An unfortunate schism developed within the committee over the proper behavior of malloc(0). Should this produce a non-null pointer to an object of zero size? Or should it be considered invalid so that an implementation must diagnose it (perhaps at run time)? It is a religious issue, touching on people's basic beliefs as to what constitutes elegant behavior. Like most religious issues, many bystanders quickly tire of arguments on either side of the matter. That made it all the more difficult for the committee to achieve an informed resolution.

The net result was that both sides lost. The "compromise" was to label malloc(0) undefined behavior. This means that programmers can't depend on its working right. They also can't depend on its being diagnosed. A program that allocates objects of varying size may now suffer a quiet change. Where once it could handle the occasional zero-size object without special code, now it can fail.
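
For code that must handle the occasional zero-size object, one defensive sketch is a wrapper that never passes zero to malloc. The wrapper name is hypothetical. Rounding a zero request up to one byte is an assumption about what the caller wants (a distinct pointer that can later be freed), not something the standard prescribes.

```c
#include <stdlib.h>

/* Allocate size bytes, but never ask malloc for zero bytes. */
static void *alloc_never_zero(size_t size)
{
    return malloc(size != 0 ? size : 1);
}
```

The caller still has to check for a null pointer, since any allocation can fail, and releases the result with free as usual.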

Conclusion

If you look back over this set of quiet changes, you will find much cause for hope. The hope is that nearly all of the changes need not remain quiet. A good translator can have extra checks to look for these cases and emit warning messages. Only the last, concerning zero-size objects, must be augmented by run-time checking.

I have not yet heard of an implementation that offers to check for quiet changes. One that does will be a valuable migration tool. (You don't want the checks turned on all the time, only for old C code that you are upgrading.) Vendors please take note.