June 1995/Questions and Answers

Questions and Answers

Surprising Promotion Effects

Pete Becker

Pete Becker is Senior QA Project Manager for C++ at Borland International. He has been involved with C++ as a developer and manager at Borland for the past six years, and is Borland's principal representative to the ANSI/ISO C++ standardization committee.
The following questions were taken from various online sources, including the Internet and CompuServe. To ask Pete a question about C or C++, send e-mail to pbecker@po.borland.com, use subject line: Questions and Answers; or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.
Q
Given the following code snippet:
   unsigned char a = 0x01;
   unsigned char b = 0xFE;

   if( a == ~b ) ...
What should the comparison evaluate to — true or false? I've been told that I need to code the comparison as

if( a == (unsigned char) ~b ) ...
because, in the first comparison, both a and b will be promoted to unsigned ints (in our case 32 bits, so a == 0x00000001 and b == 0x000000FE), then b will be bitwise negated to 0xFFFFFF01, thus resulting in the first comparison evaluating to false when I expected it to evaluate to true. I've been told that this behavior is required by the ANSI spec. I have a copy of the 20 September 1994 draft, and I can't seem to find anything saying this yet.
A
The comparison should evaluate to false. You must be careful in both C and C++ when you try to do any operation on integral types smaller than an int, because the compiler promotes the operands to a larger size before doing the operation. The effect of these promotions becomes most obvious in cases like the above, where the code is twiddling individual bits, but promotion can also lead to surprises for ordinary arithmetic operations.
Tracking through the language definitions to understand these effects, we should begin by looking at the ~ operator. The C Standard defines ~ in section 3.3.3, Unary Operators, which says:
The result of the ~ operator is the bitwise complement of its operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integral promotion is performed on the operand, and the result has the promoted type.
(To find corresponding language in the ANSI/ISO C++ working paper, look in section 5.3.1.) Note the first clause of the second sentence: "The integral promotion is performed..." That's the key to the behavior seen here. To find out what that means, turn to the definition of "integral promotions," found in section 3.2.1.1 (section 4.5 in the C++ working paper):
A char, a short int, or an int bit-field, or their signed or unsigned varieties, or an enumeration type, may be used in an expression wherever an int or unsigned int may be used. If an int can represent all values of the original type, the value is converted to int; otherwise, it is converted to an unsigned int. These are called the integral promotions. All other arithmetic types are unchanged by the integral promotions.[footnote omitted]
In the above case an int can represent every value that an unsigned char can hold, so the unsigned char that is the operand of ~ will be promoted to int (not to unsigned int, as the question supposes). The resulting bit patterns are exactly what the questioner describes: 0xFE becomes 0x000000FE, and applying ~ to this value produces 0xFFFFFF01.
We're not done yet, though. Now let's figure out how to handle the == operator. Section 3.3.9, Equality Operators (section 5.10 in the C++ working paper), refers us to section 3.3.8, Relational Operators (section 5.9 in the C++ working paper). In this case, both operands to the == operator have arithmetic type (that is, neither one is a pointer), so the language definition tells us that "the usual arithmetic conversions are performed." The purpose of these usual arithmetic conversions is to bring the operands of a binary operator to a common type. The definition, found in section 3.2.1.5 (section 5 in the C++ working paper), is a bit tedious, but in this case it means that since one operand of the == operator is of type int and the other is smaller, the smaller one should be promoted to int. So the value in a is promoted to int, giving 0x00000001. Comparing this with 0xFFFFFF01 (~b), the two values are clearly not equal.
Integral promotions are designed to preserve values. The compiler promotes the unsigned char 0xFE to the integer 0x000000FE, because 0x000000FE has the same numeric value as 0xFE. Preserving value does not guarantee, however, that bit manipulations on the two variables will produce the same results.
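To see these rules at work, here is a minimal sketch that spells out each conversion as a separate step. It assumes a C++11 compiler and a platform where int is wider than unsigned char (as on the questioner's 32-bit system); the function names are made up for illustration and are not part of the original question.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: int can represent every unsigned char value, so the
// operand of ~ is promoted to int and the result has type int.
static_assert(std::is_same<decltype(~static_cast<unsigned char>(0)), int>::value,
              "~ applied to an unsigned char yields an int");

// Same meaning as a == ~b, with the implicit conversions written out.
bool rawCompare(unsigned char a, unsigned char b)
{
    int promotedA = a;       // usual arithmetic conversions: a becomes int
    int complementB = ~b;    // b is promoted to int, then complemented
    return promotedA == complementB;
}

// The form suggested in the question: convert the result back first.
bool castCompare(unsigned char a, unsigned char b)
{
    return a == static_cast<unsigned char>(~b);
}

// Usage sketch with the values from the question.
void checkQuestionValues()
{
    assert(!rawCompare(0x01, 0xFE));   // high-order bits differ
    assert(castCompare(0x01, 0xFE));   // high-order bits discarded
}
```

The static_assert documents the assumption: on a platform where int could not hold every unsigned char value, the promotion would go to unsigned int and the assertion would fail at compile time.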
The questioner's suggestion works because the conversion back to unsigned char throws away the high-order bits that the compiler added in order to convert 0xFE into an int. Personally, I prefer to mask out the extra bits instead of using a cast:

if( a == ((~b)&UCHAR_MAX)) ...
This seems clearer to me, because it is applying a bitwise operation to the result of another bitwise operation. If the code were negating b instead of complementing it I would use the cast, as the question suggested:

if( a == (unsigned char)(-b) ) ...
Here we're dealing with an arithmetic operation, and the cast strikes me as fitting more naturally with arithmetic operations. Of course, if this code proved to be a performance bottleneck in an application, I would rewrite it in whatever way produced the fastest code.
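The two styles can be put side by side in a short sketch. It assumes 8-bit chars and an int wider than a char; the function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <climits>

// Bitwise case: complement, then mask off the bits added by promotion.
bool complementWithMask(unsigned char a, unsigned char b)
{
    return a == ((~b) & UCHAR_MAX);
}

// Arithmetic case: negate, then convert back to unsigned char.
bool negateWithCast(unsigned char a, unsigned char b)
{
    return a == static_cast<unsigned char>(-b);
}

// Negation with no cast: both sides are compared as promoted ints.
bool negateRaw(unsigned char a, unsigned char b)
{
    int promotedA = a;
    int negatedB = -b;
    return promotedA == negatedB;
}

// Usage sketch.
void checkMaskAndCast()
{
    assert(complementWithMask(0x01, 0xFE));
    assert(negateWithCast(0x01, 0xFF));   // -255 wraps to 1 in 8 bits
    assert(!negateRaw(0x01, 0xFF));       // 1 is not equal to -255
}
```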
Q
This is my understanding, given the following code fragments:

class foo
{
public:
   void ConstFunc () const;
   void NonConstFunc ();
};

foo aFoo;
const foo *p = &aFoo;          // 1
const foo * const q = &aFoo;   // 2
1. In //1 above, p is const, but the foo object to which p points is non-const.
2. In //2 above, both q and the referenced object are const.
Am I correct?
A
Not quite. It's a bit tricky to read declarations that mix pointers and const qualifiers, and programmers often get confused by declarations like these. The best way to approach these declarations is to start with the identifier being declared and read the declaration from right to left. For example, in the declaration marked //1, the identifier being declared is p. Reading from right to left, we discover that p is a pointer to an object of type foo which is const. So your comment that p is const is incorrect. p can be changed to point to a different object, but the object that p points to at any time cannot be changed through the pointer p. The following code fragment demonstrates this:

foo bFoo;
p = &bFoo;             // valid: p is not const
p->NonConstFunc( );    // invalid: *p is const
p->ConstFunc( );       // valid
In the declaration marked //2, the identifier being declared is q. Reading from right to left, q is a const pointer to an object of type foo which is const. q itself cannot be changed, and the object that q points to cannot be changed through the pointer q. Thus,

q = &bFoo; // invalid: q is const
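A compiler can confirm this reading of the two declarations. The following is a sketch, assuming a C++11 compiler; the empty member function bodies and the function name are additions made only so the fragment compiles on its own, and the lines that would be rejected are left as comments.

```cpp
#include <cassert>
#include <type_traits>

class foo
{
public:
    void ConstFunc() const {}   // empty bodies added for illustration
    void NonConstFunc() {}
};

bool pointerCanBeReseated()
{
    foo aFoo, bFoo;
    const foo *p = &aFoo;          // 1: non-const pointer to const foo
    const foo * const q = &aFoo;   // 2: const pointer to const foo

    // p itself is not const, but the type it points to is.
    static_assert(!std::is_const<decltype(p)>::value, "p is not const");
    static_assert(std::is_const<std::remove_pointer<decltype(p)>::type>::value,
                  "*p is const");

    // q is const, and so is the type it points to.
    static_assert(std::is_const<decltype(q)>::value, "q is const");
    static_assert(std::is_const<std::remove_pointer<decltype(q)>::type>::value,
                  "*q is const");

    p = &bFoo;               // valid: p is not const
    p->ConstFunc();          // valid
    // p->NonConstFunc();    // would not compile: *p is const
    // q = &bFoo;            // would not compile: q is const
    return p == &bFoo && q == &aFoo;
}
```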
I want to point out a little quirk in all this const business, which complicates the syntax interpretation rule I just presented. Consider the following declaration:

foo const *r = &aFoo;
Once again, reading from right to left beginning with the identifier that's being declared, we find that r is a pointer to a const object of type foo. Compare this with the description that we found for p above. But don't spend too much time trying to figure out what the difference is between a "const object of type foo" and an "object of type foo which is const." They're the same thing. In fact, you can also write the following declaration, and it too means the same thing as the declaration of r.

const foo const *s = &aFoo;
s is a pointer to a const object of type foo which is const. Just as the two references to const in this verbal description are redundant, the two uses of const in the declaration itself are redundant.
In general, a const qualifier applies to the thing to its left. When a const qualifier is the first thing in a declaration it doesn't have anything to its left, so it applies to the thing to its right. Some programmers adopt a coding convention that does not allow use of const as the first thing in a declaration. They require that the const qualifier always come after the thing it applies to. These programmers would not use the form we used to define p, but would use the form used for r. Personally, I'm in the habit of writing the const first when it applies to a type name, so I would have a hard time with such a coding convention. But if you find that you are confused by these declarations, it might be useful to adopt this convention so that you always write things in the same way.
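The equivalence of the two spellings used for p and r can be checked at compile time. This is a sketch assuming a C++11 compiler; the empty class body and the function name are illustrative only.

```cpp
#include <cassert>
#include <type_traits>

class foo {};   // empty stand-in for the class in the question

// const before or after the type name: the same type either way.
static_assert(std::is_same<const foo *, foo const *>::value,
              "both spellings mean pointer to const foo");

// const after the * applies to the pointer instead: a different type.
static_assert(!std::is_same<const foo *, foo * const>::value,
              "a const pointer to foo is not a pointer to const foo");

bool spellingsInterchangeable()
{
    foo aFoo;
    const foo *p = &aFoo;    // const-first form, as used for p
    foo const *r = p;        // const-after form, as used for r
    foo * const c = &aFoo;   // const pointer to a non-const foo
    return r == &aFoo && c == &aFoo;
}
```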
It's also important to understand that creating a pointer to a const object does not mean that that object can never be modified. It only means that the pointer itself cannot be used to modify the object. Consider the following:

const foo *t = &aFoo;
t->NonConstFunc( );    // invalid: calling a non-const
                       // member function on a const object
foo *u = &aFoo;
u->NonConstFunc( );    // valid: not a const object
Both of these pointers point to the same object, aFoo. t is a pointer to a const foo, so it cannot be used to modify aFoo. u is a pointer to a non-const foo, so it can be used to modify aFoo.
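A short sketch makes the point concrete: a change made through u is visible through t. The data member, the accessor Value, and the body given to NonConstFunc are assumptions added for illustration; the original class declares no data.

```cpp
#include <cassert>

class foo
{
public:
    foo() : value(0) {}
    int Value() const { return value; }   // hypothetical accessor
    void NonConstFunc() { ++value; }      // hypothetical body
private:
    int value;                            // hypothetical state
};

// Returns what the pointer to const sees after the object has been
// modified through the other pointer.
int observeThroughConstPointer()
{
    foo aFoo;
    const foo *t = &aFoo;   // cannot modify aFoo through t
    foo *u = &aFoo;         // can modify aFoo through u

    int before = t->Value();
    u->NonConstFunc();      // aFoo itself is not const, so this is fine
    int after = t->Value();
    assert(before == 0);
    return after;
}
```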
Q
My question relates to assigning a derived to a derived. I have implemented assignment by defining assignment operators for base and derived classes. But I also have tried supplying an assignment operator only in the base (Base& Base::operator=(const Base&)) and having it call a virtual protected helper function, setEqual. However, the Annotated Reference Manual [ARM] says assignment operators are not inherited (see below). I am now wondering why the base class's assignment operator is invoked for a derived-to-derived assignment.
ARM section 13.4.3:
The assignment operator=() must be a nonstatic member function; it is not inherited (12.8). Instead, unless the user defines operator= for a class X, operator= is defined, by default, as memberwise assignment of the members of class X.
A
In order to understand what this paragraph means we need to look at the meaning of "inherited." Here's a simple pair of classes that will show how this works:

class Base
{
public:
   void Member( );
   Base& operator = ( const Base& );
};

class Derived : public Base
{
public:
   int data;
};
One thing we can do with these classes is create an object of type Derived and call the member function Member, like this:

Derived d;
d.Member( );
We can call Member even though it is not explicitly declared as a member of Derived because it is inherited from Base. Of course, if Derived had had a member of its own named Member, the call in this code snippet would have called that function and not the version declared in Base.
The assignment operator works a little differently. There's a good reason for that, though. Suppose the assignment operator was simply inherited, like any other member function. What would the following code do?

Derived d1;
d1.data = 3;
Derived d2;
d2.data = 0;
d2 = d1;
If the assignment operator was simply inherited, the assignment in the last line would call the assignment operator defined in Base, just as the previous example did with Member. This assignment operator knows nothing about the members of Derived, so it cannot copy the value of data from d1 into d2. The result would be that d2's member, data, would still contain the value 0. This doesn't seem right; assigning d1 to d2 should copy all of d1's values into d2.
In fact, that's what the quoted section of the ARM says should happen. Derived does not provide its own assignment operator, so the proper action for the compiler is to copy all of the members of Derived when this assignment is performed. This is made clearer in section 12.8, which says, in part,
Memberwise assignment and memberwise initialization implies that if a class X has a member or base of a class M, M's assignment operator and M's copy constructor are used to implement assignment and initialization of the member or base, respectively, in the synthesized operations.
Actually, this wording is also found in the ANSI/ISO C++ working paper, which clarifies many of the little issues in the language definition. In particular, the working paper introduces the notion of a synthesized operation, an important term that allows us to talk about these things more easily. In general, every class has a default constructor, a copy constructor, and an assignment operator. If the appropriate operation is not explicitly defined in the class definition, the compiler will synthesize one.
In our example, the assignment

d2 = d1;
calls the synthesized assignment operator for the class Derived. This assignment operator assigns the Base part of d1 to the Base part of d2 using the assignment operator for Base, then it assigns the data part of d1 to the data part of d2 using the assignment operator for int.
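This behavior can be observed by instrumenting Base. In the sketch below, which assumes a C++11 compiler, the assignCalls counter and the function name are additions for illustration; they are not part of the original classes.

```cpp
#include <cassert>

class Base
{
public:
    Base() : assignCalls(0) {}
    Base& operator=(const Base&) { ++assignCalls; return *this; }
    int assignCalls;   // hypothetical counter: Base::operator= calls
};

class Derived : public Base
{
public:
    int data;          // no operator= here: the compiler supplies one
};

bool synthesizedAssignmentCopiesEverything()
{
    Derived d1;
    d1.data = 3;
    Derived d2;
    d2.data = 0;
    d2 = d1;           // synthesized Derived::operator=

    // The Base part went through Base::operator= exactly once, and
    // the data member was copied by the synthesized operator.
    return d2.assignCalls == 1 && d2.data == 3;
}
```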
Now let's see how this applies in the example posed by your original question:

class Base
{
public:
   Base& operator = ( const Base& b )
      { setEqual( b ); return *this; }
protected:
   virtual void setEqual( const Base& b )
      { BaseData = b.BaseData; }
private:
   int BaseData;
};

class Derived : public Base
{
private:
   void setEqual( const Base& b );
   int DerivedData;
};

void Derived::setEqual( const Base& b )
{
   Base::setEqual( b );
   const Derived *d = dynamic_cast<const Derived*>(&b);
   if ( d != 0 )
      DerivedData = d->DerivedData;
}
An important point here is the use of dynamic_cast in Derived::setEqual(). (Note that Base::setEqual() must be protected, as you describe, so that Derived can call it, and that the cast must be to a pointer to const Derived, since b is a reference to a const object.) The dynamic_cast is for safety, in case someone explicitly calls Base's assignment operator with an object of type Base:

Base b;
Derived d;
d.Base::operator=(b);
If we had simply used a C-style cast or a static_cast, we'd end up trying to copy the DerivedData member from an object of type Base. Since there is no such member, the result would be unpredictable. By using dynamic_cast, we can check whether the argument we are getting is, in fact, of type Derived.
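Here is a sketch of that check in action, assuming a C++11 compiler. The constructors, the virtual destructor, and the accessor derivedData are additions so that the result can be inspected; setEqual is declared protected in Base so that Derived can call it, and the cast is to a pointer to const Derived because b refers to a const object.

```cpp
#include <cassert>

class Base
{
public:
    Base() : BaseData(0) {}
    virtual ~Base() {}
    Base& operator=(const Base& b) { setEqual(b); return *this; }
protected:
    virtual void setEqual(const Base& b) { BaseData = b.BaseData; }
private:
    int BaseData;
};

class Derived : public Base
{
public:
    explicit Derived(int v) : DerivedData(v) {}
    int derivedData() const { return DerivedData; }   // hypothetical accessor
protected:
    void setEqual(const Base& b) override;
private:
    int DerivedData;
};

void Derived::setEqual(const Base& b)
{
    Base::setEqual(b);
    const Derived *d = dynamic_cast<const Derived*>(&b);
    if (d != 0)                     // null when b is only a Base
        DerivedData = d->DerivedData;
}

int derivedDataAfterBaseAssign()
{
    Base b;
    Derived d(42);
    d.Base::operator=(b);           // source has no DerivedData
    return d.derivedData();         // left untouched by the guard
}

int derivedDataAfterDerivedAssign()
{
    Derived d(42), other(7);
    d = other;                      // source really is a Derived
    return d.derivedData();
}
```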
But let's go a step further, and look at what actually happens in the more usual case:

Derived d1, d2;
d2 = d1;
Since Derived does not define its own assignment operator, the compiler will synthesize one. The synthesized assignment operator does memberwise assignment of the bases and the members. First it will call Base::operator=(), which, in turn, calls setEqual(). Because setEqual() is virtual, this call reaches Derived::setEqual(), which copies BaseData (through Base::setEqual()) and then DerivedData from d1 to d2 and returns, and then Base::operator=() returns. Next, the synthesized assignment operator assigns the data members of d1 to the data members of d2, simply copying the value of DerivedData from d1 to d2. The result of all this is that DerivedData gets set twice, first by setEqual() and then by the synthesized assignment operator.
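The double assignment can be made visible by counting. In this sketch (C++11 assumed), CountedInt is a hypothetical stand-in for int that counts its own copy assignments, and DerivedData is made public so the test can set it. As in the discussion above, setEqual is protected in Base, and the cast is to a pointer to const Derived because b refers to a const object.

```cpp
#include <cassert>

struct CountedInt            // hypothetical helper, not in the original
{
    int value = 0;
    static int assignments;  // number of copy assignments performed
    CountedInt& operator=(const CountedInt& rhs)
    {
        value = rhs.value;
        ++assignments;
        return *this;
    }
};
int CountedInt::assignments = 0;

class Base
{
public:
    virtual ~Base() {}
    Base& operator=(const Base& b) { setEqual(b); return *this; }
protected:
    virtual void setEqual(const Base& b) { BaseData = b.BaseData; }
private:
    int BaseData = 0;
};

class Derived : public Base
{
public:
    CountedInt DerivedData;
protected:
    void setEqual(const Base& b) override
    {
        Base::setEqual(b);
        const Derived *d = dynamic_cast<const Derived*>(&b);
        if (d != 0)
            DerivedData = d->DerivedData;   // first assignment
    }
};

int countDerivedDataAssignments()
{
    Derived d1, d2;
    d1.DerivedData.value = 7;
    CountedInt::assignments = 0;
    d2 = d1;   // synthesized operator=: Base part, then DerivedData again
    assert(d2.DerivedData.value == 7);
    return CountedInt::assignments;
}
```

For a plain int the extra copy is harmless, but for a member that is expensive to assign the cost is paid twice.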
How can we avoid this double assignment? Easy: don't fight the compiler. Write each class to handle its own assignments, either through the synthesized assignment operator or through an explicit assignment operator. Don't worry about derived classes: they'll take care of themselves.
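One way to follow this advice is sketched below, with each class copying only its own data. The constructors and accessors are assumptions added so the result can be checked. Derived's explicit operator could simply be omitted, since the synthesized one would do the same work; it is written out here only to show the explicit form.

```cpp
#include <cassert>

class Base
{
public:
    explicit Base(int v) : BaseData(v) {}
    Base& operator=(const Base& b)
    {
        BaseData = b.BaseData;           // Base copies only its own data
        return *this;
    }
    int baseData() const { return BaseData; }         // hypothetical accessor
private:
    int BaseData;
};

class Derived : public Base
{
public:
    Derived(int b, int d) : Base(b), DerivedData(d) {}
    Derived& operator=(const Derived& rhs)
    {
        Base::operator=(rhs);            // let Base handle the Base part
        DerivedData = rhs.DerivedData;   // copied exactly once
        return *this;
    }
    int derivedData() const { return DerivedData; }   // hypothetical accessor
private:
    int DerivedData;
};

bool eachClassCopiesItsOwnPart()
{
    Derived d1(1, 2), d2(0, 0);
    d2 = d1;
    return d2.baseData() == 1 && d2.derivedData() == 2;
}
```

No virtual helper and no dynamic_cast are needed, because the static type of the right-hand side already tells each operator what it is copying.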