July 1995/Stepping Up To C++

Other Assorted Changes, Part 1

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
Last month, I started describing those features of C++ that have different behaviors under the current draft standard than they do in the Annotated C++ Reference Manual (ARM) [1]. In particular, last month's column detailed the changes in the scope rules:

Conditional statements now introduce a new block scope.

The scope of a declaration in a for-init-statement is now restricted to the for-statement.

The "rewriting" rule and the rule limiting the context sensitivity of class member declarations have been replaced by comprehensive scope rules for class member declarations.

The point of declaration for an enumerator is now immediately after its enumerator-definition.
(See "Stepping Up to C++: Changes in Scope Rules," CUJ, June, 1995 for details.)
This month's column describes more of the language changes. Bear in mind that this list of changes does not include extensions to the C++ language. For descriptions of the extensions, refer to my earlier columns ("Stepping Up to C++," January through May, 1995).

Member Access Expressions
A member access expression is an expression that uses a . (dot) or -> (arrow) operator to access a class member. For example, a.m and p->f(x) are both member access expressions.
The left-hand side of a member access expression can by itself be a non-trivial expression, as in a[f(i)].m or (++p)->g(x). As you might expect, a program normally evaluates the left-hand side of a member access expression to determine the object containing the specified member. However, according to the ARM Section 9.4 [StaticMembers]:
When a static member is accessed through a member access operator, the expression on the left side of the . or -> is not evaluated.
For example, if m is a static data member, then according to the ARM, a[f(i)].m does not call f(i). Similarly, if g is a static member function, then (++p)->g(x) does not increment p.
Although this special rule for accessing static members eliminates some potentially inefficient code, the committees decided it led to inconsistent, and potentially surprising, results. Therefore, they eliminated that sentence from the draft. According to the current draft, if a program evaluates a member access expression, it must evaluate the expression on the left-hand side of the . or -> operator.
If this rule change makes sense to you already, and you know how to work around it, you can skip to the next section. If not, stay with me. The key insight comes from understanding static class members, which I'll explain with the following example.
Suppose your application uses a class called widget, and you want to track the number of widget objects in existence at any given time in the execution of your program. Simply define a counter, initialized to zero, that counts the number of objects. Then, add a statement to each widget constructor to increment the counter, as in

widget::widget()
   {
   ++counter;
   // initialize the widget
   }
You should also add a statement to the widget destructor to decrement the counter, as in

widget::~widget()
   {
   // discard the widget's resources
   --counter;
   }
Where do you declare the counter? You might try declaring the counter as an ordinary data member of the widget class:

class widget
   {
public:
   widget();
   ~widget();
   // other public members ...
private:
   unsigned counter;
   // other private members ...
   };
but this just won't work. You get a separate counter member in each widget object (probably all set to one).
The counter variable must be statically allocated and separate from every widget object so there's one and only one counter for all widget objects. Declaring the counter as a global variable will work, but you run the risk that the name of the counter may conflict with some other global name elsewhere in the program. Also, if it's a global variable, you can't prevent parts of the program outside the widget class from accessing the counter directly. As always, global variables weaken abstractions and you should avoid them.
Static data members solve this dilemma. A static data member is in the scope of its class and is subject to access control (it can be private). However, a static data member is unlike an ordinary data member in that it is not a part of each class object; there's only one copy of the static member, and it is separate from every object. That one copy has static storage duration and external linkage, so that all objects of the class type share the same static member.
For example,

class widget
   {
public:
   widget();
   ~widget();
   // other public members ...
private:
   static unsigned counter;
   // other private members ...
   };
declares counter as a static member of widget. The fully-qualified name for the counter is widget::counter, but the widget constructor and destructor (and every other widget member function) can refer to it as just plain counter.
The declaration of a static data member inside a class is only a declaration. The definition (and initialization) of the static member appears elsewhere, typically in a source file along with other members of the class. For widget::counter, the definition looks like

unsigned widget::counter = 0;
If the counter were public, non-member functions could access it using its fully-qualified name, widget::counter, as in

cout << "# of widgets = " << widget::counter;
But then non-member functions could also modify widget::counter and invalidate the count. Thus, you should declare the counter private, and write a public member function, how_many, that returns the current counter:

unsigned widget::how_many() { return counter; }
Hence, a non-member function can only inspect the counter by calling how_many. This protects widget::counter from unauthorized access.
However, this implementation of how_many is less than perfect. The problem is that how_many is an ordinary member function. An ordinary member function always has a hidden extra argument: the object addressed by its this pointer. Therefore, a call to how_many must be of the form w.how_many() (where w is a widget) or p->how_many() (where p is a pointer to a widget). But how_many doesn't need a this pointer to locate widget::counter because widget::counter is not in a widget object.
In fact, if how_many is an ordinary member function, the logic for testing that there are no widgets becomes a bit contorted:

   {
   widget w;
   if (w.how_many() == 1)
      // there weren't any widgets
   // w goes out of scope; destroy it
   }
You must declare a widget object, say w, just so you can call w.how_many(), even though how_many ignores the object. Furthermore, declaring a widget increments the counter, so rather than test for no widgets, you must test for one.
These problems go away if you declare how_many as a static member function:

class widget
   {
public:
   widget();
   ~widget();
   static unsigned how_many();
   // other public members ...
private:
   static unsigned counter;
   // other private members ...
   };
A static member function does not have a this pointer, so it cannot access ordinary data members, but it can access static data members. Thus, you don't need a widget object to call how_many. You simply call it by its full name, as in

cout << "# of widgets =" << widget::how_many();
or in

if (widget::how_many() == 0) // there really aren't any widgets
If you wish, you can still call w.how_many() (where w is a widget object) or p->how_many() (where p is a pointer to a widget object). In either case, the translator only uses w or p to determine the class type of the static member; it does not bind a this pointer to the object as part of the call. Similarly, if counter were public, you could access the counter using w.counter or p->counter, as well as with widget::counter. Again, the translator only uses w or p to determine the class type of the static member; it does not use the object to locate the static member.
Therein lies the rationale for the ARM's rule that a program need not evaluate the left-hand side of a . or -> in a static member access expression. The access uses the left-hand side only to determine the member's class (at compile time), and then ignores any computational results in accessing the member (at run time). However, some committee members suggested this distinction was just too subtle, and that programmers might be surprised that the left-hand side of an access expression might not be evaluated.
The program in Listing 1 illustrates one such surprise. The first while loop uses pointer p to refer to each element of array a in ascending order. It works properly because the increment expression p++ appears on the left side of a non-static member access expression. The second loop uses p to access each array element in descending order. According to the ARM, this loop does not terminate (at least, not normally). The program never evaluates the decrement expression --p because it appears on the left side of a static member access.
I believe the program in Listing 1 should terminate using the rules in the current draft. Unfortunately, I can't verify this because all the compilers I have at my disposal still employ the ARM's rule.
In a separate but related change, the committees also made the rules for member access expressions more regular by allowing enumerators to appear on the right-hand side of a . (dot) or -> (arrow). That is, given:
class shape
   {
public:
   enum palette {BLUE, GREEN, RED};
   // ...
private:
   // ...
   };
the current draft standard says you can refer to a value of type shape::palette outside class shape using an expression such as s.E or p->E (where s is a shape, p is a pointer to a shape, and E is either BLUE, GREEN, or RED). According to the ARM, the only way to refer to one of these values outside shape is by using a qualified name of the form shape::E.

Enumeration Types
The current draft changes the behavior of enumerations in other ways as well. According to the ARM, enumerations are integral types, just as they are in C. Thus, given

enum day {SUN, MON, TUE, WED, THU, FRI, SAT};
enum day d;
the ARM and standard C both permit arithmetic expressions such as

d = 0;
d++;
d += 2;
d = d + 1;
This is a very weakly-typed approach to enumerations when compared with other high-level languages such as Pascal, Modula-2, and Ada. These other languages treat each enumeration as a distinct type with a very limited set of predefined operations such as assignment and comparison. None of these languages permits arithmetic on enumerations other than increment and decrement.
The current C++ draft standard aspires to treat enumerations as distinct types in the mold of these other languages. However, for compatibility with the ARM as well as C, the current draft allows promotions from enumerations to integral types. Specifically, the draft says:
Enumerations are not integral, but they can be promoted to int, unsigned int, long, or unsigned long.
C++ no longer permits any of the expressions shown above, but it still permits declarations like

int n = WED; // promote WED to int
This change introduced a possibly widespread incompatibility with existing C++ code because the predefined arithmetic operators such as ++ and -- no longer apply to enumerations. It turned a previously valid loop such as

for (d = SUN; d <= SAT; ++d)
into an error (because the built-in ++ no longer applies to enumeration object d).
As I described in an earlier column ("Stepping Up to C++: Minor Enhancements to C++ as of CD Registration," CUJ, February 1995), the committees compensated for this incompatibility by extending C++ to permit overloading on enumerations. For example, you can write the for loop above by first defining

inline day &operator++(day &d) { return d = day(d + 1); }
as the prefix form of ++ for objects of type day.
You may, but need not, also declare relational operators for type day, such as

inline int operator<=(day d1, day d2);
In the absence of this declaration,

d <= SAT
promotes both operands to int and uses the built-in <= operator. In fact, these promotions even occur in relational expressions involving enumerations of differing types. For example, given

enum color {BLUE, GREEN, RED};
then

for (day d = SUN; d <= RED; ++d)
compiles into a loop that goes around three times.
Thus, the automatic promotion to an integer type weakens a C++ compiler's ability to keep enumerations distinct from each other. It's unfortunate, but the alternative of leaving a lot of C and older C++ code outside the C++ standard may be worse.
In a separate, but related minor extension, the C++ draft allows an enumeration constant to have a value of an integral type whose size is larger than int. The ARM, like C, states that:
The value of an enumerator must be an int or a value that can be promoted to int by integral promotion.
The current draft now simply says:
The constant-expression [used to initialize the enumerator] shall be of integral type.
Integral types include unsigned int, [signed] long int, and unsigned long int. Thus, C++ now allows enumeration declarations such as

enum big_numbers {MEGA = 1000000, GIGA = 1000000000};

The Lifetime of Temporary Objects
Both the ARM and the current draft standard let C++ compilers introduce temporary objects into the object program as needed to implement proper run-time behavior. If the program employs a temporary of a type with a non-trivial constructor, the program must call a constructor for that temporary. Similarly, if the temporary has a type with a non-trivial destructor, the program must call that destructor for the temporary.
For example, calling a function such as

String operator+(const String &s1, const String &s2)
typically creates a temporary String object. The function can't store the result in either operand, so it must place the result in a third String object. The function may construct the result in a named local object, but then it must also destroy that local object as it returns. For example, calling

String operator+(const String &s1, const String &s2)
   {
   String rv;
   // compute result in rv
   return rv;
   }
destroys rv upon function return. Therefore, the return must transmit the result in another String object, typically an unnamed temporary. I described the situations where temporaries arise in much greater detail in an earlier column, "Stepping Up to C++: Temporary Inconvenience, Part 1," CUJ, October, 1993.
The lifetime of a temporary object is the period of time during program execution from the temporary's construction until its corresponding destruction. The ARM simply says the lifetime of temporary objects created during expression evaluation is "implementation dependent." That is, a program might destroy each temporary almost immediately, or it might save them up and destroy them all at program termination.
Ideally, C++ programmers shouldn't have to concern themselves with exactly when the temporaries come and go. But they do. If you don't know how your compiler manages temporaries, your program might inadvertently destroy a temporary before it's done with that temporary. Or your program might consume mass quantities of memory keeping temporaries around long after they've outlived their usefulness. If, for example, the aforementioned String class includes a conversion operator

operator const char *() const;
then, given Strings s and t, an expression such as

(const char *)(s + t)
returns the representation of the temporary resulting from s + t as a null-terminated character array. Over the years, C++ programs have used such expressions as arguments to Standard C library functions, as in

printf("%s\n", (const char *)(s + t));
or

if (strcmp((const char *)(s + t), buf) == 0)
Unfortunately, the ARM gave no assurance that the temporaries would survive until the enclosing call (to printf or to strcmp) completed.
The current draft standard offers much better guidance about the lifetime of temporary objects. It says:
Ordinarily, temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception.
As in C, a full-expression is an expression that is not part of another expression. It includes the following:

an expression statement

the controlling expression of an if or switch statement

the controlling expression of a while or do statement

each of the three (optional) expressions of a for statement

the optional expression in a return statement
Thus, the draft C++ standard now clearly states that

printf("%s\n", (const char *)(s + t));
works because the temporary resulting from s + t lasts until printf returns. Similarly,

if (strcmp((const char *)(s + t), buf) == 0)
also works because the temporary lasts until the end of the conditional expression. However, the following will fail

const char *p;
p = s + t;
printf("%s\n", p);
because the program destroys the temporary at the end of the assignment to p. For more such examples see my article "Stepping Up to C++: Temporary Inconvenience, Part 2," CUJ, November, 1993.
The draft also describes two contexts in which a program destroys temporaries other than at the end of a full-expression:
The first context is when an expression appears as an initializer for a declarator defining an object. In that context, the temporary that holds the result of the expression shall persist until the object's initialization is complete. The object is initialized from a copy of the temporary; during this copying, an implementation can call the copy constructor many times; the temporary is destroyed as soon as it has been copied.
The second context is when a temporary is bound to a reference. The temporary bound to the reference or the temporary containing the sub-object that is bound to the reference persists for the lifetime of the reference initialized or until the end of the scope in which the temporary is created, which ever comes first. A temporary holding the result of an initializer expression for a declarator that declares a reference persists until the end of the scope in which the reference declaration occurs. A temporary bound to a reference in a constructor's ctor-initializer persists until the constructor exits. A temporary bound to a reference parameter in a function call persists until the completion of the call. A temporary bound in a function return statement persists until the function exits.
Next month, I'll describe more of these language changes.

References
[1] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley, 1990).