May 1995/Stepping Up To C++

Stepping Up To C++

Even More Minor Enhancements

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
For the past several months, I've been describing the ways that C++ as specified by the draft standard (Fall, 1994) differs from C++ as specified in the ARM [1]. These differences include four major extensions:

templates

exception handling

run-time type information (including dynamic_cast)

namespaces
which I summarized in "Stepping Up to C++: C++ at CD Registration," CUJ, January, 1995. These extensions also include numerous minor enhancements:

new keywords and digraphs for C++ as alternate ISO646-compliant spellings for existing tokens

operator overloading on enumerations

operator new[] and operator delete[]

relaxed restrictions on the return type of virtual functions

wchar_t as a keyword representing a distinct type

a built-in Boolean type

declarations in conditional expressions

new cast notation

qualified names in elaborated-type-specifiers

expressions of the form a.::B::c and p->::B::c

conversion from T ** to const T *const *

mutable class members
These have been the subject of my past three columns ("Minor Enhancements to C++ as of CD Registration," "More Minor Enhancements to C++ as of CD Registration," and "Mutable Class Members," CUJ, February through April, 1995).
This month, I describe the remaining minor enhancements as of CD registration:

forward-declared nested classes

layout rules for POD-structs and POD-unions

compile-time member constants

default return value (of 0) from main

empty initializer-clauses
Plus, as an added bonus, I explain two more minor enhancements added after CD registration:

non-converting constructors

typename declarations

Forward-Declared Nested Classes
In my original list of minor enhancements (January, 1995), I inadvertently left this one out. The omission is ironic because I probably use this enhancement more than any other.
A nested class is simply a class that's defined as a member of another class (its enclosing class). Nested classes are handy for reducing the number of global entities in a program, and with that the potential for global name conflicts.
For example, classes for common data structures, such as linked lists and trees, usually employ auxiliary classes to represent the node types in the structure. In most cases, these node types should be nested classes. Listing 1 sketches the definition for a typical tree class with its node type as a private nested class. (Although the definition uses the keyword struct, in C++ parlance, it's a class.)
Outside its enclosing class, you must refer to a nested class by its fully-qualified name. For example, outside class tree in Listing 1, you must refer to tree's node as tree::node. Nested classes are subject to access control. Thus, since tree::node in Listing 1 is private, only members and friends of class tree can access tree::node.
Nested classes add clarity to programs by making the relationships between classes more explicit. On the other hand, a nested class definition often disrupts the flow of its enclosing class, thus reducing readability. In many cases, you can avoid the disruption by using forward-declarations for nested classes. That is, you merely declare a nested class inside its enclosing class, and then define it outside the enclosing class.
Listing 2 depicts the tree class definition rewritten with the node as a forward-declared nested class. The declaration
struct node;
declares node as a type member of class tree. Since this declaration is not a definition (it lacks a brace-enclosed body), tree::node is, for the moment, an incompletely-defined type. The type is not complete until a corresponding definition of the form:
struct tree::node
   {
   ...
   };
appears sometime later. The definition must refer to the nested class by its fully-qualified name, in this case tree::node.
You cannot declare objects of an incomplete type, because the compiler lacks the information it needs to allocate storage for such objects. However, you can declare pointers or references to an incomplete type. The definition for class tree in Listing 2 uses node only to declare a member, root, of type node *. Therefore, node need not be complete inside class tree. Had class tree declared a member of type node, as in:
class tree
   {
   ...
private:
   struct node;
   node n; // error
   };
then that member declaration would be an error.
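The following is a minimal sketch of the technique, not a reproduction of Listing 2. The member functions insert and size, the helpers destroy and count, and the int payload are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tree class; only the forward-declared node mirrors the text.
class tree
   {
public:
   tree() : root(0) {}
   ~tree();
   void insert(int value);
   std::size_t size() const;
private:
   tree(const tree &);             // copying disabled (declared, not defined)
   tree &operator=(const tree &);
   struct node;                    // declaration only: tree::node is incomplete here
   node *root;                     // ok: a pointer to an incomplete type
   static void destroy(node *p);
   static std::size_t count(const node *p);
   };

// The definition must use the fully-qualified name tree::node.
struct tree::node
   {
   node(int v) : value(v), left(0), right(0) {}
   int value;
   node *left;
   node *right;
   };

tree::~tree() { destroy(root); }

void tree::insert(int value)
   {
   node **p = &root;
   while (*p != 0)                 // walk down to an empty link
      p = value < (*p)->value ? &(*p)->left : &(*p)->right;
   *p = new node(value);
   }

void tree::destroy(node *p)
   {
   if (p != 0)
      {
      destroy(p->left);
      destroy(p->right);
      delete p;
      }
   }

std::size_t tree::count(const node *p)
   {
   return p == 0 ? 0 : 1 + count(p->left) + count(p->right);
   }

std::size_t tree::size() const { return count(root); }

// Small usage example.
void demo()
   {
   tree t;
   t.insert(2);
   t.insert(1);
   assert(t.size() == 2);
   }
```

Because the class definition mentions node only through a pointer, everything that needs the complete type (the member function bodies) can appear after the definition of tree::node.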

Layout Rules for PODs
The C Standard has rules that define explicit layout requirements for structures and unions, ostensibly to support certain low-level programming styles. In particular, it says:

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.

A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

Two structures share a common initial sequence if corresponding members have compatible types (and for bit-fields, the same width) for a sequence of one or more initial members. If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them.
As described in the ARM, C++ had less stringent layout rules. In C++, a class is a struct and vice versa, and a class object might have additional hidden data members. In particular, a class object may have a vptr — a pointer to a table of virtual function addresses. Many implementations place vptrs at the beginning of objects, so class objects cannot meet the layout requirements of a C structure. In a C++ object, non-static data members with no intervening access specifiers (the keywords public, private, and protected) have addresses that increase in the order of declaration. However, the order of members separated by an access specifier is implementation dependent.
Thus, according to the ARM, the first member of a struct object might not be at the same address as the struct object itself. The C++ standards committees acknowledged that this poses a compatibility problem for programmers migrating their C code to C++. However, the committees agreed they couldn't extend these layout guarantees to all class objects without hamstringing C++ implementations.
The committees solved the problem by extending the C layout guarantees only to POD-structs and POD-unions. POD stands for "plain ol' data" and more or less means "something you could write in C." More precisely, the C++ draft defines PODs in terms of aggregates:
An aggregate is an array, or an object of a class with no user-declared constructors, no private or protected members, no base classes, and no virtual functions.
The draft defines POD-struct and POD-union as:
A POD-struct is an aggregate structure that contains neither references nor pointers to members. Similarly, a POD-union is an aggregate union that contains neither references nor pointers to members.
Thus, the layout rules in the current draft C++ Standard are nearly identical to the rules in the C Standard, at least when applied to any object you could also declare in C.
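As a rough illustration of these guarantees, the sketch below checks them for a few structures that could equally be written in C. The type names header, circle, square, and shape, and the helper functions, are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

// Hypothetical POD-struct: no constructors, access specifiers, bases, or virtuals.
struct header
   {
   int tag;
   double value;
   char name[8];
   };

// A pointer to a POD-struct, suitably converted, points to its initial member.
bool first_member_at_start(header &h)
   {
   return reinterpret_cast<int *>(&h) == &h.tag;
   }

// Members have addresses that increase in the order of declaration.
bool members_in_declaration_order()
   {
   return offsetof(header, tag) == 0
      && offsetof(header, tag) < offsetof(header, value)
      && offsetof(header, value) < offsetof(header, name);
   }

// Two structures that share a common initial sequence (the int member kind).
struct circle { int kind; double radius; };
struct square { int kind; double side; };
union shape { circle c; square s; };

// Inspecting the common initial part through c is permitted
// even when the union currently holds a square.
int kind_of(const shape &sh)
   {
   return sh.c.kind;
   }

// Small usage example.
void demo()
   {
   shape sh;
   sh.s.kind = 2;
   sh.s.side = 1.5;
   assert(kind_of(sh) == 2);
   }
```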

Compile-time Member Constants
Various constructs in C and C++ include expressions that must yield integral constant values. For example, in an array declaration such as

T a[MAX];
the dimension MAX must be an integral constant expression. An array dimension can be an elaborate expression with several operands and operators, such as

T a[(M + 1) * (N + 1)];
However, every operand in such an expression must be constant so that a compiler can determine the expression's value at compile time.
Both C and C++ require integral constant expressions in a few other contexts, namely, bit-field size specifiers, case statement labels, and enumerator definitions. C programmers typically define symbols representing constant values as macros, using definitions such as

#define MAX 99
Unfortunately, macros such as MAX do not obey the normal scope rules of the C language proper. An enumeration constant such as

enum { MAX = 99 };
has the advantage over a macro that it does obey the scope rules. That is, an enumeration constant declared at block scope (inside a function) is truly local to that function, whereas a macro declared inside a function is essentially global.
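A small sketch of the scope difference, using hypothetical functions f and g and a hypothetical macro LIMIT:

```cpp
#include <cassert>

int f()
   {
   #define LIMIT 10        // macro: stays defined from here to the end of the file
   enum { MAX = 99 };      // enumeration constant: local to f
   return MAX + LIMIT;     // 99 + 10
   }

enum { MAX = 1 };          // no conflict: the MAX inside f was local to f

int g()
   {
   return MAX + LIMIT;     // file-scope MAX (1) plus the macro that escaped f (10)
   }

// Small usage example.
void demo()
   {
   assert(f() == 109);
   assert(g() == 11);
   }
```

The macro defined inside f remains visible in g, while each MAX is confined to the scope in which it was declared.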
The disadvantage of an enumeration constant is that it can have only integral type. Thus, for example,

enum { PI = 3.1415926 };
is an error because it attempts to define enumeration constant PI with a value of type double. In C, your only choice for defining a floating or double compile-time constant is as a macro.
C++ also lets programmers define symbolic constants as const-qualified objects, such as

const int MAX = 99;
In C, MAX is an object which exists at run time. Even though its value is known during compilation, C does not consider MAX to be a compile-time constant. But C++ does. A C++ compiler need not generate run-time storage for MAX unless the program computes the address of or binds a reference to MAX.
In C++

const double PI = 3.1415926;
is also a compile-time constant. Since it is not integral, you cannot use PI as an integral constant expression (in an array dimension, case label, etc.) unless you cast PI to an integral type.
Unfortunately, C++ as defined in the ARM has no syntax for defining const-qualified objects at class scope as compile-time constants. For example, if a class X has an array member, as in

class X
   {
   ...
private:
   T a[MAX];
   };
you might want to define MAX inside the scope of X to avoid a potential name conflict with some MAX elsewhere in the program. Unfortunately, defining MAX as a member constant, as in

class X
   {
public:
   const int MAX = 99; // error
   ...
private:
   T a[MAX];
   };
is a syntax error. The ARM does not allow initializers in data member declarations, even for const members. You can initialize MAX only with a member initializer in a constructor, such as

X::X(int m) : MAX(m)
   {
   ...
   }
In this case, MAX is not a compile-time constant, but rather a const data member that occupies run-time storage in each X object and requires run-time initialization.
C++ programmers generally use enumerations to define compile-time constants in class scope, as in:

class X
   {
public:
   enum { MAX = 99 };
   ...
private:
   T a[MAX];
   };
This works adequately for many common cases like the one above, but it breaks down at times. For example, enumeration constants appear to work fine in

class X
   {
public:
   enum { LO = 'a', HI = 'z' };
   ...
private:
   char table[HI - LO + 1];
   };
until you try to write LO (or HI) to cout using

cout << X::LO;
This displays X::LO as an int, not as a char, because enumerations promote to int in expressions.
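The effect is easy to observe by writing to a string stream instead of cout. This sketch assumes an ASCII execution character set, in which 'a' has the value 97. The helper names lo_as_enum and lo_as_char are made up for the example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class X
   {
public:
   enum { LO = 'a', HI = 'z' };
   };

// Inserting the enumerator selects the int overload of operator<<.
std::string lo_as_enum()
   {
   std::ostringstream os;
   os << X::LO;
   return os.str();
   }

// Converting to char first selects the char overload.
std::string lo_as_char()
   {
   std::ostringstream os;
   os << char(X::LO);
   return os.str();
   }

// Small usage example (assumes ASCII).
void demo()
   {
   assert(lo_as_enum() == "97");
   assert(lo_as_char() == "a");
   }
```

An explicit conversion at every use works, but it puts the burden on each caller rather than on the declaration of the constant.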
Alternatively, you can declare LO and HI as static const members, as in:

class X
   {
public:
   static const char LO;
   static const char HI;
   void f();
   ...
   };

const char X::LO = 'a';
const char X::HI = 'z';
so that X::LO and X::HI are compile-time constants of type char. This too works well in many situations. For example, with these definitions for LO and HI, an output expression such as

cout << X::LO;
writes X::LO as a char (not an int). You can even use LO and HI in integral constant expressions, as in

void X::f()
   {
   char table[HI - LO + 1];
   ...
   }
Unfortunately, the definitions for LO and HI cannot appear until after the end of the class definition — too late to use them as compile-time constants within the class. This means you cannot write:

class X
   {
public:
   static const char LO;
   static const char HI;
   void f();
   ...
private:
   char table[HI - LO + 1]; // error
   };
At the time the compiler encounters the declaration for table, LO and HI are declared, but not yet defined.
To correct this problem, the committees extended C++ to allow initialization of static const data members inside class scope. For example:

struct X
   {
public:
   static const char LO = 'a';
   static const char HI = 'z';
   void f();
   ...
private:
   char table[HI - LO + 1];
   };
defines LO and HI as compile-time constants of type char.
A static const member initialized in class scope must still be defined outside the class. For example,

const char X::LO; // still need definition after class
const char X::HI; // still need definition after class
provide appropriate definitions for LO and HI. Note that these definitions at file scope may not have initializers, even if they are identical to the corresponding ones at class scope.
Even though static const data members with initializers define compile-time constants, such members have external linkage (as do all other static data members) and linkers must allocate space for them. A program can take the address of or bind a reference to static const data members.
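Putting the pieces together, a self-contained sketch might look like the following. The typedef table_type and the functions table_size, lo_as_text, and address_of_lo are assumptions added so the example can be checked. The expected size of 26 assumes a character set, such as ASCII, in which the lowercase letters are contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

class X
   {
public:
   static const char LO = 'a';     // initialized in class scope
   static const char HI = 'z';
   static std::size_t table_size() { return sizeof(table_type); }
private:
   typedef char table_type[HI - LO + 1];   // LO and HI used as compile-time constants
   table_type table;
   };

// The members must still be defined outside the class, without initializers.
const char X::LO;
const char X::HI;

// The constant keeps its char type when written to a stream.
std::string lo_as_text()
   {
   std::ostringstream os;
   os << X::LO;
   return os.str();
   }

// Because the member has storage, a program can take its address.
const char *address_of_lo()
   {
   return &X::LO;
   }

// Small usage example (assumes contiguous lowercase letters).
void demo()
   {
   assert(X::table_size() == 26);
   assert(lo_as_text() == "a");
   assert(*address_of_lo() == 'a');
   }
```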

Default Return Value from main
Some, possibly many, C++ programmers are apparently annoyed when C++ compilers generate warnings about programs such as

#include <iostream.h>

main()
   {
   cout << "hello, world\n";
   }
Of course, the problem with the program is that main has a default return type of int, but no return value. Programmers who write

#include <iostream.h>
#include <stdlib.h>

main()
   {
   cout << "hello, world\n";
   exit(0);
   }
may find this particularly annoying, because they expect the call to exit(0) to be sufficient. It usually isn't. Although both C and C++ define

return rv;
within main as equivalent to calling

exit(rv);
many C++ compilers nonetheless issue a warning when the return is missing.
The committees decided to silence these warnings by adding the following sentence to the draft standard:
If control reaches the end of main without encountering a return statement, the effect is that of executing

return 0;
This rule applies only to main. All other functions with non-void return types must return something explicitly.

Empty Initializer Clauses
The C++ grammar in the ARM disallows empty initializer-clauses. That is, according to the ARM,

const T x = { };
is a syntax error because there's nothing between the braces.
Some committee members argued that programmers sometimes create empty classes (classes without members) as stubs. Although you can initialize an object of empty type T using a declaration such as

const T x = T();
this declaration might incur a run-time cost. On the other hand, C++ does initialization with braces before run time, so programmers need empty initializer clauses to explicitly initialize objects of empty class types without undue run-time costs.
This minor enhancement seemed harmless enough, so the committee agreed to augment the grammar rule for initializer-clause from:

initializer-clause:
   assignment-expression
   { initializer-list ,opt }
to

initializer-clause:
   assignment-expression
   { initializer-list ,opt }
   { }
This change implies that

int a[5] = { }; // now ok
initializes every element to 0.
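A brief sketch of both uses, with a hypothetical empty class stub and hypothetical helper functions:

```cpp
#include <cassert>

struct stub { };              // an empty class, for example a placeholder

// An empty initializer-clause explicitly initializes an object of empty class type.
bool can_initialize_stub()
   {
   const stub s = { };
   (void)s;                   // silence unused-variable warnings
   return true;
   }

// With an array, every element without an initializer is set to 0.
bool all_elements_zero()
   {
   int a[5] = { };
   for (int i = 0; i < 5; ++i)
      if (a[i] != 0)
         return false;
   return true;
   }

// Small usage example.
void demo()
   {
   assert(can_initialize_stub());
   assert(all_elements_zero());
   }
```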

But Wait! There's More
CD registration was only one of several major milestones on the way to a formal C++ Standard. Although the rate of change has slowed considerably, the committee did resolve a couple of problems after CD registration by adding new keywords to C++. These keywords are explicit and typename.
Normally, a constructor that accepts one argument is also a user-defined conversion. For example, if class complex has a constructor complex(double), then any function with a formal parameter of type complex, such as

void f(complex);
can be called with an argument of type double. That is, a call such as

f(1.0);
passes 1.0 to the complex(double) constructor to create a complex object, and then passes that complex to f. Similarly, a function with a formal parameter of type const complex &, such as

void g(const complex &);
can also be called with an argument of type double by similar magic using the complex(double) constructor.
For classes like complex (and many others), the implicit conversions provided by single-argument constructors are what you want. Sometimes they are not. For example, a class vector might provide a constructor vector(int) so that users can declare vectors of a specified size, such as:

vector v(10); // ok
However, users might be surprised and dismayed to find that, given a function

void f(const vector &);
they can accidentally call

f(10); // surprise!
The committee added the keyword explicit so that class designers can turn off implicit conversions for individual constructors.
The keyword explicit is a function-specifier, like inline or virtual. It may only appear in the declaration of a constructor within a class declaration, as in:

class vector
   {
public:
   explicit vector(int); // ok
   ...
   };
A constructor declared explicit is called a non-converting constructor. As with the keyword virtual, you cannot specify explicit in function definitions appearing outside class definitions.
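The contrast between a converting and a non-converting constructor can be sketched as follows. The class names int_vector and complex_number are hypothetical (chosen to avoid clashing with standard library names), and the check uses std::is_convertible from the C++11 header <type_traits>, which postdates the draft discussed here.

```cpp
#include <cassert>
#include <type_traits>   // std::is_convertible (C++11)

class complex_number
   {
public:
   complex_number(double re) : re_(re) {}   // converting constructor
   double real() const { return re_; }
private:
   double re_;
   };

class int_vector
   {
public:
   explicit int_vector(int n) : size_(n) {} // non-converting constructor
   int size() const { return size_; }
private:
   int size_;
   };

double g(const complex_number &c) { return c.real(); }
int f(const int_vector &v) { return v.size(); }

// Small usage example.
void demo()
   {
   assert(g(1.0) == 1.0);              // ok: implicit double -> complex_number
   assert(f(int_vector(10)) == 10);    // ok: the conversion is spelled out
   // f(10);                           // error: no implicit int -> int_vector
   assert((std::is_convertible<double, complex_number>::value));
   assert(!(std::is_convertible<int, int_vector>::value));
   }
```

Direct initialization such as int_vector v(10) still works. Only the implicit conversion is turned off.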
The committee added the keyword typename to correct a very subtle problem in the rules for resolving names used in templates. In short, the problem is that when parsing a template definition, a compiler can't always tell which names designate types and which do not. The template author can use the keyword typename to give the compiler a little more help.
For example, given

template <class T>
class X
   {
   void f()
      {
      T::A *a;
      ...
      }
   };
a compiler can recognize that T::A designates a member of (as yet unseen) type T, but it can't recognize that T::A is a type. Consequently, it will interpret

T::A *a;
as a multiply expression. Prefixing T::A with the keyword typename, as in

typename T::A *a;
indicates that T::A designates a type, so a compiler can recognize the line as a declaration.
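A complete, if contrived, sketch of the corrected template follows. The argument class has_type and the body of f are assumptions added so that the template can be instantiated and checked.

```cpp
#include <cassert>

// Hypothetical template argument in which A names a type.
struct has_type
   {
   typedef int A;
   };

template <class T>
class X
   {
public:
   int f()
      {
      typename T::A value = 42;     // typename marks T::A as a type
      typename T::A *a = &value;    // so this line parses as a declaration
      return *a;
      }
   };

// Small usage example.
void demo()
   {
   X<has_type> x;
   assert(x.f() == 42);
   }
```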

References
[1] Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual. Addison-Wesley, 1990.