September 1993/Stepping Up to C++


Stepping Up to C++

Rewriting and Reconsidering

Dan Saks

Dan Saks is the founder and principal of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-406, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
My last column described the general rules by which a C++ translator looks up names as it translates C++ source into object form. (See "Stepping Up to C++: Looking Up Names," CUJ, August 1993.) In addition to lexical scope rules adopted from C (covering file, function, and block scopes), C++ has rules that govern the added complexities of nested class scopes. However, the lookup rules stated in the Annotated C++ Reference Manual (ARM) (Ellis and Stroustrup, 1990) are incomplete and somewhat inconsistent in handling complex lookup situations. The C++ standards committee's Core Language working group has spent most of its time ironing out these problems.
Last month, I also explained a new rule that prohibits a friend declaration (in one class) of a member function (in another class) from defining that member. This rule eliminated a confusing overlap of scope regions that had little, if any, practical value. This month, I'll explain other recent rules that address another problem area: how class member declarations, and inline function definitions in particular, interact with nested constants and types.

Inline Functions
In C++, as in C, a typical function call generates code that passes arguments and jumps to a single copy of the object code for that function. For very short functions, the generated code for the call may be longer than the body of the function itself. In that case, you probably want the compiler to expand each call to that function inline, that is, to replace each call with a copy of the function body in which the actual arguments are substituted for the function's formal arguments. Although I covered inline functions in an earlier column, it's been a while (see "Stepping Up to C++: Rewriting Modules as Classes," CUJ, July, 1991). Here's a brief review.
C++ offers several different ways to declare a function (either a member or a non-member) as inline. You can place the keyword inline before the function declaration, as in:

class X
   {
public:
   inline X();
   ...
   };
or, you can place it before the function definition, as in:

inline X::X() { ... }
Alternatively, you can define the function inside the body of the class definition, as in

class X
   {
public:
   X() { ... }   // implicitly inline
   ...
   };
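To see the three forms side by side, here is a minimal compilable sketch. The class X, its members n, get, and twice, and the example function are hypothetical names chosen for illustration; the brace layout follows the style used elsewhere in this column.

```cpp
#include <cassert>

// Hypothetical class showing the three ways to request inlining.
class X
   {
public:
   inline X();                          // 1: inline on the declaration
   int get() const;                     //    plain declaration (see 2 below)
   int twice() const { return 2 * n; }  // 3: defined in the class, implicitly inline
private:
   int n;
   };

X::X() : n(21) { }                      // still inline: the declaration said so

inline int X::get() const { return n; } // 2: inline on the definition

void example()
   {
   X x;
   assert(x.get() == 21);
   assert(x.twice() == 42);
   }
```

Whichever form you choose, it is only a request; the compiler decides whether any particular call is actually expanded.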
Regardless of how you request inlining, the compiler need not comply with your request. In that sense, inlining is like using the register storage-class specifier. A compiler may ignore register specifiers, if for no other reason than that the target architecture doesn't have enough registers to satisfy all such requests. Similarly, a compiler may ignore an inlining request for a particular function for reasons such as:

the function is recursive

the function is too long

the function contains complex control-flow structures
Furthermore, a compiler must generate out-of-line code for an inline function if the program ever computes the function's address.
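The address rule can be sketched as follows. The function square is a hypothetical example. An optimizer may still expand a call made through pf if it can see the pointer's value; the point is only that a callable out-of-line copy must exist for pf to point to.

```cpp
#include <cassert>

// Hypothetical helper: short enough that a compiler would normally expand it.
inline int square(int v) { return v * v; }

void example()
   {
   assert(square(3) == 9);     // a direct call: a candidate for inline expansion
   int (*pf)(int) = &square;   // computing the address requires an out-of-line
   assert(pf(4) == 16);        // copy of square for pf to point to
   }
```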

The Rewriting Rule
Section 9.3.2 of the ARM attempts to specify name lookup for inline member functions by means of the "rewriting" rule. The rule states: "Defining a function within a class declaration is equivalent to declaring it inline and defining it immediately after the class declaration; this rewriting is considered to be done after preprocessing but before syntax analysis and type checking of the function definition." The ARM elaborates these words with this example:

int b;
struct x
   {
   char *f() { return b; }
   char *b;
   };
is equivalent to

int b;
struct x
   {
   char *f();
   char *b;
   };
inline char *x::f() { return b; }
and explains that "the b used in x::f() is x::b and not the global b."
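You can check the ARM's claim with a compilable version of its example. The example function and buf are hypothetical additions. The code would not even compile if the b in f meant the global int, because an int doesn't convert to char *.

```cpp
#include <cassert>

int b;                        // file-scope b is an int

struct x
   {
   char *f() { return b; }    // b means x::b (a char *), not ::b
   char *b;
   };

void example()
   {
   char buf[] = "hello";
   x obj;
   obj.b = buf;
   assert(obj.f() == buf);    // f returned the member, not the global int
   }
```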
The rewriting rule permits functions defined inside a class to refer to other members declared later in that class. This rule gives you the freedom to place function definitions in a logical order inside class definitions, even if some of those definitions refer to members declared later in the class.
Many C++ programmers, including me, use this freedom to place all the public members first, as in Listing 1. But, as I've often stated in the past (most recently in "Stepping Up to C++: Nested Classes," CUJ, July 1993), I generally discourage defining functions inside class definitions in production-quality code. For short class definitions typical of books and journal articles, defining functions inside the class yields more concise and often more readable code. But long class definitions peppered with function definitions can be hard to read. In that case, I usually recommend applying the rewriting rule explicitly, as shown in Listing 2.
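Listings 1 and 2 aren't reproduced here. As a rough sketch of the two layouts, the following hypothetical classes InClass and Rewritten (with members length, resize, and len, all invented for illustration) show the same class written both ways.

```cpp
#include <cassert>
#include <cstddef>

// Listing 1 style: public members first, with bodies that refer
// to the private member len declared later.
class InClass
   {
public:
   std::size_t length() const { return len; }
   void resize(std::size_t n) { len = n; }
private:
   std::size_t len = 0;
   };

// Listing 2 style: the rewriting rule applied by hand, so the
// class definition holds only declarations.
class Rewritten
   {
public:
   std::size_t length() const;
   void resize(std::size_t n);
private:
   std::size_t len = 0;
   };

inline std::size_t Rewritten::length() const { return len; }
inline void Rewritten::resize(std::size_t n) { len = n; }

void example()
   {
   InClass a;
   Rewritten b;
   a.resize(3);
   b.resize(3);
   assert(a.length() == b.length());
   }
```

Both classes behave identically. The second keeps the class definition short enough to read as an interface.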
The rewriting rule applies to friend as well as member functions defined inside a class. However, it doesn't apply to friend functions defined outside a class. This asymmetry coincides with the asymmetry of the scope rules for friend functions that I cited last month: a friend function defined inside a class definition is in the (lexical) scope of that class, but a friend function defined outside the class definition is not. Listing 3 (which appeared as Listing 5 last month) illuminates this distinction.
Class X in Listing 3 declares two friend functions, f and g. f's declaration inside X is also its definition. The body of f is in the scope of X, so the use of k in the function body refers to the member X::k. On the other hand, g's definition appears outside X's definition. Even though g's body appears identical to f's, it is not in the scope of X. Thus, the use of k inside g refers to the global k (defined before X), not the static member X::k.
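Listing 3 isn't reproduced here. The following is a reconstruction from the description above, so the data member i, the accessor value, the initial values of the two k's, and the example function are assumptions. In today's C++, a friend that is defined only inside the class is found by argument-dependent lookup, which is why the call f(x) works.

```cpp
#include <cassert>

int k = 100;                              // global k, defined before X

class X
   {
public:
   int value() const { return i; }
private:
   int i = 0;
   static int k;                          // X::k hides ::k within X's scope
   friend void f(X &x) { x.i = ++k; }     // body in X's scope: k is X::k
   friend void g(X &x);                   // declared here, defined outside X
   };

int X::k = 0;

void g(X &x) { x.i = ++k; }               // not in X's scope: k is ::k

void example()
   {
   X x;
   f(x);                                  // f is found by argument-dependent lookup
   assert(x.value() == 1);                // incremented X::k (0 to 1)
   g(x);
   assert(x.value() == 101);              // incremented ::k (100 to 101)
   }
```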
The ARM and the current (June, 1993) working draft of the C++ standard both state (in section 11.4) that the rewriting rule for member functions is applied to friend functions defined in a class definition. Taken literally, I think this says that you can rewrite class X in Listing 3 as shown in Listing 4. But this literal interpretation can't be right because it changes the semantics of f. When f's definition appears inside X, the k used in f refers to X::k. But when you "rewrite" f as an inline function defined after X, the k in f refers to ::k, just like the k in g.
Rather, the appropriate "rewrite" for f is
inline void f(X &x) { x.i = ++X::k; }
which eliminates the confusion by referring to k by its fully-qualified name X::k. I believe the intent of the ARM is clear, even though the words need a little work. Just another little thing for the standards committee to fix.
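Here is a sketch of that corrected rewrite in context, again using a reconstructed class X with hypothetical members i and value. Because f is a friend, it may name the private static member X::k even though its definition now sits outside the class.

```cpp
#include <cassert>

int k = 100;                     // global k, which f must not touch

class X
   {
public:
   int value() const { return i; }
private:
   int i = 0;
   static int k;
   friend void f(X &x);          // declaration only
   };

int X::k = 0;

// Qualifying k as X::k preserves the meaning it had inside X.
inline void f(X &x) { x.i = ++X::k; }

void example()
   {
   X x;
   f(x);
   assert(x.value() == 1);
   f(x);
   assert(x.value() == 2);
   assert(k == 100);             // the global k is unchanged
   }
```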
The ARM does explain that you are not supposed to take the rewriting rule too literally. It is only a "semantic notion" and does not dictate any particular implementation technique. In fact, there are situations where applying the rewriting rule literally produces syntax errors, as in the case of local classes.
A local class is a class declared inside a function. For example, Listing 5 shows local class X inside function f. The class name X, like any name with block scope, is local to f's body. Since you can't refer to X outside f, you must define X's member functions inside f as well. You can only define member functions of a local class inside the class definition, as in Listing 5.
You cannot apply the rewriting rule explicitly, as attempted in Listing 6, because C++, like C, doesn't allow nested function definitions. Even though you can't literally rewrite X's member functions, the rewriting rule still applies in that an inline member function definition may refer to a member declared later in the class.
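A small sketch of a local class (Listings 5 and 6 aren't reproduced here; the members get, set, and n are hypothetical) shows that a member function defined inside the local class may still use a member declared after it.

```cpp
#include <cassert>

int f()
   {
   struct X                             // local class: the name X is local to f
      {
      int get() const { return n; }     // n is declared later in X; still valid
      void set(int v) { n = v; }
      int n;
      };
   X x;
   x.set(42);
   return x.get();
   }

void example()
   {
   assert(f() == 42);
   }
```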

The Scope of a Member
The rewriting rule establishes a general principle for interpreting member functions defined inside a class, but it doesn't precisely define the scope of a class member. Although the ARM never says so explicitly, the members of WG21+X3J16 (the C++ standards committee) generally agreed that the scope of a member extends at least from the point immediately after that member's declaration to the end of the class definition. The ARM's example accompanying the rewriting rule makes it pretty clear that the scope of a class member also includes the bodies of any functions defined inside the class, even those functions defined before the member's declaration. What isn't clear is whether the scope of a member includes other parts of the member declarations that precede it.
Consider class File in Listing 7, which provides an inline public member function, good, that reports the health of a File object, and another inline member, reset, for reviving a sick File. File and stream objects often provide such error-handling facilities to handle resource allocation failures (such as being unable to open a file or allocate adequate buffer space) or device errors. Function good returns a Boolean result — a File is either good or it isn't. Function reset sets the error status. By default, it puts a File in a good state.
Since C++, like C, has no standard Boolean type, File defines its own using
enum bool {FALSE, TRUE};
To avoid possible name conflicts with other parts of the program, File defines bool as a member (a nested type). bool appears as a private type after the definitions of all the public member functions.
Clearly, the scope of File::bool includes the declarations that follow it, up to the end of the class. Thus, the declaration
bool ok;
is in the scope of bool, and so ok has the enumerated type.
The rewriting rule example in the ARM suggests that the body of the constructor
File() { ok = FALSE; }
compiles as if it were written immediately after the class definition. Thus, inside the constructor, ok refers to the data member File::ok, and FALSE refers to the member constant File::FALSE. But how can a C++ compiler rewrite
bool good() { return ok; }
void reset(bool b = TRUE) { ok = b; }
after the class definition? Does the scope of bool include the return types and parameter lists of functions declared before bool?
For a time, the committee considered simply defining the scope of a member to include the entire class definition, not just the subsequent declarations. Unfortunately, this rule leads to some horrendous parsing problems, so the committee abandoned it. The committee finally settled on a revised scope rule that seems to fit existing practice pretty well. Basically, the scope of a class member includes the declarations that follow it (to the end of the class), plus all function bodies, default function arguments, and constructor initializers in that class (whether defined before or after that member).
This new rule explicitly states that all function bodies defined within the class definition are in the scope of every class member, thus preserving the rewriting rule. However, the rule omits function return types from the scope of members declared later in the class. Thus, good's return type specifier is outside the scope of member bool, so the function declaration is in error.
The scope of a class member includes default function argument values inside the class definition, but not the formal argument types. Thus, in
void reset(bool b = TRUE) { ok = bool(b); }
the default argument value TRUE refers to the member constant declared later. However, the formal argument type is not in the scope of the nested type bool, so this function is also in error. On the other hand, the function-like cast in the function body is a valid reference to the nested type.
The scope of a class member also includes all constructor initializers for that class. Thus, if you rewrite
File() { ok = FALSE; }
as
File() : ok(FALSE) { }
ok and FALSE still refer to the class members declared later.
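Here is a sketch of class File arranged to satisfy the revised rule. Listing 7 isn't reproduced here, and this sketch makes several assumptions. Since bool has since become a C++ keyword, the nested type is renamed Bool. Bool is made public so callers can pass a value to reset. A hypothetical static member, initial, is declared after its uses to show that default arguments and constructor initializers may refer forward, while return and parameter types may not.

```cpp
#include <cassert>

class File
   {
public:
   enum Bool { FALSE, TRUE };               // must precede declarations that name it
   File() : ok(initial) { }                 // initializer: ok and initial come later
   Bool good() const { return ok; }         // return type: Bool already declared
   void reset(Bool b = initial) { ok = b; } // default argument: initial comes later
private:
   static constexpr Bool initial = TRUE;    // declared after its uses above
   Bool ok;
   };

void example()
   {
   File f;
   assert(f.good() == File::TRUE);
   f.reset(File::FALSE);
   assert(f.good() == File::FALSE);
   f.reset();                               // default argument restores TRUE
   assert(f.good() == File::TRUE);
   }
```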

Limiting Context Sensitivity
A class definition is sensitive to the context in which it appears. That is, names declared in the scope(s) surrounding a class definition affect that definition's meaning. For example, the string classes in Listing 1 and Listing 2 rely on a type definition for size_t in the surrounding context. This context sensitivity should come as no surprise — it's a direct consequence of the lexical scope rules. The problem with this context sensitivity is that, if unconstrained, it can mask programming errors.
For example, suppose the program in Listing 7 included some other header prior to the definition for File, and that header contained its own (distinct) definition for type bool at file scope. You can simulate this supposition by simply defining
typedef int bool;
before the definition of class File. Now, the previously erroneous forward references to nested type bool refer to this prior definition for bool in the surrounding context, and lo and behold, two of the four compilers I tried (Borland 3.1 and Microsoft 7.0) compiled and executed the program. The other two compilers (Comeau 3.0 and MetaWare 3.0) objected to the forward reference to TRUE as a default argument of reset. When I removed the default argument, they compiled and executed the program as well.
If you really want to see whether a particular use of bool binds to the global or the nested type, define the global type so that its size is different from the size of an enumeration type. For example, on the MS-DOS compilers I use, the size of an enumeration is either one or two bytes, so I defined the global bool using
typedef long bool;
Thus, when the program executes
cout << sizeof(f.good()) << '\n';
it displays the size of the return type of File::good as 4 (the size of a long), and when it executes
cout << sizeof(f) << '\n';
it displays the size of a File object (which contains a single member of type File::bool) as either 1 or 2 (the size of an enumeration).
Even though compilers will accept this code, you must admit it's pretty flaky. It's especially sinister that the two uses of bool in
void reset(bool b = TRUE) { ok = bool(b); }
refer to different type definitions. In fact, the ARM never intended this code to compile.
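You shouldn't rely on the program above, but you can still probe which declaration a name binds to in well-formed code. The sketch below swaps the sizeof probe for decltype with std::is_same (a later C++11 facility). This reports the exact type rather than depending on a long and an enumeration having different sizes. Flag, Probe, nested, and outer are hypothetical names; the nested type is declared before any use of it, and the file-scope type is named explicitly as ::Flag.

```cpp
#include <type_traits>

typedef long Flag;                        // hypothetical file-scope type

class Probe
   {
public:
   enum Flag { NO, YES };                 // declared before any use in Probe
   Flag nested() const { return YES; }    // Flag here is Probe::Flag
   ::Flag outer() const { return 1L; }    // ::Flag is the file-scope typedef
   };

// decltype reports the declared return types directly.
static_assert(std::is_same<decltype(Probe().nested()), Probe::Flag>::value,
              "unqualified Flag binds to the nested enumeration");
static_assert(std::is_same<decltype(Probe().outer()), long>::value,
              "::Flag binds to the file-scope typedef");
```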
Section 9.9 of the ARM places limits on the context sensitivity of the rewriting rule and of member declarations in general. It says: "A class-name or a typedef-name or the name of a constant used in a type name may not be redefined in a class declaration after being used in the class declaration, nor may a name that is not a class name or a typedef-name be redefined to a class-name or a typedef-name in a class declaration after being used in the class declaration." The ARM illustrates this rule with an example shown in Listing 8.
This rule hints at a general principle which seems very reasonable, namely, that a class definition containing an erroneous forward reference to a class member should not be accepted by a C++ translator just because the name of the forward-referenced member happens to be declared in a scope enclosing the class definition. For example, Listing 8 shows struct Y declaring member a of type T. The rewriting rule does not apply to data member declarations, so this T refers to the global T defined immediately before Y. Then Y defines T as a nested type, and declares another member b of type T. Fortunately, the limits on context sensitivity rule out the nested type definition. Otherwise, you might have a struct (a class) with two members of type T actually having different types.
Unfortunately, the rule obscures the general principle by being too broad in some ways and too narrow in others. For example, class X in Listing 8 defines, among others, two members
int f() { return sizeof(c); }
char c;
The rewriting rule suggests that this usage is okay: c in the body of f should refer to member c. But the member declaration for c redeclares a typedef-name in the scope enclosing class X, so the ARM says c's declaration is an error. However, had the c at file scope been an object rather than a type, the declaration of member c would have been just fine.
In contrast, consider
int c;
struct Z
   {
   char s[sizeof(c)];
   char c;
   };
The ARM's limits on context sensitivity don't prohibit this usage. Yet, if c at file scope had been a typedef instead of an object, the ARM would disallow the declaration of member c. The committee agreed that this definition for struct Z is bogus no matter how you declare c at file scope.
Note that, in the previous example (from class X in Listing 8)
struct X
   {
   int f() { return sizeof(c); }
   char c;
   };
is a legitimate forward reference, but in the last example
struct Z
   {
   char s[sizeof(c)];
   char c;
   };
is not. The difference is that function bodies (along with default arguments and constructor initializers) are subject to the revised class scope rule, but member declarators are not.
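The legitimate case can be checked directly. In this sketch, the file-scope int c and the example function are assumptions. Because f's body is subject to the revised class scope rule, sizeof(c) names the member char c declared after f, not the file-scope int.

```cpp
#include <cassert>

int c;                           // file-scope object named c (not a type)

struct X
   {
   int f() { return sizeof(c); } // function body: sees the member c below
   char c;
   };

void example()
   {
   X x;
   assert(x.f() == 1);           // sizeof(char), not sizeof(int)
   }
```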

The Reconsideration Rule
In an effort to increase the apparent consistency of the language rules, WG21+X3J16 replaced the rule limiting the context sensitivity of the rewriting rule with the "reconsideration" rule. The reconsideration rule simply states that a name N used in a class S must refer to the same declaration when evaluated both in its context (the names declared prior to that use of N) and in the completed scope of S (the set of names available to a member function of S defined outside of S).
One of the compilers I tried (Borland 3.1) actually appears to enforce the reconsideration rule in its "ANSI" mode. I expect the others to follow suit.
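Under the reconsideration rule, code like struct Y in Listing 8 has two straightforward repairs, sketched below with hypothetical structs Y1 and Y2. The first declares the nested type before any use. The second qualifies the use that really means the outer type; a qualified name refers to the same declaration however it is evaluated.

```cpp
#include <type_traits>

typedef int T;                 // file-scope T

// Fix 1: declare the nested type first, so every T in Y1 means Y1::T
// both where it is used and in the completed scope of Y1.
struct Y1
   {
   typedef char T;
   T a;
   T b;
   };

// Fix 2: when a member really should use the outer type, qualify it.
struct Y2
   {
   ::T a;                      // always the file-scope T
   typedef char T;
   T b;                        // Y2::T
   };

static_assert(std::is_same<decltype(Y1::a), char>::value, "a is Y1::T");
static_assert(std::is_same<decltype(Y2::a), int>::value, "a is ::T");
static_assert(std::is_same<decltype(Y2::b), char>::value, "b is Y2::T");
```

Fix 2 still gives Y2 two members declared with T that have different types, the very situation the ARM's rule tried to prevent, so reserve it for cases where the difference is deliberate.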

Errata
An alert reader, Roger Zeitel of Teaneck, NJ, caught an error in one of my string class examples in my article on "Function Name Overloading" (CUJ, November, 1991). The same bug appears in the member function definitions for both String::cat(const char *) and String::cat(const String &) in Listing 3 on page 107 of that issue. Listing 9 in this article shows the problem and the repair.
Both functions correctly build a new character array containing the result, but fail to update the len data member to reflect the new string length. Simply adding the line
len = n;
as the last line in each function body should correct the problem. This is a good example of a common error, namely, leaving an object in an inconsistent state. Cargill (1992) has a nice treatment of these kinds of errors in chapter 2 of his book on C++ style.
Another problem, which appears several times in that listing as well as in other listings in that same article, is that I did not use the delete [] notation to delete arrays. In most implementations, deleting a pointer to the first element of an array of objects without destructors works even if you omit the []. But strictly speaking, such delete expressions have undefined behavior: compilers need not get them right. So I should have put the [] in the delete expressions. You should too.
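Listing 9 isn't reproduced here. The following hypothetical String keeps only what the repair needs: members len and str, a constructor, a destructor, and cat(const char *). It shows both fixes together, updating len and using delete [] on the array. To keep the sketch short, copying is deleted rather than implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class String
   {
public:
   explicit String(const char *s = "")
      : len(std::strlen(s)), str(new char[len + 1])
      { std::strcpy(str, s); }
   ~String() { delete [] str; }          // array form matches new char[]
   String(const String &) = delete;      // copying omitted from this sketch
   String &operator=(const String &) = delete;
   String &cat(const char *s);
   std::size_t length() const { return len; }
   const char *c_str() const { return str; }
private:
   std::size_t len;                      // declared before str: initialized first
   char *str;
   };

String &String::cat(const char *s)
   {
   std::size_t n = len + std::strlen(s);
   char *p = new char[n + 1];
   std::strcpy(p, str);
   std::strcat(p, s);
   delete [] str;                        // not delete str: str points to an array
   str = p;
   len = n;                              // the repair: keep len consistent with str
   return *this;
   }

void example()
   {
   String s("abc");
   s.cat("de");
   assert(s.length() == 5);
   assert(std::strcmp(s.c_str(), "abcde") == 0);
   }
```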

References
Cargill, Tom. C++ Programming Style. Reading, MA: Addison-Wesley, 1992.
Ellis, Margaret A. and Bjarne Stroustrup. The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley, 1990.