August 1993/Stepping Up To C++

Stepping Up To C++

Looking Up Names

Dan Saks

Dan Saks is the founder and principal of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
Last month I introduced nested classes (see "Stepping Up to C++: Nested Classes", CUJ, July 1993). A nested class is one declared inside another class. Early C++ implementations treated nested classes just as C treats nested structs, namely, as if they were not nested. The ARM (Ellis and Stroustrup, 1990) revised the C++ language definition so that each class defines a full-fledged scope. Now, every identifier declared inside a class is in the scope of that class.
With the advent of nested classes, the name lookup rules for C++ programs became much more complicated. Name lookup is the process by which a translator (a compiler or an interpreter) matches a use of a symbolic name in a translation unit with a declaration for that name in the same translation unit. The lookup rules stated in the ARM seem to work fine for most simple cases, but they are incomplete and inconsistent in handling many complex cases.
The C++ standards committee's Core Language working group has spent much of its time these last three years ironing out name lookup problems. This month's column explains these problems and the new language rules designed to solve them.

Basic Concepts
When you use an identifier in a program, the translator matches that use with the identifier's declaration. This matching process is called "name resolution" or "name lookup." The translator uses the type and access attributes established in the declaration to ensure that you've used the identifier properly. Then it uses other attributes (like address or alignment of a variable) to produce object semantics (like machine code or symbolic debugging information).
C++ insists that you declare an identifier before using it (except if it's a statement label). C is a bit more lax. It lets you refer to some identifiers, like function names, before declaring them. If the compiler can't find a declaration, it simply assumes that the name has certain default attributes. For example, it assumes an undeclared function returns an int.
C++, like C, lets you declare the same identifier with completely different attributes, as long as the declarations are in different scopes. The scope of a declaration is the region of program text over which that declaration remains in effect. A name must be unique in its scope. That is, you cannot declare the same name with different attributes in the same scope.
Although function name and operator overloading appear to violate this restriction, they do not. C++ lets you use the same identifier as the name of two different functions in the same scope. However, any two such overloaded functions must have distinct signatures. A function's signature is the sequence of types in its formal parameter list. For example, the signature of

int fputs(const char *s, FILE *stream);
is

(const char *, FILE *)
Thus, the name of a function in C++ is not just its identifier, but its identifier combined with its signature. For any other declared entity, such as a type or an object, its name is just its identifier. But, because of overloading, we speak of name lookup rather than identifier lookup. In any case, it remains true that a name must be unique in its scope. (For more on overloading in C++, see "Stepping Up to C++: Function Name Overloading", CUJ, November, 1991.)
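As a small illustration of the point about signatures, here is a sketch with two functions that share one identifier in the same scope. The function name length and the helper demo_overloads are hypothetical; they do not come from any of the listings.

```cpp
#include <cassert>
#include <cstring>

// Two functions share the identifier length in the same scope.
// Their signatures differ, so as far as lookup is concerned they
// have different names:
//    length(const char *)
//    length(const char *, std::size_t)
std::size_t length(const char *s)
   {
   return std::strlen(s);
   }

std::size_t length(const char *s, std::size_t limit)
   {
   std::size_t n = std::strlen(s);
   return n < limit ? n : limit;
   }

// Hypothetical usage: the argument list selects the overload.
void demo_overloads()
   {
   assert(length("abc") == 3u);
   assert(length("abcdef", 4) == 4u);
   }
```

Declaring a third length with the signature (const char *) and a different return type would be an error, because the return type is not part of the signature.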
C++ and C share a common set of scope regions:

A name declared outside any function or class (or struct in C) has file scope, which terminates at the end of the translation unit.

A name declared inside a block (enclosed in braces) has block scope, which terminates at the brace that closes the block. Names in a function's formal parameter list are in the scope of the outermost block of that function.

A statement label has function scope that encompasses the entire body of the function in which it appears.
Together, file, block, and function scopes are sometimes called lexical scopes.
Scopes may be nested. In fact, all block scope regions are nested inside the file scope region, and blocks may be nested inside other blocks. A name declared in an outer scope may be declared differently in an inner scope. The declaration in the inner scope hides the declaration at the outer scope during name lookup, as shown in the program fragment in Listing 1.
For example, variable i declared on line 6 has block scope. It hides the i declared at file scope on line 1 so that the reference to i on line 8 refers to the i declared on line 6.
As another example, formal parameter p on line 4 has block scope that terminates at the end of function on line 15. However, inside the brace-enclosed block on lines 9 through 12, variable p declared on line 10 hides formal parameter p.
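The following sketch shows the same two kinds of hiding in a compact form. It is not Listing 1, so the line numbers cited above do not apply to it, and the function names are hypothetical.

```cpp
#include <cassert>

int i = 1;                 // file scope

int outer_i()
   {
   return i;               // no inner declaration, so this is the file-scope i
   }

int inner_i()
   {
   int i = 2;              // block scope; hides the file-scope i
   return i;               // refers to the i declared just above
   }

int hidden_param(int p)    // p is in the scope of the function's outermost block
   {
      {
      int p = 100;         // hides the formal parameter inside this block only
      p += 1;              // changes the inner p, not the parameter
      }
   return p;               // the formal parameter again
   }

// Hypothetical usage showing which declaration each use finds.
void demo_hiding()
   {
   assert(outer_i() == 1);
   assert(inner_i() == 2);
   assert(hidden_param(7) == 7);
   }
```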

Class Scope
C++ introduces yet another category of scope regions:

A name declared inside a class (that is, a name that is a class member) has class scope.
Classes introduce complications in the scope rules because class scope regions are not necessarily contiguous regions. Lexical scope regions are reasonably tidy. They may have other scopes nested inside that temporarily hide names, but they are always contiguous. Class scopes can be, and often are, broken into disjoint pieces. A class scope consists not only of the class definition itself, but also the non-inline member function definitions (including constructor initializers) and static data member initializations.
Outside its scope, you must refer to a class member by explicitly prefixing it with any of

X:: (where X is the class name)

y. (where y is an object, or reference to an object, of class X)

p-> (where p is a pointer to an X)
Inside its scope, you can refer to a class member by using its unadorned member name (that is, its name without any of the above prefixes).
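A minimal sketch of the three prefixes, assuming a hypothetical struct X with one static member and one ordinary member (none of these names come from the listings):

```cpp
#include <cassert>

struct X
   {
   static int count;
   int value;
   int twice() { return value + value; }   // inside X: unadorned member name
   };

int X::count = 0;

// sum is not a member of X, so its body is outside the scope of X
// and every member reference needs a prefix.
int sum(X &y, X *p)
   {
   return X::count + y.value + p->value;
   }

// Hypothetical usage.
void demo_prefixes()
   {
   X a;
   X b;
   a.value = 2;
   b.value = 3;
   X::count = 1;
   assert(sum(a, &b) == 6);
   assert(a.twice() == 4);
   }
```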
Listing 2 shows some of the disjoint regions that make up a class scope. The scope of each member of class X begins at its declaration and continues to the brace (on line 9) that closes X's definition. This includes the inline function definition for member reset (on line 8), where the unadorned references to k and MAX refer to the static member X::k and the member constant X::MAX, respectively.
Both the body and the constructor initializer of the non-inline constructor X::X() are also in the scope of class X. Thus, i and MAX in the initializer list (on line 13) refer to the corresponding members of X. k and reset in the constructor's body (lines 15 and 16) also refer to members of X. MIN refers to the constant declared at file scope on line 11.
Finally, the initializing expression in the definition for static member X::k is also in the scope of X. Thus, the unadorned use of MAX on line 19 refers to X::MAX.
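The sketch below is a reconstruction in the spirit of that discussion, not Listing 2 itself, so its layout and line numbers differ. It assumes MAX is an enumerator and places MIN ahead of the class; demo_class_scope is a hypothetical helper.

```cpp
#include <cassert>

const int MIN = 0;               // file scope

class X
   {
public:
   enum { MAX = 100 };           // member constant
   int i;
   static int k;
   X();
   void reset() { k = MAX; }     // inline member: k is X::k, MAX is X::MAX
   };

X::X() : i(MAX)                  // constructor initializer: in the scope of X
   {
   k = MIN;                      // k is X::k; MIN is found at file scope
   }

int X::k = MAX;                  // initializing expression: in the scope of X

// Hypothetical usage that visits each piece of the class scope.
void demo_class_scope()
   {
   assert(X::k == 100);          // set by the static member's initializer
   X x;
   assert(x.i == 100);           // set by the constructor initializer
   assert(X::k == 0);            // set by the constructor body
   x.reset();
   assert(X::k == 100);          // set by the inline member function
   }
```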

Lookup in Nested Classes
Adding nested classes to C++ complicated the lookup rules even further. The ARM simply didn't say enough to rule out alternative interpretations of the rules. The C++ committee's Core Language working group worked with a variety of simple examples that revealed the differences of opinion. Some of these examples appeared in the committee meeting minutes captured by yours truly. Scott Turner of Liant Software collected others in a committee paper summarizing the lookup issues. Listing 3 shows one of the examples on which the working group members finally based their agreement.
Listing 3 shows class X at file scope with class Y nested inside. The listing actually declares X and Y as structs, but in C++ a struct is a class, and a class is a struct. A struct is convenient in these examples because struct members are public by default. This eliminates any concerns about access violations that only cloud the name lookup discussion.
X::Y has a member function that's defined out-of-line (lines 13 through 17). The listing declares i in four different places:

as a static member of X (X::i)

as an ordinary member of X::Y (X::Y::i)

as a global variable (::i)

as a local variable of X::Y::f
The question is: to which declaration does the i on line 16 (in the body of X::Y::f) refer?
Obviously, it refers to the local variable declared on line 15. But if you remove that declaration, which declaration wins? Name lookup finds the member of X::Y declared on line 6. If you remove that declaration, the next choice is the member of X declared in line 3. Finally, if you delete that declaration, name lookup finds the global variable declared on line 11.
The numbered comments to the right of the listing summarize the hiding pattern. A declaration commented by a given number hides all declarations with higher numbered comments.
By the way, the i member of X is declared static to satisfy a restriction I presented last month in my discussion of nested classes. The this pointer in function X::Y::f points to an object of type X::Y. It does not have any objects of type X handy, so it has no way to locate an ordinary data member of an X object. By making X::i static, X::Y::f can access it even without any X objects lying around.
Removing the keyword static from that declaration does not affect name lookup. That is, if you remove the declarations on lines 6 and 15, then the use of i on line 16 refers to X::i, even if it's non-static. However, the translator produces a diagnostic complaining that a nested class cannot access an ordinary data member of an enclosing class.
In summary, the algorithm for resolving unadorned names in non-inline members is more-or-less as follows:
1. Look in block scopes from the innermost block to the block that is the function body.
2. For each class name that appears right to left in the fully-qualified name of the function, look in that class scope. For example, for function X::Y::f, look in Y, and then look in X.
3. Look in enclosing scopes, from the innermost scope out to file scope.
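The three steps can be sketched in one translation unit. This is not Listing 3: rather than deleting declarations one at a time, it uses extra hypothetical classes (X::Z and W) that simply lack the declaration in question, so each function body stops at a different step.

```cpp
#include <cassert>

int i = 4;                 // (4) file scope

struct X
   {
   static int i;           // (3) member of the enclosing class
   struct Y
      {
      int i;               // (2) member of the nested class
      int local();
      int member();
      };
   struct Z                // like Y, but with no member i
      {
      int enclosing();
      };
   };

struct W                   // no i anywhere in W or W::Y
   {
   struct Y
      {
      int global();
      };
   };

int X::i = 3;

int X::Y::local()
   {
   int i = 1;              // (1) block scope wins (step 1)
   return i;
   }

int X::Y::member()    { return i; }   // step 2: finds X::Y::i
int X::Z::enclosing() { return i; }   // step 2: nothing in Z, finds X::i
int W::Y::global()    { return i; }   // step 3: finds ::i

// Hypothetical usage.
void demo_nested_lookup()
   {
   X::Y y;
   y.i = 2;
   X::Z z;
   W::Y w;
   assert(y.local() == 1);
   assert(y.member() == 2);
   assert(z.enclosing() == 3);
   assert(w.global() == 4);
   }
```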

Derivation and Nesting
Listing 4 shows another Core Language group example that combines derivation and nested classes. It declares class B1 inside class A1 and class B2 inside class A2. Furthermore, A2::B2 is publicly derived from A1::B1. Yup, you really can do that.
The listing declares i in four different places:

as a global variable (::i)

as a static member of A1 (A1::i)

as an ordinary member of A1::B1 (A1::B1::i)

as a static member of A2 (A2::i)
And again the question is: to which declaration does the i in the body of A2::B2::f (on line 23) refer?
Exactly as written, it refers to A2::B2::i, the i declared on line 8 that A2::B2 inherited from A1::B1. If you remove that declaration, then the i on line 23 refers to the static member A2::i declared on line 14. And, if you delete that declaration, it refers to the global i declared on line 1.
The i on line 23 can never refer to A1::i declared on line 5. A2::B2 may inherit members of A1::B1, but it can't see names in the scopes enclosing A1::B1. Stated another way, although A1::B1 is in the scope of A1::i, classes derived from A1::B1 are not.
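Here is a sketch of the same idea, again not the listing itself. The classes A1::E1, A2::D2, and A3 are hypothetical additions that stand in for "remove that declaration," so that all three outcomes appear side by side.

```cpp
#include <cassert>

int i = 1;                       // ::i

struct A1
   {
   static int i;                 // A1::i
   struct B1
      {
      int i;                     // A1::B1::i
      };
   struct E1                     // hypothetical: a nested class with no i
      {
      };
   };

struct A2
   {
   static int i;                 // A2::i
   struct B2 : public A1::B1
      {
      int f();
      };
   struct D2 : public A1::E1     // hypothetical: inherits nothing named i
      {
      int f();
      };
   };

struct A3                        // hypothetical: no i in A3 at all
   {
   struct B3 : public A1::E1
      {
      int f();
      };
   };

int A1::i = 10;
int A2::i = 20;

int A2::B2::f() { return i; }    // the i inherited from A1::B1
int A2::D2::f() { return i; }    // A2::i, from the enclosing class
int A3::B3::f() { return i; }    // ::i; lookup never considers A1::i

// Hypothetical usage.
void demo_derived_nested()
   {
   A2::B2 b;
   b.i = 5;
   A2::D2 d;
   A3::B3 e;
   assert(b.f() == 5);
   assert(d.f() == 20);
   assert(e.f() == 1);
   }
```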

Inline Friend Definitions
A function that is a friend of a class is not a member of that class, but it can access private and protected members of that class. (I introduced friends in "Stepping Up to C++: Operator Overloading, Part 3", CUJ, May 1992.) A function cannot demand friendship. Only a class can grant friendship by a friend declaration inside the class definition, as in

class X
   {
   friend void f(X &x);
private:
   int i;
   // ...
   };
In this case, the function definition must appear later, such as in

void f(X &x) { ++x.i; }
A friend declaration can also be a definition. A friend function defined inside a class definition is implicitly declared inline, and is in the (lexical) scope of that class. A friend function defined outside the class definition is not in the class scope. Class X in Listing 5 illustrates this point.
Class X declares two friend functions, f and g. The declaration for f on line 7 is also its definition. The body of f is in the scope of X, so the use of k in the function body refers to the member X::k. On the other hand, the definition of function g appears outside the definition of class X. Even though g's body (on line 11) appears identical to f's, it is not in the scope of X. Thus, the k on line 11 refers to the global k (defined on line 1), not the static member X::k.
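A minimal sketch of the difference, assuming a layout similar to, but not copied from, Listing 5. The return values and the helper demo_friends are hypothetical additions that make the result observable.

```cpp
#include <cassert>

int k = 1;                           // ::k

class X
   {
   static int k;                     // X::k, a private static member
   friend int f(X &) { return k; }   // defined in the class: k is X::k
   friend int g(X &);                // declared here, defined below
   };

int X::k = 2;

int g(X &)                           // defined outside the class:
   {
   return k;                         // k is the global ::k
   }

// Hypothetical usage: identical bodies, different results.
void demo_friends()
   {
   X x;
   assert(f(x) == 2);
   assert(g(x) == 1);
   }
```

Both friends take an X argument. In current C++, a friend function that is declared only inside the class can be found only by argument-dependent lookup, so a friend with no X parameter could not be called this way without a separate declaration outside the class.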
A class can declare a member of another class as a friend. For example,
class X
   {
public:
   void f();
   };

class Y
   {
   friend void X::f();
   };
declares member f of class X as a friend of class Y. Hence, X::f can access the private members of Y objects as well as those of X objects.
As they waded through the name lookup issues, members of the Core Language group discovered that a friend declaration of a member function can also define that member. Actually, they discovered that even though they didn't know what it meant, nothing in the ARM precluded such a definition. Of course, the addition of nested classes only compounded the confusion.
For example, Listing 6 shows class Y nested inside class X, and a separate class Z that defines member f of X::Y as a friend. The listing declares i in two separate places: as a static member of X and as a static member of Z. And now the question: to which declaration does the i in the body of X::Y::f refer?
The fact is, nobody knew. Nearly everyone on the committee agreed that this sort of declaration was so confusing, and of such little value, that we simply banned it by adding this rule:

A friend declaration of a member may not define that member. However, a friend declaration of a non-member may still define that non-member.

References
Ellis, Margaret A. and Bjarne Stroustrup. 1990. The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley.