

Stepping Up To C++

Changes in the Scope Rules

Dan Saks


Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.

In July 1994, the ISO C++ standards committee, in concert with the ANSI C++ standards committee, voted to submit their working draft of the C++ Standard for registration as a committee draft (CD). CD registration is the first of three ballots that a proposed standard must pass before it becomes a bona fide international standard. In March 1995, the C++ committees voted to submit a further revised draft for the next round of approval, called the CD ballot (not to be confused with the previous CD registration ballot).

In the past five installments of my column, I explained the standardization process and the state of the C++ language itself. I described the state of the language by contrasting it with C++ as presented in the base document, the Annotated C++ Reference Manual (ARM) [1]. In those five installments, I managed to cover all the extensions to C++, both major and minor, approved by the C++ standards committees for inclusion in the draft standard.

In addition to extending the base language, the current draft also changes the behavior of numerous constructs from their previous behaviors specified by the ARM. Some changes simply prohibit features that were never intended to be, yet somehow slipped through the cracks. Other changes actually redefine the behavior of some constructs that already had well-defined behavior.

I listed all the changes as of CD registration in "Stepping Up to C++: C++ at CD Registration," CUJ, January 1995. Now I'd like to explain them. There have been so many changes, I can't describe them all in one column. This month I focus on those changes that affect the scope rules.

Declaration Scope within Conditionals

The scope rules in the ARM gave rise to the following peculiarity, described in Section 6.7 [Declaration Statement]:

An auto variable constructed under a condition is destroyed under that condition and cannot be accessed outside that condition. For example,

if (i)
    for (int j = 0; j < 100; j++) {
        // ...
    }
if (j != 100)
    // error: j outside condition
    // ...
Here, j is an auto variable, and it's initialized (constructed) within an if statement (under a condition). The scope of j is the block enclosing the for statement, which in this example, is the block enclosing both if statements.

If the conditional expression in that first if is true (i is nonzero), then the program constructs j upon entering the statement controlled by the if statement, and destroys j upon leaving. If the condition is false, then the program never constructs j. Thus, by the time execution reaches the second if in the example, either j has been constructed and destroyed, or it has never been constructed. Either way, the program can't access j in the second if statement, or anytime thereafter. (The same problem would occur if the for statement were enclosed in some other conditional statement, such as a do, for, switch, or while.)

This example highlights a very uncommon situation in C++, one in which the lifetime and scope of a named object do not end at the same time. After the end of the first if statement, the name j is still in scope, but the program can't access j either because its lifetime has ended or because its lifetime never began.

The committees decided to eliminate this strangeness by changing the scope rules. The C++ draft now states that the conditionally-executed statement inside an if, do, for, while, or switch implicitly defines a local scope. In other words, the body of each conditional statement compiles as if it had curly braces around it. Thus, the previous code example should compile as if written with an extra set of curly braces:

if (i) {
   for (int j = 0; j < 100; j++) {
      // ...
   }
}
if (j != 100) // still an error, but now
   // ...    // because j is not in scope
The committees also decided that this change eliminated the need for the ARM's restriction that a single conditionally-executed statement cannot be a declaration (ARM Sections 6.4 [Selection Statements] and 6.5 [Iteration Statements]). For example, the ARM prohibits

if (i)
   int i = 0; // error
An annotation in the ARM explains that:

If a declaration could be the only statement after an if or else, it would introduce a name of uncertain scope and definite uselessness.

The committees reasoned that the revised scope rule removed the uncertainty about the scope of names declared under a condition. They also decided that such declarations were not necessarily useless, because some C++ programs use declarations merely to gain the side effects of executing a constructor and/or destructor. Thus, C++ now permits the conditionally-executed statement in both selection-statements (if and switch) and iteration-statements (do, for, and while) to be a declaration.
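As a minimal sketch of a declaration used only for its side effects, consider the following. The names Tracer and run are hypothetical and appear nowhere in the draft; they simply record when the constructor and destructor execute.

```cpp
#include <string>

// Hypothetical helper: appends a note to a log when it is
// constructed and again when it is destroyed.
struct Tracer {
    std::string& log;
    explicit Tracer(std::string& l) : log(l) { log += "ctor;"; }
    ~Tracer() { log += "dtor;"; }
};

// Returns the log produced when the declaration is, or is not, executed.
std::string run(bool flag) {
    std::string log;
    if (flag)
        Tracer t(log);   // the sole controlled statement is a declaration
    log += "after;";     // t no longer exists here, and its name is out of scope
    return log;
}
```

Because the controlled statement forms its own implicit scope, t is destroyed before execution reaches the statement after the if, so run(true) logs the constructor, then the destructor, then "after;".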

Declaration Scope within a for-init-statement

The previous rule changed the scope of names declared in a conditional statement, but it did not change the scope of a name declared in the for-init-statement (the initialization step) of a for statement. The Run-Time Type Identification (RTTI) proposal introduced new rules for declarations that conflicted with the existing rule for declarations in for-init-statements, prompting the committees to change the latter. (I introduced RTTI in "Stepping Up to C++: C++ at CD Registration," CUJ, January, 1995 and declarations in conditionals in "Stepping Up to C++: Minor Enhancement to C++ as of CD Registration," CUJ, February, 1995.)

The ARM describes the scope rule for a for-init-statement in Section 6.5.3 [The for Statement]:

If the for-init-statement is a declaration, the scope of the names declared extends to the end of the block enclosing the for-statement.

For example,

{
for (int i = 0; i < N; ++i)
   {
   // ...
   }
if (i == N) // OK, i is still in scope
   // ...
}           // i is now out of scope
One consequence of this rule (which some programmers find annoying) is that

{
for (int i = 0; i < N; ++i)
   {
   // ...
   }
for (int i = 0; i < N; ++i) // error
   {
   // ...
   }
}
is an error because the second for-init-statement redefines i. According to the ARM, you must write the second for statement as

for (i = 0; i < N; ++i)
   {
   // ...
   }
The RTTI (Run-Time Type Identification) proposal included an extension to allow declarations in conditional expressions of for, if, switch, and while statements. The scope of a name declared in a condition is the statement(s) controlled by that condition. For example,

if (D *dp = dynamic_cast<D *>(bp))
   {
   // dp is in scope here...
   }
// ...but not here
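The fragment above omits the class definitions. A self-contained sketch might look like the following, where the classes B and D and the function value_or_default are assumptions made purely for illustration.

```cpp
// Hypothetical polymorphic base and derived classes.
struct B { virtual ~B() {} };
struct D : B { int value() const { return 42; } };

// Returns D::value() if bp points to a D, otherwise -1.
int value_or_default(B* bp) {
    if (D* dp = dynamic_cast<D*>(bp))
        return dp->value();   // dp is in scope only in the controlled statement
    return -1;                // dp is not in scope here
}
```

The name dp exists only where the cast is known to have succeeded, which is the main appeal of declaring it in the condition.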
This new rule combined with the ARM's rule for the scope of names in a for-init-statement to produce arguably surprising behavior: two objects seemingly declared in the same construct have different scopes. For example,

for (int i = 0; int j = f(i); ++i)
    {
    // ...
    }
// i is in scope here, but j is not
Many committee members thought this behavior would be hard to defend. After weighing several alternatives, the committees decided to change the scope of a name declared in a for-init-statement to be the same as the scope of a name declared in the condition of the corresponding for statement. The C++ draft now states:

Names declared in the for-init-statement, condition, and controlling expression parts of if, while, for, and switch statements are local to the if, while, for, or switch statement (including the controlled statement), and shall not be redeclared in a subsequent condition or controlling expression of that statement nor in the outermost block of the controlled statement.

With this change, i and j in the previous example have the same scope. Furthermore, previously valid (though perverse) code like the following is now ill-formed:

for (int i = 0; j = f(i); ++i)
   {
   int i = 1; // was OK; isn't anymore
   // ...
   }
because it redefines i.
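One practical consequence of the revised rule is that consecutive loops can each declare their own control variable. The sketch below assumes a compiler that implements the draft rule; the function name sum_twice is hypothetical.

```cpp
// Adds 0 + 1 + ... + (n - 1) twice, declaring a separate i in each loop.
int sum_twice(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += i;
    for (int i = 0; i < n; ++i)   // OK under the draft: the first i is out of scope
        total += i;
    return total;
}
```

Under the ARM's rule, the second declaration of i would have been a redefinition error.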

The Scope of Class Members

The ARM introduces the basic rules for names defined in class scope in Section 3.2 [Scopes], and adds many refinements in Chapter 9 [Classes]. Unfortunately, the ARM's scope rules are both incomplete and inconsistent. (In the politically-correct tone of the 90's, Anthony Scian of Watcom once referred to the ARM as "completeness challenged.") Many of the problems revolved around two rules in the ARM: the "rewriting" rule in Section 9.3.2 [Inline Member Functions] and the rule limiting the context sensitivity of class member declarations in Section 9.9 [Local Types].

The ARM's rewriting rule states that:

Defining a function within a class declaration is equivalent to declaring it inline and defining it immediately after the class declaration; this rewriting is considered to be done after preprocessing but before syntax analysis and type checking of the function definition. Thus,

int b;
struct x {
    char* f() { return b; }
    char* b;
};
is equivalent to

int b;
struct x {
     char* f();
     char* b;
};
inline char* x::f()
{ return b; }
Thus the b used in x::f() is x::b and not the global b.

The rewriting rule permits functions defined inside a class to refer to other members declared later in that class. This gives programmers considerable freedom in arranging in situ function definitions (function definitions appearing inside a class definition), even if some of those definitions refer to members declared later in the class.
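The following sketch shows that freedom in a form that compiles. It is a variation on the ARM's example, with int members so that the function body is well-typed; the names X and read_member are chosen here only for illustration.

```cpp
int b = 1;   // global b, never used by X::f

struct X {
    int f() { return b; }   // refers to X::b, even though X::b is declared below
    int b;
};

// Stores 7 in the member b and reads it back through f.
int read_member() {
    X x;
    x.b = 7;
    return x.f();
}
```

Here read_member returns the member's value rather than the global's, because the body of f is interpreted as if it appeared after the complete class.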

The rewriting rule applies only to in situ function definitions; it does not apply to any other member declarations. For example,

struct x
   {
   int v[N];      // error
   enum { N = 10 };
   };
is an error because the declaration for data member v refers to member N prior to N's definition.
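The straightforward repair is to declare N before the member that uses it. In this sketch, the function element_count is a hypothetical helper added only to confirm the array's size.

```cpp
struct X {
    enum { N = 10 };   // N is declared first...
    int v[N];          // ...so this data member declaration can refer to it
};

// Reports how many elements the member array v holds.
int element_count() {
    X x;
    return static_cast<int>(sizeof(x.v) / sizeof(x.v[0]));
}
```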

The commentary in the ARM explains that the rewriting rule is a "semantic notion" and should not be taken literally. Rather, the rule establishes a general principle for interpreting in situ member function definitions. Unfortunately, this rule doesn't spell out the behavior of some constructs very well. The rule limiting context sensitivity of class member declarations (which I'll describe momentarily) clarifies some cases, but adds confusion to others.

As with any declaration, a class definition is sensitive to the context in which it appears. That is, names declared in the scope(s) surrounding a class definition can affect that definition's meaning. For example, in

typedef int T;
class X
   {
public:
   T f(T t) { T u = t + 1; return u; }
   // ....
   };
the declaration of X::f uses the definition of T from the surrounding context. This context sensitivity is a normal consequence of the nested scope rules in C++. However, if left completely unconstrained, this context sensitivity can produce some pretty confusing results, or even mask programming errors.

For example, suppose class X in the previous example has a nested type T defined after member function f:

typedef int T;
class X
   {
public:
   T f(T t) { T u = t + 1; return u; }
   typedef double T;
   };
The ARM's example accompanying the rewriting rule suggests that the scope of a class member includes the bodies of all functions defined inside its class, even those functions defined before the member's declaration. Thus, the T in the body of X::f refers to X::T (which is double). But the ARM offers no clue as to whether the rewriting rule also extends to other parts of the function definition, such as the parameter list and the return type.

The ARM's rule limiting the context sensitivity of member declarations rules out many, but not all, of the questionable cases. That rule states:

A class-name or a typedef-name or the name of a constant used in a type name may not be redefined in a class declaration after being used in the class declaration, nor may a name that is not a class name or a typedef-name be redefined to a class-name or a typedef-name in a class declaration after being used in the class declaration.

The ARM illustrates this rule with the example shown in Listing 1.

Class X in Listing 1 contains two violations of the rule. X's member declaration

char v[i];
refers to the enumeration constant i defined in the enclosing scope. The subsequent member declaration

enum { i = 2 };
violates the limits on context sensitivity by redefining i as a member after i has already appeared in v's declaration. The other violation occurs because X's member function definition

int f() { return sizeof(c); }
refers to the typedef-name c from the enclosing scope, but the next member declaration

char c;
redefines c after being used.

Struct Y in Listing 1 contains another violation. The rewriting rule does not apply to data member declarations, so the declaration

T a;
refers to the global T defined immediately before Y. Then Y defines T as a nested type, and declares another member b of type T. Were it not for the limits on context sensitivity, you might wind up with a struct with two members of type T that actually have different types.

The rule limiting context sensitivity of member declarations hints at a general principle, namely, that a C++ compiler should not accept a class definition containing an erroneous forward reference just because the name of the forward-referenced member happens to be declared in some scope enclosing the class definition. Unfortunately, the rule obscures this general principle by being too broad in some areas and too narrow in others.

Consider class X in Listing 1, again. Class X defines, among others, two members

int f() { return sizeof(c); }
char c;
declared in that order. Forgetting the declarations in the surrounding context for the moment, the rewriting rule suggests that this is okay — c in the body of f should refer to member c. But the member declaration for c redeclares a typedef-name in the scope enclosing class X, so the ARM says c's declaration is an error.

The committees agreed that, in this case, the rule limiting context sensitivity contradicted the rewriting rule unnecessarily; it's clear that the c in the body of X::f refers to X::c, and the rewriting rule should prevail. Moreover, if you were to change the declaration for c at file scope from

typedef int c;  // c is a typedef-name
to:

int c;          // c is an object
then it would no longer exceed the limits on context sensitivity, and the declaration of member c would be just fine. This inconsistent treatment of type and non-type names bothered many committee members.

As another example of the ARM's inconsistency, consider

int c;
struct Z
   {
   char s[sizeof(c)];
   char c;
   };
The ARM's limits on context sensitivity don't prohibit this. Yet, if c at file scope had been a typedef instead of an object, then according to the ARM, the declaration of member c would be an error. The committees agreed that this definition for struct Z should be unacceptable no matter how you declare c at file scope.

The committees decided to remove the inconsistencies and eliminate the unresolved cases by replacing the rewriting rule and context sensitivity rule with a completely reworded set of class scope rules:

1) The scope of a name declared in a class consists not only of the text following the name's declarator, but also of all function bodies, default arguments, and constructor initializers in that class (including such things in nested classes).

2) A name used in a class S shall refer to the same declaration when re-evaluated in its context and in the completed scope of S.

3) If reordering member declarations in a class yields an alternate valid program under (1) and (2), the program's meaning is undefined.

The first rule effectively replaces the rewriting rule. It says that the scope of a member m of a class X includes all the member function bodies, including those bodies that appear before m's declaration. The rule also clearly states that default argument expressions and constructor initializers are subject to this "rewriting," but by implication, function return types and parameter types are not.
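To illustrate the first rule, here is a minimal sketch in which a constructor initializer, a default argument, and a function body all refer to members declared later in the class. The class Counter and its members are hypothetical.

```cpp
struct Counter {
    // The constructor initializer uses count and initial, both declared below.
    Counter() : count(initial) {}

    // The default argument uses default_step, declared below.
    int next(int step = default_step) { return count += step; }

    enum { initial = 10, default_step = 5 };
    int count;
};
```

A Counter starts at 10, so the first call to next() with no argument yields 15. Note that the return type and parameter type of next are written with names already visible at that point, since the rule does not extend to them.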

The second rule effectively replaces the rule limiting the context sensitivity of class member declarations. It more or less says that if you apply the rewriting rule to a member declaration and, in so doing, change that declaration's meaning, then the declaration is in error.

The third rule handles some cases not covered by the first two rules. Consider this example:

class X
   {
   int f(int (T));
   typedef int T;
   };
Assuming there's no type T defined in the surrounding context, then function X::f has a parameter of type int named T (the parentheses around T are merely redundant). On the other hand, when evaluated in the completed scope of class X, X::f has a parameter of type "function taking int returning int."

At first you might think this violates rule 2, but it doesn't. Rule 2 says that the two interpretations for T can't refer to different declarations. But in the first interpretation for X::f, T does not refer to anything; it's the name of a formal parameter declared at that point. Therefore, this declaration does not "refer" to a declaration of T; it declares T, and thus can't violate rule 2. Now, since the scope of X::T does not include the parameter declaration of X::f (by rule 1), it follows that the first interpretation for X::f prevails, namely, X::f has a parameter of type int named T.

Now, suppose you rewrite the class definition as

class X
   {
   typedef int T;
   int f(int (T));
   };
Then clearly the T in f's declaration refers to X::T, so the other interpretation for X::f prevails, namely, X::f has a parameter of type "function taking int returning int."
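The second ordering can be exercised directly. In this sketch, the public access specifier, the out-of-class definition of X::f, and the function twice are assumptions added so the example is complete enough to call.

```cpp
class X {
public:
    typedef int T;
    int f(int (T));   // one unnamed parameter: "function taking int returning int"
};

// A function parameter is adjusted to a pointer to function,
// so f can be defined with a named parameter g and can call through it.
int X::f(int g(int)) { return g(20); }

// Hypothetical function to pass to X::f.
int twice(int n) { return 2 * n; }
```

With these definitions, calling f(twice) on an X object invokes twice(20). Passing a plain int to f would not compile, which confirms that the parameter is a function and not an int named T.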

This example exposes a situation where simply reordering the member declarations yields two distinct interpretations for the class, neither of which violates rule 2. The committees generally agreed that any class definition with this property was potentially troublesome and should be an error. However, the standard mandates that the compiler must produce a diagnostic for a program that contains an error. This means the compiler might have to try every ordering of the members for each class to determine if alternate valid interpretations exist. A class with just 20 members (not unusually large) has 20! = 2.43 x 10^18 permutations of those members. Assuming that a 100 MIPS machine can compute each permutation in one instruction (a rather optimistic assumption), it would only take 771.5 years to compute those permutations. That's more than enough time to go get a snack.
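The arithmetic behind that estimate is easy to reproduce. The sketch below assumes a 365-day year and one permutation per instruction, as in the text; the function names are hypothetical.

```cpp
// Computes n! exactly; 20! still fits in a 64-bit unsigned integer.
unsigned long long factorial(unsigned n) {
    unsigned long long result = 1;
    for (unsigned k = 2; k <= n; ++k)
        result *= k;
    return result;
}

// Years needed to visit every ordering of `members` declarations
// at `per_second` orderings per second, using 365-day years.
double years_to_enumerate(unsigned members, double per_second) {
    const double seconds_per_year = 365.0 * 24 * 60 * 60;
    return static_cast<double>(factorial(members)) / per_second / seconds_per_year;
}
```

For 20 members at 100 million orderings per second, the result falls between 771 and 772 years, in line with the figure quoted above.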

Since today's C++ translators can't do the computation in a reasonable time, the committees settled for rule 3, which says programs with this troublesome property have undefined behavior. Undefined behavior is an error that the translator need not diagnose. A translator may diagnose the error, or it may simply choose to ignore it. The onus of avoiding undefined behaviors usually falls on you, the programmer.

The new scope rules change the behavior of the ARM's examples demonstrating limits on context sensitivity for member declarations. The current draft's examples (updated from Listing 1) appear in Listing 2. Here, the member declaration for X::c is no longer an error. The other constructs that caused errors still cause errors, but the exact points of those errors have moved.

Point of Declaration for an Enumerator

When does the scope of an enumerator begin? The ARM gives two different answers (it's "consistency-challenged"). In Section 3.2 [Scopes] it says:

The point of declaration for an enumerator is immediately after the identifier that names it. For example,

               enum { x = x };
Here,... the enumerator x is initialized to its own (uninitialized) value.

However, in Section 7.2 [Enumeration Declarations] it says:

An enumerator is considered defined immediately after it and its initializer, if any, has been seen.

The committees had to choose between the two, so they chose the latter because, among other reasons, Standard C uses the same rule.

The current draft now says:

The point of declaration for an enumerator is immediately after its enumerator-definition. For example:

               const int x = 12;
               { enum { x = x }; }
Here, the enumerator x is initialized with the value of the constant x, namely 12.
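The draft's example can be wrapped in a function to observe the result. This is only a sketch; the function name enumerator_value is hypothetical.

```cpp
const int x = 12;

// Inside the function, the initializer still names the outer constant x,
// because the enumerator x is not declared until after its initializer.
int enumerator_value() {
    enum { x = x };
    return x;   // from here on, x names the enumerator
}
```

Under the draft's rule the function returns the value of the outer constant. Under the rule in Section 3.2 of the ARM, the initializer would instead have referred to the enumerator itself.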

References

[1] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley, 1990).