June 1996/C++ Theory and Practice

C++ Theory and Practice

Dan Saks

Abstract Declarators, Part 1

A thorough understanding of declarators is essential to mastering C++ syntax. This month Dan gives special attention to one of the most commonly occurring declarators.
Copyright ©1996

Declarators are central to object and function declarations in both C and C++. A declarator is a name being declared, along with any operators that modify that name. In C, the declarator operators are * (for pointers), () (for both functions and grouping) and [] (for arrays). In C++, they also include & (for references) and pointer-to-member operators of the form T::*. Both the * and T::* operators can have trailing const and/or volatile qualifiers.

For example, the declaration
unsigned char *const p[N], &f(), c;
has three declarators. The first, *const p[N], declares that p is an array with N elements of type const pointer. The second, &f(), declares that f is a function returning a reference. The third declarator, c, declares that c just is.

The keywords and type names that appear before the first declarator are called decl-specifiers. The sequence of decl-specifiers, though part of the declaration, is itself a distinct syntactic entity called a decl-specifier-seq. The decl-specifier-seq in the previous example has two decl-specifiers, namely, unsigned and char. Because they appear in the same declaration, all three declarators share this decl-specifier-seq. Thus p is an "array with N elements of type const pointer to unsigned char", f is a "function returning reference to unsigned char", and c is an "unsigned char".
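A compiler can confirm this reading. The following is only a sketch: it assumes a C++11 compiler (decltype and static_assert are newer than this column), picks an arbitrary value for N, and adds extern solely so that the const pointers need no initializers.

```cpp
#include <type_traits>

const int N = 4;  // arbitrary value, assumed for illustration

// One decl-specifier-seq shared by three declarators.
extern unsigned char *const p[N], &f(), c;

static_assert(std::is_same<decltype(p), unsigned char *const[N]>::value,
    "p is an array of N const pointers to unsigned char");
static_assert(std::is_same<decltype(f), unsigned char &()>::value,
    "f is a function returning reference to unsigned char");
static_assert(std::is_same<decltype(c), unsigned char>::value,
    "c is an unsigned char");
```

Note that the type names inside the static_asserts are themselves written with nameless declarators such as *const[N] and &().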

Although the details are sometimes daunting, the basic idea of declarators is not all that complicated. But, for whatever reasons, an awful lot of C and C++ programmers have never even heard of declarators, let alone understood them. C++ programmers have an even tougher time grasping the concept because so many of their mentors and colleagues persist in writing declarations such as
const char* f();
which use spacing to group the declarator operator * with the decl-specifiers rather than with the declarator.
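To see why that spacing misleads, consider a declaration with two declarators. This is a small sketch, again assuming a C++11 compiler for the checks; the names a and b are mine.

```cpp
#include <type_traits>

char* a, b;   // the spacing suggests two pointers, but * modifies only a

static_assert(std::is_same<decltype(a), char *>::value,
    "a is a pointer to char");
static_assert(std::is_same<decltype(b), char>::value,
    "b is a plain char, not a pointer");
```

The * is part of the first declarator, so it has no effect on the second.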

Thus, for the past several months, I've been making a concerted effort to raise the collective declarator-consciousness of the civilized world. Six months ago, I explained the decl-specifier-seq as a construct distinct from the declarator-list that follows it (see "The Column That Needs a Name: Understanding C++ Declarations," CUJ, December 1995). The following month, I examined the declarators themselves (see "The Column That Needs a Name: Understanding C++ Declarators," CUJ, January 1996). Over the succeeding two months, I showed how to write C++ functions that can parse (read and verify) C++ declarators as part of a program, decl, which parses declarations and translates them into English (see "The Column That Needs a Name: Parsing C++ Declarations," CUJ, February and March 1996).

Table 1 presents the productions (grammar rules) for the declarations that my decl program accepts. The productions are written in the EBNF notation, summarized in Table 2. The productions in Table 1 define the form of a simple-declaration, which is what the C++ draft standard calls an object or function declaration.

Although declarators probably appear most often in simple-declarations, they have lots of other uses in C++. Among other places, declarators appear in conditions, exception declarations, function definitions, and parameter declarations. Declarators also come in an assortment of flavors. In addition to plain vanilla declarators, there are abstract-declarators, new-declarators, member-declarators, and conversion-declarators. These assorted declarators pop up in cast expressions, sizeof expressions, new expressions, and elsewhere.

I believe that understanding the various forms for declarators can really improve your understanding of many parts of the C++ language. Therefore, the consciousness raising continues this month with a look at one of these alternative declarator forms, namely, abstract-declarators.

Declarators without Names

As explained at the beginning, an ordinary declarator is built around a name. That name is called the declarator-id. My declaration parsing program accepts declarator-ids only if they are simple identifiers, but a declarator-id in full-blown C++ can have other forms. For example, in a declaration such as
size_t string::length() const;
the declarator-id is string::length (a qualified-id). In
string &operator+=(char);
the declarator-id is operator+= (an operator-function-id). And, in
~string();
the declarator-id is ~string (a destructor name).

An abstract declarator is a declarator without a declarator-id. The C++ grammar uses abstract declarators in a bunch of places, including cast expressions, parameter declarations, exception declarations, exception specifications, template argument lists, and sizeof expressions. Let's look at some examples.

When you define a function that has parameters, you typically give each parameter a name so that you can access the parameter inside the function body. For example,
char *memcpy
    (void *dst, const void *src, size_t n)
    {
    // copy n characters from src to dst
    }
defines function memcpy with three formal parameters, dst, src, and n. The body uses the names dst, src, and n to access the actual arguments passed to the function in a call.

Each of the three parameter declarations
void *dst
const void *src
size_t n
follows essentially the same form as a simple-declaration. In the first one, void is the decl-specifier-seq and *dst is the declarator. In the second, const void is the decl-specifier-seq and *src is the declarator. In the third, size_t is the decl-specifier-seq and n is the declarator.

When you declare (rather than define) a function, you don't need to provide the function body. In this case, the formal parameter names have function prototype scope. That is, their scope lasts only to the end of the enclosing function declarator. For example, in the following declaration
char *memcpy (void *dst, const void *src, size_t n);
the parameters dst, src, and n all go out of scope at the right parenthesis that ends memcpy's declarator, and never come back into scope anywhere else in the program. Although the formal parameter types in memcpy's definition must be the same as in any of memcpy's declarations, the formal parameter names need not be the same.

Thus, formal parameter names appearing in a function declaration are really just for show. No statements in the program can actually refer to those particular names from that particular declaration's prototype scope. Consequently, C++ (and C as well) lets you omit some or all of the formal parameter names, as in
char *memcpy (void *, const void *, size_t);
Each parameter declaration still has the same basic syntactic structure as it did when it had a named parameter. A parameter declaration such as const void * still has a decl-specifier-seq, const void, and a declarator, *. Only now the declarator-id (the name) is missing, so the declarator is an abstract declarator.
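The following sketch shows the names-are-just-for-show rule in action. It uses a hypothetical function, copy_bytes, rather than memcpy itself, so that it does not collide with the library's own declaration.

```cpp
#include <cstddef>

// Three declarations of the same function. The parameter names differ
// or are absent, but the parameter types agree.
void *copy_bytes(void *dst, const void *src, std::size_t n);
void *copy_bytes(void *to, const void *from, std::size_t count);
void *copy_bytes(void *, const void *, std::size_t);

// Only the names in the definition are visible to the function body.
void *copy_bytes(void *dst, const void *src, std::size_t n)
{
    char *d = static_cast<char *>(dst);
    const char *s = static_cast<const char *>(src);
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

All four declarations refer to one function; the compiler compares types, not names.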

The third parameter declaration in memcpy's function definition is size_t n. In this case the declarator is just a declarator-id. When we dropped that declarator-id, the parameter declaration reduced to just size_t. It's still true that parameter declarations have two parts: a decl-specifier-seq and a declarator. However, there are two different ways to view the second part. The difference is rather subtle, but it does affect how you write the grammar.

Grammars for Abstract Declarators

The grammar in the C++ draft standard takes the view that abstract-declarators, like ordinary declarators, are non-empty. That is, an abstract declarator must contain at least one token. The production for parameter declarations is thus:
parameter-declaration =
    decl-specifier-seq
    [ declarator | abstract-declarator ].
The second part of a parameter declaration is either a non-empty ordinary declarator, or a non-empty abstract declarator, or nothing at all.

Table 3 shows the productions for non-empty abstract declarators. These productions refer to ptr-operator, array-suffix, and function-suffix defined in Table 1. If you think of each production as a parsing function that consumes tokens, you will find that you cannot "execute" the abstract-declarator "function" without consuming at least one token. That is the sense in which abstract-declarators are non-empty. Ordinary declarators (defined in Table 1) have the same property.
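To make the "parsing function" view concrete, here is a minimal sketch. It is not the decl program and it does not implement Table 3: it assumes a drastically simplified grammar in which * and & are the only pointer operators, "[]" and "()" each arrive as a single token, and only abstract declarators (not ordinary ones) are accepted. All the names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> token_list;

// True if tok is one of the few decl-specifiers this sketch knows about.
static bool is_decl_specifier(const std::string &tok)
{
    return tok == "const" || tok == "unsigned" || tok == "void"
        || tok == "char" || tok == "int" || tok == "size_t";
}

// abstract-declarator, simplified. Succeeds only by consuming at least
// one token; on failure, pos is left unchanged.
bool abstract_declarator(const token_list &t, std::size_t &pos)
{
    if (pos < t.size() && (t[pos] == "*" || t[pos] == "&")) {
        ++pos;
        abstract_declarator(t, pos);   // optional trailing part
        return true;
    }
    bool consumed = false;
    while (pos < t.size() && (t[pos] == "[]" || t[pos] == "()")) {
        ++pos;
        consumed = true;
    }
    return consumed;
}

// parameter-declaration = decl-specifier-seq [ abstract-declarator ].
bool parameter_declaration(const token_list &t)
{
    std::size_t pos = 0;
    while (pos < t.size() && is_decl_specifier(t[pos]))
        ++pos;
    if (pos == 0)
        return false;                  // no decl-specifier-seq
    abstract_declarator(t, pos);       // the brackets: may be absent
    return pos == t.size();
}
```

With this sketch, the token sequences for const void * and for size_t alone are both accepted, while a lone * is rejected because it has no decl-specifier-seq in front of it.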

The other tenable view is that the second part of a parameter declaration is not optional, but one of the choices is an abstract declarator which may be empty. In that case, you use parentheses instead of brackets around the second part of the production for parameter-declaration:
parameter-declaration =
    decl-specifier-seq
    ( declarator | abstract-declarator ).
Leaving the brackets in place doesn't hurt, but in a way, it's redundant. Since one of the choices may be empty, the entire set of choices is optional. However you write it, you must use either the brackets or the parentheses. Without either, the production becomes
parameter-declaration =
    decl-specifier-seq
    declarator | abstract-declarator.
which is equivalent to
parameter-declaration =
    ( decl-specifier-seq declarator ) |
    abstract-declarator.
This is most definitely wrong; it says an abstract declarator all by itself (with no decl-specifiers) can be a parameter declaration.

The productions in Tables 3 and 4 are equivalent. I believe the choice comes down to:

which one produces a more readable description?

which one is easier to map into a parser?

Sometimes these goals conflict. In this case, I think they both favor the productions in Table 4, in large part because the production for direct-abstract-declarator in Table 4 looks much like the production for direct-declarator in Table 1. The production for direct-abstract-declarator in Table 3 looks a bit different. The rules in Table 4 suggest you can merge ordinary declarator and abstract-declarator into a single grammar rule and a single parsing function; the rules in Table 3 do not. In fact, in the next month or two I will merge the grammar into a single rule.

Other Uses for Abstract Declarators

Abstract declarators are useful for composing type names on the fly in expressions. For example, the following declaration
D &dr = (D &)b;
uses the cast-expression (D &)b. The C++ grammar calls that thing inside the parentheses a type-id. Much like a parameter declaration, a type-id consists of two parts: a type-specifier-seq and an abstract declarator (which may be empty). In the previous example, the type-specifier-seq is D and the abstract declarator is &; together they mean "reference to D".

Notice that the leading part of a type-id is a type-specifier-seq, whereas the corresponding part of a parameter-declaration (Table 3 or 4) or simple-declaration (Table 1) is called a decl-specifier-seq. Both type-specifiers and decl-specifiers include identifiers and type keywords such as int, long, and unsigned. However, only decl-specifiers include storage class specifiers such as extern, register, or static; function specifiers such as inline or virtual; and other non-type-specifiers such as friend and typedef.

The grammar in Table 1 does not show any of the non-type-specifiers because the decl program does not accept them. Table 1 makes no distinction between a decl-specifier and a type-specifier; however, the draft C++ Standard does. Only type-specifiers may appear in a type-id.

It appears from the grammar in the draft standard that you can use any decl-specifiers in a parameter declaration. For example, the grammar allows
void f(extern int n);
However, C++ imposes semantic constraints which prohibit most of the non-type-specifiers from parameter declarations. In fact, the only non-type-specifiers C++ does allow in parameter declarations are auto and register, and it pretty much ignores even those.

Abstract declarators appear (as part of type-id) in the new-style casts as well. For example, you can rewrite the earlier cast example as
D &dr = static_cast<D &>(b);
(if not now, then someday). The thing inside the < > is a type-id.
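Here are both casts side by side in a compilable sketch. The classes B and D and their members are hypothetical stand-ins, since the fragments above show only the names D and b.

```cpp
struct B { virtual ~B() { } };
struct D : B { int value; D() : value(42) { } };

// Both casts name the target type with the type-id "D &":
// type-specifier-seq D, abstract declarator &.
int value_via_old_cast(B &b)
{
    D &dr = (D &)b;
    return dr.value;
}

int value_via_new_cast(B &b)
{
    D &dr = static_cast<D &>(b);
    return dr.value;
}
```

Either function is safe to call only when b really refers to a D; neither cast checks that at run time.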

Speaking of < and > as angle brackets, abstract declarators also appear in template arguments (again as part of type-ids). For example, given
template <class T> class list;
then the declaration
list<const char **> todo;
declares todo as an instance of the list template with type argument const char **. The abstract declarator is **, meaning "pointer to pointer". Thus, todo is a "list specialized for type pointer to pointer to const char".
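You can peel the abstract declarator apart one * at a time to check that reading. This sketch assumes a C++11 compiler and uses an empty class template as a stand-in for list; the typedef names are mine.

```cpp
#include <type_traits>

template <class T> class list { };   // empty stand-in for the list template

list<const char **> todo;

// Remove the pointer operators of the abstract declarator ** in turn.
typedef const char **arg;
typedef std::remove_pointer<arg>::type once;     // const char *
typedef std::remove_pointer<once>::type twice;   // const char

static_assert(std::is_same<once, const char *>::value,
    "removing one * leaves pointer to const char");
static_assert(std::is_same<twice, const char>::value,
    "removing both leaves const char");
```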

Abstract declarators can also appear in exception declarations in handlers. For example, the decl program contains the following handler that catches parsing errors:
catch (const recoverable_error &re)
    {
    }
(see "The Column That Needs a Name: Recovering from Parsing Errors," CUJ, April 1996). The declaration at the beginning of the handler is called an exception-declaration. In this example, it declares re as "reference to const recoverable_error". When this handler catches an exception, re binds to a copy of the thrown object so that the handler can access the thrown value. But in this case, the handler doesn't care what that value is. The parser throws exceptions to transfer control, not to transmit a value. Therefore, the handler can omit the formal parameter name (the declarator-id) re from the declaration, as in
catch (const recoverable_error &)
    {
    }
In this case, the & becomes the sole constituent of the abstract declarator.
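A self-contained sketch of such a handler follows. The empty recoverable_error class and the function parse_with_recovery are hypothetical stand-ins, not code from the decl program.

```cpp
struct recoverable_error { };   // hypothetical stand-in for decl's exception type

// Returns true if "parsing" failed and the handler regained control.
bool parse_with_recovery(bool fail)
{
    try {
        if (fail)
            throw recoverable_error();
        return false;
    }
    catch (const recoverable_error &) {   // no declarator-id: value unused
        return true;
    }
}
```

Omitting the name also documents the intent: the handler uses the exception only to regain control.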

Abstract Function Declarators

In practice, most abstract declarators that you write are no more complicated than the ones I've shown you so far. But, C and C++ being what they are, you occasionally run into an abstract declarator that gives you pause, or worse. The most complicated ones usually involve pointers to functions or pointers to member functions.

For instance, you can write the declaration for the Standard C/C++ atexit function as
int atexit(void (*)());
Here, void (*)() is a parameter declaration whose abstract declarator is (*)(). Together with the decl-specifier void, it specifies the type "pointer to function with no parameters returning void".

The key to deciphering an abstract declarator such as this is figuring out where the declarator-id would have been if this were an ordinary declarator. The first few times you try this, you may have to resort to some trial and error. In this particular example, eliminating places where the declarator-id cannot go is pretty easy because most of them just don't look right.

Using f as the declarator-id, let's try plugging f into different places in the declarator and then throwing away the bogus ones:

(1) void f(*)()
(2) void (f*)()
(3) void (*f)()
(4) void (*)f()
(5) void (*)(f)

(2) is no good because the pointer operator, *, is only a prefix operator, yet (2) uses it as a postfix operator. (Aside from its definition, ptr-operator occurs in only one place in Table 1, and that's just before a declarator on the right-hand side of the production for declarator.)

In (1), (*) cannot be a parameter list because * by itself (without at least one leading decl-specifier) is not a legitimate parameter declaration. (*) cannot be a pointer operator for two reasons. First, grouping parentheses in a declarator cannot enclose just an operator; they must enclose the declarator-id as well. (Grouping parentheses appear in the production for direct-declarator in Table 1.) Secondly, even if the grouping parentheses were okay, the (*) appears as a postfix operator. So (1) does not have a valid declarator.

We can easily dismiss (4) and (5) because they also use grouping parentheses around a solitary pointer operator.

This leaves only (3) standing. Clearly, it declares f as "pointer to function with no parameters returning void". Hence, that is also the meaning of the abstract declarator you get when you remove f.
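A compiler agrees with the conclusion. In this sketch, which assumes a C++11 compiler, cleanup and register_cleanup are hypothetical names; std::atexit is the library function declared in <cstdlib>.

```cpp
#include <cstdlib>
#include <type_traits>

void cleanup() { }        // hypothetical handler that does nothing

void (*f)() = cleanup;    // candidate (3): f is a pointer to function

static_assert(std::is_same<decltype(f), void (*)()>::value,
    "removing f from void (*f)() leaves the type void (*)()");

// std::atexit accepts exactly that type and returns 0 on success.
int register_cleanup() { return std::atexit(f); }
```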

The seemingly simple parameter-declaration
int ()
poses an interesting problem. Do you obtain it from
int x()
meaning "function returning int"? Or, do you obtain it from
int (x)
meaning just "int" (where the parentheses are just redundant grouping)?

The grammar for abstract declarator in Table 3 shows that an abstract declarator may be completely absent from a parameter-declaration, but it cannot be empty. If the abstract declarator () comes from (x), then the () must be grouping around an empty abstract declarator. But this cannot be. Therefore, () must come from x(). The parameter-declaration has type "function returning int".
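You can watch a compiler make the same choice. The sketch below assumes a C++11 compiler; g and h are hypothetical names. One detail goes beyond the grammar: C++ adjusts a parameter of function type to a pointer to function, so g's parameter ends up with type "pointer to function returning int".

```cpp
#include <type_traits>

void g(int ());     // parameter-declaration int (): "function returning int"
void h(int (x));    // redundant grouping around x: just "int"

// A parameter of function type is adjusted to pointer to function.
static_assert(std::is_same<decltype(g), void (int (*)())>::value,
    "g takes a pointer to function returning int");
static_assert(std::is_same<decltype(h), void (int)>::value,
    "h takes a plain int");
```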

Looking Further Ahead

I'm working my way toward augmenting my decl program to recognize abstract declarators as well as ordinary declarators. Once I add abstract declarators, adding non-empty parameter-lists will be pretty easy. Not surprisingly, adding recognition for these constructs complicates the parsing algorithm. Moreover, it complicates the scanning because the parser must be able to peek further ahead down the token stream.

Thus far, the decl program needs to look at most one token beyond the current token to decide what to do next. The parser normally reads tokens one at a time by calling scanner::get. When it needs to look ahead at the next token so it can know what to do with the current token, the parser just calls get once more, looks at the token, and then calls scanner::unget to back up to the previous token. The next call to get will return the previously "ungot" token.

In fact, the current parser implementation needs to look ahead in only one particular situation, namely, when the decl-specifier-seq parsing function encounters a type name. The specific problem is in distinguishing between a type name that is a decl-specifier, as in
const T *p;
and a type name that is part of a pointer-to-member declarator, as in:
const T::*pm;
Here, if the type name T is not followed immediately by a ::, it is a decl-specifier, and the parser continues looking for more decl-specifiers (which it may or may not find). However, if the type-name T is followed by a ::, the parser assumes that the T:: is the leading part of a pointer-to-member declarator. In either case, the parser must back up and resume parsing with the type name T.

A parser that recognizes abstract declarators and function declarators with non-empty parameter lists must look even further ahead. For example, suppose the parser has seen
int (
Is that parenthesis grouping around a declarator, or is it the delimiter before a parameter list (the beginning of a function-suffix)? Suppose the next symbol is a type name, T. That is, suppose the parser has seen
int (T
Well, it's still too early to tell. T could be the start of a pointer-to-member declarator operator, as in
int (T::*pmf)();
In this case, pmf is a "pointer to member of T with type function with no parameters returning int". Or, T could be the start of a parameter list, as in
int (T &);
In this case, we have an abstract declarator of type "function with one parameter — reference to T — returning int".

Therefore, the parser still doesn't know what to make of that left parenthesis; it must look at another token. If that token is ::, then the parenthesis is grouping around a declarator. Otherwise, it's the start of a parameter list. Whatever the outcome, the parser must back up two tokens to the parenthesis, and resume parsing from there.
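The two-token lookahead can be sketched as follows. This is not the decl program's scanner: it is a hypothetical class that holds a pre-split token list and allows any number of consecutive unget calls, and classify_after_paren assumes the token right after the parenthesis is a type name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scanner over a pre-split token list.
class scanner
{
public:
    explicit scanner(const std::vector<std::string> &t) : toks(t), pos(0) { }

    // Returns the next token, or an empty string past the end of input.
    // Always advances, so each unget undoes exactly one get.
    std::string get()
    {
        std::string tok = pos < toks.size() ? toks[pos] : std::string();
        ++pos;
        return tok;
    }

    // Backs up one token; may be called repeatedly.
    void unget() { if (pos > 0) --pos; }

private:
    std::vector<std::string> toks;
    std::size_t pos;
};

enum paren_kind { grouping, parameter_list };

// Call with the scanner positioned just after "(". Looks two tokens
// ahead, then backs up so parsing resumes where it started.
paren_kind classify_after_paren(scanner &s)
{
    s.get();                       // the type name, such as T
    std::string next = s.get();    // "::" or something else
    s.unget();
    s.unget();
    return next == "::" ? grouping : parameter_list;
}
```

Given the tokens of int (T::*pmf)(), the function reports grouping; given int (T &), it reports parameter_list. In both cases the next call to get returns T again.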

The current scanner implementation does not support two consecutive calls to unget without an intervening call to get. Thus, you cannot use it to look ahead more than one token. If the parser is to accept abstract declarators and non-empty parameter lists, it must look ahead at least two. Is it sufficient to extend the lookahead to just two?

I generally subscribe to the 0-1-infinity rule, and I think it applies here. It's okay to disallow a feature, or support one instance of it, but once you go beyond one, you might as well go all the way.

I've modified the scanner to allow backing up over an arbitrary number of tokens. I will show you the implementation next month.

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601 (the area code changes to 937 after September 1996), or electronically at dsaks@wittenberg.edu.