May 1998/C++ Theory and Practice

Columns

C++ Theory and Practice: Partitioning with Namespaces, Part 2

Dan Saks

Namespaces should help you partition programs more cleanly, once the compilers agree on how to implement them.

Copyright © 1998 by Dan Saks

I continue this month with the programming example that I began last month. (See "C++ Theory and Practice, Partitioning with Namespaces, Part 1," CUJ, April 1998.) The example serves as a concrete basis for discussing alternative techniques for partitioning large C++ programs into simpler components.
The program is a cross-reference generator called xr. xr reads text from standard input and writes a cross-reference to standard output. The cross-reference output is an alphabetized list of the words (identifiers as in C++) that appear in the input. Each line in the output contains one word followed by the sequence of unique line numbers on which that word appears in the input.
The program began as a single source file written in Typesafe C (C that compiles as both C and C++). When I ended last month, the program consisted of two source files:

xr.cpp: the main part of the application, including the input processing

cross_reference.cpp: the cross-reference table implementation, that is, the function and data definitions that implement of the cross-reference data structure

and one header file:

cross_reference.h: the cross-reference table interface, that is, the declarations that an application (such as xr.cpp) needs to access the facilities provided by cross_reference.cpp

Using Incomplete Types

cross_reference.h appears in Listing 1. The header declares several names, all members of namespace cross_reference. Of those names, only add and put are truly parts of the cross-reference interface. The other names are really just implementation details. Ideally, such details shouldn't be in the header. Unfortunately, the definitions for add and put won't compile without them.
add's function heading doesn't refer to any names declared in namespace cross_reference. However, add is an inline function, so its definition (not just its declaration) must appear in the header. The body of that definition refers to other members of the namespace, namely, xr and add_tree. put is also an inline function, and its body refers to xr and put_tree.
The declarations for xr, add_tree, and put_tree all refer to yet another namespace member, tree_node. Last month, I asserted that tree_node must also be defined in the header to satisfy those references. Actually, a definition is overkill; a declaration is sufficient.
A declaration of the form
class-key T
    {
    ...
    };
(where class-key is either class, struct, or union) is also a definition. A declaration of the form
class-key T;
is just a declaration. It declares T as an incomplete type.

An incomplete type is one that has unknown size. Therefore, you cannot declare objects of an incomplete type T because the compiler doesn't know how much storage to allocate for a T object. However, you can declare objects of type T * because the size of a pointer is independent of what it points to. You can also declare objects of type T & because the size of a reference is independent of what it refers to. You can even declare a function with parameters or a return type of type T as long as you don't try to define or call that function.
For example, the incomplete type declaration
struct tree_node;
is sufficient to compile declarations such as:
extern tree_node *xr;
     
tree_node *add_tree
    (tree_node *t, char const *w, unsigned n);
It is even sufficient to compile the body of
inline
void add(char const *w, unsigned n)
    {
    xr = add_tree(xr, w, n);
    }
The code to copy a pointer (for parameter passing or assignment) is independent of the type that it points to.

You complete an incomplete type T by defining it. Only then can you declare objects of type T. However, a translation unit that never requires the complete type need not define the type. Thus, you need not define tree_node anywhere in cross_reference.h. The declaration
struct tree_node;
is sufficient to compile the rest of the header. Since the only reference to list_node is in the definition of tree_node, you need not mention list_node in the header at all. However, the function definitions in cross_reference.cpp require that both list_node and tree_node be complete types, so their definitions must appear early in that source file.

Defining Namespace Members

A slightly less cluttered version of cross_reference.h appears in Listing 2. It merely declares tree_node, and does not mention list_node at all. The definitions appear in cross_reference.cpp shown in Listing 3.
In the course of rewriting the header (from Listing 1 to Listing 2) , I made one stylistic change. In Listing 1, I defined the functions add and put inside the namespace definition. In Listing 2, I merely declared these functions inside the namespace definition, and defined them at global scope.
Just as you can define a class member either inside or outside its class definition, you can define a namespace member inside or outside its namespace definition. When the definition appears outside, you must refer to the member by its fully-qualified name. For example, when it appears in the namespace definition (as in Listing 1) , the definition for put is:
inline
void put()
    {
    put_tree(xr);
    }
When it appears outside the namespace definition (as in Listing 2) , the definition is:
inline
void cross_reference::put()
    {
    put_tree(xr);
    }
A class member function defined inside its class definition is an inline function even if not declared with the keyword inline. A namespace member function defined inside its namespace definition is not inline unless explicitly declared so.
Defining members outside their namespace definitions produces shorter namespace definitions. This makes it easier to see what's in the namespace. Easier, but still not necessarily easy. Unlike a class definition, which must name all its members in a single definition, a namespace definition can be split into disjoint segments distributed over separate source and header files. In other words, whereas class membership is closed once the class definition is complete, namespace membership remains open.
cross_reference.cpp (Listing 3) takes advantage of this openness. The include-directive
#include "cross_reference.h"
provides declarations only for the namespace members needed to compile the cross-reference interface. The subsequent namespace definition
namespace cross_reference
    {
    struct list_node;
    }
adds list_node as another member. Once again, this only declares list_node. The definition appears later at global scope:
struct cross_reference::list_node
    {
    unsigned number;
    list_node *next;
    };
Although limiting namespace member declarations to declarations (not definitions) makes the namespace definitions easier to read, it makes the member definitions a little harder to read. For example, when it appears inside the namespace definition, the definition for add_tree looks like:
tree_node *add_tree
    (tree_node *t, char const *w, unsigned n)
    {
    ...
    }
When it appears at global scope, the definition looks like
cross_reference::tree_node *
cross_reference::add_tree
    (tree_node *t, char const *w, unsigned n)
    {
    ...
    }
This function heading is rather long. I had to break it into three lines to get it to fit with this text. The first line is the function return type, the second is the function name, and the third is the parameter list.
Notice that the return type uses the fully-qualified name cross_reference::tree_node, but the parameter list uses just the unqualified name tree_node. When a function is defined outside its namespace (or class), the compiler looks up names appearing in the function's return type in the current scope (surrounding the function definition). It looks up names appearing in the parameter list and function body in the namespace (or class) scope before looking in the current scope.
I can state the lookup rule more generally with the help of a little terminology. The name being declared in an object or function declaration is the declarator-id. (See "The Column That Needs a Name: Understanding C++ Declarators," CUJ, January 1996.) In the declaration just above, the declarator-id is cross_reference::add_tree.
Now, here's a more general statement of the lookup rule. In any declaration with a declarator-id of the form X::n (where X is a class or namespace name), names appearing before the declarator-id are looked up in the scope surrounding the declaration. Names appearing after the declarator-id are looked up first in the scope of X, and then, if not yet found, in the scope surrounding the declaration.
This applies to the data member definition
cross_reference::tree_node *
cross_reference::xr = NULL;
which also appears in Listing 3. Here the declarator-id is cross_reference::xr. The type-specifier tree_node must be qualified by its namespace name so that the compiler's lookup will find it as a namespace member. NULL is a global symbol (actually a macro), so it needs no qualification.

Another way to write a null pointer constant is as zero cast to the appropriate type, as in:
cross_reference::tree_node *
    cross_reference::xr
        = static_cast<tree_node *>(0);
You need not qualify tree_node when it appears in this cast expression because the expression appears after the declarator-id cross_reference::xr. Thus, the compiler looks up tree_node as a member of namespace cross_reference.
If you moved this object definition into its namespace definition, you could write it more succinctly as:
tree_node *xr
    = static_cast<tree_node *>(0);
Coping with Reality

I believe Listings 2 and 3 are written in Standard C++. I can compile and link Listing 3 with the "main" module (from xr.cpp in Listing 8 last month) using Borland C++ 5.02 for Windows. The Borland compiler has only one minor complaint: it wants me to use the fully-qualified name for list_node when list_node appears in the definition of tree_node. That is, it wants me to write:
struct cross_reference::tree_node
    {
    char *word;
    list_node *first, *last;
    tree_node *left, *right;
    };
as
struct cross_reference::tree_node
    {
    char *word;
    cross_reference::list_node *first, *last;
    tree_node *left, *right;
    };
It's only a warning, but I'm always uneasy about ignoring warnings. They often come back to haunt me later.

Microsoft's Visual C++ 5.0 has a bit more trouble with Listings 2 and 3. It encounters link errors because it doesn't recognize the keyword inline in the function definitions for add and put in Listing 2 (the header). The linker complains that each function is defined more than once. The compiler apparently treats add and put as non-inline functions and generates a copy of each function (with external linkage) in both object modules in the program.
You can eliminate the error by adding the keyword inline to the declarations for add and put inside the namespace definition, as in:
namespace cross_reference
    {
    ...
    inline  // add this
    void put();
    }
....
inline      // this can stay or go
void cross_reference::put()
    {
    put_tree(xr);
    }
Microsoft's linker also complains that cross_reference::xr (as defined in Listing 3) is an unresolved external symbol. Moving the definition for xr from the global scope to the namespace definition quells that one.
Although I have a decided aversion to defining class member functions inside their class definitions, I haven't committed to a style for laying out namespace members. Given the difficulty that some compilers still have with member definitions outside namespace definitions, I'll probably stick to defining members within namespace definitions, at least for a while longer.

In Our Next Episode...

By removing unnecessary declarations from the header, Listing 2 is a bit of an improvement over Listing 1. However, list_node is not as well hidden as you might think. I'll explore this issue, and some other neat stuff, next time.

On Another Note

In the March, 1998 issue, I wrote a reply to a letter in which I explained why Standard C++ does not allow extraneous semicolons in most declarations. That response prompted this little missive:

Dan,

As a fellow who spends his professional life picking up where other programmers have left off, I was interested by your article on style in the March 1998 CUJ, p. 89. One needs a highly developed sense of humour doing this work and I smiled ironically as you went on to discuss what appears to be a discrepancy between the EBNF spec of Standard C++ and the compilers which most of us are familiar with. I think the following is worth mentioning:

If it were not for the awful style of EBNF definitions (which are great for machines to read but are too much trouble for application programmers) this discrepancy would probably have been ironed out. In this case I suggest supplementing EBNF with syntax diagrams as the Pascal User Manual and Report did in 1975. The readership and comprehension of the C++ Standard would be greatly enhanced.
Phil Howson
p.howson@dial.pipex.com
I agree with some of this, but not all.
First, let's distinguish between the grammar notation that I use and what the C++ Standard uses. The C++ Standard uses the notation that Kernighan and Ritchie [1] used to describe C. The K & R notation is passable, but for reasons I gave in an earlier column, I prefer Extended Backus-Naur Form (EBNF). (See "The C++ Column That Needs a Name: Grammar Notations," CUJ, November 1995.) I agree that syntax diagrams are even more readable, but they are harder to work with. As I wrote in that earlier column:

Syntax diagrams can be very vivid, but most of us don't yet have really good tools for editing the diagrams, splicing them into source code or pumping them through the Net. I know I don't, but if you think you have such tools, I'd like to know about them. Until I do, I'll stick with EBNF.

I've been using EBNF in my column as needed ever since.

I would be happy if the C++ Standard used EBNF, and even happier if it used syntax diagrams. However, using syntax graphs would have added to the editor's burden, and he had more important things to work on.
But the choice of grammar notations has nothing to do with the reason why compilers don't obey the C++ Standard with regard to extraneous semicolons. All the compilers I tested obey the grammar. The ones that got it "wrong" did so because they ignored the semantic constraints (written in prose), which eliminate constructs that the grammar allows.
I believe this problem is more likely the result of ongoing changes in the C++ Standard during standardization. Compiler writers had to make choices about which changes were most important to implement. Some decided, quite reasonably, that catching extraneous semicolons wasn't one of the biggies.

Reference

[1] Brian Kernighan and Dennis Ritchie. The C Programming Language (Prentice-Hall, 1978).

Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.