February 1999/C++ Theory and Practice

C/C++ Contributing Editors

C++ Theory and Practice: Partitioning with Classes

Dan Saks

Dan is back, continuing where he left off, concerning the relative merits of namespaces and classes in structuring code.

Copyright © 1998 by Dan Saks

Hi. I'm back from my sabbatical. Although I haven't appeared in CUJ for a while, I didn't disappear completely from the C++ scene. I continued to present on-site seminars and put in my regular appearances at conferences such as Software Development, Embedded Systems, and the European C & C++ Developers Forum. I also kept writing my column for Embedded Systems Programming magazine. (It's not that my ESP column is more important than my CUJ column, but that my ESP articles are typically less than half as long as my CUJ articles.)

I also attended the C++ standards committee meeting last October. Yes, the C++ Standard is done, but the committee's work isn't. We're dealing with defect reports and pondering future directions for the language and library.

Despite my continued professional activities, my hiatus from CUJ gave me a little extra time for other parts of my life. Among other things, I coached my son Jeremy's soccer team to a 6-1-1 record. In true Buckeye fashion, we dominated our competition all season, only to lose our last game.

Over the past several months, a number of you wrote to me or approached me at a conference to tell how much you enjoy reading this column and look forward to its return. Thanks, I needed that. It gave me added incentive to come back. Now that I am back, I hope I can live up to your expectations.

Where Were We?

Nearly a year ago, I embarked on a series of articles about namespaces. (See "C++ Theory and Practice: Partitioning with Namespaces, Part 1," CUJ, April 1998.) I tried to see how far I could get using namespaces, in combination with separate compilation and incomplete types, to decompose a C++ program into simpler discrete components. In the end, I concluded that, although namespaces may be useful, they do not replace classes as the primary tool for implementing data abstractions in C++. (See "C++ Theory and Practice: Classes vs. Namespaces," CUJ, July 1998.)

Throughout that series on namespaces, I experimented with namespaces by applying them in a sample program. This month, I'll start rewriting that program to show you how and why I'd use classes to reorganize the program. I'm aiming this discussion primarily at experienced C++ programmers. I'll assume you are already familiar with the basic concepts of classes, access specifiers, member functions, constructors, destructors, inlining, and maybe more. Although I may throw in occasional bits of remedial C++, my focus will be more on when and how to use classes, rather than on what classes are.

My programming example is xr, a cross-reference generator. I'm not sure how much detail about this example I should expect you to retain from the last time I wrote about it, so here's the summary once again. If you remember all this from before, just skim ahead.

xr reads text from standard input and writes a cross-reference to standard output. The cross-reference output is an alphabetized list of the words (identifiers as in C++) that appeared in the input. Each line in the output contains one word followed by the sequence of unique line numbers on which that word appeared in the input.

For example, if you use the source code for xr (Listing 1) as input to itself, the cross-reference output should contain the following line for the identifier token:
token:   35   37   38   39   40
token actually appears twice on line 38, but the output only mentions 38 once.

Listing 1 is nearly identical to the program I started with way back when. It's a single source file written in "Typesafe C," C that also compiles as C++. The program contains just four functions:

main

get_token: get a token from standard input.

add_tree: add an entry to the cross-reference for a word and its corresponding line number.

put_tree: write the alphabetized cross-reference listing to standard output.

xr uses a binary tree to maintain the words in alphabetical order. Each tree_node in the tree holds a word and its corresponding sequence of line numbers. xr stores each word as a null-terminated character array and each line number sequence as a singly linked list. Each list_node in the list contains one line number.

More precisely, each tree_node contains three pointers: word, first, and last. word points to the first character in the null-terminated spelling of a word. first and last point to the first and last list_node, respectively, in the singly-linked list containing the line numbers for that word. The program doesn't really need to store last as well as first, but having last handy simplifies the code that adds a new line number.

Each call to add_tree(w, n) searches the tree to determine if some tree_node already contains a match for word w. If such a node exists, and its list doesn't already contain line number n, add_tree adds n to that list. If no such node exists, add_tree creates a brand new node containing word w and a list containing just line number n.

In my last column before I took my leave, I added the const qualifier where I thought it appropriate. (See "C++ Theory and Practice: const in Parameter Lists," CUJ, September 1998.) Those const qualifiers appear in Listing 1.

Improving the Design

The main function's job is to track input line numbers, decide when to place a word and its line number into the cross-reference table, and decide when to print the table. main should be able to do all this without knowing that the table is actually a binary tree. Unfortunately, it does know. It "knows" in the sense that it declares the cross-reference table as:
tree_node *xr = NULL;
and it passes xr an argument to functions named add_tree and put_tree.

In the lingo of design methodology, we say that main is too tightly coupled to the tree implementation of the cross-reference table. That is, evidence that the table is a tree clutters main with unnecessary details that make the program just a little harder to read, and thus a little harder to get right and keep right. The tight coupling makes maintenance harder because, if you change the table's implementation, you will almost certainly have to change main as well.

The best way to reduce the coupling between main and the cross-reference table facilities is to encapsulate the table as an object of a class type. That class should hide the tree structure so that main need not, and cannot, mention anything about trees. main should see the cross-reference table as an abstraction that does useful things without revealing any of its internal machinery.

Transforming the Typesafe C program in Listing 1 into a well- written object-oriented C++ program will take me several steps. At each step you should notice improvement, but you might also notice problems that remain to be solved. Please be patient.

A Cross-Reference Class

Let's start by defining a class called cross_reference_table as an abstract type for cross-reference tables. At the very least, the class must provide members so that the application can:

create an empty cross-reference table,

store words and corresponding line numbers in the table, and

write the table to standard output.

The cross_reference_table class definition appears in the header file table.h, shown in Listing 2. It declares three public member functions:

cross_reference_table() (the default constructor), which initializes a table so that it's empty;

void add(char const *w, unsigned n), which adds an entry to the cross-reference for a word w occurring at line number n (if no such entry is already there);

void put() const, which writes the alphabetized cross-reference listing to standard output.

The class hides the root of the tree as a private member, declared as:
tree_node *root;
This declaration requires a prior declaration for tree_node. Since it declares root as a "pointer to tree_node" rather than as a tree_node, it does not require the complete type definition for tree_node. The incomplete type declaration:
struct tree_node;
is sufficient for the header. (See "C++ Theory and Practice: Partitioning with Namespaces, Part 2", CUJ, May 1998 for more on incomplete types.)

Every one of the member functions has a body containing only a single statement. It seems appropriate to declare them as inline functions. Their definitions appear in the header just after the cross_reference_table class definition.

The add member function calls add_tree, and the put member function calls put_tree. Thus, declarations for add_tree and put_tree must also appear in the header. The declarations for add_tree and put_tree refer to tree_node, but only through a pointer. Again, declaring tree_node as an incomplete type is sufficient.

The remainder of the class implementation appears in the source file table.cpp, shown in Listing 3. This includes the complete type definitions for list_node and tree_node, as well as the function definitions for add_tree and put_tree.

Functions main and get_token remain in source file xr.cpp, shown in Listing 4. Looking at the body of main, you can no longer tell that the cross-reference table is a tree. This is good.

In main, the declaration that was:
tree_node *xr = NULL;
in Listing 1 has been replaced by the more abstract declaration:
cross_reference_table table;
in Listing 4. This declares table as an empty cross_reference_table object.

Since a cross_reference_table object has a single data member of type tree_node * and no virtual functions, table occupies no more nor less storage than variable xr did before. The object declaration invokes the default constructor, but that constructor expands inline. Thus, the expansion produces code equivalent to:
table.root = NULL;
which generates the same code as the original pointer initialization in Listing 1.

Also in main, the call:
xr = add_tree(xr, token, ln);
from Listing 1 has been replaced by the member function call:
table.add(token, ln);
in Listing 4. This call expands inline as:
table.root = add_tree(table.root, token, ln);
which yields code identical to that which it replaces. Similarly, the call:
put_tree(xr);
has been replace by the member function call:
table.put();
which also expands inline yielding code identical to what it replaces.

And Now, a Word About const

The keyword const appears in two places among the public members in the definition for class cross_reference_table. It appears in the declaration of the first parameter of the add member function:
void add(char const *w, unsigned n);
because add has no business modifying the characters in a word that it adds to a cross-reference table. It also appears in the declaration of the put member function:
void put() const;
thus declaring put as a const member function. A const member function is one that cannot alter its receiver object (the object that this points to). That strikes me as a reasonable restriction on put. Writing a cross-reference table to cout shouldn't change the table's contents.

Further Refinement

Earlier, I said that the cross_reference class should hide the tree structure so that main need not, and cannot, mention anything about trees. main (in Listing 4) does not mention anything about trees, so the class as defined in the header (in Listing 2) apparently satisfies the "need not" part. However, the header still contains accessible declarations for tree_node, add_tree and put_tree, so it does not satisfy the "cannot" part.

You can make tree_node, add_tree, and put_tree inaccessible from main simply by declaring them as private members of the cross_reference_table class. Listing 5 shows header table.h with these additional members. tree_node is still an incomplete type, but now it's a member of class cross_reference_table rather than a global type. An incompletely defined member type is also known as a forward-declared nested class. (A struct is a class in C++.)

The complete definition for tree_node still appears in the source file table.cpp, shown in Listing 6. Now that tree_node is a member type, the complete definition must refer to tree_node by its fully-qualified name:
struct cross_reference_table::tree_node
    {
    char *word;
    list_node *first, *last;
    tree_node *left, *right;
    };
The definitions for add_tree and put_tree must also use fully- qualified names for the function names and return types. For example add_tree's definition looks like:
cross_reference_table::tree_node *
cross_reference_table::add_tree
    (tree_node *t, char const *w,
     unsigned n)
    {
    ...
    }
The function heading is so long that I had to break it into three lines to get it to fit with this text. The first line is the function return type, the second is the function name, and the third is the parameter list.

Notice that the return type uses the fully-qualified name cross_reference_table::tree_node, but the parameter list uses just the unqualified name tree_node. When a function definition appears outside the definition of its class or namespace, the compiler looks up names appearing in the function's return type in the current scope (surrounding the function definition). It looks up names appearing in the parameter list and function body in the class or namespace scope before looking in the current scope. (I described this lookup rule in slightly greater detail in "C++ Theory and Practice: Partitioning with Namespaces, Part 2," CUJ, May 1998.)

Some C++ compilers, particularly older ones, may have trouble with the definition for cross_reference_table::add_tree. The return type refers to tree_node, which is now a private member of class cross_reference_table, but the return type appears before the function's declarator and therefore outside the class scope. Although tree_node and add_tree are members of the same class, some compilers might deny add_tree access to tree_node in this context. However, the C++ standard is now pretty clear that this function definition is okay. It says:

In particular, access controls apply as usual to member names accessed as part of a function return type, even though it is not possible to determine the access privileges of that use without first parsing the rest of the function declarator.

Declaring add_tree and put_tree as class members introduces a little unnecessary overhead into the program. You might try finding it. I'll show you where it is, and what you can do about it in an upcoming article. And there's a whole bunch more to do after that.

Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.