July 1991/Stepping Up To C++

Columns

Stepping Up To C++

Rewriting Modules As Classes

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers consulting and training in C, C++ and Pascal. He is also a contributing editor of TECH Specialist. He serves as secretary of the ANSI C++ committee and is a member of the ANSI C committee. Readers can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or by email at dsaks@wittenberg.edu.
This is the third in a series of columns on gradually migrating to C++ by transforming an existing C program into a C++ program. (See "Writing Your First Class," CUJ March 1991, and "Your First Class," CUJ May 1991.) I have tried to produce a C++ program that encapsulates design and implementation decisions better than the original C program. In the process, I hope I've been changing the way you think about large programs.
The program I've been rewriting is xr, a cross-reference generator that prints an alphabetized list of words appearing in a document. Each line in the cross-reference output contains a word from the input document followed by the sequence of line numbers in the document on which that word appears.
The implementation of xr uses two kinds of data structures — a binary tree and a linked list. Each node in the tree stores a word along with that word's sequence of line numbers. The line numbers for each word are stored in a singly-linked list rooted in the tree node for that word. The original C program declares both the tree and list structures at the main level, so they are known to all parts of the program.
Earlier in this series, I partitioned the program into three source files and two header files. The main program, xr.cpp, appears in Listing 1. xr accesses the cross-reference table via functions declared in xrt.h (see Listing 2) and defined in xrt.cpp (see Listing 3) . In turn, xrt.cpp manipulates the line number sequences as objects of class ln_seq, declared in ln_seq.h (see Listing 4) . The first two parts of this series show the implementation of class ln_seq, as well as the body of function getword.
The cross-reference table module, xrt.cpp, is still organized much like a C program. It uses C++ features to access ln_seq objects, but it still provides a C-like interface to the main program. As I promised last time, I will now complete the program's metamorphosis, by rewriting xrt.h and xrt.cpp as a class.

Changing The Interface
To put a new C++ face on the cross-reference table, change the existing interface functions in xrt.h into public member functions of a new class xrt. xrt_add and xrt_print become xrt::add and xrt::print, respectively. xrt_destroy becomes the destructor xrt::xrt. Move the static data object root from xrt.cpp into the class definition in xrt.h as a private data member, and add a constructor xrt::xrt to initialize root to null. The resulting class definition appears in Listing 5.
Because root is of type treenode *, class treenode is now also declared in xrt.h. In turn, treenode requires the definition of class ln_seq, so xrt.h includes ln_seq.h. I'd rather not put the declarations for treenode and ln_seq in the header, because then they become known to the main level, and that's exactly what I set out to avoid when I started rewriting the program.
The simplest way to get around this problem is to declare the class name treenode in xrt.h without actually declaring the class, as shown in Listing 6. Declared this way, treenode behaves like an incomplete type as in C. That is, you can declare pointers to treenodes (like xrt::root), but you can't do anything with treenode objects themselves until you complete the class declaration.
Thus you can leave the declaration for treenode, as well as include ln_seq.h, in xrt.cpp. When you include xrt.h in xrt.cpp, these declarations provide a complete declaration for treenode, making it usable. When you include xrt.h in the main program, treenode remains incomplete and essentially useless. Using incomplete types may be the simplest way to hide treenodes and ln_seqs from main, but it does have a serious drawback, described later in this article.
Strictly speaking, C++ doesn't have incomplete types. My reading of The Annotated C++ Reference Manual [1] is that declarations such as
class X;
behave like incomplete types in C, but the C++ description doesn't use the words "incomplete type." I tested my code with three different C++ compilers, and all of them compiled and executed my code as I expected. Members of the C++ standards committee are trying to incorporate the C terminology for incomplete types into the C++ standard.
Listing 7 shows a new version of main that uses the class xrt. I've added a declaration for an xrt object called x and replaced the calls to xrt_add and xrt_print with calls to x.add and x.print, respectively. I also removed the call to xrt_destroy, because the compiler automatically invokes x's destructor just before returning from main.

Changing The Implementation
Changing xrt.cpp to match xrt.h requires only a few simple changes (see Listing 8) . You replace the declaration of root:
static treenode *root = 0;
with the definition of a constructor that initializes root:
xrt::xrt() : root(0) { }
The initializer : root(0) initializes root with 0 and is equivalent to
xrt::xrt()
   {
   root = 0;
   }
You also change the names of xrt_add, xrt_destroy, and xrt_print to their new member function names. Be sure to remove the void return type specifier when you change xrt_destroy to ~xrt, because destructors (like constructors) can't have a return type.

Trimming The Excess
In the first column in this series, I created xrt_add and xrt_print (which are now xrt::add and xrt::print) to hide the existence of root from main. The sole purpose of xrt_add (xrt_print) is to call addtree (printtree) with root as an additional argument. These functions cost an extra function call and return every time the program adds an entry to the table or prints the table, but I was willing to pay that price for the improved encapsulation. Now that root is a private data member of xrt, and not a static variable in xrt.cpp, you no longer need this extra function-call layer.
I will consider simplifying addtree first, and then apply similar changes to printree. Try reproducing these changes on your own system. I found that as I applied each small change, C++'s type and access restrictions guided me to the next step. The compiler's messages were very helpful.
You eliminate the extra function layer from xrt::add by merging the body of addtree into xrt::add and discarding addtree.addtree is recursive, so you must translate the recursive calls to addtree into recursive calls to xrt::add. That is, you translate the call

t->left = addtree(t->left, w, n);
in the body of addtree into

t->left.add(w, n);
in the body of xrt::add.
This transformation forces you to alter the declaration of treenode. If t->left.add(w, n) is supposed to be a recursive call on xrt::add, then t->left must be of type xrt. Thus, you must change the declarations of treenode::left and treenode::right from

treenode *left, *right;
to

xrt left, right;
Notice that, even after this change, treenode is still a self-referential data structure. Before, left and right were pointers to other treenodes; now they are objects of class xrt whose root pointers point to other treenodes. Because class xrt has only one data member, namely treenode *root, xrt objects occupy exactly the same amount of storage as treenode pointers. To some of you, it looks like I just added an extra level of pointers to each treenode, but I assure you I haven't. An object of class xrt doesn't point to a root pointer — it's an object whose only data member is a root pointer. Thus, changing the types of left and right does not affect the storage efficiency of the cross-reference table. It merely adjusts the data types to facilitate writing xrt::add as a recursive function.
What becomes of argument treenode *t in the body of addtree when you move the code into the body of xrt::add? In addtree, t is the root of the subtree for the current recursive call. In xrt::add, you change all the occurrences of t to root (the root member of the current xrt object).
You should omit the return statement at the end of addtree from xrt::add. addtree returns t (the root of the subtree), and is always invoked by statements of the form

t = addtree(t, w, n);
However, xrt::add has a void return type. It needs no return value because the calls are of the form

x.add(w, n);
That is, a call to x.add modifies x.root directly.
As you rewrite xrt::add, you must apply similar changes to xrt::print. You move the body of printtree to xrt::print, replacing all occurrences of t with root, and changing calls of the form

printtree (root->left);
to

t->left.print();
The new implementation of xrt.cpp appears in Listing 9.

Tuning The Constructors And Destructors
Look at the treenode destructor in Listing 9, where I placed the lines
delete left;
delete right;
inside comments. When I changed treenode::left and treenode::right to objects of class xrt, these delete statements produced compilation errors. The operand of delete must be a pointer returned by new; however, left and right are no longer pointers.
The original intent of the delete statements was to discard the left and right subtrees of a treenode. Now that left and right are xrt objects, you no longer need these deletes, because the xrt class has a destructor that deletes its root pointer. The treenode destructor automatically applies the xrt destructor to left and right, thus discarding the subtrees.
Changing left and right from pointers to objects has a similar effect on the treenode constructor. The first conditional block of statements in xrt::add (see Listing 9) handles adding a cross-reference entry to an empty tree. That is, the statements
if (root == 0)
   {
   root = new treenode(n);
   root->word =
      new char[strlen(w) + 1];
   strcpy(root->word, w);
// root->left = root->right = 0;
   }
create a new leaf of the tree containing word w at line number n. After I changed treenode::left and treenode::right to xrt objects, this assignment also produced a compilation error, so I placed it in a comment.
Originally, this assignment (when it was in addtree) made the left and right subtrees empty. Now that left and right are xrt objects, the effect of these assignments is performed by the treenode constructor, invoked by
root = new treenode(n);
The treenode constructor automatically invokes the default xrt constructor to initialize left and right. The default xrt constructor sets the root pointer to zero.
Shifting more work to the treenode constructor simplifies xrt::add even further. You add char *w as an argument to the treenode constructor, and move the statements
root->word = new char[strlen(w)
   + 1];
strcpy(root->word, w);
to the constructor's body. Since strcpy returns its first argument, these two lines can be replaced by
root->word = strcpy (new       char[strlen(w)+ 1], w);
These changes to the treenode constructor and destructor, and their impact on xrt::add, appear in Listing 10.

Inlines: The Finishing Touch
xrt.cpp contains several extremely small functions, such as the constructors and destructors for treenode and xrt. For such small functions, the overhead of function entry and exit can be much greater than the code in the function body. In such cases, it's usually preferable to have the function body expanded inline at the point of the call. In C, you implement inline function expansion using function-like macros. In C++, you use inline functions.
There are two ways to ask the compiler to expand a function inline. You can simply place the keyword inline before the function definition, as in

inline int abs(int i) { return i >= 0 ? i : -i; }
Alternatively, for member functions, you can place the function definition inside the class declaration, as in

class ln_seq { public: ln_seq() { first = last = 0; } ... };
Inline functions are safer than macros because macros can have dangerous side effects. For example, given the macro

#define abs(i) ((i) >= 0 ? (i) : -(i))
the call

y = abs(*p++);
expands to

y= ((*p++) >= 0 ? (*p++) : -(*p++));
which increments p twice. In contrast, inline functions behave the same as non-inline functions. That is, they evaluate their arguments exactly once upon function entry.
Declaring a function inline (using either form) is only a hint to the compiler. The compiler can ignore your request. If you absolutely need inline expansion for a function, you should write it as a macro.
To request inline expansion for the treenode constructor and destructor, place the keyword inline before the function definitions in xrt.cpp. If you want inline expansion for the xrt constructor, move it from xrt.cpp to xrt.h. The xrt constructor is called from main (in xr.cpp) as well as from the treenode constructor (in xrt.cpp). If you leave the function definition as an inline in xrt.cpp, the compiler will generate a non-inline function body to satisfy the constructor invocation in main. You can merge the constructor xrt::xrt into xrt.h in either of two ways, shown in Listing 11 and Listing 12.
Do not make the xrt destructor into an inline function. Leave it as a non-inline function in xrt.cpp. Moving the destructor's definition into xrt.h causes a very subtle bug in the program. The problem is that the xrt destructor calls
delete root;
Because root is of type treenode *, the delete also applies the treenode destructor to *root. But in xrt.h, treenode is an incomplete type. When you move xrt's destructor into the header, the compiler can't always determine that xrt's destructor should invoke the treenode destructor. When main invokes the xrt destructor, it sees xrt::root as a pointer to an incomplete type. The destructor returns the root to the free store, without destroying any of the subtrees.
Using incomplete types can be hazardous to the health of your C++ program. None of the C++ compilers I used produced any diagnostics to warn me of this problem.

Post Mortem
I started with a C program implemented as a single 114-line source file. I ended with a C++ program implemented as three source files and two header files totaling nearly 200 lines. The object code for the C++ program increased by about 4 percent above the C program. However, the C++ has additional code to discard the cross-reference table (the destructors), which isn't present in the C code. If you discount the destructors, the C++ source code shrinks to 168 lines, and the increase in the object code size falls to less than 2 percent. There is a cost in object code efficiency, but it's very small.
Almost all of the extra source code comes from the class definitions in the header files. This points to a key difference between C and C++ programs. In C++, specifying the interface between components is part of the compiled language, and not just a matter of style and commenting conventions. By enforcing the interfaces between components, C++ helps make large programs more maintainable.

A Minor Correction
In my last column the text misstated that the declaration of struct treenode (in Listing 7 of that column) was changed to class treenode. Listing 7 still shows treenode as a struct. I meant to write the declaration as
class treenode
   {
public:
   char *word;
   ln_seq lines;
   treenode *left, *right;
   treenode (unsigned);
   };
This change is purely cosmetic; it doesn't affect the behavior of the code.

References
[1] Ellis, Margaret A. and Bjarne Stroustrup, The Annotated C++ Reference Manual. Addison-Wesley, Reading, MA, 1990.