You don't have to make each class highly reusable from the outset. An incremental approach is often faster and quite good enough.
Copyright © 1999 by Dan Saks
For the most part, program design is about figuring out how to carve up a relatively complicated task into a bunch of simpler tasks. In C++, as in any other language that supports object-oriented programming, the favored approach usually involves carving up programs into classes.
The general principle that guides us in decomposing programs into classes is that classes should represent abstractions. In other words, each class should hide some of the program's details from the rest of the program. By hiding details, each class lets you act in many ways as if the program is simpler than it really is. That is, abstraction fosters an illusion of simplicity, which is usually good because it makes the program more manageable.
Although each class should hide something, it shouldn't try to hide too much all by itself. A class that takes on too much is still an improvement over no class at all because it simplifies the program around it. Unfortunately, the class itself remains a complex entity that will be hard to manage.
Of course, the way to simplify a large, complex class is to farm out most of its work to smaller, simpler classes. Then again, it's possible to go too far splitting larger classes into smaller classes. If you do go too far, you get complexity in another dimension. A program composed of too many classes can easily become a tangle of complex interactions.
Much of the art of class design is in deciding just how much or how little each class should do. Last month, I suggested a simple, first-order principle for decomposing programs into classes:
Hide each design decision in a separate class.
I started to show you how to apply this principle in xr, the cross-reference generator program that's my surrogate practical example. Let's pick up again just a bit before...
Where I Left Off
xr reads text from standard input and writes a cross-reference listing to standard output. The output is an alphabetized list of the words that appeared in the input. Each line in the output contains one word followed by the sequence of unique line numbers on which that word appeared in the input.
xr uses a binary tree to accumulate the words in alphabetical order. Each node in the tree holds a word and its corresponding sequence of line numbers. More precisely, each node in the tree contains a pointer to the null-terminated character array representing a word, and pointers to the head and tail of a linked list representing the sequence of line numbers for that word.
The program uses a class called cross_reference_table to encapsulate the tree. (See "C++ Theory and Practice: Partitioning with Classes," CUJ, February 1999.) table.h in Listing 1 contains the cross_reference_table class and inline member function definitions. table.cpp in Listing 2 contains the non-inline definitions for the cross_reference_table class.
The nodes in the tree are all of type tree_node, a structure declared in Listing 1 as a private member of class cross_reference_table. The complete definition for tree_node appears in Listing 2 as:
struct cross_reference_table::tree_node { deep_pointer<char> word; deep_pointer<list_node> first, last; deep_pointer<tree_node> left, right; };(deep_pointer is a class template. For any type T, a deep_pointer<T> behaves just like a T * except that a deep_pointer<T> const behaves like a T const *const. See "C++ Theory and Practice: Thinking Even Deeper," CUJ, July 1999.)
On the surface, the cross_reference_table class has a single, well-defined job, namely, to hide the structure of the table from the rest of the program. However, each pointer type in the tree_node structure actually corresponds to a distinct design decision:
1. Conceptually, member word represents a character string. But its declaration reveals that it's implemented as a pointer to the zeroth character of a null-terminated character sequence stored in a dynamically allocated array of char.
2. Together, the members first and last represent a sequence of line numbers. However, their declaration reveals that the sequence is implemented as a pair of pointers, one to the first node and one to the last node in a singly linked, null terminated list of list_nodes. (Elsewhere in Listing 2, you'll find that each list_node contains one number of a sequence.)
3. The pointers left and right each represent a subtree of the binary tree that alphabetizes the words.
The rule of thumb hide each design decision in a separate class suggests that xr needs three classes (one for each design decision) instead of the one it already has. These classes need not be completely separate from each other. It could be that one class uses the others to offload work. That, in fact, is the way I suggest doing it.
Rather than declare tree_node's member word as:
deep_pointer<char> word;xr should use a type for the abstract concept of a variable-length character string. It could use the standard library's string class or use one written specifically for this application if need be. In either event, the declaration of member word should be something like:
string word;Similarly, the two pointer members:
deep_pointer<list_node> first, last;should be wrapped up into a single object of an abstract type representing a line number sequence. Using a class called line_number_sequence, you can replace both pointers with one member:
line_number_sequence lines;Applying both of these abstractions to tree_node, you get:
struct cross_reference_table::tree_node { string word; line_number_sequence lines; deep_pointer<tree_node> left, right; };This definition reflects the program's design much better than the previous tree_node definition. Even without any comments, it says "each node in the cross-reference table's tree associates a string with a corresponding line number sequence."
A Line Number Sequence Class
Rather than introduce both a string and a line_number_sequence class at once, I think it's more instructive to approach one new class at a time. Let's start with the line_number_sequence class. You'll need to create sequence.h and sequence.cpp, which you can do mostly by stealing code from table.h and table.cpp.
Using just the (soon to be defined) line_number_sequence class, the definition for tree_node in table.cpp becomes:
struct cross_reference_table::tree_node { deep_pointer<char> word; line_number_sequence lines; deep_pointer<tree_node> left, right; };This definition will compile only if a definition for line_number_sequence appears earlier in the compilation, so you should add the directive:
#include "sequence.h"somewhere in table.h prior to tree_node's definition.
To start writing sequence.h, cut the members first and last from tree_node in table.cpp and paste them in as private members of the line_number_sequence class in sequence.h:
class line_number_sequence { public: // ??? private: deep_pointer<list_node> first, last; };The declarations for first and last will compile only if deep_pointer is already defined. Adding:
#include "deep.h"before the line_number_sequence class definition takes care of that.
The declarations for first and last also require a prior declaration for list_node. The definition for list_node already appears in table.cpp (Listing 2), but it doesn't belong there anymore. list_node is an implementation detail of line number sequences, so you should package it with the line_number_sequence class.
You could move list_node's definition to sequence.h, but that's overkill. line_number_sequence's members first and last have type "(deep) pointer to list_node", which requires only a declaration for list_node, not a complete definition. The incomplete type declaration:
struct list_node;is sufficient for the header. (See "C++ Theory and Practice: Partitioning with Namespaces, Part 2," CUJ, May 1998 for more on incomplete types.)
Since list_node is an implementation detail, it should be a private member of the line_number_sequence class. (See "C++ Theory and Practice: Partitioning with Classes," CUJ, February 1999 for a discussion of incomplete types as class members.) Listing 3 shows the sequence.h header with everything except the line_number_sequence member function declarations in place. Listing 4 shows table.cpp with the changes thus far.
The complete definition for list_node should appear in sequence.cpp as:
struct line_number_sequence::list_node { unsigned number; deep_pointer<list_node> next; };<[>The relationship between list_node and line_number_sequence is now the same as the relationship between tree_node and cross_reference_table.Constructing Line Number Sequences
Now let's figure out what member functions the line_number_sequence class needs. Some object-oriented design purists might suggest that you should rock back on your chair at this point, look up at the ceiling and think up all of the fundamental operations that would be appropriate for a general-purpose sequence class. I don't want to do that here. My objective in this exercise is just to partition what I already have without embellishing its behavior. Let's get a clear picture of what we have before we entertain generalizing it.
Rather than read through table.cpp looking for ideas on what the line_number_sequence member functions should be, you can get help from your compiler. Just compile table.cpp (as in Listing 4) and look at the diagnostics.
The first diagnostic occurs in the part of cross_reference_table::add_tree that creates a new tree_node for word w found on input line number n:
if (t == NULL) { t = new tree_node; t->word = new char [strlen(w) + 1]; strcpy(t->word, w); t->first = new list_node; ***^error*** t->first->number = n; t->first->next = NULL; t->last = t->first; t->left = t->right = NULL; }Every compiler I used complained that first is not a member of tree_node, which is quite true. The next three statements in the program contain similar errors. first and last have been replaced by a single member called lines.
The four statements with errors are doing something with a line number sequence. Once we figure what that something is, we can wrap the statements inside a public line_number_sequence member function and use that function instead.
Again, the body of this entire if statement creates a new tree_node for word w found on input line number n. The statement:
t = new tree_node;allocates storage for that tree_node. The new-expression also calls a constructor. In this case, tree_node (Listing 4) has no explicitly-declared constructors, so the compiler generates a default constructor. That generated constructor in turn invokes the default constructors for each member.
Most of tree_node's data members are deep_pointers. deep_pointer has a default constructor that does nothing (it leaves the pointer uninitialized). tree_node's only other data member is a line_number_sequence, which also has no explicitly-declared constructors, and so once again, the compiler generates one. But all of line_number_sequence's data members are deep_pointers, so line_number_sequence's generated default constructor does nothing as well. The net result of all this is that tree_node's generated default constructor does nothing.
Thus, tree_node's generated default constructor doesn't really initialize anything. After construction, all of tree_node's data members remain uninitialized. It's up to the statements in add_tree that follow:
t = new tree_node;to actually place meaningful values into the tree_node's data members. The next two statements:
t->word = new char [strlen(w) + 1]; strcpy(t->word, w);fill in the tree_node's word member to point to a copy of w. The next four statements (the ones causing the compilation errors):
t->first = new list_node; t->first->number = n; t->first->next = NULL; t->last = t->first;fill in the tree_node's line number sequence to contain the single line number n. Finally,
t->left = t->right = NULL;fills in the left and right subtree pointers.
Apparently, the four statements that caused the compilation errors are doing work that should have been done by a line_number_sequence constructor. tree_node's lines data member contains random noise until these statements put something meaningful there. Since these statements require access to private members of the line_number_sequence class, the thing to do is to move them into the body of a line_number_sequence constructor, defined as:
line_number_sequence::line_number_sequence(unsigned n) { first = new list_node; first->number = n; first->next = NULL; last = first; }This is not a default constructor it requires an argument.
The code in the body of add_tree must be able to call that constructor. However, there's no way to call a constructor as if it were just another member function. That is, you cannot write a call such as:
t->lines.line_number_sequence(n);Okay, you can, but it won't compile.
Once again, compiler diagnostics can give you a hint as to what to do. If you add a declaration for the constructor to the line_number_sequence class, as in:
class line_number_sequence { public: line_number_sequence(unsigned n); private: deep_pointer<list_node> first, last; };and compile table.cpp again, you should get a diagnostic at the new-expression:
if (t == NULL) { t = new tree_node; ***^error*** ...The compiler complains that either tree_node doesn't have a default constructor or that the compiler can't generate one.
A generated default constructor invokes the default constructors for all of its bases and members. In this case, one of tree_node's members, lines, is a line_number_sequence. Now that line_number_sequence has an explicitly-declared non-default constructor, it does not have a compiler-generated default constructor. The compiler can't generate a default constructor for tree_node because tree_node has a data member without a default constructor.
Okay, so let's write a default constructor for tree_node. Simply extend the definition for tree_node in table.cpp as follows:
struct cross_reference_table::tree_node { tree_node(); deep_pointer<char> word; line_number_sequence lines; deep_pointer<tree_node> left, right; }; cross_reference_table:: tree_node::tree_node() { }Now, if you compile this code once again, the new-expression compiles just fine, but the compiler complains about the definition of the tree_node constructor. The constructor has no member-initializers, so the compiler generates a call to the default constructor for each of tree_node's data members. Once again, tree_node's lines member has no default constructor, so this constructor is in error.
You can placate the compiler by adding a member initializer to the tree_node constructor to initialize the lines member:
cross_reference_table::tree_node::tree_node(unsigned n) : lines(n) { }You must change the declaration for this constructor (inside the class definition) accordingly.
Now, when you construct a tree_node, you must pass the initial line number for the line number sequence within the tree_node. Therefore, you must add a constructor argument list to any new-expression that creates a tree_node:
t = new tree_node (n);The code in add_tree that creates a new tree_node containing word w and line number n now looks like:
if (t == NULL) { t = new tree_node (n); t->word = new char [strlen(w) + 1]; strcpy(t->word, w); t->left = t->right = NULL; }which looks a bit simpler than the original code.
Other Line Number Sequence Operations
Now let's see what other member functions the line_number_sequence class needs. If you incorporate the line_number_sequence as just described and recompile table.cpp, you should see a flurry of compiler diagnostics associated with this code appearing near the end of add_tree:
else if (n != t->last->number) { t->last->next = new list_node; t->last = t->last->next; t->last->number = n; t->last->next = NULL; }Again, the problem is that tree_node no longer has members named first and last.
This code executes when add_tree determines that the word it was asked to add to the table is already in the table. In that case, add_tree simply adds a new line number to the existing line number sequence for that word. However, that number might already be in the sequence because the word has already appeared on the current line. The condition of the if statement verifies that the number is not already in the sequence. There's no need to search the entire sequence for a duplicate number. The line numbers arrive in non-decreasing order and they're placed at the end of the sequence, so if there is a duplicate, it can be only at the end.
Here again, the line_number_sequence class must provide one or more public member functions that will support what add_tree needs to do. I can think of two alternate designs that reflect different views of what a sequence is.
One approach is to provide two member functions in the line_number_sequence class. One member is:
bool contains(unsigned n) const;which returns true if the sequence contains n. The other member is:
void add(unsigned n);which adds n to the sequence. Using this approach the code in add_tree becomes:
else if (!t->lines.contains(n)) t->lines.add(n);This design suggests that you might someday want to allow a sequence to contain duplicates.
The other approach is to provide just an add member function that does not allow duplicates. Using this approach the code in add_tree becomes simply:
else t->lines.add(n);This design suggests that you believe a sequence is much like a set in that it cannot contain duplicates. Adding a number to a sequence that already contains the number does nothing.
I'm not sure the added flexibility of the first design is worth making the line_number_sequence interface more complicated, so I'm going to opt for the second design for now. However, I will revisit this decision in the future.
The only other code in table.cpp that still refers to first or last as if they were members of tree_node is in put_tree (table.cpp in Listing 2):
list_node const *p; for (p = t->first; p! = NULL; p = p->next) printf(" %4d", p->number);This code is simply writing each number in a sequence to standard output. You can move all of this to the body of the final line_number_sequence member function:
void put() const;and place the call:
t->lines.put();in the body of put_tree.
A complete version of sequence.h appears in Listing 5. The corresponding version of sequence.cpp appears in Listing 6. A complete version of table.cpp that uses the line_number_sequence class appears in Listing 7.
Now What?
I think the version of table.cpp in Listing 7 is a modest improvement over the one in Listing 2. By turning line number sequences into an abstraction, the code for both add_tree and put_tree is a bit shorter and a bit clearer.
I have not yet finished applying the principle that each class should hide a separate design decision. The line_number_sequence class hides the list implementation of a sequence, but the cross_reference_table class still encapsulates the structure of strings (as arrays) as well as the alphabetizing of those strings (in a binary tree). More refinements are yet to come. Stayed tuned.
Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.