March 1991/Stepping Up To C++

Stepping Up To C++

Writing Your First Class

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers consulting and training in C, C++ and Pascal. He is also a contributing editor of TECH Specialist. He serves as secretary of the ANSI C++ committee and is a member of the ANSI C committee. Readers can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or by email at dsaks@wittenberg.edu.
Over the past two years I've spoken with many C programmers who are thinking about switching to C++. Nearly all have heard about the highly touted benefits of object-oriented programming (OOP) and C++. Although a few are skeptical about the benefits of C++, most programmers are intrigued and want to know more about it.
As I mentioned in my last column, I think some apprehension is justified. C++ is not as widely available as C. Where C++ is available, the development tools are sometimes lacking. And there's no formal standard for C++. Many C programmers are just starting to appreciate programming in a mature, standardized language. Understandably, they are reluctant to switch to C++.
Most of all, I think C programmers are confused, intimidated, or just put off by the overly zealous preaching of a few highly vocal "true believers" in OOP. These zealots are hard to avoid — every software development organization seems to have at least one. The zealots want you to believe that you shouldn't use C++ for anything other than OOP (using both inheritance and polymorphism).
I've heard C programmers express their frustration with learning C++ under the shadow of a zealot. It's rather disconcerting to have your self-improvement efforts belittled by someone who thinks you haven't improved enough. You expect to hear, "Keep up the good work!" Instead you hear, "OK, but you really could have done it better." (Some of my former university students will probably be amused to read this, wondering, "When did this guy mellow out?" Age works wonders on us all.)
The fact is, to a C programmer, C++ has many new features. Don't expect to grasp them all at once. Many of these features are intended to solve problems that arise in large-scale software systems. Small programming examples, such as printing Hello, world, rarely make a convincing case for the advantages of C++ over C. To really learn C++, you need to work through large examples that take time to develop.
Complex programming problems usually have more than one solution. You should try different approaches to see which one works best for each situation. As Bjarne Stroustrup wrote in his first book on C++, "To write good programs takes intelligence, taste, and patience. You are not going to get it right the first time; experiment!" [1]
Experimenting is fine if you're just writing practice programs, but most programmers must work for a living. They don't have much spare time to experiment with different design and programming techniques. Programmers and project managers must continue to try new tools and techniques to improve quality and productivity, but too much innovation all at once is risky.
If you use object-oriented techniques in your first large C++ program, you might develop some reusable components that dramatically reduce the program's code size. More likely, you'll make lots of mistakes. If you believe, as Fred Brooks suggests [2], that you should "plan to throw one away," then maybe you can plunge into C++ and OOP all at once. If you're not prepared to throw the program away, you're courting disaster.

Learning C++ In Stages
As an alternative, I think it's not only possible but preferable to learn and apply C++ in stages. You can't learn to use virtual functions (polymorphism) unless you understand inheritance, and inheritance only makes sense if you understand classes (encapsulation). The best way to learn encapsulation is to apply it in real programs, and you can write lots of useful C++ programs using encapsulation without inheritance. Languages that provide encapsulation without inheritance are called object-based [3], whereas languages that support inheritance are called object-oriented.
Most people learn to program by reliving the evolution of programming languages. New languages appear as programmers run up against the limitations of existing languages. Object-based languages, like Ada and Modula-2, represent a major evolutionary step between data-structured languages, such as C and Pascal, and object-oriented languages such as C++. Just as the languages took years to evolve, programmers need time (months, if not years) to progress through these evolutionary stages. Everyone progresses at a different rate.
If you've ever tried explaining the virtues of pointers and user-defined data types to a FORTRAN programmer, then you probably know what I'm talking about. Most FORTRAN programmers who crunch numbers for a living can't understand why arrays aren't adequate for structuring any collection of data you could ever want. I've had similar difficulty explaining information hiding (encapsulation) to students who have never maintained someone else's code. If you haven't experienced the problem, you can't appreciate the solution.
Many C programmers want to start using C++, but they're reluctant to take the OOP plunge. OOP is not the only reservation programmers have about C++, but it is significant. C++ is, for all practical purposes, a superset of C. C++ was designed to be integrated with existing C code and practice. The language permits you — but doesn't force you — to overhaul your design and programming styles. You can gain a lot by using C++, even if you only apply it a little bit at a time.
If you want to start small with C++, then use it as an object-based language. Find some part of your application that could be a separate, well-defined entity, and implement that entity as a class. Try writing the class so that its public interface hides the implementation details from the rest of the application. The following example does this by analyzing the weaknesses of an existing C program.

A Cross-Reference Generator
A cross-reference generator program reads a document and prints an alphabetized list of the words appearing in that document. Each word in the output listing is followed by a sequence of line numbers on which that word appears in the document. For example, if the word object appears once on lines 3, 19, and 100, and twice on line 81, then the cross-reference listing entry for object is

object 3 19 81 100
This program is suggested as exercise 6-3 in K & R [4].
Listing 1 shows a portion of xr.c, an implementation of the cross-reference generator. xr is based on the solution presented by Tondo and Gimpel [5]. I modified their solution to compile in C++ as well as C, and made a few small stylistic changes.
xr works as follows: Each call to getword reads the next word, punctuation character, or newline character from the input document. If getword returns a word (a sequence of letters and digits starting with a letter), the program adds an entry to the cross-reference table containing that word and the current line number. If getword returns a newline, the program increments the current line number. After reading the entire input, the program prints the cross-reference listing.
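The control flow just described can be sketched as follows. This is an illustration in present-day C++, not Listing 1: the token kinds, the simplified get_token (standing in for getword), the std::map used as the table, and the helper cross_reference_text are all assumptions made to keep the sketch short and self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token kinds: a word, a newline, any other character, end of input.
enum token_kind { WORD, NEWLINE, OTHER, END };

// Simplified stand-in for getword. A word is a sequence of letters and
// digits starting with a letter; it is returned through `word`.
token_kind get_token(std::istream &in, std::string &word)
{
    int c = in.get();
    if (c == std::istream::traits_type::eof())
        return END;
    if (c == '\n')
        return NEWLINE;
    if (!std::isalpha(c))
        return OTHER;
    word.assign(1, static_cast<char>(c));
    while (std::isalnum(in.peek()))
        word += static_cast<char>(in.get());
    return WORD;
}

// Assumed table type: each word maps to the lines on which it appears.
typedef std::map<std::string, std::vector<unsigned> > xref_table;

xref_table cross_reference(std::istream &in)
{
    xref_table table;
    unsigned lineno = 1;
    std::string word;
    token_kind t;
    while ((t = get_token(in, word)) != END) {
        if (t == WORD) {
            assert(!word.empty());
            std::vector<unsigned> &lines = table[word];
            if (lines.empty() || lines.back() != lineno)
                lines.push_back(lineno);    // record each line once per word
        } else if (t == NEWLINE) {
            ++lineno;
        }
    }
    return table;
}

// Convenience wrapper (hypothetical) for trying the loop on a string.
xref_table cross_reference_text(const std::string &text)
{
    std::istringstream in(text);
    return cross_reference(in);
}
```

Calling cross_reference_text on a two-line text in which object appears once on line 1 and twice on line 2 yields the line numbers 1 and 2 for object, each recorded once.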

Hiding The Details
Listing 1 illustrates a problem that plagues most large programs. The program contains declarations at the main level that reveal design and implementation decisions made at lower levels. These declarations reduce the program's readability and maintainability by cluttering the main level with inappropriate detail. This clutter isn't much trouble in small programs but can be overwhelming in programs with tens or hundreds of declarations.
For example, it's evident from Listing 1 that xr implements the cross-reference table as a binary tree. Why? For starters, the function that adds an entry to the cross-reference is called addtree, and the output function is called printtree. Both functions accept root as an argument, and root is of type treenode *.
If you refer to my description of how xr works, you'll see that it never mentions trees. At this level in the program design, you don't need to know how the table is implemented, but you should know what the table does.
A good implementation of the cross-reference program keeps these concerns separate. Each design decision, like the structure of the cross-reference table, should be hidden in a separate part of the program. This practice of isolating design decisions is known as information hiding or data hiding. In the OOP world, it's called encapsulation.
C provides only limited support for encapsulation. You hide information by placing code in separately compiled modules. Understanding the technique and its limitations will help you appreciate C++ classes, so I'll demonstrate the technique by applying it in the implementation of xr.

Encapsulation With C
Listings 2 through 4 present a better implementation of xr using some encapsulation by separate compilation. xr.c (Listing 2) is the main source file. The implementation of the cross-reference table is hidden in xrt.c (Listing 3). The header xrt.h (Listing 4) defines the interface to xrt.c. That is, the header declares the functions through which the main program accesses the hidden table.
Notice that all evidence that the cross-reference table is a binary tree is gone from xr.c. The variable root, the functions addtree and printtree, and the struct definitions have all been moved from xr.c to xrt.c. root, addtree, and printtree are declared static in xrt.c, so they can't be referenced directly by code in xr.c.
The functions xrt_add and xrt_print, declared in xrt.h and defined in xrt.c, provide the only access from xr.c to the cross-reference table data structure. Instead of calling addtree directly, main must call xrt_add. xrt_add passes root to a call to addtree but keeps both root and addtree hidden inside xrt.c. Similarly, main must call xrt_print to invoke printtree.
The key to this encapsulation technique is the selective use of the static storage class specifier. You place the data to be hidden, along with the functions that manipulate that data, in a single, separate source file. You declare all of the file-scope data, and almost all of the functions, static. The only functions that should be extern are those few that grant access to the data structure from the outside world. The access function names and prototypes should clearly indicate what they do, but give little or no clue as to how they work.
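A compressed sketch of the technique follows. It is not Listing 3 or Listing 4: the header and the source file are shown in one block, separated by comments; the node layout is simplified (it records only the first line on which a word appears, not a sequence); and copy_word and xrt_count are hypothetical additions, the latter included only so the sketch can be checked. It is written to compile as C++.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

/* ---- xrt.h: the interface; nothing here mentions a tree ---- */
void xrt_add(const char *word, unsigned lineno);
void xrt_print(void);
unsigned xrt_count(void);       /* hypothetical: number of distinct words */

/* ---- xrt.c: the implementation; everything static stays hidden ---- */
struct treenode {
    char *word;
    unsigned first_line;        /* simplified: only the first line seen */
    treenode *left, *right;
};

static treenode *root = 0;      /* nodes live until the program exits */
static unsigned nodes = 0;

static char *copy_word(const char *s)
{
    char *p = (char *)std::malloc(std::strlen(s) + 1);
    if (p == 0)
        std::abort();
    return std::strcpy(p, s);
}

static treenode *addtree(treenode *t, const char *word, unsigned lineno)
{
    if (t == 0) {
        t = (treenode *)std::malloc(sizeof(treenode));
        if (t == 0)
            std::abort();
        t->word = copy_word(word);
        t->first_line = lineno;
        t->left = t->right = 0;
        ++nodes;
    } else {
        int cmp = std::strcmp(word, t->word);
        if (cmp < 0)
            t->left = addtree(t->left, word, lineno);
        else if (cmp > 0)
            t->right = addtree(t->right, word, lineno);
    }
    return t;
}

static void printtree(const treenode *t)
{
    if (t != 0) {
        printtree(t->left);
        std::printf("%s %u\n", t->word, t->first_line);
        printtree(t->right);
    }
}

/* The only external functions: they say what happens, not how. */
void xrt_add(const char *word, unsigned lineno)
{
    assert(word != 0);
    root = addtree(root, word, lineno);
}

void xrt_print(void) { printtree(root); }

unsigned xrt_count(void) { return nodes; }
```

When the two parts really are separate files, code in xr.c can name only xrt_add, xrt_print, and xrt_count; any attempt to refer to root or addtree fails at link time.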
This implementation of xr is more readable than the first one. The input processing is clearly distinct from the table processing. main more clearly describes what it does without unnecessary and intrusive details about the inner workings of the table. This program is also more maintainable. The structure of the cross-reference table is so well hidden that you can change it to a B-tree or hash table without even recompiling the main source file.

Where C Breaks Down
Ideally, each encapsulation unit hides a single design decision. However, the table implementation in xrt.c (Listing 3) actually embodies two decisions:

the table is a binary tree

each sequence of line numbers is a singly-linked list referenced by a single pointer in each tree node
Just as the implementation of the table should be hidden from main, the implementation of each line number sequence should be hidden from the table module.
Unfortunately, the separate compilation technique that seems to work so well at hiding the table is completely inadequate for hiding the implementation of the line number sequences. The problem is that there's exactly one sequence for each tree node. The representation of a sequence must be declared as part of struct treenode. You can't store the sequences in static data in another module.
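The difficulty can be seen in a few declarations. The following is a sketch, not the declarations from Listing 3: the member names and the function last_line are hypothetical. Because each tree node must hold its own sequence, the list representation has to be visible wherever struct treenode is defined, and table code ends up walking the list directly.

```cpp
#include <cassert>
#include <cstddef>

struct listnode {               /* representation of one line number sequence */
    unsigned number;
    listnode *next;
};

struct treenode {
    const char *word;
    listnode *lines;            /* one sequence per node: the list leaks in here */
    treenode *left, *right;
};

/* Table code that needs the most recent line number must know that a
   sequence is a null-terminated singly-linked list. */
unsigned last_line(const treenode *t)
{
    assert(t->lines != NULL);   /* each list starts with an initial node */
    const listnode *p = t->lines;
    while (p->next != NULL)
        p = p->next;
    return p->number;
}
```

Any change to listnode, such as adding a tail pointer, forces changes to every function written this way.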
So what do you do? Implement line number sequences as a C++ class.

Encapsulating With C++
The header file ln_seq.h (Listing 5) contains the declaration for a class of line number sequences called ln_seq. ln_seq provides a constructor and two public access functions: add and print. The class has one private data member, first, which points to the first element in the linked list implementation of the sequence.
Note that the type of each linked list node, listnode, is a nested type. That is, listnode is declared inside class ln_seq. Versions of C++ compatible with AT&T C++ 2.0 (or earlier) treat a nested class as if it were declared in the scope that encloses the outer class, but AT&T C++ 2.1 (and the current draft standard for C++) treat nested classes as local to the enclosing class. My intention in Listing 5 is that listnode should be private within ln_seq. Although many compilers don't yet enforce this access restriction, they will eventually.
ln_seq.cpp (Listing 6) implements the member functions of the class. The constructor ln_seq::ln_seq is trivial: it just initializes the list to null. The body of ln_seq::print is simply the for loop that appeared in xrt_print. ln_seq::add is the addnumber function from Listing 3, with one noteworthy change.
ln_seq::add contains additional code to handle the case where the list is empty, i.e., first is null. addnumber never confronts this case because addtree (Listing 3) creates each list with an initial node. I could have written the constructor to initialize the sequence with an initial line number, but handling the null in the add function is easier.
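A class along these lines might look like the following. This is a sketch in present-day C++, not Listing 5 or Listing 6. Several details are assumptions: print writes to a std::ostream, add skips a number equal to the last one stored (consistent with the sample entry for object, where line 81 appears once), and line numbers are assumed to arrive in nondecreasing order. The destructor, the disabled copy operations, and the helper to_text are additions the column does not describe.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class ln_seq {
public:
    ln_seq();
    ~ln_seq();                          // added here to free the nodes
    void add(unsigned n);
    void print(std::ostream &out) const;
private:
    ln_seq(const ln_seq &);             // copying disabled to keep the sketch short
    ln_seq &operator=(const ln_seq &);
    struct listnode {                   // nested type, private to ln_seq
        unsigned number;
        listnode *next;
        listnode(unsigned n) : number(n), next(0) {}
    };
    listnode *first;                    // null when the sequence is empty
};

ln_seq::ln_seq() : first(0) {}

ln_seq::~ln_seq()
{
    while (first != 0) {
        listnode *p = first;
        first = first->next;
        delete p;
    }
}

void ln_seq::add(unsigned n)
{
    if (first == 0) {                   // the empty case the class must handle
        first = new listnode(n);
        return;
    }
    listnode *p = first;
    while (p->next != 0)                // search for the end of the list
        p = p->next;
    assert(p->number <= n);             // assumed: numbers arrive in order
    if (p->number != n)
        p->next = new listnode(n);
}

void ln_seq::print(std::ostream &out) const
{
    for (const listnode *p = first; p != 0; p = p->next)
        out << ' ' << p->number;
}

// Hypothetical helper: the printed form of a sequence, as a string.
std::string to_text(const ln_seq &s)
{
    std::ostringstream out;
    s.print(out);
    return out.str();
}
```

Code that uses ln_seq sees only add and print; it cannot name listnode or first, so it cannot come to depend on the list.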
Listing 7 shows xrt.cpp, the cross-reference table handler rewritten using the line number sequence class. It now shows no hint of how the sequences are implemented. You can safely change the implementation of ln_seq without changing xrt.cpp (although you will probably need to recompile).
For example, if you use only a single pointer to the head of each list, then every time you add an element, you have to search through the entire list to find the end. If you track the end of each list in a second pointer, you eliminate the searching. Listing 8 and Listing 9 show this alternative implementation.
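The two-pointer variant might be sketched as follows. This is not Listing 8 or Listing 9: the member name last, the std::ostream parameter of print, the destructor, the disabled copy operations, and the helper to_text are assumptions, as is the rule that add skips a number equal to the most recent one. The public interface is the same as in the single-pointer design; only the private part and the body of add differ.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class ln_seq {
public:
    ln_seq() : first(0), last(0) {}
    ~ln_seq();                          // added here to free the nodes
    void add(unsigned n);
    void print(std::ostream &out) const;
private:
    ln_seq(const ln_seq &);             // copying disabled to keep the sketch short
    ln_seq &operator=(const ln_seq &);
    struct listnode {
        unsigned number;
        listnode *next;
        listnode(unsigned n) : number(n), next(0) {}
    };
    listnode *first;                    // head of the list, null when empty
    listnode *last;                     // tail of the list, null when empty
};

ln_seq::~ln_seq()
{
    while (first != 0) {
        listnode *p = first;
        first = first->next;
        delete p;
    }
}

void ln_seq::add(unsigned n)
{
    assert(last == 0 || last->number <= n);   // assumed: numbers arrive in order
    if (last != 0 && last->number == n)
        return;                         // same line as the previous entry
    listnode *p = new listnode(n);
    if (last == 0)
        first = p;                      // first element of an empty sequence
    else
        last->next = p;                 // append without searching
    last = p;
}

void ln_seq::print(std::ostream &out) const
{
    for (const listnode *p = first; p != 0; p = p->next)
        out << ' ' << p->number;
}

// Hypothetical helper: the printed form of a sequence, as a string.
std::string to_text(const ln_seq &s)
{
    std::ostringstream out;
    s.print(out);
    return out.str();
}
```

Each call to add now does a constant amount of work, at the cost of one extra pointer per sequence.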
An added benefit of using classes is that it forces you to think carefully about the interfaces between components of your program. For example, notice that the first parameter of addnumber (Listing 3) is a treenode, not a listnode. The function adds a line number to the sequence in a tree, rather than to a list by itself. The sloppiness of this design becomes apparent when you try to transform it into a member function of the line number sequence class.

Looking Ahead
xr is now an object-based program with lots of line number sequence objects. It uses only the most basic features of C++, but I think the C++ version of xrt (Listing 7) is better organized and more readable than the C version (Listing 3). More improvements could be made, but they will keep until some future column.
You don't need a crash course in object-oriented design to write your first practical C++ class in a real-live application. Just identify a design decision and create a class to hide it. With practice, you'll get good at it.

References
[1] Stroustrup, Bjarne, The C++ Programming Language. Addison-Wesley, Reading, MA, 1986.
[2] Brooks, Fred, The Mythical Man-Month. Addison-Wesley, Reading, MA, 1975.
[3] Wegner, Peter, "Concepts and Paradigms of Object-Oriented Programming," OOPS Messenger, Vol. 1, No. 1, Aug 1990.
[4] Kernighan, Brian and Ritchie, Dennis, The C Programming Language, 2nd ed. Prentice-Hall, Englewood Cliffs, NJ, 1988.
[5] Tondo, Clovis and Gimpel, Scott, The C Answer Book, 2nd ed. Prentice-Hall, Englewood Cliffs, NJ, 1989.