PROGRAMMER'S BOOKSHELF

C++: The Next Generation

Andrew Schulman

A year ago, in the May 1990 issue of DDJ, I wrote a round-up of books on the C++ programming language. The bottom line was that, if you planned on reading only one C++ book, then Stanley B. Lippman's C++ Primer, based on C++ 2.0, was the book to read. That's still true.

Some excellent books on C++ have appeared in the last year, reflecting the growing maturity of the language. Some assume that the reader is already familiar with C++. We have entered the second generation of C++ usage.

The best of the new C++ books is Jonathan S. Shapiro's A C++ Toolkit. This book can be read in one evening, and is an enjoyable, brief introduction to software reuse. Shapiro's goal is to prod you into thinking about using C++ for "reusable programming."

In a discussion of "The Failure of Libraries," Shapiro cites the example of two different Unix library routines for handling regular expressions: "No interesting program has ever used either of these library packages!" (p. 3). Something better than libraries is needed if we are to have a "software-components subindustry"; object-oriented programming languages such as C++ are an attempt to solve this problem.

We've all heard this one before. But simple things, such as using the example of linked lists rather than the tired complex-numbers example, make this a convincing argument for using C++.

"Computer science has the interesting property that the vast majority of problems are solved with a very small number of fundamental data structures. These data structures are used so often that they have achieved the status of koans. The most common by far is the linked list. If you have been programming for more than a year or two, you can probably write them in your sleep. Initially, I thought linked lists were too basic a topic for this book. A recent project changed my mind.

"I found myself working on a project that needed linked lists in several places. Having written several hundred linked list data structures in my career, I threw one together without bothering to build a class. No sooner had I completed the first list than I needed to build a second, and cranked out the code for that one, too. Does this sound familiar? Alarm bells went off in my head and I decided that this chapter was worth including."
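
To make the point concrete, here is a minimal sketch of a list written once as a class and then reused for different element types. It is not Shapiro's code: the List name and its members are my own, and it leans on templates, which the books reviewed here still treat as a forthcoming feature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reusable singly linked list; not taken from Shapiro's book.
template <typename T>
class List {
public:
    List() : head_(nullptr), size_(0) {}
    ~List() { clear(); }

    // Copying is left out to keep the sketch short.
    List(const List&) = delete;
    List& operator=(const List&) = delete;

    // Insert a copy of value at the front of the list.
    void push_front(const T& value) {
        head_ = new Node{value, head_};
        ++size_;
    }

    // Remove and return the first element; the list must not be empty.
    T pop_front() {
        assert(head_ != nullptr);
        Node* old = head_;
        T value = old->value;
        head_ = old->next;
        delete old;
        --size_;
        return value;
    }

    bool empty() const { return head_ == nullptr; }
    std::size_t size() const { return size_; }

    // Free every node and leave the list empty.
    void clear() {
        while (head_ != nullptr) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
        size_ = 0;
    }

private:
    struct Node {
        T value;
        Node* next;
    };

    Node* head_;
    std::size_t size_;
};
```

With this in hand, the "second list" of the anecdote is just another declaration, such as List<double>, rather than another round of hand-written pointer code.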

In addition to the pleasure of reading some decent prose for a change, Shapiro's book provides a fresh view of the major classical data structures. There are chapters on bit sets, lists, arrays, dynamic arrays, binary trees, hash tables, and atoms. Shapiro uses C++ to say something new and interesting about these structures.

But all is not wine and roses. The section "Coping with Compiler Brain Death" (pp. 76-7) explains what happens when your compiler can't inline a function that has been declared inline. "An Implementation Note About Virtual Functions" (pp. 86-7) says that if you have a class with virtual functions, but without any noninline member functions, then your compiler is likely to emit tons o'vtbls.

Chapter 15, on memory management, makes this point: "C++ programs, much more than C programs, take advantage of the heap. As a result, C++ objects are more frequently allocated in the heap than their C counterparts. Careful memory management is a crucial aspect of C++ performance. As compilers get better, it will very likely become the dominant issue in tuning C++ applications" (p. 161).

At the beginning of the book, Shapiro makes a point that seems to sum up the big problem with C++, a problem that has no solution, and that stems from C++'s greatest asset, which is its strong tie to C. "There are places where the need to support C features prevents C++ from supporting object-oriented features as well as one might like, and a surprising number of programs will run up against these problems in one way or another" (p. ix).

That remark leads straight into this month's next C++ book, the long-awaited Data Abstraction and Object-Oriented Programming in C++ by Keith Gorlen, Sanford Orlow, and Perry Plexico. Gorlen et al. discuss how to stretch C++ as far as it will go in the direction of object-oriented languages such as Smalltalk, and away from the language's machine-oriented C heritage.

This book is based on the NIH Class Library, a Smalltalk-like class library for C++, which the authors developed as part of a project involving biomedical research on Unix-based workstations at the National Institutes of Health (NIH).

The NIH Class Library addresses a very real problem: C++ compilers do not come with extensive class libraries. If you need a LinkedList class, you write it (or borrow the one from Shapiro's book!). If you want the += operator to signify concatenation when applied to a string, then you have to write a String class with an operator+=( ) member function. C++ gives you the mechanisms, but after that you're on your own. Fundamentally, C++ is still C.
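
As a rough illustration of what "you have to write a String class" involves, here is a minimal sketch. It is not the NIH Class Library's String; the class layout and member names are assumptions of mine, and a real class would offer far more than concatenation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal String class; not the NIH Class Library's String.
class String {
public:
    String(const char* s = "") : len_(0), data_(nullptr) {
        assert(s != nullptr);
        len_ = std::strlen(s);
        data_ = new char[len_ + 1];
        std::memcpy(data_, s, len_ + 1);
    }

    String(const String& other)
        : len_(other.len_), data_(new char[other.len_ + 1]) {
        std::memcpy(data_, other.data_, len_ + 1);
    }

    String& operator=(const String& other) {
        if (this != &other) {
            char* copy = new char[other.len_ + 1];
            std::memcpy(copy, other.data_, other.len_ + 1);
            delete[] data_;
            data_ = copy;
            len_ = other.len_;
        }
        return *this;
    }

    ~String() { delete[] data_; }

    // Concatenation: build the joined buffer first, then swap it in,
    // so that s += s also works.
    String& operator+=(const String& rhs) {
        char* joined = new char[len_ + rhs.len_ + 1];
        std::memcpy(joined, data_, len_);
        std::memcpy(joined + len_, rhs.data_, rhs.len_ + 1);
        delete[] data_;
        data_ = joined;
        len_ += rhs.len_;
        return *this;
    }

    const char* c_str() const { return data_; }
    std::size_t length() const { return len_; }

private:
    std::size_t len_;
    char* data_;
};
```

Even this toy needs a constructor, a copy constructor, an assignment operator, and a destructor before operator+=( ) can be trusted, which is a fair measure of how much the language leaves to you.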

C++ programmers can be jealous of programmers using object-oriented languages such as Smalltalk, which come with extensive class libraries. When you buy Digitalk Smalltalk/V, you get a massive class hierarchy. When you buy a C++ compiler, you get iostream.h. I am convinced that this Spartan approach, remaining true to the language's C origins, is precisely why C++ has succeeded. It is a compromise between C on the one hand and object-oriented programming on the other.

But that doesn't change the fact that you need a class library. The NIH Class Library brings some of the flavor of Smalltalk to C++; its class hierarchy has Object at the top, a Collection class underneath that, a Bag class underneath that, and so on. If you have Turbo C++ or the newer Borland C++, note that the sample CLASSLIB is a scaled-down implementation of this same idea.

From the guided tour of the NIH Class Library given by Gorlen et al., I got the sense that C++ provides just enough object-oriented features to be tempting, but not enough to really work. How could it? C++ is still C.

For example, the constructor for a BigInt class must nonintuitively take a string of digits because "this is the only way we can legally write very large integer constants in C++" (p. 34). You can't write BigInt n = 18446744073709551615 because that number has 20 digits and is not a legal integer constant in C--I mean, C++. Nor can you overload operator^( ) to mean exponentiation and check if (n == 2^64 - 1) because in C--I mean in C++, an overloaded operator keeps its built-in precedence, and ^ binds more loosely than both == and -, so the expression groups as (n == 2) ^ (64 - 1). Besides, 2 and 64 are both plain ints, and operators whose operands are all built-in types can't be overloaded at all.
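
A small sketch shows both restrictions at work. The BigInt below is only a stand-in that stores digits (it is not the NIH class), and Int is a hypothetical wrapper I invented so that ^ can be overloaded to mean "power". The point it illustrates is that overloading changes what ^ does but not how it groups with its neighbors.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a BigInt class: it stores the decimal digits
// it is given and does no arithmetic.
class BigInt {
public:
    explicit BigInt(const std::string& digits) : digits_(digits) {
        assert(!digits_.empty());  // a real class would validate each character
    }
    const std::string& digits() const { return digits_; }

private:
    std::string digits_;
};

// Hypothetical integer wrapper whose ^ is overloaded to mean "power".
struct Int {
    long value;
};

Int operator^(Int base, int exponent) {
    Int result{1};
    for (int i = 0; i < exponent; ++i) {
        result.value *= base.value;
    }
    return result;
}

// Because - binds more tightly than ^, this is two ^ (5 - 1).
long as_written() {
    Int two{2};
    return (two ^ 5 - 1).value;
}

// Parentheses are needed to get "two to the fifth, minus one".
long as_intended() {
    Int two{2};
    return (two ^ 5).value - 1;
}
```

Here as_written() yields 16 and as_intended() yields 31, and a compiler may well warn about the unparenthesized form. The 20-digit constant, meanwhile, has to be spelled BigInt("18446744073709551615").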

This sort of restriction means that the promises of C++ often can't be fulfilled. One promise is that with operator overloading, we can give "an easily readable, 'mathematical' appearance" to mathematics programs (p. 96). I believe that the NIH Class Library comes as close as possible to this goal, but it can't succeed, because C++ does not provide a free-form collection of overloadable operators.

C++ seems to hold out the promise of working at a higher level, only to pull you up short at the last minute with a stern reminder that this is still C.

Restrictions of this sort are necessary if C++ is to remain a serious tool for developing commercial software. The authors of the NIH Class Library show what can be done within these restrictions. In addition to reading their book, you can get the NIH Class Library source code, either from the publisher (an additional $16.95) or by downloading it from BIX (listings area c.plus.plus; files nih30.zip, nih30.inf, and cppoops.zip). This is probably the largest collection of public C++ source code available, and is well worth examining.

One final note on this book. For years, I have been expecting to see the phrase "switch statement considered harmful" in print. One of the chief benefits of C++ is that its virtual functions (dynamic binding) can eliminate the need for switch statements. Anyone who has seen one of the 14-page "switch statements from hell" that regularly appear in Microsoft Windows source code cannot doubt that the switch statement should nearly always be replaced by some sort of table (of function pointers, for instance). Anyhow, I was glad to read the brief note, "The switch statement is considered harmful" (p. 104).
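
For readers who have not seen the trade, here is a minimal sketch of the same dispatch written both ways. The Shape classes and function names are hypothetical examples of mine, not code from Gorlen et al.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Style 1: a type code plus a switch. Every new kind of shape means
// finding and editing every switch like this one.
enum class Kind { Circle, Square };

std::string name_with_switch(Kind kind) {
    switch (kind) {
        case Kind::Circle: return "circle";
        case Kind::Square: return "square";
    }
    return "unknown";
}

// Style 2: each class supplies its own answer through a virtual function.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// The caller never asks what kind of shape it has; adding a Triangle
// class would not change this function.
std::string names_of(const std::vector<std::unique_ptr<Shape>>& shapes) {
    std::string result;
    for (const auto& shape : shapes) {
        assert(shape != nullptr);
        if (!result.empty()) {
            result += ' ';
        }
        result += shape->name();
    }
    return result;
}
```

The switch version concentrates knowledge of every case in one place that must be edited again and again; the virtual version spreads it across the classes, where the compiler's own table of function pointers does the dispatching.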

Our final book is Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual. These 447 pages are an expansion and update to the 70-page Reference Manual that appeared in the back of Stroustrup's 1986 book The C++ Programming Language.

The new Ellis and Stroustrup book is nearly as unreadable as the original Stroustrup book, and if you are doing anything with C++, it's just as essential. Besides its approval as the base document for the ANSI standardization of C++ (the cover is stamped "ANSI Base Document"), Ellis and Stroustrup's book contains many annotations and commentaries that clarify points in the original reference manual, plus lengthy discussions of the many new features added since 1986.

Opening the book to a random chapter, we find 22 pages of in-depth Talmudic commentary on the following topics: Single Inheritance, Multiple Inheritance, Multiple Inheritance and Casting, Multiple Inheritance and Implicit Conversion, Virtual Base Classes, Virtual Base Classes and Casting, Single Inheritance and Virtual Functions, Multiple Inheritance and Virtual Functions, Virtual Function Tables, Instantiation of Virtual Functions, Virtual Base Classes with Virtual Functions, and Renaming.

I came away from Ellis and Stroustrup's book with very grave worries about the complexity of C++. It starts on p. 22 with the remark that a certain variable "may not be eliminated even if it appears to be unused."

The reason is that the constructor or destructor for the variable's class may have side-effects. You may say that no one should write a class where the mere creation of an "unused" variable changes the program's behavior, but there are several important C++ applications for just this sort of nonintuitive behavior. On the same page, Ellis and Stroustrup provide a beautiful example of a Tracer class. The importance of such "unused" variables also comes out in static initializers for modules.
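
The idea can be sketched in a few lines. This is my own reconstruction of the general technique, not the example printed in Ellis and Stroustrup; the trace_log( ) function is a hypothetical stand-in that records messages instead of printing them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log that records the trace instead of printing it.
std::vector<std::string>& trace_log() {
    static std::vector<std::string> log;
    return log;
}

// All of the work happens in the constructor and the destructor.
class Tracer {
public:
    explicit Tracer(const std::string& name) : name_(name) {
        trace_log().push_back("enter " + name_);
    }
    ~Tracer() { trace_log().push_back("leave " + name_); }

    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;

private:
    std::string name_;
};

int square(int x) {
    Tracer t("square");  // never mentioned again, yet it is not dead code
    return x * x;
}

// Calling square() once leaves exactly two entries in the log.
void demo() {
    trace_log().clear();
    int result = square(3);
    assert(result == 9);
    assert(trace_log().size() == 2);
}
```

Delete the "unused" variable t from square( ) and the function still returns the same value, but the log comes out empty.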

The point is simply that some of the nicest applications of C++ also reveal its innate complexity: Here we have a language in which you simply cannot look at a line of code and know what it's doing. An assembly-language programmer might say the same thing about C, but to me there is a difference when we are talking about a language in which deleting an unused variable might break the program!

C++ is soon going to become even more complex. All three books discuss the two major forthcoming features of C++: templates (parameterized types) and try/catch/throw (exception handling). These much-needed features will undoubtedly interact in many interesting ways with all of the language's existing features.