October 1994/Stepping Up To C++

Columns

Stepping Up To C++

Designing Generic Container Classes, Part 4

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
This is the fourth part in my series on designing and implementing generic container classes. A container (a.k.a. collection) is a data structure, such as a list, tree, or vector, that holds other objects. A generic container class is a single class definition for a particular type of container that you can readily adapt to hold objects of various types. In contrast, a specific container class holds objects of only one specific type.
In the first part of this series I introduced one popular design for generic containers, namely, as containers of void *. (See "Stepping Up to C++: Designing Generic Container Classes, Part 1," CUJ, June, 1994.) As such, a generic container actually holds pointers to the elements in the collection, rather than holding the elements themselves. void * is the generic pointer type — a pointer to void can hold the address of any type of object. Thus, you can put objects of any type into one of these generic containers.
My example has been a class called genq that implements a generic first-in-first-out queue as a list of pointers to void. The private data of a genq object is a pair of pointers to the first and last cell, respectively, in a linked list of cells. Each cell contains a pointer (of type void *) to a queued element along with a pointer to the next cell in the list.
genq has undergone several revisions over its lifetime. In Part 2 of this series (August, 1994), I replaced genq's print member function with a much more general iteration function called apply. In Part 3 (September, 1994), I replaced the iteration function with an even more general iterator object type.
The latest version of the genq class definition (with a nested iterator type) appears in Listing 1. genq provides a default constructor, along with the following operations:
void append(void *e) — append element e to the end of a queue.
int remove(void *&e) — for a non-empty queue, remove an element from the head of the queue, place it in e, and return non-zero; for an empty queue, simply return zero without changing e.
genq also provides a nested iterator type with the following operations:
iterator(genq &q) — initialize an iterator for iterating through genq q.
void *next() — return the address of the next object in the queue associated with the iterator; or return a null pointer if no next object exists.
The corresponding member function definitions, for both genq and genq::iterator, appear In Listing 2.
Last month, I stated erroneously (in the introductory part of the article) that genq had a default constructor and a member function called clear. It did in its earlier incarnations. However, by Part 3, I had moved the default destructor and clear from genq into the wrapper classes such as intq and strq. I introduced wrapper classes in Part 2 as a technique for reconstructing static type information lost by using void * as the generic container element type.

Type Information Lost
A queue of void * is generic simply because C++ applies very lax type checking to void *. A C++ program can convert any data pointer to a void * without using a cast. (However, it cannot convert a function pointer to void *, nor a const or volatile qualifier, without a cast.) Of course, this flexibility has a price, namely, that conversion to void * strips away static type information. Consequently, a generic container of void * knows very little about the objects it contains, and therefore cannot support as many operations as a container for a specific object type. The clear member function is a case in point.
In Part 1, I presented two specific container classes, intq and strq. An intq is a queue of int and a strq is a queue of str (a variable-length string class type). It was the obvious similarity in these two classes that launched this quest for a single generic queue to replace them both.
intq and strq each had a public member function called clear, implemented as shown in Listing 3. (The T in the class name Tq stands for the type of elements in the queue, which in my examples has been either int or str.) The clear function is pretty straightforward; it visits each cell in the queue and applies
delete p;
where p is the pointer to the current queue cell. That delete expression does more than just deallocate memory; it applies a destructor, if any, to the cell object addressed by p.
The cell type that I used with Tq doesn't have an explicit destructor, but it has two data members:
cell *next;
T element;
Although next has type cell *, which never has a destructor, element has type T, which could easily be a type, such as str, that has a destructor. In that case, the compiler synthesizes a destructor for cell which invokes the destructor for element. Thus, when T is str,
delete p;
destroys the string stored in the cell as part of destroying the cell.
Unlike a type-specific queue, a genq can't destroy the queued elements automatically. The element type in a genq cell is void *, which does not demand destruction. Thus, the generated cell destructor is vacuous or trivial — it does nothing, and the compiler may optimize it away. If, in a genq, the cell's element actually points to an object that has a destructor (as is the case when element actually points to a str object), then
delete p;
in clear may cause memory leaks by failing to call that destructor. Again, converting a str * into a void * strips the static type information that a compiler needs to generate the appropriate destructor for the object referenced by the resulting void *.
In Part 1, I tried fixing that leak by adding the line
delete p->element;
to the clear function just before the existing delete expression. Alternatively, I could have written a destructor for the cell type as
~cell() { delete element; }
so that deleting cell pointer p would invoke this destructor, and thus automatically delete p->element. Either way, element is still a pointer to void, which carries with it no information about the destructor for the object to which it points. Adding the extra delete expression will reduce, and sometimes eliminate the leaks, but don't count too heavily on it.
In Part 2, I solved that problem by implementing type-specific wrapper classes around genq. That is, I wrote new versions of intq and strq that each wrapped a thin layer of member functions around a genq. The wrapper for a specific type uses explicit type conversions (casts) and a little extra code to restore the compile time type information lost by using void * elements. By confining the casts to just one small part of the code, wrappers make genqs both safer and easier to use.
When you design a generic version of a specific container, you should expect that you won't be able to implement some of the functions from the specific container as part of the generic. You may have to omit some functions from the generic and implement them as part of a wrapper. That's exactly what I did with the clear function. When I moved clear from the generic to the wrapper, I had to drag the destructor along because all it does is call clear.
Writing type-safe wrappers is not always easy. Ideally, the author of the generic should supply wrappers for every conceivable element type, but that's just not feasible. Thus end-users (in this case, application programmers) often must write their own wrappers.
Last month, I said I'd look at techniques for automatically generating type-specific wrapper classes. But I forgot that there's another variation on generic containers that I should cover first.

Generic Containers of ABCs
No doubt some of you have been wondering why I used a pointer to void as the generic container element type, when I could have used a pointer to an abstract base class instead. After all, C++ supports object-oriented programming techniques. Why not use them?
An abstract base class (ABC) is a class with at least one pure virtual function. You cannot create objects of the ABC type, but you can derive classes from the ABC and create objects of the derived types. Whereas a pointer to void carries very little type information, a pointer or reference to an ABC type carries a potentially hefty bundle of type information in the form of overridden virtual functions. It's possible that using virtual functions may eliminate the need for wrapper classes. Let's try and see what happens.
As you design a family of generic containers around an ABC, you must identify what the containers need to know about, or do to, the objects they contain. You use these requirements to design an ABC with declarations for pure virtual functions capable of supplying the required information and operations. Then you implement the generic containers using elements that are pointers to this ABC. The containers use virtual function calls to get the proper type-specific behavior from the objects they contain.
For example, the clear function in a generic queue needs to know how to destroy an object in the queue. If the queue cell's element type is a pointer to an object with a virtual destructor, deleting that pointer invokes that object's destructor. Similarly, the generic queue's remove function must be able to copy an object from the queue. A virtual assignment operator (operator=) can fill that need quite well.
Listing 4 shows the header common.h, which defines class common, an abstract base class for containable objects — objects you can store into a generic container. common is based loosely on a class with the same name that AT&T distributed with its early releases (versions 1.x) of the original C++ compiler. Class common also bears some resemblance to class Object in the NIH (National Institutes of Health) Class Library (Gorlen, et. al. [1990]).
My version of class common declares the following operations, all pure virtual:
common &operator=(const common &c) — assign one common object to another, and return (a reference to) the left-hand operand.
common *dup() const — return a duplicate copy of a common object.
size_t size() const — return the size of a common object.
ostream &write(ostream &s) const — write a common object to an ostream.
istream &read(istream &s) — read a common object from an istream.
Class common also provides a virtual (but not pure virtual) destructor with an empty body. As I explained a few months back, you must supply a definition for a pure virtual destructor, even if that definition has an empty body. Otherwise, you'll get a linker error. (See "Stepping Up to C++: Compilation Firewalls, Part 2," CUJ, May, 1994.)
As an added bonus, common.h defines stream input and output operators >> and << to put a prettier face on the read and write operations.
Class comq is a generic queue that uses elements of type common *. The class definition for comq appears in Listing 5, and the accompanying non-inline member function definitions appear in Listing 6. These comq listings are strikingly similar to the most recent genq listing (Listing 1 and Listing 2) . The most obvious difference is that comq uses common * in most (but not all) places where genq uses void *. comq also provides a clear function and a destructor (as discussed earlier) that genq does not. In addition, there are a few subtle, but important, differences worth noting.
Rather than add a second delete statement inside the loop in the clear function (Listing 6) , I chose to add a destructor to the cell. The delete statement in clear invokes the cell's destructor, which in turn invokes the queued element's virtual destructor by deleting the element pointer. As a matter of style, I prefer to do resource management down in the class that owns the resource. In this case, the class is comq::cell and the resource is the object addressed by the cell's element pointer.
In a few places, I translated parameters of genq member functions with parameters of type void * into comq member functions with parameters of type [const] common &. For example, the declaration for genq::append is
void genq::append(void *e)
but the declaration for comq::append is
void comq::append(const common &e)
Except for this, the functions are identical.
The reason for this change is to make comqs look as if they have value rather than pointer semantics. genqs have obvious pointer semantics. For instance, calls to genq::append typically have the form
q.append(new T(e));
But applications rarely call genq:append directly. Rather, they call a type-specific wrapper class's append function using an expression of the form
q.append(e);
In turn, the wrapper's append function calls new and passes the resulting pointer to genq::append. All the wrapper class's member functions work in concert to create the illusion that the type-specific container class traffics in objects rather than pointers.
A large part of my reason for using common * instead of void * is that I'm hoping to supply the generic queue with enough type information so that it doesn't need wrapper classes. At the same time, I don't want to lose the convenience of value semantics that the wrappers provide. Thus, the comq member functions must have (or at least look like they have) value semantics. Passing arguments by reference-to-const fills that bill quite nicely.
However, a comq, like a wrapper class, only creates the appearance of value semantics; somewhere inside its append function, a comq must create its own dynamically-allocated copy of the queued value, so it can destroy the copy later on. genq needs a wrapper class to make the correct copy. comq uses common's virtual dup function to make the copy. Again as a matter of style, I elected to call dup inside the cell's constructor to preserve the symmetry of resource allocation and deallocation with the cell class. The call to dup actually appears in the member-initializer for element in the cell constructor (Listing 5) .
comq::remove has a parameter of type common &, rather than void *& as in genq::remove. The body of comq::remove (Listing 6) is identical to genq::remove (from Listing 2) , except for the assignment statement
e = *p->element;
which passes an element from the queue back through reference parameter e.
In genq::remove, the assignment is just
e = p->element;
which passes the address of the queued object through a reference to a void *. Once again, applications rarely see these pointers to void. A type-specific wrapper intercepts the void *, casts the pointer to its original type, and then hands the queued object's value back to the application by copying it through a reference parameter.
However, since comq dispenses with the wrapper class, comq's remove should copy the queued object's value (not its address) back to the application. (Again, users should see comqs as containing values, not pointers.) Since it doesn't know the exact type of the queued object, comq::remove needs a virtual assignment operator that copies objects appropriately. The assignment
e = *p->element;
copies the value of the queued object into the object referenced by e using e's virtual operator=. (This statement needs a * (dereferencing operator) on the right-hand side because p->element has type common *, but e's operator= requires a right operand of type common.)

Adapting common for Specific Uses
Although comqs no longer need wrapper classes (or so it seems), it turns out that comqs are not all that easy to use. The problem is that you can't just put any old object, like an int or a str, into a comq. You can only put objects of a type derived from common into a comq. Thus, if you want to use a comq as a queue of str, you must derive a class from common to be a containable str — a str that can go into a queue. Listing 8 shows the definition for class comstr, a containable str.
Class comstr is not an ABC. It has a private data member of type str (for holding a queued value), and it overrides all the pure virtual functions inherited from common with functions that make a comstr act like a str. Listing 9 shows the definitions for these overriding functions, all of which are pretty straightforward except for operator=.
comstr::operator= assigns one comstr to another. Normally, you'd declare the comstr assignment operator as
comstr &operator=(const comstr &c);
to match the typical form of a generated assignment. But in this case, the assignment operator must override operator= in base class common. An overriding function must have exactly the same signature (sequence of parameter types) as the function it overrides. Therefore, the right-hand operand of the comstr assignment operator has type const common &, not const comstr &. Nonetheless, the left operand is a comstr, because it's the object addressed by this. In a non-static comstr member function, this has type comstr *const.
All that's really involved in copying one comstr to another is to copy their e data members (using the str assignment operator). However, the right operand of comstr::operator= is declared as common, which has no data members. Thus, comstr::operator must boldly cast its right operand to comstr (actually const comstr &) so it can access the data member that we all hope is there.
Therein lies a real weakness of this design. Since comqs don't use wrappers, a comq that's a queue of int (actually comint) has the same static type as a comq that's a queue of str (actually comstr). The compiler can't catch errors such as removing an object that's actually a comstr into an object that's a comint. The only way to prevent this inadvertant type conversion is with runtime type checking inside comq::remove.
In a sense, using comqs instead of genqs merely trades one tedious job for another. comqs don't use wrappers, but they do need a different derived class for each containable object. Writing a type-specific derived class is a task comparable to writing a type-specific wrapper.
Listing 9 shows my queue testing program adapted for comqs as a queue of str. The output operator points out yet another problem with this design — it doesn't eliminate all the casts from the application code. The comq iterator's next function returns the address of a queued object as a common *, and the application must cast it back to its original type. Maybe comqs do need wrapper classes after all.

What's Wrong With This Picture
These various problems with containers of ABCs are really surface indicators of a fundamental design flaw. I believe that using containers of common * as generic containers is a misuse of inheritance and polymorphism.
As I've said before, I believe that most applications need homogenous containers that employ static (translation-time) type checking. Type-specific wrappers offer just that. Containers of common * don't. In fact, by using common * as a kind of generic data pointer, containers of common * use inheritance as a way to weaken the type checking they apply to the objects they contain. That's not what inheritance is for.
In general, you should use inheritance to capture logical relationships between types, not as an implementation trick. In particular, inheritances model Is-A relationships, such as: a sequential file Is-A file, a dialog box Is-A view, and a manager Is-A[n] employee. But objects of types derived from common have no inherent relationship beyond some programmers desire to shoe-horn them all into a common container framework.
I don't mean to suggest that you should never create containers of polymorphic objects. For example, I think it's perfectly alright to represent a desktop in a windowing environment as a collection of dialog boxes, message boxes, edit windows, and other types derived from a common ABC called view. The fact is, even if you never place views into containers, all these different views have inherent commonality that an inheritance hierarchy models extremely well. I wouldn't even call the desktop a heterogenous container. I'd say it's a homogeneous container of views.
Also understand that the judgments I make about using C++ do not necessarily apply to all programming languages. C++ was designed to favor strong static checking and minimize run-time support. As such, good C++ designs build on that strength. On the other hand, highly dynamic languages and environments, such as Smalltalk, support different approaches. Smalltalk container classes are much more like containers of common *. (In Smalltalk, they are containers of Objects.) Thus, this approach isn't wrong per se; it's just not good C++.

References
Keith Gorlen, Sanford Orlow and Perry Plexico. Data Abstraction and Object-Oriented Programming in C++. (John Wiley & Sons, 1990.)

Listing 7

Columns

Stepping Up To C++

Designing Generic Container Classes, Part 4

Dan Saks

Type Information Lost

Generic Containers of ABCs

Adapting common for Specific Uses

What's Wrong With This Picture

References

Listing 7