August 1994/Stepping Up To

Columns

Stepping Up To

Designing Generic Container Classes, Part 2

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
A container is a data structure, such as a list, tree, or vector, that holds other objects. Many applications use containers. In fact, many applications use more than one instance of the same kind of container. For example, a single commercial application might juggle lists of customers, lists of products, and lists of orders all at the same time.
Containers are good candidates for abstract types, and so C++ programmers typically implement them as classes. Often, the class and member function definitions for one instance are nearly identical to the definitions for a different instance of the same kind of container. For example, the code for class customer_list might be strikingly similar to the code for class order_list, differing only in that the former uses customer where the latter uses order. You don't have to implement very many list classes before you notice that they're all pretty much the same and you may be repeating work unnecessarily.
Using generic containers reduces this apparent duplication of effort. A generic container uses a single implementation for a particular kind of container that you can tailor to hold objects of many different types. For example, you can adapt a single implementation of a generic list into separate instances for customer lists, product lists, and order lists.

Recapping Our Story
My previous installment introduced the technique of using containers of void * as generic containers. (See "Stepping Up to C++: Designing Generic Containers, Part 1," CUJ, June, 1994.) In particular, I presented a class genq which implements a generic FIFO (first-in-first-out) queue. A genq is actually a pair of pointers to the first and last cell, respectively, in a linked list of cells, as shown in Figure 1. Each cell contains a void pointer to a queued element along with a pointer to the next cell in the list.
The class definition for genq appears in Listing 1. In addition to a default constructor and a destructor, genq provides the following operations:

void append(void *e) — append element e to the end of a queue.

void clear() — discard all elements in a queue.

void apply(void f(void *)) — apply function f to the address of each object in the queue.

int remove(void *&e) — for a non-empty queue, remove an element from the head of the queue, place it in e, and return non-zero; for an empty queue, simply return zero without changing e.
The corresponding member function definitions appear in Listing 2.
In contrast to a generic queue, a special-purpose queue holding objects of a particular type T known at compile-time an store a T object completely within a queue cell. That is, the element member of each cell has type T, and appending an object to the queue creates a new cell with a copy of the queued object in the cell's element member. A genq can't store the queued objects within the cells because it doesn't know the element type(s) at compile-time. Rather, a genq stores pointers to the queued objects. The pointers have type void *, so they can point to objects of any type. (void *is the one type of data pointer that fits all.)
The single genq class defines a queue type that can indeed contain objects of any type. However, using a genq is less convenient than using a special-purpose queue In particular, the objects in a genq have pointer rather than value semantics. That is, a genq user must always be aware that the genq stores the addresses of objects rather the objects themselves. Special-purpose queues (like the intq and strq classes I used last month) tend to use value semantics.
The queue testing program in Listing 3 provides concrete examples showing how pointer semantics can be more cumbersome. This program uses a genq as a queue of str. (str is a variable-length string class. See last month's column for the class definition, or use any string class that you happen to have handy.) If q[i] were a strq (a queue with str rather than void * elements), you could append str qe to q[i] simply using
q[i].append(qe);
But since q[i] is a genq, you must write
q[i].append(new str(qe));
as in Listing 3.
Using void * as the queue element type eliminates the need for casts when placing objects in the queue, but not when removing objects. If q[i] were a strq you could remove an element from q[i] using just
if (q[i].remove(qe))
   // do something with qe
But since q[i] is a genq, you must follow the more elaborate sequence of steps:
if (q[i].remove(pv))
   {
   pqe = (str *)pv;
   // do something with *pqe
   delete pqe;
   }
as in Listing 3. As always, any interface that requires casting lacks type safety.
Using void * as the queue element type has other adverse effects. For one thing, using void * makes it harder to preserve the purity of homogeneous containers. Recall that a homogeneous container is one in which all the objects have the same type. I believe that most applications need homogeneous containers; that is, although a typical application may require the type of objects in one container to differ from the type in another, all the objects in a given container have the same type. But genqs are inherently heterogeneous. Since any data pointer type converts quietly to void *, you can inadvertently place the address of an X object into a queue that's only supposed to hold Y objects.
Another problem is that genq::clear may cause memory leaks. The delete-expression
delete p->element;
is supposed to delete the dynamically-allocated copy of the queued value (presumably allocated by a new-expression within a call such as
q.append(new T(qe));
for queuing an object of type T). Unfortunately, p->element in genq::clear has type void *, so the delete-expression fails to call T's destructor. If T is a type, like str, whose destructor calls delete, failing to call the destructor causes memory leaks.
Not all the changes wrought by the loss of compile-time type information are bad. Although this loss rendered the print function (presented in Part 1) essentially useless, the apply function that replaced it is much more flexible, albeit a tad harder to use. genq::apply is a general-purpose iteration function that applies its function argument to each queued object in turn. Listing 3 shows how to use genq::apply to print each object in a queue.

Type-Safe Wrappers
You can repackage a generic container so that it's more hygienic for a particular application by simply wrapping it up as particular private data inside another class. The wrapper class has essentially the same public interface as the generic class, and the wrapper's member functions delegate most of the real work to the generic. But the wrapper also enforces the requirement that all objects in the container have a particular element type, say T. Thus, the wrapper's member functions have parameter lists expressed in terms of T rather than void *, and they do all those nasty data conversions as they move objects to and from the generic container, so the end user doesn't have to. The wrapper can also create the appearance that the objects in the container have value, rather than pointer, semantics.
Listing 4 shows the class and inline member function definitions for a new class strq that turns a genq into a homogeneous queue of str. strq's append and remove functions have str instead of void * parameters, the same as my original strq class from last month. But unlike my original strq class, the strq wrapper class provides an apply function rather than a print function.
The strq wrapper does not declare any constructors. However, the genq member object has a non-trivial default constructor, so the compiler generates a default constructor for strq that invokes the genq constructor.
genq also has a destructor (from Listing 1 and Listing 2) . I might have omitted the declaration for strq's destructor and let the compiler generate it as well. But the genq destructor invokes genq::clear, which leaks memory left and right. Rather, strq needs its own memory-tight clear function (coming up soon), and strq's destructor should use strq::clear instead, which it does.
strq::append is a simple inline function that hands most of its work to the genq object. But before it does, it dynamically allocates a copy of the str object, and passes the address of the copy to the genq. This hides the pointer semantics of the genq class from strq users, who can simply write
q[i].append(qe);
to append str qe to strq q[i].
Two of the strq member functions, clear and remove, must labor a little to restore the type information lost by genq. They appear as non-inline function definitions in Listing 5.
genq::clear can't delete queued objects properly, so strq::clear doesn't even ask it to try. Rather, strq::clear removes each queued element from the genq as a void *, and deletes it after casting it back to the proper type.
genq::remove uses a similar approach. It bundles the use of a void * and the appropriate casts so that strq users can simply write
if (q[i].remove(qe))
   // do something with qe
to remove str qe from strq q[i].
This strq wrapper has one remaining hole in its interface that could let data typing errors through. Just like genq::apply, the strq::apply function has a parameter of type void f(void *). Consequently, strq users must still write casts from void * to str *inside functions that they pass to strq::apply.
You may well ask why didn't I close that hole by defining strq::apply as something like
void strq::apply(void f(str *))
   {
   typedef void F(void *);
   gq.apply((F)f);
   }
The following passage from the current C++ draft standard held me back:
A pointer to a function may be explicitly converted to a pointer to a function of a different type. The effect of calling a function through a pointer to a function type that differs from the type used in the definition of the function is undefined.
Inside strq::apply as shown above, the cast in
gq.apply((F)f);
converts f from type void (str *) to type void (void *). Then inside genq::apply, the call
f(p->delete);
calls a function whose definition has a type different from f. That's behavior that's undefined.
When a language standard says the effect of something is undefined, bad things might happen if you do that something. I have no doubt the cast works in some, and possibly many, environments, but it's clearly not sanctioned by the draft standard.

Iteration Functions Revisited
The iteration function

strq::apply(void f(void *))
is fairly general in that it applies an arbitrary operation to every object in a queue. However, this function assumes that you can somehow express the desired operation as a function that accepts a single pointer argument and returns nothing. This assumption sometimes forces you to program in styles that you'd rather avoid.
For example, my queue testing program contains the following loop:

for (size_t i = 0; i < DIM(q); ++i) { cout << i << ':'; q[i].apply(print_str); cout << "\n"; }
The body of this loop displays the contents of one queue in the form

i: aaa bbb ccc
where i is the queue number and aaa, bbb, and ccc represent strings in the queue. Most C++ programmers would prefer to write the body of the loop as just

cout << i << ':' << q[i] << "\n";
This you can do by overloading operator<< to output a strq to an ostream.
The definition for that output operator must look something like:

ostream & operator<<(ostream &os, strq &q) { q.apply(print_str); return os; }
It can't look exactly like this because this doesn't work. operator<< should pass os along with print_str to strq::apply, which should in turn pass both arguments to genq::apply. Then genq::apply should pass os to print_str when it calls print_str (as f). But none of this can happen because strq::apply (and genq::apply) won't accept os as an additional argument.
The revised strq test program in Listing 6 shows a solution to this problem. operator<< completely bypasses apply in transmitting ostream os to print_str. Instead, operator<< stores the address of os into global variable os_ptr. When apply calls print_str, print_str uses os_ptr to locate the output stream.
Some of you might prefer making the os_ptr object a private static member of class strq to remove it from the global namespace. Then you would also have to make print_str a static member function and operator<< a friend of strq. Personally, I don't think that's the way to go. I don't think of the os_ptr object as part of a queue's state. It's merely an argument to the output operator that we're having trouble passing around to the right places.
I'd rather redesign the apply functions as shown in Listing 7. Both apply functions now have two parameters. As before, the first parameter, f, points to the function to be applied to each queued object. But now, f points to a function that accepts two arguments rather than just one. apply's second parameter, args, points to an additional argument that apply passes in turn as the second argument to f.
Listing 8 shows the strq output operator rewritten using the new apply function. operator<< passes the addresses of both print_str and os to strq::apply, which passes them on to genq::apply. genq::apply repeatedly calls print_str (via f), passing the address of a queued object along with the address of ostream (via args). print_str then converts its parameters to their expected types, and uses them to write (what it thinks is) a str to (what it thinks is) an ostream.
Are two parameters enough? I think so. If you need to pass more data to the applied function, you can always define a class that encapsulates several parameters as a single object, and pass a pointer to one of those objects.

What's Up Next?
Adding a second parameter to the iteration function adds considerable flexibility, but iterator objects may be even more flexible and possibly easier to use. I'll introduce iterator objects next time.
Also, as you think about designing generic containers, you should remember that the designer of the container and the end user are not necessarily the same person. The designer's job is to craft a single implementation for a class that end users can easily adapt to their application needs. If the job of adapting the generic is too complicated, users will reject the generic and just write their own special-purpose containers.
The type-safe wrapper around the generic container certainly makes the container much safer and easier to use. But who's responsible for writing the wrapper? Not the author of the generic. He/she can't possibly write type-safe wrappers for every possible end-user need. Rather, the end users should write wrapper classes as needed. But again, if creating a wrapper is too much work, end users won't do it.
Next time, I'll look at ways that generic class designers can help automate the creation of type-safe wrappers.