November 1994/Stepping Up To C++

Columns

Stepping Up To C++

Designing Generic Container Classes, Part 5

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
This is the fifth part of my exploration into designing and implementing generic container classes. A container is a data structure, such as a list, tree, or vector, that holds other objects. A generic container class is a single class definition for a particular type of container. But unlike a specific container class which can only hold objects of one particular type, each instance of a generic container class can hold a different type of object.
I began this series by presenting two different classes for first- in-first-out queues: intq and strq. (See "Stepping Up to C++: Designing Generic Container Classes, Part 1," CUJ, June, 1994.) These classes differed only in the types of elements they could hold: an intq is a queue of int, and a strq is a queue of str (a class of variable-length strings). The obvious similarities in these classes led me to try writing a single class, genq, that could do the work of both.
A major impediment to implementing a generic container class is that the different types they are meant to contain can have vastly different sizes. It's difficult, if not impossible, to cram large as well as small objects into container slots of only one size. Thus I designed a genq that holds pointers to the elements in the collection, rather than holding the elements themselves. The pointers have type void *so that a genq can hold the address of any type of data object.
Using a container of void *as a generic container has several drawbacks:
1. genqs have pointer semantics. Thus, applications using genqs must deal with complications that arise from storing the addresses of objects rather than the objects themselves. In contrast, a container for a specific type (such as intq or strq) typically has value semantics, which are simpler to use.
2. genqs lack type safety. Applications must use casts when removing objects from a genq.
3. genqs are heterogeneous. However, most applications need only homogeneous containers, that is, containers for which all the objects in a single container have the same type. Specific containers can enforce homogeneity at compile time, genqs cannot.
Part 2 (August, 1994) introduced type-specific wrapper classes as a solution to these problems. A wrapper is a version of a type-specific class (such as intq or strq) using a genq object as a private member. A wrapper class "wraps" a thin layer of member functions around a genq. These member functions delegate most of the real work of the container to the genq member functions, but they add explicit type conversions (casts) plus a little more code to restore the compile-time type information and value semantics lost by using void * elements. By confining the casts and pointer manipulations to just a small part of the code, wrappers make genqs both safer and easier to use.
Part 3 (September, 1994) introduced iterator objects as a general mechanism for visiting each (iterating over) element in a container. An iterator is not part of the container, but it knows enough about the internals of the container to be able to locate objects within the container. A given iterator class typically works with only one particular container class. Therefore, each container that has an iterator defines its own iterator class. I recommend declaring each iterator class as a public member of its corresponding container.
Writing a type-safe wrapper is not always easy. Ideally, the author of the generic should supply wrappers for every conceivable element type, but that's simply not feasible. Thus end users (application programmers) often must write their own wrappers.
In Part 4 last month, I considered using polymorphism (dynamic binding) instead of wrapper classes as a way to provide a generic container with the type information it needs to do its job. I presented a class comq that's very similar to genq, except that comq has elements of type common * instead of void *. common is an abstract base class for containable objects (objects that can be placed into containers). You adapt comq to hold a specific element type T not by wrapping comq in another class, but rather by deriving from common a new class that has a member of type T.
This polymorphic approach solves some problems, but doesn't completely eliminate the need for using casts in applications using comqs. Thus containers of common * still need wrapper classes. Furthermore, in my opinion, this design misuses inheritance.
All this brings us to the present. It appears that the best way to write generic containers is by using some combination of void * elements (to relax type checking) and wrapper classes (to restore lost type information). If only class designers could supply users with some way to generate wrappers as needed, generics would be much safer and easier to use. That's what I'll show you how to do this month.
Until a few years ago, macros were the only tool available for generating wrapper classes. Unfortunately, they weren't really up to the task. More recent C++ implementations, as well as the draft C++ Standard, provide templates as a superior alternative to macros. Although I'm clearly headed toward using templates, I think the older macro-based techniques still merit discussion because:

They provide useful background for understanding and appreciating templates.

Some compilers still don't support templates. In that case, macros are the best alternative for implementing generics until templates arrive.

The Header <generic. h>
Most C++ compilers provide a header called generic.h that defines a small set of macros for automating the production of type-specific wrapper classes. Bjarne Stroustrup briefly explained these macros in the first edition of The C++ Programming Language [1], and Tony Hansen provided additional explanations and examples in The C++ Answer Book [2]. Other than that, I haven't found any other books that discuss these macros.
A version of generic.h appears in Listing 1. The name2 and _name2 macros are for use by generic class implementers. The declare and implement macros are for generic class users. (The header in Listing 1 is a bit shorter than what you'll probably find with your compiler. An industrial strength generic.h provides additional versions of the name, declare, and implement macros that accept extra arguments, plus additional macro and function declarations for error-handling support.)
Applications use the declare and implement macros to adapt generic classes to specific types. A macro call of the form
declare(C, T)
expands to the class and inline member function definitions for generic container class C adapted to type T. The act of adapting a generic class for a specific type is called instantiation. That is, the call to declare instantiates a definition for generic class C with type parameter T.
For example, if queue is a generic queue, then
declare(queue, int)
instantiates the class definition for a queue of ints. The macro call expands to a class definition roughly of the form
class int_queue
   {
public:
   int_queue();
   ~int_queue();
   append();
   remove();
   // and so on ...
   };
with int and int_queue substituted in the appropriate places throughout the definition. As you will see shortly, the author of the class determines the exact spelling for the generated class name. The name could just as easily be intqueue or queueint instead of int_queue.
A macro call of the form
implement(C, T)
instantiates the non-inline member function definitions and static data member definitions for generic class C with type parameter T.
For example,
implement(queue, int)
instantiates the non-inline member function definitions for a queue of ints. The expansion looks something like
int_queue::int_queue()
   {
   ...
   }
int_queue::~int_queue()
   {
   ...
   }
int_queue::append()
   {
   ...
   }
int_queue::remove()
   {
   ...
   }
// and so on...
An application must call declare(C, T) in every source file that uses generic class C instantiated with type T, but it need call implement(C, T) in only one source file. For instance, Listing 2 shows the main source file (strtst7.cpp) of my test program for generic queues. I've modified the program to use the generic.h macros. Listing 2 calls
declare(queue, str)
to instantiate a queue of str, but it does not call implement. The corresponding call to implement is in Listing 3 (strq7. cpp), which must be compiled and linked with Listing 2. Notice that Listing 3 calls declare (as it must) before it calls implement.
Neither Listing 2 nor Listing 3 includes generic.h directly. Rather, they include queue7.h which, in turn, includes generic.h and defines a bunch of other stuff. Among that stuff is a macro queue(T) which expands to the generated class name. For example, the declaration
queue(str) q[4];
appearing in the body of main (Listing 2) expands to
str_queue q[4];
The declaration
queue(str)::iterator sqi(q);
appearing in the body of operator<< (also Listing 2) expands to
str_queue::iterator sqi(q);
Using a macro to generate the class name relieves users from knowing the exact spelling of the name.

How It All Works
The macro definitions for declare and implement in Listing 1 are pretty short. They don't expand immediately into full-blown code, but generate a series of intermediate macro calls. The sequence of intermediate expansions is nearly identical for both macros.
A macro call of the form
declare(C, T)
expands directly to
name2(C, declare)(T)
which, in turn, expands to
_name2(C, declare)(T)
_name2 uses the token-pasting operator ## to splice C and declare into a single identifier. Thus, the call to _name2 expands to yet another macro call of the form
Cdeclare(T)
Similarly, a call of the form
implement(C, T)
expands to
Cimplement(T)
For example, if queue is a generic queue, then
declare(queue, int)
expands to call
queuedeclare(int)
and
implement(queue, int)
expands to call
queueimplement(int)
It is queuedeclare(C, T) and queueimplement(C, T) that finally expand to the code that defines and implements class C with type parameter T. The author of the generic queue class provides the definitions for these macros, as well as the definition for the class name generating macro, queue(T). These definitions appear in Listing 4 (queue7.h). Let's start with queue(T).
Like queuedeclare and queueimplement, the queue macro pastes identifiers together using the name2 macro. A call such as
queue(str)
expands through several steps:
1. name2(str, _queue)
2. _ name2(str, _queue)
3. str ## _queue
which finally yields str_queue.
The token-pasting operator ## is an invention of the C standards committee, circa 1985. Early C and C++ preprocessors didn't have this operator, so programmers used an assortment of tricks to paste identifiers together. Different tricks worked with different preprocessors. One trick was to define name2 as:
#define name2(n1, n2) n1\
n2
Another trick used an empty comment as a vanishing separator:
#define name2(n1, n2) n1/**/n2
Neither of these tricks work on current preprocessors; now you must use ##.
Why, you ask, does the name2 macro invoke _name2 to paste tokens together? It seems that you could define name2 as just
#define name2(n1, n2) n1 ## n2
The extra level of macro expansion produces a different behavior, but it's rather subtle. In most applications, you probably won't notice the difference.
The difference arises only when the arguments to name2 are themselves macros that should be expanded before they are pasted together. For example, using the name2 macro that invokes _name2 (as in Listing 1) ,
#define str string
name2(str, _queue)
expands to string_queue (str expands to string before pasting). On the other hand, using the name2 macro that invokes ## directly expands to str_queue.
queuedeclare is a big macro spanning many lines. Every line in the macro definition, except the last, ends with a backslash. The backslashes fuse separate physical lines into a single logical line, satisfying the requirement that a macro definition must fit on one logical line.
queuedeclare contains the class definition and the inline member function definitions for a queue with elements of type T implemented as a wrapper around a genq. The macro body is essentially the same as the header strq5.h (Listing 10 in Part 3), which defines a queue of str as a wrapper, except that queuedeclare is parameterized to handle any queue element type T. Aside from all the backslashes, the only differences in the listings are that queuedeclare uses T where strq5.h uses str, and queuedeclare uses queue(str) where str5.h uses strq.
queueimplement is also a multi-line macro, though a bit shorter than its corresponding queuedeclare macro. queueimplement contains the non-inline member function definitions for a queue of T wrapped around a genq. You may wish to compare this with strq4.cpp (Listing 4 in Part 3), which is the str-specific version of this wrapper.
In summary, you implement a generic class of void * with accompanying macros that generate type-specific wrappers by piecing together the following components:
1. a header file, cov.h, that specifies the interface to a generic container of void *. (cov is a place-holder that stands for "container of void *.") In my example, cov.h is genq.h.
2. a source file, cov.cpp, that defines the remaining components (such as non-inline member function definitions) for the generic container of void * . In my example, cov.cpp is genq.cpp.
3. a header file, C.h, that defines the C(T), Cdeclare(T), and Cimplement(T) macros for wrapper class C(T). In my example, C.h is queue.h. C.h must include both generic.h and cov.h.

Eliminating the Middle-Man
Recall that I originally chose to use void * as the container element type so I could write a single implementation of the container and easily adapt it to hold objects of any type. A container of void *comes close to that goal, but still requires some specialized code, namely, a type-specific wrapper, for each instantiation. The generic.h macros reduce that specialized code to just a couple of simple macro calls per instantiation, and that's pretty good.
Now think about this: If the macros work so well, do we really need the generic container of void * at all? Why not just generate all the code for each instantiation of the generic from the macros? Listing 5 (queue8.h) shows a version of the queue(T), queuedeclare(T) and queueimplement(T) macros that do just that.
The decision to eliminate the container of void * is a classic trade-off of code space and execution time. A queue of void * (a genq) captures a sizeable chunk of code common to all type-specific instances of a queue. An application with many different queue instantiations can save code space by using a single instance of genq shared by all the wrapper classes. Indeed, each new wrapper generates a little more code, but much less than the code for an entirely new type-specific queue (that's not just a wrapper).
On the other hand, wrapper classes add extra layers of function calls to most of the member functions, which may slow them down. Granted, many wrapper functions are inline functions applying simple casts that can be compiled away, but others add considerable run-time overhead. In particular, wrapper functions that require extra calls to new and delete, as do queue(T)::append and queue(T)::remove, might be considerably slower than the corresponding functions in a type-specific (non-wrapper) queue.

Assessing generic.h
Once defined, generic classes built from the generic.h classes are type-safe and fairly easy to instantiate. Unfortunately, macros for large generic classes push the preprocessor well beyond the simple tasks it was designed to handle.
These macros can be extremely hard to debug. The problem is that each macro call expands on a single logical line. If there's a bug in the macro, or you call it with an invalid argument, the best most compilers can do is tell you there's some sort of error somewhere in the macro expansion.
Cramming lots of code on a single line gives some symbolic debuggers fits. They won't let you set more than one breakpoint per line, or set a breakpoint in the interior of a line.
Templates were added to C++ to alleviate these shortcomings. In many ways, a template is like a macro. You can explain much of the behavior of templates by analogy with the declare and implement macros. But templates are part of the language itself, not the preprocessor, so they preserve the line structure of the source code.
Of course, templates have their share of problems, but on balance, they are an improvement. I'll show you why I think so next month.

References
[1] Bjarne Stroustrup. The C++ Programming Language (Addison-Wesley, 1986)
[2] Tony L. Hansen. The C++ Answer Book (Addison-Wesley, 1990)