September 1994/Stepping Up To C++

Columns

Stepping Up To C++

Designing Generic Container Classes, Part 3: Iterators

Dan Saks

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.
In my past two articles, I've been exploring the design and implementation of generic container classes. (See "Stepping Up to C++: Designing Generic Container Classes, Part 1," CUJ, June, 1994 and Part 2, August, 1994.) A container (or collection) is a data structure, such as a list, tree, or vector, that holds other objects. C++ encourages you to implement each type of container as a class. A generic container class is a single class definition for a particular type of container that you can readily adapt to hold objects of various types. This is in contrast with a specific container class which holds only objects of a specific type.
My focus thus far has been on one particular design for generic containers, namely, as containers of void *. With this design, a generic container actually holds pointers to the elements, rather than holding the elements themselves. In C++, as in C, void * is the generic pointer type, meaning a void * can hold the address of any type of data object. Thus, a container of void * can indeed manage objects of any type.
As my example, I've been using a class called genq that implements a generic first-in-first-out queue as a list of void pointers. A genq object is actually a pair of pointers to the first and last cell, respectively, in a linked list of cells. Each cell contains a pointer (of type void *) to a queued element along with a pointer to the next cell in the list.
genq has undergone several revisions over the past months. The latest version of genq's the class definition appears in Listing 1. As before, genq provides a default constructor and a destructor, along with the following operations:

void append(void *e) — append element e to the end of a queue.

void clear() — discard all elements in a queue.

void apply(void f(void *e, void *a), void *args) — invoke f(&obj, args) for each object obj in the queue.

int remove(void *&e) — for a non-empty queue, remove an element from the head of the queue, place it in e, and return non-zero; for an empty queue, simply return zero without changing e.
The corresponding member function definitions appear in Listing 2.
This design has both good and bad points. On the plus side, a generic container of void * provides truly reusable code. Every genq object in a program, regardless of the type of the elements in the queue, uses the same object code for the non-inline member functions. In contrast, each class for a queue of a specific type (such as intq and strq from Part 1 of this series) employs different object code for its non-inline member functions.
On the minus side, genq's generality relies on the very weak type checking applied to void *.genqs are heterogeneous. That is, a single genq can hold objects of differing types. By itself, a genq cannot enforce the homogeneity that most applications require. In other words, a genq cannot guarantee that all objects it contains have the same type.
Last month I showed how to adapt a generic container for a specified element type by writing a type-safe wrapper class around the generic container. You simply embed a generic container object as a private data member of the wrapper class. Then you write a public interface for the wrapper class that tailors the public interface of the generic class to the specified element type. The wrapper class guarantees at compile-time that all objects in the wrapped container have the same type.
Listing 3 shows the class and inline member-function definitions for strq, a queue of str (varying-length strings presented in Part 1) implemented as a wrapper around genq. Listing 4 shows the corresponding non-inline member function definitions. The wrapper adds a little extra code to some of the queue operations. However, applications must do this work anyway each time they use a genq as a queue of some specific element type, so the true cost of the wrapper is actually quite small.
Last month, I also reexamined iteration (apply) functions. This culminated in new versions of genq::apply and strq::apply that now appear in Listing 2 and Listing 4, respectively. Though effective, iteration functions like these do not meet all needs. This month, I'll introduce iterator objects, an alternative mechanism for iterating through the elements in a container.

Iteration Functions, Again
Good container classes implement abstractions. That is, a good container provides a mechanism for storing and retrieving objects without exposing how that mechanism actually works.
For example, I could have implemented genq using an array instead of a list. I don't want genq users programming in a way that depends on my implementation decisions, otherwise I may find that I can't improve the implementation without breaking some user's code. Therefore, genq's public interface shouldn't reveal the underlying data structure.
Quite often, an application using a container needs to iterate through that container. That is, it needs to apply some operation to each object in the container. For example, the application may want to write each object to a stream, or locate the object with the largest value.
In a C program, you'd probably iterate through a genq using a loop such as:

genq_cell *p; ... for (p = q.first; p != 0; p = p->next) /* do something with *p */
Even though this violates the genq abstraction, you might not notice because C doesn't enforce access control. However, you can't get away with this in C++ because both the genq's cell type and first data member are inaccessible outside genq. Container classes, written in C++, should provide more abstract iteration mechanisms.
Iteration functions like genq::apply are one such mechanism. An iteration function for a container applies a given function to each object in that container. For example, genq's iteration function (declared in Listing 1 and defined in Listing 2) is:
void apply(void f(void *e, void *a),
void *args);
which calls f(&obj, args) for each object obj in the queue.
The strq wrapper class propagates genq's apply function to the strq public interface. Listing 5 shows a program for testing genq (Listing 1 and Listing 2) and the strq wrapper (Listing 3 and Listing 4) . That program uses strq::apply inside
ostream &operator<<(ostream &, strq &);
to print a strq by applying
void print_str(void *e, void *args);
to each element in the queue.
Unlike strq's other public members, strq::apply doesn't replace any of the void * parameters in genq::apply's signature with parameters of type str (or str * or str &). Rather, strq::apply has the exact same signature as genq::apply, and it's peppered with void *. Here's why.
strq::apply doesn't know how to iterate through a genq; it invokes genq::apply to do the iterating. Consequently, strq::apply doesn't know what to do with its arguments (f and args) other then pass them along to genq::apply. genq::apply deals only with void *, so the f passed to genq::apply had better be prepared to accept void * arguments. Thus, the application must declare all applied functions (such as print_str) with the same type. Each applied function must then cast its parameters to the "right" type.
Why, you ask, can't I declare strq::apply to accept an applied function such as
void print_str(str *s, ostream *os)
and cast this to
void (void *, void *)
inside strq::apply before passing it to genq::apply? Because calling a function of one type through a pointer to a function of another type yields undefined behavior, meaning all bets are off. Thus, iteration functions like strq::apply require that users employ casts in their application code, and that's unfortunate.
Aside from depending on void * and casts, the problem with iteration functions is that they force you to bundle the applied operation into a separate function. Sometimes, this causes very artificial partitions in the program's logic.

Iterator Objects
Iterator objects are an alternative to iteration functions. An iterator is an object that can iterate through a container. The iterator is not part of the container, but it knows enough about the container's innards to be able to locate each object in the container.
You typically initialize an iterator by attaching it to a particular container object. Then you apply member functions to the iterator to access each element in the container, and determine when you've seen them all. For instance, Listing 6 shows the output operator for strq (from Listing 5) rewritten using an iterator.
The declaration
strq::iterator sqi(q);
declares sqi as an object of type strq::iterator attached to strq q. strq::iterator is the fully-qualified name for strq's iterator type declared as a publicly-accessible nested class within class strq (details forthcoming). The subsequent loop:
while ((ps = sqi.next()) != 0)
   os << ' ' << *ps;
calls sqi.next() to obtain, in turn, the address of each str object in the queue. sqi.next() returns a null pointer to indicate that sqi has reached the end of the queue. The body of this loop is essentially the same as the body of print_str from Listing 5, minus those horrible casts.
Why is the iterator an object distinct from the container? Why not just declare the iterator's member functions as members of strq itself so that you can write something like the following?
q.reiterate();
while ((ps = q.next()) != 0)
   os << ' ' << *ps;
Here, calling q.reiterate() resets q's internal iterator, and calling q.next() returns the address of the next str object in the queue, or null.
The problem with this approach is that it doesn't allow multiple iterators concurrently. For example, it's not that uncommon to see nested loops such as
int a[N];
...
for (i = 0; i < N; ++i)
   for (j = 0; j < N; ++j)
      // do something with a[i] and a[j]
which iterate simultaneously through a single array. If you try something similar using a strq with an internal iterator, you get code something like
q.reiterate();
while ((psi = q.next()) != 0)
   {
   q.reiterate();
   while ((psj = q.next()) != 0)
      // do something with *psi and *psj
   }
This doesn't work because the calls to q.iterate() and q.next() inside the loop interfere with the calls to q.next() in the outer loop. This interference doesn't occur when iterators are distinct objects, where the code for nested loops looks like
strq::iterator sqi(q);
while ((psi = sqi.next()) != 0)
   {
   strq::iterator sqj(q);
   while ((psj = sqj.next()) != 0)
      // do something with *psi and *psj
   }
Implementing Iterators
Let's peek inside and see how this iterator works. Listing 7 shows the header for class genq with iterator as a nested type. Much of the class definition is the same as in Listing 1, but I rearranged the order of the member declarations for reasons I'll explain a bit later.
The iterator class itself is quite simple. It has a single data member, pc, that's a pointer to a strq cell. The iterator constructor attaches an iterator to a strq by initializing pc to point to the first cell in that strq. The iterator's next function (Listing 8) returns the next element's address, or null if no such address exists.
Class genq includes this curious sequence of declarations:
class iterator;
friend class iterator;
class iterator
   {
   // ...
   };
The last of these three declarations defines iterator as a nested class inside genq. iterator's definition includes the declaration
cell *pc;
referring to type genq::cell, a private member of genq. However, a nested class like iterator, although in the scope of its enclosing class, cannot normally access the non-public members of that enclosing class. Thus, genq must declare iterator as a friend so that iterator can refer to cell.
In my experience most programmers find it surprising that a nested class cannot access the non-public members of its enclosing class. Even though a nested class is a member, it's apparently not a member in full standing. My current view is that this is an error in the language specification. It's neither the first, nor the last.
Okay, that explains the friend declaration before the class definition. Why does genq declare
class iterator;
as an incomplete type before the friend declaration? Normally, a class or function name first mentioned in a friend declaration is declared in the innermost enclosing non-class scope. If you removed the incomplete type declaration, the first mention of iterator would appear in the friend declaration, which would inject the name iterator into the global scope. In other words, the friend declaration would grant friendship to some global class named iterator, not to the nested class genq::iterator.
Thus, the purpose of the incomplete type declaration is to enter the class name iterator into the scope of class genq, so that the subsequent friend declaration is not the first mention of the name iterator. Hence, the compiler associates the friend declaration with the nested class iterator.
I haven't seen anyone else declare their iterators as nested types. Every other book or article I've seen that uses iterators declares iterator classes at file scope. To avoid name conflicts, they prefix each iterator class name with the name of its associated container, as in genq_iterator or genqIter. Despite the slightly awkward declarations, I still prefer using nested classes to show the association of an iterator with its container.
You may have noticed that I almost always define my classes with the public members before the private members. C++ imposes no such requirement; its just my personal preference. However, I switched the order of the member declarations in Listing 8 to be sure I got the name bindings right. In particular, genq's public member iterator has a private member pc whose declaration refers to genq's private member cell. I placed the declaration for cell before the declaration for iterator to insure that pc's declaration (inside class iterator) referred to the nested cell type, rather than to some global cell type.
Then again, using forward-declared nested classes is even better. Listing 9 shows the class definitions for genq and its nested classes, rearranged using forward-declared nested classes. Much neater. Unfortunately, too few compilers handle these forward-declarations, so I have to restrain myself.
Listing 10 shows the class and inline member function definitions for a strq class (with an iterator) implemented as a type-safe wrapper around the genq class from Listing 7. Just as the strq wrapper has a private data member of type genq, strq::iterator is a wrapper with a private data member of type genq::iterator
The non-inline member functions corresponding to the strq wrapper in Listing 10 are identical to those in Listing 4. You can test the strq iterator using the test program in Listing 5, after replacing the definition for operator<< in Listing 5 with the definition in Listing 6.

Design Issues
Although the strq::iterator doesn't reveal the underlying structure of the container, it weakens the container abstraction by returning a non-const pointer to an object in the container. This non-const pointer lets users modify objects that are still in the container. For data structures like strq where the retrieval mechanism doesn't use the values in the container as search keys, this might not be a problem. But for ordered data structures, like b-trees or hash tables, writing through the pointer returned from an iterator could corrupt the container. At the very least, the iterator should discourage such behavior by returning a const pointer.
When you design containers and iterators, you should decide the answer to a very basic question: Does the container own the objects in it, or is the container just keeping track of objects owned by somebody else? With containers that accept and/or return pointers to contained objects, you can never be quite sure. Container that traffic only in values demostrate their ownership of objects explicitly.
My advice is to design containers that "own" the objects they contain. Another way to say this is: Design containers that have value (rather than pointer or reference) semantics. Specifically, when you put an object into a container:

The container should make a copy of the object.

The container should not save nor return the address of an object.
If copying is expensive, then repackage the objects you put into the containers using optimizing techniques, such as handle classes and reference counting, that reduce the copying cost. (See any of the references listed below for a discussion of these techniques.)
Theres's much more to iterator design than I can cover in this article, and I still have other issues to cover regarding containers themselves. I will stick with this imperfect iterator design for the time being and take another look at iterators some time in the future. For those of you who are impatient, refer to Stroustrup [1991] and Murray [1993] for further discussion of iterators.

References
Adams [1994]. Robert M. Adams. "Temporary Object Management through Dual Classes," The C Users Journal, May, 1994.
Meyers [1992]. Scott Meyers. Effective C++. Addison-Wesley.
Murray [1993]. Robert B. Murray. C++ Strategies and Tactics. Addison-Wesley.
Stroustrup [1991]. Bjarne Stroustrup. The C++ Programming Language, 2nd. ed. Addison-Wesley.