STL Iterators

Start with iterators to use STL effectively

Dan Zigmond

Dan is a senior software engineer with Wink Communications. He can be reached at dan.Zigmond@wink.com.


The Standard Template Library (STL) developed by Alexander Stepanov and Meng Lee has been widely discussed since it became part of the proposed ANSI/ISO C++ standard. With compiler vendors lagging behind standards committees, however, many programmers are only now beginning to use STL in day-to-day programming tasks.

In this article, I'll explore STL iterators. In future articles, I'll examine STL generic algorithms and go inside the standard STL implementation from HP, explaining some of the details of how generic algorithms and data structures can be specialized using iterator tags and adapters.

Iterators

Before discussing iterators, I'll take a step back and look at C pointers, which form the basis of the iterator abstraction.

When passing an array to a function in C, you usually pass a pointer. Using this pointer (and some information about the size), you can do anything you want with the array: Traverse it in order, pick out an arbitrary element in the middle, copy it, and even destroy it. Example 1(a), for instance, is a function that finds the first occurrence of a given character in an array of characters using only pointers to the beginning and end of the array (assuming you don't have strchr handy).

Pointers are used as the interface to the data structure in find_char, which in this case is an array of characters. C++ lets you generalize this function to work with almost any sort of array by turning it into a template function. All you have to do is substitute a template parameter for all occurrences of the hard-coded type char; see Example 1(b). This generalized version of find_char works on virtually any array because it depends only on your ability to traverse an array using a pointer and compare the contents of the array with the inequality operator. In other words, if you have an array of objects of type T, find_in_array depends on your ability to traverse that array using pointers of type T*. C guarantees this for C arrays.

The difficulty comes when you decide that arrays are not the perfect internal representation for whatever you're trying to do, and you decide to move to, say, a linked-list structure. A typical implementation might look like Example 1(c), which bears a resemblance to the original find, but the two are obviously incompatible. For simplicity, list_nodes are implemented as a simple struct, rather than a more complicated (and well encapsulated) class; other sorts of implementations would require some minor changes to the new find_in_list, but the structure would remain the same.

There's something unsatisfying about find_in_array and find_in_list. At some level, they are the same function; if you were to describe the algorithms in plain English, you would probably use the same words for both. The only difference is in the interface to the underlying data structure. Simple pointers are the interface to C arrays, while pointers to list_node structures are the interface to the linked lists. Pointers are incremented with the ++ operator, while the list nodes are incremented using the idiom node = node-->next. Pointers are dereferenced using the * operator, while the list nodes are dereferenced by extracting the item member. All this seems natural (almost all C and C++ linked lists work this way), yet completely arbitrary. Why can't we access lists and arrays the same way?

Example 2 is a slightly more complicated way of defining list nodes. Using this implementation, list_nodes act very much like C pointers, so you can now rewrite find_in_list. It's interesting that this find_in_list is virtually identical to the generalized find_in_array. The only difference is in the type of the pointer-like objects used to do the iteration through the data structure. In find_in_array, I use real pointers, while in find_in_list, I use the pointer-like list_node class.

These type-only differences are exactly what templates are designed for. Just as I generalized find_char to find_in_array by substituting the template parameter T for the hard-coded type char, you can generalize both find_in_array and find_in_list to a truly generic find function by substituting a template parameter for the list_node class and T* pointers; see Example 3. This raises the question: What is this new parameter (called I in the example)? In STL, classes that meet the requirements of this parameter are called "iterators"--objects that function more or less like pointers. Like conventional pointers, iterators are used as the interface to a data structure so that you can define functions like find that will work with a variety of different data representations. Because the pointer semantics used by find (forward iteration using ++, dereference using *, and comparison with !=) can be implemented efficiently for both arrays and linked lists, the generalized find function is as efficient as the more specialized find_in_array and find_in_list; once you defined the pointer-like list_node template class, making a single find algorithm work for both arrays and lists costs nothing. Functions like find that use iterators to encapsulate efficient algorithms independent of the data representation are called "generic algorithms."

Using Iterators

Iterators make a lot more sense when taken in context. If you haven't examined sample STL programs, I've included two that use iterators (and other parts of the standard libraries) to solve a basic problem: counting the number of words from standard input. The first program, wc (see Listing One, simply stores all the words in a temporary vector, then counts them. (The program isn't very smart. A better implementation would count them on the fly and not store them at all.) The second program, uwc (see Listing Two), is a bit more sophisticated; it stores all the words in a sorted set before counting them so that duplicate words are discarded.

The two programs look almost identical, even though they perform substantially different functions and use completely distinct internal data representations. This is because the interface to the copy algorithm is defined entirely with iterators. In fact, copy can be defined in a very similar way to the generic find; see Example 4. The only significant difference between copy and find is that copy deals conceptually with two types of iterators--one for the sequence it is copying from and the other for the sequence it is copying to. These two sequences need not be the same; they can be, but that is just a special case. To allow users to copy from, say, a linked list to a C array, you need to allow the two types of iterators to be distinct.

In wc, you are copying from an istream_iterator to a vector. In uwc, you are copying from an istream_iterator to a set. The two have very different results: Copying into a vector in this way preserves the original word order, while copying into a set removes, duplicates, and sorts alphabetically. In both of these cases, the call to copy looks the same because its interface is defined entirely in terms of iterators. (The output line is also identical because both vectors and sets define a size member function.) The ability to abstract our algorithms from our data structures is the real power of the STL.

The C++ library recognizes five major types of iterators. The reason to separate iterators into these types is that different algorithms ask different things of their data structures. The simple find algorithm, for example, makes only three demands of the class used as the iterator parameter: It must be incremented using ++, dereferenced using *, and compared using !=. Values in the underlying data structure are dereferenced only once, and the result of dereferencing is never used as an lvalue. In other words, you make a single pass through the structure, and read from it, but never write to it.

The copy algorithm is slightly more complicated, with its two distinct types of iterators. While SourceIterator need only perform the same operations as the iterator used in find, DestinationIterator uses the * operator to establish somewhere to write to, not to read from. But it also makes a single pass through the structure, dereferencing each location exactly once.

Input and Output Iterators

The C++ standard library calls the first of these iterators (find-style) an "input iterator." Technically, to be an input iterator, a type X must meet the following requirements:

Any type that meets these criteria is an input iterator as far as the C++ library is concerned, regardless of how it meets the criteria.

Iterators like DestinationIterator of the copy algorithm are called "output iterators." A type X must satisfy a different set of requirements to be considered an output iterator:

For both input and output iterators, the assumption is that only one pass will be made through the underlying structure. This makes it possible to create iterators for I/O streams. It allows you to write programs like wc and uwc that work with standard input and output without having to write any special code. You can copy to or from one of these streams just by creating a special kind of input or output iterator (called an istream_iterator or an ostream_iterator) and using the copy algorithm in the usual way.

I/O streams demonstrate perfectly what input and output iterators are really for. An input iterator is for any sequence that, like an input stream, is intended to be read only once, in order. An output iterator is for any sequence that, like an output stream, is intended to be written to once and in consecutive order. The reason the C++ library defines different sorts of iterators is that different data structures can support only certain types of iterator operations efficiently and different algorithms rely on different sets of these operations. If a structure (like an input stream) can support an input iterator, you can use the find algorithm to search it. If a structure (like an output stream) can support an output iterator, you can use the copy algorithm to fill it.

Forward Iterators

If you think of the five types of iterators as forming a hierarchy, input/output iterators represent the bottom, the simplest of all the iterators. On the next level up are objects that effectively combine the properties of input and output iterators and remove the single-pass requirement. These are called "forward iterators." More formally, a type X is a forward iterator if it meets all of the requirements for both input and output iterators, plus:

Together, these requirements allow multipass algorithms to work as expected with forward iterators. A good example of a data structure that supports a forward iterator well is a singly linked list. You can pass through a list as many times as you want without seeing different results (unlike some streams), and you can both read to and write from most lists. A good example of an algorithm that requires forward iterators is remove, which searches for all instances of a given object in a sequence and takes them out. Example 5(a), taken from HP's original implementation of STL, presents this algorithm. The algorithm depends on find and remove_copy (which is like remove except that it stores the resulting sequence in another container, rather than overwriting the original). In essence, all remove does is search the sequence described by first and last for value. If there are no instances of value in the sequence, you're done. If there is at least one occurrence, you call remove_copy to take them out, putting the resulting value-free sequence on top of the existing sequence. Example 5(b) is the code for remove_copy.

remove_copy requires only input and output iterators, because it is always reading from one place and writing to another. remove itself requires a forward iterator, because it is essentially the special case of remove_copy in which both the source and destination are the same.

Bidirectional Iterators

The next step for iterators is to provide decrementing as well as incrementing, allowing for multipass, bidirectional algorithms. A type X is considered a bidirectional iterator if it meets all the requirements of a forward iterator as well as the following stipulations regarding the--operator:

An example of a data structure that supports bidirectional iterators (but nothing more) is a doubly linked list. There aren't many standard algorithms that require bidirectional iterators, but one is reverse_ copy, which functions like copy except that the elements are copied from the source to the destination in reverse order; see Example 6.

Random-Access Iterators

At the top of the iterator hierarchy are random-access iterators. These add to bidirectional iterators the ability to jump to arbitrary locations with a structure in constant (not linear) time by introducing the following requirements:

The classic example of a random-access iterator is a C pointer into an array. One major advantage of arrays is that they support random access. The various sorting algorithms included in the C++ standard libraries (sort, stable_sort, partial_sort, and partial_sort_ copy) all require random-access iterators. Other data structures (such as the STL list class) that do not provide random access but may need to be sorted require their own special-purpose sorting routines.

Other Distinctions

Iterators can have other properties. One of the most basic distinctions among various types of iterators is whether one is dereferenceable or past-the-end. A dereferenceable iterator is one that can be dereferenced using the unary * operator. A past-the-end iterator, on the other hand, cannot be assumed to be dereferenceable. Rather than describing an element in some container, a past-the-end iterator is intended to bound the container itself by pointing just past the last value in it.

Among dereferenceable iterators, there are both mutable and constant iterators. A mutable iterator can be assigned, as in *i = a. Constant iterators cannot. In other words, the result of *i is an lvalue for a mutable iterator, but not for a constant iterator.

Specialized Iterators

The basic sorts of iterators are useful, but there are times when more variations are needed. The C++ standard library provides several types of predefined, specialized iterators for other purposes.

Iterators for Streams

I have alluded to the two types of iterators used on C++ streams. The istream_iterator template class is used to read from input streams. When the iterator is constructed and every time the ++ operator is used, a new value is read from the stream using the operator. The stream to read from is passed as an argument to the constructor of the iterator. Once the end of the stream is reached, the iterator becomes equal to a special, end-of-stream value. At this point, the results of dereferencing the iterator are undefined. An istream_ iterator constructed with the default constructor is also set to this end-of-stream value, making it easy to test an istream_ iterator before trying to dereference it. An istream_iterator is an input iterator.

An ostream_iterator acts similarly, except that it writes values using the operator each time it is assigned. Like all output iterators, an ostream_iterator cannot be read from. If you construct the iterator with both an ostream and char*, the char* will be written to the ostream after each new value is written. This is useful in delimiting output with spaces or new lines. For example, to copy a series of words from standard input to standard output and convert multiple spaces between them to a single space, you could use copy (istream_iterator< string >(cin), istream_ iterator< string >(), ostream_ iterator< string >(cout,' ');.

Insert Iterators

When making a call like copy(first, last, result), you normally expect the sequence starting with result and continuing to result + last --first to be overwritten. Sometimes, however, you want to insert a new sequence into another sequence, without having to do a series of copies into a new structure. Insert iterators facilitate this.

When you increment an insert iterator, a new element is inserted in the sequence. The iterator is set to this new element, rather than the element that would have been next in the sequence. There are three types of insert iterators, and each one inserts the new element in a different place: front_ insert_iterator adds it to the beginning of the container; back_insert_iterator adds it to the end of the container; and insert_iterator adds the new element where the iterator points to a container.

Rather than having to create instances of these classes explicitly using constructors, the library provides template functions to create them. The functions front_ inserter and back_inserter take the container and return a front_insert_iterator and a back_insert_iterator, respectively, on that container. For example, given a container x, front_inserter(x) returns an iterator that will insert new elements at the beginning of x.

For inserting in the middle of a structure, inserter takes two arguments: the container itself, and an iterator into the container describing where to insert. For example, copy(front, back, inserter(x,result)) will take the sequence from front to back and insert it into container x starting at result.

Reverse Iterators

Sometimes you want to move backwards through a container. You can always do this explicitly using the--operator (as long as you're using a bidirectional or random access iterator), but this requires changing all of the code that traverses the container in order to change directions. An easier way is simply to create a "reverse iterator"--one that goes "backwards" with each ++ and "forwards" with each--. The C++ library provides two template classes of reverse iterators, reverse_ bidirectional_ iterator and reverse_iterator. The latter assumes that you are reversing a random-access iterator.

The STL containers that can support reverse iterators have special member functions to create them. rbegin, for example, returns a reverse iterator pointing to the end of a sequence (which is the beginning if you want to traverse the sequence in reverse); rend returns a "past-the-end" reverse iterator (which actually points just before the first element). To create a for loop that goes all the way through some vector v in reverse order, you can use for (r = v.rbegin(); r != v.rend(); ++r) {}.

In the end, all these iterators are only useful as the interface for manipulating our data structures with generic algorithms. I'll explore this algorithm more in the next article in this series.

Example 1: (a) Function that finds the first occurrence of a given character in an array of characters using only pointers to the beginning and end of the array; (b) generalizing this function to work with almost any sort of array by turning it into a template function; (c) this resembles, yet is incompatible with, (a).

(a)
char* find_char(char* first, char* last, const char& value)
{
  while (first != last && *first != value) ++first;
  return first;
}

(b)
template< class T >
T* find_in_array(T* first, T* last, const T& value)
{
  while (first != last && *first != value) ++first;
  return first;
}

(c)
template <class T>
struct list_node
{
  list_node* next;
  T          item;
};
template <class T>
T find_in_list(list_node<T>* first, list_node<T>* last, const T&

value) { while (first != last && first->item != value) first = first- >next; return first; }

Example 2: Defining list nodes.

template <class T>
class list_node {

public:
  list_node() : next(0) {}
  list_node(list_node< T >& ln) : next(ln.next), item(ln.item) 
      {}
  ~list_node() { delete next; }
  list_node<T>& operator=(const list_node<T>& ln) {
    if (ln != this) {
      delete next;
      next = ln.next;
      item = ln.item;
    }
    return *this;
  }
  bool operator==(list_node<T> ln) { return (item==ln.item &&
                                            next==ln.next); }
  bool operator!=(list_node<T> ln) { return (item!=ln.item ||
                                            next!=ln.next); }

  T& operator*() { return item; }

  list_node<T> operator++() { return *(next = next->next); }
  list_node<T> operator++(int) { list_node<T> tmp = *this;
                                 ++*this; return tmp; }

private:
  list_node<T>* next;
  T              item;
};

template< class T >
list_node< T > find_in_list(list_node< T > first, list_node< T > last,
                            const T& value)
{
  while (first != last && *first != value) ++first;
  return first;
}

Example 3: Substituting a template parameter for the list_node class and T* pointers.

template< class I, class T >
I find(I first, I last, const T& 

value) { while (first != last && *first !=

value) ++first; return first; }

Example 4: copy can be defined in a way very similar to the generic find.

     template <class SourceIterator, class DestinationIterator>
     DestinationIterator copy(SourceIterator first, SourceIterator last,
                             DestinationIterator result) {
         while (first != last) *result++ = *first++;
         return result;
     }

Example 5: (a) The HP remove algorithm; (b) remove_copy.

(a)
template <class ForwardIterator, class T>
ForwardIterator remove(ForwardIterator first, ForwardIterator last, const T&

value) { first = find(first, last, value); ForwardIterator next = first; return first == last ? first : remove_copy(++next, last, first, value); } (b) template <class InputIterator, class OutputIterator, class T> OutputIterator remove_copy(InputIterator first, InputIterator last,OutputIterator

result, const T& value) { while (first != last) { if (*first != value) *result++ = *first; ++first; } return result; }

Example 6: reverse_copy.

     template <class BidirectionalIterator, class OutputIterator>
     OutputIterator reverse_copy(BidirectionalIterator first,
                                BidirectionalIterator last,
                                OutputIterator result) {
         while (first != last) *result++ = *--last;
         return result;
     }

Listing One

#include<string.h>
#include<vector.h>
#include<iostream.h>
void main()
{
  if (argc != 1) throw("usage: wc\n");
  vector< string, less< string > > words;
  copy(istream_iterator< string >(cin),
       istream_iterator< string >(),
       inserter(words, words.end()));
  cout << "Number of words: " << words.size() << endl;
}

Listing Two

#include<string.h>
#include<set.h>
#include<iostream.h>

void main() 
{
  if (argc != 1) throw("usage: uwc\n");

  set< string, less< string > > words;
  copy(istream_iterator< string >(cin),
       istream_iterator< string >(),
       inserter(words, words.end()));
  cout << "Number of unique words: " << words.size() << endl;
}