STL Iterators

Start with iterators to use STL effectively

Dan Zigmond

Dan is a senior software engineer with Wink Communications. He can be reached at dan.Zigmond@wink.com.

The Standard Template Library (STL) developed by Alexander Stepanov and Meng Lee has been widely discussed since it became part of the proposed ANSI/ISO C++ standard. With compiler vendors lagging behind standards committees, however, many programmers are only now beginning to use STL in day-to-day programming tasks.

In this article, I'll explore STL iterators. In future articles, I'll examine STL generic algorithms and go inside the standard STL implementation from HP, explaining some of the details of how generic algorithms and data structures can be specialized using iterator tags and adapters.

Iterators

Before discussing iterators, I'll take a step back and look at C pointers, which form the basis of the iterator abstraction.

When passing an array to a function in C, you usually pass a pointer. Using this pointer (and some information about the size), you can do anything you want with the array: Traverse it in order, pick out an arbitrary element in the middle, copy it, and even destroy it. Example 1(a), for instance, is a function that finds the first occurrence of a given character in an array of characters using only pointers to the beginning and end of the array (assuming you don't have strchr handy).

Pointers are used as the interface to the data structure in find_char, which in this case is an array of characters. C++ lets you generalize this function to work with almost any sort of array by turning it into a template function. All you have to do is substitute a template parameter for all occurrences of the hard-coded type char; see Example 1(b). This generalized version of find_char works on virtually any array because it depends only on your ability to traverse an array using a pointer and compare the contents of the array with the inequality operator. In other words, if you have an array of objects of type T, find_in_array depends on your ability to traverse that array using pointers of type T*. C guarantees this for C arrays.

The difficulty comes when you decide that arrays are not the perfect internal representation for whatever you're trying to do, and you decide to move to, say, a linked-list structure. A typical implementation might look like Example 1(c), which bears a resemblance to the original find, but the two are obviously incompatible. For simplicity, list_nodes are implemented as a simple struct, rather than a more complicated (and well encapsulated) class; other sorts of implementations would require some minor changes to the new find_in_list, but the structure would remain the same.

There's something unsatisfying about find_in_array and find_in_list. At some level, they are the same function; if you were to describe the algorithms in plain English, you would probably use the same words for both. The only difference is in the interface to the underlying data structure. Simple pointers are the interface to C arrays, while pointers to list_node structures are the interface to the linked lists. Pointers are incremented with the ++ operator, while the list nodes are incremented using the idiom node = node-->next. Pointers are dereferenced using the * operator, while the list nodes are dereferenced by extracting the item member. All this seems natural (almost all C and C++ linked lists work this way), yet completely arbitrary. Why can't we access lists and arrays the same way?

Example 2 is a slightly more complicated way of defining list nodes. Using this implementation, list_nodes act very much like C pointers, so you can now rewrite find_in_list. It's interesting that this find_in_list is virtually identical to the generalized find_in_array. The only difference is in the type of the pointer-like objects used to do the iteration through the data structure. In find_in_array, I use real pointers, while in find_in_list, I use the pointer-like list_node class.

These type-only differences are exactly what templates are designed for. Just as I generalized find_char to find_in_array by substituting the template parameter T for the hard-coded type char, you can generalize both find_in_array and find_in_list to a truly generic find function by substituting a template parameter for the list_node class and T* pointers; see Example 3. This raises the question: What is this new parameter (called I in the example)? In STL, classes that meet the requirements of this parameter are called "iterators"--objects that function more or less like pointers. Like conventional pointers, iterators are used as the interface to a data structure so that you can define functions like find that will work with a variety of different data representations. Because the pointer semantics used by find (forward iteration using ++, dereference using *, and comparison with !=) can be implemented efficiently for both arrays and linked lists, the generalized find function is as efficient as the more specialized find_in_array and find_in_list; once you defined the pointer-like list_node template class, making a single find algorithm work for both arrays and lists costs nothing. Functions like find that use iterators to encapsulate efficient algorithms independent of the data representation are called "generic algorithms."

Using Iterators

Iterators make a lot more sense when taken in context. If you haven't examined sample STL programs, I've included two that use iterators (and other parts of the standard libraries) to solve a basic problem: counting the number of words from standard input. The first program, wc (see Listing One, simply stores all the words in a temporary vector, then counts them. (The program isn't very smart. A better implementation would count them on the fly and not store them at all.) The second program, uwc (see Listing Two), is a bit more sophisticated; it stores all the words in a sorted set before counting them so that duplicate words are discarded.

The two programs look almost identical, even though they perform substantially different functions and use completely distinct internal data representations. This is because the interface to the copy algorithm is defined entirely with iterators. In fact, copy can be defined in a very similar way to the generic find; see Example 4. The only significant difference between copy and find is that copy deals conceptually with two types of iterators--one for the sequence it is copying from and the other for the sequence it is copying to. These two sequences need not be the same; they can be, but that is just a special case. To allow users to copy from, say, a linked list to a C array, you need to allow the two types of iterators to be distinct.

In wc, you are copying from an istream_iterator to a vector. In uwc, you are copying from an istream_iterator to a set. The two have very different results: Copying into a vector in this way preserves the original word order, while copying into a set removes, duplicates, and sorts alphabetically. In both of these cases, the call to copy looks the same because its interface is defined entirely in terms of iterators. (The output line is also identical because both vectors and sets define a size member function.) The ability to abstract our algorithms from our data structures is the real power of the STL.

The C++ library recognizes five major types of iterators. The reason to separate iterators into these types is that different algorithms ask different things of their data structures. The simple find algorithm, for example, makes only three demands of the class used as the iterator parameter: It must be incremented using ++, dereferenced using *, and compared using !=. Values in the underlying data structure are dereferenced only once, and the result of dereferencing is never used as an lvalue. In other words, you make a single pass through the structure, and read from it, but never write to it.

The copy algorithm is slightly more complicated, with its two distinct types of iterators. While SourceIterator need only perform the same operations as the iterator used in find, DestinationIterator uses the * operator to establish somewhere to write to, not to read from. But it also makes a single pass through the structure, dereferencing each location exactly once.

Input and Output Iterators

The C++ standard library calls the first of these iterators (find-style) an "input iterator." Technically, to be an input iterator, a type X must meet the following requirements:

X must have a constructor and destructor.

X must have a copy constructor.

It must be possible to compare two objects of type X using the == operator and the != operator. The result of these comparisons must be convertible to bool.

It must be possible to dereference an object of type X using the * operator, but the resulting type need not be an lvalue type.

For objects x of type X, ++x should return an object of type X& as long as x is dereferenceable. The result must be either dereferenceable or a value signifying past-the-end.

For objects x of type X, x++ should return an object convertible to the type const X&. The expression should be equivalent to { X tmp = x; ++x; return tmp; }.

For objects x of type X, it must be possible to dereference the object using *x++ if x is dereferenceable.

Any type that meets these criteria is an input iterator as far as the C++ library is concerned, regardless of how it meets the criteria.

Iterators like DestinationIterator of the copy algorithm are called "output iterators." A type X must satisfy a different set of requirements to be considered an output iterator:

X must have a constructor and destructor.
X must have a copy constructor.
For objects x of type X and objects t of type T where X is an iterator on a data structure containing objects of type T, it must be possible to alter the data structure using *x = t if x is dereferenceable and not past-the-end.
The ++ operator must be usable in both prefix and postfix fashion with objects of type X, as described for input iterators.
For objects x of type X and objects t of type T where X is an iterator on a data structure containing objects of type T, it must be possible to alter the data structure using *x++ = t if x is dereferenceable and not past-the-end.

For both input and output iterators, the assumption is that only one pass will be made through the underlying structure. This makes it possible to create iterators for I/O streams. It allows you to write programs like wc and uwc that work with standard input and output without having to write any special code. You can copy to or from one of these streams just by creating a special kind of input or output iterator (called an istream_iterator or an ostream_iterator) and using the copy algorithm in the usual way.

I/O streams demonstrate perfectly what input and output iterators are really for. An input iterator is for any sequence that, like an input stream, is intended to be read only once, in order. An output iterator is for any sequence that, like an output stream, is intended to be written to once and in consecutive order. The reason the C++ library defines different sorts of iterators is that different data structures can support only certain types of iterator operations efficiently and different algorithms rely on different sets of these operations. If a structure (like an input stream) can support an input iterator, you can use the find algorithm to search it. If a structure (like an output stream) can support an output iterator, you can use the copy algorithm to fill it.

Forward Iterators

If you think of the five types of iterators as forming a hierarchy, input/output iterators represent the bottom, the simplest of all the iterators. On the next level up are objects that effectively combine the properties of input and output iterators and remove the single-pass requirement. These are called "forward iterators." More formally, a type X is a forward iterator if it meets all of the requirements for both input and output iterators, plus:

It must be possible to assign one object of type X to the value of another using the = operator, with the result having the type X&.

For all objects x1 and x2 of type X, x1 == x2 implies *x1 == *x2 and ++x1 == ++x2.

Together, these requirements allow multipass algorithms to work as expected with forward iterators. A good example of a data structure that supports a forward iterator well is a singly linked list. You can pass through a list as many times as you want without seeing different results (unlike some streams), and you can both read to and write from most lists. A good example of an algorithm that requires forward iterators is remove, which searches for all instances of a given object in a sequence and takes them out. Example 5(a), taken from HP's original implementation of STL, presents this algorithm. The algorithm depends on find and remove_copy (which is like remove except that it stores the resulting sequence in another container, rather than overwriting the original). In essence, all remove does is search the sequence described by first and last for value. If there are no instances of value in the sequence, you're done. If there is at least one occurrence, you call remove_copy to take them out, putting the resulting value-free sequence on top of the existing sequence. Example 5(b) is the code for remove_copy.

remove_copy requires only input and output iterators, because it is always reading from one place and writing to another. remove itself requires a forward iterator, because it is essentially the special case of remove_copy in which both the source and destination are the same.

Example 1: (a) Function that finds the first occurrence of a given character in an array of characters using only pointers to the beginning and end of the array; (b) generalizing this function to work with almost any sort of array by turning it into a template function; (c) this resembles, yet is incompatible with, (a).

(a)
char* find_char(char* first, char* last, const char& value)
{
  while (first != last && *first != value) ++first;
  return first;
}

(b)
template< class T >
T* find_in_array(T* first, T* last, const T& value)
{
  while (first != last && *first != value) ++first;
  return first;
}

(c)
template <class T>
struct list_node
{
  list_node* next;
  T          item;
};
template <class T>
T find_in_list(list_node<T>* first, list_node<T>* last, const T&
              value)
{
  while (first != last && first->item != value) first = first-
     >next;
  return first;
}

Example 2: Defining list nodes.

template <class T>
class list_node {

public:
  list_node() : next(0) {}
  list_node(list_node< T >& ln) : next(ln.next), item(ln.item) 
      {}
  ~list_node() { delete next; }
  list_node<T>& operator=(const list_node<T>& ln) {
    if (ln != this) {
      delete next;
      next = ln.next;
      item = ln.item;
    }
    return *this;
  }
  bool operator==(list_node<T> ln) { return (item==ln.item &&
                                            next==ln.next); }
  bool operator!=(list_node<T> ln) { return (item!=ln.item ||
                                            next!=ln.next); }

  T& operator*() { return item; }

  list_node<T> operator++() { return *(next = next->next); }
  list_node<T> operator++(int) { list_node<T> tmp = *this;
                                 ++*this; return tmp; }

private:
  list_node<T>* next;
  T              item;
};

template< class T >
list_node< T > find_in_list(list_node< T > first, list_node< T > last,
                            const T& value)
{
  while (first != last && *first != value) ++first;
  return first;
}

Example 3: Substituting a template parameter for the list_node class and T* pointers.

template< class I, class T >
I find(I first, I last, const T& 
      value)
{
  while (first != last && *first != 

        value) ++first;
  return first;
}

Example 4: copy can be defined in a way very similar to the generic find.

     template <class SourceIterator, class DestinationIterator>
     DestinationIterator copy(SourceIterator first, SourceIterator last,
                             DestinationIterator result) {
         while (first != last) *result++ = *first++;
         return result;
     }

Example 5: (a) The HP remove algorithm; (b) remove_copy.

(a)
template <class ForwardIterator, class T>
ForwardIterator remove(ForwardIterator first, ForwardIterator last, const T&
                      value) {
    first = find(first, last, value);
    ForwardIterator next = first;
    return first == last ? first : remove_copy(++next, last, first, value);
}

(b)
template <class InputIterator, class OutputIterator, class T>
OutputIterator remove_copy(InputIterator first, InputIterator
                          last,OutputIterator

                          result, const T& value) {
    while (first != last) {
      if (*first != value) *result++ = *first;
      ++first;
    }
    return result;
}

Example 6: reverse_copy.

     template <class BidirectionalIterator, class OutputIterator>
     OutputIterator reverse_copy(BidirectionalIterator first,
                                BidirectionalIterator last,
                                OutputIterator result) {
         while (first != last) *result++ = *--last;
         return result;
     }

Listing One

#include<string.h>
#include<vector.h>
#include<iostream.h>
void main()
{
  if (argc != 1) throw("usage: wc\n");
  vector< string, less< string > > words;
  copy(istream_iterator< string >(cin),
       istream_iterator< string >(),
       inserter(words, words.end()));
  cout << "Number of words: " << words.size() << endl;
}

Listing Two

#include<string.h>
#include<set.h>
#include<iostream.h>

void main() 
{
  if (argc != 1) throw("usage: uwc\n");

  set< string, less< string > > words;
  copy(istream_iterator< string >(cin),
       istream_iterator< string >(),
       inserter(words, words.end()));
  cout << "Number of unique words: " << words.size() << endl;
}