Programming with the Standard Template Library

Sage advice for coping with STL

Thomas Keffer

Tom is the founder and CEO of Rogue Wave Software. He has served on the C++ ANSI/ISO Committee and is actively involved in working to evolve C++ into an even more practical and elegant object-oriented language.

In terms of algorithms, data structures, internationalization, and language support, the Standard C++ Library could well be the single most powerflibrary ever incorporated into a standardized language. The Standard C++ Library includes an iostream facility, a locale facility, a templatized string class, a templatized class for representing complex numbers, a class for numerical arrays, support for memory management (new and delete), and language support (terminate(), unexpected(), and so on). It also includes one other important component--the Standard Template Library.

The Standard Template Library (STL) was developed by Alexander Stepanov and Meng Lee at HP Labs. In July of last year, HP proposed to the ANSI/ISO Standardization Committee that the STL be included as part of the Standard C++ Library. This addition was approved by the committee, and last August Hewlett-Packard offered a freely copyable version of the STL (available for anonymous FTP at butler.hpl.hp.com and mirror sites). Earlier this year, DDJ named Stepanov as a recipient of the "Excellence in Programming Award" for his work on the STL; see "Dr. Dobb's Journal Excellence in Programming Awards" (DDJ, March 1995).

The STL is a large set of templatized classes and has an unusual and elegant architecture. A key concept of the STL is that it separates data structures from algorithms. This prevents it from being very object oriented, but it also gives it unusual flexibility. While this library is currently known informally as "STL," it will eventually lose its distinct identity as it becomes subsumed by the rest of the Standard C++ Library. Eventually, it will all be known as the "Standard C++ Library."

Algorithms and the STL

Algorithms in the STL are written in generic terms, without regard for the data structure that might be used to hold the elements upon which they will operate. This allows you to add additional data structures that can immediately leverage existing algorithms. Conversely, you can add new algorithms that will work with any data structure. Of course, different data structures have different time/space characteristics, which, after all, is why more than one data structure is useful. Hence, for this approach to work, the time/space characteristics of both data structures and algorithms must be recorded and standardized. This is what the STL does, allowing structures and algorithms to be combined in highly predictable ways.

Algorithms can be generalized to work on any data type and any STL-compliant data structure. Consider, for example, the linear-search algorithm in Figure 1(a), which searches an array of integers for an element with a particular value. Figure 1(b) shows how you might use such a function to find the integer value 5 in a given array. Note that if the search fails, the algorithm will return a pointer to one past the end of the array (address a+20, in this case). While there is no element there, both C and C++ guarantee that this is always a valid address. You can't dereference such an address, but you can build a pointer that contains it.

Hence, this is a completely general algorithm for finding an element within a C-like array of like-typed elements. You can generalize the algorithm to work with any data type using templates as in Figure 1(c). This will work not only with integers, but with linear arrays of any type that supports an inequality operator; see Figure 1(d). Our templatized algorithm, find2(), requires:

That for a variable a of type T and a variable q of type pointer to T, the expression *q!=a be convertible to type bool.
That the second parameter be reachable from the first; that is, that you be able to continually increment the first pointer and eventually reach the second pointer.

In return, our algorithm promises to:

Return a pointer pos for which the expression *pos!=value evaluates False.
Return the value end if no such pointer can be found.
Take linear time.

This last promise states that as elements are added, the time taken for the algorithm to execute will expand linearly (as opposed to quadratically, logarithmically, and so on).

This is as precise a statement as we can make about the algorithm. Given types and values that satisfy the input requirements of the algorithm, you can make very strong statements about the output variables. These strong statements make it possible to combine types and STL algorithms in new and novel ways, while making strong guarantees about the outcome.

Generalizing an Algorithm

As general as our algorithm is, it can be made still more general. Right now, it will work only with linear arrays. But other kinds of linear searches are possible, through a linked list, for example. How might you accommodate these? The problem is that our first attempt at an algorithm advances to the next element by incrementing a pointer. To advance down a linked list, you need to chase a pointer.

You can generalize to any kind of linear search by introducing a kind of abstract pointer called an "iterator." The iterator is incremented by the familiar p++ notation, but the actual implementation might involve chasing a pointer or taking even more exotic actions. Figure 2 presents the most general kind of linear-search find algorithm, as provided in the STL.

Like the previous algorithm, Figure 2 has been templatized on type T, but it has also been templatized on the type of pointer. It performs a linear search between the element pointed to first and the element just before end, looking for value. If successful, it returns an iterator that points to the matching element. If unsuccessful, it returns an iterator equal to end. The actual type of the pointer parameterized by the type InputIterator pointer is left unspecified.

I find it useful to think of an iterator as a kind of pointer but, really, all they have in common with pointers is a similar interface. As a minimal requirement, you need to be able to apply a dereferencing operator (*p) to an iterator and be able to increment it (p++). These operators can be either the native operators for built-in pointer types or overloaded operators that you have supplied for nonnative types.

Remember, algorithms are written in terms of a "signature" on types. So long as your iterator offers the correct signature, it will work. For example, Figure 3(a) presents a linked-list class. Because a simple pointer type will not suffice for traversing this list, I also supply the list iterator shown in Figure 3(b).

Note that the iterator supports a dereferencing interface through an overloaded "*" operator. You can also increment the iterator through an overloaded "++" operator. Finally, an overloaded "!=" operator tests for inequality of iterators. With these supported interfaces, the list iterator is indistinguishable from a built-in pointer, as far as the find() algorithm is concerned.

Because the relationship between iterators and the data structure over which they traverse can be complicated, STL data structures supply simplified interfaces for the most common iterator values: the start of the structure (member function begin()) and one past the end of the structure (end()). The class in Figure 3(a) shows sample implementations of begin() and end(); Figure 3(b) shows the ListIterator used by these functions.

Finally, given a data structure type, it is useful to know the type of its corresponding iterator. This is supplied by a public typedef, also shown in Figure 3(a). Figure 4 shows how you use the results. Note how our find algorithm is completely comfortable using either built-in arrays or linked lists, of any type. Provided that we have included sensible iterators, it will also be just as efficient as any algorithm coded with a particular data structure in mind.

Many other algorithms are offered by the STL. For example, there is a binary-search algorithm for sorted sequences, as well as algorithms for sorting, merging, copying, and shuffling sequences.

STL Data Structures

So far, all we have looked at only one STL algorithm, with a hint of how a data structure might be written. While the focus of the STL is algorithms, it also includes a modest set of data structures; see Table 1. All data structures include an interface necessary to support the algorithms. Hence, the same algorithm can be used on many data structures. Conversely, the same data structure can be used by many algorithms. But, things cannot be combined willy-nilly. For example, many algorithms (sort comes to mind) require iterators more capable than the simple InputIterator examined earlier. For example, the iterator may have to support random indexing p[i], which would be insufferably slow with a linked list. For this reason, the iterator associated with the STL data-structure list does not support random indexing.

All the information you need for deciding which algorithms can be usefully combined with which data structures has been carefully recorded as part of the standardization process. However, it is up to you to look up this information (perhaps in online help pages) and to be aware of the hazards.

Limitations of the STL

While the STL is powerful, it has its limitations, the biggest being that it is not very object oriented. Object orientation depends on data and algorithms being combined under one interface. The whole premise of the STL is at odds with this: The STL separates data and algorithms, allowing their recombination in novel and useful ways. This exposes the programmer to a host of programming errors. For example, consider the code in Figure 5(a). If the ending iterator is not reachable from the starting iterator, that is, if they refer to different data structures (perhaps because the programmer accidentally passed in the wrong iterator), results are unpredictable. This example could lead to a program crash.

This is a danger of the STL. The algorithms bristle with pointer-like iterators, none of which have any relationship to each other that the compiler can check. The result is code that is less object oriented and more prone to errors that go undetected at compile time.

In Figure 5(b), we start by filling two list data structures, a and b, with ten elements, the numbers 0 through 9. At this point, the two lists are identical. We then use the STL remove algorithm to erase all instances having the value 5 from list a. We might expect the resultant data structure to have nine elements, but it still has ten. This is because, in general, the algorithm is handed only iterators. Given an iterator, there is no way for the algorithm to figure out into which data structure it points. Hence, the algorithm is unable to adjust the size of the resultant data structure. You must do this yourself. The correct answer is obtained by calling the special and (arguably) more object-oriented member function list<T>::remove(const T&), as in Figure 5(b), where it has been applied to list b.

Multithreading

The inability of an iterator to be bound to a specific data structure is a general problem and can lead to other limitations. For example, the STL cannot be made multithread safe. A multithread-safe data structure ensures proper synchronization between accessing threads. One way to do this, pioneered by the C++ Booch Components, is to have each thread, upon entry to a member function, lock the data structure, preventing its access by another thread. This lock is then released when the thread returns from the function.

Unfortunately, it is impossible to write a multithreaded STL algorithm that uses standard STL iterators. For example, suppose you wanted to find an element in a vector while locking it from access by another thread that could potentially modify the vector while the search is being performed. The find() function accepts InputIterators to do its work. InputIterators do not have any locking concept in their interface. One could lock the data structure to which an iterator refers, but given an STL iterator, there is no way to get a reference to the data structure into which it points.

Again, this problem arises because of the STL's approach of separating data from the algorithm. Because of this separation, there is no way for the algorithm to control state (a synchronization lock) that is not directly related to iterators.

Making the Best of the STL

With its flexible design, the STL is a powerful resource. However, for all its power, it can be quite complicated and error prone. To avoid errors, consider subclassing the STL data structures (vector<T>, list<T>, and so on) and putting the algorithms you most commonly use (such as find()) directly in the subclass interface. For example, you can create the class in Figure 6(a). With this approach, we put the algorithm right into the interface. The context of the search is fully known: The search will be done within the data structure. You don't have to break the vector apart to supply starting and ending iterators. There is no possibility of the ending iterator being unreachable from the starting iterator. Hence, the results are simpler and less error prone, as shown in Figure 6(b).

Because MyVector<T> publicly derives from its corresponding STL class, all of the iterator facilities will be available. It can be used anywhere an STL data structure can be used, allowing you the full use of the STL algorithms; see Figure 7.

This approach also allows you to offer a multithread-safe version. Note how this class does not publicly inherit from its STL equivalent (an implementation could, however, inherit privately). This is because STL iterators cannot be used safely in a multithread-safe environment. Hence, they have not been exposed. Instead, access must be gained through member functions, so they can ensure that a lock has been acquired. For example, the implementation of sort() would look like Figure 7(d).

Summary

I've presented some of the fundamental ideas behind the Standard Template Library and shown how its design can lead to great flexibility. The STL represents a breakthrough in the genericity of data structures and algorithms, but it can be tedious and tricky to use.

I believe the STL will eventually be most useful in the hands of component writers who will use its powerful facilities to build components with larger granularity. Internally, these components can take advantage of the algorithms and efficiencies of the STL, while offering a more object-oriented interface to their clients.

Figure 1: (a) Linear-search algorithm; (b) using find1() to locate the integer 5 in an array; (c) using templates to generalize the linear-search algorithm; (d) find2() will work for type that supports an inequality operator.

(a)
int* find1(int* first, int* end, const int& value)
{
  while(first != end && *first != value)
    first++;
  return first;
}

(b)
int a[20];
// ... fill the array "a" ...
int* pos = find1(a, a+20, 5);  // Searches a[0] through a[19]
                             // Returns &a[20] on failure
assert( pos==a+20 || *pos == 5 );

(c)
template <class T> T* find2(T* first, T* end, const T& value)
{
  while(first != end && *first != value)
    first++;
  return first;
}

(d)
class Foo { ... };
bool operator!=(const Foo&, const Foo&);
Foo fa[20];
// ... fill the array "fa" ...
Foo f;
Foo* pos = find2(fa, fa+20, f);
assert( pos==fa+20 || !(*pos!=f) );

Figure 2: General linear-search algorithm as presented in the STL.

template <class InputIterator, class T> InputIterator
find(InputIterator first, InputIterator end, const T& value)
{
  while(first != end && *first != value)
    first++;
  return first;
}

Figure 3: (a) A template class for linked lists; (b) a list-iterator class.

(a)
// Forward declaration:
template <class T> class ListIterator<T>;
template <class T> struct Link
{
  Link<T>* next_;
  T val_;
  Link(T p) : val_(p), next_(0) {;}
};
template <class T> class List
{
  Link<T>* head_;
public:
  typedef ListIterator<T> iterator;
  List() : head_(0) {;}
  void addLink(Link<T>* link)
    {link->next_=head_; head_=link;}
  ListIterator<T> begin() const
    {return ListIterator<T>(head_);}
  ListIterator<T> end() const
    {return ListIterator<T>(0);}
  friend class ListIterator<T>;
  };

(b)
template <class T> class ListIterator
{
  Link<T>* current_;
public:
  ListIterator(Link<T>* link) : current_(link) {;}
  T operator*()
    {return current_->val_;}
  void operator++(int)
    { current_ = current_->next_; }
  int operator!=(const ListIterator<T>& it)
    {return current_!=it.current_;}
};

Figure 4: Using an iterator's type information.

int main()
{
  List<int> list;
  list.addLink(new Link<int>(1));
  list.addLink(new Link<int>(2));
  list.addLink(new Link<int>(3));
  list.addLink(new Link<int>(4));
  List<int>::iterator pos  = find(list.begin(), list.end(),2);
  assert(*pos == 2);
  List<int>::iterator pos2 = find(list.begin(), list.end(),6);
  assert(!(pos2 != list.end()));
  return 0;
}

Figure 5: STL's separation of data structures from algorithms exposes the programmer to a number of potential programming errors.
(a)
vector<int> v(10), x(10);
// ...
// Oops!
vector<int>::const_iteratorposition =
  find(v.begin(), x.end(), 5);
assert(*position == 5);

(b)
void main()
{
  list<int> a, b;
  for(int i=0; i<10; i++)
  {
    a.push_front(i);
    b.push_front(i);
  }
  remove(a.begin(), a.end(), 5);
  assert(a.size()==9); // fails
  b.remove(5);
  assert(b.size()==9);  // OK
}

Figure 6: (a) Subclassing an STL data structure, such as vector <T>, and encapsulating STL algorithms such as find(); (b) the result is a much simpler class interface.

(a)
template <class T> class MyVector : public vector<T>
{
public:
size_t index(const T& val) const;
void sort();
};
template <class T> size_t
MyVector<T>::index(const T& val) const
{
  const_iterator pos = ::find(begin(), end(), val);
  return pos==end() ? (size_t)(-1) : pos-begin();
}
template <class T> void MyVector<T>::sort()
{
  sort(begin(), end());
}

(b)
MyVector<int> v(10);
size_t position = v.index(5);   // Much simpler interface
assert(v[position] == 5);

Figure 7: (a) Because MyVector<T> publicly derives from its corresponding STL class, it can be used anywhere an STL data structure can be used; (b) this includes being passed by reference where an STL structure is expected; (c) this approach also allows a multithread-safe version; (d) implementing a multithread-safe sort().

(a)
MyVector<int> v(10);
// Sort only the first 7 elements, using the STL sort
// algorithm:
sort(v.begin(), v.begin()+7);

(b)
void foo(const vector<int>&);
void main() {
  MyVector<int> v(10);
  foo(v);   // OK
}

(c)
template <class T, class Lock> class MyVector_MT {
private:
  Lock lock_;
  MyVector<T> vec_;
public:
  ...
size_t index(const T&) const;
void sort();
};

(d)
template <class T, class Lock> void MyVector_MT::sort()
{
  lock_.lock();     // Acquire lock
  vec_.sort();      // Call non MT-hot algorithm
  lock_.unlock();   // Release lock
}

Table 1: STL data structures.


Data         Description
Structure
vector       Linear sequence.
list         Doubly linked list.
deque        Linear sequence, with constant time prepend and append.
set          Associative array of unique keys.
multiset     Associative array of (possibly) nonunique keys.
map          Associative array of unique keys and associated values.
multimap     Associative array of (possibly) nonunique keys and values.