STL has a number of containers. Vectors are, in many ways, the easiest to understand.
Introduction
Last month, I gave a general introduction to the template container classes defined by the Standard Template Library (STL). (See "Standard C/C++: Containers," CUJ, December 1996.) I now begin a more detailed consideration of those containers. The tour begins with the header <vector>, whose sole purpose in life is to define template class vector.
A container is a class that manages a sequence. How it does so determines which operations on the container are easy and fast to perform, and which are hard and slow. STL offers an assortment of containers with different properties so you can choose the one with the most desirable attributes for a given application.
Template class vector is, in many ways, the simplest of all the containers. It stores a controlled sequence of length N as an array of N or more contiguous elements. The advantage of an array is well known to C and C++ programmers you can access any element in small and constant time. All you have to do is add an integer N to the pointer to the first element of the array X and you have a pointer to element X[N]. Using STL terminology, template class vector supports random-access iterators.
Table 1, reproduced from last month's installment, shows how template class vector stacks up against the other STL containers. Note that template class deque also promises constant-time access to the arbitrary element X[N]. What the table doesn't tell you, however, is that rather more computation is involved in locating that arbitrary element. Moreover, deque is obliged to store a pointer object for each element of the controlled sequence, for what that may be worth. (Actually, a deque object uses slightly more storage than this, but that's the topic of a later essay.) A vector object, by contrast, requires no additional storage per element. vector loses big time, however, when the size of the array it controls must increase. All the implementation can do is allocate storage for the new, larger array, copy over the previous contents, and add in the new contents. That operation clearly takes time proportional to N, the length of the new sequence. The other STL containers can always do better when prepending to the controlled sequence. Most can do better when inserting elements somewhere inside the sequence.
Appending to the controlled sequence is a slightly different matter. If a vector object always maintains an array of exactly N elements to hold a sequence of N elements, the work required to append an element is the same as that required to prepend an element. But that is not the case. Template class vector defines additional member functions that let you maintain a bit of head room:
- reserve lets you specify an array size larger than currently needed to store the controlled sequence.
- capacity lets you discover the actual current array size.
Moreover, erasing (deleting) elements does not oblige a vector object to shrink the array.
If the array has spare capacity, appending an element requires no reallocation of storage. The operation can take place in constant time. It thus makes sense, in some circumstances, to use a vector object as the underlying implementation of a stack, or last-in first-out queue. You must guess in advance how much spare capacity to provide, of course, or be willing to accept an occasional delay as the vector object reallocates storage. But a vector can nevertheless make a sensible stack.
Template class vector warrants comparison with one other creature. The draft Standard C++ library also defines template class string, as a generalization of numerous string classes from past practice. (See "Standard C/C++: The Header <string>," CUJ July 1995 and "Standard C/C++: Implementing <string>," CUJ August 1995.) Template class string was not part of STL as originally defined by Hewlett-Packard. It has nevertheless accreted, over the past year or two, all the properties required of an STL container.
One template parameter of class string is the element type it need not always be type char. And a string object is pretty much obliged to store its controlled sequence as an array. So the question arises, when does it make sense to use a string and when a vector? Here are a few guidelines:
- A string element cannot have a constructor or destructor. It must, in fact, be the kind of type you can declare in a C program. (The formal name for this category of type is POD, or "plain old data" type.) A vector element, by contrast, is any type that has a default constructor.
- Template class string requires a "traits" class as another of its template parameters. Traits specify critical aspects of how to move and compare sequences of elements, and how to read and write files of elements. The draft Standard C++ library supplies template class char_traits, with specializations for the element types char and wchar_t. If none of these meet your needs, you can and must define your own. A vector element, by contrast, makes no use of such traits.
- A string object can deliver up a null-terminated sequence. A vector object just deals with the sequence of elements you store in it.
- A string object can use copy-on-write semantics, which can improve performance considerably. Template class vector cannot, in general, perform such an optimization.
Template class string also defines oodles of additional member functions that do string-ish things to the controlled sequence. You could, of course, avoid using these additional member functions and just do vector-ish things. Usually, one of the considerations I listed above dictates which of the two containers makes more sense in a given application.
So in summary, you use template class vector when you need fast random access to elements of the controlled sequence. Growing a vector object is relatively expensive, unless you can anticipate growth well enough to reserve spare capacity in advance. For elements of type char or wchar_t, it might make more sense to use template class string instead.
Implementing <vector>
Listing 1 shows template class vector from the header <vector>. I remind you that the /// comments indicate compromises needed for current compiler technology. Features required by the draft C++ Standard but not supported by many current compilers are commented out with a leading ///. Replacement code is "commented in," and flagged for later removal, with a trailing ///.
Last month, I described the properties common to all containers. I won't describe here how vector meets its obvious obligations as a container. Instead, I focus on peculiarities of its implementation. The heart of a vector object, as usual, is the data it stores to represent the controlled sequence:
- First is a pointer to the allocated array that stores the controlled sequence. It is a null pointer if no array has been allocated.
- Last is a pointer just past the last element, if First is not null.
- End is a pointer just past the last element of the array, if First is not null.
An obvious invariant, then, is:
first == 0 || first <= last && last <= nextAll controlled storage is managed through the allocator object specified when you construct a vector object. (See "Standard C/C++: Allocators," CUJ June 1996.) Allocators were originally invented to allocate and free arrays of objects of some element type T. They have since been made far more ambitious, and complex. But template class vector is content with the most direct use of allocator functionality:
- It allocates arrays by calling allocator.allocate and frees them by calling allocator.deallocate.
- It destroys elements by calling allocator.destroy (but avoids allocator.construct in favor of template functions such as uninitialized_fill).
- It estimates the maximum permissible sequence length by calling allocator.max_size.
As you will see in the coming months, it doesn't get any simpler than this.
I commend Listing 1 to your careful study. Template class vector makes a good introduction to STL containers in general. It also makes a good prototype for any simple container you might choose to define yourself.
If you want to see a slightly more complicated container, however, take a look at Listing 2. It shows how to explicitly specialize vector for elements of type bool. A Boolean object requires only one bit, but type bool typically requires eight to 32 bits to represent. A large vector<bool> object can waste considerable storage, if you rely on the template to generate the necessary code. It thus makes sense to provide special handling for this useful case.
The trick here is to implement vector<bool> in terms of vector<_Vbase>, where _Vbase is an unsigned integer type that the implementation manipulates reasonably efficiently. The vector<bool> object can thus store _VBITS Boolean elements per element of the underlying sequence.
The real trickery comes in defining the member classes iterator and const_iterator. An iterator object must be able to pack and unpack bits from the underlying sequence on demand. Note that these iterators, and class vector<bool> itself, define the added member function flip, which flips (inverts the truth value of) one or all of the bits in question.
Once again, I leave it to you to study Listing 2 at leisure. Compare it to Listing 1 to highlight the differences. Just as template class vector reveals STL containers at their simplest, vector<bool> provides a simple introduction to some of the more clever games you can play with containers. o
P.J. Plauger is senior editor of C/C++ Users Journal. He is the author of the Standard C++ Library shipped with Microsoft's Visual C++, v4.2. For eight years, he served as convener of the ISO C standards committee, WG14. He remains active on the C++ committee, WG21. His latest books are The Draft Standard C++ Library, Programming on Purpose (three volumes), and Standard C (with Jim Brodie), all published by Prentice-Hall. You can reach him at pjp@plauger.com.