December 1999/Standard C/C++

C/C++ Contributing Editors

Standard C/C++: Frequently Answered Questions: STL

P. J. Plauger

Here are some of the things you might want to know about using STL with Standard C++, but maybe haven't discovered the need to ask.

Last month, I answered a handful of the questions about the C++ Standard that I've seen pop up most often on the newsgroups. (See "Standard C/C++: Frequently Answered Questions," CUJ, November 1999.) Specifically, I monitor:
comp.lang.c
comp.lang.c++
comp.std.c
comp.std.c++
and the moderated versions of the above, as appropriate. I also monitor a couple of the Microsoft VC++ newsgroups, but they're less relevant here.

Every newsgroup, near as no matter, has its associated Frequently Asked Questions (or FAQ) file. Here is where you go to get the common lore. A wise newbie will first read the FAQ file before posting a question, lest a lower-case version of the wrath of Jehovah descend from the old hands. My limited experience as a teacher has taught me how easy it is to lump all students together. You find yourself getting mad at student number 50 for asking the same question as the previous 49. But the techies who camp on newsgroups often lack the same genteel tolerance as us semi-pros.

So how does something become a FAQ? Obviously, by being asked frequently in the past. A tougher question is, where does the answer come from? Less obviously, by being put forth as the same answer frequently enough in the past. An FAQ in waiting often unearths a certain amount of disagreement. The eventual answer may be argued out over weeks or months (yes, they do go on, and on, and on) before something resembling consensus arises. Only then does a frequently asked question qualify for the FAQ file, because the self-appointed experts think they now know how to answer it.

What I supplied last month was a little different. I put forth some answers in waiting, as it were. The questions occur frequently enough, to be sure, and they are frequently answered. But the answers have not always settled down. So I took the liberty of promoting my own answers, as a step toward turning frequently answered questions into FAQs.

I soon realized I had sketched the neat little top of a rather larger iceberg. A bit more review of past postings, and email to Dinkumware support, revealed a longer list of questions of comparable importance to the first installment. So I continue this month with a grouping that focuses on the Standard Template Library (or STL), at least as it manifests itself in the C++ Standard.

Q: What's the difference (if any) between STL and the Standard C++ library?

A: Much more than you might think. The two terms are often confused, to the detriment of both. Here is a brief bit of history, to help you distinguish the two.

The Standard Template Library is a collection of C++ templates developed at Hewlett-Packard Labs, primarily by Alex Stepanov (now at Silicon Graphics) and Meng Lee. It is based heavily on earlier work by Stepanov and David Musser (then at Rensselaer Polytechnic Institute), in Ada and other languages.

STL consists of three major components: containers, algorithms, and iterators. The containers are a handful of template classes that implement the commonest data structures, such as lists, vectors, and binary trees. The algorithms are a huge assortment of template functions that perform common (and often hard to get right) operations on sequences of homogeneous elements. Sorting, searching, and summing are but a few of the many algorithms in STL. Finally, iterators are a generalization of pointers in C and C++ for stepping through sequences in various ways. Algorithms use iterators religiously, and containers supply iterators for stepping through their contents as a sequence. Thus, STL provides an enormous combination of possibilities. And it does so with code that is remarkably efficient and reusable.

Alex Stepanov presented this work to the C++ standards committee in late 1993. The draft C++ Standard was nominally about to freeze, but STL created such excitement that the committee reopened the patient, as it were. By summer of 1994, the committee had voted to incorporate STL largely unchanged as a single major addition to the library described in the draft C++ Standard.

Meanwhile, that library was evolving in other directions as well. It began as a few language-support functions, such as operator new and set_new_handler, plus a tidied-up version of the traditional iostreams classes. Along the way, it picked up a couple of string classes, one based on char and one based on wchar_t, plus complex arithmetic classes for the three floating-point types. It looked not unlike a typical C++ library shipped with cfront in the early 1990s. (See my book, The Draft Standard C++ Library, Prentice-Hall, 1995.)

But the advent of STL brought with it a flood of new proposals. The string classes became a template class, plus a "traits" class to tailor it for each character type. The complex classes likewise became a single template class (but without the traits class, unfortunately). Then all of iostreams got "templatized" as well. And into the middle of this got plunked an ambitious mechanism for maintaining multiple cultural locales (akin to the locales of C), each stored in separate objects. Moreover, locales were now collections of dozens of "facets," each a specialization of a template class, to which the programmer could add still more.

By the time the dust settled and the C++ Standard froze in late 1997, STL was but a small part of the entire Standard C++ library. Big as it is, STL is easily dwarfed by the locale machinery alone. So to refer to the entire Standard C++ library simply as STL is very much a misnomer. And to assume that something called STL will serve as a replacement for a full Standard C++ library is an invitation to disappointment.

Q: I've created a vector and populated it, as follows:
#include <vector>
. ....
std::vector<char> vec;
vec[0] = 'a';
vec[1] = 'b';
. ..
vec[25] = 'z';
But I'm getting all sorts of nasty traps when I try to run it. Sometimes the program doesn't die until the vector is destroyed. Why isn't my vector managing storage properly?

A: You're right to assume that the template containers manage storage for you, but this is not how you make it happen. Subscripting a vector is a way to access elements that already exist. The vector will not grow automatically to make a subscripted store valid. It is, in fact, always an error to access an element outside the half-open interval [0, vec.size()). You're getting traps because you're sometimes storing outside the space allocated for the array controlled by the vector object. The trap that occurs when the vector is destroyed is probably caused by damage to the pointers that link blocks of the heap. They often lie just outside the area set aside for the allocated object, so they're easily overwritten.

Your code creates a vector vec of length zero. To add elements, you can call any of several member functions, as in:
#include <vector>
. ....
std::vector<char> vec;

// insert element at end
vec.push_back('a');     

// also at end
vec.insert(end(), '*'); 

const char az[] =
   {"abcdefghijklmnopqrstuvwxyz"};

// change * to b-y
vec.replace(end() - 1,
   az + 1, az + 24);
vec.resize(26);     // pad with null
vec[25] = 'z';  // okay to alter existing null
All of the member functions push_back, insert, replace, and resize can change the length of the controlled sequence (which happens to be an array in the case of a vector container). But please note that the last assignment does not change the length; it works only because the previous expression statement ensures that vec[25] is a valid lvalue expression.

Note also that the vector grows on demand. It may well set aside no storage at all when you create it empty. Inserting the first element with vec.push_back('a') causes the container to allocate storage for an array of at least one element. If that's all the storage the container buys, then the very next expression statement causes the container to allocate storage for an array of at least two elements, copy over the defined element, and free the previous array storage. The insert member function can then grow the size of the vector by defining yet another array element.

This may sound like a lot of churning, but it's not as bad as it seems. A vector is at liberty to allocate more additional storage than it needs just to satisfy the current operation. In fact, a vector is obliged to grow by some factor (by doubling in size, for example) rather than just growing exactly as needed or by some fixed increment. That way, the amortized cost of reallocating storage and copying over existing elements contributes much less to the time complexity of growing an array.

The down side of this anticipatory growth is occasional wasted space, and a certain loss of control. A vector that grows by doubling, for example, can have up to 50 per cent wasted space. If it grows steadily over time, the average wasted space is around 25 per cent. Equally, you have only limited control over how big the allocated array is at any given time. It is easy to set aside extra space, to minimize churning as the vector grows. Just call vec.reserve(N) to set aside space for N elements. But it is essentially impossible to restrict a vector from reserving more space than you desire.

Q: Okay, so how do I trim a vector to size?

A: There's no direct way to do so, but you do have an out. It turns out that a vector constructor is obliged to construct an object of exactly the length you specify. The same is true of template class basic_string<Elem>, which maintains a string of "characters" of type Elem. Strings are the only other critters in the Standard C++ library with a member function reserve to set aside reserve space for growth. So here is a cute dance you can perform:
vec.swap(vector<int>(vec));
This expression statement first constructs a temporary vector by copying the contents of vec. The temporary is obliged to be trimmed to the current active size of vec. The member function swap then swaps the vec controlled sequence with the temporary. The temporary gets any excess fat and vec gets the copy trimmed to size. Finally, at the end of the expression statement, the temporary is destroyed. The old fat copy gets sent to its final reward.

Nobody pretends that this trick is elegant, but it is pretty much guaranteed to work on all implementations that conform to the C++ Standard. Since the C++ Standard failed to provide any more explicit mechanism, it is the only idiom known to do the job. Get used to it.

Q: When it is destroyed, will a vector<thing *> destroy all the objects of class thing pointed at by its elements?

A: You're right to assume that the template containers construct and destroy objects for you, but this is not how you make it happen. A vector of pointers stores pointers to objects that have been separately allocated. But it can store null pointers, pointers to objects allocated on the heap (which must eventually be explicitly freed), or pointers to objects allocated in automatic storage (which must never be explicitly freed). Since the vector cannot know the story behind each pointer, you can't expect it to do the right thing with each pointer. So it does nothing. It is thus up to you, the programmer, to do any necessary destroying or deleting before the vector itself is destroyed and the pointers go away.

If you create a vector<thing>, however, the vector knows what to do. Every element is constructed when it is first born. As you rearrange the vector, the contents of the element may change by assignment. But when the vector is destroyed, you can be sure that every element that was ever constructed is destroyed before storage for the vector evaporates.

It is worth noting that all these operations take place for a vector<thing *>, but that is generally not very exciting. To construct a pointer, the program simply has to store a pointer value in it. (It doesn't even have to do that for a pointer in automatic storage, which can be left uninitialized, but the template containers are a bit more tidy.) To destroy a pointer, the program simply abandons its storage. The same is true for all the other scalar types, all of which are predefined by the language. In this regard, C++ behaves much like C, which is rather more cavalier about creating, copying, and destroying objects.

What you need to remember is that the template containers in STL do indeed create, copy, and destroy objects, but all of these operations are shallow. That is, the containers just exercise constructors, assignment operators, and destructors for the elements themselves. If they don't deal with any deeper structure, neither does the container.

Q: Is STL thread safe?

A: The officious answer is, "Your question is out of line." The C++ Standard says nothing about multithreading, so STL has no obligations in this department one way or the other. An implementation can make any guarantees that it wants, or none at all, and still conform to the C++ Standard.

The evasive answer is, "That depends on what you mean by ‘thread safe.' " It is also the most honest answer, because people have put forth all sorts of criteria for writing thread-safe code.

But the most informative answer is, "More and more implementations of STL endeavor to be thread safe to a practical level." That practical level requires you to protect any container objects you share across multiple threads. If one thread is writing a container, you'd better block all others from reading or writing it at the same time. Otherwise, you can get race conditions that cause a thread to read inconsistent data from the container. Or worse, writes by multiple threads can leave the container in an inconsistent state.

I talked about thread safety at considerable length just over a year ago. (See "Standard C/C++: Exception Safety in STL," CUJ, October 1998.) But it is worth excerpting that discussion here. First, here are several principals to keep in mind when writing thread-safe code.

The Standard C++ library can't solve all your multithreading problems.

Even if every library call were thread safe, your program might still not be. A critical section in your code may well involve calls to several library functions. Locking each call doesn't make the critical section thread safe. And if you supply the thread lock to make the critical section safe, then the locks on the individual calls just slow things down.

The programming language Java provides an interesting contrast. It builds synchronization right into the language. It's very easy to lock a critical section in Java, or every method call for a class if you choose. It supplies library classes that are protected as necessary so that each library method call is thread safe. But you can still get race conditions and deadlocks in a Java program. You may not notice all the extra locking overhead in an interpreted environment that's slow to begin with, but you don't profit all that much from the overkill either.

The Standard C++ library should minimize the need for thread locking.

The best way to make code thread safe is to make multithreading irrelevant. A pure function such as abs, for example, should have nothing to worry about. It can do all its work in local storage, which is private to each thread. Indeed, you'd have to go out of your way to make problems with such a function.

One way to make problems is to maintain writable static objects in a class or function. The Standard C library has several notorious functions of this ilk. strtok, for example, stores state information between calls. It's rather hard to support two or more threads using strtok to advantage, at least not without adding some protective code. The Standard C++ library tries not to require such shared writable statics. A good implementation shouldn't add them where they are not strictly required.

Another, more subtle, way to make problems for multithreading is through the use of "mutable" member objects. You may think you're only reading the contents of an object, but in reality you're causing the internal state of the object to change. For example, an apparent read may cause a cache to be updated. The net effect is that two threads may innocently try to read a shared object, which is generally a safe operation that requires no protection. But the secret writes can interfere, possibly curdling one or more reads, or even leaving the object in an inconsistent state. A good implementation uses mutable member objects only where necessary, and supplies locked access to keep reads as safe as they appear to outsiders.

All locks should be encapsulated.

Exceptions happen (to paraphrase a popular bumper sticker). They can even happen in critical sections. When they do, the last thing you want is to leave a lock set forever. So you want to put the unlocking code somewhere that is sure to get executed even if an exception is thrown. That place, as you doubtless know, is in the destructor of a local object.

So the essential discipline of thread locking is to encapsulate the locking operations in a class. Put the locking code in the constructor and the unlocking code in the destructor. Every critical section is then written as a compound statement that looks like:
{_Lockit _Lock;   // lock this block
// critical section
}
The folks at Silicon Graphics have continued to enhance STL, as a freely available product, carrying on the excellent tradition begun at Hewlett-Packard. They advance a working definition of thread safety for STL containers that is eminently practical:

A container object can safely be read by two or more threads. (A container contains no unprotected mutable objects.)

A container object can safely be read by one thread while a different container object of the same type is being written by another thread. (A container class contains no unprotected sharable statics.)

A container object cannot be safely written by one thread and read or written by another. (It's up to you, the programmer, to protect against obvious conflicts.)

Put simply, these rules promise that the Standard C++ library (or at least the STL part) won't surprise you in a multithreaded program. Equally, it won't go out of its way to help you, lest that help be ill advised.

As an important example, Visual C++ as shipped by Microsoft is not completely thread safe. This is true of both V5.0 and V6.0, despite some effort on the part of Microsoft to fix the known problems. On the other hand, you can now obtain a complete set of fixes from:
http://www.dinkumware.com/vc_fixes.html
With these changes, VC++ is indeed thread safe, at least as defined by Silicon Graphics.

To find out whether your favorite compiler has a thread-safe STL, ask the folks who wrote it.

P.J. Plauger is Senior Editor of C/C++ Users Journal and President of Dinkumware, Ltd. He is the author of the Standard C++ Library shipped with Microsoft's Visual C++, v5.0. For eight years, he served as convener of the ISO C standards committee, WG14. He remains active on the C++ committee, J16. His latest books are The Draft Standard C++ Library, Programming on Purpose (three volumes), and Standard C (with Jim Brodie), all published by Prentice-Hall. You can reach him at pjp@plauger.com.