P.J. Plauger is senior editor of C/C++ Users Journal. He is Convener of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Draft Standard C++ Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.
Introduction
Last month, I described the current state of the workhorse string class in the draft Standard C++ library. (See "Standard C/C++: The Header <string>," CUJ, July 1995.) I discussed at length how this important class has changed repeatedly over the past couple of years. Various forces have pulled it in different directions, each adding its own bit of complexity. The result today is a template class, called basic_string, with three class parameters and over a hundred member functions. Several of those are template member functions, which still have next to no support by commercially available compilers.
The unfortunate result is that only a few implementations of the header <string> currently exist. All of these are snapshots of a rapidly moving target. None exactly reflect the string class as of today, if only because it can't be compiled. An implementor can compromise, as I have done here, but that's only part of the problem. With significant changes coming every four months, it's hard for a book or a software release to stay current.
Even this essay can do little better. By the time you read these words, the standards committees X3J16 and WG21 will have met once more. They will almost certainly make still more changes in this area. As I write, second-round balloting is currently under way within ISO to approve the current draft as the C++ Standard. Nominally, changes can be made only to fix problems pointed out during the review process that accompanies the ballot. But given the current state of the draft, and recent committee activity, you can expect more change than that process normally implies.
I'm particularly concerned about strings because they're so widely used within the draft Standard C++ library. As I indicated last month, you can't even compile the classic "Hello world" program (using C++-style inserters, at least) without including <string>. So the stability and usefulness of this header affects practically all future users of Standard C++. It doesn't help that the public review process is going on while the community is largely ignorant of the scope and content of the existing string class. You can't adequately critique software that you can't even execute.
I published The Draft Standard C++ Library (Prentice-Hall, 1995) last year partly to help educate the community about what's in the emerging library. I fully expected changes to occur after that book was published (thus the use of the word "draft" in the title), but I was unprepared for the sheer volume of change that has since occurred. You can learn a lot that's still relevant about the header <string> by reading that book. The underlying design has simply been buried under several layers of templatization and other improvements. But I won't pretend that the book is anywhere near the last word on string classes in C++.
For most of the last two years, I've devoted this column to a detailed walk-through of the draft Standard C++ library. That agenda calls for me to present at this point the implementation of <string> I described in my book, with a blow-by-blow of how it works. It hardly makes sense to do so for such an embattled topic as strings, however. So I won't.
Instead, I've chosen to present a fairly recent version of template class basic_string with a minimum of commentary. I've left out a few more recent improvements, and quite a bit of the superstructure that template classes seem to demand. The idea is to show some of the complexity mandated by the current requirements for the string class, without overwhelming the presentation completely with details that are hard to explain.
Parameters and Traits
Template class basic_string has three parameters, which I tend to write as:
template<class _E, class _T, class _A> class basic_string;
(You need the underscores and capital letters, in a serious implementation, to protect the parameter names from being hijacked by user-defined macros.) These three classes are:
- _E, the type of a "character" element, such as char or wchar_t
- _T, the "traits," or assorted useful attributes, associated with such an element
- _A, the type of "allocator" that allocates and frees storage for the actual character sequences controlled by basic_string objects
The first parameter is the most obvious one. The third is a borrowing from the Standard Template Library (STL), which uses allocators in all its container template classes. A recent change requires that strings and all STL container objects be constructed with an actual allocator object. The default allocator supplied with the draft Standard C++ library requires no storage and hence makes no use of this object. But you might want to define your own allocator that, say, manages a private heap for one or more of the strings you manipulate.
I'll defer most discussion of allocators to a future essay on STL. For now, I'll merely observe that they too are subject to continued tinkering. The current standardized version makes heavy use of template member classes and functions, and hence is as unimplementable as basic_string itself. My workaround for the nonce is to implement allocators much as in STL. Several macros hide stuff that's subject to change. (The change can be moving to a compiler that implements all the new features, or getting the committees to agree to revert to a simpler form of allocators.)
The second parameter, describing character traits, is modeled after allocators. A single class encloses an assortment of information related to the handling of characters, at least when they're managed as part of varying-length sequences called strings. The default traits class supplied with the draft Standard C++ library is a template class called string_char_traits.
Listing 1 shows how the library might specialize this template class for strings of type char. As you can see, it supplies static member functions for all the elemental operations a string class might have to perform. The string class, in turn, must be religious about calling the traits static member functions to manipulate character sequences. (It must be equally religious about using its supplied allocator to allocate and free storage for the character sequences.) That way, you can trot up your own traits class, instantiate basic_string to make use of it, and get a string class with the tailored behavior you desire.
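To make the idea of a tailored traits class concrete, here is a minimal sketch. It is not Listing 1 and it does not use string_char_traits. The names plain_traits, nocase_traits, and compare_seq are hypothetical, and the member names eq, lt, and length are assumptions of this sketch. The point is only that code which touches characters solely through static traits members changes behavior when you hand it a different traits class.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical traits class: ordinary comparison of char elements.
struct plain_traits {
    static bool eq(char x, char y) { return x == y; }
    static bool lt(char x, char y)
        { return static_cast<unsigned char>(x) < static_cast<unsigned char>(y); }
    static std::size_t length(const char *s)
        { std::size_t n = 0; while (s[n] != '\0') ++n; return n; }
};

// Hypothetical traits class: comparison that ignores letter case.
struct nocase_traits {
    static int fold(char c)
        { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char x, char y) { return fold(x) == fold(y); }
    static bool lt(char x, char y) { return fold(x) < fold(y); }
    static std::size_t length(const char *s)
        { std::size_t n = 0; while (s[n] != '\0') ++n; return n; }
};

// Compare two NUL-terminated sequences using only the static members
// of the traits class T. Returns a negative, zero, or positive value.
template<class T>
int compare_seq(const char *a, const char *b)
{
    const std::size_t na = T::length(a);
    const std::size_t nb = T::length(b);
    const std::size_t n = na < nb ? na : nb;
    for (std::size_t i = 0; i < n; ++i) {
        if (T::lt(a[i], b[i])) return -1;
        if (T::lt(b[i], a[i])) return 1;
    }
    return na < nb ? -1 : (nb < na ? 1 : 0);
}
```

With these definitions, compare_seq<nocase_traits>("Hello", "hELLO") yields zero, while compare_seq<plain_traits>("Hello", "hELLO") does not. Both traits classes have only static members, so neither needs an object to be created or stored.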
Note, by the way, that the traits supply no writable storage. Hence, there's no need for a traits object to be stored within a string.
Template Class basic_string
Now for the main event, Listing 2 shows one way to implement template class basic_string. I emphasize again that it's not the absolute latest word on the subject, for reasons cited above. It also lacks one or two features that a more professional implementation might desire. But I know from experience that it basically works, modulo transcription errors. And it basically meets the requirements of the draft C++ Standard.
Several funny macros hide implementation compromises, as I mentioned above. In particular, the uncertain form of allocators is hidden behind the macros _ALLOCATOR, _ALLOCATE, _DEALLOCATE, _MAX_SIZE, _PTR_TYPE, and SIZ_TYPE. A practical implementation should probably also hide the default values of the template class parameters. Few commercial compilers support this feature as of today.
You can see two other macros that deal with variations among current C++ compilers. _HAS_MEMBER_TEMPLATES evaluates to nonzero only for compilers that support member templates within classes. Absent this feature, the code supplies an alternative member function that does part of the job. Similarly, _HAS_STATIC_MEMBER_INIT evaluates to nonzero only for compilers that permit the initialization of static const members, a fairly recent enhancement to the C++ language.
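The following sketch shows the general shape of such a compiler switch. It is not taken from Listing 2. The class char_buffer is hypothetical, and the macro is spelled HAS_MEMBER_TEMPLATES, without the leading underscore, because names of that reserved form belong to the implementation rather than to user code. When the macro is nonzero the class offers a member template that accepts any iterator pair. Otherwise it falls back to a plain member function that handles only pointers to char.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical configuration macro; a library would set it per compiler.
#ifndef HAS_MEMBER_TEMPLATES
#define HAS_MEMBER_TEMPLATES 1
#endif

// Hypothetical class, used only to illustrate the conditional member.
class char_buffer {
public:
#if HAS_MEMBER_TEMPLATES
    // Full version: appends the elements of any iterator range.
    template<class It>
    char_buffer& append(It first, It last)
    {
        for (; first != last; ++first)
            data_.push_back(*first);
        return *this;
    }
#else
    // Fallback: does part of the job, for pointers to char only.
    char_buffer& append(const char *first, const char *last)
    {
        for (; first != last; ++first)
            data_.push_back(*first);
        return *this;
    }
#endif
    std::size_t size() const { return data_.size(); }
    char at(std::size_t i) const { return data_[i]; }
private:
    std::vector<char> data_;
};
```

Code that appends from a pair of const char pointers compiles unchanged under either setting. Only callers that pass some other kind of iterator notice the difference.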
The template class defines several secret protected member functions. Two of the most important ones are:
- _Grow, which alters the storage reserved for the character sequence
- _Tidy, which initializes (_Tidy()) the member objects at construction time or discards any character sequence (_Tidy(true)) and reinitializes the member objects
The code also makes extensive use of two functions which are not members of class basic_string:
- _Xlen, which reports a length error
- _Xran, which reports a range error
I discussed exceptions associated with length and range errors in an earlier essay. (See "Standard C: The Header <exception>," CUJ, Feb. 1994.)
The constant MIN_SIZE is both a minimum size for character-sequence storage and a minimum increment for adding more. The function _Grow uses this parameter to round up requests for more storage, in the hopes of minimizing reallocations. It also uses the parameter trim as an indication that shrinking storage may be a good idea. The function is otherwise reluctant to do so.
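A rough sketch of such a growth policy follows. It is not the code of _Grow from Listing 2. The function next_capacity is hypothetical, the value 32 for MIN_SIZE is an assumption chosen only for illustration, and rounding up to a multiple of MIN_SIZE is one plausible reading of "round up," not necessarily the one the listing uses.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value; the text does not say what the real constant is.
const std::size_t MIN_SIZE = 32;

// Hypothetical helper: given the capacity currently reserved and the
// capacity wanted, return the capacity that should be reserved.
std::size_t next_capacity(std::size_t current, std::size_t wanted, bool trim)
{
    // Reluctant to shrink: keep the existing storage unless asked to trim.
    if (wanted <= current && !trim)
        return current;

    // Round the request up to a multiple of MIN_SIZE, so MIN_SIZE acts
    // as both the minimum size and the minimum increment.
    std::size_t rounded = (wanted + MIN_SIZE - 1) / MIN_SIZE * MIN_SIZE;
    return rounded < MIN_SIZE ? MIN_SIZE : rounded;
}
```

Under these assumptions a request for 5 elements reserves 32, a request for 33 reserves 64, and a later request for 5 leaves the 64 in place unless trim is true.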
Many member functions for class string come in groups, as I described last month. In this implementation, you will usually find that one member of the group does all the work. The others merely reformat their arguments and call the workhorse member.
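The pattern can be sketched in a few lines. The class tiny_text below is hypothetical and far simpler than basic_string; it only illustrates how a group of overloads can funnel into one workhorse member, here the version of append that takes a pointer and a count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical class illustrating a group of overloads with one workhorse.
class tiny_text {
public:
    // Workhorse: every other overload ends up here.
    // The source s must not point into this object's own storage.
    tiny_text& append(const char *s, std::size_t n)
    {
        data_.insert(data_.end(), s, s + n);
        return *this;
    }

    // Reformat a NUL-terminated sequence as pointer plus count.
    tiny_text& append(const char *s)
        { return append(s, std::strlen(s)); }

    // Reformat n copies of c as n one-element appends.
    tiny_text& append(std::size_t n, char c)
    {
        for (; n > 0; --n)
            append(&c, 1);
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    char at(std::size_t i) const { return data_[i]; }
private:
    std::vector<char> data_;
};
```

Keeping the real work in one place means a fix or an optimization in the workhorse benefits the whole group.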
The size of Listing 2 strains my space budget for this column to the limit, so I'll say little more about class basic_string. You can study it at your leisure, or simply marvel at its sheer size. In either case, it should give you a better understanding of the current requirements for strings in the draft Standard C++ library. I'll say no more.
This article is excerpted in part from P.J. Plauger, The Draft Standard C++ Library (Englewood Cliffs, N.J.: Prentice-Hall, 1995).