After several years of reading and thinking about STL, listening to experts talk about it, and even writing the scaffolding to test all the STL code fragments from Scott Meyers' Effective STL [2], I came up with a half-way original, STL-related application of my own: a library of templates to assist in the initialization of containers from lists of arbitrary constant values. This project evolved into my freeware InitUtil library [3].
Dan Saks has described the process of learning how to program as akin to reliving the evolutionary stages of computer programming, from machine/assembly language through procedural and then object-oriented languages [4]. If some of us take longer to traverse this path than others, then I could be poster child for the slow learners. About half way (two to three iterations) into the development of a key template function of InitUtil, I realized that this code had begun life as a C-style function (with all the attendant pointer/memory-management complexity modern C++ is so good at eliminating) and that I was now in the midst of "STL-izing" it. I was, in a way, reliving the evolutionary step from pointer-based to STL/iterator-based programming. Boy, did Dan's observation hit home in that moment.
I suspect I may not be the only one experiencing this transition anxiety. In this article, I document the steps I went through before arriving at a "pure" STL-based implementation of a simple little template function named make_s.
I'm not smart enough to go from scratch directly to the best solution to a programming problem, skipping the intermediate stages. Plus, I'm seldom careful enough to produce any program over about five lines long that compiles and runs perfectly the first time. Thus, the InitUtil library is the result of the usual iterative development process.
It all began with the template function make_s. make_s parses a comma-delimited list of string values (passed as a single text parameter) and produces an STL-compatible container of std::string objects as its return value. A second Boolean parameter, with the default value of false, tells whether or not to preserve leading spaces in a string. The specific flavor of container desired is specified via explicit template specialization during instantiation. Some examples:
// Create a list of strings, // skip leading spaces [5]: list<string> ls = make_s<list<string> >( "this, is, a, test, of, the," "emergency, broadcast, system"); // Create a deque of strings, // preserve leading spaces: deque<string> dl = make_s<deque<string> >( "here, is another, one , with," "all , leading, spaces ," "preserved", true);(There was also a companion function template named make_i for parsing numeric rather than string values [6]. However, since the subject of this article is the evolution of the parsing algorithm for value_types of std::string, I'm only going to discuss the evolution of make_s. The algorithm I show in the final version of make_s does survive essentially intact throughout the naming/calling convention improvements subsequently made to InitUtil.)
The first issue I decided to excise was the NUL stuffing. While C pointers are dependent upon a specific value, NUL, to delimit the end of a string, the workings of past-the-end iterators in STL are such that the specific value, if any, beyond the last element of a sequence is immaterial. I began my next version of make_s by switching from the std::string constructor that accepts a C string to the one that accepts a pair of iterators. I had now eliminated the need to modify the source text via NUL stuffing.
At the same time, I noticed that the temporary std::string object s was only being used in two adjacent lines of code (Figure 1, lines 28-29). Rather than separately instantiating the string on one line and then using it on the next, I wrote one statement (Figure 2, line 28) combining the instantiation of a temporary string object (using the aforementioned two-iterator constructor) with the insertion of that object into its container. Along with eliminating a named variable, this approach also seems more likely to offer aggressive code optimizers the opportunity to earn their keep.
Having streamlined the parsing algorithm, I next turned my attention to my choice of data types. Even though the main loop's code appears quite clean while employing the "pointer as iterator" technique, the initialization and cleanup at the extremes of the function make it painfully obvious you're still dealing with raw pointers here. Does the use of these pointers actually offer any benefits? Well, the generated code is nice and short ... but so what?
Code size isn't everything. That's a lesson I recently learned well: part of my STL Error Decryptor package [8] is the Proxy CL program, a plug-in replacement for Microsoft Visual C's CL.EXE compiler driver. The original implementation, using C strings exclusively, resulted in a 40-KB executable. I had resisted switching over to using std::string because I was afraid of the potential code bloat the inclusion of std::string would cause. After lamenting to him about the many internal buffer overflows and related problems I'd been grappling with, Thomas Becker (a major contributor to the Proxy CL program) finally shamed me into dropping the C strings, and the subsequent version of the Proxy CL used std::string exclusively for text manipulation. This STL-ized version weighed in at around 200 KB, but I don't think anybody noticed. [9]. Even though 200 KB is over three times the entire system address space of the machines I'd cut my teeth on as an 8080 programmer, by today's standards a program of that size seems to be considered minuscule. In any case, not a single string-related problem ever arose within the Proxy CL program again. I deemed this a worthwhile tradeoff ... it was time to bid make_s's C strings farewell.
Fortunately, due to pointer/iterator duality, the logic of make_s's parsing loop remains essentially the same whether the variables are char pointers or iterators into a std::string. Whereas originally I had to compute the pointer to the end of the input text by adding the text length to the beginning pointer (Figure 2, line 15), std::string provides the past-the-end iterator I need directly via its end() member function. I take advantage of end() in the third version of make_s (Figure 3, line 19). At this point, it also finally dawned on me that there was no longer any need to manually replicate the input text. In fact, all the dynamic allocation code could have been stripped out back in version 2, and the only change I'd have had to make was to add the const qualifier to a few char pointer definitions. The fact I hadn't realized this just goes to show that to a C programmer, the association between text processing and dynamic allocation is deeply ingrained, indeed.
The third version of make_s still accepts a char * as a function parameter, but uses it only once to define the automatic std::string object (Figure 3, line 11) used for all subsequent processing. My original reason for leaving the in-text parameter as a char * in version 3 (rather than accepting a std::string and letting string's constructor handle actual parameters of type char *) was that a char * is cheaper to pass than a std::string ... but that would actually not be the case if the std::string were passed by reference. Even if the caller of make_s passes a char * as a parameter, it would work because the compiler would generate a temporary std::string object to be referenced by the formal parameter. There would even be a potential gain in efficiency, for if the caller maintained a single std::string object for use in multiple calls to make_s, a std::string construction would be avoided for every call. There would no longer any std::string construction within make_s, resulting in a net savings of one string constructor call for each use (beyond the first) of the same string in a call to the function. The penultimate version of make_s (Figure 4) thus accepts the text input parameter by reference-to-const-string, and that's the end of it: make_s has now been fully converted from using raw char pointers to using std::string.
There was just one more thing still troubling me. It was so subtle, that I thought of it once, forgot about it, and later remembered there was something else I could do, but not what it was. It took another week (in fact, until the morning of the day of writing this) before I remembered the issue: that pesky inner loop (Figure 4, lines 21-22). You're supposed to prefer the use of STL algorithms to explicit loops, right? So they say. As simple as this loop (to locate the first non-whitespace character) is, it's still a loop and thus might qualify for rewriting using a straight call to some appropriate STL algorithm. In fact it does, as illustrated in Figure 5 (lines 20-21) via use of the std::find_if algorithm, the std::not1 function adapter, and the std::ptr_fun pointer adapter for ordinary functions (to allow the application of isspace to the algorithm). Despite the perk of no longer needing to explicitly guard against the possibility of dereferencing an end iterator, does this conversion necessarily constitute an improvement? I'm not really sure. I'd have to characterize that as a religious issue. (But, I'd tend toward the non-STL solution. Call me a Luddite.)
[2] This source code archive may be downloaded from <www.bdsoft.com/resources/estlcode.html>.
[3] The complete container initialization library may be downloaded from <www.bdsoft.com/tools/initutil.html>. I say this package is "half-way" original because the structure of these template functions was first put forth by Musser et. al. in their book, STL Tutorial and Reference Guide, 2nd Edition (Addison-Wesley, 2001). InitUtil expands upon the make template presented in that book.
[4] Dan Saks. "Stepping Up to C++: Writing Your First Class," C Users Journal, March 1991.
[5] As Scott Meyers pointed out to me, specifying the value_type of the container shouldn't be necessary since make_s only creates containers of strings. However, that is an issue somewhat tangential to the point of this article; it and other design improvements resulting from pre-publication review (such as Robert Schwartz's suggestion to generalize the ',' delimiter between strings) have been incorporated into the distribution versions of InitUtil. As far as the versions of make_s shown here are concerned, what you see is what I wrote....
[6] Ultimately, through the use of partial template specialization, InitUtil evolved to the point where a single template function name, make_cont, would serve to initialize containers of both string and non-string value_types. This supercedes both make_i and make_s, even though the two cases are handled quite distinctly. The only compilers I own that can actually compile the latest version of InitUtil (version .92), however, are Comeau C++, gcc 3x, and Metrowerks CodeWarrior 8. Since many current compilers do not yet support partial template specialization and/or other ANSI/ISO C++ features that InitUtil v.92 depends upon, I'm continuing to distribute an alternate version that tolerates Microsoft Visual C++ 6 and .NET by dropping set/multiset support (so as not to require partial template specialization).
[7] push_back won't work for set and multiset, since those containers don't provide it. In the distribution versions of InitUtil, the insertion of values into a container is performed with the aid of a function object using push_back, and twin specializations (for set and multiset) both employing insert. Thus, all standard containers are supported orthogonally for the more Standard-compliant compilers (with the exception of map and multimap, which aren't currently supported).
[8] Leor Zolman. "An STL Error Message Decryptor for Visual C++," C/C++ Users Journal, July 2001, <www.cuj.com/articles/2001/0107/0107b/0107b.htm?topic=articles>.
[9] Of course, this wasn't an embedded system. As Dan Saks pointed out to me, such a jump in code size might not be as easily tolerated in that genre.