Non-Intrusive Container Initialization

Thorsten Ottosen

Introduction

It is often necessary to initialize data structures with constant data, and C++ is not by default as elegant as other languages in this respect. For example, in Perl one can say

@date = (8, 24, 70);
%fruit = ('apples', 3, 'oranges', 6);

to assign to an array or map, respectively. The array syntax is almost identical in C++, but since built-in arrays are not particularly handy, one will usually stuff data into an STL container [1].

Without library support, the best way to fill a container with constant data is to use an array and then use a constructor that takes an iterator range:

int ai[] = {1,2,3};
vector<int> v( ai, end( ai ) );
typedef pair<string, int> P;
P ap[] = { P("horse",1),
           P("cow",2) };
map<string,int> m( ap, end( ap ) );

where end is a user-supplied macro that returns a pointer one past the end of the array. The technique is a little more clumsy when we deal with initialization of a map. The goal of this article is to show how you can get this syntax instead:

vector<int> v;
v += 1,2,3;
map<string,int> m;
assign( m ) ("horse",1) ("cow",2);

by overloading operator+=, operator,, and operator(). It would have been nice to be able to use operator+= for maps, but that would remove the ability to use operator(). It might not seem like a big win, but initialization parts of our programs will be smaller and easier to maintain. The method is not just for STL containers, and I will show how easy it is to fit your favorite container into the framework. (Notice that technically this new syntax is not initialization in its strict meaning of construction; it simply means the initial setup of a container that must take place before the container is used.)

What is Wrong with Array-Initialization?

The goal of this framework is to provide the easiest and safest way to initialization in C++. Run-time overhead is not a great concern, but for most node-based containers, the two methods will give similar results anyway. To make array initialization a little handier I used end which could be a simple macro

#define end( a ) \
a + sizeof(a)/sizeof(a[0])

A better approach would be to use the end function that comes with array_traits in Boost [2]. The array_traits is safer because it is type-safe and avoids pitfalls with pointers to arrays (which can give false results). The clever array_traits template can deduce the size of the array and hence its end; unfortunately, the compiler must support class template partial specialization. (Have you ever thought that this must be the single most irritating missing compiler feature ever?) As a result, array_traits cannot be coded 100% portably, yet.

Another problem with array initialization is that it requires the container to have an iterator range constructor, which we cannot expect from all containers. If you do not have access to the source code, you cannot even provide such a constructor yourself, and therefore a non-intrusive solution is more attractive.

Implementation Overview

When the code

v += 1,2,3;

executes, the following takes place:

operator+= returns an assigner object;
the assigner object overloads operator, which is called;
operator, calls a free-standing function which stuffs the value into the container.

operator+= has been overloaded as shown in Listing 1. The first argument is the container and the second argument is the first value to insert into the container. Note that the second argument is specified as const typename C::value_type&, which means you can only use containers with this typedef. I could have relied on a small trait class that defined the value_type of the container, but that would inevitably lead to class template partial specialization, which is hard to make portable. Moreover, from a client’s perspective, there is little difference between specializing a trait class and overloading a function.

operator+= then returns an assigner<C> object that is responsible for appending all remaining values in the comma-separated list. The assigner class is shown in Listing 2; it stores a reference to the container and uses operator, to delegate insertion to the free-standing insert. Moreover, operator, returns a reference to this to allow chaining. Notice again that I use a second template argument instead of a trait class.

The actual forwarding to a container member function is done by a free-standing function (as opposed to a functor) because I can then use function template overloading instead of class template partial specialization to forward to the correct member function. Listing 3 shows the default insert function, which will work with all STL containers except container adapters [3]. Here I have used a second template parameter V to avoid hard-coding typedef information.

So far I have passed all non-mutable arguments as const references. You might think for built-in types that copying would be cheaper. I could have used Boost’s call_traits to define “optimal” parameter passing [4]. However, almost all code will be inlined and optimized away, and I could not measure any difference, so it is best not to include the extra header. (call_traits can still be important when dealing with loops and the reference-to-reference problem; this is not the case here.)

Stuffing Pairs into Containers

When I stuff pairs of something into a container (for example, a map or a vector of pairs), I could just do it like this:

typedef pair<int,int> P;
c += P(1,2), P(2,3);

Originally I had even imagined this syntax:

c += 1,2,  2,3;

but this syntax is not safe in the sense that it is hard to see what is mapped to (or paired with) what; the outcome could be that I forgot a value. Even though such a missing value can be detected the first time the code runs, it could be a problem if some piece of code has not been tested and therefore a compile-time error would be preferable (notice that the former syntax correctly prohibits this error). After discussing the problem on the Boost developer mailing list, a new syntax was suggested that delivers compile-time safety and improves the calling syntax so you do not have to use a pair typedef:

assign( c ) (1,2) (2,3);

The statement is simply an assignment of a list of pairs to the container c. At first the syntax might seem a little unnatural, but it is quite easy to get acquainted with it. assign returns an assigner<C> object like operator+=; In Listing 2 you can see how operator() is implemented; it is worth noticing that I forward to the same insert function as before. The operator has again been implemented as a template to minimize requirements on the container type; if I instead had relied on typedefs in the container, it would not have been possible to use assign with a vector of pairs (for example, there is no key_type typedef in vector).

Although the original intention was to stuff pairs into maps, the code also makes it easier to put any class with a two-argument constructor into, e.g., a vector since we do not need to mention the class or use a shorter typedef. (And providing support for constructors with more arguments is trivial.)

Extending the Framework

Enabling the syntax for a new container takes one or two steps. If the container does not define value_type, one must first overload operator+= and/or assign to provide the right types for the assigner class. Second, overload insert for the container. By using function template overloading, one relies on partial ordering of function templates to choose the right version. As an example, Listing 4 shows overloading for stacks. When implementing the overloaded function, it is important to specify as many template arguments as the container can take; if I had only specified V, then the function would only match stacks that use the default second template argument.

There is a little caveat to this extension mechanism: some compilers do not even handle partial ordering of function templates properly, and the biggest problem I had was the Microsoft compilers. However, by removing the default template function and implementing overloads for all supported containers, I was able to get the code to work (in general not a very comfortable solution).

Another problem with the extension is that I have to include headers in generic code; for example, the overload for stacks requires inclusion of the <stack> header. This is a problem because a substantial amount of compile-time is simply parsing of headers, and pre-compiled headers can easily be so large that the hard disk becomes a bottleneck to compilation [5].

Throw Headers Away!

It is actually possible to remove all inclusion of headers by using a scheme that can be described as intrusive tagging, but unfortunately, I could only compile it with the Comeau compiler. The “substitution failure is not an error” (SFINAE) concept is a language feature whose intended use is to facilitate function template argument deduction [6]. In short, SFINAE means that a C++ compiler can attempt instantiations of overloaded function templates that fails on one or more of them provided that a correct match is found. Consider the overloaded version of insert in Listing 5. Here I have exploited the fact that container adapters have a special typedef container_type. If A is a normal container, this version of insert cannot be instantiated, but due to SFINAE, the compiler just chooses the default version of insert. However, when A is a container adapter, both versions are instantiated; the compiler sees the new version as more specialized than the default and hence the new version is chosen [7]. Although this example is a bit pathological (since it relies on container_type to be special to adapters), a library designed from the ground up could make unique typedefs in all classes; this would enable algorithms to overload for different types without seeing the definition of the classes. Pretty cool, eh? (I reckon that forward declarations and partially specialized traits classes would give similar results, but with a lot more work.)

Discussion

The idea of an initialization library is not new and one could imagine other ways to achieve similar results. Leor Zolman’s STL Container Initialization Library initializes containers by extracting values from strings. The caveat with this technique is mainly type-safety.

During the discussion on the Boost developer mailing list, a scheme that provided true initialization via special initialization iterators was shown. However, if you need direct initialization to construct, for example, a static member, you can simply implement a function that returns the desired container.

Overloading operator, is sometimes viewed as a bad practice [8], but the approach I take is safe; the approach has been used with success in the Generative Matrix Computation Library and Blitz to initialize matrices [9], [10]. With little effort, I have extended the framework to stuff values into fixed-sized containers like matrices.

Acknowledgments

Thanks to Leor Zolman, Pavol Droba and the people at the Boost developer mailing list for feedback.

References

[1] The boost array class does make arrays a lot more convenient. See <http://www.boost.org/libs/array/index.htm>.

[2] Download the boost sandbox CVS repository to get the header.

[3] Thanks to Peter Dimov for pointing out this more generic forwarding; to begin with I naively specialized insert for several containers which required unnecessary inclusion of several headers.

[4] See <http://www.boost.org/libs/utility/call_traits.htm>

[5] <http://osl.iu.edu/~tveldhui/papers/techniques/>. I know the standard committee is considering a new header called <std> which would simply include all headers. This would make the standard library much easier to use and it would allow for an efficient use of precompiled headers. The downside to such a header would be its use with compilers that do not support precompiled headers.

[6] Steve Dewhurst’s Once, Weakly. November 24, 2002 article: “SFINAE Sono Buoni,” downloadable from <http://www.semantics.org/>.

[7] To be perfectly honest, I tried looking in the standard to get an explanation of why the new version is more specialized, but as Stroustrup once said “the standard is unreadable” — at least a little bit. For the interested, Graeme Prentice gave an explanation on comp.std.c++ (search google groups for “function template specialization deduction”).

[8] Scott Meyers, More Effective C++, Item 7, Addison Wesley, 1996.

[9] K. Czarnecki and U.W. Eisenecker, Generative Programming, Addison-Wesley, 2000.

[10] <http://www.oonumerics.org/blitz/>

Thorsten Ottosen holds a Bsc in Computing Science from Aalborg University, Denmark. He is currently writing a major in Computer Graphics and is a co-owner and part-time employee of Dezide (<www.dezide.com>), a company which specializes in trouble-shooting programs based on Bayesian-network technology. He can be reached at thorsten.ottosen@dezide.com or nesotto@cs.auc.dk.