Instantiator: A (Mostly) Portable Framework for Separate Compilation of Templates

By Eric Niebler

Like many other diehard C++ fans, I was excited when I first heard about the export keyword. Finally, I would be able to compile my templates separately! I could move all my implementation details out of my headers and bury them out of sight. My compiles would be faster, my header dependencies would be simplified, and I could ship my templates as a library without the source code. I simply had to wait for my favorite compiler vendor to implement this magic keyword, and then all my template troubles would be gone. Then reality intruded. It was Herb Sutter who dispelled my misconceptions in his series of CUJ articles entitled, “Export Restrictions.”[1] Export, he explained, is not the cure-all for which I had hoped. It doesn’t greatly reduce dependencies, make compiles significantly faster or let me ship templates without the sources. It seemed like my template problems were here to stay.

It turns out that there is a way to achieve separate compilation of templates in a way that speeds up compiles and reduces dependencies. There are two ways, in fact. One way, explicit instantiation, comes with its own set of headaches. The other way is to use the Instantiator framework. Instantiator lets you put your template implementation into a separate compilation unit, giving you the benefits of explicit instantiation without its limitations.

But first, a disclaimer: Instantiator is a great big hack. I freely acknowledge it. It relies on some rather implementation-defined behavior. However, I have verified that it works unmodified on three of the most compliant compilers I could find [2], and as hacks go, I’ve found this one to be particularly useful. You may, too.

The Problem

I had a mountain of string manipulation code. It searched strings, and sliced and diced them, and it was all template-based, handling std::string as easily as it handled wchar_t*. I was having a problem, though. Since it was all written using templates, I had to include the full implementation in order to use the code, all 6000 lines of it! It was killing my compile times. I reluctantly made the decision to move the implementation into a separate compilation unit and explicitly instantiate it, more or less as follows:

template class foo<string::iterator>;
template class foo<wstring::iterator>;
template class foo<char*>;
template class foo<wchar_t*>;

It didn’t compile, and I learned my first hard lesson about explicit instantiation. It forces all member functions to be instantiated. Consider what happens if the foo template has a member like:

#include <cstring>

template<class Iter>
class foo
{
public:
    int length( Iter iter )
    {
        return std::strlen( iter );
    }
};

You’ll notice that the length() member will only compile if iter happens to be implicitly convertible to const char*. That’s not a problem, because members of class templates are only instantiated when they are used, right? Not so if you use explicit instantiation. When the compiler sees template class foo<wchar_t*>; it immediately tries to instantiate all of foo’s members, including length(). It will make some noise about not being able to convert a wchar_t* to a const char* and then give up.
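The difference between the two kinds of instantiation can be seen in a small sketch. The empty() member and the wide_is_empty() function are hypothetical additions made only for illustration; they are not part of the original string library.

```cpp
#include <cstring>

template<class Iter>
class foo
{
public:
    // Compiles only when Iter converts to const char*.
    int length( Iter iter )
    {
        return static_cast<int>( std::strlen( iter ) );
    }

    // Hypothetical member that compiles for any pointer-like Iter.
    bool empty( Iter iter )
    {
        return *iter == 0;
    }
};

// Fine: every member of foo<char*> compiles.
template class foo<char*>;

// Rejected if uncommented: explicit instantiation of the class also
// instantiates length(), and wchar_t* does not convert to const char*.
// template class foo<wchar_t*>;

// Implicit instantiation: only empty() is used, so only empty() is
// instantiated and length() is never compiled for wchar_t*.
inline bool wide_is_empty( wchar_t* s )
{
    foo<wchar_t*> f;
    return f.empty( s );
}
```

Uncommenting the second explicit instantiation should reproduce the kind of conversion error described above.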

If you want to use explicit instantiation, you have to be very careful that all your members compile successfully with all of your template parameters. This is not always possible, or even desirable. Some library developers intentionally write member functions that won’t compile with certain template parameters to prevent people from using them incorrectly. This technique doesn’t mix well with explicit instantiation [3].
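The member-by-member alternative mentioned in note [3] might look roughly like the following sketch. The empty() member is a hypothetical stand-in for "a member that does compile for wchar_t*".

```cpp
#include <cstring>

template<class Iter>
class foo
{
public:
    // Only valid for Iter types convertible to const char*.
    int length( Iter iter )
    {
        return static_cast<int>( std::strlen( iter ) );
    }

    // Hypothetical member that is valid for any pointer-like Iter.
    bool empty( Iter iter )
    {
        return *iter == 0;
    }
};

// Explicitly instantiate one member at a time, skipping length().
// Every member and every type needs its own line like this one.
template bool foo<wchar_t*>::empty( wchar_t* );
```

With a realistic number of members and types, the list of such declarations grows quickly, which is the maintenance burden the note refers to.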

I grudgingly hacked up my code to make it instantiate correctly, and things worked, more or less. My template’s implementation was compiled in a separate file and my compile times dropped dramatically. I was happy, until I tried to port my code. The new compiler was giving me a strange warning to the effect that it was illegal to explicitly instantiate a template twice with the same template parameters. After much head-scratching, I figured out it was complaining about this:

template class foo<string::iterator>;
template class foo<char*>;

On the new platform, string::iterator was a simple typedef for char*. On my old platform, it wasn’t. The new compiler was warning me that I had stumbled into a restricted area. According to section 14.7/5 of the ISO/ANSI C++ standard, you can only explicitly instantiate a template once with a given set of parameters. You can see my dilemma as a template library developer — how am I supposed to know if string::iterator is a simple typedef for char* on my users’ platforms?
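The situation can be imitated on any platform with a typedef. In this sketch, string_iterator is a hypothetical stand-in for a library whose string::iterator happens to be char*, and step() is an invented member.

```cpp
template<class Iter>
class foo
{
public:
    // Invented member: returns the position after it.
    Iter step( Iter it )
    {
        return it + 1;
    }
};

// Stand-in for a platform where string::iterator is a typedef for char*.
typedef char* string_iterator;

// First explicit instantiation: accepted.
template class foo<string_iterator>;

// Second explicit instantiation names the same type under another
// spelling, so a compiler may warn about or reject it if uncommented.
// template class foo<char*>;
```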

The more advanced template hackers out there may be thinking, “I can tell if two types are the same with a simple compile-time type trait.” But all of the template meta-programming in the world won’t help you here. Explicit instantiations must appear at namespace scope, and you can’t wrap a whole namespace in a template. This limitation is a double-whammy. Not only can you not use meta-programming to work around the double-instantiation problem, but you also can’t use meta-programming to make any compile-time decisions about which types to instantiate. Ouch!
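To make the limitation concrete, here is a sketch of the trait in question. The name same_type is an assumption chosen for this example. The trait computes the answer at compile time, but there is nowhere to put that answer to use, because an explicit instantiation cannot appear inside a template or a conditional.

```cpp
#include <string>

// Classic two-specialization type comparison.
template<class A, class B>
struct same_type { static const bool value = false; };

template<class A>
struct same_type<A, A> { static const bool value = true; };

// Known at compile time; the value depends on the standard library in use.
static const bool iterator_is_pointer =
    same_type<std::string::iterator, char*>::value;

template<class Iter>
class foo
{
public:
    Iter self( Iter it ) { return it; }
};

template class foo<char*>;

// What one would like to write, but cannot: an explicit instantiation is
// a namespace-scope declaration, so it cannot be guarded by
// iterator_is_pointer or generated by a metaprogram.
//
//   if( !iterator_is_pointer )
//       template class foo<std::string::iterator>;
```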

The Solution — Instantiator

At this point, I’m pretty much fed up with explicit instantiation. It’s an unwieldy tool with artificial limitations. What I need is a way to move some members into a separate compilation unit and instantiate only those. And I don’t want it to fall over if I accidentally instantiate on the same type twice. And I want to be able to use meta-programming to pick my instantiations at compile time. What I came up with is Instantiator. Here’s how it works. First, in the header file, I declare my template like so:

// foo.h
#include "instantiator.h"

template<class T>
class foo
{
public:
    foo() {}
    ~foo() {}
    int bar();
    int baz();
protected:
    DECLARE_INSTANTIATOR((
        &foo::bar,
        &foo::baz))
};

The constructor and the destructor must be defined in the header, but everything else can be moved to an implementation file. I use the DECLARE_INSTANTIATOR macro to list the methods that I would like to be instantiated. When expanded, DECLARE_INSTANTIATOR becomes this:

static instantiator instantiate()
{
    return instantiator_helper(
        &foo::bar,
        &foo::baz);
}

And instantiator and instantiator_helper() are defined as follows:

typedef int instantiator;

int instantiator_helper( ... )
{
    return 0;
}

The basic idea is to create a method, foo::instantiate(), such that when it is instantiated, it causes the implicit instantiation of foo::bar() and foo::baz(). That is accomplished by taking the addresses of the members and passing them to the instantiator_helper() function. Here’s the trick: instantiator_helper() is a vararg function. This serves two purposes. The obvious reason is so that it can accept an arbitrary number of parameters, allowing it to instantiate a bunch of members all at once. The trickier reason is to foil the compiler’s optimizer, and here is where we get into the murky realm of implementation-defined behavior. Simply taking the address of a member isn’t enough to instantiate it. The compiler may see that the resulting pointer is unused and decide to do nothing. But to most compilers, vararg functions are like the Great Wall of China. The optimizer can’t see past the “...” to tell whether the resulting pointer is used or not.[4] To be safe, the compiler performs the implicit instantiation and tosses the address over the wall to instantiator_helper(), which simply ignores it.
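Put together in one file, the pieces described so far might look like the sketch below. The macro body is a guess that is merely consistent with the expansion shown above (the real instantiator.h may differ), the inline keyword is added so the helper can live in a header, and foo_access is a hypothetical helper used only to reach the protected member.

```cpp
typedef int instantiator;

// Variadic sink: accepts any number of member pointers and ignores them.
inline int instantiator_helper( ... )
{
    return 0;
}

// Assumed definition. The doubled parentheses at the point of use let a
// comma-separated list travel through a single macro argument.
#define DECLARE_INSTANTIATOR( args )          \
    static instantiator instantiate()         \
    {                                         \
        return instantiator_helper args;      \
    }

template<class T>
class foo
{
public:
    int bar();
    int baz();
protected:
    DECLARE_INSTANTIATOR((
        &foo::bar,
        &foo::baz))
};

// Normally these definitions would live in foo.cpp.
template<class T> int foo<T>::bar() { return 1; }
template<class T> int foo<T>::baz() { return 2; }

// Hypothetical helper: derives from foo<int> to call the protected
// instantiate(), which in turn names foo<int>::bar and foo<int>::baz.
struct foo_access : foo<int>
{
    static instantiator run()
    {
        return foo<int>::instantiate();
    }
};
```

Calling foo_access::run() anywhere in the translation unit is what requests the two member instantiations.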

The implementation and instantiation look like this:

// foo.cpp
#include "foo.h"

template<class T>
int foo<T>::bar()
{  // do bar stuff
    return 0;
}

template<class T>
int foo<T>::baz()
{  // do baz stuff
    return 0;
}

namespace
{
typedef typelist<int,long> types;

instantiator foo_inst =
    instantiate< foo, types >();
}

The typelist [5] called “types” contains the types on which I would like to instantiate foo. The instantiate() function takes foo and types as template parameters and does the instantiation for me. I don’t really care what this function returns, but I have to assign it to something to keep my program well formed, so I assign to the dummy variable foo_inst.

Conceptually, the instantiate() function is quite simple. For each type in the typelist, it invokes foo<type>::instantiate(), causing bar() and baz() to be instantiated with that template parameter. Since typelists are recursive data structures, instantiate() is implemented recursively. It calls foo<head>::instantiate() followed by a call to instantiate<foo,tail>(). Actually, instantiate() can’t call foo<head>::instantiate() directly because it is protected. Instead, it uses a helper class, which inherits from foo to get access to the protected member.
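The recursion just described can be sketched as follows. This is not the code that ships with Instantiator: it assumes a C++11 compiler and uses a variadic typelist as a stand-in for the article's typelist, and the names accessor, instantiate_impl, and visit_count are invented for the example. The counter exists only so the walk over the typelist can be observed.

```cpp
typedef int instantiator;

inline int instantiator_helper( ... ) { return 0; }

// Stand-in typelist (assumes C++11 variadic templates).
template<class... Ts> struct typelist {};

// Helper class: inherits from the target to reach its protected
// instantiate() member.
template<class Target>
struct accessor : Target
{
    static instantiator call() { return Target::instantiate(); }
};

template<template<class> class T, class List>
struct instantiate_impl;

// End of the list: nothing left to instantiate.
template<template<class> class T>
struct instantiate_impl<T, typelist<> >
{
    static instantiator call() { return 0; }
};

// Handle the head, then recurse on the tail.
template<template<class> class T, class Head, class... Tail>
struct instantiate_impl<T, typelist<Head, Tail...> >
{
    static instantiator call()
    {
        accessor< T<Head> >::call();
        return instantiate_impl<T, typelist<Tail...> >::call();
    }
};

template<template<class> class T, class List>
instantiator instantiate()
{
    return instantiate_impl<T, List>::call();
}

// Demo-only counter so the recursion can be observed at run time.
inline int& visit_count() { static int n = 0; return n; }

template<class T>
class foo
{
public:
    int bar();
protected:
    static instantiator instantiate()
    {
        ++visit_count();
        return instantiator_helper( &foo::bar );
    }
};

template<class T> int foo<T>::bar() { return 1; }
```

Under these assumptions, instantiate< foo, typelist<int,long> >() visits foo<int> and foo<long> once each.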

Already, Instantiator meets many of the goals I set out for it. I can use it to instantiate only the methods I specify, unlike explicit instantiation, which gives me all or nothing. And it doesn’t matter if I instantiate multiple times with the same template parameters since the instantiation is implicit. Finally, the instantiate() function accepts a typelist containing the types on which to instantiate. Typelists can be manipulated at compile time, which means I can use my bag of meta-programming tricks to decide at compile-time on which types to instantiate.
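As one example of choosing instantiations at compile time, a typelist can be built so that a type is added only when it is not already present. The sketch below again assumes C++11 and a variadic typelist stand-in; same_type, contains_type, and push_unique are hypothetical names, not part of the Instantiator distribution.

```cpp
template<class... Ts> struct typelist {};

template<class A, class B>
struct same_type { static const bool value = false; };

template<class A>
struct same_type<A, A> { static const bool value = true; };

// contains_type<T, List>::value is true when T appears in List.
template<class T, class List>
struct contains_type;

template<class T>
struct contains_type<T, typelist<> >
{
    static const bool value = false;
};

template<class T, class Head, class... Tail>
struct contains_type<T, typelist<Head, Tail...> >
{
    static const bool value = same_type<T, Head>::value
        || contains_type<T, typelist<Tail...> >::value;
};

// push_unique appends T to List unless List already contains it.
template<class List, class T,
         bool Present = contains_type<T, List>::value>
struct push_unique;

template<class T, class... Ts>
struct push_unique<typelist<Ts...>, T, false>
{
    typedef typelist<Ts..., T> type;
};

template<class T, class... Ts>
struct push_unique<typelist<Ts...>, T, true>
{
    typedef typelist<Ts...> type;
};

// Hypothetical use: the resulting list holds char* and wchar_t* once each.
typedef push_unique< typelist<char*>, wchar_t* >::type pointer_types;
```

Because Instantiator's instantiations are implicit, duplicates are harmless, so a filter like this is a convenience rather than a requirement.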

That’s swell, but I feel compelled to tell you a dirty secret. Instantiator is non-standard, and not just because of the vararg trick. As Daveed Vandevoorde tells me, “According to the letter of the standard, every non-exported template must be defined in every translation unit that contains a point of instantiation for that template.” In English, that means that if you’re not using export, you need to include the full implementation of your template in every file that uses it. With Instantiator, only one translation unit sees the actual implementation. All the others merely get the template’s empty shell, and a vague promise that it will be filled in later. So does this concern me? Yes, but not too much. All the compilers and linkers I’ve tested accept this code without a complaint. In fact, it’s an artifact of the way modern compilers and linkers work. For the compiler, unresolved references are business as usual; it usually lets the linker worry about that. And linking object files is like a big game of Go Fish. If the linker sees an unresolved reference in one object file, it asks the other object files, “I’m looking for a foo<int>::bar(). Does anybody have one?” If foo.obj has one, it’s not likely to tell the linker to “go fish” just because its foo<int>::bar() was instantiated implicitly. It’ll say, “Sure, I got one. Here it is,” and the linker will gladly accept it. The above description should be taken with a big grain of salt. The standard is silent about how compilers and linkers work together to build executables, and existing practice varies widely. I am not aware of any compiler/linker implementation that will catch this sleight of hand.

Generalize, Generalize, Generalize!

The Instantiator framework, as described so far, is pretty weak. It only works with templates that take one parameter. The publicly available version handles templates with up to 3 parameters. The trick is to provide overloads for the instantiate() function that handle class templates of different order. The unary version of instantiate() used in the example above has the following prototype:

template<
  template<class> class T,
  class TypeList>
instantiator instantiate();

Notice the use of the template template parameter T. It matches any class template that takes exactly one type parameter (e.g., foo in the example above). To handle templates with two template parameters, I provide the following overload:

template<
  template<class,class> class T,
  class TypeList1,
  class TypeList2>
instantiator instantiate();

It takes a class template with two template parameters and it instantiates T< TypeList1::Head, TypeList2::Head >, and then recursively calls instantiate< T, TypeList1::Tail, TypeList2::Tail >(). You might prefer a version that instantiates every combination of types from the two typelists. There is a separate function called combinatorial_instantiate() that does exactly that. As an aside, you may be surprised to discover that you can have a set of overloaded functions, each of which takes no arguments. Usually the compiler uses the arguments to do the overload resolution. In this case, since there are no arguments, the compiler must match the template parameters to choose the correct overload.
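A pairwise version of the two-parameter overload could be sketched like this, again assuming C++11 and a variadic typelist stand-in. The names zip_impl, table, and pair_visits are invented for the example, and the counter is there only to make the walk observable.

```cpp
typedef int instantiator;

inline int instantiator_helper( ... ) { return 0; }

template<class... Ts> struct typelist {};

// Inherits from the target to reach its protected instantiate().
template<class Target>
struct accessor : Target
{
    static instantiator call() { return Target::instantiate(); }
};

// Primary template left undefined: typelists of unequal length match
// no specialization and therefore fail to compile.
template<template<class, class> class T, class L1, class L2>
struct zip_impl;

template<template<class, class> class T>
struct zip_impl<T, typelist<>, typelist<> >
{
    static instantiator call() { return 0; }
};

// Pair up the two heads, then recurse on the two tails.
template<template<class, class> class T,
         class H1, class... T1, class H2, class... T2>
struct zip_impl<T, typelist<H1, T1...>, typelist<H2, T2...> >
{
    static instantiator call()
    {
        accessor< T<H1, H2> >::call();
        return zip_impl<T, typelist<T1...>, typelist<T2...> >::call();
    }
};

// Binary overload: selected purely by its template parameters.
template<template<class, class> class T, class L1, class L2>
instantiator instantiate()
{
    return zip_impl<T, L1, L2>::call();
}

// Demo-only counter and a two-parameter template to instantiate.
inline int& pair_visits() { static int n = 0; return n; }

template<class K, class V>
class table
{
public:
    int find();
protected:
    static instantiator instantiate()
    {
        ++pair_visits();
        return instantiator_helper( &table::find );
    }
};

template<class K, class V> int table<K, V>::find() { return 0; }
```

In this sketch, instantiate< table, typelist<int,char>, typelist<long,short> >() visits table<int,long> and table<char,short>.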

Finally, there is the sticky problem of how to handle templates like std::basic_string. Consider std::basic_string’s prototype:

template<
  class CH,
  class TR=char_traits<CH>,
  class AL=allocator<CH> >
class basic_string;

The second and third template parameters depend on the first template parameter. In general, the Nth template parameter can depend on any or all of the first N-1 template parameters. It would be nice to be able to say “instantiate< std::basic_string, typelist<char,wchar_t> >” and have the compiler fill in the other template parameters for me based on their defaults, but this won’t work. Instead, the Instantiator framework lets me instantiate templates like basic_string as follows:

template<class CH>
struct default_char_traits
{
  typedef std::char_traits<CH>
    type;
};

template<class CH,class TR>
struct default_allocator
{
  typedef std::allocator<CH>
    type;
};

instantiator string_inst =
  instantiate<
    std::basic_string,
    typelist<char,wchar_t>,
    default_char_traits,
    default_allocator > ();

Before calling instantiate(), I define helper classes default_char_traits and default_allocator. These classes are used to generate the 2nd and 3rd template parameters from the first. Then I use the helper classes in my call to the instantiate() function. The Instantiator framework provides an overload of instantiate() to handle this situation. Its prototype looks like this:

template<
  template<
    class,class,class> class T,
  class TypeList1,
  template<class> class Tr1,
  template<class,class> class Tr2>
instantiator instantiate();

This ugly template mess takes a bit of explanation. The first template parameter accepts templates with 3 arguments, like std::basic_string. The second template parameter is an ordinary typelist. The third template parameter is another template template parameter. It accepts my default_char_traits helper class. Likewise, the fourth accepts my default_allocator helper class. In pseudo-code, instantiate() uses this information to create instantiations as follows:

// Instantiate! (in pseudo-code)
typedef TypeList1::head  H1;
typedef Tr1<H1>::type    H2;
typedef Tr2<H1,H2>::type H3;
T<H1,H2,H3>::instantiate();
// Recurse!
typedef TypeList1::tail  Tail;
instantiate<T,Tail,Tr1,Tr2>();

In this way, you can use the Instantiator framework to instantiate rather complicated template classes.
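The pseudo-code above can be turned into compilable code along the following lines. This is a sketch under several assumptions: it needs a C++11 compiler, it uses a variadic typelist stand-in, and because the target must have its own instantiate() member it uses a hypothetical text template shaped like basic_string. The names derive_impl, text, and text_visits are invented.

```cpp
#include <memory>
#include <string>

typedef int instantiator;

inline int instantiator_helper( ... ) { return 0; }

template<class... Ts> struct typelist {};

template<class Target>
struct accessor : Target
{
    static instantiator call() { return Target::instantiate(); }
};

// Helper classes from the article: derive parameters 2 and 3 from 1.
template<class CH>
struct default_char_traits { typedef std::char_traits<CH> type; };

template<class CH, class TR>
struct default_allocator { typedef std::allocator<CH> type; };

template<template<class, class, class> class T, class List,
         template<class> class Tr1, template<class, class> class Tr2>
struct derive_impl;

template<template<class, class, class> class T,
         template<class> class Tr1, template<class, class> class Tr2>
struct derive_impl<T, typelist<>, Tr1, Tr2>
{
    static instantiator call() { return 0; }
};

template<template<class, class, class> class T,
         template<class> class Tr1, template<class, class> class Tr2,
         class H1, class... Tail>
struct derive_impl<T, typelist<H1, Tail...>, Tr1, Tr2>
{
    typedef typename Tr1<H1>::type     H2;  // second parameter
    typedef typename Tr2<H1, H2>::type H3;  // third parameter

    static instantiator call()
    {
        accessor< T<H1, H2, H3> >::call();                          // instantiate
        return derive_impl<T, typelist<Tail...>, Tr1, Tr2>::call(); // recurse
    }
};

template<template<class, class, class> class T, class List,
         template<class> class Tr1, template<class, class> class Tr2>
instantiator instantiate()
{
    return derive_impl<T, List, Tr1, Tr2>::call();
}

// Hypothetical three-parameter template with a demo-only counter.
inline int& text_visits() { static int n = 0; return n; }

template<class CH, class TR, class AL>
class text
{
public:
    int size();
protected:
    static instantiator instantiate()
    {
        ++text_visits();
        return instantiator_helper( &text::size );
    }
};

template<class CH, class TR, class AL>
int text<CH, TR, AL>::size() { return 0; }
```

Here instantiate< text, typelist<char,wchar_t>, default_char_traits, default_allocator >() visits the char and wchar_t specializations with their derived traits and allocator types.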

Summary

It would be nice if export did all that I had hoped it would. And it would be nice if explicit instantiation were more flexible and accommodating. Perhaps in C++0x things will be different. But unfortunately, the language we have to work with today comes up a bit short in the area of separate template compilation. It is in this context that the Instantiator framework looks like an appealing stopgap. It provides a great deal of power and flexibility in creating template instantiations. Though not 100% portable, it falls into that category of hacks which are “portable enough” to be useful in the real world. I have successfully used Instantiator to improve compile times and reduce header dependencies in my projects. I hope you will, too.

Acknowledgements

I would like to thank Rani Sharoni for his valuable feedback on early Instantiator designs, and Andrei Alexandrescu for his suggestions and encouragement. Finally, I would like to extend a hearty thanks to Daveed Vandevoorde for pointing out where Instantiator is non-standard, and for proofing an early draft of this article.

References and Notes

[1] Sutter, Herb. “Export Restrictions,” parts 1 and 2, C/C++ Users Journal, September / November 2002.

[2] The Instantiator framework has been tested on GCC 3.2, Microsoft Visual C++ .NET 2003, and Comeau 4.3.

[3] Rather than explicitly instantiate the whole class, you could explicitly instantiate each individual member function except those which don't compile. Although this could be made to work, it leads to an explosion of complex and repetitive code which is difficult to maintain. The Instantiator framework eliminates this complexity.

[4] Daveed Vandevoorde tells me that the HP aC++ compiler actually does see through the “...” in vararg functions at its highest optimization level. For such a clever compiler, a more sophisticated mechanism would be needed to flummox the optimizer.

[5] For a discussion of typelists, see Andrei Alexandrescu’s February 2002 CUJ C++ Experts’ Forum article, available online at www.cuj.com/documents/cujcexp2002alexandr/. The code for Instantiator comes with a rudimentary typelist implementation.

Eric Niebler studied Computer Science at the University of Virginia. He spent several years in the Windows 2000 group at Microsoft before moving to a development position at Microsoft Research in the Natural Language Processing group. He is now a library developer in the Visual C++ group. His interests include data structures and algorithms; compiler, language, and library design; data serialization and persistence; and pattern matching. He can be contacted at ericne@microsoft.com.