In his August 2003 column, Al Stevens attempted to explain the export feature of C++ templates. To clarify:
- Templates as macros. If you wanted to provide a one-sentence description of C++ templates to someone who is not familiar with C++, you could, as Al did in his column, say they are like macros that expand to class or function definitions. But like all one-sentence descriptions, it's not adequate.
- Unlike macros, templates are part of the C++ type system. You can partially specialize class templates. You can overload function templates. The compiler often is required to infer the types involved in creating a template instance. I don't think these characteristics of templates can be explained or understood as macros.
- Template function instances generated as static functions. Some implementations take this approach, but it violates the C++ language definition. A conforming implementation must ensure that exactly one copy of a needed instance exists in the entire program. You could take the address of the function in two different compilation units, and the addresses are required to compare equal.
- Al speculates that since the compiler cannot know how many times an instance might be generated in the whole program, it must generate them as static functions. He wonders if any smart linkers are available that would discard duplicates. In fact, making template instances global and avoiding duplicates is probably the most common implementation, apart from g++.
- Two approaches are in common use:
  1. Generate an instance as global, but mark it in a way the linker recognizes as "duplicates allowed, discard all but one." Borland C++ 1.0 (about 1988) had this feature for ensuring unique copies of inline virtual member functions.
  2. Generate instances as global in a "repository," or "cache." Before generating an instance, see if it is already in the cache. When linking the program, get needed instances from the cache. The first C++ compiler having templates (Cfront 3.0, about 1993) used this method.
- The problem of duplicate instances was thus addressed by many compilers before the export feature was introduced in C++.
- The export feature. The problem addressed by the export feature is having a standard way to use template declarations without definitions, when definitions would otherwise be required. As in Al's example in the column, both the declaration and the definition can be marked with export. It then becomes the programmer's responsibility to provide the definition to the program somehow when a template instance is needed.
- Some people have the incorrect idea that export means you can ship template declarations with a template-based library product without providing the source code for the definitions. If clients are to be able to create their own template instances, the compiler must have access to the definition. The C++ export feature does not address this information-hiding problem.
- Some compilers that do not implement export provide a roughly equivalent feature by allowing the declarations and definitions to be separated, with the location of the definitions deducible by the compiler. The common usage is to put the declaration in a .h file and the definition in a corresponding automatically read .cpp file. This functionality is not standard, so programmers cannot use it portably, and some details of how template instances are created are not the same as with export. But for most programs, you could not tell the difference.
Stephen Clamage
stephen.clamage@sun.com
C++ Compiler Correction
Dear DDJ,
In my article "Comparing C++ Compilers" (DDJ, October 2003), a transcription error in Table 4 suggested Visual C++ 7.1 does not support some uses of typename, nor the uses of member template functions and member template constructors. Visual C++ supports them all; see Table 1. Thanks to Herb Sutter for the heads up.
I also used the wrong flags for the Watcom compiler in the Dhrystone test; with the correct flags, it can be shown to do a great deal better than my results indicated. Thanks to Michal Necasek of the Open Watcom team for pointing this out.
Finally, I'd like to clarify the rationale with respect to testing speed of generated code: Standard advice has it that optimization for speed is meaningful for test/demonstration purposes, but that real applications should optimize for size since the better cache performance outweighs any specific localized advantages in the larger speed-optimized code. On this premise, I used the corresponding test-for-size executables in the test-for-speed comparisons. Being a library kind of chap, I'm interested in portability, so I elected to target P5, rather than anything more "modern."
Errata for the article can be found at http://synesis.com.au/articles.html#errata.
Matthew Wilson
matthew@synesis.com.au
DDJ