December 1993/Questions & Answers

Questions & Answers

Compiling C++ Templates

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on CompuServe 70125,1142.
Q
I enjoy your column in CUJ; it is always insightful and often topical to something I have done or am doing. Although your column usually answers questions about standard C, your biography mentions that you also teach C++. In that vein, I hope you will address some of the finer points of implementing C++ class templates. I use the Borland C++ 3.1 compiler, which supports that extension. Unfortunately, Borland provides scanty documentation on the use of templates, covering only simple implementations of templates. The documentation does not explain whether the following difficulties I have encountered are due to my ignorance of the proper syntax or due to the compiler's deficiencies:
A. A potentially powerful way of creating generic objects is to nest templates; programmers can combine pre-defined class templates like LinkedList<> and Queue<> to easily declare types such as LinkedList<Queue<char * > >. Yet, through extensive trial and error, I have discovered no straightforward way to provide an argument list to a nested template type's inner constructor. Listing 1 illustrates the problem.
B. A template class may declare nested classes within the scope of its declaration, as in Listing 2. The compiler acknowledges inline expanded bodies of subordinate class function bodies, but fails to recognize externally expanded bodies.
C. Template classes may take constant scalar arguments. The preprocessor #if statement supposedly takes any constant expression as an argument. Therefore, I expected conditional compilation based on the value of the template argument i. Unfortunately, on instantiation the template in Listing 3 compiles the same regardless of its argument. Why?
Hiram A. Berry
Clearwater, FL
A
I will pose your first complaint as a question: Why can't a program pass an argument to the inner constructor of a nested template?
A. Passing arguments to such a constructor creates compiler problems analogous to declaring a two-dimensional array while specifying only one dimension. Dan Saks suggests that the restriction against passing arguments to the inner constructor of nested templates is similar to the restriction requiring an array's element constructor to be a default (no parameter) constructor. In effect, the outer template carries an implicit constraint: the type supplied as its argument must have a default constructor. Templates are not the perfect solution for every problem. Most templates do not work for every template argument. (This limitation is called "constrained genericity.") For example, suppose you created a template for an absolute value function. The template might look like:
template <class X>
    X absolute_value(X value)
        {
        if ( value < 0)
            return -value;
        else
            return value;
        }
This template will work only with a type for which the < and - operators are defined. The C++ template specification does not state this constraint. Rather, the constraint is implicit in the operations performed within the template code. The ANSI/ISO standards group is working on ways to explicitly state template constraints.
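The implicit constraint can be seen by instantiating the template with a user-defined type. The following is only a sketch: Celsius and Opaque are hypothetical types invented for the illustration, and the operators shown are simply the ones the template body happens to use.

```cpp
#include <cassert>

// The template from the text, with the same behavior.
template <class X>
X absolute_value(X value)
{
    if (value < 0)
        return -value;
    else
        return value;
}

// Hypothetical type (an assumption for this sketch). It supplies exactly
// what the template body needs: comparison against 0 and unary minus.
struct Celsius
{
    double degrees;
    explicit Celsius(double d) : degrees(d) {}
};

bool operator<(const Celsius &lhs, int rhs) { return lhs.degrees < rhs; }
Celsius operator-(const Celsius &c) { return Celsius(-c.degrees); }

// Hypothetical type with neither operator. Nothing complains until the
// template is instantiated with it.
struct Opaque {};
// Opaque o = absolute_value(Opaque());   // would fail to compile

// Small helper so the result can be checked from a test or from main().
double absolute_degrees(double d)
{
    return absolute_value(Celsius(d)).degrees;
}
```

Calling absolute_value with an int, a double, or a Celsius compiles because each type satisfies the unstated requirements; uncommenting the Opaque line produces an error that points into the template body rather than at a declared constraint.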
To restate your second complaint, within a template declaration, the compiler does not recognize externally expanded bodies of nested classes.
B. Your compiler may not have caught up with the current standards for nested templates and classes; or, it may have evolved at unequal rates in these two areas. Since the C++ draft standard is a slowly moving target, this kind of problem can be expected. For example, Borland C++ works fine with:
class Outer
    {
    class Inner
        {
        void function();
        };
    };
and a function header declared as:
void Outer::Inner::function()
   { ... }
The following template and its corresponding function definition compile okay:
template <class X>
class Outer
    {
    void function();
    };

template <class X>
    void Outer<X>::function()
    { ... }
But the following function definition does not compile okay:
template <class X>
int outside<X>::inside::value()
    {
    return i;
    }
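For reference, here is a complete version of that form. Listing 2 is not reproduced here, so the class body below is an assumption made for this sketch; only the names outside, inside, value, and i come from the fragment above. Compilers that follow the language as it was later standardized accept this external definition, so a rejection points to the compiler rather than to the syntax.

```cpp
#include <cassert>

// Assumed shape of the template class (the original listing is not shown).
template <class X>
class outside
{
public:
    class inside
    {
    public:
        explicit inside(int initial) : i(initial) {}
        int value();          // declared here, defined outside the class
    private:
        int i;
    };
};

// Externally expanded body of the nested class's member function.
template <class X>
int outside<X>::inside::value()
{
    return i;
}

// Helper that instantiates the template and calls the external body.
int nested_value_for_char(int initial)
{
    outside<char>::inside object(initial);
    return object.value();
}
```

If a compiler rejects the external form, moving the body inline into the nested class, as you found, is a reasonable workaround.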
C. Your third complaint described unexpected behavior of preprocessor directives combined with templates. You may mistakenly assume that the translation of a C or C++ program takes place all at once. Speaking to this commonly made assumption, the draft standard states the "as if" rules for translation, which separate the actions of the preprocessor and the compiler. In brief, the preprocessor finishes its substitutions on the source code before the compiler starts operating. In addition, this separation of preprocessing and compilation operations implies that the preprocessor might not understand the complete syntax of the language. In C, for example, you may be tempted to code:
#if sizeof(int) == 2
/* Do something */
#endif
but sizeof is a compiler operator, not a preprocessor operator. The preprocessor cannot evaluate sizeof; it leaves that task to the compiler. (To perform an equivalent operation that works, you can test the limits defined in limits.h.) This same condition applies to C++. In the definition
template <const int i>
i has no assigned value until the template is compiled. Since i is not a preprocessor macro, the preprocessor interprets i as 0, which makes "i == 0" a true condition. Therefore, the compiler always compiles the phrase "int val;". As a side note, C++ language developers introduced values as parameters to templates after Stroustrup published his original book, The C++ Programming Language. Stroustrup intended the arguments to be class types, but they now can be compile-time constant expressions, addresses of objects or functions with external linkage, or addresses of static class members. Most compilers require that constant expressions be of integral type (e.g. char, int, or long) when provided as template arguments.
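The effect can be reproduced in a few lines. Listing 3 is not reproduced here, so the names ByPreprocessor and BySpecialization are hypothetical, and the specialization shown is one possible way to get per-argument code from the compiler, not necessarily the one you would choose. The INT_MAX test illustrates the limits.h approach mentioned earlier.

```cpp
#include <cassert>
#include <climits>

// The preprocessor runs before any template exists. Inside "#if i == 0"
// the name i is an identifier that is not a macro, so it is replaced by 0.
template <int i>
struct ByPreprocessor
{
#if i == 0
    static const bool took_if_branch = true;    // always compiled
#else
    static const bool took_if_branch = false;   // never compiled
#endif
};

// An alternative resolved by the compiler instead of the preprocessor:
// a general template plus a specialization for the value 0.
template <int i>
struct BySpecialization
{
    static const bool is_zero_case = false;
};

template <>
struct BySpecialization<0>
{
    static const bool is_zero_case = true;
};

// The limits.h test works because INT_MAX is a preprocessor macro.
#if INT_MAX == 32767
static const bool int_is_16_bits = true;
#else
static const bool int_is_16_bits = false;
#endif
```

ByPreprocessor<0> and ByPreprocessor<5> end up with identical contents, while the two BySpecialization instantiations differ, because only the latter decision is made after the template argument is known.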

Character pointers
Q
I've come across an interesting problem regarding pointers and character strings. I have two modules, module A and module B. Module A contains a global character string S and module B contains an external reference to the global string S. When I declare the extern statement in module B as:

extern char *S;
I can't access the data, but when I declare the extern statement as

extern char S[];
I can access the character string data. I've always thought that the name of a character string was a pointer to that string. Could you please explain this effect to me? I'm using MSC as my compiler.
Barry Ward
Burnaby, B.C., Canada
A
To begin my answer, I will rephrase one of your last statements: "the name of a character string is a pointer to that string." The name of an array evaluates, in most expressions, to the address of the array's first element. The type of that value is a constant pointer to the array's element type. For example, in

char char_array[10];
char_array usually becomes, in an expression, a constant pointer to char. You cannot change the value of char_array. The elements of char_array require program memory space (10 bytes); the value of char_array as a pointer is not stored in program memory space. The value of char_array as a pointer is just a constant (within the scope of its declaration). In this sense, an array's name is similar to a member name in an enumeration. Each member name has a value, but that value is not stored in any program memory location. When you declare a pointer, as in:

char * pointer_to_char;
then pointer_to_char takes up a small amount of memory — the number of bytes required to form an address. You can assign values to this variable, as in:

pointer_to_char = char_array;
This operation is equivalent to setting an integer variable to a constant value, such as:

int i;
i = 10;
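The difference between the two objects shows up in a short sketch. The variable names follow the text; array_bytes, pointer_bytes, and the helper function are assumptions added for the illustration.

```cpp
#include <cassert>
#include <cstddef>

char char_array[10] = "abc";
char *pointer_to_char = char_array;   // array name converts to &char_array[0]

// sizeof sees the whole array, but only the pointer object for the pointer.
const std::size_t array_bytes = sizeof(char_array);         // the 10 elements
const std::size_t pointer_bytes = sizeof(pointer_to_char);  // one address

// char_array = pointer_to_char;   // would not compile: not assignable

// The pointer holds the address that the array name evaluates to.
bool pointer_refers_to_array()
{
    return pointer_to_char == &char_array[0] && pointer_to_char[1] == 'b';
}
```

Only pointer_to_char occupies storage that holds an address; the address that char_array evaluates to is known to the compiler and linker and is never stored in a variable.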
If you use char_array as an extern variable, then you must declare it as an array with the extern statement, as in:

extern char char_array[10];
The compiler ignores the value in the brackets, because the extern keyword indicates that this statement is a reference, rather than the definition of an external variable. You could have declared char_array as:

extern char char_array[];
With this declaration, the compiler treats each occurrence of char_array as a reference to the true definition, which will be resolved at link time. Suppose you used extern with a pointer, as in:

extern char * pointer_to_char;
Wherever the program references pointer_to_char, the compiler creates code that loads the contents of that variable and uses it as an address. The linker resolves pointer_to_char's address. (Note: Though the linker resolves char_array's and pointer_to_char's address relative to a certain base address, these addresses may not be final; if the resulting executable module is relocatable, the operating system will relocate those addresses when it loads and runs the module.) Why did the linker allow you to create a program that could not access a global string? Because the linker does not type-check for data types. For example, you could place the following in one module:

int i;
and in another module:

extern double i;
and the compiler will allocate two bytes (the size of an int on a 16-bit compiler) for i; but all code that references i in the second module assumes that i is eight bytes long (the size of a double). The linker matches up the definition in the first module and the reference in the second, perhaps without even complaining; but the program does not work properly. In your example, you define a variable as:

char char_array[10];
in one module and as

extern char * char_array;
in another module. Though the results may vary depending on your computer, in general, the second module will treat the first two or four bytes (the size of an address) of char_array's elements as if they were an address, and will attempt to reach the string through that address. (You probably did not attempt to change char_array's values in the second module. Writing to what was probably a way-out-of-bounds address might have caused problems worse than those you mentioned.) To my knowledge there are no linkers that perform across-module data-type checking, although this feature could easily be added. You should use lint to perform this form of checking.
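The misreading can be imitated safely inside one source file. This is a sketch under stated assumptions: S stands in for the string defined in module A, its contents are invented, and misread_address_is_string_bytes is a hypothetical helper. It copies the bytes that module B would load as an address but never follows them, so nothing out of bounds is touched.

```cpp
#include <cassert>
#include <cstring>

// The declaration module B should use. Because it matches the definition,
// the compiler accepts seeing both in one translation unit.
extern char S[];

// Stand-in for the definition in module A (contents are an assumption).
char S[] = "Hello, world";

static_assert(sizeof S >= sizeof(char *), "string must cover one address");

// Under "extern char *S;" module B loads the first sizeof(char *) bytes
// stored at S and treats them as an address. Here those bytes are copied
// into a pointer object and compared, but the pointer is never used.
bool misread_address_is_string_bytes()
{
    char *misread;
    std::memcpy(&misread, S, sizeof misread);
    return std::memcmp(&misread, "Hello, world", sizeof misread) == 0;
}
```

The "address" turns out to be nothing more than the characters "Hell" (or "Hello, w" with eight-byte addresses). Placing the extern declaration in a header that both modules include lets the compiler catch such a mismatch, since the defining module then sees both the declaration and the definition.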
Your confusion about array declarations is understandable. Here is another feature of array declarations to muddy the waters: you can use either declaration form (array name or pointer) when referring to an array name passed as a function parameter. For example, suppose you have:

int int_array[10];
an_array_function(int_array);
then the function can look like either:

an_array_function(int int_array[]);
or

an_array_function(int *int_array);
In the former case, the compiler knows that you are not actually passing an array; only the address of the first element is passed. The name-with-brackets syntax implicitly declares the parameter as a pointer, as is done explicitly in the second case. I prefer the first notation if the function expects to access an array and the second notation if the function expects to access a single data item. However, the compiler interprets both notations the same.
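The equivalence of the two notations can be checked by the compiler itself. In this sketch sum_bracket and sum_pointer are hypothetical functions, and the static_assert requires a compiler that supports C++11 or later.

```cpp
#include <cassert>
#include <type_traits>

// Parameter written with brackets: still a pointer to int.
int sum_bracket(int values[], int count)
{
    int total = 0;
    for (int n = 0; n < count; ++n)
        total += values[n];
    return total;
}

// Parameter written as a pointer: the same type as above.
int sum_pointer(int *values, int count)
{
    int total = 0;
    for (int n = 0; n < count; ++n)
        total += *(values + n);
    return total;
}

// Both functions have the type int(int *, int).
static_assert(std::is_same<decltype(sum_bracket), decltype(sum_pointer)>::value,
              "both parameter notations declare the same function type");
```

Because the parameter is a pointer in both cases, the function has no way to learn the array's length from the parameter alone, which is why the count is passed separately here.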