July 2000/Uncaught Exceptions

C/C++ Contributing Editors

Uncaught Exceptions: Adventures of Aquaman

Bobby Schmidt

Bobby shows how to outsmart the compiler several different ways, even if he can't outsmart the stock market.

Copyright © 2000 Robert H. Schmidt

Three weeks after I rejoined Microsoft, the company's stock price dropped $30 below my stock-option buy price. In option-speak: I'm underwater, so far that my teammates call me Aquaman. When considering whether to accept my new job, I ignored the options; yet now that I'm surrounded by coworkers who get beeped when the stock moves, it's hard to stay detached.

During this same period, Apple's stock reached all-time highs, and the company announced its first stock split in more than a decade. Much market anticipation surrounds Apple's coming OS X and its new Aqua interface. That OS will make me a different kind of Aquaman in a home filled with Macs.

The twain of Microsoft and Apple do sometimes meet. Microsoft just released version 5 of Internet Explorer for the Mac. Supposedly this browser is highly compliant with W3C standards, whereas the Windows version is getting nailed in the press for being non-compliant. Not being terribly hip to HTML standards, I like IE 5 for its new interface. Perhaps subliminally, I've chosen an aqua-colored theme.

Mousetrap 2000

Q

Hi Bobby,

In the April issue of CUJ you report a possible portability problem in the LENGTHOF code I sent you some time ago. [The LENGTHOF macro returns the length of a previously defined fixed-size array. For example, if you define a as an array of ten ints: int a[10]; then LENGTHOF(a) returns 10. —mb]

While (as you found out) the problem is unlikely to occur in practice, it's easy to come up with an improved version of LENGTHOF. Basically, given that the size of the array is guaranteed, while the size of the struct may be padded, all you have to do is use the array instead of the struct as the return type of the template function.

Here it comes:
template<int N>
struct Sized
    {
    typedef char x[N];
    };

template<class T, int N>
typename Sized<N>::x &
   LengthofHelper(T (&)[N]);

#define LENGTHOF(x) \
sizeof(LengthofHelper(x))
You may notice a few changes with respect to the original version. Aside from the change to the return type, I renamed the template function to LengthofHelper to make its purpose clearer. Also, I realized that there is no need to provide an implementation for LengthofHelper, since it never gets called.

Thanks for pointing out the problem, and keep up the good work. — Carlo Pescio

A

For those tuning in late: I published Carlo's original LENGTHOF solution one year ago, in my July 1999 column. The "portability problem" in that solution came courtesy of Diligent Reader Branko Cibej, whose letter I published in my April 2000 column.

The old version uses a structure as the operand to sizeof. That structure contains a single char-array member object. Because structures can have internal alignment padding, the old version could report a total structure size larger than the array member's size.

Carlo's remedy: instead of using the entire Sized structure as the operand of sizeof, use a reference to the char-array's type as the operand. This works for three reasons:

Unlike structures, arrays have no internal alignment padding.
sizeof a type == sizeof an object of that type.
sizeof a reference == sizeof the referenced type.

Thus, LENGTHOF yields the size of an array of N chars, not the size of a structure containing such an array.
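A minimal sketch of the revised macro in use, assuming two hypothetical arrays (ten_ints and three_doubles) that are not part of Carlo's letter. N is declared as std::size_t here, following the suggestion in note [1]; the letter itself uses int.

```cpp
#include <cstddef>

// Carrier type whose nested typedef is an array of exactly N chars.
template<std::size_t N>
struct Sized
    {
    typedef char x[N];
    };

// Declared but never defined: it appears only inside sizeof,
// which does not evaluate its operand.
template<class T, std::size_t N>
typename Sized<N>::x &
    LengthofHelper(T (&)[N]);

#define LENGTHOF(x) \
    sizeof(LengthofHelper(x))

// Hypothetical arrays used only for this illustration.
int ten_ints[10];
double three_doubles[3];

// LENGTHOF is a constant expression, so it can dimension another array.
char same_length_as_ten_ints[LENGTHOF(ten_ints)];
```

Because the result is known at compile time, passing a pointer instead of an array fails to compile rather than silently yielding a wrong length.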

The new version does indeed seem immune to the portability problems I cited in that April 2000 column [1]. While those problems are "unlikely to occur," they apparently can happen. Diligent Reader Richard Lagerstrom sent me the following in email:

In the April CUJ "More Mousetraps" section you asked if anyone had seen sizeof struct three_chars greater than three. Yes, I have... The C++ compiler for Cray machines (64-bit words, non-character addressed) generates a multiple of eight bytes for char arrays.

Richard asserts that
struct three_chars
    {
    char x[3];
    };

cout << sizeof(three_chars);
yields 8 with the Cray compiler.
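The difference between the two sizes can be sketched as follows. The constant names array_size and struct_size are hypothetical, chosen only for this illustration. The only relationship that should hold on every implementation is that the structure is at least as large as its array member.

```cpp
#include <cstddef>

struct three_chars
    {
    char x[3];
    };

// An array of three chars occupies exactly three bytes,
// because sizeof(char) is 1 and arrays have no padding.
const std::size_t array_size = sizeof(char[3]);

// The enclosing structure is at least that large; an implementation
// may add padding, as Richard reports for the Cray compiler.
const std::size_t struct_size = sizeof(three_chars);
```

On many desktop compilers both values are 3, which is why the problem rarely shows up in practice.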

Round and Round

Q

Hi Bobby.

I discovered that the following program is perfectly legal:
void main()
   {
   cout <<
     "A new random number generator!"
        << endl;
   int a = a;
   cout << a << endl;
   }
I posted the issue to borland.public.cppbuilder.language, and they told me that the line int a = a is correct, because the declaration introduces the identifier a and then the right side encounters a valid symbol and uses it as the initialization argument. This was referred to as a change in the specs.

I fail to see how this spec is right. int a = a is an initialization and not an assignment. It is okay to introduce the new identifier a in scope, but to actually use it as part of its own initialization list is another thing. Identifiers shouldn't be allowed to appear in their initialization list just as they are not allowed to appear as part of their declarations.

What am I missing?

Thanks. — Fernando Cacciola

A

Before I address your real question, I must take this opportunity to trump a common belief.

In your example, main returns void. From the C++ Standard [2]:

A program shall contain a global function called main, which is the designated start of the program. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined.

That's it. Nowhere does the Standard permit main at global scope to return void. About the closest you can come is to declare main as returning int but not actually return a value:
int main()
    {
    }
The Standard covers this also [3]:

A return statement in main has the effect of leaving the main function (destroying any objects with automatic storage duration) and calling exit with the return value as the argument. If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;

In my simple example, the program acts as if I'd really written
int main()
    {
    return 0;
    }
Not all compilers support implicit main returns. Microsoft's Visual C++ is probably the most popular offender here. When I'm writing examples specifically targeting VC++, either in CUJ or elsewhere, I always add an explicit
return 0;
to main.

Some compilers allow main to return void. Such allowance is a compiler-specific non-standard extension. Your compiler apparently supports this extension. I recommend that you avoid such extensions and always declare main to return int.

As for your real question: the declaration
int a = a;
is indeed "legal." In that declaration,
int
is the declaration specifier,
a
is the declarator, and
= a
is the initializer. At the point of the initializer, a is declared; the presence or lack of an initializer does not affect a's state as a declared entity. Further, because a is of built-in type, the statement
int a = a;
is tantamount to
int a;
a = a;
While both examples may appear nonsensical, they are allowed by the language rules. And while you may think that compilers can surely catch such mischief, consider
int a, b = a;
and
int a;
int &b = a;
b = a;
and
int &f(int &x)
    {
    return x;
    }

int a = f(a);
As you can see, even if the language rules were rewritten to disallow int a = a, they couldn't necessarily prevent the net effect of int a = a.

I think the language rules are actually okay in their current form. From 4.1/1 (edited for brevity):

An lvalue can be converted to an rvalue. If the object to which the lvalue refers is uninitialized, a program that necessitates this conversion has undefined behavior.

In your example, the a to the right of = is an lvalue acting as an rvalue. As that lvalue is uninitialized, the program behavior is undefined — just as it would be for any uninitialized value in this context. I prefer this approach over a specific exception for cases such as int a = a.
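The rule that a name is already declared by the time its initializer is parsed has well-defined uses too, as long as the initializer never reads the object's value. The sketch below assumes a hypothetical function, name_usable_in_own_initializer, that is not part of Fernando's program.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical demonstration; returns true when both names were
// usable inside their own initializers.
bool name_usable_in_own_initializer()
    {
    // n is in scope here. sizeof does not evaluate its operand,
    // so n's uninitialized value is never read.
    std::size_t n = sizeof n;
    assert(n == sizeof(std::size_t));

    // Taking self's address does not read self's value either.
    void *self = &self;
    return self == &self;
    }
```

Neither initializer performs an lvalue-to-rvalue conversion on the object being declared, so the undefined behavior described in 4.1/1 does not arise.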

Match Game

Q

I'm attempting some template specializations in some code I'm writing. The enclosed program produces incorrect results upon running the program under VC++ 6.0:
ft(T const &)
ft(T const &)
f(char const &)
f(char const *)
The compiler correctly deduces the type of the parameters for the overloaded f function, but incorrectly for the ft function. The only difference between the two is that one is a template, the other is not. Borland's C++ Builder v5 gets it right.

Now, I could live without the pointer version of the function but for one necessity. This all arose out of my needing to specialize a template function for wchar_t and char types. When I tried to do the following,
template<class T>
void foo(const T &)

template<>
void foo(const T *)
the compiler complains at the point of specialization that no suitable template function exists to so specialize. To which I agree, and augment the declarations with:
template<class T>
void foo(const T &)

template<class T>
void foo(const T *)

template<>
void foo(const T *)
The compiler then gladly accepts the specialization, but then binds to the wrong version as shown above. If instead the declarations are like so:
template<class T>
void foo(const T *)

template<>
void foo(const T *)
then the specialization works, but I am then resigned to forever pass arguments to my template function as pointers. — Bob Beauchaine

A

I've modified and simplified your original program; my version appears as Figure 1. It demonstrates a subtle problem, one I underestimated at first glance: the rules governing argument matching and overload resolution differ between templates and non-templates.

ft is a function template, overloaded to accept either a T const & or a T const * parameter, where T is bound to the templatized type. f is a regular function, overloaded to accept either a char const & or a char const *. When the compiler sees
f(c);
f(p);
it correctly matches c to char const & and p to char const *. (c is of type char; p is of type char *.) When the compiler sees
ft(c);
ft(p);
you expect it to correspondingly match c to char const & and p to char const *. Yet you are finding that the compiler matches p to some instantiation of T const &.

If it's any consolation, I believe your compiler is wrong. The consolation is faint, however, because I think the compiler actually should find ft(p) to be ambiguous.

Function templates describe families of potential functions. The template
template<typename T>
void ft(T const &)
    {
    // ...
    }
lets you create a family of ft overloads such as
void ft(char const &)
    {
    // ...
    }

void ft(int const &)
    {
    // ...
    }

void ft(long const &)
    {
    // ...
    }
and so on. The template
template<typename T>
void ft(T const *)
    {
    // ...
    }
similarly allows a different family of functions. For the compiler to mint an ft instance, it must unambiguously match an argument's type against the parameter types possible in those families.

When you write ft(c) the compiler tries to specialize an ft function having a parameter type matching c's type (char). No possible specialization of the parameter pattern
T const *
matches char. However, the pattern
T const &
matches if T is replaced with char. The end result is the specialization
template<>
void ft(char const &)
    {
    // ...
    }
If you explicitly create such a specialization, you'll find that ft(c) calls it.

Now consider the expression ft(p). The compiler again tries to specialize ft, so that the resulting parameter type matches p's type (char *). The pattern
T const &
matches if T is replaced with char *. In addition, the pattern
T const *
matches if T is replaced with char. (This is apparently the match you hoped for.) Both matches are equally good [4]; as a result, your compiler should flag the ambiguity.

You can eliminate the ambiguity through explicit argument specification. The expression
ft<char *>(p)
instantiates
template<>
void ft(char * const &)
    {
    // ...
    }
Likewise, the expression
ft<char>(p)
instantiates
template<>
void ft(char const *)
    {
    // ...
    }
You can test my theory by defining those two explicit specializations, then tracing the two calls.
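One way to trace the calls is sketched below. It departs from the listings above in one respect, which is an assumption made purely for illustration: each ft returns a value of a hypothetical Pattern enumeration instead of void, so the selected template can be checked directly instead of specializing each case by hand.

```cpp
#include <cassert>

// Hypothetical tags naming the parameter pattern that was selected.
enum Pattern { const_reference, pointer_to_const };

template<typename T>
Pattern ft(T const &)
    {
    return const_reference;
    }

template<typename T>
Pattern ft(T const *)
    {
    return pointer_to_const;
    }

// Traces the explicitly specified calls discussed above.
void trace_calls()
    {
    char c = 'x';
    char *p = &c;
    assert(ft(c) == const_reference);          // T deduced as char
    assert(ft<char *>(p) == const_reference);  // parameter is char * const &
    assert(ft<char>(p) == pointer_to_const);   // parameter is char const *
    }
```

With the template argument spelled out, only one of the two templates yields a parameter type that p can convert to, so each call has a single viable candidate.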

If you don't like explicit argument specification, you can still make the implicit argument deduction work. The easiest solution is to change p's type to char const *. Then the call
ft(p)
will directly match
ft(T const *)
with char substituted for T.
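A small sketch of that change, assuming a conforming compiler. As an illustration-only assumption, each ft again returns a hypothetical Pattern tag instead of void, and check_pointer_to_const_call is a made-up helper name.

```cpp
#include <cassert>

// Hypothetical tags naming the parameter pattern that was selected.
enum Pattern { const_reference, pointer_to_const };

template<typename T>
Pattern ft(T const &)
    {
    return const_reference;
    }

template<typename T>
Pattern ft(T const *)
    {
    return pointer_to_const;
    }

// Calls ft with a pointer whose pointee is already const.
void check_pointer_to_const_call()
    {
    char const c = 'x';
    char const *p = &c;                  // p is char const *, not char *
    assert(ft(p) == pointer_to_const);   // T deduced as char
    }
```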

For a more generic solution, add a third function family:
template<typename T>
void ft(T *)
    {
    // ...
    }
With such a template, the call
ft(p)
will directly match
ft(T *)
with char substituted for T. This is actually my recommended solution, as it solves similar ambiguity problems for other pointer types, and lets you tune the ft algorithm depending on its parameter's const properties.
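The three families together might look like the sketch below, assuming a conforming compiler. The Pattern enumeration and the check_three_families helper are hypothetical names added only so the selected template can be observed; a real ft would do its work in each body instead.

```cpp
#include <cassert>

// Hypothetical tags naming the parameter pattern that was selected.
enum Pattern { const_reference, pointer_to_const, pointer_to_non_const };

template<typename T>
Pattern ft(T const &)
    {
    return const_reference;
    }

template<typename T>
Pattern ft(T const *)
    {
    return pointer_to_const;
    }

template<typename T>
Pattern ft(T *)
    {
    return pointer_to_non_const;
    }

// Calls ft with a char, a char *, and a char const *.
void check_three_families()
    {
    char c = 'x';
    char *p = &c;
    char const *pc = &c;
    assert(ft(c) == const_reference);        // only T const & can match a char
    assert(ft(p) == pointer_to_non_const);   // T * with T = char
    assert(ft(pc) == pointer_to_const);      // T const * with T = char
    }
```

Having separate bodies for T * and T const * is what lets the algorithm vary with the const properties of its argument.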

Notes

[1] Although it is open to a different portability problem: dimensioning the array with an int. I recommend you declare the template parameter N to be of type ptrdiff_t or possibly size_t. To understand why, check out my April/May 1998 CUJ columns.

[2] Subclause 3.6.1/1-2.

[3] Subclause 3.6.1/5.

[4] At least according to my interpretation of the Standard's implicit-conversion rules, especially those in subclause 13.3.3.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, a speaker at the Software Development and Embedded Systems Conferences, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him on the Internet via BobbySchmidt@mac.com.