September 1996/The Learning C/C++urve

The Learning C/C++urve

Bobby Schmidt

C->C++ Mutations, Part 4

In this installment, Bobby wraps up his series on porting C code to C++. Once again he shows that adapting to C++'s stricter rules is painful, but ultimately results in better code.
Copyright © 1996 Robert H. Schmidt

What you have in your hot little hands is the fourth and final exploration of C code that mutates behavior when ported to C++.

Type of Character Literals

In the Cretaceous era of Classic C (K&R C), everything seemed to be an int. Unspecified function return and object types were ints. Pointers were ints. shorts were ints (and often still are). longs were ints. floats and doubles weren't ints, but then, real programmers didn't use floating-point [1]. And while chars typically weren't ints, char arguments were promoted to ints, and lots of programmers mixed int and char like beer and pretzels. After all, who among us has never seen or written constructs like [2]
char index_to_letter(int const index)
    {
    return 'A' + index;
    }
Standard C has maintained much of the int legacy. Given this trend, you should not be surprised to learn that so-called "character" literals like 'A' are actually of type int. This leads to the interesting anomaly that, for an ASCII execution character set,
char c = 'A';
is morally equivalent to
char c = 65;
Even so, I encourage you to use 'A' here. 'A' is much more self-documenting, implying that you are thinking of the logical abstracted character 'A', and not the physical numerical implementation 65. Also, your code will port correctly to a non-ASCII environment, since 'A' will map to the appropriate encoding for the new target.

That 'A' is an int is typically invisible to you. The one place you may become aware of it is in statements like
if (sizeof('A') == sizeof(int))
    printf(
      "'A' is the size of an int\n");
else if (sizeof('A') == sizeof(char))
    printf(
      "'A' is the size of a char\n");
else
    printf(
      "'A' is of mysterious size\n");
This code, built as C, produces
'A' is the size of an int
The same code built as C++ produces
'A' is the size of a char
The reason, of course, is that character literals in C++ are truly of type char. Why this change from C? To better support function overloading, a concept foreign to C. Consider the two C++ overloaded functions
void spew(char const value)
    {
    printf(
      "the value is '%c'\n",
      (char) value);
    }
void spew(int const value)
    {
    printf("the value is %d\n",
           (int) value);
    }
If you call spew as
spew('A');
what do you expect to see? Were 'A' still of type int, you'd get the unintuitive result
the value is 65
Because C++ reckons 'A' as a char, you actually get the more useful
the value is 'A'
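If your ported code secretly depended on 'A' being an int, you can still ask for that explicitly. The following is only a sketch: picks_char is a hypothetical pair of probes I made up for illustration (it is not part of spew), and the checks rely on static_assert, which postdates this column.

```cpp
// Hypothetical probes (not from the column): each overload reports
// whether the char version was the one selected.
constexpr bool picks_char(char) { return true;  }
constexpr bool picks_char(int)  { return false; }

// A bare character literal is a char in C++, so the char overload wins.
static_assert(picks_char('A'), "'A' has type char");

// Code that wants the old numeric behavior can request int explicitly.
static_assert(!picks_char(static_cast<int>('A')), "the cast selects int");

// Arithmetic still promotes to int, as in index_to_letter's 'A' + index.
static_assert(!picks_char('A' + 1), "'A' + 1 has type int");

static_assert(sizeof('A') == sizeof(char), "a char literal is char-sized");
```

The same cast applied to the call, as in spew(static_cast<int>('A')), should select the int overload of spew.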
Where Types are Declared

You may think, rationally enough, that all C types must be declared in declarations. But C lets you declare types in expressions also, as demonstrated by the C snippet
void *p = (struct s {char c;} *) 0;
struct s x;
Here the new type struct s is declared in the type-cast expression, allowing us to define objects (like x) of that struct type. C++ disallows such type-declaring expressions. In C++, all types must be declared in declarations.

One possible C++ rewrite of the above is
struct s
    {
    char c;
    };
void *p = 0;
s x;
which not only changes where the type is declared, but also takes advantage of the tagged name s appearing in the untagged name space.

Vaulting Past Object Initializations

The apparently simple definition
purple_doofus Barney = 1;
has very different meanings, depending on source language. In C the statement (loosely) means

Reserve enough storage for a purple_doofus-sized object.

Name that storage Barney.

Set the initial value of that storage to 1.

In C++ the definition (again loosely) means

Reserve enough storage for a purple_doofus-sized object.

Name that storage Barney.

Call the purple_doofus constructor that accepts a single integer argument, passing 1 as that argument, and passing &Barney as the constructor's hidden this pointer.

Put another way, initialization in C sets a bit pattern, while initialization in C++ calls a constructor function. That C++ constructor may generate side effects (e.g., allocating resources) requiring collateral side effects (freeing resources) in the destructor. Were an object not properly initialized, use and destruction of that object could produce disaster.
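To make the side-effect point concrete, here is a minimal sketch of what a purple_doofus might look like. Everything inside the class is my assumption for illustration (the live_count bookkeeping, the value_ member, and the helper count_while_alive); the column never shows the real type.

```cpp
// Hypothetical stand-in for purple_doofus; the members are assumptions.
class purple_doofus
    {
public:
    purple_doofus(int const value) : value_(value)
        {
        ++live_count;               // side effect of construction
        }
    purple_doofus(purple_doofus const &other) : value_(other.value_)
        {
        ++live_count;               // copies count as live objects too
        }
    ~purple_doofus()
        {
        --live_count;               // collateral side effect
        }
    int value() const
        {
        return value_;
        }
    static int live_count;
private:
    int value_;
    };

int purple_doofus::live_count = 0;

// Returns how many purple_doofus objects exist while Barney is alive.
int count_while_alive()
    {
    purple_doofus Barney = 1;       // calls purple_doofus(int)
    return purple_doofus::live_count;
    }
```

If Barney's constructor were somehow skipped, the destructor would still decrement live_count, leaving the count wrong for the rest of the program. That is the flavor of disaster the rule guards against.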

Since C initialization calls no construction function, there are no such side-effect concerns. At worst, an uninitialized C object simply contains the wrong bit pattern, a problem easily correctable by assignment. This laxity allows C compilers to accept code like
void f(int const i)
    {
    goto jail;
    if (i)
        {
        int const x = 1;
    jail:
        printf("x = %d\n", (int) x);
        }
    }
Because the goto skips x's initialization, you have no clue what will get written by the printf statement.

C++ expressly disallows such leaps past object initialization. Allowing them would induce run-time checks at object destruction, to see if the object's constructor had indeed been called, and would almost certainly lead to unpleasant object behavior.
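One possible C++ rewrite, offered as a sketch rather than the definitive fix, moves x's definition ahead of the goto so that no jump can bypass the initialization. I've changed f to return x instead of printing it, purely so the result is easy to check; that change is my assumption, not part of the original example.

```cpp
// Variant of f that compiles as C++: x is initialized before the jump.
int f(int const i)
    {
    int const x = 1;    // initialized before any goto can skip it
    goto jail;
    if (i)
        {
    jail:
        return x;       // x is guaranteed to hold 1 here
        }
    return 0;           // never reached; kept to mirror the original shape
    }
```

Jumping into the nested block is still allowed, because the jump no longer passes over an initialized declaration.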

No Returned Value From Function

Some C functions don't want to return anything useful. In Standard C these functions are prototyped to return void. Cretaceous era C didn't have void, so such functions were always declared to return something. Although their compile-time declarations suggested they had to return a value, at run time these functions didn't actually have to. This prevented programmers from needing to return a dummy value just to satisfy the compiler, when they knew the value would be unexpected and ignored by the caller.

Standard C retains this feature, allowing
int initialize(char *p)
    {
    *p = '\0';
    }
C++ has had void from the get-go; since there's no compelling reason to write functions like initialize, C++ does not allow this.
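The obvious port, assuming no caller actually looks at the nonexistent return value, is to say what you mean:

```cpp
// Same behavior as the C version, but the declaration no longer
// promises a value the function never delivers.
void initialize(char *p)
    {
    *p = '\0';
    }
```

If some caller does use the result, you have found a latent bug, and you will need to decide what int the function should really return.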

Storage Class Specifiers

C's storage class specifiers include extern and static [3]. The very phrase "storage class specifier" suggests that these keywords affect things occupying run-time storage. While objects of a particular type take up such storage, the types themselves do not. Nonetheless, according to the C grammar, storage class specifiers are valid (and ignored) in type definitions, giving rise to
static struct s
    {
    int i;
    };
The C++ grammar prevents such constructs. Among other things, preventing these constructs avoids the confusion that could arise between declaring a class member static, and declaring the entire class static.
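A sketch of how the port might look: define the type without a specifier, then put static on the things that actually occupy storage. The names hidden, tally, and peek are hypothetical, invented for this illustration.

```cpp
struct s
    {
    int i;
    };

static s hidden = { 42 };   // static applies to the object, not the type

struct tally
    {
    static int total;       // one shared member, not a "static class"
    int i;
    };

int tally::total = 0;

// Hypothetical accessor, here only so the example can be exercised.
int peek()
    {
    return hidden.i;
    }
```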

Note that, while extern types are disallowed in C++, some C++ implementations hold the notion of exported types. For example, Microsoft Windows DLLs export entities they want to be visible to other applications or DLLs. Traditionally those entities have been objects and functions; however, with the rise of complex class libraries like MFC, DLLs also expose classes, which are conceptually compile-time entities. Exposing a class enables a system to share one copy of the library among all applications, instead of duplicating the class library object code in every client application. Such type exporting is non-standard and definitely non-portable across platforms.

Uninitialized const Objects

As bizarre as it may seem, C lets you define uninitialized const objects. That is, you can create an object in C with a value you never supplied (zero for a static object like the one below, indeterminate for an automatic one) and can never change, as in
double const pi; /* ooops, forgot to initialize */

double degrees(double const radians)
    {
    return radians * 180 / pi;
    }
C++, with its stronger object integrity, requires you to initialize a const object. That C does not require this is quite unfortunate. I don't know why C has this weakness; I'm guessing it's to help accommodate Classic C code ported to Standard C (as Classic C did not have the const keyword).
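The C++ version simply supplies the initializer that the C version forgot. The particular number of digits below is my choice; use whatever precision your application demands.

```cpp
double const pi = 3.14159265358979323846;   // initializer is required in C++

double degrees(double const radians)
    {
    return radians * 180 / pi;
    }
```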

Type of enum

Classic C also didn't have enums. The quintessential work-around was a series of macros, such as
typedef int Roy_G_Biv;
#define RED 1
#define ORANGE (RED + 1)
#define YELLOW (ORANGE + 1)
#define GREEN (YELLOW + 1)
#define BLUE (GREEN + 1)
#define INDIGO (BLUE + 1)
#define VIOLET (INDIGO + 1)
Ick. As I explored in my January article (first of the Boolean series), macros are evil. Standard C enums are far superior:
enum Roy_G_Biv
    {
    RED = 1,
    ORANGE,
    YELLOW,
    GREEN,
    BLUE,
    INDIGO,
    VIOLET
    };
Unfortunately, Standard C leaves a gaping type hole in its enum support. In C, enumerators (constants like RED, defined in an enum definition) are of type int, meaning enumeration objects can be initialized and assigned from int:
enum Roy_G_Biv chroma = INDIGO; /* OK */

enum Roy_G_Biv colour = 6;  /* also OK; identical
                               to above example */
As I'm sure you've guessed, C++'s type checking does not allow this. Enumerators are of their defining enum's type, meaning enum objects can be initialized and assigned only from the same enum type. If you are porting C code, you can throw in a type cast like
Roy_G_Biv colour = static_cast<Roy_G_Biv>(6);
to get around C++'s stronger rules. I find this to be a hack. Enumerations are generally supposed to be abstractions. If you find yourself relying on, or even being aware of, the underlying encoding for an enumerator, you may want to revisit your design.
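If you truly must convert from an integer (say, a value read from a file), one way to contain the hack is to funnel every conversion through a single checked function. This is a sketch under my own assumptions: to_color is a hypothetical helper, and it relies on the enumerators being contiguous from RED to VIOLET.

```cpp
enum Roy_G_Biv
    {
    RED = 1,
    ORANGE,
    YELLOW,
    GREEN,
    BLUE,
    INDIGO,
    VIOLET
    };

// Hypothetical helper: stores the matching color in result and returns
// true, or leaves result untouched and returns false if encoding is
// outside RED..VIOLET.
bool to_color(int const encoding, Roy_G_Biv &result)
    {
    if (encoding < RED || encoding > VIOLET)
        return false;
    result = static_cast<Roy_G_Biv>(encoding);
    return true;
    }
```

Now the cast lives in exactly one place, and a bogus value like 0 or 8 gets rejected instead of masquerading as a color.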

Another way your ported code may break is if you assume the size of an enum. In C,
sizeof(INDIGO) == sizeof(int)
since enumerators are ints. In C++
sizeof(INDIGO) == sizeof(Roy_G_Biv)
since enumerators are of their defining enum's type. The size of an enum is implementation defined, and it may well be that ints and enums are the same on your compiler. This is not guaranteed, however, so in general portable code should not make this assumption.
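When you need a value of known size, for a file format perhaps, you can convert to int yourself; the conversion from enum to int remains implicit in C++. The sketch below assumes a hypothetical helper named encode and uses static_assert, which postdates this column.

```cpp
enum Roy_G_Biv
    {
    RED = 1,
    ORANGE,
    YELLOW,
    GREEN,
    BLUE,
    INDIGO,
    VIOLET
    };

// In C++ an enumerator has its enum's type, so these sizes always agree.
static_assert(sizeof(INDIGO) == sizeof(Roy_G_Biv),
              "enumerators share their enum's size");

// Hypothetical helper: enum-to-int conversion needs no cast.
constexpr int encode(Roy_G_Biv const color)
    {
    return color;
    }

// The result is an int, whatever size the enum itself happens to be.
static_assert(sizeof(encode(INDIGO)) == sizeof(int),
              "encode yields an int");
```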

Initializing char arrays

This next one helps fix the sort of bugs that drive programmers to become yak herders. Consider my self-aggrandizing program
#include <stdio.h>

typedef char const name_part[7];

struct name
    {
    name_part last;
    name_part first;
    };

struct name moi = {"Schmidt", "Robert"};

int main(void)
    {
    printf("CUJ's coolest dude is %s %s!",
            (char *) moi.first,
            (char *) moi.last);
    return 0;
    }
This code compiles and links as C, but when run on my Mac gives the less than desirable result
CUJ's coolest dude is Robert SchmidtRobert!
Who is that, the second cousin of Bond JamesBond? No, it's the result of a subtle bug that snuck by C. C++ is smarter [4], refusing to compile the code as written. If I change the definition of name_part to
typedef char const name_part[8];
the code builds cleanly in both languages, yielding the much more reasonable and eminently accurate
CUJ's coolest dude is Robert Schmidt!
The problem was that the last array was too short to hold its initialization value: last was seven characters long, while "Schmidt" is eight (the seven letters plus the terminating '\0' byte). The letters made it on board, but that final '\0' byte got left at the altar. Because printf didn't find the terminating byte in last, it sailed past the end of that array, marching right into the immediately following storage (first).

C++'s semantics require that a char array be defined long enough to contain its initializer. The first version fails that requirement, leading to the compiler diagnostic.

Note that your results for the incorrect C version may differ from mine. For example, a compiler could align such objects on four-byte boundaries, so that up to three extra bytes of padding existed between last and first.
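Where a fixed-length typedef isn't essential, one way to sidestep the counting altogether is to let the compiler size each array from its initializer. This is a sketch of an alternative, not the fix used above, and the static_assert checks postdate this column.

```cpp
// The compiler counts the characters, terminator included.
char const last[]  = "Schmidt";
char const first[] = "Robert";

static_assert(sizeof(last) == 8, "seven letters plus the terminator");
static_assert(sizeof(first) == 7, "six letters plus the terminator");
```

The trade-off is that last and first no longer share one type, so this suits stand-alone arrays better than members of struct name.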

It's Miller (Freeman) Time!

These past four months, I've highlighted all the big changes, and most of the little ones, that you'll encounter porting valid Standard C code to C++. Oh, to be sure, there are a few other minor examples; but they are so turgid and dull, you would risk slipping into a coma upon their mention. The full canon of such mutations appears as Appendix C of the Standard C++ Working Paper, and in fact I've based some of my examples on those given in that Appendix. By the time you read this, the committees may have foisted a new version of the Working Paper upon the world; if so, Appendix C could deviate some from what I've covered.

Erratica

As I write, diligent reader Dan Saks is in town for the Seattle leg of his North American rock tour. Over a tasty evening repast at the downtown Bellevue Red Robin, we discussed one of my topics from last month: tagged vs. untagged name spaces. Dan brought up a subtlety which I'd failed to mention, but want to touch on here.

Two of the points I raised in the column are that

1. C++ implicitly declares tagged type names in both the tagged and untagged type name spaces.

2. The untagged name can be hidden by other declarations for the same name at the same scope, leading to the name being used simultaneously as a type and non-type.

In C, to achieve point 1 above, you must use an explicit typedef. This has the side-effect of avoiding point 2. Thus, the C typedef serves two distinct purposes: it puts the name in the untagged space, and prevents that name from being trumped by another declaration.

Because C++ gives you point 1 for free, you don't need the typedef. If, however, you want to avoid point 2, you must use a typedef anyway. Moral: In C++, there are still reasons to use typedefs, even though that use may appear redundant.
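A small sketch of both points, using the hypothetical names widget and gadget (neither appears in last month's column):

```cpp
struct widget
    {
    int id;
    };

int widget = 42;            // legal: hides the type name widget
// widget w1;               // error: widget now names the int
struct widget w = { 7 };    // the tagged name still reaches the type

struct gadget
    {
    int id;
    };

typedef struct gadget gadget;   // looks redundant in C++, but ...
// int gadget = 42;             // ... makes this an error, not a hiding
gadget g = { 9 };
```

With the typedef in place, the compiler rejects the conflicting declaration outright, so gadget can never quietly become a non-type behind your back.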

Dan says colloquially that C++ has one and a half name spaces, since the implicit untagged name space entries are so easily masked. Mister Twister, a.k.a. Marc Briand, CUJ's Managing Editor, suggests this tagged/untagged business would make great fodder, er, subject matter for a future column. I dunno ... I was instead thinking of a series on abstract declarators. Can't get enough of that sexy topic.

Notes

[1] Not to be confused with the Evergreen Point Floating bridge, the proverbial weak link in the Seattle area's freeway chain.

[2] This function makes gross (but often correct) assumptions about its target character set. For demonstration purposes only. Cars driven by professional drivers on a closed course.

[3] See my December '95 column for an exegesis on these specifiers.

[4] (Sung to the tune of the old calypso song): Let us put C and C++ together and see which one is smarter. Some say C, but I say no, C++ got C like a puppet show. Ain't me, it's WG21 that say C is leading C++ astray. But I say it's C++ today smarter than C in every way ...

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 3543 167th Ct NE #BB-301, Redmond WA 98052; by phone at (206) 881-6990, or via Internet e-mail as rschmidt@netcom.com.