Columns


The Learning C/C++urve: Building a Data Type in C and C++

by Bobby Schmidt


Introduction

In this series of articles, I develop a boolean data type, culminating in a full-featured C++ class. Those of you programming exclusively in C may be ready to chuck in the towel at the mention of "class." I urge you not to, for all the solutions this month apply to either language. Later C++-specific solutions illustrate design considerations you should still find useful in C, especially if you plan on porting to C++.

Booleans in C

One of the early surprises for many new C programmers, especially those versed in Pascal, Ada, or even FORTRAN, is C's lack of a built-in logical or boolean data type. The presence of logical keywords such as if, and logical operators such as !, makes this more surprising still. C accommodates logical keywords and operators by treating scalar expressions in these contexts as logical expressions, in which zero means false and anything else means true.

For example, in the code fragment

int i;
if (i > 0)
    {
    }

C renders i > 0 true if i is positive, and false if i is zero or negative. You can't discern from this example how the compiler internally represents false and true. However, if you capture a logical expression explicitly as in

int i;
int j = (i > 0);

the picture changes — you can now see the numerical values of false and true. Specifically, if i > 0 is false, j is set to 0; if i > 0 is true, j is set to 1, rather than to some arbitrary non-zero value.

In sum, when mapping a scalar to a logical, C turns 0 to false and everything else to true; when C maps the other way (from a logical to a scalar) false still translates to 0, but true translates to 1. As a side effect of this mapping, C logical expressions don't always observe boolean logic identities. For instance, the expression

i == !!i

is a tautology (that is, invariably true) by boolean logic; in C, the expression is true if and only if i has the value zero or one. Try plugging in different values for i and mentally "running" the code to see the effect.

Booleans in C++

C++ inherited (no pun intended) its original approach to logical operations from C. However, in November 1993, the C++ committees added a new builtin boolean type to C++'s Danish Army knife of standard features. This extension introduced only three syntactic changes, adding the keywords bool, false, and true. Semantically, the picture is more complex, and to me a little disappointing.

As I mentioned last month, bool is an integral type, freely miscible with other integrals and their operations. You may think of bool as the smallest-sized integral type, implemented as a single bit. This is a conceptual model; in actual practice, sizeof(bool) is implementation defined, so that sizeof(bool) may equal sizeof(long).

Perhaps the best analog for bool is char. Although char is an integral type, you typically don't assign integer constants to objects of type char; rather, you assign more human-friendly character constants. For example, in an ASCII environment, the statements

char c = 'A';

and

char c = 65;

net the same value in the variable c. The machine, living in the solution domain, sees no difference. But you, living in the problem domain, see a huge difference between the text concept 'A' and the mathematical concept 65. Similarly, you probably find the statement

bool b = true;

more compelling and intuitive than you do

bool b = 1;

Even though C++ now sports a bool type, you could still have good cause to create your own. Possible reasons:

As illustration of this last point, consider the C++ statements

bool b = "Oh, the Payne!";
int i = false;
if (++b > 5)
    i = -1 / true;

Although I have not yet seen a C++ compiler that implements bool — and thus can't call my own bluff — my interpretation of the Standard tells me these statements are conforming. The reason, as with many such C++ compromises, is backward compatibility with existing C and C++ code. But what existing code? After all, this feature never existed before — what code can be in conflict?

As it turns out, plenty. For years, many C and C++ projects have defined their own boolean types. Even Microsoft has made BOOL a standard Windows data type. In every case I've seen, these boolean types map into some integral type. The members of the C++ committees know this, and I have to believe such prior art heavily influenced their decisions. Unfortunately, this benefit of compatibility comes at the cost of lessened type safety.

Rolling a User-Defined Boolean Type

As counterpoint, I will incrementally develop a boolean type offering increased type safety. At each step, I will pick a specific weakness in the current implementation, identifying ways to solve it in the next step. The end product will be a simple yet quite useful boolean type which has advantages and disadvantages compared to the proposed C++ bool.

Along the way, I will exercise each candidate boolean with two suites of test cases. A test case is a set of declarations or statements meant to show how the particular boolean implementation interacts with a specific aspect of C or C++. Each test case is prepended with a comment describing what aspect of boolean behavior it tests.

Listing 1 shows the "acceptance" suite. I expect all test cases in that suite to compile without diagnostics. Those test cases that compile (meeting my expectation) I tag as "pass" in the resulting summary table; those that produce compile-time diagnostics I tag as "fail." Conversely, Listing 2 is a "rejection" suite, in which I expect all test cases to draw compiler flags (that is, I expect none of the cases to compile); those that are flagged by the compiler "pass," while those that are not flagged (i.e., those that compile) "fail." Each suite #includes booltest.h (Listing 3) , and conditionally compiles as either C or C++. In every instance, I include the particular boolean implementation under consideration into booltest.h before compiling.

These tests are neither exhaustive nor reduced. Rather, I intend them as a first-order metric, demonstrating how closely each boolean candidate matches my notions of what an ideal boolean type should be. You may not agree with some of those notions; even so, watching how I make my design decisions may well give you a new perspective of your own. I further believe test cases are essential for ensuring that the code walks the design's talk. While not intended as a tutorial in test development, this column series should demonstrate the insight such tests give into program design.

Finally, please note I am compiling with Microsoft C 8.0 (a.k.a Visual C++ 1.0) and MetaWare High C/C++ 3.11; if you use different compilers, you may get results that veer slightly from those in my tables.

First Attempt

As a start, I need a name for this new type. I'd prefer a lower-case name, to match the C++ practice of making types look builtin; but at the same time, I don't want to collide with the real builtin C++ bool. As a compromise, I'll call this type boolean.

I have less latitude with the constant names — lower-case true and false are already taken. These are the first truly predefined constants in either language, and seem to be setting a lower-case precedent. All C and C++ macro pseudo-constants (like EOF and NULL) are upper-case, though, so I'll go with FALSE and TRUE for my type boolean.

For my first attempt, I turn for inspiration to Microsoft's win- dows.h, finding

typedef int BOOL;
#define FALSE 0
#define TRUE 1

Changing the type name to boolean gives

typedef int boolean;
#define FALSE 0
#define TRUE 1

This is almost certainly the most common C implementation for booleans, dating back to K&R vintage. The selection of 0 and 1 is no accident; remember, logical false and true map to scalar zero and one.

Call this implementation of boolean Version 1. Include it in booltest.h, then compile the acceptance suite (Listing 1) and rejection suite (Listing 2) as both C and C++. I summarize my C results in Table 1A, C++ results in Table 1B. In general, I find this implementation of boolean to be type-unsafe, to wit:

In total, only 106 of 261, or 40%, of the test cases passed. The principal weakness here is the use of typedef. As discussed last month, typedef introduces not a new type, but a new name (boolean) for an existing type (int). In this example, the type names boolean and int are 100% interchangeable, so that anywhere you write int, you could just as easily write boolean. As a result, the compiler cannot restrict boolean to contexts where it makes sense; you must be self-regulating.

Macros vs. Constant Objects

Version 1 contains one additional problem, independent of its utility as a boolean type: the use of macros as constants. Macros suffer several limitations, including

Required in C, which has no genuine symbolic constants, macro constants are considered pass among the C++ cognoscenti, who instead favor constant objects. In contrast to the macro deficiencies cited above, such objects

Substituting constants for macros in Version 1 yields Version 2:

typedef int boolean;
static boolean const FALSE = 0;
static boolean const TRUE = 1;

Note that in C++ you can drop the static with impunity, as constant objects at file scope default to internal linkage. As before, include this version in booltest.h, and compile as both C and C++. Results appear as Tables 2A and 2B. The C results (Table 2A) are noticeably improved: all the acceptance tests succeed, while the rejection tests show a modicum of improved type safety (principally by disallowing some mixing of boolean woolens and scalar linens).

The C++ results (Table 2B) are clones of those from Version 1, with a big difference: FALSE and TRUE are now lvalues, meaning you can take their addresses, and even (via cast) change their values. However, I think the advantages of not using macros far outweighs this cost. Version 2 may not be a better boolean, but it is better C++.

Adding Type Safety

Other than providing the syntactic sugar of symbolic boolean names instead of int, 0, and 1, I have gained little with these first versions. I certainly have gained no type safety, as boolean is identical to int in every respect. In fact, I could write the Version 2 constants as

int const FALSE = 0;
int const TRUE = 1;

and have the same effect. This is too wimpy. I want a type that is distinct from all other builtin types, so that the language does care whether I use boolean or int, and it does matter that FALSE and TRUE are of type boolean.

One avenue that holds promise is enumerations, which have the appealing property of defining both a type and the symbolic constant values of that type. Further, while C considers enumerations just another integral type, C++ renders them as truly distinct types, offering stronger type safety. As example,

enum APPLE
    {
    };
enum ORANGE
    {
    };
enum APPLE apple;
enum ORANGE orange = apple;

compiles in C but not in C++, which considers the two types incompatible. Unfortunately, C++ subverts this type safety some by letting enumerations silently convert to integral types, as in

int strawberry = apple; /* OK in both C and C++ */

The definition for Version 3, using an enumeration, is

typedef enum boolean
    {
    FALSE,
    TRUE
    }
boolean;

In C++ you could use the simpler

enum boolean
    {
    FALSE,
    TRUE
    };

C++ automatically declares enum tag names in the "normal" type name space as well, while C requires an explicit typedef. C++ will accept, and ignore, the typedef for backwards compatibility with C code, so you can use either version for C++.

My results of running Version 3 through the test appear in Tables 3A and 3B. Perhaps surprisingly, C shows little difference between Versions 2 and 3. Because C enumerations are integral types, they offer scant type safety; they are, in effect, little more than glorified ints. The real benefit with C enumerations comes not in the type name, but more in the scoped enumerator constant names (replacing the nefarious unscoped macros, as discussed earlier).

C++ is another matter. I find the C++ Version 3 to be, almost without exception, superior to Version 2. For the first time, the passes (barely) outnumber the failures. New on the plus side

Unfortunately, there is one big new minus:

Version 3 also preserves the type-unsafe ability to assign from boolean to many scalar types, although the reason is different — before, boolean was really an int, while now it is an enumeration type that turns itself into integral types.

In an absolute sense, I consider the ability to assign from relational and logical expressions necessary in a robust boolean type. But relative to all the C++ versions we've explored so far, Version 3 is overwhelmingly the best choice, so much so that you could stop here and have a fairly useful boolean type that buys you clarity of expression and some type safety.

Tear Down the Wall

Were I programming exclusively in C, I'd have pretty much hit the wall here — I know of no way to more easily and transparently define boolean in C. Fortunately, I have C++ available, which offers as raison d'tre the ability to define new types, via classes, that act builtin. As I surmised in my introduction, you C programmers may be ready to bail here, but I encourage you to at least audit the rest of this class, especially if you plan to someday change majors to C++. Those of you already programming in C++ should keep taking the class for credit — next month we start exploring fundamentals of class design that will affect all non-trivial classes you create. o

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 3543 167th Ct NE #BB-301, Redmond WA 98052; by phone at (206) 881-6990, or via Internet e-mail as rschmidt@netcom.com.