Features


Building A Better Boolean With C++

Ron Burk


Ron Burk has a B.S.E.E. from the University of Kansas and has been a programmer for the past 10 years. He is currently president of Burk Labs, a small software consulting firm.

C++ continues to evolve as a set of incremental changes and additions to C. As a C programmer, you can begin to learn and use C++ in exactly the same way — by making incremental changes to your programming style and to the language features that you use. This article provides a simplified introduction to C++ data types by altering a C data type a step at a time to construct a better, C++ data type.

C++ offers many features that can be used when constructing new data types, but this article will focus on just two: data hiding and user-defined conversions. These two language features give you the ability to define C++ data types that have many of the same privileges as built-in data types such as int or float.

What Do You Want From A Data Type?

Different languages take different approaches to data types. On one end of the spectrum are typeless languages; for example, a BASIC interpreter may have a single string data type and automatically convert between numeric and string formats when required. On the other end of the spectrum are languages with strict type checking; Pascal, for instance, detects and disallows any attempt to use a variable that is not of the correct data type. C falls somewhere between these two extremes. It contains multiple data types (and allows the user to create more), but it also provides automatic conversions between certain data types.

C++ is stricter about type checking than C. For example, the following code fragment is legal C, but not legal C++:

main() {
    void f();

    f(45);
}
In C, you don't have to specify the number and type of arguments that a function requires; in C++ you must.
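As a minimal sketch of the C++ side of this rule, the fragment can be repaired by stating the parameter list in the declaration. The name f comes from the fragment above; the function body, the last_arg variable, and the call_f() wrapper are assumptions added only so the example can be compiled and checked.

```cpp
#include <cassert>

static int last_arg = 0;      // hypothetical: records what f() received

void f(int n);                // C++ wants the parameter list stated up front

void f(int n) { last_arg = n; }

void call_f() {
    f(45);                    // checked against the prototype: exactly one int
    // f();                   // would not compile: too few arguments
    // f("45");               // would not compile: wrong argument type
    assert(last_arg == 45);
}
```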

Even a simple data type such as Boolean can be implemented in a number of different ways. The best implementation depends upon your circumstances. For example, if you are making a data type for a project that three programmers will work on for four months, you will probably favor an implementation that is easy to use and simple to construct. On the other hand, if you are making a data type for a five-year project that will involve 100 programmers, you may be willing to give up some ease of use in exchange for stricter type checking so the compiler can catch more programmer mistakes.

It is in the larger coding projects, involving more than one person, that C++ holds a clear advantage over C. Why invest time carefully designing a Boolean data type if you never write programs more than a few pages long? As the type-constructing tools provided by C++ are contrasted with those of C, you will see how you can use C++ to tailor a type definition to your own unique needs.

Designing a Boolean Type

Although C does not contain a built-in Boolean type, the concept arises twice in the language. First, the result of a relational expression is defined in C to be of type int and equal to either one or zero. Second, C control statements consider any non-zero expression to be true. It is therefore desirable for a Boolean type to have analogous properties. First, a Boolean variable should always be equal to either zero or one. Second, you should be able to assign any integer to a Boolean variable, and the result should be one if the integer was non-zero. In short, the goal is to construct a Boolean type that is compatible with the C type int. The following code, for example, should perform correctly:

bool func() {
    bool more;
    while (more = fread(/*args*/)) {
        if (/*some condition*/)
            return more;
    }
    return more;
}
The standard library function fread() reads data items from a file and returns the number of items successfully read. In this example, the caller is only interested in whether the number of items read successfully is zero (end-of-file) or non-zero. The code above can only work if the type bool is compatible with the normal built-in scalar types. Otherwise, assigning an int to a bool would require gyrations like this:

while (more = (bool)fread(/*args*/)) {
/* or worse: */
while (more = (fread(/*args*/) != 0)) {
Of course, if you accidentally assigned an int to a bool in some program, the compiler would not complain. Choosing to make a Boolean type that is compatible with scalars exchanges some compiler error detection for ease of use.

typedef Does Not Create Types

Here is a possible implementation of a Boolean type in C:

/* bool.h */
typedef int bool;
One problem with this implementation is that typedef does not create a new, distinct type; it only creates a synonym for an existing type. In a big project, you might have more than one data type that is an int-compatible scalar, even though those types have no relationship to the bool data type. Unfortunately, if they are created with typedefs as shown above, they will all be synonyms for int and the compiler will let you mix and match them without complaining.
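The synonym problem can be seen in a short sketch. Two assumptions are made here: the type is spelled Bool because current C++ compilers reserve the word bool, and PlayerCount is a hypothetical second int-compatible type invented for the illustration. The static_assert and std::is_same facilities are from later revisions of C++ than the one this article describes.

```cpp
#include <cassert>
#include <type_traits>

typedef int Bool;          // stand-in for the article's bool
typedef int PlayerCount;   // hypothetical second int-compatible type

// Both names denote exactly the same type as int.
static_assert(std::is_same<Bool, int>::value, "Bool is just int");
static_assert(std::is_same<Bool, PlayerCount>::value, "the typedefs are one type");

PlayerCount mix() {
    Bool flag = 1;
    PlayerCount players = 7;
    players = flag;        // accepted silently: the compiler sees int = int
    return players;
}
```

Because the two typedef names are interchangeable, no diagnostic can ever be issued for the assignment inside mix().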

Although typedef does not create a new data type, struct does. Here is an alternative implementation of bool in C:

/* bool.h */
typedef struct{int val;} bool;
This introduces a new type called bool which is incompatible with other data types. The compiler will object if you try to assign a variable of a different data type to a variable of type bool. Unfortunately, the compiler will also object if you try to assign an integer to your bool variable:

bool a = 1;   /* error */
bool b;
b.val = 1;    /* ok, but painful to type */
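A brief sketch of what the struct buys and what it costs. The type is spelled Bool here because current C++ compilers reserve the word bool, and std::is_convertible is a later library facility used only to state the point in checkable form; neither appears in the original listing.

```cpp
#include <cassert>
#include <type_traits>

typedef struct { int val; } Bool;   // stand-in for the article's bool

// No conversion exists in either direction between int and the struct.
static_assert(!std::is_convertible<int, Bool>::value, "int to Bool is rejected");
static_assert(!std::is_convertible<Bool, int>::value, "Bool to int is rejected");

int set_through_member() {
    Bool b;
    b.val = 1;            // the only way in: name the member explicitly
    return b.val;
}
```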

User Conversions

Although a C struct creates a new data type, it does not have any facility for telling the compiler what other data types should be compatible with the new one. In this case, you want to be able to specify that an int can be converted into a bool, and a bool can be converted into an int. You could augment your C Boolean type with type conversion functions. The result might look like this:

/* bool.h */
typedef struct{int val;} bool;
bool booli(int i);
int ibool(bool b);
The definition of the conversion functions would be placed in another file:

/* bool.c */
#include "bool.h"
bool booli(int i) {
    bool ret;
    ret.val = (i != 0);  /* any non-zero int becomes 1 */
    return ret;
}
int ibool(bool b) { return b.val; }
Finally, you have a new data type, bool, which can be converted to and from integers. This C implementation has several problems, however.

First, this bool implementation is clumsy to use; the syntax is less than elegant. A typical use of this implementation might be:

bool b;
b = booli(x > 5);
printf("b = %d\n", ibool(b));
Second, there is a problem with the structure member val. Each variable of type bool should always be either zero or one. So long as the programmer uses the conversion functions, all is well. Unfortunately, there is nothing to prevent the data type user from making mistakes such as this one:

int  l = 45;
bool b;
/* set b to 1 */
b.val = l;
In this case, the Boolean has been set equal to 45 because the letter "l" looks like the numeral "1".

Finally, the conversion functions create extra time and space overhead in the generated code. A function call is relatively cheap in C, but it should not be necessary for such simple conversions as these.

Converting To C++

You can repair these deficiencies by taking advantage of C++ language features. The first change is to remove the typedef:

/* bool.h */
struct     bool {int val;};
bool      booli(int i);
int        ibool (bool b);
In C++, unlike C, a struct declaration causes the structure tag name to become the name of a new data type; in other words, the second declaration below compiles in C++ but not in C:

#include "bool.h"
/* legal C & C++ */
struct bool a;
/* legal C++, not C */
bool b;
C++ also has a keyword, called private, that you can use to prevent the data type user from accidentally setting val to something other than zero or one. Consider the following header file:

/* bool.h */
struct bool {
private:
    int val;
    };
bool booli(int i);
int  ibool(bool b);
The private keyword tells the compiler that the structure members that follow it cannot be accessed from outside this data type. Thus, the following code becomes illegal:

#include "bool.h"
...
bool a;
    a.val = 5;
/* error: val is private */
Of course, we've painted ourselves into a corner — now booli() and ibool() won't compile because they need to access val. Private structure members aren't much use without a syntax for allowing certain functions to access them. The easiest way to do that is with the friend keyword, like this:

/* bool.h */
struct    bool {
friend    bool     booli(int i);
friend    int      ibool(bool b);
private:
    int val;
    };
This declares booli() and ibool() to be friend functions of the structure bool. A friend is not a member of the structure, but it is granted access to the structure's private members, so the conversion functions will now compile. Since you control the functions that can modify a bool, you can guarantee that it only takes on the values zero and one.
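Putting the header and the two conversion functions together gives a sketch like the following. The type is spelled Bool because current C++ compilers reserve the word bool, the (i != 0) normalization is an assumption chosen to meet the stated goal of holding only zero or one, and friend_demo() is a hypothetical wrapper added for checking.

```cpp
#include <cassert>

struct Bool {                       // Bool stands in for the article's bool
    friend Bool booli(int i);
    friend int  ibool(Bool b);
private:
    int val;
};

Bool booli(int i) {
    Bool ret;
    ret.val = (i != 0);             // assumed normalization: non-zero becomes 1
    return ret;
}

int ibool(Bool b) { return b.val; }

int friend_demo() {
    Bool b = booli(45);
    // b.val = 45;                  // would not compile: val is private
    return ibool(b);
}
```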

You can make the connection between the conversion functions and the data type more explicit by making the conversion functions member functions of the structure. A member function of a structure has the same access privileges as a friend function, but it has a different invocation syntax and can only be invoked on a variable of the data type in which it was declared. Changing the two access functions into member functions results in the following header file:

/* bool.h */
struct    bool {
    void  booli(int i);
    int   ibool ();
private:
    int val;
    };
Two things have changed: the friend keyword is removed, and bool is no longer passed to, or returned by, the member functions. Here is an example of legal invocations of the two functions:

#include "bool.h"
bool b;
b.booli(45);
int i = b.ibool();
As you can see, the syntax for calling member functions in a structure is analogous to the syntax for selecting data members in a structure. A member function is implicitly passed a pointer to the variable it was invoked with; that is why ibool() no longer needs an explicit bool argument and booli() no longer needs to explicitly return a bool value.

The definitions of the two functions must also change now that they are member functions rather than friends. The corresponding changes in bool.c are:

/* bool.c */
#include "bool.h"
void bool::booli(int i) {
    val = (i != 0);
}
int bool::ibool() { return val; }
The first change results from the fact that the complete name of a C++ member function is typename::function, which distinguishes it from any non-member functions. This syntax allows you to use the same member function name in different data types without any conflict. The second change is that the functions can refer to the data members of the structure (in this case, val) as though they were local variables. This is possible because member functions cannot be invoked without some variable of the correct data type.

Of course, the user of data type bool still has to type things like b.ibool(), just to reference a Boolean; however, you now have hidden the implementation of the data type. If you ever discover a bool that has been set to something other than zero or one, you know the bug must be in the member functions. If you want to change val to be a char instead of an int, you can be confident that only the member functions will be affected by the change.
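The claim about changing the representation can be tried directly. In this sketch val is stored as a char instead of an int, and only the struct and its member functions know it. The spelling Bool (current compilers reserve bool), the (i != 0) normalization, and the member_demo() wrapper are assumptions made for the illustration.

```cpp
#include <cassert>

struct Bool {                 // Bool stands in for the article's bool
    void booli(int i);
    int  ibool();
private:
    char val;                 // representation changed from int to char
};

void Bool::booli(int i) { val = (i != 0); }   // val refers to the member
int  Bool::ibool()      { return val; }

// Calling code is written exactly as it would be for an int representation.
int member_demo(int i) {
    Bool b;
    b.booli(i);
    return b.ibool();
}
```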

Data Type Initialization

Actually, there is a problem in the protection for bool variables; if the user doesn't initialize a bool variable, then it may contain garbage instead of a one or a zero. You can eliminate this loophole by telling the compiler that each variable of data type bool must be initialized when it is declared. This is done by declaring a special member function called a constructor. A constructor looks like an ordinary member function except that it has the same name as the data type and it has no return type. In this case, you can simply change the name booli() to bool() as follows:

/* bool.h */
struct    bool {
    bool (int i);
    int     ibool ();
private:
    int val;
    };
Creating a constructor for a data type has three main implications. First, the compiler will no longer allow a variable of that data type to be declared without being initialized. Second, if the constructor takes a single argument as this one does, the compiler will use the constructor whenever it sees a cast from the argument data type into the data type the constructor is a member of. Finally, if the constructor takes a single argument, it defines an implicit conversion from that argument's data type into the data type the constructor is a member of. These implications deserve further explanation.

Defining a constructor guarantees you won't have any uninitialized variables of a particular data type. In other words, with the newest header file, the following code won't compile:

#include "bool.h"
bool b; /* error */
The compiler will complain that the variable b must be initialized. The constructor also introduces a new syntax for initializing the variables of its data type:

#include "bool.h"
bool b(9);
In the statement shown, the function bool::bool() gets called to convert the integer 9 into a Boolean 1.

Defining a constructor with a single argument also enables the cast operator for that data type. In this case, the constructor will be called whenever you cast an integer into a Boolean as in the following example:

#include "bool.h"
bool b(1);
int i = 38;
...
b   = (bool) i;
In addition to the C-style type cast, C++ allows a function-style type cast. For example, in C++ you can say:

int  i;
long l;
l = (long)i; /* legal C & C++ */
l = long(i); /* legal C++, not C */
The function-style syntax is simply easier to read. This expands the number of ways you can initialize a bool to three:

#include "bool.h"
bool b(9);
bool c = (bool)9;
bool d = bool(9);
In all three cases, the function bool::bool() gets called to perform the conversion.

Finally, a constructor with one argument defines an implicit conversion. Just as you can assign an int to a long because of C's built-in implicit conversion, you can now assign an int to a bool, because an implicit conversion has been defined for it. In other words, you don't have to use the cast operator and the following code fragment is legal C++.

#include "bool.h"
bool b(1);
int  i;
b = i; /* OK */
Now you have a Boolean that can be assigned an integer value which gets converted by the function you've defined.

There is no way to define a constructor function that takes a single argument without getting all three of these effects. This is a fact to ponder when you define a constructor for a data type. You can't tell the compiler to require the user to use an explicit cast to convert an integer into a Boolean. You can't tell the compiler to allow a Boolean to be initialized with an integer, but not assigned an integer. Sometimes, this forces you to avoid using a constructor and return to the explicit function notation used previously.
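The three effects can be gathered into one sketch. The type is spelled Bool because current C++ compilers reserve the word bool; the constructor body and the constructor_demo() wrapper are assumptions, with the body normalizing its argument so that 9 becomes 1 as the text describes.

```cpp
#include <cassert>

struct Bool {                 // Bool stands in for the article's bool
    Bool(int i);
    int ibool();
private:
    int val;
};

Bool::Bool(int i)  { val = (i != 0); }
int  Bool::ibool() { return val; }

int constructor_demo() {
    // Bool u;              // would not compile: no initializer supplied
    Bool b(9);              // constructor syntax
    Bool c = (Bool)9;       // C-style cast calls the constructor
    Bool d = Bool(9);       // function-style cast calls it too
    int i = 0;
    b = i;                  // implicit conversion on assignment
    return b.ibool() + c.ibool() + d.ibool();
}
```

After the assignment b holds zero, while c and d each hold one, so the sum returned is two.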

If you want all of your Booleans to be initialized, but don't want to have to type all those initializers, you can take advantage of a C++ feature called default arguments. Here is how it works:

/* bool.h */
struct    bool {
    bool(int i=0);
    int    ibool ();
private:
    int val;
    };
Now, whenever the compiler would normally call bool::bool() but does not have an argument for it, it will use a value of zero. Thus, both of the following statements cause bool::bool() to be invoked with an argument of zero:

#include "bool.h"
bool b; /* initialize to zero */
bool c = bool();
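A compilable sketch of the default argument at work follows. The type is spelled Bool because current C++ compilers reserve the word bool, and the function bodies and the default_demo() wrapper are assumptions added for checking.

```cpp
#include <cassert>

struct Bool {                 // Bool stands in for the article's bool
    Bool(int i = 0);          // the default is stated once, in the declaration
    int ibool();
private:
    int val;
};

Bool::Bool(int i)  { val = (i != 0); }
int  Bool::ibool() { return val; }

int default_demo() {
    Bool b;                   // calls Bool::Bool(0)
    Bool c = Bool();          // also calls Bool::Bool(0)
    Bool d(7);                // an explicit argument still works
    return b.ibool() + c.ibool() + d.ibool();
}
```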

Overloading The Cast Operator

Telling the compiler how to implicitly convert ints into bools is only half the job of making the two data types compatible. The remaining task is to replace bool::ibool() with a conversion function that tells the compiler how to implicitly convert bools into ints. If int were a user-defined type instead of a built-in type, you could define within it a constructor that takes a single bool argument, and that would do the trick. Since that is not possible, C++ gives you another way to define implicit conversions: you can redefine how the cast operator operates on your data type. (In fact, you can redefine how almost any operator operates on your data type.)

Once again, the type conversion requires a member function, but this time it is an operator member function. This revised header replaces ibool() with a type conversion function:

/* bool.h */
struct    bool {
    bool (int i);
    operator int();
private:
    int val;
    };
The revised specification of bool tells the compiler you have defined a function that it can use whenever casting a bool into an int. The revised implementation looks like this:

/* bool.c */
#include "bool.h"
bool::bool(int i) {
    val = (i != 0);
}
bool::operator int() { return val; }
Having a function named bool::operator int() is a little disconcerting, but the code is otherwise the same.

Just as with constructor functions, the conversion function can be used either implicitly or explicitly. The following code, for example, is legal:

#include "bool.h"
bool b=1;
int i;
/*...*/
i     = (int)b;
i     = int(b);
i     = b;
In all three assignments, the function bool::operator int() is called to transform the bool into an int. There is no way to define a type conversion that must be used explicitly. Just as with constructors, when implicit conversions are undesirable, you must resort to something like the explicit member function call defined previously using bool::ibool().
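The explicit and implicit uses of the conversion function can be checked with a short sketch. The type is spelled Bool because current C++ compilers reserve the word bool, and conversion_demo() is a hypothetical wrapper added for the test.

```cpp
#include <cassert>

struct Bool {                 // Bool stands in for the article's bool
    Bool(int i);
    operator int();           // no return type is written: it is always int
private:
    int val;
};

Bool::Bool(int i)      { val = (i != 0); }
Bool::operator int()   { return val; }

int conversion_demo() {
    Bool b = 38;              // int to Bool through the constructor
    int i = (int)b;           // C-style cast
    int j = int(b);           // function-style cast
    int k = b;                // implicit conversion
    return i + j + k;         // each of the three is 1
}
```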

The bool data type is nearly finished now. The original example of the desired usage of the data type is completely legal now:

bool func() {
    bool more;
    while (more = fread(/*args*/)) {
        if (/*some condition*/)
            return more;
    }
    return more;
}
You can freely mix bools and ints while still controlling how bools are initialized and assigned. The actual representation of a Boolean also remains hidden, so you can change it to short or char without affecting the data type user's code.
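The loop can be exercised without a real file by substituting a small stand-in for fread(). Everything named here beyond the loop itself is an assumption: Bool is spelled that way because current compilers reserve bool, the constructor keeps the default argument from the earlier header so that more can be declared without an initializer, and read_items() and count_batches() are hypothetical.

```cpp
#include <cassert>

struct Bool {                 // Bool stands in for the article's bool
    Bool(int i = 0) { val = (i != 0); }
    operator int() { return val; }
private:
    int val;
};

// Hypothetical stand-in for fread(): reports 3 items, then 2, then 0.
int read_items(int call) {
    static const int counts[] = {3, 2, 0};
    return counts[call < 2 ? call : 2];
}

// Counts how many reads returned data before the end of input.
int count_batches() {
    Bool more;
    int batches = 0;
    // The int result becomes a Bool on assignment, and the Bool is
    // converted back to an int when the loop condition is tested.
    while ((more = read_items(batches))) {
        ++batches;
    }
    return batches;
}
```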

Efficiency

The final deficiency to remove from the bool definition is the function call overhead. In defining complicated data types in which the conversion functions contain a lot of code, the function call overhead would not be significant. In this example, however, the function call overhead is probably larger than the entire body of the conversion functions. You can remove that overhead quite easily by placing the function definition right in the data type specification. Doing that and adding some useful constants results in the following header file:

/* bool.h */
#define     FALSE 0
#define     TRUE 1
struct      bool {
    bool(int i){val = (i!=0);}
    operator int(){return val;}
private:
    int val;
    };
This style of function definition makes the functions inline, which means the compiler will try to generate inline code for them each time they are used, rather than call them as functions.

There is actually an inline keyword in C++ which you can place in front of any function you would like to be inline. The simpler the function is, the more likely the compiler can generate inline code for it. When the compiler cannot inline the code, it simply generates a normal function call. Functions that are defined inside data type specifications are automatically inline functions, so there was no need for the keyword in the above specification.
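For comparison, here is a sketch of the other route to the same result: the member functions are defined outside the struct and marked with the inline keyword. The spelling Bool (current compilers reserve bool) and the inline_demo() wrapper are assumptions made for the illustration.

```cpp
#include <cassert>

struct Bool {                 // Bool stands in for the article's bool
    Bool(int i);
    operator int();
private:
    int val;
};

// Defined outside the struct, so inline has to be requested explicitly.
// Definitions like these belong in the header, next to the struct.
inline Bool::Bool(int i)    { val = (i != 0); }
inline Bool::operator int() { return val; }

int inline_demo() {
    Bool b = 5;
    return b;                 // converted to int by Bool::operator int()
}
```

Whether the calls are actually expanded in place remains the compiler's decision in either style.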

Now the bool data type is as efficient as anything you could implement in C to do the same thing. Each variable of type bool is guaranteed to be either zero or one at all times and you are free to change how the Boolean is stored and how it is converted to and from integers without affecting any other code. Unfortunately our current specification still looks a bit like a C programmer coded it; here is how a more fluent C++ programmer might write it:

// bool.h
class bool {
    int val;
public:
    bool(int i){val = (i!=0);}
    operator int() const {return val;}   // const, so it also works on FALSE and TRUE
    };
const bool FALSE=0,TRUE=1;
For single-line comments, it's often easier to use the C++ comment convention //. The keyword class is just like the keyword struct except that all of its members are private by default, whereas in a struct all the members are public by default. C++ has both a private keyword and a public keyword and they complement each other. Finally, the #define constants are replaced with constants that have the type bool rather than type int. Now the specification looks more like "real" C++.
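A sketch of the class form in use, including the typed constants. Several details are assumptions rather than the article's text: the type is spelled Bool because current compilers reserve bool, operator int() is declared const so that it can be applied to the const objects FALSE and TRUE, and is_positive() and class_demo() are hypothetical helpers.

```cpp
#include <cassert>

class Bool {                  // Bool stands in for the article's bool
    int val;                  // private by default in a class
public:
    Bool(int i = 0) { val = (i != 0); }
    operator int() const { return val; }
};

const Bool FALSE = 0, TRUE = 1;

// Hypothetical helper: the comparison result is converted to Bool on return.
Bool is_positive(int n) { return n > 0; }

int class_demo() {
    Bool a = is_positive(3);
    Bool b = is_positive(-3);
    // Each comparison converts both operands to int before comparing.
    return (a == TRUE) && (b == FALSE);
}
```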

Making Incompatible Types

As mentioned, a large project might contain more than one data type that is compatible with ints. For example, a game simulation might contain a data type called playercount (a non-negative count of the number of players within a particular grid). It could be desirable for a data type like this to be compatible with ints just as bool is, but would code like this also be legal?

#include "bool.h"
#include "player.h"
playercount p;
bool        b;
p = b;
The answer, thankfully, is "no". Assigning b to p would require converting a bool into an int and then the int into a playercount. Even though the compiler has been told how to perform each of those conversions implicitly, the rule is that the compiler never applies more than one user-defined implicit conversion to a single value. Without this rule, the number of possibly unwanted implicit conversions would mushroom as you added each data type with a user-defined conversion.

If you really want a second user-defined conversion to be applied, you can code it explicitly:

#include "bool.h"
#include "player.h"
playercount p;
bool        b;
p = playercount(b);
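The one-conversion rule can be checked with two small classes. This is a sketch under stated assumptions: Bool is spelled that way because current compilers reserve bool, PlayerCount is a guess at what the article's playercount might look like, and std::is_convertible is a later library facility used only to state the rule in checkable form.

```cpp
#include <cassert>
#include <type_traits>

class Bool {                  // Bool stands in for the article's bool
    int val;
public:
    Bool(int i = 0) { val = (i != 0); }
    operator int() const { return val; }
};

class PlayerCount {           // hypothetical sketch of playercount
    int count;
public:
    PlayerCount(int n = 0) { count = (n < 0) ? 0 : n; }
    operator int() const { return count; }
};

// Each single step is an implicit conversion...
static_assert(std::is_convertible<Bool, int>::value, "Bool to int");
static_assert(std::is_convertible<int, PlayerCount>::value, "int to PlayerCount");
// ...but the two user-defined steps are never chained implicitly.
static_assert(!std::is_convertible<Bool, PlayerCount>::value, "no chaining");

int explicit_demo() {
    Bool b = 1;
    PlayerCount p;
    // p = b;                 // would not compile: needs two user conversions
    p = PlayerCount(b);       // explicit; only Bool to int is left implicit
    return p;
}
```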

Conclusion

C++ makes it easy to construct data types that isolate implementation details. Additionally, user conversions give you a degree of control over how your data types interact with other data types. As with many features of C++, it is possible to get into trouble when defining new data types. In particular, the danger of user-defined conversions is that the compiler will perform silent conversions you did not intend. This is a very real problem when you construct multiple data types for a large project, each with its own conversion functions. However, these potential problems are outweighed by the advantages of being able to design a type system that suits the individual needs of your project.