Introduction
This month I continue evolving type boolean, partially as a functional replacement for the builtin C++ type bool, but principally as an excuse to explore the mysteries of designing a type that's robust and looks like a builtin. My belief system surrounding such design can be captured by the following thought problem.
Pretend your compiler did not support int as a fundamental type. If you were to try synthesizing int, which behaviors of the real int would you need to mimic? What is the domain of syntax and semantics enveloped by the word int? What makes int act like a builtin? These questions are deceptively simple, for the language predefines so many int aspects that we are typically not conscious of them all.
I choose int for this thought problem because you almost certainly know that type well. In practice, synthesizing builtin int is too large a problem, so I use builtin bool as a type guinea pig instead. Regardless of the type, the questions and their answers remain the same. With each new boolean version below, I'll mention some of these answers as we discover them.
The Wall Comes Tumbling Down
Last month ended with boolean Version 3, implemented as
enum boolean { false, true };I said then that in C we had hit a wall I knew of no simple way to implement a better boolean type in C. I further suggested that creating a C++ boolean class could let us tear that wall down. Probably the simplest way to turn Version 3 into a class is by wrapping the existing implementation, as in
class boolean { public: enum { FALSE, TRUE }; };Unfortunately, essential expressions like
TRUEno longer work; TRUE is within the scope of boolean, requiring scope resolution:
boolean::TRUE
Such usage goes against programmers' experience, expectations, and desires. After all, imagine the fun if char were a class type similarly defined, requiring// kids, don't try this at home char c = char::'a';or, worse, something like
// the horror, the horror ... char *s = char *::"DAP";
To remedy this, move the enumeration outside the class, givingenum { FALSE, TRUE }; class boolean { };To the earlier questions about type design, we now have the first answer, which I'll call Abstract Data Type (or ADT) Rule #1: For a type to look like a builtin it must not require scope resolution.
If I Only Had a Brain
Even though FALSE and TRUE are directly available, the boolean object as defined above doesn't have brains enough to remember which of those values it has taken on. This brain needs only one neuron, in the form of a scalar data member. The specific type is largely irrelevant; to minimize storage, I'll choose char. And of course, good encapsulation dictates the data member not have public access[1] .
Naming the data member, however, is a point of some minor religion. Microsoft, for example, would have us prepend all data members with m_. There is no single right answer here; the solution is largely a matter of taste. For this exercise, I'll settle on the unimaginative name value_, establishing my somewhat arbitrary convention that non-public data members end in a trailing underscore[2] .
Making the boolean class value_-added creates the complete Version 4:
enum { FALSE, TRUE }; class boolean { private: char value_; };
Adding a data member also illustrates ADT Rule #2: An object completely knows and encapsulates its own state.Save Version 4 as booltest.h, and compile the two test suites from last month's article. My results, appearing in Table 1, are quite interesting. The percentage of passing test cases is the highest yet, 58%. Even so, Version 4 betrays some glaring weaknesses; for example, this version cannot
- mix with FALSE and TRUE
- assign from equality, relational, and logical operators
- act as an operand with equality, relational, logical, and conditional operators
In short, the very contexts in which we most need boolean are anathema to this implementation. To its credit, though, Version 4
- prevents assigning boolean to scalars (unlike enumerations, the class doesn't silently convert to integral type)
- can't be used with any arithmetic or bitwise operators
The high success ratio is somewhat deceiving, and demonstrates the danger of relying on one statistic. Were these test cases weighted, or the number of cases in each category proportional to that category's importance, we could infer more from the ratio. Put another way, even though Version 3 had fewer passing test cases, I rate it overall superior to Version 4.
Too Much of a Good Thing?
Class boolean in Version 4 is quite xenophobic it won't play nicely with anyone else. In my quest to make boolean type-safe, I've ended up making it too type-safe, to the point it doesn't even recognize it's own spiritual kin (FALSE and TRUE). Remember, no type is an island; boolean is useless if it can't integrate within a larger context. Consider this to be ADT Rule #3: A class must balance type safety against the need to interact with other types.
The trick is to slowly release type control, letting boolean mingle with other types without sacrificing type safety, the moral equivalent of letting it date while you set a curfew and leave the porch light on. I think the first order of business for boolean is reestablishing symbiosis with FALSE and TRUE, so that initializations and assignments such as
boolean b = FALSE; b = TRUE;become valid again. We have to somehow let the enumeration turn itself into a type that boolean accepts. But what type is that? Well, at a minimum, every type is compatible with itself. Perhaps casting FALSE and TRUE via
boolean b = (boolean) FALSE; b = (boolean) TRUE;will work. If you try this with Version 4, the compiler will complain there is no suitable conversion available to make this work. You need to augment boolean to support these casts.
Note that in C++, these casts can alternatively be written as
boolean b = boolean(FALSE); b = boolean(TRUE);These are called function style casts, to differentiate them from the more traditional C-style casts. The casts do indeed look like calls to a function called boolean. And how is such a function defined? As a boolean constructor.
Anatomy of a Constructor
For any class, C++ will attempt to automatically define two constructors for you. In the case of our boolean class, those constructors are the default constructor boolean(), and the copy constructor boolean(boolean const &). I say "will attempt" because for some classes, the language can't perform the synthesis. In the case of boolean, the language can, so visualize Version 4 as really being
enum { FALSE, TRUE }; class boolean { public: boolean(); boolean(boolean const &); private: char value_; };with the two constructor implementations defined internally by the language. Think of the default constructor as turning "nothing" into a boolean, and the copy constructor as turning an existing boolean into another boolean. I want to add a third constructor that performs a little alchemy, turning the leaden FALSE and TRUE into gold boolean. But in order to perform that conversion, I need to know what type I'm converting from.
As written, FALSE and TRUE have no explicit type they are constants of an unnamed enumeration. That wasn't a problem for Version 4, but it is now. I want to choose a name that meaningfully relates to boolean, yet probably won't accidentally be used in place of boolean. For this discussion, I'll use the name boolean_value.
Having settled on the enumeration name, I can now add the constructor to boolean, forming the class definition of Version 5
enum boolean_value { FALSE, TRUE }; class boolean { public: boolean(boolean_value); private: char value_; };
This brings up ADT Rule #4: When mapping one type's value to another type, make the mapping simple, consistent, and predictable[3] . This new constructor is really mapping the incoming boolean_value to an internal boolean state representation. I think the simplest mapping is to let the boolean_value decay to its underlying integer value, giving the following constructor implementation:boolean::boolean(boolean_value value) : value_(value ? TRUE : FALSE) { }
Why the conditional expression? Why couldn't I just writeboolean::boolean(boolean_value value) : value_(value) { }After all, aren't the only possible boolean_values FALSE and TRUE? The answer is shown by the statement
boolean b((boolean_value) 2);which is not well-formed (enumeration constant out-of-bounds), but may still compile and run. The constructor as written ensures value_ is set to only FALSE and TRUE, regardless of what argument is passed in. With this constructor, the final Version 5 is:
enum boolean_value { FALSE, TRUE }; class boolean { public: boolean(boolean_value); private: char value_; }; boolean::boolean(boolean_value value) : value_(value ? TRUE : FALSE) { }Plugging this version into the test cases and compiling, we get ... more failed test cases than with any other version so many that I won't even bother with a summary. Why? It turns out many of these errors stem from failure of the declaration
boolean b;But how can this fail? It worked until now; besides, what can be more benign than defining an object? Were this C, the answer would be "not much." But in C++, what looks like a simple variable declaration is in reality a call to a boolean constructor, in this case the default constructor.
Whither the Default Constructor
As I mentioned earlier, the compiler will synthesize a default constructor if you don't provide one, and if the language is able. Here is one of those conditions in which the language is unable: when you provide any of your own constructors, the compiler can no longer synthesize a default constructor. In our case, we provided boolean(boolean_value), meaning the compiler could not provide boolean(). To make
boolean b;work, we need to write our own default constructor. The imme- diate solution is
boolean::boolean() { }However, this leaves the object uninitialized: the data member value_ is probably neither FALSE nor TRUE (voting as "undecided" in CNN news polls). A better solution is to initialize value_, as in
boolean::boolean() : value_(FALSE) { }That way a boolean object, as seen by client programs, always has some well-defined value (ADT Rule #5: Minimize the domain of exceptions by keeping an object in a well-defined state). We have defined boolean so that it can take on one of two values, FALSE and TRUE, and nothing else. Barring some egregious error (like an errant pointer write), boolean objects cannot take on ill-defined values.
Notice the similarity between this default constructor and the constructor we already provided:
boolean::boolean(boolean_value value) : value_(value ? TRUE : FALSE) { }Both initialize value_, the default constructor with FALSE and the non-default constructor with FALSE or TRUE. In fact, the declarations
boolean b;and
boolean b(FALSE);have the same net effect. Through the magic of a default argument, you can combine these two constructors into one. When you include an argument, the constructor acts like a boolean_value-to-boolean conversion, as in Version 5; when you leave out argument, the constructor acts as a default constructor.
Adding the dual-identity constructor yields Version 6:
enum boolean_value { FALSE, TRUE }; class boolean { public: boolean(boolean_value = FALSE); private: char value_; }; boolean::boolean(boolean_value value) : value_(value ? TRUE : FALSE) { }
Prepending to Listing 1 and compiling makes for a much happier experience. I summarize the results in Table 2. The passing percentage has crept up a bit again, to 63%. These results are identical to those of Version 4 with one valuable exception: boolean can once again initialize and assign from FALSE and TRUE.Conclusion
The Version 6 boolean type is still lacking in three key ways:
- boolean can't be used in expressions.
- boolean can't assign from expressions.
- While boolean is type-safe, FALSE and TRUE are still relatively type-unsafe.
In fact, you could argue that this solution is no better overall than the enumeration solution (Version 3) ending last-month's episode. So what have we really gained?
Software development is like economic investment: long-term benefits often come at short-term cost. The enumeration solution, while cheap and easy, had reached its potential. Also bear in mind that enumerations are really glorified integers, so we had about as much control over Version 3 as we do over int. By writing our own class, we have more complete and precise control over how the type acts. That long-term payoff begins next month, when we'll fix all the above limitations, and more.
Erratica
Diligent reader the first, Rex Jaeschke, notes a couple of code bugs in my November 1995 column. In Listing 2 I wrote that
int n = NULL; /* value of NULL after expansion */ int *p = 1; /* casting int to pointer */represented C implementation-defined behavior. More accurately, I meant the comments gave examples of such behavior, while the code showed how the cited behavior could manifest.
For the first statement, Rex points out that the entire line
int n = NULL;may be undefined, since NULL can be defined as ((void *) 0). This would make the statement expand to
int n = ((void *) 0);
While the general case of assigning a pointer to an integral type is implementation-defined, if the target integral type (int here) is too small to hold the source pointer (void *), the result is undefined. To make my intent more clear, I should have usedvoid *n = NULL.
In the second statement, I mention a cast in the comment, but don't actually show the cast in the code. The corrected line isint *p = (int *) 1;
Diligent reader the second, Stan Kelly-Bootle[4] , commenting on the same column, suggests the Halting Problem may not be the impediment to a completely specified language. Stan offers a program likevoid main(void) { while (1) ; }as being conforming but not halting. Here Stan refers to an arbi- trary program halting; I referred (obliquely) to a specific kind of compiler halting while compiling itself. Stan further opines that language specification strays into the realm of Kurt G[sinvcircumflex]del's Incompleteness Theorem. In his letter, Stan cites a couple references discussing this problem, and suggests the topic would be worthy of a future column.
We'll have to see; my future article ideas are stacking up like Sea-Tac at Christmas. Still, as a former DJ, I'm used to taking requests if there's something you'd like to see covered, drop me an e-mail and we'll rap about it.
Footnotes and References
[1] Given that I think of boolean as a conceptual replacement for the builtin bool, and that you can't inherit from builtin types, I'm not designing boolean to be a base class hence value_ is private, not protected. As a rule, I make all data members private anyway, even for classes designed to be base classes.
[2] Note that, depending on context, ANSI reserves some identifiers with leading underscores for the compiler implementor. Even though value_ does not appear in such a context meaning I could have chosen _value I take the more conservative route of avoiding leading underscores altogether, and strongly recommend you do too. See the ANSI C spec, subclause 4.1.2.1, for details.
[3] Builtin types can be poor role models here, as evidenced by the conversion rules for integrals of different size and sign.
[4] Unlike a certain someone, who shall remain as anonymous as his December column, I do have the foggiest notion who this Stan Kelly-Bootle is.
Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 3543 167th Ct NE #BB-301, Redmond WA 98052; by phone at (206) 881-6990, or via Internet e-mail as rschmidt@netcom.com.