June 1993/Stepping Up To C++

Stepping Up To C++

Recent Language Extensions to C++

Dan Saks

Dan Saks is the founder and principal of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513) 324-3601, or electronically at dsaks@wittenberg.edu.
As part of the C Users Journal's expanding coverage of C++, I'll be writing periodic reports on the progress of the C++ standard. This is the first such report.
For the last three years, I've been writing these reports for The C++ Report. Over that time, the standards committee added several extensions to the C++ language. This article summarizes most of those extensions.
ANSI (the American National Standards Institute) chartered technical committee X3J16 way back in late 1989 to write a US standard for the C++ programming language. X3J16 started working in early 1990, and has been meeting three times a year ever since.
About a year later, JTC1, the joint technical committee of ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission), chartered SC22/WG21 to produce an international C++ standard. WG21 has joined X3J16 for all of its meetings since the summer of 1991. I refer to the joint C++ committee as WG21+X3J16.
I'm reluctant to predict when the standard will be complete. Formal approval is at least three years away. But the essential features of the language and libraries should be pretty stable much sooner than that.
As you may have noticed from the brief biography above, I'm the secretary of both the ANSI and ISO C++ standards committees. I try to remain detached when I talk about the committees, but I will occasionally lapse into talking about the joint committee as "we."

Where We Started
Language standards often start from one or more base documents. That is, rather than write the standard from scratch, the committee adopts an existing written description of the language and libraries as the first draft. The committee then spends years refining that draft. These refinements include:

reconciling inconsistent statements

improving the precision of the words

adding missing features
X3J16 selected two base documents:

the AT&T C++ 2.1 Product Reference Manual (the PRM)

the C standard
C++, like C, began at AT&T Bell Labs. AT&T started distributing a C++ compiler called cfront in the mid-1980s. The 2.1 PRM describes the language more or less as AT&T had implemented it by early 1990.
Most C++ programmers know the PRM as the Annotated C++ Reference Manual, or ARM (Ellis and Stroustrup 1990). The ARM includes the text of the PRM along with annotations and commentary that elaborate the language description, explain many language design decisions and suggest implementation techniques. However, AT&T (which holds the copyrights to both the PRM and the ARM) only granted the C++ committee the right to use the PRM, so the current draft of the standard does not include the annotations and commentary.
The committee's first decisions brought the Working Paper closer to the ARM. The ARM includes chapters on templates and exception handling. cfront 2.1 did not include these features, so the PRM, and hence the base document, had only empty placeholders for those chapters. The committee added these chapters to the Working Paper during its first year, so for all practical purposes, it's fair to say that the ARM is the base document.
X3J16 selected the ANSI C Standard as its second base document in the hope of minimizing unnecessary differences between C++ and C. I think some of us on the committee expected that, whenever C++ and C share a feature, the C++ standard would use the same, or nearly the same, wording as the C Standard. But to date, the C++ draft still employs wording that is closer to the ARM, even when describing features from C.
By the way, the politically correct term for the C++ draft is not the "draft," but the "Working Paper." The Working Paper won't be a draft until the committee submits it for public review. The first public review is at least a year away.

Committee Groups
The joint committee conducts most of its technical work in smaller ad hoc groups. These groups are:

C Compatibility — This group compares the C++ Working Paper with the C Standard and makes recommendations for reconciling their differences.

Core Language — This group resolves ambiguities and inconsistencies in the core features of the language as specified by the PRM. This group recently divided into smaller groups: the "hard" core group tackles the broader issues that take many meetings to resolve, and the "soft" core group addresses a large collection of smaller issues.

Environments — This group focuses on requirements for the translation environment (including the preprocessor and linker) and the execution environment (including program startup, execution, and termination).

Extensions — This group evaluates proposals to add new language features to C++. It makes recommendations for accepting some and provides rationale for rejecting others. (WG21+X3J16 considers templates and exceptions to be extensions.)

Libraries — This group drafts the specification of standard library components, including iostreams and the C++ version of the Standard C library.

Syntax — This group recommends refinements to the formal C++ grammar to make the language more precise and easier to parse.

The New Stuff
In addition to templates and exception handling, the committee added a few other new features to C++:

support for European translation environments

wchar_t as a keyword and a distinct type

operator overloading for enumeration types

overloading of new and delete operators for arrays

relaxed restrictions on the return types for virtual functions

runtime type identification
None of these features is universally available. Of the MS-DOS compilers I own, only three provide templates (Borland, Comeau, and MetaWare), and only one supports overloading on enumerations (Microsoft).
I will briefly summarize these extensions here, and cover them in detail in future columns.

Templates
Templates come in two flavors: function templates and class templates. A function template defines a family of overloaded functions employing the same algorithm, but applied to operands of different types. For example,

template <class T> inline T abs(T x) { return x >= 0 ? x : -x; }
defines a template for the absolute value function abs.
The template itself doesn't generate any object code. Rather, the translator instantiates (creates) an abs function for a particular parameter type the first time you call abs with an argument of that type. For instance, if you call abs(i) for int i, the translator creates a definition for abs(int) as if you had defined

inline int abs(int x) { return x >= 0 ? x : -x; }
If you later call abs(f) for float f, the translator instantiates

inline float abs(float x) { return x >= 0 ? x : -x; }
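The instantiation behavior described above can be sketched as follows. This is a minimal illustration, not one of the column's listings: the namespace sketch and the function abs_demo are hypothetical names, and the namespace exists only to keep this abs apart from the library's own abs.

```cpp
#include <cassert>

namespace sketch {   // hypothetical namespace; avoids a clash with the library's abs

template <class T>
inline T abs(T x) { return x >= 0 ? x : -x; }

}  // namespace sketch

// Each call with a new argument type makes the translator instantiate
// another member of the abs family.
void abs_demo()
{
    int i = -3;
    float f = -2.5f;
    assert(sketch::abs(i) == 3);      // instantiates sketch::abs<int>
    assert(sketch::abs(f) == 2.5f);   // instantiates sketch::abs<float>
}
```

Nothing is generated for types the program never passes to abs; a call with a long argument, for example, would trigger a third instantiation.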
Class templates provide a similar facility for classes. For example, in my recent columns on dynamic arrays (CUJ, November 1992) and operator[] (CUJ, January 1993), I wrote a class for a dynamic array containing elements of type float. A version of the float_array class definition appears in Listing 1. Rather than rewrite the class for each different element type, you can write a single class template for a dynamic array with an arbitrary element type. The template appears in Listing 2. Using that template,
array<int> ai (n);
declares a dynamic array with n int elements, and
typedef array<float> float_array;
defines type float_array as an instance of the class template.
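Listings 1 and 2 are not reproduced here, so the following is only a rough sketch of what a dynamic array class template along these lines might look like. The member names (len, data, length), the namespace sketch, and the function array_demo are assumptions made for illustration, not the contents of Listing 2.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {   // hypothetical namespace, not part of the column's listings

template <class T>
class array
{
public:
    // Allocates n elements, each value-initialized (zero for arithmetic types).
    explicit array(std::size_t n) : len(n), data(new T[n]()) { }

    array(const array &other) : len(other.len), data(new T[other.len])
    {
        for (std::size_t i = 0; i < len; ++i)
            data[i] = other.data[i];
    }

    array &operator=(const array &other)
    {
        if (this != &other) {
            T *copy = new T[other.len];
            for (std::size_t i = 0; i < other.len; ++i)
                copy[i] = other.data[i];
            delete [] data;
            data = copy;
            len = other.len;
        }
        return *this;
    }

    ~array() { delete [] data; }

    // Checked element access.
    T &operator[](std::size_t i)             { assert(i < len); return data[i]; }
    const T &operator[](std::size_t i) const { assert(i < len); return data[i]; }

    std::size_t length() const { return len; }

private:
    std::size_t len;
    T *data;
};

}  // namespace sketch

// One instance of the template stands in for a hand-written float_array class.
typedef sketch::array<float> float_array;

void array_demo()
{
    sketch::array<int> ai(4);   // a dynamic array with 4 int elements
    ai[2] = 7;
    assert(ai.length() == 4);
    assert(ai[2] == 7);
}
```

The element type appears only as the template parameter T, so the same source serves int, float, or any other type that can be default-constructed and assigned.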

Exception Handling
Exception handling provides an orderly way to respond to disruptive events that occur during program execution. C++ exception handlers can handle synchronous, rather than asynchronous, events. Synchronous events are the kinds of problems you can detect by conditional expressions (as in an if statement or the argument to a call on the assert macro). For example,

p = new T;
if (p == 0) // you're out of memory
An asynchronous event is one triggered by an external event, like a device interrupt or a hardware fault. The standard header <signal.h> defines the C library facilities for handling asynchronous events.
Listing 3 sketches a simple exception handler. C++ reserves three new keywords for the syntax of exception handling: try, catch, and throw. A try-block is a compound-statement (a sequence of statements enclosed in braces) followed by a sequence of one or more handlers, also known as catch clauses. The handlers "catch" exceptions "thrown" by throw expressions executed in the compound-statement or in functions called from within the compound-statement.
Each catch clause in a try-block handles an exception of a different type. The exception-handling mechanism matches the type of the expression in the throw expression with the formal arguments in the catch clauses. Throwing an exception terminates the block that throws the exception and transfers control to the chosen catch clause. In addition, throwing an exception "unwinds" the runtime stack. That is, it terminates all active functions invoked from the try-block and deallocates their local variables.
For example, function h in Listing 3 contains a try-block that calls function g. If g detects a problem, it throws an exception of type XXX which is caught by the third catch clause in h. Otherwise, g calls f. f may throw an exception of type int, which will be caught by the first catch clause in h.
Some of you may have noticed that exception-handling behaves much like the Standard C functions setjmp and longjmp. The key difference is that throwing an exception invokes destructors for local objects as it unwinds the stack on the way back to the catch clause, whereas longjmp merely discards intervening stack frames as it returns to the setjmp point. longjmp works in C++ as long as it never terminates a function with local variables that have destructors.
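Listing 3 is not reproduced here, so the following is a hedged sketch of the arrangement the text describes: h contains the try-block, g may throw an XXX, and f may throw an int. The tracer class, the destroyed counter, the middle catch clause, and the return codes are assumptions added to make the stack unwinding observable.

```cpp
#include <cassert>

struct XXX { };              // stand-in for the exception type the text calls XXX

static int destroyed = 0;    // hypothetical counter of tracer destructor calls

struct tracer
{
    ~tracer() { ++destroyed; }
};

void f(int n)
{
    tracer t;                // destroyed when the throw below unwinds f
    if (n < 0)
        throw n;             // throws an exception of type int
}

void g(int n)
{
    tracer t;                // destroyed as any exception passes through g
    if (n == 0)
        throw XXX();
    f(n);
}

// Returns a code identifying which handler ran, or 0 if none did.
int h(int n)
{
    try {
        g(n);
    }
    catch (int) {            // first clause: catches the int thrown by f
        return 1;
    }
    catch (const char *) {   // second clause: never matched in this sketch
        return 2;
    }
    catch (XXX) {            // third clause: catches the XXX thrown by g
        return 3;
    }
    return 0;
}

void unwinding_demo()
{
    destroyed = 0;
    assert(h(-1) == 1);      // f threw an int
    assert(destroyed == 2);  // unwinding destroyed the tracers in f and g
    assert(h(0) == 3);       // g threw an XXX before calling f
    assert(destroyed == 3);  // only the tracer in g existed this time
}
```

A longjmp out of f would skip both tracer destructors, which is the difference the paragraph above draws between the two mechanisms.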

A European Representation for C++
WG14 (ISO C) recently approved a normative addendum to the ISO C standard. (See P. J. Plauger's "Formal Changes to C," CUJ, April 1993). The addendum includes a proposal from the Danish delegation to add a way for programmers to write C programs using only the invariant ISO 646 characters.
This need arose because C uses the US ASCII character set as its alphabet. ASCII is the US national variant of the ISO 646 standard character set. Some or all of the ASCII characters [ ] { } | ^ ~ aren't available in non-US variants of ISO 646. Other national variants of ISO 646 replace some of these characters with native language characters, making C programs a bit harder to write and read.
The normative addendum to the C Standard corrects the problem by adding new spellings as alternates for tokens that use ASCII-specific characters. The new spellings only use characters from the invariant ISO 646 character set.
Recognizing that C++ shared this problem with C, the C++ committee also adopted a set of alternate spellings. However, C++ adopted a slightly different set. Table 1 lists the C and C++ alternatives, noting the differences. At one time the sets were the same, but they got out of sync. I believe the committees will try to reconcile the differences.
In C, the alternate spellings that are identifiers (everything from bitand on down in Table 1) are macros defined in the new standard header <iso646.h>. In C++, these new spellings are keywords (reserved for all uses by the language). C++ also provides <iso646.h> for compatibility with C, but on many C++ implementations, it might just be empty.
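To give a feel for the alternate spellings, here is a small sketch written without any of the characters [ ] { } | ^ ~ outside of comments. It assumes the spellings found in present-day C++ (the digraphs <% %> <: :> and the keywords and, not, not_eq, bitand, compl); the set in Table 1 may differ in detail, and some compilers accept these spellings only in a conforming mode. The function names are hypothetical.

```cpp
#include <cassert>

// Counts the strictly positive elements among the first n elements of a.
int count_positive(const int *a, int n)
<%
    int count = 0;
    for (int i = 0; i < n; ++i)
    <%
        if (a<:i:> not_eq 0 and not (a<:i:> < 0))
            ++count;
    %>
    return count;
%>

// Clears in value every bit that is set in mask (value & ~mask).
unsigned clear_bits(unsigned value, unsigned mask)
<%
    return value bitand compl mask;
%>

void spelling_demo()
<%
    int a<:4:> = <% 3, 0, -2, 5 %>;
    assert(count_positive(a, 4) == 2);
    assert(clear_bits(0xFFu, 0x0Fu) == 0xF0u);
%>
```

Because the word-like spellings are keywords in C++, this source needs no header to use them.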

wchar_t as a Keyword
Standard C provides wide characters and multibyte strings so that C programmers can manipulate very large character sets, like Japanese Kanji. Several headers in the Standard C library define wchar_t as the wide character type. wchar_t must be an integral type sufficiently large to represent all character codes in the largest character set among the supported locales.
Defining wchar_t as a typedef poses a problem for C++. Members of the libraries group wanted to be able to overload functions with arguments of type wchar_t, such as

int foo(int);
int foo(wchar_t);
In particular, they wanted to be able to overload the output operators

ostream &operator<<(ostream &os, char c);
ostream &operator<<(ostream &os, int i);
ostream &operator<<(ostream &os, wchar_t w);
so that given

int i;
wchar_t w;
the expression

cout << i;
displays i as an int, and

cout << w;
displays w in its proper graphic representation. The problem is that, if wchar_t is a typedef, it is indistinguishable from at least one of the other integral types. In C++, a typedef is not a distinct type; it is an alias for another type. If the library defines

typedef int wchar_t;
then

ostream &operator<<(ostream &os, int i);
ostream &operator<<(ostream &os, wchar_t w);
are the same function. That function will (most likely) display objects of type wchar_t as numbers.
Thus, WG21+X3J16 made wchar_t a new keyword in C++ representing a distinct type. wchar_t is still represented the same way as one of the standard integral types (meaning it has the same size, alignment requirements, and "signedness" as one of the integral types), but it is now a distinct type for the purposes of overload resolution.
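The effect on overload resolution can be sketched with the foo pair from earlier in this section. The function bodies, return codes, and overload_demo are assumptions added for illustration; with wchar_t as a typedef for int, the two definitions of foo would collide instead of overloading.

```cpp
#include <cassert>

// Two distinct functions, possible only because wchar_t is a distinct type.
int foo(int)     { return 1; }
int foo(wchar_t) { return 2; }

void overload_demo()
{
    int i = 65;
    wchar_t w = L'A';
    assert(foo(i) == 1);      // exact match on int
    assert(foo(w) == 2);      // exact match on wchar_t
    assert(foo(L'A') == 2);   // a wide character literal has type wchar_t
    assert(foo('A') == 1);    // char promotes to int, so foo(int) is chosen
}
```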

Operator Overloading for Enumerations
In C, enumerations are integral types. The ARM also states that enumerations are integral types in C++. However, the current C++ Working Paper says that "Enumerations are not integral, but they can be promoted to signed and unsigned ints." Thus, Standard C code such as

enum color { RED, WHITE, BLUE };
enum color c = 0;
is no longer C++. (I don't think this is any great loss; it was always poor style.) An assignment like

int n = BLUE;
which is valid C, is still valid C++.
This change introduces a more serious incompatibility with C: the predefined arithmetic operators, most notably ++, no longer apply to enumerations. Therefore, a loop such as

for (c = RED; c <= BLUE; ++c)
no longer compiles as C++. (This is a loss because it can be good style.) The present wording of the Working Paper doesn't seem to allow

c <= BLUE
But I believe it is, in fact, allowed because both operands promote to int.
WG21+X3J16 reduced the trauma of this incompatibility by extending C++ to permit overloading on enumerations. For example, you can write the for loop above by defining

inline color &operator++(color &c) { return c = color(c + 1); }
as the prefix form of ++ for objects of type color.
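Put together, the enumeration, the overloaded operator, and the loop might look like the sketch below. The functions count_colors and enum_demo are hypothetical additions that make the loop's behavior checkable; the rest follows the fragments above.

```cpp
#include <cassert>

enum color { RED, WHITE, BLUE };

// Prefix ++ for color: c + 1 promotes c to int, and the explicit
// conversion color(...) turns the result back into a color.
inline color &operator++(color &c)
{
    return c = color(c + 1);
}

// Visits every color from RED through BLUE and returns how many it saw.
int count_colors()
{
    int n = 0;
    for (color c = RED; c <= BLUE; ++c)
        ++n;
    return n;
}

void enum_demo()
{
    color c = RED;
    ++c;
    assert(c == WHITE);
    assert(count_colors() == 3);
}
```

The loop ends because the final ++c produces the value one past BLUE, which compares greater than BLUE once both operands are promoted.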

Overloading new and delete for Arrays
In C, you allocate dynamic memory using the standard library functions malloc (or calloc or realloc), and you deallocate memory by calling free. In C++, you use the new and delete operators instead.
Both malloc and new allocate memory, but new also applies a constructor to the storage if the allocated object has a class type with a constructor. That is, if X is a class with a constructor,

p = new X;
allocates and constructs an X object, but

p = (X *)malloc(sizeof(X));
merely allocates storage and leaves the object unconstructed. Similarly, both free and delete deallocate storage, but only delete applies a destructor (if any) to the object just before deallocating it.
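The difference can be observed by counting constructor and destructor calls. In this sketch the counter live and the function construction_demo are hypothetical; X stands for any class with a constructor.

```cpp
#include <cassert>
#include <cstdlib>

static int live = 0;   // hypothetical count of X objects currently constructed

class X
{
public:
    X()  { ++live; }
    ~X() { --live; }
};

void construction_demo()
{
    X *p = new X;                          // allocates, then runs X::X()
    assert(live == 1);
    delete p;                              // runs X::~X(), then deallocates
    assert(live == 0);

    X *q = (X *)std::malloc(sizeof(X));    // raw storage only; no constructor runs
    assert(q != 0);
    assert(live == 0);
    std::free(q);                          // no destructor runs either
    assert(live == 0);
}
```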
Each C++ environment provides a default implementation for the new and delete operators. However, if this general-purpose allocator isn't right for your application, you can write your own versions of new and delete. For example,

void *operator new(size_t n) { ... }
defines a replacement allocation function, so that calling the new operator calls this allocator instead of the system-supplied allocator. Similarly

void operator delete(void *p) { ... }
defines a replacement deallocator for use by the delete operator.
C++ not only lets you replace the global dynamic memory allocator, but even lets you define a different allocator for each class. That is,

class Y
{
public:
    void *operator new(size_t);
    void operator delete(void *p);
    ...
};
defines class Y with its own special-purpose versions of new and delete. A call such as

q = new Y;
allocates memory using Y::operator new, rather than the global operator new, and

delete q;
deallocates memory using Y::operator delete. Those classes that do not define their own new and delete use the global operators.
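As a minimal sketch of a class with its own allocator, the version of Y below simply forwards to malloc and free and counts the calls. The counters, the malloc-based implementation, and class_allocator_demo are assumptions for illustration; a real special-purpose allocator would more likely draw from a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

class Y
{
public:
    static int allocations;      // hypothetical counters, for illustration only
    static int deallocations;

    void *operator new(std::size_t n)
    {
        void *p = std::malloc(n);
        if (p == 0)
            throw std::bad_alloc();
        ++allocations;
        return p;
    }

    void operator delete(void *p)
    {
        if (p != 0)
            ++deallocations;
        std::free(p);
    }
};

int Y::allocations = 0;
int Y::deallocations = 0;

void class_allocator_demo()
{
    int news = Y::allocations;
    int deletes = Y::deallocations;

    Y *q = new Y;                              // uses Y::operator new
    assert(Y::allocations == news + 1);
    delete q;                                  // uses Y::operator delete
    assert(Y::deallocations == deletes + 1);

    int *r = new int(0);                       // not a Y: uses the global operator new
    delete r;
    assert(Y::allocations == news + 1);
}
```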
The ARM states that, even for a class Y that defines its own operator new, allocating an array of Y objects always uses the global operator new. That is,

q = new Y[n];
ignores Y::operator new and uses ::operator new. Consequently, deleting that array using

delete [] q;
ignores Y::operator delete and uses ::operator delete.
The committee recently extended C++ to provide a separate set of dynamic memory management functions for arrays of objects:

void *operator new[](size_t n);
void operator delete[](void *p);
With this extension,

p = new X[m];
allocates memory using operator new[] instead of operator new, and

delete [] p;
uses operator delete[] instead of operator delete.
You can even define operators new[] and delete[] for an individual class, such as

class Y
{
public:
    void *operator new(size_t);
    void *operator new[](size_t);
    void operator delete(void *p);
    void operator delete[](void *p);
    ...
};
so that

q = new Y[n];
uses Y::operator new[] instead of Y::operator new.
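The following sketch shows a class that defines all four functions and counts which one each expression reaches. The counters, the private allocate helper, the malloc-based implementation, and array_allocator_demo are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

class Y
{
public:
    static int scalar_news;      // hypothetical counters, for illustration only
    static int array_news;
    static int array_deletes;

    void *operator new(std::size_t n)   { ++scalar_news; return allocate(n); }
    void *operator new[](std::size_t n) { ++array_news;  return allocate(n); }
    void operator delete(void *p)       { std::free(p); }
    void operator delete[](void *p)     { ++array_deletes; std::free(p); }

private:
    // Shared helper: gets n bytes from malloc (at least 1) or throws.
    static void *allocate(std::size_t n)
    {
        void *p = std::malloc(n != 0 ? n : 1);
        if (p == 0)
            throw std::bad_alloc();
        return p;
    }
};

int Y::scalar_news = 0;
int Y::array_news = 0;
int Y::array_deletes = 0;

void array_allocator_demo()
{
    int scalars = Y::scalar_news;
    int arrays = Y::array_news;
    int array_dels = Y::array_deletes;

    Y *q = new Y[3];                            // uses Y::operator new[]
    assert(Y::array_news == arrays + 1);
    assert(Y::scalar_news == scalars);          // Y::operator new was not involved
    delete [] q;                                // uses Y::operator delete[]
    assert(Y::array_deletes == array_dels + 1);

    Y *one = new Y;                             // a single object still uses Y::operator new
    assert(Y::scalar_news == scalars + 1);
    delete one;
}
```

Keeping the array forms separate lets a class use, say, a fixed-size pool for single objects while sending arrays of unpredictable size to a general-purpose allocator.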

Enhanced OOP Support
The standards committee enhanced object-oriented programming (OOP) in C++ by

relaxing restrictions on the return types for virtual functions

adding runtime type identification
C++ supports OOP by providing inheritance and virtual functions. I described inheritance in a recent two-part tutorial ("Inheritance, Part 1," CUJ, March 1993 and "Inheritance, Part 2," CUJ, May 1993). I will explain virtual functions, and these extensions, in future columns.

Reference
Ellis, Margaret A. and Stroustrup, Bjarne. 1990. The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley.