March 1994/Standard C

Standard C

C++ Language Support Library

P.J. Plauger

P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.

Introduction
I conclude my discussion of the language support portion of the library specified by the draft C++ standard. (See "Standard C: The Header <exception>," CUJ, February 1994, "The C Library in C++," CUJ, December 1993, "C++ Library Ground Rules," CUJ, November 1993, and "Developing the Standard C++ Library," CUJ, October 1993.)
"Language support" consists of those functions that can be called implicitly by C++ code, even when the code apparently contains no function calls. It also consists of the types required to declare and use those functions, as well as a few other related functions and types not directly needed to support the C++ language proper.
Standard C has few such creatures (if any). You can argue that several type definitions are part of language support. The types ptrdiff_t, size_t, and wchar_t are defined in various headers so you can declare objects that have the same types as certain expressions. They are a way to convey otherwise unknowable (or hard to learn) information about these types from the translator to the program.
You can also argue that function exit is part of language support. The execution of any C program effectively occurs by evaluating the expression exit(main(argc, argv)). Saying such a thing simplifies descriptions — it ties together the effects of calling exit and returning from main, for example. But it doesn't have a dramatic effect on how you actually write C programs. You cannot, for example, provide your own version of exit and expect it to be called when main returns. (And many implementations don't really call exit when main returns, but some other underlying function instead.)
C++, on the other hand, offers numerous opportunities along these lines. You can trot up all sorts of functions that get control, either directly or indirectly, when one of the language support functions gets called. For example, in last month's discussion of exceptions I identified three functions that let you register handlers, or functions that get control under some circumstances:

set_terminate, to specify a handler for calls to terminate()

set_unexpected, to specify a handler for calls to unexpected()

xmsg::set_raise_handler, to specify a handler for calls to xmsg::raise()
You can also derive a class from xmsg and override the virtual xmsg::do_raise to get control when certain exceptions get reported by executing code.
Thus, the draft C++ standard is a bit harder to write than the C Standard. In C, the implementation provides all library functions and you the programmer cannot displace them. The C Standard only has to describe a single interface between implementation and program. In C++, however, the program can displace functions otherwise supplied by the library. The draft C++ standard must spell out the environment promised to such a displacing function. And it must spell out what is expected of the displacing function so the program doesn't get surprised.
A handler for terminate(), for example, is not supposed to return to its caller. If you provide one that prints a message and returns, you can cause the library severe problems. The draft C++ standard says so. So when you read the descriptions that follow, remember that the "treaty" between programmer and implementor can be multifaceted. The extra complexity of the draft C++ standard is one of the prices we pay for extra flexibility in this area.
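As a rough sketch of that treaty, here is a terminate handler that honors the no-return rule. It assumes the names as they are spelled in present-day C++ (std::set_terminate and std::get_terminate, declared in <exception>), which is an assumption on my part because the draft discussed here predates that spelling. The names report_and_abort and install_handler are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical handler: report the problem, then end the program.
// It must not return to its caller.
[[noreturn]] void report_and_abort()
{
    std::fputs("fatal: terminate() called\n", stderr);
    std::abort();
}

// Install the handler and hand back the one it displaced,
// so the caller can restore it later.
std::terminate_handler install_handler()
{
    return std::set_terminate(report_and_abort);
}
```

A handler that printed the message and then returned would break its half of the bargain, which is exactly the case the draft warns about.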

Storage Allocation
Exceptions can be thought of as a way to structure the use of setjmp and longjmp. Similarly, the addition of new and delete to C++ essentially structures the use of malloc and free. By writing:

Thing *p = new Thing;
you are assured that the object of type Thing is properly constructed after it is successfully allocated and before it can be accessed through p. Similarly, the expression statement:

delete p;
ensures that the object is destroyed before its storage is deallocated.
You don't have to include any headers before writing expressions like these — new and delete are indeed built right into the language. But you can also play a variety of games with storage allocation if you choose. To do so, you begin by including the header <new>. Listing 1 shows a representative version of this header. I omit the extra superstructure required by namespaces, because it is distracting and still in a state of flux.
The simplest game you can play is to gain control when space for the heap is exhausted. The function set_new_handler lets you register a handler for this condition. In principle, the draft C++ standard says you can "make more storage available for allocation and then return," but it fails to describe a portable way to do so. Calling free to liberate storage may help, but there is no requirement that storage be actually allocated by calling malloc. Deleting one or more allocated objects may also help, but even that is not guaranteed. More likely, you will want to throw an exception or terminate execution at this point.
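One possible shape for such a handler is sketched below. It assumes present-day spellings (std::set_new_handler and std::bad_alloc from <new>), and the names reserve, release_reserve, and arm_reserve are hypothetical. As noted above, nothing promises that freeing storage obtained from malloc actually helps operator new, so treat the "give back a reserve" step as a hope, not a guarantee.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical emergency reserve, set aside at startup.
static void *reserve = nullptr;

// Hypothetical new handler: give back the reserve once and return,
// so the allocation is retried. If nothing is left, throw instead.
void release_reserve()
{
    if (reserve == nullptr)
        throw std::bad_alloc();     // nothing left to give back
    std::free(reserve);
    reserve = nullptr;
}

// Set aside `bytes` of storage and register the handler.
// Returns false if the reserve itself cannot be obtained.
bool arm_reserve(std::size_t bytes)
{
    reserve = std::malloc(bytes);
    if (reserve == nullptr)
        return false;
    std::set_new_handler(release_reserve);
    return true;
}
```

The handler either makes progress or throws. A handler that returns without doing either would presumably just be called again, over and over.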

xalloc Exceptions
The default "new handler" does, in fact, throw an exception now. As I described last month, all library exceptions are derived from the base class xmsg. Moreover, all exceptions are thrown by calling ex.raise(), for some object ex of class xmsg. Unless you seize control of the process in one of the ways I described last month, the eventual outcome is that a failed allocation will throw an exception, which will in turn terminate execution of the program.
This is a significant change from universal past practice, which has been to quietly yield a null pointer as a result of the new expression. The Library Working Group of X3J16/WG21, the joint ANSI/ISO standards committee for C++, anguished quite a bit before recommending this change. The joint committee anguished a bit more in turn. But eventually, the predominant wisdom was that the Standard C++ library had bloody well better use the full language in this case, not just the bits that were available when new and delete were first added to C++.
A persuasive argument is that very few programs truly check all new expressions for null pointers. Those that don't may well stumble about when the heap is exhausted — they're almost certainly better off dying a clean death. Those that do check all such expressions often simply abort — the path to abnormal termination is now just slightly different. It is only those few sophisticated programs that try to do something nontrivial when heap is exhausted that need a bit of rewriting. Most of the joint committee felt this was a necessary price to pay to introduce exceptions at this critical juncture.
Even so, some sympathy remains for being able to revert to the old behavior. For a variety of reasons, the Library Working Group has not spelled out a portable way to do so. But the group has identified what it thinks should be a common extension. Calling set_new_handler with a null pointer argument is otherwise undefined behavior. It seems natural to use this nonportable call as a way for implementations to know that they should revert to the older behavior.

Replacing operator new(size_t)
If you want more certain control over the business of allocating storage, your best bet is to provide your own versions of operator new(size_t) and/or operator delete(void *). These functions have a peculiar dispensation — the library provides a version of each, but you can "knock out" those versions by defining your own. (Only the array versions of these two operators, described below, also enjoy this special status within the Standard C++ library.)
Before I go into details, please note an important distinction here. When you write:

Thing *p = new Thing;
the new Thing part is called a "new expression." It calls operator new(size_t) to allocate storage, but it also does other things, such as constructing the newly allocated object. All that operator new(size_t) has to worry about is providing the number of requested bytes, suitably aligned, or dealing with heap exhaustion. Listing 2 shows one way to write this function.
Similarly, when you write:
delete p;
the delete p part is called a "delete expression." It calls operator delete(void *) to free storage, but it first destroys the object (only if the pointer is not null, of course). All that operator delete(void *) has to worry about is freeing storage for the object. Listing 3 shows one way to write this function.
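The division of labor between the expressions and the functions can be made visible with a small trace. The sketch below is only an illustration: the class Traced, the string trace, and the function trace_new_delete are hypothetical, and the class-specific allocation functions are there purely as a logging hook. It assumes the present-day exception name std::bad_alloc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Records the order in which the four steps happen.
static std::string trace;

struct Traced {
    Traced()  { trace += "construct "; }
    ~Traced() { trace += "destroy "; }

    // Class-specific allocation functions, used only so that the
    // allocation and release steps show up in the trace.
    static void *operator new(std::size_t n)
    {
        trace += "allocate ";
        if (void *p = std::malloc(n))
            return p;
        throw std::bad_alloc();
    }
    static void operator delete(void *p)
    {
        trace += "free ";
        std::free(p);
    }
};

// Run one new expression and one delete expression.
std::string trace_new_delete()
{
    trace.clear();
    Traced *p = new Traced;     // allocate, then construct
    delete p;                   // destroy, then free
    return trace;
}
```

The allocation function never sees a constructor, and the deallocation function never sees a destructor. The expressions supply those steps.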
So one thing you might do is replace operator delete(void *) with a function that doesn't really free the storage. That could be handy while you're debugging a program, provided of course that you have enough heap to run your test cases.
Or you might replace both operator new(size_t) and operator delete(void *) with versions that are simpler, or faster, or more sophisticated than the library versions. It is important to replace both, because the latter function in the library only knows how to free storage for objects allocated by the former.
In either case, you probably don't have to bother with set_new_handler. You are at liberty to do whatever you want when you run out of heap. No need to call the new handler, which you can't easily do portably anyway.
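A minimal sketch of such a replacement pair follows. It is not the code from the column's listings. It assumes malloc and free as the underlying allocator, the present-day exception name std::bad_alloc, and a hypothetical counter live_blocks that is not safe to use from multiple threads.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter for a quick leak check.
static std::size_t live_blocks = 0;

// Replacement for the library's operator new(size_t).
void *operator new(std::size_t n)
{
    if (n == 0)
        n = 1;                      // always return a distinct pointer
    if (void *p = std::malloc(n)) {
        ++live_blocks;
        return p;
    }
    throw std::bad_alloc();         // no new handler is consulted here
}

// Matching replacement for operator delete(void *).
void operator delete(void *p) noexcept
{
    if (p != nullptr) {
        --live_blocks;
        std::free(p);
    }
}
```

Both functions are replaced together, for the reason given above: each one only understands blocks handed out by its partner.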

Placement Syntax
Yet another latitude granted by the C++ language is to provide an arbitrary set of additional arguments in a new expression, as in:

Thing *get_special(T1 stuff, T2 more_stuff)
{
    return (new (stuff, more_stuff) Thing);
}
This form implicitly calls the function:

void *operator new(size_t, T1, T2);
which you are obliged to supply. I leave it to your imagination what extra parameters might be useful when you're allocating some of your more sophisticated objects.
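One use for an extra parameter is to name the pool that the storage should come from. The sketch below assumes a hypothetical fixed-size Arena with a simple bump allocator. Arena, Point, and make_point are illustrative names only, and std::bad_alloc is the present-day exception name.

```cpp
#include <cstddef>
#include <new>

// Hypothetical bump allocator over a fixed buffer.
struct Arena {
    alignas(std::max_align_t) unsigned char storage[256];
    std::size_t used = 0;
};

// Placement form: the extra argument selects the arena.
void *operator new(std::size_t n, Arena &a)
{
    const std::size_t align = alignof(std::max_align_t);
    std::size_t start = (a.used + align - 1) / align * align;
    if (start > sizeof a.storage || n > sizeof a.storage - start)
        throw std::bad_alloc();
    a.used = start + n;
    return a.storage + start;
}

// Matching form, called only if the constructor throws.
void operator delete(void *, Arena &) noexcept {}

struct Point { int x, y; };

// Construct a Point inside the given arena.
Point *make_point(Arena &a, int x, int y)
{
    return new (a) Point{x, y};
}
```

Objects built this way are never freed one at a time. The storage goes away when the Arena does, so this sketch suits only types that need no destructor call.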
It doesn't take too much imagination, however, to see a very common need. Sometimes you know exactly where you want a C++ object to be constructed — you have reason to believe that the storage area X is large enough and suitably aligned to hold an object of type Thing. Moreover, you're confident that no object has been constructed there already for which a destructor will later be called. (Whew!)
To deal with this twilight zone between C and C++ programming, you can write:

Thing *p = new ((void *)&X) Thing;
This, naturally enough, calls the function:

void *operator new(size_t, void *);
which can simply return its second argument, as shown in Listing 4. The Standard C++ library provides this one version of a placement operator new. (Don't forget to include the header <new> to be sure it is properly declared.) Any fancier placement variants are up to you to provide.
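Here is a small sketch of constructing an object at a known address with the library's placement form. The class Label and the function build_in_place are hypothetical. The buffer X is sized and aligned for a Label, which is the precondition described above.

```cpp
#include <new>
#include <string>

struct Label {
    std::string text;
    explicit Label(const char *s) : text(s) {}
};

// Construct a Label in caller-supplied storage, copy out its text,
// then destroy it explicitly. No heap block is allocated for the
// Label object itself.
std::string build_in_place(const char *s)
{
    alignas(Label) unsigned char X[sizeof(Label)];
    Label *p = new (static_cast<void *>(X)) Label(s);
    std::string copy = p->text;
    p->~Label();        // never `delete p`: X was not allocated by new
    return copy;
}
```

The explicit destructor call is the other half of the bargain. Since no delete expression ever sees this object, nothing else will destroy it.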

Member operator new
Yet another way exists for controlling how objects get allocated. For any class, you can overload all the variants of operator new and/or operator delete that I've mentioned so far. Perhaps you want to write your own versions of:

void *Thing::operator new(size_t);
void Thing::operator delete(void *);
that do a really fast job of allocating and freeing objects of class Thing. They can, for example, maintain a list of previously freed objects and hand them back quickly for future allocation requests. Unless you really get tricky, you can even ignore the size_t first argument to all variants of operator new, since you know how big a Thing is likely to be. (How do you get tricky? Well, you can define operator new in the base class and fail to override it in a derived class, so the inherited function gets asked for storage of a different size. But thinking about things like that gives me a headache.)
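A free-list allocator of the kind just described might look like the sketch below. The class Node and the helper struct Link are hypothetical. The size check covers the tricky case where a derived class inherits these functions, and the sketch never returns pooled blocks to the global heap.

```cpp
#include <cstddef>
#include <new>

// A freed block is reused to hold a link to the next free block.
struct Link { Link *next; };

class Node {
public:
    explicit Node(int v) : value(v) {}
    int value;

    static void *operator new(std::size_t n)
    {
        // A derived class that inherits this function may ask for a
        // different size; hand such requests to the global version.
        if (n != sizeof(Node))
            return ::operator new(n);
        if (free_list != nullptr) {         // reuse a freed block
            Link *b = free_list;
            free_list = b->next;
            return b;
        }
        return ::operator new(block_size());
    }

    static void operator delete(void *p, std::size_t n)
    {
        if (p == nullptr)
            return;
        if (n != sizeof(Node)) {
            ::operator delete(p);
            return;
        }
        free_list = new (p) Link{free_list};  // push onto the free list
    }

private:
    // Every pooled block is big enough for either a Node or a Link.
    static std::size_t block_size()
    {
        return sizeof(Node) > sizeof(Link) ? sizeof(Node) : sizeof(Link);
    }
    static Link *free_list;
};

Link *Node::free_list = nullptr;
```

Allocating a Node, deleting it, and allocating another hands back the same block the second time, with no trip to the global heap.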
So you see that you can exercise pretty fine control over how all objects, or even individual objects, get allocated.

Allocating Arrays
But that leads to one last residual problem, regarding the allocation and freeing of arrays. You can, for example, write:

Thing *p = new Thing[N];
to allocate an array of N elements each of type Thing. Each of the elements is constructed in order, starting with the first (element zero). In this case, you must write the expression statement:

delete[] p;
to delete the array, not just a simple:

delete p;
as before. Why? Because the "array new expression" above has to somehow memorize how many elements N it has allocated. The "array delete expression" needs to know to locate this memorized information and use it to destroy the appropriate number of elements and free the appropriate amount of storage. Yes, some existing implementations of C++ let you be cavalier about deleting arrays the wrong way, but don't count on that license in a portable program.
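The sketch below checks both halves of that description: elements are constructed in index order, and the array form of delete destroys every one of them. The class Elem and the function array_round_trip are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <vector>

// Each element logs its own address on construction and counts
// itself on destruction.
struct Elem {
    static std::vector<const Elem *> built;
    static int destroyed;
    Elem()  { built.push_back(this); }
    ~Elem() { ++destroyed; }
};
std::vector<const Elem *> Elem::built;
int Elem::destroyed = 0;

// Allocate and delete an array of n elements (n >= 0); report whether
// the elements were constructed in index order and all destroyed.
bool array_round_trip(int n)
{
    Elem::built.clear();
    Elem::destroyed = 0;
    Elem *p = new Elem[n];
    bool in_order = Elem::built.size() == static_cast<std::size_t>(n);
    for (int i = 0; in_order && i < n; ++i)
        in_order = (Elem::built[i] == p + i);
    delete[] p;                 // array form: destroys all n elements
    return in_order && Elem::destroyed == n;
}
```

Writing a plain delete p in place of delete[] p here would be undefined behavior, so the sketch does not try to demonstrate what goes wrong.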
This requirement presents another problem. What happens if you've provided a member operator new(size_t) for class Thing, as above? It cannot, in general, know whether it's being asked to allocate storage for a single element or a whole array. (Remember the potential trickery I mentioned above.) So what C++ has done in the past is to ignore any such member functions and call the global operator new(size_t) for all array allocations. This has been a less than satisfactory solution.
The joint committee has plugged this control gap by permitting you to define functions such as the members operator new[](size_t) and operator delete[](void *). Defining these functions gives you control over the allocation and freeing of arrays of class objects as well as the class objects themselves. You can't necessarily tell how many array elements are being allocated, by the way. An array new expression can ask for extra storage for its own bookkeeping, so you'd better honor the size_t argument blindly. But at least you can maintain private storage pools now for array objects.
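A sketch of the array members follows, using a hypothetical class Sample that records how many bytes it was asked for. It assumes malloc and free underneath and the present-day exception name std::bad_alloc. The request is honored exactly as given, since it may include bookkeeping space beyond the elements themselves.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct Sample {
    int data[4];
    ~Sample() {}    // non-trivial destructor; an implementation may
                    // then store an element count ahead of the array

    static std::size_t last_request;   // bytes asked for most recently
    static int live_arrays;            // arrays currently allocated

    static void *operator new[](std::size_t n)
    {
        last_request = n;
        void *p = std::malloc(n);       // honor n blindly
        if (p == nullptr)
            throw std::bad_alloc();
        ++live_arrays;
        return p;
    }
    static void operator delete[](void *p)
    {
        if (p != nullptr) {
            --live_arrays;
            std::free(p);
        }
    }
};
std::size_t Sample::last_request = 0;
int Sample::live_arrays = 0;
```

After new Sample[3], last_request is at least 3 * sizeof(Sample), and possibly more. How much more is up to the implementation.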
For completeness, the draft C++ standard also includes global versions of:

void *operator new[](size_t);
void operator delete[](void *);
The library versions of these functions just turn around and call the non-array library versions, so I won't show you the code for them. And you can indeed knock these functions out with your own definitions, but I'm not sure why you'd bother. Doubtless, someone more clever or perverse than I can make a case for any feature added to C++.

Type Information
There is one last aspect to the language support library. It is rather small compared to exceptions (all of last month's installment) or storage management (most of this month's). I tack it on here for completeness.
Another relatively recent significant addition to the draft C++ standard is "run-time type identification" (or RTTI, for short). Basically, it adds the operator typeid for obtaining various bits of information on the type of an object (or expression). The operator yields an object of class typeinfo, defined in the header <typeinfo>. Listing 5 shows one way to write this header.
The exception badtypeid is reported in those cases where the type cannot be determined statically at translation time. If, in the process of chasing down the actual object, the program encounters a null pointer, you can guess what happens.
(If you're put off by all these names made from words run together, you're not alone. There's a good chance that the joint committee will approve a new naming convention that involves a more liberal use of underscores to separate component words in names. So don't be surprised if many of these compound names change in the coming months.)
What can you do with an object of class typeinfo? Well, you can obtain some sort of name for the type, for one thing. typeinfo::name() yields a null-terminated multibyte string (or NTMBS, in the jargon of the draft C++ standard) that presumably says something meaningful about the type. There are no standard names defined, so far, not even for the builtin types.
You can also compare two objects of class typeinfo for equality or inequality. Within any given program, you can expect two such objects to compare equal only if they derive from two expressions of the same type. Don't expect to be able to remember these critters in files, however, and check for type equality across programs. Even running the same program twice doesn't promise to yield the same representation of a typeinfo object for the same type each time. (I have indicated that the type information can be represented as an int, but that is just illustrative, not a requirement.)
Finally, you can impose an ordering on all the types within a program. typeinfo::before(const typeinfo&) returns nonzero for an object that represents a type earlier in the pecking order than the argument object. Once again, however, no promises are made about the rules for determining this order, or whether they're even the same each time you run the program.
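The three operations can be sketched as follows. This assumes the spellings that present-day C++ uses, std::type_info and std::bad_typeid, which differ from the draft names typeinfo and badtypeid used in this column. The classes Shape and Circle and all four functions are hypothetical.

```cpp
#include <string>
#include <typeinfo>

struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};

// True if the object referred to is really a Circle.
bool is_circle(const Shape &s)
{
    return typeid(s) == typeid(Circle);
}

// True if exactly one of the two types comes before the other.
bool strictly_ordered(const std::type_info &a, const std::type_info &b)
{
    return a.before(b) != b.before(a);
}

// Implementation-defined spelling of a type's name.
std::string name_of(const std::type_info &t)
{
    return t.name();
}

// Applying typeid through a null pointer to a polymorphic type
// reports the failure with an exception.
bool null_typeid_throws()
{
    Shape *p = nullptr;
    try {
        (void)typeid(*p);
    } catch (const std::bad_typeid &) {
        return true;
    }
    return false;
}
```

Nothing here depends on what name() actually returns or on which of two types comes first. Those are exactly the details that no promises are made about.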
I'm sure far more can be said about the uses of RTTI, but I'm not the one to say it at this point in my career. Even if I were, this is not the place to say it. For now, you know what the standard C++ library has to know about RTTI.

Listing 2: The function operator new(size_t)