P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, and Programming on Purpose (three volumes), all published by Prentice-Hall. You can reach him at pjp@plauger.com.
Introduction
This is the third installment in a series on the draft standard being developed for the C++ library. (See "Standard C: Developing the Standard C++ Library," CUJ, October 1993, and "Standard C: C++ Library Ground Rules," CUJ, November 1993.) The joint ANSI/ISO committee X3J16/WG21 has been working for over four years to complete the Standard C++ language and polish a draft standard for it. (And there's still a long way to go.) Only recently has the library portion of that document begun to take shape.
That's not for want of effort. Mike Vilot has chaired the Library Working Group (a subcommittee of the joint committee) since its inception. He has reported steady progress to the joint committee at each meeting, but the LWG began with a significant disadvantage. It essentially had no base document to build on, beyond the C Standard for describing the C library portion of the C++ library. The language itself, by contrast, was pretty thoroughly described in Ellis and Stroustrup's The Annotated C++ Reference Manual (or ARM; see Reference [1]).
Thus, a lot of early effort went into deciding the desired scope of the C++ library. Then more effort went into refining specifications for various library classes. Not the least of this work was updating and summarizing the extensive set of classes that implement iostreams. (Fortunately, Jerry Schwarz, the original author of iostreams, was available to carry out much of this effort.) Now, at last, the LWG can focus more on the description of the Standard C++ library.
My purpose in presenting this ongoing series of columns is to introduce you to that newly created description. As much as possible, the words attempt to describe "prior art." People have been programming in C++ for about a decade now. A principal goal of the draft C++ standard is to describe what has gone before. Still, a standard can't help but be inventive in some areas, if only to resolve ambiguities that arose in the prior art. It also gets inventive when the drafting committee feels the need to add features. The evolving Standard C++ library has many ambiguities to resolve and more than a few new features added.
Thus, you will probably find more that is new here than you'd like to think, particularly if you've worked with C++ for several years. Before you decide that the LWG is out to lunch, however, please note two things. One is that several contributors to the library draft have worked with C++ since its earliest days. They have made difficult tradeoffs based on extensive experience, and they understand both the political and the technical costs of those tradeoffs.
The other is that the goal of an international standard is to describe a portable language. That may not exactly include your favorite dialect of C++, but equally it does not necessarily disallow its continued coexistence. Please don't fall into the easy trap of assuming that "undefined behavior" is disallowed behavior. That rubric is often used as a shorthand for "a permissible extension in a nonportable program that doesn't require the translator to issue a diagnostic."
One reason for presenting this series of columns is to prepare the way for the coming public reviews of the draft C++ standard. (The schedule calls for those reviews to commence after the July 1994 meeting of the joint committee, though many harbor grave doubts.) Those reviews are sure to generate lots of public commentary, all of which must be addressed by the joint committee. The C++ language itself is sure to generate plenty of discussion, if only for the major extensions added to the draft standard.
Everyone needs time to digest new ideas before they can get comfortable with them. I'm sure that the library will stimulate its share of challenges, because its description is so new. I just hope that the library portion gets only its fair share. Thus, this early exposure to where the LWG is headed.
Here, once again, is the overall structure of the C++ library draft standard:
(0) introduction, the ground rules for implementing and using the Standard C++ library
(1) the Standard C library, as amended to meet the special requirements of a C++ environment
(2) language support, those functions called implicitly by expressions or statements you write in a C++ program
(3) iostreams, the extensive collection of classes and functions that provide strongly typed I/O
(4) support classes, classes like string and (perhaps) complex that pop up in some form in every library shipped with a C++ compiler
I covered (0) last month. This installment continues with (1) the Standard C library. It is a topic that involves more discussion than you might at first think.
The Standard C Library
An important reason for the success of C++ is that it's built atop C. That confers several immediate advantages:
- C++ inherits C's well thought out technology for basic types, expression evaluation, and flow of control.
- C++ profits from C's popularity and portability.
- C++ programs can make use of the extensive Standard C library.
And herein lies an interesting irony. For it is the limitations of C that inspired many of the features of C++. Just as C made C++ possible, it also made it arguably necessary to many people.
One of the major advantages touted for C++ over C is the ability to write class libraries instead of function libraries. A class encapsulates much more than just a type definition and a handful of related functions. It can enforce information hiding and proper protocols for using member functions. A well designed class, or set of classes, is bound to be more reusable than the equivalent collection of functions.
Nevertheless, the Standard C library endures as an important adjunct to C++. It has not been displaced by a superior set of classes. (Well, iostreams do replace much of what's in <stdio.h>, but not all.) If anything, I believe that the presence of such a rich function library has inhibited the growth of the kind of class library that many C++ programmers would prefer. As is so often the case, that which is good enough wins out over that which is arguably the best.
Including the C Library
So for whatever reasons, the C++ library includes the Standard C library as a subset. From a purely descriptive standpoint, the draft avoids repetition as much as possible. Rather than copy great gobs of wording from the C Standard, the draft C++ standard includes the library portion of the C Standard "by reference." (See Reference [2].)
Life is never simple, of course. C++ is not exactly the same language as C. It is no surprise, therefore, to find that the Standard C library cannot survive completely unchanged in a C++ environment. The draft C++ standard includes a number of qualifiers to the behavior of the Standard C library. I mentioned one or two blanket qualifiers last month. You cannot, for example, declare a library function yourself, in line with your own code, as in:
extern double sqrt(double);
In a C++ program, you must include the header <math.h> to be sure the function is declared properly.
Here are some of the qualifiers you need to keep in mind when calling the Standard C library from a (standard conforming) C++ program:
The Type wchar_t
In C, the type wchar_t is defined in the headers <stddef.h> and <stdlib.h>. It serves as a synonym for one of the other integer types (used by the translator to represent wide characters). In C++, wchar_t is now a keyword that names a distinct type. That lets you overload functions and reliably distinguish between arguments of types char and wchar_t. But it also means that wchar_t is in a program's namespace. And the C headers had better not try to define the name anew.
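To make the overloading point concrete, here is a minimal sketch. The function name kind_of and its return codes are hypothetical, chosen only for illustration. Were wchar_t merely a synonym for some integer type, one of these overloads could collide with another; because it is a distinct type, all three can coexist.

```cpp
#include <stddef.h>

// Hypothetical overload set: each version returns a small code
// so a caller can see which one the translator selected.
int kind_of(char)    { return 1; }  // narrow character
int kind_of(wchar_t) { return 2; }  // wide character, a distinct type in C++
int kind_of(int)     { return 3; }  // plain integer
```

A call such as kind_of('a') selects the first version, kind_of(L'a') the second, and kind_of(97) the third.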
The Macro NULL
In C, the macro NULL can be defined as any of 0, 0L, or (void *)0. In C++, the third option is no longer permissible (and the second one is of little benefit). One use for NULL in a C program is to emphasize that you're talking about a null pointer, and not just any old zero. A more important use, however, was in the early days of C, when pointers tended to be all the same size and function prototypes were nonexistent. By writing NULL as a null pointer argument, you were sure to get a zero of the proper size. It is important to point out, however, that the C Standard doesn't require NULL for the first use and doesn't guarantee that it's always suitable for the second.
So for a variety of reasons, mostly stylistic, you should probably not use NULL in new programs anyway.
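The one place where the size of a null pointer argument still matters is a varying-length argument list, where no prototype tells the translator what type is wanted. The sketch below assumes a hypothetical function, count_strings, that walks its arguments until it meets a null pointer. The safe way to write the terminator is an explicit cast, not a bare 0 or NULL.

```cpp
#include <stdarg.h>

// Hypothetical helper: counts the const char * arguments,
// beginning with 'first', up to a terminating null pointer.
int count_strings(const char *first, ...)
{
    int n = 0;
    va_list ap;

    va_start(ap, first);
    for (const char *s = first; s != 0; s = va_arg(ap, const char *))
        ++n;
    va_end(ap);
    return n;
}
```

A typical call is count_strings("a", "b", (const char *)0). The cast guarantees that the last argument is passed as a pointer, whatever the relative sizes of int and pointers.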
The Macro offsetof
In C, structures are pretty simple creatures. They consist only of the member objects you declare, in the order you declare those members. At worst, the translator throws in a few holes to get storage boundaries right. In C++, a structure may inherit data members from one or more base classes. Some of those members may be private. And the structure may contain one or more pointers to virtual tables. Thus, it is a little harder to say what is meant by "the offset of" a member object in an arbitrary C++ structure (class). It makes no sense at all to talk about the offset of a member class or a member function.
For all these reasons, the macro offsetof is defined only for the "plain old data structures" of C.
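As an illustration of the case that remains well defined, here is a small sketch with a hypothetical structure, Record, that has no base classes, no private members, and no virtual functions. The helper count_member is likewise invented for the example.

```cpp
#include <stddef.h>

// A "plain old data structure": only public data members,
// laid out in declaration order (possibly with holes).
struct Record {
    char   tag;
    long   count;
    double value;
};

// Hypothetical helper: reaches the member 'count' by adding
// its byte offset to the address of the whole structure.
long *count_member(Record *r)
{
    return (long *)((char *)r + offsetof(Record, count));
}
```

For such a structure, count_member(&r) yields the same address as &r.count. Add a virtual function or a base class to Record and the draft no longer promises that offsetof means anything.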
The Header <stdarg.h>
Much the same sort of thing can be said about most of the machinery that lets you walk varying-length argument lists. The macros defined in <stdarg.h> barely work, in most implementations of Standard C. To ask them to deal with reference parameters, and references to objects of type va_list, is probably pushing things a bit too much. No words on this topic are in the library draft, as of this writing, but there has been some discussion on the e-mail reflector. I'm sure we'll want to restrict what's required of <stdarg.h>.
The Function longjmp
Some people would like to ban setjmp and longjmp entirely from C++ programs. Indeed, these functions are no longer strictly necessary. Exception handling has been added to the draft C++ standard to perform the same operations, but in a safer and more structured fashion.
The major problem is skipped destructors. Calling longjmp peels back the stack an arbitrary number of levels to get back to the context where setjmp was called for the jmp_buf argument. Any automatic storage constructed on the way down just gets abandoned on the way back out. A thrown exception, by contrast, makes a point of calling all these destructors in the proper order on the way from the throw point to the catch clause that handles the exception.
It is an obvious fact of life, however, that the world is full of C code that calls setjmp and longjmp. Certainly, the C++ community is eager to pave the migration of C code to C++. Thus, the draft C++ standard avoids any gratuitous changes that require existing C code to be rewritten. (This is the well known principle of keeping C++ "as close as possible to C, but no closer.")
So the compromise is to permit a C++ program to contain calls to setjmp and longjmp. The only problem arises when a call to longjmp skips over destructor calls. (That's not likely to happen in code you first migrate to C++, since no C code calls any destructors.) The draft C++ standard simply decrees that this behavior is undefined.
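Here is a sketch of the kind of migrated C code that stays on the safe side of that rule. The names bad_input, check_digit, and sum_digits are hypothetical. No object with a destructor lives between the setjmp call and the longjmp call, so nothing is skipped.

```cpp
#include <setjmp.h>

static jmp_buf bad_input;   // context saved by sum_digits

// Jumps back to the saved context on any non-digit character.
static void check_digit(char c)
{
    if (c < '0' || '9' < c)
        longjmp(bad_input, 1);
}

// Returns the sum of the decimal digits in s,
// or -1 if s contains anything that is not a digit.
int sum_digits(const char *s)
{
    volatile int total = 0;  // volatile: modified after setjmp

    if (setjmp(bad_input) != 0)
        return -1;           // arrived here via longjmp
    for (; *s != '\0'; ++s) {
        check_digit(*s);
        total = total + (*s - '0');
    }
    return total;
}
```

Declare a class object with a destructor inside check_digit, ahead of the longjmp call, and the same program wanders into undefined behavior.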
The Function exit
Some people would also like to ban exit from C++ programs. The traditional behavior is for exit to call the destructors for all static data, but not for any automatic data still alive at the time of the exit call. This is not considered quite so bad as longjmp skipping destructors; at least the program is terminating. But it's still not nice.
There has been some discussion in support of having exit call those skipped destructors. Effectively, a call to exit would throw an exception that is caught by the agent that calls main. But such a change could add significant overheads to all C++ programs and is controversial for other reasons as well. Right now, its behavior is simply documented in its traditional form.
Storage Allocation Functions
Another minor archaism is the set of functions declared in <stdlib.h> that allocate and free storage. C++ provides operator new(size_t), operator delete(void *), and their array counterparts to do the same thing in a somewhat more structured fashion. (Constructors and destructors get called at the proper times, for one thing.)
But once again, lots of migrated code calls malloc, free, and their buddies. Even a few pure C++ programs have occasion to call these functions directly, for reasons I will neither attack nor defend. So the question arises, what is the relationship between operator new and malloc? The answer comes in two parts:
- There is no guarantee that the two mechanisms are compatible. What you allocate with operator new you'd better free with the corresponding operator delete. What you allocate with malloc (or its buddies) you'd better free with free.
- If you replace ::operator new with your own version, it's okay to have it call malloc to buy storage. Put another way, malloc is guaranteed not to call ::operator new to do its thing. No fear of an infinite loop here.
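The second point can be sketched as follows. This is one possible replacement, not the only one. The counter malloc_backed_news is hypothetical and exists only so you can observe that the replacement is in use, and throwing std::bad_alloc on failure follows the later standard rather than anything promised in this column.

```cpp
#include <stdlib.h>
#include <new>

// Hypothetical counter: number of successful allocations
// made through the replacement below.
static unsigned long malloc_backed_news = 0;

// Replacement ::operator new that buys its storage from malloc.
// malloc never calls ::operator new, so there is no recursion.
void *operator new(size_t size)
{
    if (size == 0)
        size = 1;            // always hand back a distinct pointer
    void *p = malloc(size);
    if (p == 0)
        throw std::bad_alloc();
    ++malloc_backed_news;
    return p;
}

// Matching replacement: storage from malloc goes back via free.
void operator delete(void *p)
{
    free(p);
}
```

Note that the pairing rule from the first point still holds for callers. Storage obtained from this operator new should be released with operator delete, even though you happen to know that free lies underneath.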
String Functions
A handful of functions declared in <string.h> have a form that is, well, inconvenient to many a C++ program. Consider:
void *memchr(const void *s, int c, size_t n);
char *strchr(const char *s, int c);
char *strpbrk(const char *s1, const char *s2);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);
Each of these takes a pointer to constant argument that designates a sequence of characters, and returns a pointer to somewhere inside that sequence (or a null pointer). But the return type is not declared as a pointer to a constant type. C++ is rather more finicky than C about mixing pointers to constant and non-constant things. As a result, you find yourself writing type casts practically all the time when you use these functions from C++.
The current solution is to replace each of these function signatures with two others, as in:
const char *strrchr(const char *s, int c);
char *strrchr(char *s, int c);
That should eliminate the need for most type casts. The declarations tend to be more honest in the bargain. I personally favor dropping the second form, however, as being even more honest. (That's the way the analogous wide-character functions are now declared in the normative addendum, described below.) But the LWG has yet to discuss this particular topic.
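To show how such a pair of signatures behaves without redeclaring a library function, here is a sketch built around a hypothetical wrapper named find_last. The const version leans on strrchr; the non-const version forwards to it and confines the one unavoidable cast to a single place.

```cpp
#include <string.h>

// Const in, const out: the caller cannot write through the result.
const char *find_last(const char *s, int c)
{
    return strrchr(s, c);
}

// Non-const in, non-const out: the caller already had write
// access to the sequence, so returning char * is honest.
char *find_last(char *s, int c)
{
    return const_cast<char *>(
        find_last(static_cast<const char *>(s), c));
}
```

Given char path[] = "a/b/c", the call find_last(path, '/') returns a char * you can store through, with no cast in sight. Given a const char *, you get a const char * back, and the translator complains if you try to modify what it points to.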
The Normative Addendum
One other document is included by reference in the draft C++ library standard. As I've discussed several times in these pages, the ISO C committee WG14 is finalizing a normative addendum to the C Standard. (See References [3]-[6].) Once adopted, that document will become Amendment 1 to the ISO C Standard. It provides numerous additions to the Standard C library, to support reading, writing, and manipulation of large character sets. The LWG has bowed to the inevitable and agreed to include it as part of the C++ library, even before it is formally approved.
So far, only one qualifier is spelled out for Amendment 1 contributions. Part of that Amendment is the addition of a new header, <iso646.h>, which defines a number of macros as aliases for certain operators. (These are mostly the operators that are hard to write in national variants of ISO 646, which often replace the conventional graphics for C operators such as |.) Thus, for example, the macro or_eq expands to |=.
But this presents a problem in C++ similar to wchar_t. It seems the joint committee has already made keywords of all the macros defined in <iso646.h>. Thus, the header need never be included, and implementors must be sure to make any such header safe for a C++ environment.
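A brief sketch shows the alternate spellings at work. The function names set_flag and in_range are hypothetical. The header is included here only to demonstrate that doing so is harmless in C++; a conforming translator should accept the same code without it.

```cpp
#include <iso646.h>  // harmless in C++, where these names are keywords

// Turns on the bits of mask in flags; or_eq is another spelling of |=.
unsigned set_flag(unsigned flags, unsigned mask)
{
    flags or_eq mask;
    return flags;
}

// True if lo <= x <= hi; 'and' is another spelling of &&.
bool in_range(int x, int lo, int hi)
{
    return lo <= x and x <= hi;
}
```

So set_flag(0x1, 0x4) yields 0x5, written without a single vertical bar.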
Finally, the LWG would rather not stop at simply adopting Amendment 1. That would give short shrift to all those programmers who want to use the new wide-character support in writing programs. Why have them descend into C all the time? It would be much better to extend iostreams, at the very least, to read and write the new wide-character streams.
I mention this only as a teaser. The LWG is still actively discussing how best to extend iostreams in this direction. I have implemented one method, and will naturally argue for it at future meetings. Other people, naturally enough, have other ideas. When the topic becomes clearer, I will revisit it in a future column.
Bibliography
[1] Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual, Addison-Wesley, 1990.
[2] ISO/IEC 9899:1990, International Standard for Programming Language C.
[3] P.J. Plauger, "Standard C: Formal Changes to C," CUJ April 1993.
[4] P.J. Plauger, "Standard C: Large Character Set Support," CUJ May 1993.
[5] P.J. Plauger, "Standard C: Large Character Set Functions," CUJ June 1993.
[6] P.J. Plauger, "Standard C: Wide Character Streams," CUJ July 1993.