Dan is the founder of Saks & Associates. He serves as secretary of the ANSI and ISO C++ standards committees. He is also contributing editor for The C Users Journal and columnist for The C++ Report. He and Thomas Plum are coauthors of C++ Programming Guidelines, and codevelopers of Suite++: The Plum Hall Validation Suite for C++. You can reach him at 393 Leander Dr., Springfield, OH 45504, by phone at 513-324-3604, or at dsaks@wittenberg.edu.
The ANSI C++ technical committee, X3J16, has been working on a formal C++ standard for almost three years. But the committee has yet to release a draft standard to the public, and an official standard is still several years away. Nonetheless, the committee's decisions have already affected the C++ compilers you use today, and will certainly shape the compilers and libraries you use in the future.
In this article, I'll explain how the C++ language definition is changing as it evolves into a standard. I'll cover the committee's major technical decisions and describe various problems that are yet unsolved. I'll also look at the prospects for a standard C++ library.
X3J16 is an ANSI technical committee, but it isn't writing just the U.S. national standard for C++. X3J16 is working closely with the ISO C++ Working Group, WG21, to develop an international C++ standard. (The working group's full name is ISO/IEC JTC1/SC22/WG21. See the text box entitled "Who's Standardizing C++?" for a more detailed explanation of the standardization process.) International programming-language standards typically start as national (read "ANSI") standards. But U.S. programming-language standards reflect American natural language and culture. Although programmers around the world may tolerate programming languages with English keywords, many need to express parts of their programs, like string literals, in their native languages. Not unreasonably, many also want to write identifiers and comments in their own language.
But many natural languages, even those based on the Roman alphabet, use more than just the 26 characters in the English alphabet. European keyboards have letters with accents and umlauts (like à and ö, respectively), combination letters (Æ), and other characters. Japanese Kanji keyboards have hundreds of keys for composing thousands of different characters.
European computer systems usually make room for native-language characters by omitting certain punctuation characters. For example, Danish keyboards, displays, and printers replace the [ and ] with Æ and Å, respectively. The C program in Example 1(a) comes out on a Danish printer looking like Example 1(b).
(a)
int main(int argc, char *argv[])
{
    if (argc < 2 || *argv[1] == '\0') return 0;
    printf("Hello, %s\n", argv[1]);
    return 0;
}
(b)
int main(int argc, char *argvÆÅ)
æ
    if (argc < 2 øø *argvÆ1Å == 'Ø0') return 0;
    printf("Hello, %sØn", argvÆ1Å);
    return 0;
å
(c)
int main(int argc, char *argv??(??))
??<
    if (argc < 2 ??!??! *argv??(1??) == '??/0') return 0;
    printf("Hello, %s??/n", argv??(1??));
    return 0;
??>
(d)
int main(int argc, char *argv<::>)
<%
    if (argc < 2 or *argv<:1:> == '??/0') return 0;
    printf("Hello, %s??/n", argv<:1:>);
    return 0;
%>
In a serious attempt to accommodate the needs of C programmers in non-English speaking cultures, the ANSI C committee added the wide character type wchar_t, multibyte character literals, trigraphs, and locales. But their efforts fell a little short. ISO adopted ANSI C as the ISO standard, but the ISO C working group spent another few years preparing an addendum to the standard that, among other things, provides better support for linguistic and cultural variations.
Even before X3J16's first meeting in December 1989, SC22 expressed interest in an international standard for C++. However, some SC22 members were concerned that previous ANSI programming-language standards, including C, didn't meet the needs of the international community, even though those ANSI standards were adopted as ISO standards. Sympathizing with their concern, X3J16 decided to try to produce a standard that ISO would accept without change as an international standard.
At SC22's request, X3J16 wrote a project plan for SC22 to create WG21 to work with X3J16 in developing the C++ standard. Also, X3 changed X3J16's charter from type D (domestic standards development) to type I (international standards development). This means that X3J16 is developing the ISO C++ standard with the intent that it will also become the ANSI standard.
WG21 held its first meeting in June 1991. All X3J16 meetings since have been joint meetings with WG21. I call the joint committee "WG21+X3J16." We meet three times a year, typically in March, July, and November. Each meeting lasts four and one-half days.
Programming-language standards aren't written from scratch; the committee starts with one or more "base documents." X3J16 selected two base documents:
When the committee selected the base documents, there was no ISO C standard. We decided that the ISO C standard would become our third base document once it was available. The ISO C standard turned out to be the same as ANSI C, but ISO C will soon have an addendum that WG21+X3J16 will have to consider.
The current draft of the C++ standard has no formal standing. We don't even call it the "draft;" we call it the "Working Paper." The project editor used the PRM as the first version of the Working Paper, and has spliced parts of the C standard in as needed.
The editor produces a new Working Paper three times a year, about two months before each meeting. At each meeting, the committee approves the document as the basis for future work. Someday we'll approve the Working Paper as a draft and make it available for public comment. I don't know when that day will be.
AT&T C++ 2.1 does not implement templates, so the PRM does not describe them. However, the ARM describes templates (although the description is labeled "commentary"). The committee added the ARM's chapter on templates (less the annotations) to the Working Paper. At the time, there weren't any commercially available compilers supporting templates. Now there are several. The remaining discussion of templates assumes you are familiar with basic template features. [Editor's Note: For more information on templates, see "Templates in C++" by Nicholas Wilt on page 29 of this issue.]
X3J16's Formal Syntax working group identified problems with the template syntax. All the problems stem from the choice of < and > as delimiters for template argument lists. A C++ parser might have trouble distinguishing when < and > are delimiters and when they are operators.
For example, the formal parameter of a template can be a type, as in:
template <class T> class list;
or it can be a value, as in:
template <int n> class buffer;
For class buffer, the actual argument in a template instantiation can be any integer expression, as in:
buffer<10> b1;
buffer<2*BUFSIZ> b2;
It can even be an expression containing the < and > operators, as in:
buffer<x>y>z> b3;
A C++ parser must be prepared to look arbitrarily far ahead to determine that x>y>z is the template argument. The committee briefly considered using parentheses, braces, or brackets for template argument-list delimiters, but decided to stick with < and >. The Formal Syntax group has suggested alternative grammars for C++ that correct the problem, but the committee hasn't selected one yet. In the meantime, if you limit your template arguments to simple expressions, you shouldn't have any trouble with today's compilers.
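One way to keep a template argument "simple" for the parser is to wrap any expression that uses > in parentheses, so the first unparenthesized > unambiguously closes the argument list. The following is only a sketch: the buffer template is filled in with a hypothetical body, and x, y, chosen_buffer, tiny_buffer, and chosen_capacity are names invented for the illustration.

```cpp
// Hypothetical body for the article's buffer template.
template <int n>
struct buffer {
    char data[n];
    static int capacity() { return n; }
};

// Assumed compile-time constants standing in for x and y.
const int x = 3;
const int y = 2;

// The parentheses keep the > operator inside the argument expression,
// so it cannot be mistaken for the closing delimiter.
typedef buffer<((x > y) ? 16 : 4)> chosen_buffer;
typedef buffer<(x > y)> tiny_buffer;

int chosen_capacity() { return chosen_buffer::capacity(); }
```

With these values, x > y is true, so chosen_buffer is buffer<16> and tiny_buffer is buffer<1>.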
As with templates, the PRM doesn't have a chapter on exceptions, but the ARM does. So the committee adopted the ARM's Chapter 15 on exception handling, less the annotations, of course. Because exception handling is not yet generally available, I'll take a moment to explain it.
Exception handling is a mechanism for responding to events that may disrupt the normal flow of a program. The C++ exception-handling mechanism is designed to handle synchronous exceptions, like resource-allocation failures or values out of range, rather than asynchronous events like device interrupts. The syntax for exception handling uses three new keywords -- catch, throw, and try. My stock example in Figure 1 shows how exceptions work.
#include <iostream>

int g();
int h();
int something_wrong = 1; // assumed flag; set by code not shown

int f()
{
    try
    {
        // the compound statement part
        int n = g();
        // ...
        return n;
    }
    catch (int x) // a catch clause
    {
        std::cerr << "number " << x << " happened\n";
        return x;
    }
    catch (char *x) // another catch clause
    {
        // respond in some other way ...
        return -1;
    }
}

int g()
{
    return h();
}

int h()
{
    if (something_wrong)
        throw 2;
    // keep going ...
    return 0;
}
The entire body of function f is a try block. A try block consists of a compound statement followed by one or more catch clauses. The catch clauses handle exceptions that may occur while executing the compound statement. Executing a throw expression triggers ("throws") an exception. The throw may occur in the compound statement itself or in functions called from the compound statement. This particular try block in f calls g, which calls h, which conditionally throws an exception.
Throwing the exception terminates the execution of both h and g, and returns control to a catch clause in f. f has two catch clauses, only one of which can handle the exception. The program selects the catch clause by matching the type of the operand in the throw expression with the type declared in each catch clause. In Figure 1, the operand of the throw expression in h is of type int, so the first catch clause in f catches that exception.
Some of you may recognize that exception handling is similar to the functionality of the standard C setjmp and longjmp functions. However, exceptions are safer and more powerful than setjmp and longjmp. If either g or h declared local objects with destructors, throwing an exception in h invokes the destructors for those local objects as it "unwinds" the stack on the way back to the catch clause in f. longjmp merely discards intervening stack frames as it returns to the setjmp point without calling destructors, leaving resources used by local objects in an uncertain state. Also, C++ exceptions can throw objects of any type to a handler, but longjmp only transmits integer values.
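The unwinding behavior can be observed directly with a small sketch along the lines of Figure 1. Resource, events, and the string labels are hypothetical names introduced only for this illustration; the point is the order in which the entries are recorded.

```cpp
#include <string>
#include <vector>

// Records what happened, in order (illustrative only).
static std::vector<std::string> events;

// A local object with a destructor, standing in for any resource holder.
struct Resource {
    std::string name;
    explicit Resource(const std::string &n) : name(n)
    {
        events.push_back("acquire " + name);
    }
    ~Resource() { events.push_back("release " + name); }
};

void h()
{
    Resource r("h");
    throw 2;            // leaves h and g without returning normally
}

void g()
{
    Resource r("g");
    h();
}

int f()
{
    try {
        g();
        return 0;
    }
    catch (int x) {
        // Both Resource destructors have already run by this point.
        events.push_back("caught");
        return x;
    }
}
```

Calling f() records "acquire g", "acquire h", "release h", "release g", and then "caught": the locals in h and g are destroyed, innermost first, before the handler body runs.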
The committee had little doubt about the need for exception handling in C++, but there was considerable debate about the underlying execution model. The question was whether exception handling should only support termination, as described in the ARM, or support resumption instead. With resumption, a catch clause can return control directly to the throw point after handling the exception. With termination, the only way you can return to the throw point is by repeating the normal flow of execution. Bjarne Stroustrup summarized the question as: Does throwing an exception mean "get out" or "get help?" The committee opted for simplicity and stayed with the termination model.
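Under the termination model, "getting help" has to be written as an explicit retry: the handler deals with the problem and the program re-enters the normal flow from the top. A minimal sketch, assuming a hypothetical operation flaky that fails on its first two calls (attempts and call_with_retry are likewise invented for the illustration):

```cpp
// Counts calls to flaky (illustrative only).
static int attempts = 0;

// Hypothetical operation: throws on the first two calls, then succeeds.
int flaky()
{
    if (++attempts < 3)
        throw attempts;
    return 42;
}

// The handler cannot resume at the throw point, so the loop repeats
// the whole call instead.
int call_with_retry(int max_tries)
{
    for (int t = 0; t < max_tries; ++t) {
        try {
            return flaky();
        }
        catch (int) {
            // Handle the failure here; control continues after the
            // try block, and the loop tries again from the beginning.
        }
    }
    return -1; // gave up
}
```

Here call_with_retry(5) returns 42 on the third attempt. A resumption model would instead let the handler hand control back into the middle of flaky.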
When we added exception handling, we also relaxed the rules for matching throw expressions with catch clauses to allow a wider combination of type matches. We also added a small section on access rules for thrown objects.
As I explained previously, programmers in non-English cultures have an added burden programming in C++ because C++ uses punctuation characters that have been replaced by native-language characters. (C programmers have this same problem.)
C++ uses ASCII as its character set. ASCII is the U.S. variant of the ISO 646 standard character set. ISO 646 uses fewer character codes than ASCII. Each national variant of ISO 646 can use the unused codes for native-language characters or symbols. In ASCII, characters like {}[]|^~ occupy the unused ISO 646 codes, and C++ uses these characters heavily.
C++ programming on systems using national variants of ISO 646 might be easier if programmers could write C++ programs using only the invariant ISO 646 character set and avoid the troublesome characters.
Standard C's trigraphs don't offer a particularly readable solution to this problem. Trigraphs are three-character sequences that are alternative representations for the troublesome characters. For example, the trigraphs for [ and ] are ??( and ??), respectively. Using trigraphs, the C program in Example 1(a) looks like Example 1(c).
But trigraphs were never intended for humans to compose C source code. They were designed to aid mechanical translation into C.
Keld Simonsen, the Danish representative to the ISO C and C++ Working Groups, devised a set of digraphs (two-character symbols) and new keywords as alternate spellings for the offending C and C++ symbols, shown in Table 1. Using these symbols, Example 1(a) looks like Example 1(d). Notice that you still need the trigraphs inside the character and string literals.
Existing   Alternate
-----------------------
[          <:
]          :>
{          <%
}          %>
&          bitand
&&         and
|          bitor
||         or
^          xor
~          compl
&=         and_eq
|=         or_eq
^=         xor_eq
!          not
!=         not_eq
The ISO C addendum specifies the identifiers in Table 1 as macros defined in a new standard header, iso646.h. Thus, you can continue using those identifiers as user-defined identifiers in C as long as you don't include iso646.h. However, the C++ Working Paper adds the identifiers to the set of reserved words. This means that, at some future date, you will not be able to use those identifiers as user-defined identifiers in any C++ program. Consider yourself warned.
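To get a feel for how code reads with the alternate spellings in Table 1, here is a short sketch written without any of the characters [ ] { } & | ! in the function bodies. It assumes a compiler that accepts the digraphs and the new keywords as built-in tokens; count_set_bits and in_range are hypothetical functions made up for the illustration.

```cpp
// Counts the 1 bits in the first n elements of values.
// <: :> stand for [ ], and <% %> stand for { }.
int count_set_bits(const unsigned values<::>, int n)
<%
    int total = 0;
    for (int i = 0; i < n; ++i)
        // v bitand (v - 1) clears the lowest set bit of v.
        for (unsigned v = values<:i:>; v not_eq 0; v = v bitand (v - 1))
            ++total;
    return total;
%>

// True when lo <= v <= hi, spelled with not and or.
bool in_range(int v, int lo, int hi)
<%
    return not (v < lo or v > hi);
%>
```

Because these spellings are reserved words in C++, no header is needed; a C program would get the identifiers from the macros in iso646.h instead.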
New features to C++ draw a lot of attention, but the real work of the standards committee is ironing out the flaws in the language description. These flaws include inconsistencies, ambiguities, and minor omissions. Here are a few of the flaws in the ARM corrected by the Working Paper.
The name S in struct S {...}; is called a "tag name." In C, tags are not type names. That is, you cannot declare S x;. You must write it as struct S x;. In C, you can turn a tag into a type name using a typedef, as in typedef struct S S; and then you can write just S x;.
In C++, tag names are automatically both tag names and type names. For compatibility with C, C++ accepts (and ignores) typedefs that equate type names with tag names.
Some C programmers mimic the C++ behavior by always declaring their structs using
typedef struct S {...} S;
Other C programmers don't even bother with the tag name and write
typedef struct {...} S;
Section 7.1.3 of the ARM states, "An unnamed class defined in a typedef gets its typedef name as its name. For example, typedef struct {/* ... */} S; // the struct is named S." But it's not clear whether a member function with the same name as the typedef is a constructor. In other words, given the code in Example 2(a), is A::A a constructor?
The committee decided that the answer is no. The commentary in the ARM makes it clear that this rule was only meant to give the class a name for linkage. So the committee changed the rule to say, "An unnamed class defined in a typedef gets its typedef name as its name for linkage purposes." Our intent is that the previous declaration should be equivalent to Example 2(b), making it clear that A::A can't be a constructor. We also intended that this class not have a destructor, but those words are not yet in the Working Paper.
(a)
typedef struct {
    A();
} A;
(b)
struct dummy_name {
    A();
};
typedef struct dummy_name A;
Another problem in the ARM appears in Section 6.7. It says, "An auto variable constructed under a condition is destroyed under that condition and cannot be accessed outside that condition." In Example 3, you cannot access j after the first if statement, because either j was never created (if i is 0), or j has already been destroyed. This rule creates the only situation in C++ where the lifetime of a named object ends before it goes out of scope. That is, you can see j but you can't touch it.
if (i)
    for (int j = 0; j < 100; j++) {
        // ...
    }
if (j != 100) // error: access outside condition
    // ...
The Working Paper eliminates this anomaly by changing the rules to say the statement in an if, if-else (both branches), switch, while, do, or for statement implicitly defines a local scope. Example 4(a) is now equivalent to Example 4(b).
(a)
if (i)
    for (int j = 0; j < 100; j++) {
        // ...
    }
(b)
if (i) {
    for (int j = 0; j < 100; j++) {
        // ...
    }
}
// j is no longer in scope
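The implicit local scope is easy to observe with an object that has a constructor and destructor. In this sketch, Tracer, trace, and run are hypothetical names introduced for the illustration; the declaration controlled by the if has no braces, yet the object is destroyed before the next statement runs, exactly as if the braces were there.

```cpp
#include <string>

// Accumulates a record of constructor and destructor calls.
static std::string trace;

struct Tracer {
    Tracer()  { trace += "ctor;"; }
    ~Tracer() { trace += "dtor;"; }
};

std::string run(bool cond)
{
    trace.clear();
    if (cond)
        Tracer t;       // treated as if written { Tracer t; }
    trace += "after-if;";
    return trace;
}
```

run(true) yields "ctor;dtor;after-if;" and run(false) yields "after-if;". In both cases the name t is not visible after the if statement, so scope and lifetime end together.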
The committee's Core Language working group has spent a great deal of time trying to pin down the rules for looking up names (identifiers) as they are referenced. The working group used the example in Figure 2 to illustrate the problem.
 1: struct X {
 2:     static int i;
 3:     struct Y {
 4:         int i;
 5:         void f();
 6:     };
 7: };
 8: int i;
 9: void X::Y::f() {
10:     i = 5;
11: }
The question is, to which declaration of i does i=5; on line 10 refer? It could be the X::i on line 2, or X::Y::i on line 4, or the global ::i on line 8. The ARM doesn't say. The Core Language working group informally agreed that the answer is X::Y::i. They also agreed that if you comment out line 4, then the answer is X::i.
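Both informal answers can be checked with a compilable variant of Figure 2. This is a sketch only: the second pair of classes, P and Q, is a hypothetical copy of X and Y with the inner member removed (the "comment out line 4" case), and the two checking functions are invented for the illustration.

```cpp
struct X {
    static int i;
    struct Y {
        int i;
        void f();
    };
};
int X::i = 0;

// Same shape as X and Y, but the nested class has no member i.
struct P {
    static int i;
    struct Q {
        void f();
    };
};
int P::i = 0;

int i = 0;                      // the global ::i

void X::Y::f() { i = 5; }       // finds X::Y::i first
void P::Q::f() { i = 7; }       // no Q::i, so finds P::i

bool inner_member_wins()
{
    X::Y y;
    y.i = 0;
    y.f();
    return y.i == 5 && X::i == 0 && ::i == 0;
}

bool enclosing_class_is_next()
{
    P::Q q;
    q.f();
    return P::i == 7 && ::i == 0;
}
```

Both functions return true: the search works outward from the member function's own class, through the enclosing class, and reaches the global scope only if neither class declares the name.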
The committee has yet to formalize these rules, but it appears that they will be something like the following. To look up a name inside the definition of an out-of-line member function:
int i;
struct X {
    void f();
};
struct Y {
    static int i;
    friend void X::f() {
        i = 5;
    }
};
Although many C++ users would like the C++ standard to include an extensive class library, that's not likely to happen. The job is just too big. WG21+X3J16 has wisely limited itself to a few critical library components:
Language support. Functions and classes required for runtime support of the C++ language. This includes standard implementations for the free-store management functions defined in the header new.h: new, delete, and possibly set_new_handler. It also includes exception-handling support defined in a new header exception.h: functions terminate, set_terminate, unexpected, set_unexpected, and a class SUE, all similar to those described in Section 15.6c of the ARM.
Input/Output. A simplified version of the iostream library distributed with AT&T's cfront 2.0, with additional support for wide characters, multibyte strings, and locales. The working specification does not use templates, but does use exception handling.
Standard C. The Standard C Library adapted to C++.
Strings. One or more classes to support variable-length strings. Look for strings of char (ordinary "narrow" characters) and wchar_t (wide characters).
Simple foundation classes. Classes like bit sets, bit strings, and a template for general dynamic arrays.
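As an illustration of the language-support component listed first above, here is a sketch of installing and then restoring a new-handler. It assumes a current compiler, where set_new_handler is declared in the header <new> rather than new.h; out_of_memory and install_and_restore are hypothetical names.

```cpp
#include <new>

// Hypothetical handler: called when operator new cannot get memory.
// A real handler might free a reserve block before giving up.
void out_of_memory()
{
    throw std::bad_alloc();
}

// Installs the handler, then puts the previous one back.
// set_new_handler returns whichever handler was installed before.
bool install_and_restore()
{
    std::new_handler previous = std::set_new_handler(out_of_memory);
    std::new_handler installed = std::set_new_handler(previous);
    return installed == out_of_memory;
}
```

set_terminate and set_unexpected are described as following the same install-and-return-the-old-one pattern.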
At present, the Working Paper only includes the C++ version of the Standard C Library. Some aspects of the C library don't mesh well with C++, so the C++ standard makes some minor adjustments.
For example, the C declaration for the strchr function in string.h opens a "hole" in the library's type safety. The C declaration for strchr is:
char *strchr(const char *s, int c);
strchr returns the address of a character in the string addressed by s (or a null pointer). This means that strchr returns the address of a constant character as the address of a modifiable character. This allows accidents like that in Example 5, which tries modifying name even though it's declared constant. memchr, strpbrk, strrchr, and strstr share this problem.
const char name[] = "Nancy";
char c;
...
*strchr(name, c) = tolower(c);
The C++ library declares all of these functions as an overloaded pair with extern "C++" linkage and with self-consistent arguments and return types. For example, the C++ library declares strchr as in Example 6.
extern "C++" const char *strchr(const char *s, int c);
extern "C++" char *strchr(char *s, int c);
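The same overloading technique works for your own functions. The sketch below uses a hypothetical find_char rather than the library's strchr, so it doesn't depend on how any particular library declares the real thing; it assumes a compiler with static_assert and decltype.

```cpp
#include <type_traits>
#include <utility>

// Const version: searching a constant string yields a pointer to const.
const char *find_char(const char *s, int c)
{
    const char target = static_cast<char>(c);
    for (;; ++s) {
        if (*s == target)
            return s;           // like strchr, also matches the final '\0'
        if (*s == '\0')
            return nullptr;
    }
}

// Non-const version: only a modifiable string yields a modifiable result.
char *find_char(char *s, int c)
{
    return const_cast<char *>(find_char(static_cast<const char *>(s), c));
}

// The return type follows the constness of the argument.
static_assert(std::is_same<decltype(find_char(std::declval<const char *>(), 0)),
                           const char *>::value,
              "const in, const out");
static_assert(std::is_same<decltype(find_char(std::declval<char *>(), 0)),
                           char *>::value,
              "non-const in, non-const out");
```

With this pair, the accident in Example 5 becomes a compile-time error, because assigning through the pointer returned for a constant string is not allowed.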
WG21+X3J16 has added a variety of minor extensions to C++. The details are rather long, so here's a quick summary:
The C++ standard is shaping up nicely, but it's still years away. I hesitate to pick a year.
If you would like to join X3J16, contact Stephen D. Clamage, Vice-Chair, TauMetric Corp., 8765 Fletcher Pkwy., Suite 301, La Mesa, CA 91942 (619-697-7607 or steve@taumet.com); or, for frequent reports on the standard, refer to my column in The C++ Report.
American National Standard X3.159-1989--Programming Language C.
Ellis, Margaret A. and Bjarne Stroustrup. The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley, 1990.
Unix System V AT&T C++ Language System Release 2.1 Product Reference Manual, Select Code 307-159.
ANSI is the American National Standards Institute, a trade association that sets industrial standards for a wide range of products, such as bar codes, bicycle helmets, heating and air-conditioning equipment, and plywood. ANSI is not a government agency and its standards are not binding by law, unless, of course, a governmental agency adopts an ANSI standard for regulatory or procurement purposes.
ANSI doesn't actually write standards; it establishes procedures for writing and approving standards, and then delegates most of the work to industry-specific standards committees. X3 is the ANSI-accredited committee that administers standards development for information processing systems. CBEMA (the Computer and Business Equipment Manufacturers Association) funds and staffs X3's offices at its Washington, DC headquarters. X3 chartered X3J16 to develop an ANSI standard for C++.
ANSI and many other national standards bodies are members of ISO (the International Organization for Standardization). ISO works jointly with yet another standards group, IEC (the International Electrotechnical Commission). ISO and IEC formed a joint technical committee, JTC1, to standardize information technology. JTC1's subcommittee SC22 oversees the development of international programming-language standards. SC22 created WG21 to develop the international C++ standard.
X3J16's officers are:
WG21+X3J16 does most of its technical work through working groups that analyze technical issues and make recommendations for resolving them. The working groups are:
Copyright © 1992, Dr. Dobb's Journal