April 1993/Standard C


Standard C

Formal Changes to C

P.J. Plauger

P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, published by Prentice-Hall, and ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press. You can reach him at pjp@plauger.com.

Current Status
The last meeting of the C standards committees, ISO JTC1/SC22/WG14 and ANSI-authorized X3J11, occurred jointly last December near Dulles Airport. I left that meeting with a warm sense of accomplishment and a humongous amount of homework. Both were a direct result of having the meeting go the way I'd hoped, for a change. In soap operas, senators and board chairpeople always finagle their political goals against all odds. In the real world, we lowly Convenors of standards committees mostly go with the flow.
I summarized the administrative highlights in my "Editor's Forum" last issue. (That was in CUJ March 1993, but see also the "Editor's Forum" for January 1993 and for November 1992.) Here again is a brief synopsis of what happened:

WG14 finally voted out an amendment to the C Standard. We were charged several years ago by SC22, our parent committee, to produce such a "normative addendum" to correct several perceived flaws in the coverage and expression of the C Standard. With a bit of eleventh-hour compromising, we finally got agreement within WG14. Now the amendment must survive at least two rounds of balloting within SC22 before it becomes formal.

SC22 finally established sensible procedures for interpreting and correcting ISO programming language standards. Various authorized agencies issue Defect Reports to the Convenor (me again). She/he (I) must log them, acknowledge them, and submit them to the Working Group to develop a response. A Technical Corrigendum patches the standard, while a Record of Response simply explains the standard. (I helped develop these procedures at the SC22 plenary in Finland last August.)

ANSI has adopted the ISO C Standard verbatim as its own, replacing the original ANSI C Standard, which differed from it mainly in formatting. That was a prelude to having X3J11 start an I-Project to track development of the normative addendum. The English of all this is that responsibility for maintaining the C Standard has passed from ANSI to ISO.

X3J11 recognizes that its principal current business is to apply its expertise in interpreting the C Standard. WG14 has requested that X3J11 develop initial interpretations and X3J11 has graciously agreed to do so. WG14 retains ultimate responsibility for publishing the responses, as I described above.
For these and other reasons, I have resigned as Secretary and member of X3J11. I get all the glory I need as Convenor of WG14, thank you. And I seem to have more than enough work as well. That homework I mentioned earlier has occupied me for over a month, off and on, since the last meeting. The responsibility lies with me to prepare the normative addendum for SC22 balloting as a Committee Draft (CD). I also inherit several years of X3J11 interpretations as a huge batch of Defect Reports to log and organize.
My aim in writing this report is not to win your sympathy. (But I'll take any I can get, by the way.) Rather, it's to spell out the current formal activities in the C standard arena. Note that this report does not cover the work of X3J11.1 (a.k.a. the Numerical C Extension Group, or NCEG). That's because the charter of that subcommittee is to produce only a Technical Report (TR). In X3 land, a TR does not have the force of a standard. It is simply advisory. You'll hear more about X3J11.1 in future installments of this column.
I need to cover a lot of administrivia first, so please bear with me. I promise to give you a few technical details of what's happening to Standard C before I'm done.

The Normative Addendum
The normative addendum has, until recently, consisted of three contributions, each put forth by a separate member body:

The UK contribution endeavors to clarify several dozen areas where people found the C Standard unclear. Some issues arose from queries within the UK. Others arose from early Requests for Interpretation (RFIs) submitted to X3J11. All took the form of examples to be added to the C Standard. (Examples are part of the C Standard, but don't affect the definition of the C language. Hence, they are a good vehicle for adding clarification without running the risk of inadvertently changing the language.)

The Danish contribution adds macros and several alternate ways to spell some of the punctuators and operators in C. The idea is to provide a way to write C source code more readably in character sets that commandeer things like braces and the tilde character for other graphics. The C Standard includes trigraphs for this purpose, but nobody pretends that using them makes for readable code. Even if you can't replace all trigraphs, proponents argue, any improvement in readability is worth supporting.

The Japanese contribution adds extensive support for manipulating large character sets in C. The C Standard provides only the bare minimum of the functionality you need to play with Kanji or other large character sets. The Japanese delegation has developed a much more ambitious extension to C for this purpose.
The biggest change we agreed to last December was to delete the UK contribution from the normative addendum. Don't think we considered it unimportant — quite the contrary. Rather, we observed that the new machinery for handling Defect Reports offered more apropos vehicles for publishing the work of the UK delegation. So we threw this piece over the wall, as it were.
The other two pieces got final approval at the meeting. Both, however, suffered from a serious shortcoming. They needed to be translated into better "standardese." The Danish contribution evolved as a one- or two-page statement of intent. The Japanese contribution was remarkably refined, given the difficulty that English presents to the Japanese. But still there were places where the wording was a bit rough, or where more formal jargon was called for.
Lucky for me, Dave Prosser took it upon himself to correct these problems. As the final Redactor (editor) of the C Standard, Dave speaks standardese like an ISO bureaucrat. He also understands C better than practically anybody else I know. By the time he completed a pass over the normative addendum, I had little left to do except carp at details, then make a stack of review copies.
As of this writing, a review committee is checking our work. Once we get everyone's approval, I will submit the document to SC22 for CD balloting. By the time you read this, the balloting should be under way. My goal is to have the balloting period close shortly after the next X3J11 meeting (New York City in May), and before the next WG14 meeting (London in July). That's all part of a little game of brinksmanship that we Convenors play all the time.

Defect Reports
Meanwhile, back at the ranch, I have this great stack of interpretations from X3J11. Four dozen Requests for Interpretation have percolated through ANSI official channels since the C Standard was approved in 1989. Over the years, X3J11 has patiently addressed and debated every one. The result has been two Technical Information Bulletins (TIBs) summarizing the RFIs and committee responses.
I described some of the earliest RFIs and responses in these pages way back when. (See "Standard C: A Matter of Interpretation," CUJ June 1990, and "Standard C: Interpreting the Nasties," CUJ July 1990.) Other people have also discussed some of the interpretations here and in other publications. Sadly, however, the TIBs have yet to be officially published by ANSI.
Now it looks like they never will be. An administrative foul-up or two delayed the publication of TIB #1. Then ANSI switched over to the ISO C Standard and the situation changed. No longer was ANSI obliged to interpret the C Standard, since it was now an ISO document. Worse, it wasn't clear whether ANSI was even permitted to issue interpretations, under the agreement with ISO. TIB #2 sailed straight into the same swamp. Now both are mired in bureaucratic uncertainty.
We didn't want all those probing questions lost from public view. And we certainly didn't want to waste the carefully crafted responses. So I accepted the obligation to treat each of the ANSI RFIs as a separate Defect Report. I've ensured that Defect Reports #001 through #048 correspond to ANSI RFIs #01 through #48. (And I've already been handed Defect Report #049 through a separate channel, even before the dust has settled on the changeover.)
I've built this 100-page (typeset) Defect Report Log. It contains the original ANSI RFIs, each accompanied by a "suggested response" — the response crafted by X3J11 for publication in a TIB. And remember all those examples from the UK contribution of the normative addendum? Well, I dealt them out as appropriate among the RFIs. Each example is labeled as a "suggested correction" to the C Standard.
That's not the end of it, of course. X3J11 developed most of its responses under a severe constraint. We were originally told that we could not change a single word of the C Standard. Even if a slight change of wording, or an added sentence, could clarify our intent without changing the language definition, we couldn't make the change. Thus, we put a lot of energy into rationalizing that you could read the C Standard the way we intended. That's not the best way to respond to a serious complaint from a confused questioner.
Now WG14 has machinery for making such clarifications, as Technical Corrigenda. The sentiment among many members of both WG14 and X3J11 is that we should not waste this opportunity. We could simply publish the two ANSI TIBs as an ISO Record of Response. That would get the interpretations out to the public (at last) fairly quickly. But it would leave us in the position of rationalizing bad standards language instead of fixing it.
So my task instead is to circulate this Defect Report Log among the membership of both committees. I hope that X3J11 can give us prompt guidance about the best way to respond to each Defect Report. Either we accept the explanation from the ANSI TIB, we include the example from the UK contribution, or we develop amended wording to clarify the C Standard. (I like to think that only one of these three options will suffice in each case.)
I hope for prompt guidance because this process has already dragged on for too long. The sooner we can clarify the gray areas of Standard C for the world at large, the happier I'll be.

The Danish Contribution
Now for a few technical details. The Danish contribution requires that all implementations of Standard C add a header called <iso646.h>. (The name honors the ISO standard which corresponds to ASCII, except that it permits certain graphics to be substituted for those we Americans know and love.) Listing 1 shows the contents of this header.
Note that you can use this file as is with any variant of ISO 646. It just prints funny on some national variants of that character set. The idea, in fact, is to confine most of the funny printing to just this header (which you should seldom feel moved to print). You can then write:
   if (x != 0 || x != XMAX)
      .....
as
   if (x ne 0 or x ne XMAX)
      .....
and the code should be readable with any national variant. If that is not important to you, don't include the new header. Then none of the new macros conflict with any names you choose.
Besides this header, all implementations of Standard C must also recognize alternate spellings for six tokens:
<: :> <% %> %: %:%:
/* are the same as */
[ ] { } # ##
/* respectively */
Because they are just alternate ways to spell the same token, you can balance <: with ], if you want to be perverse. And if you "stringize" one of these alternate forms, you get a different result than when using the older token (or its trigraph form). Thus:
#define STR(X) #X
printf(STR( <: ) STR( [ ) STR( ??( ));

prints <:[[.
Before you start writing letters, let me make a few observations:

Not all of these alternate forms are needed to solve problems with ISO 646, despite the name. Some help with EBCDIC as well.

None of these alternate forms help much with using certain changeable characters inside character constants and string literals. You still need to use trigraphs sometimes.

bitand is hardly a great name to use for & as the address-of operator.

You can improve on these names in all sorts of ways. In fact, many people have. In further fact, suggesting alternative lists has been a popular indoor sport at C standards meetings for several years now.
The point is that this addition is not perfect. After years of trying, though, I'm pretty convinced that perfection is unattainable here. This particular approach is good enough. It can also be argued that the problem this addition solves is small and rapidly getting smaller. Others remain convinced, however, that it is still a problem worth solving. I believe the clutter is small enough that the rest of us should be tolerant.

The Japanese Contribution
By far the largest part of the normative addendum is the Japanese component. I count one new macro, three new type definitions, and 60 (!) new functions. The basic idea is to provide a complete set of parallels between functions that act on one-byte characters and functions that act on the newer wide characters. It's too bad we have to invent a whole set of variant names for these new functions. (That's one of the ways that C++ has improved code hygiene over C.) But I believe the time is ripe to introduce better wide-character support.
Windows NT traffics consistently in (16-bit) wide characters. It exemplifies the new trend toward supporting large and varied character sets. Multiple sets of 256 characters just don't cut it for systems and applications with an international market.
I plan to devote next month's column to a detailed look at the Japanese contribution. It's too big to do justice to in the space remaining here. For now, I'll simply summarize what it contains:

A set of functions analogous to those in <ctype.h> lets you classify wide characters much the way you do conventional ones. You can also define your own categories of characters and test for them.

A set of functions analogous to those in <string.h> lets you manipulate wide-character strings much the way you do conventional ones.

A set of functions analogous to those in <stdlib.h> lets you convert numeric wide-character strings much the way you do conventional ones.

Additions to the wide-character conversion functions in <stdlib.h> give you much tighter control over the conversion process.

A function analogous to strftime in <time.h> lets you encode time information as a wide-character string much the way you do conventional ones.

Additional conversion specifiers for the existing print and scan functions let you convert between occasional multibyte sequences in files and wide characters internal to the program.

A set of functions analogous to those in <stdio.h> lets you manipulate "wide-character streams". These are files of multibyte characters that appear internally as sequences of wide characters.
Some of this stuff sounds redundant, and it is. Still, there are good reasons for each of the additions. I'll do my best to convince you of that next month.