August 1990/Standard C

Standard C

Library Ground Rules

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee. His latest book is Standard C, which he co-authored with Jim Brodie.

History
X3J11 began its deliberations in 1983 amid many uncertainties. One of the largest areas of uncertainty was the library. Kernighan and Ritchie, that venerable de facto standard for the C programming language, mentioned library functions only in passing. The language definition in Appendix A said nothing about the library. Nor was there an "Appendix B" to fill in the blanks. What was said in the running text was heavily influenced by the UNIX programming environment. After all, that was where C was born and that was where Kernighan did all his work.
A continuing tension in the early years of X3J11 was this gap of perception between UNIX and non-UNIX communities. The former felt proprietary about C. It was rather as if any other implementation of C were somehow contrived and substandard. The latter community, on the other hand, felt responsible for the commercial success of C. It was all those IBM PC and Motorola 68000 programmers who were making C an important force in the world.
The differences were felt most in the libraries that grew up around each implementation of C. People writing in C under UNIX wanted to keep C as close as possible to its roots. They did not want to lose the clean interface they had come to love. People writing for other specific operating systems wanted to access their special capabilities. They did not want to make their systems slavishly match the idiosyncrasies of UNIX. A few of us were trying to keep C highly portable across many environments. We did not want to sacrifice the power of C to keep it portable.
An earlier decision of mine did not help matters. When I wrote the library for the Whitesmiths C compiler, there was no clear standard. Most utilities under UNIX were written using the original PDP-11 C library. A few daring souls were fiddling with "streams" and other niceties added when C migrated off the PDP-11. Neither library was as complete, consistent, or compelling as one could wish for a major language.
So I swallowed hard and developed yet a third set of functions. The Whitesmiths C library had no printf or scanf. Instead it had putfmt and getfmt. Format codes were more complete and more consistent. So were the names of I/O and string functions. There were added functions for parsing arguments on command lines and for walking lists of filename arguments. Many people agreed that it was a nice job of re-engineering.
Had I chosen to allow unrestricted use of the Whitesmiths C library, it might have been more widely adopted. As it was, it had a constituency just large enough to be perceived as a threat. I found out years later that AT&T was nervous in the early days of X3J11. They were afraid that I would push for my library over the one in UNIX. As it turned out, I did put forth a number of features that had been proved in that library. Some were even adopted. But I could see that even in 1983 many people thought that printf was practically a keyword in C.
Adding to the excitement was my decision to volunteer as chair of the library subcommittee. To some, this was tantamount to putting the fox in charge of the henhouse. My motives were more noble than your typical fox, but I understand the apprehension. In my enthusiasm for technically inventive solutions, I did not always behave with the disinterest of a good subcommittee chair.
Nevertheless, I believe that the library portion of the C standard turned out pretty good. I claim only a small share of the credit for that. Many people labored long and hard developing that portion and cleaning it up. The folks at /usr/group get high marks for getting the PDP-11'isms out of the UNIX library descriptions. Much of our work consisted only of getting the UNIX'isms out of their product.

Misperceptions
The ANSI C standard has a lot to say about how the library looks to the user. It is no longer sufficient just to provide printf, scanf, the usual math functions, and a passel of string and character manipulation functions. Many more functions are required. And gone are the days when each site could toss in a few dozen implementation-specific functions. Many more constraints exist on what names must not be visible.
As chair of the library subcommittee of X3J11, I fought hard for many of these requirements. My experience implementing C on numerous and varied operating systems taught me that most of the requirements were important. If Standard C was to be both powerful and portable, many of the variations present in C in the early 80's would have to be eliminated. Too many critical variations resided in the C library.
I find it mildly annoying that some of these requirements are widely misunderstood. The Standard C library, for example, is required to have a fairly clean name space. The library defines a couple hundred external names. Beyond that, certain classes of names are reserved for use by the implementors. All other names belong to the users of the language.
Most implementations have to change to satisfy this requirement. For example, UNIX has low-level I/O functions with names such as open, close, read, write, and lseek. These functions are not part of the Standard C library. UNIX traditionally implements the stream functions in terms of calls to these low-level I/O functions. That is no longer permissible under Standard C. A conforming C program must be able to define a function (or data object) called open with no fear that it will interfere with the correct operation of fopen.
I have heard cries that this requirement "breaks" the UNIX implementation of C. It does not. It does require that fopen and its brethren call a different set of low-level I/O functions. An implementor must make a copy of the code for open and rename it _open. fopen must then call the new function. The implementor must also do the same for all the other low-level I/O functions used by the Standard C functions. The problem is solved.
Some people mistakenly assume that open must be banished from the library. It need not be. A program that refers to open and provides no definition will load the library function, just like in the good old days. A program that defines its own version of open will have no occasion to pull that function off the library. If no part of the Standard C library expects that particular function, no harm is done. You can always safely "knock out" isolated functions from a library.
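Seen from the programmer's side, the requirement can be sketched as follows. This is only an illustration, and it assumes a conforming implementation. The function open below belongs to the program, and its signature, the counter doors_opened, and the helper stream_round_trip are all names I made up for the sketch.

```c
#include <stdio.h>

static int doors_opened = 0;

/* The program's own open, with external linkage. It has nothing to do
   with the UNIX low-level I/O call; name and signature are illustrative. */
int open(const char *door)
{
    if (door == NULL)
        return -1;
    return ++doors_opened;
}

/* Write a line to a temporary stream and read it back.
   Returns 1 on a match, 0 on a mismatch, and -1 if no temporary
   file can be created in this environment. */
int stream_round_trip(void)
{
    char line[16];
    const char *expected = "hello\n";
    int i;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fputs(expected, fp);
    rewind(fp);
    if (fgets(line, (int)sizeof line, fp) == NULL) {
        fclose(fp);
        return 0;
    }
    fclose(fp);
    for (i = 0; expected[i] != '\0'; ++i)
        if (line[i] != expected[i])
            return 0;
    return line[i] == '\0';
}
```

If the stream functions quietly depended on the public name open, the round trip would end up calling the program's function and the counter would move. Under the rule described above, the counter should change only when the program itself calls open.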
Still others complain that the clean name space requirement is a new and onerous burden on implementors. It is not. There have always been de facto requirements on what functions should be present in a C library. Otherwise, people who try to move serious applications written in C say bad things about implementations that have missing bits. Those same people continually bark their shins on furniture that is present in the library that they don't expect. I have repeatedly heard the same complaint from this important constituency. Name space "pollution" has been the single largest source of unexpected problems in writing large portable C applications.
The C standard has merely shone a harsh light on several existing problems. And it has institutionalized solutions that were available only spottily in the past. In this regard, Standard C says nothing really new. It has simply codified the best of existing practice.
Many people, however, have formed a strong emotional attachment to their own personal image of C. Where Standard C appears to distort that image, these people react emotionally. Fine points get lost among strong feelings. That's why it is important to keep clarifying the misunderstandings that crop up. It's not sufficient that some of us believe the C standard to be a good one. We must show the ardent fans of C that their language has not been damaged beyond repair.
One way to show that the C library is not impossible to implement is to show some ways to implement it. That's what this column is about. I don't expect to quell all criticism of the decisions we made in X3J11, but I do hope to pass on some useful advice to implementors and users alike.

Name Space Issues
I have already touched on the major issues concerning names in the library. For completeness, however, I will spell out the requirements of Standard C in this area.
First, the library defines a long list of names. The language proper defines a few more. With rare exception, the programmer had better not use any of these names except for its predefined purpose. The programmer can, for example, define a macro with the same name as a keyword. (Just don't do it before you've included any standard headers your program needs.) The programmer can define a name with internal linkage or no linkage that matches a name defined with external linkage in the library. While both practices can cause maintenance problems (for the programmer), the implementor must still support them.
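Here is a small sketch of the kind of recycling an implementation has to tolerate. It reuses two library spellings, raise and clock, for purposes of its own. The function cube_plus_offset is a hypothetical name, and I have deliberately left out <signal.h>, on the assumption that a file-scope reuse is safest when the associated header is not included.

```c
/* <signal.h> is deliberately not included, so a file-scope name with
   internal linkage may reuse the spelling of the library function raise. */
static long raise(long base, unsigned int power)
{
    long result = 1;

    while (power-- > 0)
        result *= base;
    return result;
}

long cube_plus_offset(long x)
{
    long clock = 7;   /* no linkage: a block-scope object spelled like
                         a library function */

    return raise(x, 3) + clock;
}
```

Neither name touches the library's external definitions. It is still the sort of code that earns the maintenance warning above.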
The implementor's first job, naturally, is to provide all those definitions. His or her second job is to define each name in its proper name space. You can't cut corners here or you will run afoul of some programmer pushing the edges of the envelope.
Figure 1 shows the name spaces that exist in a C program. It is taken from P.J. Plauger and Jim Brodie, Standard C, Microsoft Press (1989). The figure shows that you can define an open-ended set of name spaces:
Two new name spaces are created for each block (enclosed in braces within a function). One contains all names declared as type definitions, functions, data objects, and enumeration constants. The other contains all structure, union, and enumeration tags.
A new name space is created for each structure or union you define. It contains the names of all the members.
A new name space is created for each function you define. It contains the names of all the labels.
You can use a name only one way within a given name space. If the translator recognizes a name as belonging to a given name space, it may fail to see another use of the name in a different name space. In the figure, a name space box masks any name space box to its right. Thus, a macro can mask a keyword. And either of these can mask any other use of a name. (That makes it impossible for you to define a data object whose name is while, for example.)
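The separate name spaces can be seen in a short sketch that spells one name, item, four different ways. The function count_down and its behavior are invented purely for illustration.

```c
/* One spelling, four name spaces. */
struct item {           /* tag name space */
    int item;           /* member name space of struct item */
};

/* Count how many decrements it takes to bring start down to zero. */
int count_down(int start)
{
    struct item item;   /* ordinary identifier, declared in this block */
    int steps = 0;

    item.item = start;
item:                   /* label name space of this function */
    if (item.item > 0) {
        --item.item;
        ++steps;
        goto item;
    }
    return steps;
}
```

Each use is resolved by its syntactic context, so none of the four collides with the others. A macro named item defined ahead of this code would mask all of them.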

Name Space Caveats
The Standard C language proper defines macros and keywords. The Standard C library defines macros, functions, type definitions, structure tags, and member names. Any function name can potentially be masked by a macro, if you include the standard header that declares the function. All function names have external linkage. Some macros can also mask names of library entities that have external linkage. (Two examples of these odd creatures are setjmp and errno.)
As an implementor, you must put each of the predefined names in its proper name space. If you don't, you will surprise the more daring programmers who recycle these names. Let's say, for example, that size_t has type unsigned int on your implementation. If the programmer includes any of five different standard headers, size_t should be defined thereafter in the program. You might be tempted to write

#define size_t unsigned int /* DANGEROUS */
Mostly, that would work fine. Almost any redefinition or redeclaration of size_t will break, however. A macro will be branded as an improper redefinition. A declaration will be rewritten with the type names unsigned int where the translator expects a name. Bad news.
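A sketch of one such daring redeclaration follows. To keep it self-contained it uses a stand-in type name, demo_size_t, rather than the real size_t, and shadow_demo is likewise a hypothetical name.

```c
/* Stand-in for the header's definition, done properly as a typedef. */
typedef unsigned int demo_size_t;

unsigned int shadow_demo(void)
{
    demo_size_t n = 5;            /* ordinary use as a type name */

    {
        int demo_size_t = 2;      /* legal: the name is redeclared as an
                                     object within this inner block */
        n += (unsigned int)demo_size_t;
    }
    return n;
}

/* Had the header said
       #define demo_size_t unsigned int
   the inner declaration would be rewritten as
       int unsigned int = 2;
   which no translator will accept. */
```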
The only safe implementation is to place the same type definition in each of the five files. (You can include a common file instead, but the filename must not collide with filenames that you promise the programmer can #include.) The user must be able to include any combination of the five standard headers, in any order, with no fear that the type gets multiply defined. That leads to a construct, in each of the five standard headers, that looks something like

#ifndef __SIZE_T
#define __SIZE_T
typedef unsigned int size_t;
#endif
You as implementor must also resist two other temptations. You must not define size_t outside any of the five standard headers in which it belongs. And you must not have any of the standard headers include any of the others. In either case, the programmer has unexpected definitions inflicted on the program.
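The effect of the guard can be tried out in a single file by pasting the same fragment twice, as if two different headers had each contributed a copy. The names DEMO_SIZE_T_DEFINED, demo_size_t, and demo_length are stand-ins of my own, chosen so the sketch stays out of the implementor's reserved names.

```c
/* Fragment as it might appear in the first header. */
#ifndef DEMO_SIZE_T_DEFINED
#define DEMO_SIZE_T_DEFINED
typedef unsigned int demo_size_t;
#endif

/* The identical fragment from a second header. The guard macro is
   already defined, so this copy contributes nothing. */
#ifndef DEMO_SIZE_T_DEFINED
#define DEMO_SIZE_T_DEFINED
typedef unsigned int demo_size_t;
#endif

/* The type is usable no matter which copy supplied it. */
demo_size_t demo_length(const char *s)
{
    demo_size_t n = 0;

    while (s[n] != '\0')
        ++n;
    return n;
}
```

Without the guard, the second typedef would be a redefinition in the same scope, which the C standard does not permit.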
Finally, the implementor must choose any secret names with care. The standard reserves several sets of names for use by the implementor. A programmer who chooses to define names in any of these sets runs the risk of colliding with some secret name. Collision can occur even if the program includes no standard headers. The sets are:

for secret macro names, any name that begins with an underscore, followed by either an underscore or an upper case letter.

for secret names with external linkage, any name that begins with an underscore.
The second set is useful only for names confined purely to executable code in the library. Why? Say, for example, that your implementation computes sin(x) by the secret call _sinq(x, 0). You might be tempted to place at the end of the standard header <math.h> the macro

#define sin(x) _sinq(x, 0)
Nothing prevents the programmer from defining a macro named _sinq. And nothing can be harder to debug than sneaky little code rewrites such as this. Beware.
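The rewrite is easy to reproduce in miniature. The sketch below uses stand-ins throughout: demo_sin plays the part of the header's sin macro, secret_sinq plays the part of _sinq, and its crude polynomial is only a placeholder for a real sine computation.

```c
/* Stand-in for the library's secret worker. Only good for small x. */
static double secret_sinq(double x, int quadrant)
{
    double sine = x - x * x * x / 6.0;
    double cosine = 1.0 - x * x / 2.0;

    return (quadrant & 1) ? cosine : sine;
}

/* Stand-in for the macro tucked into the header. */
#define demo_sin(x) secret_sinq(x, 0)

double before_collision(void)
{
    return demo_sin(0.5);     /* calls the function, as intended */
}

/* The programmer, unaware of the secret name, defines a macro
   with the same spelling. */
#define secret_sinq(x, q) (42.0)

double after_collision(void)
{
    return demo_sin(0.5);     /* silently rewritten to (42.0) */
}
```

No diagnostic is required for the second use. Had the secret name come from the first set, the programmer would have had no business defining the colliding macro.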

Standard Header Caveats
An implementation must provide fifteen different standard headers. Any predefined names not defined in the language proper are defined in one or more of these standard headers. The headers have several properties:
They are mutually independent. No standard header requires that another standard header be first included for it to work properly. Nor may any standard header include another standard header.
They are idempotent. You can include the same standard header more than once. The effect is as if you included it exactly once.
They are equivalent to file level declarations. You must not include a standard header within a declaration. And you must not mask any keywords with macro definitions, as I mentioned earlier.
To maintain mutual independence, the implementor must occasionally make use of both redundancy and synonyms. I gave an example of redundancy earlier, for the size_t definition. Whether you replicate the code or include a common secret header is irrelevant. In either case, the effect is to inject the same code at multiple places within the translation unit.
In a few situations, the translator must provide a synonym for a named entity because the name might not be available. Here are two cases that sometimes confuse readers of the C standard.
Some people think that you can use the sizeof operator only if size_t is first defined in the program. Or worse, some people think that using the operator somehow causes the associated type definition to appear. Neither is true. The translator merely needs to know what existing integral type is the proper synonym for size_t. There is never a need for the name proper.
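A fragment that includes no headers at all makes the point. The names table, table_bytes, and table_elements are invented for the sketch, and the results are converted to unsigned long only so the functions can be declared without the name size_t.

```c
/* No header is included, so this fragment never declares size_t. */
static const short table[12] = {0};

unsigned long table_bytes(void)
{
    /* The translator knows the result type of sizeof on its own. */
    return (unsigned long)sizeof table;
}

unsigned long table_elements(void)
{
    return (unsigned long)(sizeof table / sizeof table[0]);
}
```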
A similar but different issue arises with the three print functions vfprintf, vprintf, and vsprintf. All three are included in the standard header <stdio.h>. All three have an argument of type va_list. But that type is not defined in that particular standard header. It is defined only in the standard header <stdarg.h>. How can this be?
The answer is simple, if a bit subtle. The standard header <stdio.h> must contain a synonym for the type va_list. The synonym has a secret name from one of the sets I showed earlier. That's all that's needed within the standard header to express the function prototype for each of the three functions.
Now, it's rather difficult for you as a programmer to use any of these functions without a definition for va_list. (It can be done, but it's probably not good style.) That means you probably want to include the standard header <stdarg.h> anytime you make use of any of these functions. Still, it's your problem. The implementation need not (and must not) drag in <stdarg.h> every time you include <stdio.h>.
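In practice the programmer's side looks something like the sketch below. The wrapper format_label is a hypothetical name, and the caller is assumed to supply a buffer large enough for the formatted result.

```c
#include <stdarg.h>   /* supplies va_list, va_start, and va_end to the caller */
#include <stdio.h>    /* declares vsprintf, using its own private synonym */

/* Format into buf in the manner of sprintf.
   Returns the number of characters written, not counting the null. */
int format_label(char *buf, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsprintf(buf, fmt, ap);
    va_end(ap);
    return n;
}
```

Both headers are included because the program needs both. Neither one is expected to supply the other's names.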
Idempotence is a little easier to manage. I showed you earlier how to avoid multiple definitions of size_t. You use a similar macro guard for most of the standard headers:

#ifndef __STDIO_H
#define __STDIO_H
..... /* body of <stdio.h> */
#endif
The one exception is the standard header <assert.h>. Its behavior is controlled by the macro name NDEBUG that you can choose to define. Each time you include this standard header, the assert macro is turned off or on, depending upon whether or not NDEBUG has a macro definition at that point in the translation unit. But that's another story.
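A short sketch shows the switch being thrown twice within one translation unit. The two function names are my own, and the side effect inside assert is there only to make the difference observable. It is not a practice to copy.

```c
#undef NDEBUG
#define NDEBUG
#include <assert.h>      /* assert(e) now expands to ((void)0) */

int evaluations_when_disabled(void)
{
    int hits = 0;

    assert(++hits > 0);  /* operand is discarded, so hits stays 0 */
    return hits;
}

#undef NDEBUG
#include <assert.h>      /* same header again: assert is live once more */

int evaluations_when_enabled(void)
{
    int hits = 0;

    assert(++hits > 0);  /* operand is evaluated, so hits becomes 1 */
    return hits;
}
```

A conventional include guard around <assert.h> would freeze the macro in whatever state the first inclusion produced, which is why this header cannot use one.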
The final property of standard headers is purely for the benefit of implementors. The programmer must include a standard header only where a file level declaration is permitted. That means the #include directive must not occur anywhere inside another declaration. Most standard headers must contain one or more external declarations. Without this caveat, the standard headers would be impossible to write as ordinary source text files.

Conclusion
Those are the principal ground rules for using and implementing the Standard C library. I could go on to list any number of additional details, but I will refrain from doing so here and now. I think I've hit the high spots.
As you can see, the C standard has a number of subtle implications for implementors of the Standard C library. Some severely constrain how you can write a conforming library. Some cause the standard headers to be less readable than in simpler times past. None, however, are insurmountable or lead to serious performance problems.