Columns


Standard C

With Gun And Reel

P. J. Plauger


P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest book is Standard C, which he co-authored with Jim Brodie.

Last month, I discussed the ground rules by which the Standard C library must operate. I reviewed name space issues from the points of view of both users and implementors. I did much the same for the fifteen standard headers. I also showed ways to implement some of the trickier aspects of the Standard C library.

Part of my motivation is to quell criticism of the ANSI C standard. People new to the party often shoot from the hip. They take one look, pass judgement, and start calling people bad names. It is very easy to find things to criticize in anything as complex as a programming language standard. If you haven't been privy to all the micro decisions along the way, it is even easier to conclude that mistakes were made. If your favorite feature is omitted or strangely altered, it is too easy to assume stupidity or malice.

As an active member of committee X3J11, I know that plenty of thought went into the formation of the C standard. The library, in particular, was the subject of much refinement and compromise. Many of us had to give up our pet notions and allow ourselves to be educated. I firmly believe that the result is quite good. If another procedural language has a better specified library, I don't know what it is.

Another part of my motivation is to demonstrate that the Standard C library is capable of reasonable implementation. It is often easier to show a way to do something than to argue at length about whether it can be done. If the demonstration can be made readable and at all elegant, so much the better.

With this column, I am taking a new departure. My plan is to continue reviewing the Standard C library in further detail. I will talk about both how the library can be used and how it can be implemented.

As often as possible, I will present credible examples and real code. The code will be highly portable Standard C as much as possible. (I assume that an implementation will tolerate "replacement" C code for standard library functions.) I can't promise that the code will be bug free, but I'll certainly try to keep it that way.

I also can't predict at this point how long this process will continue. It should make a good change of pace from essays on the C language proper or on how the standard is flawed. Think of it as a travelogue — With Gun and Reel Down the Standard C Library. I'll do my best to keep the slide show interesting.

I begin with the standard header <assert.h>. It is not the first one addressed in the C standard, but it is alphabetically the first on the list. I see no compelling reason to address topics in any other order.

The Header <assert.h>

Here's what the C standard says about <assert.h>. I quote verbatim:

4.2 Diagnostics <assert.h>

The header <assert.h> defines the assert macro and refers to another macro,

NDEBUG
which is not defined by <assert.h>. If NDEBUG is defined as a macro name at the point in the source file where <assert.h> is included, the assert macro is defined simply as

#define assert(ignore) ((void)0)
The assert macro shall be implemented as a macro, not as an actual function. If the macro definition is suppressed in order to access an actual function, the behavior is undefined.

4.2.1 Program Diagnostics

4.2.1.1 The assert Macro

Synopsis

#include <assert.h>
void assert(int expression);

Description

The assert macro puts diagnostics into programs. When it is executed, if expression is false (that is, compares equal to 0), the assert macro writes information about the particular call that failed (including the text of the argument, the name of the source file, and the source line number — the latter are respectively the values of the preprocessing macros __FILE__ and __LINE__) on the standard error file in an implementation-defined format.

[Footnote 97: The message written might be of the form

Assertion failed: expression, file xyz, line nnn]

It then calls the abort function.

Returns

The assert macro returns no value.

Forward references: the abort function (§4.10.4.1).

Using Assertions

The sole purpose of this header is to provide a definition of the macro assert. You use the macro to enforce assertions at critical places within your program. Should an assertion prove to be untrue, you want the program to write a suitably revealing message to the standard error stream and terminate execution abnormally. Thus, you might write:

#include <assert.h>

.....

assert(0<=idx &&
     idx < sizeof a/sizeof a[0]);
   /*  a[idx] is now safe */
Any code you write following the assertion can be simpler. It need not check whether the index idx is in range. The assertion sees to that. And should this "impossible" situation arise while you are debugging the program, you get a handy diagnostic. The program does not stumble on to generate spurious problems at a later time.
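
As a slightly fuller illustration, here is a minimal sketch of the same idea wrapped in two small functions. The names table, TABLE_SIZE, table_put, and table_get are hypothetical, invented for this example. The cast to size_t is an addition that keeps the comparison with sizeof free of signed/unsigned surprises once the first test has ruled out negative values.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8            /* hypothetical size, for illustration only */

static int table[TABLE_SIZE];

/* Store value at table[idx]. The assertion enforces the precondition,
   so the body needs no range check of its own. */
void table_put(int idx, int value)
{
    assert(0 <= idx && (size_t)idx < sizeof table / sizeof table[0]);
    table[idx] = value;         /* table[idx] is now safe */
}

/* Fetch the value at table[idx], under the same precondition. */
int table_get(int idx)
{
    assert(0 <= idx && (size_t)idx < sizeof table / sizeof table[0]);
    return table[idx];
}
```

A call such as table_put(3, 42) passes the assertion silently. A call such as table_put(8, 0) writes a diagnostic to the standard error stream and aborts, provided assertions are active.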

Please note that this is hardly the best way to write production code. It is generally ill-advised for a program in the field to terminate abnormally. No matter how revealing the accompanying message may be to you the programmer, it is assuredly cryptic to the user. Some form of error recovery and continuation is almost always preferred. Any diagnostics should be in terms that the user can understand.

What you want is some way to introduce assertions during debugging. That lets you catch the worst logic errors and document the assertions you need early on. Later, you might add code to recover from errors that truly can occur during execution. You want to leave the assertions in as documentation, but you want them to generate no code.

<assert.h> gives you just this behavior. You can define the macro NDEBUG to alter the way assert expands. If NDEBUG is not defined at the point where you include <assert.h>, the header defines the active form of the macro assert. It expands to an expression that tests the assertion and prints an error message if the assertion is false.

If NDEBUG is defined, however, the header defines the passive form of the macro. It expands to a placeholder expression that does nothing. In either case, assert behaves essentially like a function that takes a single int argument and returns a void result.
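
One consequence of the passive form deserves a small demonstration. Because the passive macro discards its argument, any side effect written inside an assertion disappears along with the test. The sketch below assumes two hypothetical helpers, bump and count_evaluations_when_disabled, invented only to make the effect visible.

```c
#undef NDEBUG               /* avoid clashing with any earlier definition */
#define NDEBUG              /* select the passive form */
#include <assert.h>

static int calls = 0;       /* counts how often bump() actually runs */

/* Hypothetical helper with a visible side effect. */
int bump(void)
{
    return ++calls;
}

/* Returns the number of times bump() ran. With NDEBUG defined, the
   argument of assert is never evaluated, so the result is 0. */
int count_evaluations_when_disabled(void)
{
    calls = 0;
    assert(bump() > 0);     /* expands to ((void)0): bump() is not called */
    return calls;
}

#undef NDEBUG               /* restore the active form for later code */
#include <assert.h>
```

The practical lesson is to keep the argument of assert free of side effects that the rest of the program depends on.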

How you control the macro expansion is a matter of taste. One style of programming is to change the source code. Once you believe that assertions should be disabled, just add a line before you include the header:

#define NDEBUG  /* disable assertions */
#include <assert.h>
That neatly documents that assertions are henceforth inoperative. The only drawback comes when you have to turn debugging back on again. (I can assure you that eventually you will.) You must edit the source file to remove the macro definition.

Many implementations support a somewhat more flexible approach. They let you define one or more macros outside any C source files. You specify these definitions in a command script or make file that rebuilds the program. A make file can be a better place to document that assertions are to be disabled, and can also be an easier file to replicate and alter when you must revert to more primitive debugging phases. <assert.h> is designed with such a capability in mind, though nothing in the C standard requires it.

This header has an additional peculiarity. All other headers are idempotent. Including any of them two or more times has the same effect as including the header just once. In the case of <assert.h>, however, its behavior can vary each time you include it. The header alters the definition of assert to agree with the current definition status of NDEBUG.

The net effect is that you can control assertions in different ways throughout a source file. Performance may suffer dramatically, for example, when assertions occur inside frequently executed loops. Or an earlier assertion may terminate execution before you get to the revealing parts. In either case, you may need to turn assertions on and off at various places throughout a source file.

So to turn assertions on, you write

#undef NDEBUG
#include <assert.h>
And to turn assertions off, you write

#define NDEBUG
#include <assert.h>
Note that you can safely define the macro even if it is already defined. So long as you always write the same (empty) definition, the code will produce no diagnostics. This license is called "benign redefinition."
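
The following sketch shows one way this switching might look within a single source file. The function names sum_checked and sum_fast are hypothetical. The first function keeps its assertion active because it executes once per call. The second disables the assertion inside its loop, where the test would run on every iteration.

```c
#include <stddef.h>

#undef NDEBUG
#include <assert.h>         /* assertions on */

/* The assertion here runs once per call, so its cost is negligible. */
long sum_checked(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; ++i)
        total += v[i];
    return total;
}

#define NDEBUG
#include <assert.h>         /* assertions off for the frequently executed loop */

long sum_fast(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        assert(v[i] >= 0);  /* documents intent, generates no code here */
        total += v[i];
    }
    return total;
}

#undef NDEBUG
#include <assert.h>         /* assertions back on for the rest of the file */
```

Note that sum_fast accepts a negative element without complaint, since its assertion is passive. Only the definition status of NDEBUG at each inclusion of the header matters.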

Implementing <assert.h>

This header requires very little code, but it must be carefully crafted. To respond properly to NDEBUG, the header must have the general structure:

#undef assert  /*  remove any
          existing definition */
#ifdef NDEBUG
#define assert(test) ((void)0)
          /*  passive version */
#else
#define assert(test) <active version>
#endif
The initial #undef is innocuous if no macro definition of assert currently exists. It is very necessary, however, if the definition is to change. The passive version of the macro is closely spelled out by the C standard. Note that the name of the dummy argument is unimportant. (No valid program you can write can tell the difference if the name varies.)

All that remains is to write the active version of the macro. An obvious but naive way to provide the needed functionality is to write the active version as:

#define assert(test) if (!(test)) \
   fprintf(stderr, \
"Assertion failed: %s, file %s, line %i\n", \
   #test, __FILE__, __LINE__)  /* UNACCEPTABLE */
This form is unacceptable for a variety of reasons:

Let's assume that we will add a function named _Assert to the library. (A name of this form is reserved to implementors. The program may not contain a macro with such a name. Even if the implementation supports external names with only a single case, the name is still reserved to the implementor. You'll see a lot of names of this form as part of the library code.) The first design decision is whether to test the assertion within the function or inline. Each approach has its merits.

Say the first argument to the function captures the result of the test:

void _Assert (int test, <other parameters>);
#define assert(test) _Assert(test, <other arguments>)
The function returns immediately if its first argument is non-zero. Otherwise, it uses the other arguments to compose a suitable error message.

Alternatively, the test can be performed inline:

void _Assert (<parameters>);
#define assert(test) ((test) ? (void)0 : _Assert(<arguments>))
The first form better enforces the requirement that the test has the proper type. The second tolerates tests with pointer and floating types as well. (You may enjoy that license until the day you have to move your large program to a more restrictive implementation.)

It is hard to say in general which form generates more compact code. That depends strongly on the implementation. The first form, however, results in a function call on every execution. A rather ambitious global optimizer might eliminate some calls, but don't count on it. The second calls the function at most once, when the program is about to terminate.
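
The difference in call counts can be made concrete with a small sketch. Everything named here is hypothetical: the macros ASSERT_CALL and ASSERT_INLINE, the functions check_in_function and report_failure, and the two counters exist only for illustration. User-level names stand in for the reserved name _Assert, which a program may not define. The inline form is written with the conditional operator, because a function that returns void cannot serve as an operand of the || operator.

```c
#include <stdio.h>
#include <stdlib.h>

int calls_form1 = 0;        /* times the first form's function was entered */
int calls_form2 = 0;        /* times the second form's function was entered */

/* First form: the function receives the test result and decides. */
void check_in_function(int test, const char *msg)
{
    ++calls_form1;
    if (test)
        return;
    fputs(msg, stderr);
    abort();
}

/* Second form: the function is entered only after the test has failed. */
void report_failure(const char *msg)
{
    ++calls_form2;
    fputs(msg, stderr);
    abort();
}

#define ASSERT_CALL(test)   check_in_function((test), #test "\n")
#define ASSERT_INLINE(test) ((test) ? (void)0 : report_failure(#test "\n"))
```

After one successful use of each macro, calls_form1 is 1 and calls_form2 is still 0. The first form pays for a function call on every execution, while the second pays only when the program is about to terminate.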

I favor the second form, with the inline test, despite the weaker checking. (The checking is no worse than what the passive version of the macro supplies.) Many current optimizers can check a broad range of assertions at translation time. They can often eliminate all of the code for an obviously true assertion.

The remaining issue is how best to encode the diagnostic message. Here, the string creation operator #x really pays off. The trick is to form all the information into string literals at translation time. Then string literal concatenation merges the pieces to produce a single argument.

One nuisance is that the built-in macro __LINE__ is not a string literal, but a decimal constant. To convert it to the proper form requires an additional layer of processing. That is performed by adding to the header a secret macro _STR.
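
The extra layer is needed because the operand of # is not macro expanded before it is turned into a string. A minimal sketch of the two-level technique follows. The names STR_DIRECT, STR_EXPAND, line_is_unexpanded, and where are hypothetical stand-ins, chosen because a program may not define a reserved name such as _STR.

```c
#include <string.h>

#define STR_DIRECT(x)  #x               /* stringizes the argument as written */
#define STR_EXPAND(x)  STR_DIRECT(x)    /* expands the argument, then stringizes */

/* Returns nonzero: applying # directly yields the text "__LINE__",
   not the line number. */
int line_is_unexpanded(void)
{
    return strcmp(STR_DIRECT(__LINE__), "__LINE__") == 0;
}

/* Returns "file:line" for this source line, merged into a single
   string literal at translation time. */
const char *where(void)
{
    return __FILE__ ":" STR_EXPAND(__LINE__);
}
```

The outer macro exists only to force one round of expansion, so that __LINE__ has already become a decimal constant by the time the inner macro applies the # operator.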

Listing 1 contains the final version of the standard header <assert.h>.

The definition of assert composes the diagnostic information into a single string of the form:

xyz:nnn expression
(to use the notation of the C standard). It is a bit more compact than the canonical form with the words "file" and "line" in it. A smart version of the function _Assert can parse the diagnostic message and supply the missing bits if it chooses. The version shown in Listing 2 does not, since the precise format of the message is implementation-defined.

Calling other library functions from within this one causes no problems. Any library function may call any other. (Injecting the name of a library function into a program via a standard header is another matter.) Because the translator composes the diagnostic message, the simpler library function fputs suffices. No need to invoke the full power of output formatting.
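
To show how these pieces might fit together, here is a minimal sketch written with user-level names. It is not a reproduction of Listing 1 or Listing 2. The names MY_STR, MY_ASSERT_MSG, MY_ASSERT, and my_assert_fail are hypothetical, and the wording of the message is an assumption, since the format is left to the implementation. A real header would also have to avoid including <stdio.h> and would declare its support function itself.

```c
#include <stdio.h>
#include <stdlib.h>

#define MY_STR_(x)  #x
#define MY_STR(x)   MY_STR_(x)

/* Builds "file:line expression" as one string literal at translation time. */
#define MY_ASSERT_MSG(test)  __FILE__ ":" MY_STR(__LINE__) " " #test

/* Support function: write the prepared message, then terminate abnormally.
   Plain fputs suffices because no run-time formatting is needed. */
void my_assert_fail(const char *msg)
{
    fputs("Assertion failed: ", stderr);
    fputs(msg, stderr);
    fputs("\n", stderr);
    abort();
}

/* Active form: a void expression that calls the function only on failure. */
#define MY_ASSERT(test) \
    ((test) ? (void)0 : my_assert_fail(MY_ASSERT_MSG(test)))
```

A use such as MY_ASSERT(idx < 10) costs one test and one conditional branch when the assertion holds. The entire diagnostic string is a constant that the translator prepared in advance.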

As you can see, neither the header <assert.h> itself nor the support function _Assert involves much code. Opportunities abound, nevertheless, for going astray. Past implementations of this facility have committed essentially every one of the sins I outlined here. The C standard has gone a long way toward making the use of assert more uniform. A careful implementation is needed to finish the job.