June 1990/Dr. C's Pointers

Dr. C's Pointers®

The exit And abort Functions

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at uunet!aussie!rex or aussie!rex@uunet.uu.net.

Introduction
The library functions abort and exit don't seem very exciting or complicated, and in fact, they aren't. However, they often appear in production programs, and using them incorrectly may compromise data on disk and in memory, so you should understand them. Consider the often-used example of memory allocation:

if ((ptr = malloc(size)) == NULL) {
    fprintf(stderr, "Can't allocate memory\n");
    exit(1);
}
Terminating a program when memory cannot be allocated may be a reasonable thing to do in some cases but not in all. Consider the case where you have processed some part of a set of related transactions. You try to allocate memory and this fails. Simply terminating the program would leave the database logically compromised. You must either complete the rest of the transaction or back out the part already done. Either way, you have cleanup to do.
Similarly, your program might be just one of several executing programs that share the same piece of memory so, again, you may need to logically clean up prior to terminating. You might even have to send a message to one or more programs telling them you are "going down" so they don't queue up new work for you.
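One way to avoid leaving a transaction half done is to back out and report the failure to the caller instead of calling exit on the spot. The following is only a minimal sketch of that idea. The names struct undo, struct txn, txn_alloc, txn_no_memory, txn_rollback, and txn_step are all hypothetical, and the allocator hook is an assumption added so the failure path can be exercised.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical undo record: one is kept for every step applied so far. */
struct undo {
    struct undo *next;
};

/* Hypothetical transaction: a stack of undo records plus a step count. */
struct txn {
    struct undo *undo_list;
    int steps_done;
};

/* Allocator hook (an assumption of this sketch); normally malloc. */
void *(*txn_alloc)(size_t) = malloc;

/* Stand-in allocator that always reports "out of memory". */
void *txn_no_memory(size_t size)
{
    (void)size;
    return NULL;
}

/* Back out every step applied so far, newest first. */
void txn_rollback(struct txn *t)
{
    while (t->undo_list != NULL) {
        struct undo *u = t->undo_list;

        t->undo_list = u->next;
        /* ... reverse the step described by u here ... */
        free(u);
        --t->steps_done;
    }
}

/*
 * Apply one step. Returns 0 on success. If memory cannot be
 * allocated, the whole transaction is backed out and -1 is returned,
 * leaving the caller to decide whether to terminate.
 */
int txn_step(struct txn *t)
{
    struct undo *u = txn_alloc(sizeof *u);

    if (u == NULL) {
        fprintf(stderr, "Can't allocate memory; backing out\n");
        txn_rollback(t);
        return -1;
    }
    u->next = t->undo_list;
    t->undo_list = u;
    /* ... do the real work of the step here ... */
    ++t->steps_done;
    return 0;
}
```

A caller that receives -1 knows the data is consistent again and can then call exit, retry, or notify other programs as appropriate.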

abort vs. exit
While abort and exit both cause a program to terminate, exit does so in a controlled manner whereas abort does not.
According to the ANSI C Standard, "The abort function causes abnormal program termination to occur, unless the signal SIGABRT is being caught and the signal handler does not return. Whether open output streams are flushed or open streams closed or temporary files removed is implementation-defined. An implementation-defined form of the status unsuccessful termination is returned to the host environment by means of the function call raise(SIGABRT)."
On the other hand, "The exit function causes normal program termination to occur. If more than one call to the exit function is executed by a program, the behavior is undefined.
"First, all functions registered by the atexit function are called, in the reverse order of their registration.
"Next, all open output streams are flushed, all open streams are closed, and all files created by the tmpfile function are removed.
"Finally, control is returned to the host environment. If the value of status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If the value of status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined."
Listing 1 creates a temporary file via tmpfile; it creates and writes to TEST.DAT; it also writes to stderr. It is terminated by either abort or exit on command.
The abort path produces the following output:
Enter A (abort), E (exit): A
error message to stderr
Abnormal program termination
The file TEST.DAT is empty since the output buffer was not flushed prior to closing. The temporary file created by tmpfile remains in the directory. (Using three different compilers under MS-DOS produced three differently named temporary files: _TEMPA.TMP, TEMP0001.$$$, and TMP1.$$$.)
When the more orderly exit path is chosen, the output is:
Enter A (abort), E (exit): E
error message to stderr
the temporary file is deleted, and TEST.DAT contains the following data:
message to data file
This example simply serves to demonstrate that these functions indeed behave correctly, at least on the systems I checked. (Having worked for some years now with quite a few C implementations, I have been bitten numerous times because I believed marketing literature or technical documentation that said "ANSI-conforming" or "This is exactly how it works." What they often mean is "This should conform to the ANSI Standard" and "This is how it is supposed to work." There's a big difference.)

Registering An Exit Handler
A big advantage with exit is that you can write an exit handler to intercept calls to exit. See the example in Listing 2.
The atexit function in stdlib.h allows you to register an exit handler. This function was invented by the ANSI C committee based on prior art. (At least one vendor had a similar function called onexit which, for a time, existed in a draft ANSI standard along with a typedef onexit_t.)
According to the ANSI C Standard, "The atexit function registers the function pointed to by its argument, to be called without arguments at normal program termination.
"The implementation shall support the registration of at least 32 functions.
"The atexit function returns zero if the registration succeeds, nonzero if it fails."
In this example atexit registers the three functions eh1, eh2, and eh3, which are called in reverse order on exit. Each exit handler must have a void argument list and return type. (You can actually register the same function multiple times.)
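Registration of this kind might look like the following sketch. The handler names eh1, eh2, and eh3 come from the text; the helper register_handlers and its failure count are assumptions made for illustration, not part of the original listing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Exit handlers: no arguments, no return value. */
void eh1(void) { printf("Inside eh1\n"); }
void eh2(void) { printf("Inside eh2\n"); }
void eh3(void) { printf("Inside eh3\n"); }

/*
 * Hypothetical helper: register eh1, eh2, and eh3 in that order.
 * Returns the number of registrations that failed, so 0 means all
 * three handlers are in place.
 */
int register_handlers(void)
{
    int failures = 0;

    failures += (atexit(eh1) != 0);
    failures += (atexit(eh2) != 0);
    failures += (atexit(eh3) != 0);
    return failures;
}
```

If register_handlers is called early in main, a later call to exit (or a return from main) should run eh3 first and eh1 last, following the reverse-order rule quoted above.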
Note that you cannot deregister an exit handler once it's been registered. If that is a concern (and perhaps in other cases too), you need to register only one function and, when it gets control, have it invoke any others itself. Thus, the exit handler can call any combination of functions it needs to handle the state of the program it finds. For example:
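#include <stdio.h>
#include <stdlib.h>

void eh1(void);    /* eh1 and eh2 are defined as in Listing 2 */
void eh2(void);
void eh3(void);

int main(void)
{
    /* register only eh3; it calls the other handlers itself */
    printf("reg eh3: %d\n", atexit(eh3));
}

void eh3(void)
{
    printf("Inside eh3\n");
    eh2();
    eh1();
}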

reg eh3: 0
Inside eh3
Inside eh2
Inside eh1
Note that there is no call to exit in main in this example. The ANSI C Standard requires program execution that falls off the end of main to implicitly call exit with an undefined value. Similarly, returning from main with an explicit return value results in calling exit with that value as its argument. This means you will always enter your exit handler at least once even if you never call exit explicitly.
A minor inconvenience is that an exit handler cannot find out the value actually passed to exit. You may want the handler to behave differently according to the situation that led to exit being called. To make this information available, you must store the exit value in a global variable before calling exit with that same value.
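The global-variable approach can be sketched as follows. The names exit_status, exit_with, and exiting_cleanly are hypothetical, and the sketch assumes every explicit termination in the program goes through exit_with rather than calling exit directly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical global: the value most recently handed to exit_with. */
int exit_status = EXIT_SUCCESS;

/* Hypothetical wrapper: record the status, then terminate normally. */
void exit_with(int status)
{
    exit_status = status;
    exit(status);
}

/* Nonzero if the recorded status indicates successful termination. */
int exiting_cleanly(void)
{
    return exit_status == 0 || exit_status == EXIT_SUCCESS;
}

/* Exit handler that adapts its behavior to the recorded status. */
void eh(void)
{
    if (exiting_cleanly())
        printf("Inside eh: normal shutdown\n");
    else
        fprintf(stderr, "Inside eh: failure status %d\n", exit_status);
}
```

A direct call to exit, or a return from main, bypasses the wrapper and leaves exit_status at its initial value, so the handler can only trust the variable if the wrapper is used consistently.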
You may be tempted to trap a call to exit and replace it with your own exit value. This is not possible since a handler that calls exit itself potentially produces an infinite loop. In any event, the behavior of calling exit more than once at run-time is undefined.
void eh1(void)
{
    printf ("Inside eh1\n");
    exit(0);    /* ???? */
}
It should be obvious you should not call exit directly from an exit handler. However, since an exit handler can call any other function in the whole program, you must be sure not to call any that eventually do call exit.
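One defensive measure, sketched here under assumed names, is to route terminations through a wrapper that refuses to call exit once the exit handler has started. The names exit_in_progress, guarded_exit, and give_up are hypothetical; this guards only code that uses the wrapper and does nothing about direct calls to exit.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set once the exit handler has started running. */
int exit_in_progress = 0;

/*
 * Hypothetical wrapper to use in place of exit. If termination is
 * already under way it returns -1 without calling exit; otherwise it
 * calls exit and does not return.
 */
int guarded_exit(int status)
{
    if (exit_in_progress)
        return -1;
    exit(status);
    return 0;    /* not reached */
}

/* A helper that might be called from ordinary code or from eh. */
void give_up(const char *why)
{
    fprintf(stderr, "giving up: %s\n", why);
    if (guarded_exit(EXIT_FAILURE) != 0)
        fprintf(stderr, "(already exiting; continuing cleanup)\n");
}

/* Exit handler: mark termination as under way before doing anything. */
void eh(void)
{
    exit_in_progress = 1;
    printf("Inside eh\n");
    /* cleanup code here may call give_up() without re-entering exit */
}
```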

Framework For An Application
Let's consider the case where we have four potential areas that can be compromised when exit is called. The global variable flags contains four 1-bit bit-fields which represent the current state of these areas (assuming each has a binary state). Initially, all states are clear; however, they can be set during the program when part of a particular transaction is done, for example. The example in Listing 3 simulates two such "compromised states."
Using bit-fields allows status flags to be packed densely and to be dealt with by name without having to mess with direct bit manipulations. However, the exit handler must explicitly test for each one of them. It would be much faster and would generate less code if the exit handler could test the bit fields in a loop. However, you cannot have an array of bit-fields, and besides, the bit-fields might not be the same width anyway. Listing 4 shows an alternate solution.
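The bit-field approach described above might look something like this sketch. The member names and the repairs_made counter are hypothetical; only the idea of four 1-bit flags tested one by one in the exit handler comes from the text.

```c
#include <stdio.h>

/* Four 1-bit state flags; the member names are hypothetical. */
struct state_flags {
    unsigned int db_dirty    : 1;
    unsigned int txn_partial : 1;
    unsigned int shm_locked  : 1;
    unsigned int queue_busy  : 1;
};

struct state_flags flags;   /* static storage, so all bits start at 0 */
int repairs_made;           /* counts cleanup actions, for illustration */

/* Exit handler: each bit-field has to be tested by name. */
void eh(void)
{
    if (flags.db_dirty) {
        printf("repairing database\n");
        flags.db_dirty = 0;
        ++repairs_made;
    }
    if (flags.txn_partial) {
        printf("backing out transaction\n");
        flags.txn_partial = 0;
        ++repairs_made;
    }
    if (flags.shm_locked) {
        printf("releasing shared memory\n");
        flags.shm_locked = 0;
        ++repairs_made;
    }
    if (flags.queue_busy) {
        printf("draining work queue\n");
        flags.queue_busy = 0;
        ++repairs_made;
    }
}
```

The program would register eh with atexit and set a flag whenever it enters the corresponding compromised state.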
handler is an array of flags and since we can't have an array of bit-fields, unsigned chars are used instead. Now there's not much point in being able to loop through this array if you need custom code to do different things for each subscript, so I've added a function pointer for each flag. That way, each flag has its own processing function registered in the array via its initializer. You can also change the value of any function pointer at run-time as you like. (Of course, if you never need to, make pfun a const member.)
You might have noticed the macro NUMELEM. I find this a very useful macro to have in my toolbox.h header that I cart around on different projects. It works for any type of array (including multi-dimensional ones) and hides a messy looking expression along the way. By the way, the expansion should reduce to a compile-time integer constant so don't be concerned about it generating large amounts of code.
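A table-driven handler along these lines could be sketched as follows. The body of NUMELEM is an assumption (the conventional sizeof quotient), since the macro's definition is not shown in the text, and the fix_ functions and the repairs_made counter are hypothetical.

```c
#include <stdio.h>

/* Assumed definition: number of elements in the array a. */
#define NUMELEM(a) (sizeof(a) / sizeof((a)[0]))

int repairs_made;   /* counts cleanup actions, for illustration */

/* Hypothetical per-area processing functions. */
void fix_db(void)    { printf("repairing database\n");      ++repairs_made; }
void fix_txn(void)   { printf("backing out transaction\n"); ++repairs_made; }
void fix_shm(void)   { printf("releasing shared memory\n"); ++repairs_made; }
void fix_queue(void) { printf("draining work queue\n");     ++repairs_made; }

/* One entry per area: a flag plus the function that deals with it. */
struct {
    unsigned char flag;
    void (*pfun)(void);
} handler[] = {
    { 0, fix_db },
    { 0, fix_txn },
    { 0, fix_shm },
    { 0, fix_queue }
};

/* Exit handler: a single loop replaces the per-flag tests. */
void eh(void)
{
    size_t i;

    for (i = 0; i < NUMELEM(handler); ++i) {
        if (handler[i].flag) {
            handler[i].pfun();
            handler[i].flag = 0;
        }
    }
}
```

With this definition, NUMELEM applied to an array declared as int grid[3][5] yields 3, and NUMELEM(grid[0]) yields 5, so it handles multi-dimensional arrays one dimension at a time.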

Intercepting Aborts
Since abort doesn't clean up after itself and it doesn't let you get control, calling it can cause you grief. You might well respond, "If that's so, then don't call it." In practice, it's not quite so simple. For example, you might (Heaven forbid) be calling a third-party library function that does call abort, in which case, you are "dead in the water."
Closer to home, while maintaining a production program you might use the assert macro to trace a suspected bug. Unfortunately, if assert fails, it calls abort. Obviously, this might leave you with compromised data on disk or in memory. What you really need is an abort you can redirect to an exit. Well, it just so happens that I have one for sale. Just for today only though and only one per customer, please! Stand back now; no pushing.
You can intercept a call to abort using signal to catch a SIGABRT, as in Listing 5.
When the abort path is selected, the output is:
Enter A(abort), E(exit): A
Assertion failed: c!= 'A', file abort.c, line 22
Inside abort_hand
Inside eh
whereas with exit, you get something like:
Enter A(abort), E(exit): E
Inside eh
This approach lets you turn an otherwise disastrous abort into one that's controlled and, in the process, generate a defined exit status code. However, the handler still can't find out where the abort call came from, even though the filename and line number are written to stderr.
Again, watch out for infinite looping. Don't call abort from within the exit (or abort signal) handler or any of the functions it calls.
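The interception technique can be sketched roughly as below. The handler names abort_hand and eh match the sample output above; the helper install_handlers is hypothetical, and the choice of EXIT_FAILURE as the status is an assumption. Calling library functions from the handler relies on the signal having been raised by abort (or raise) rather than arriving asynchronously.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Exit handler: the orderly cleanup lives here. */
void eh(void)
{
    printf("Inside eh\n");
}

/*
 * SIGABRT handler: turn the abort into a normal exit with a defined
 * status. Because it never returns, abort cannot complete the
 * abnormal termination.
 */
void abort_hand(int sig)
{
    (void)sig;
    printf("Inside abort_hand\n");
    exit(EXIT_FAILURE);
}

/* Hypothetical helper: returns 0 if both handlers were installed. */
int install_handlers(void)
{
    if (atexit(eh) != 0)
        return -1;
    if (signal(SIGABRT, abort_hand) == SIG_ERR)
        return -1;
    return 0;
}
```

Once install_handlers has succeeded, a failed assert or a direct call to abort should reach abort_hand, which then runs eh by way of exit.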

Miscellaneous Issues
Provided you can get access to the relevant file pointers, you should be able to do I/O to already-open files from an exit handler. You should also be able to open new files.
You should always consider the possibility of interrupts occurring when you are in an exit handler. Since the program is in the process of terminating, you might want to use signal to ignore new interrupts. This doesn't solve all possible problems though. For example, multiuser systems usually provide some way for a privileged program to abort another program. This often results in a SIGKILL signal being generated. Unfortunately, most implementations don't allow you to catch a SIGKILL signal. (If they did, you could catch it and ignore it and keep right on spreading your virus around that 500 megabyte disk. Hence, you can ask to ignore it, but the request won't be honored.)
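Ignoring interrupts from inside an exit handler might be done as in this sketch. The helper ignore_interrupts is hypothetical, and only SIGINT is handled here; which other signals matter depends on the implementation.

```c
#include <signal.h>
#include <stdio.h>

/*
 * Hypothetical helper: ask for interactive interrupts to be ignored.
 * Returns 0 on success, -1 if the request was refused.
 */
int ignore_interrupts(void)
{
    if (signal(SIGINT, SIG_IGN) == SIG_ERR)
        return -1;
    return 0;
}

/* Exit handler: shut out interrupts before starting the cleanup. */
void eh(void)
{
    if (ignore_interrupts() != 0)
        fprintf(stderr, "eh: could not ignore interrupts\n");
    printf("Inside eh\n");
    /* cleanup that should not be interrupted goes here */
}
```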
There is also a SIGTERM signal that you can catch. The standard says almost nothing about how it might be generated (other than by using raise) so you should consult your library documentation for more information on this and other signals.
And as with all code that involves asynchronous operations, you should pay particular attention to using the volatile keyword, to ensure that the value of the object you access actually reflects its latest state.
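A common pattern that combines a caught SIGTERM with volatile is sketched below. The names term_requested, term_hand, and process_items are hypothetical. The flag has type volatile sig_atomic_t, the one kind of static object the Standard lets a handler for an asynchronous signal assign to.

```c
#include <signal.h>

/* Written by the handler and read by the main flow, hence volatile. */
volatile sig_atomic_t term_requested = 0;

/* SIGTERM handler: record the request and nothing else. */
void term_hand(int sig)
{
    (void)sig;
    term_requested = 1;
}

/*
 * Hypothetical work loop: stops early once SIGTERM has arrived.
 * Returns the number of items actually processed.
 */
int process_items(int n)
{
    int done = 0;

    while (done < n && !term_requested) {
        /* ... handle one item ... */
        ++done;
    }
    return done;
}
```

Without volatile, a compiler would be free to read term_requested once before the loop and never notice the handler's update.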