April 1990/Dr. C's Pointers

Dr. C's Pointers®

Error Handling In C

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex.
Handling errors in programs is easy. You just don't make any! Well, it's not quite that simple since every now and then your programs must deal with input provided by a human, and humans make mistakes. (Who was it that said "Computing would be real fun if it wasn't for users."?)
Certainly, it is possible to validate data before attempting an operation, but it's also common to assume that the routine receives valid data, and to design the routine to recover if faulty data causes a process to fail. That is, don't pay the price of validation every time, only when invalid data is detected. However, this approach can break down, particularly if it is impossible, difficult, or expensive to recover from certain errors. And the earlier you trap bad data, the more information you will have about its origins and what to do next.

Approaches To Error Handling
Unlike in other mainstream languages, most of the things that can fail in a C program are library functions. Since C has no I/O statements, there are no equivalents to END= and ERR= in FORTRAN's READ and WRITE statements. There is also no equivalent to BASIC's ON ERROR GOTO. About the only kinds of errors that can be generated in the C language itself are things like arithmetic over- and underflow, memory access violations (either by attempting to dereference a pointer not pointing to an object or function or by a pointer cast to an unaligned type), and stack overflow. All of these are design issues and will not be discussed here.
Since much of the "real" work in C is done via functions, any error information must be communicated between the function detecting the error and that function's caller. This is typically done either by returning an error indicator value or by initializing an error variable passed in by address, or by a combination of both. For example:

status1 = f(arg);
if (status1 != 0)
    /* handle error */
Here, the function returns zero on success and a specific error value on failure. In the next case:

status2 = g(arg, &errorcode);
if (status2 == ERROR)
    /* handle error */
the function reserves one return value only to indicate an error. The variable errorcode (passed in by address) contains the actual reason if ERROR is returned.
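As a minimal sketch of the two conventions just described, the following fills in bodies for f and g. The error codes E_OK, E_EMPTY, and E_TOOLONG, the limit MAXLEN, and the value chosen for ERROR are assumptions made up for illustration; they come from no real library.

```c
#include <string.h>

/* Hypothetical error codes and limit, invented for this sketch. */
enum { E_OK = 0, E_EMPTY = 1, E_TOOLONG = 2 };
#define ERROR  (-1)
#define MAXLEN 8

/* Style 1: return zero on success, a specific error value on failure. */
int f(const char *arg)
{
    if (arg == NULL || arg[0] == '\0')
        return E_EMPTY;
    if (strlen(arg) > MAXLEN)
        return E_TOOLONG;
    return E_OK;
}

/* Style 2: one reserved return value (ERROR) signals failure;
   the actual reason is stored through errorcode. */
int g(const char *arg, int *errorcode)
{
    *errorcode = f(arg);
    if (*errorcode != E_OK)
        return ERROR;
    return (int)strlen(arg);    /* on success, return a useful result */
}
```

The second style frees the return value to carry a real result (here, the string length), at the cost of an extra argument on every call.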
Unfortunately, none of C's standard library functions uses either of these. (Well, certainly not the second approach anyway. You could argue that malloc and friends use the first approach since the only "real" reason they fail is that not enough memory is available, regardless of what they were attempting to do.)
C has its own approach; inter-function error communication is done via a global variable, an approach that most structured programmers are strongly warned against for a number of very good reasons. However, that's the way it is so I won't philosophize about it here.

errno To The Rescue
Of course, the global keeper of the error number is our dear friend errno. Historically, errno has been a global int in every program we've written whether we have used it or not. It's really been like a reserved word in the namespace of external identifiers. And since one of ANSI's jobs is to consolidate existing practice, errno survived the ANSI C standardization process pretty much intact.
To help get you into the spirit of things, here's an example of using errno (Listing 1).
It is the programmer's responsibility to clear errno (a zero value means "no error") each time before calling a function that may set it. No library function is required to clear errno explicitly. You must also test errno or store its value for later testing, immediately after the library function in question returns. If you do not, any other library routine (or user-written routine for that matter) might overwrite errno in the meantime. That is, just because a library function is not documented as setting errno, doesn't mean that it doesn't use it for a scratch variable. Messy, but that's the case.
In the example above, the first occurrence of errno = 0 is unnecessary since at program startup errno is supposed to be cleared.

ANSI C And errno
The proposed ANSI C Standard pins down a number of things regarding errno. The header errno.h was invented as a home for the definition of errno itself and various macros of the form E* that relate to reporting error conditions. errno is allowed to be either a global int or a macro that expands to a modifiable lvalue having type int. That is, it could be a macro that expands to something like *__errno().
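To see how a macro can stand in for a global int, here is a minimal sketch. The names my_errno, my_errno_location, and my_errno_storage are hypothetical; a real implementation would use identifiers reserved to it and might return the address of a different cell for each task.

```c
/* Hypothetical storage for the error number. */
static int my_errno_storage;

/* Returns the address of the current error cell. */
int *my_errno_location(void)
{
    return &my_errno_storage;
}

/* Expands to a modifiable lvalue of type int, so it can be
   assigned to, tested, and have its address taken, just as a
   plain global could. */
#define my_errno (*my_errno_location())
```

Code that writes my_errno = 0 or tests my_errno == 0 compiles unchanged under either representation, which is why the Standard can leave the choice to the implementer.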
Only two error value macros are defined by ANSI C: EDOM for domain errors and ERANGE for range errors. However, an implementer is permitted to provide their own E* value macros in this header.
The library functions that are documented as setting errno are: acos, asin, cosh, exp, fgetpos, fsetpos, ftell, ldexp, log, log10, perror, pow, signal, sinh, strtod, strtol, and strtoul. Note that fopen (like most other I/O functions) is not included. As such, you cannot portably recover from a file open failure (which is not surprising since there can be many system-specific reasons for such an error).
The library functions perror and strerror can be used to produce formatted messages corresponding to errno's value. However, the commonly implemented table of messages, sys_errlist, and its associated machinery are not part of Standard C.
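A small sketch of strerror in use follows. The helper report_error is a hypothetical name; the message text itself is implementation-defined, so no particular output is shown.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "prefix: message" to the given stream
   for an error number. Returns the value fprintf returns. */
int report_error(FILE *out, const char *prefix, int errnum)
{
    const char *msg = strerror(errnum);  /* wording varies by implementation */

    return fprintf(out, "%s: %s\n", prefix, msg);
}
```

A call such as report_error(stderr, "sqrt", EDOM) produces much the same line that perror("sqrt") would write when errno holds EDOM, but it works from a saved error number rather than the current value of errno.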

An Error Handling Envelope
Rather than explicitly clear and test errno all the time, it is much more elegant to have an error handling interface inserted between your code and that in the library. Unfortunately, the standard library uses two different ways to return an error: a negative int value or a NULL pointer value. You may have to have two interfaces, one to handle each.
Calling an extra function for each math library operation, for example, is an added cost but so too is including the explicit error checking in each place. It's the old speed versus code size tradeoff.
Listing 2 uses the setjmp/longjmp library mechanism to implement recovery from attempts to take the square root of a negative number.
One problem here is the need to explicitly pass the setjmp context into mysqrt — it doesn't really look like a call to sqrt. You could hide this behind a macro:
#define sqrt(d)  sqrt((d), context)
but you would still need to define context yourself. Since ANSI C permits a macro to expand to its own name without recursive death, all existing calls to sqrt could be redirected in this manner with intermediate error checking being added at the cost of recompilation in the presence of this macro. Perhaps a cleaner approach is to make context a global so it never need be passed in. A word of caution about redefining sqrt though. ANSI C effectively reserves the names of all standard library functions and if you invent something of your own with the same name, the behavior is undefined. However, for a given implementation the macro approach may work.
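The envelope idea can be sketched as below. This is not the column's Listing 2: it wraps strtol instead of sqrt so that only the core library is needed, and the names mystrtol and add_texts are assumptions made for illustration.

```c
#include <errno.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical envelope: behaves like strtol in base 10, but
   jumps back to 'context' when a range error is detected. */
long mystrtol(const char *text, jmp_buf context)
{
    long value;

    errno = 0;
    value = strtol(text, NULL, 10);
    if (errno == ERANGE)
        longjmp(context, ERANGE);   /* setjmp then appears to return ERANGE */
    return value;
}

/* Hypothetical caller: adds two numbers given as text. One setjmp
   covers both conversions, so recovery lives in a single place. */
int add_texts(const char *a, const char *b, long *sum)
{
    jmp_buf context;

    if (setjmp(context) != 0)
        return -1;                  /* reached only via longjmp */
    *sum = mystrtol(a, context) + mystrtol(b, context);
    return 0;
}
```

The caller never looks at errno; the clearing and testing are confined to the envelope, at the price of threading context through each call.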

The matherr Concept
Many systems provide a cleaner way to trap (and also recover from) certain kinds of library errors. The idea originated with UNIX systems but has been widely emulated. It involves a function called matherr. Each library routine that can detect certain errors calls another library routine, matherr. Now this default version of matherr may do nothing or it may simply write an error message to stderr. By writing your own version of matherr and linking to it instead of the library version, you can take control when one of the trappable errors occurs. Listing 3 shows a primitive version of matherr. In reality you would probably try to recover from the error.
When Listing 3 is linked with the first example above, the following output is produced.
#1 OK
Function sqrt failed with error type DOMAIN
#2 OK
The reason the second call to sqrt does not show errno set is that matherr returned a non-zero value, indicating that the normal reporting of the error condition should be bypassed (presumably because the error has been "fixed" in the user-written matherr). With matherr you can bypass or follow the default error handling rules and to a certain extent you can recover from errors and substitute a value that should be returned by the math function instead.
The exception structure has several other members too and the type member values are usually macros or enumeration constants defined in math.h along with the structure template. Check your library manual for more details.
Note that matherr is not included in ANSI C.
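Because matherr is not universally available, the following is only a self-contained imitation of the concept, not the real interface and not the column's Listing 3. Every name in it (my_exception, my_matherr, my_recip, substitute_one, MY_DOMAIN, MY_SING) is hypothetical, and a function pointer stands in for the link-time replacement that real matherr relies on.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins modeled on the matherr idea. */
enum { MY_DOMAIN = 1, MY_SING = 2 };

struct my_exception {
    int type;           /* kind of error detected */
    const char *name;   /* name of the failing function */
    double arg1;        /* offending argument */
    double retval;      /* value the function will return */
};

/* Handler hook; NULL means default reporting only. */
int (*my_matherr)(struct my_exception *) = NULL;

/* Hypothetical library routine: reciprocal, singular at zero. */
double my_recip(double x)
{
    struct my_exception e;

    if (x != 0.0)
        return 1.0 / x;
    e.type = MY_SING;
    e.name = "my_recip";
    e.arg1 = x;
    e.retval = 0.0;
    /* A nonzero return from the handler bypasses the errno report. */
    if (my_matherr == NULL || my_matherr(&e) == 0)
        errno = EDOM;
    return e.retval;
}

/* Example handler: substitute a result and claim the error as handled. */
int substitute_one(struct my_exception *e)
{
    e->retval = 1.0;
    return 1;
}
```

With my_matherr left NULL, my_recip(0.0) returns 0.0 and sets errno to EDOM. After my_matherr = substitute_one, the same call returns 1.0 and leaves errno alone, mirroring the bypass behavior described above.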

Numerical C Extensions Group
This group (abbreviated as NCEG) was formed by me early in 1989. Its purpose is to publish a technical report on directions for adding extensions to Standard C, to support such things as complex arithmetic, IEEE floating-point, vector and parallel operations, and variable dimensioned arrays.
The IEEE floating-point standards deal with a number of interesting things (such as +/-infinity and not-a-number (NaN)) that need to be supported (and taken advantage of) in modern C compilers. According to leading IEEE numerical C implementers, errno gets in their way. Likewise for vendors of C compilers doing parallel operations. As such, errno might well have to be ignored in some implementations, simply for the sake of functionality and/or performance.
As I write this (mid-December 1989), the ANSI C Standards Committee X3J11 is receiving a letter ballot asking members to admit NCEG as a full working group (tentatively called X3J11.1) within ANSI C. The results of this ballot were 22 for and one against, and will be forwarded to SPARC for their consideration.
Contact me for further information on NCEG.