September 1992/Exception Handling

Features

Exception Handling

Carlos Vidal

Carlos Vidal has an M.S. in nuclear engineering. He has been working mainly in UNIX and C since 1986, and has experience with C++ and X. He is currently employed by ABB Atom, Sweden, in the Methods Development department at the fuel factory. He can be reached by mail at ABB Atom, S-721 63 Vasteras, Sweden.
When you work for some time in C, your software looks more like an extension of the C library than a new program. This means that you are beginning to understand and share the undocumented philosophy (or style) hidden behind C. In one expression of that style, many functions return a pointer, and if something goes wrong, the pointer is NULL.
Code like
if (p == NULL)
    print_message("...");
or

if (p == NULL)
    abort();
should be included just after any call to your functions. Hints about the failure are passed through the errno global variable. This is basically the C (and UNIX) error-handling policy, and it was the only one in C++ until November 1990, when the ANSI C++ standards committee accepted the exception handling mechanism as part of the language. This mechanism offers a way of jumping directly to a handler when an error condition is found, avoiding the return to the expression that called the function. Unfortunately, at least in the UNIX community, exception handling will not be available in most commercial compilers until 1993.
This article shows how to build a provisional implementation of the mechanism with a state-of-the-art compiler. The code can be used in real projects if you cannot wait for the official version.

A Short Description
Exception handling was presented in April 1990 by Koenig and Stroustrup, and is formally described in The Annotated C++ Reference Manual by Stroustrup and Ellis. The C++ Programming Language, 2nd Edition, by Stroustrup also contains some examples.
Exception handling uses three additional keywords: try, catch, and throw. try and catch always appear together, defining a handler. That is, the handler "tries" a piece of code, and is prepared to "catch" a list of possible errors. The code looks like

try {
    f1();
    f2();
    ... etc.
}
catch (Error of type A) {
    ...
}
catch (Error of type B) {
    ...
}
// Normal continuation point
...
If something goes wrong inside the try block, the program enters the appropriate catch block. Otherwise, execution continues after the catch list.
To signal an error, the code executed inside the try block can "throw" an exception. For instance,

fd = fopen(...);
if (fd == NULL)
    throw (Error of type X);
This looks like just another syntax for the code

if (error in try block) {
    switch (errno) {
    case 1:
        ... etc
    }
}
except that the next code executed after throw is the corresponding catch block. The program enters the catch block regardless of the number of stacked function calls involved. For instance,

void f();
void g();

int main()
{
    try {
        f();
    }
    catch (An_Exception) {
        ...
    }
}

void f() { g(); }

void g() { throw (An_Exception); }
In this example, the control is transferred directly from g to the catch block, without passing through f.
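The same flow can be observed with a compiler that already supports the keywords. The following is a minimal sketch, not code from the listings: An_Exception is given an empty definition, and the trace vector and run_demo function are illustrative names introduced here to record the order of execution.

```cpp
#include <string>
#include <vector>

// Illustrative exception type, named after the placeholder used in the text.
struct An_Exception {};

// Records each point the program passes through.
static std::vector<std::string> trace;

void g() {
    trace.push_back("g: throwing");
    throw An_Exception();
}

void f() {
    g();
    trace.push_back("f: after g()");        // skipped when g throws
}

// Runs the example and returns the recorded trace.
std::vector<std::string> run_demo() {
    trace.clear();
    try {
        f();
        trace.push_back("try: after f()");  // skipped as well
    } catch (const An_Exception&) {
        trace.push_back("catch: An_Exception");
    }
    trace.push_back("normal continuation");
    return trace;
}
```

Calling run_demo() yields three entries: the throw in g, the catch block, and the normal continuation point. Neither of the lines marked as skipped is recorded.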
In other words, throwing an exception transfers control to the last handler entered. If there is no catch matching the exception, the program calls a function named unexpected. If there is no handler, the program calls a function named terminate.
By default, unexpected calls terminate, and terminate calls abort. The functions set_unexpected and set_terminate let you replace these defaults with your own implementations.
The implementation differs from the classical method in two important ways:
1. After throwing an exception, a special action is taken.
2. The jump to the handler breaks all the rules. Actually, it is a non-local goto, transferring control directly from throw to the appropriate catch block. (This is the delicate side of this mechanism.)
Throwing an exception is like putting a message in a bottle. The castaway launching the bottle has no idea if anyone will get the message, but is sure this is the best thing to do.
In C++, exception handling is particularly useful when treating errors in object constructors. For instance, a constructor implementation of the class

class A : public B {
    ...
public:
    A();
};
may look like

A::A() : B()
{
    // Construct A.
    // The B part of A should be ready.
    // Check B construction
    if (status != OK) {
        my_status = ~OK;
        return;
    }
    // Proceed with A construction
    ...
}
The only way of checking if B::B() succeeded is to add a status variable in B, and ask if it is OK. Then, you can proceed with the A constructor. If it fails, the error can be reported in the same status variable of B if it is accessible, or in another variable especially defined for A (for instance, if B comes in a commercial class library and you can't change the source).
Anybody who derives a class from A faces the same problem. A user of class A also has to check that its construction succeeded before using the object; otherwise the door is open to a bug. In other words, with this approach errors are not encapsulated.
Using exceptions, A's constructor looks like any simple constructor. That is,

A::A() : B()
{
    // Proceed with A construction
    ...
}
If B::B() fails, it throws an exception, and A need not be aware of it. User code can look like

try {
    A a;
    // or A* pa = new A;
}
catch (Bad_B) {
    // pa is not visible here, and was never assigned if the
    // constructor threw, so there is nothing to delete
    ...
}
If a user forgets the handler, when an error occurs the code stops through unexpected, which is always better than a message such as memory fault, core dumped or segmentation violation.
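As a minimal sketch of this idea for a compiler that supports the keywords: the bool parameter ok is an assumption added here only to simulate a failing B constructor, and build is an illustrative helper, not part of the article's code.

```cpp
#include <string>

// Illustrative exception type reporting that B could not be constructed.
struct Bad_B {};

class B {
public:
    explicit B(bool ok) {
        if (!ok) throw Bad_B();   // no status variable needed
    }
};

class A : public B {
public:
    explicit A(bool ok) : B(ok) {
        // Reached only if B::B() succeeded, so A never tests a status flag.
    }
};

// Tries to build an A and reports what happened.
std::string build(bool ok) {
    try {
        A a(ok);
        return "constructed";
    } catch (const Bad_B&) {
        return "Bad_B caught";
    }
}
```

build(true) returns "constructed" and build(false) returns "Bad_B caught". The body of A::A() contains no error-checking code in either case.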

The Formal Syntax
Before starting with the implementation, I will formally define the syntax and uses of the keywords.

throw
throw can be used in three different ways:
1. throw an_obj;
2. throw;
3. f() throw (type1, type2, etc.) { /* function body */ }
Item 1 was used in the previous examples. an_obj is the object passed to the handler. The type of this object is used to select the catch block. For instance,

class AnError { ... };

f()
{
    AnError err;
    ...
    throw err;
    ...
}
or directly

f()
{
    ...
    throw AnError();
    ...
}
Item 2 means to "throw the last exception thrown." To use this form, a previous exception must exist: the statement has to appear inside a catch block or in a function called from a catch block. For instance,

catch (AnError) {
    throw;  // Throws AnError, with the object originally passed
}
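A minimal sketch of the second form, again assuming a compiler that supports the keywords. The member code and the functions note_and_rethrow and run_rethrow are illustrative names. The example places throw; in a function called from a catch block, to show that the original object reaches the outer handler.

```cpp
#include <string>

// Illustrative exception carrying a value, to show the object survives.
struct AnError {
    int code;
    explicit AnError(int c) : code(c) {}
};

// Called from inside a catch block: "throw;" re-throws the exception
// that is currently being handled.
void note_and_rethrow(std::string& trail) {
    trail += "inner handler; ";
    throw;
}

std::string run_rethrow() {
    std::string trail;
    try {
        try {
            throw AnError(42);
        } catch (AnError&) {
            note_and_rethrow(trail);   // passes the same object outward
        }
    } catch (const AnError& e) {
        trail += "outer handler, code " + std::to_string(e.code);
    }
    return trail;
}
```

run_rethrow() returns "inner handler; outer handler, code 42".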
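Finally, item 3 specifies the list of exceptions a function may throw. Any other exception leaving the function leads to a call to unexpected. That is,

void f() throw (A, B)
{
    // Body of f()
}
is actually expanded to

void f()
{
    try {
        // Body of f()
    }
    catch (A) { throw; }
    catch (B) { throw; }
    catch (...) { unexpected(); }
}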

catch
catch has to specify a type, catching any exception of this type or of types derived from it. The syntaxes used are

try {
    ...
}
catch (AnError) {
    // Your handler
    ...
}
Or, to access the object passed by throw,

catch (AnError& obj) {
    // Your handler using obj values and services
    ...
}
Finally, catch(...) means to "catch any exception." catch blocks must appear just after the try block, and are evaluated in the order they are declared. Then, if the last block is catch (...), unexpected is never called for this handler.
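The matching rules can be sketched as follows for a compiler that supports the keywords. FileError, Unrelated, classify, and base_only are illustrative names introduced for this example.

```cpp
#include <string>

struct AnError {};                 // illustrative base exception type
struct FileError : AnError {};     // illustrative type derived from AnError
struct Unrelated {};               // not derived from AnError

// kind: 0 throws FileError, 1 throws AnError, 2 throws Unrelated,
// any other value throws nothing.
std::string classify(int kind) {
    try {
        if (kind == 0) throw FileError();
        if (kind == 1) throw AnError();
        if (kind == 2) throw Unrelated();
        return "no exception";
    } catch (const FileError&) {   // most specific handler first
        return "FileError";
    } catch (const AnError&) {     // also matches types derived from AnError
        return "AnError";
    } catch (...) {                // must come last: matches anything
        return "unknown";
    }
}

// With only the base-class handler, a derived exception is still caught.
std::string base_only() {
    std::string result = "not caught";
    try {
        throw FileError();
    } catch (const AnError&) {
        result = "AnError";
    }
    return result;
}
```

Because the blocks are tried in order, placing the AnError handler before the FileError handler would make the second one unreachable.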

try
The try keyword has only one syntax, and is the one used in all the examples.

An Implementation
The throw keyword is actually a non-local goto. In C you can get the same result using the standard functions setjmp and longjmp. (A very good description of these primitives was presented by P.J. Plauger in the October 1991 CUJ.)

Synopsis

#include <setjmp.h>

int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int retval);
Basically, setjmp saves the current execution environment (you can say "the state of the CPU") in a buffer passed as an argument, and then returns 0. longjmp restores the environment saved in the buffer passed as an argument, with the exception of the setjmp return value, which is replaced by the second argument of longjmp.
For instance, in the following code, main starts saving the current environment with setjmp. As this is a direct invocation, setjmp returns 0. Then f is called. Finally, g is called and longjmp is executed. Here the normal flow of the program is interrupted. Everything is reset to the state it was in when env was saved, so the next instruction is the evaluation of the if statement in main, with the only difference being that now setjmp returns 1 (longjmp's second argument).

#include <iostream.h>
#include <setjmp.h>

static jmp_buf env;

void g()
{
    longjmp(env, 1);
    // Never reach this point
}

void f()
{
    g();
    cout << "Return from g(). Never happens";
}

int main()
{
    if (setjmp(env) == 0) {
        f();
    }
    else {
        cout << "Violent return\n";
    }
}
This behavior is quite similar to the one needed. You can say that try is a kind of setjmp, and throw is a kind of longjmp. The exception-handling definition says that throw has to jump to the "last try-block entered." This can be achieved by passing to setjmp a buffer allocated on top of a stack, and calling longjmp with the last value pushed.
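The idea of jumping to the "last try-block entered" can be sketched with a small stack of jmp_buf. This is a minimal illustration under stated assumptions, not the code of the listings: the names buffers, depth, push_buf, pop_buf, throw_code, fail, and run are hypothetical, the stack has no overflow checks, and only plain integers cross the jump so that no destructor is skipped.

```cpp
#include <csetjmp>

const int MAX_DEPTH = 16;                 // fixed-size stack, assumed large enough

static std::jmp_buf buffers[MAX_DEPTH];   // static storage, not automatic variables
static int depth = 0;

std::jmp_buf& push_buf() { return buffers[depth++]; }   // entering a "try"
std::jmp_buf& pop_buf()  { return buffers[--depth]; }   // leaving it, or throwing

// Plays the role of throw: jump to the most recently entered "try".
void throw_code(int code) { std::longjmp(pop_buf(), code); }

void fail() { throw_code(1); }

// Returns 2 if the inner handler ran, 1 if the outer handler ran.
int run(bool fail_inside_inner) {
    if (setjmp(push_buf()) == 0) {            // outer "try"
        if (setjmp(push_buf()) == 0) {        // inner "try"
            if (fail_inside_inner) fail();
            pop_buf();                        // inner try left normally
        } else {
            pop_buf();                        // leave the outer try as well
            return 2;                         // inner "catch"
        }
        fail();                               // outer is now the last try entered
    } else {
        return 1;                             // outer "catch"
    }
    return 0;                                 // not reached in this example
}
```

run(true) returns 2, because the failure happens while the inner buffer is on top of the stack. run(false) returns 1, because the inner buffer has already been popped when fail is called. In both cases depth is back to 0 afterwards.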
Listing 1 shows the definition of a stack of jmp_buf. For simplicity, I used a fixed-size stack space and a minimum implementation. Any other stack class can be used, provided that jmp_bufs are not allocated as automatic variables. Listing 2 shows the implementation.
In order to handle exception types and be able to pass objects, I defined a base class with virtual functions, called Exception (see Listing 3).
Listing 4 shows the class ExH, which controls the exception handling mechanism. To implement the terminate and unexpected functionality, I defined the type PFV as a pointer to a function that takes no arguments and returns void. The functions set_terminate and set_unexpected return a pointer to the function previously in use. This allows the user to temporarily set his own implementation, independently of the previous one. (See paragraph 15.6.2 in The Annotated C++ Reference Manual.) For instance,
void my_unexp();

void f()
{
   PFV old = set_unexpected(my_unexp);

   ...
   // your stuff
   ...

   set_unexpected(old);
}
To store the current pointers to unexpected and terminate I use the static variables PFV terminate and PFV unexpected in class ExH. To be able to modify them, I declare the functions set_unexpected and set_terminate as friends.
The buffer safeSpace is used to save the last object thrown. As this object is always derived from Exception, I use the pointer lastEx to work with the buffer (lastEx = (Exception*)safeSpace).
The inline function throw shows how they are used together with the stack of jmp_buf. All variables are static because ExH is a control module, so I need only one instance in the program.
Listing 5 shows the ExH implementation. Actually, it gives only the initialization of the static variables and the functions set_unexpected and set_terminate.

How To Use It
With this class, keywords can be translated as shown in Table 1. After the last catch block, the line else unexpected(); must be added if the last catch has no ellipsis.
Listing 6 shows a translator built with lex (UNIX lexical analyzer) that transforms the keywords to C++ code.
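Since Table 1 and the listings are not reproduced here, the following is only a hedged sketch of what translated code of this kind could look like. Everything in it is an assumption made for illustration: the integer tag returned by type, the clone function (standing in for a size-based copy into safeSpace), the member names push, pop, and raise, the BadInput class, and the parse and guarded_parse functions. The exception classes deliberately have trivial destructors, because longjmp does not run destructors.

```cpp
#include <csetjmp>
#include <new>

// Assumed shape of the base class. The article says only that it has
// virtual functions size and type; clone() stands in for the copy.
struct Exception {
    virtual int type() const = 0;
    virtual Exception* clone(void* where) const = 0;
};

enum { BAD_INPUT = 1 };                       // hypothetical type tag

struct BadInput : Exception {                 // hypothetical exception class
    int line;
    explicit BadInput(int l) : line(l) {}
    int type() const { return BAD_INPUT; }
    Exception* clone(void* where) const { return new (where) BadInput(*this); }
};

// Stand-in for the ExH control module; all members are static.
struct ExH {
    static std::jmp_buf stack[8];             // fixed-size stack, no overflow check
    static int top;
    static double safeSpace[8];               // holds the last object thrown
    static Exception* lastEx;                 // points into safeSpace

    static std::jmp_buf& push() { return stack[top++]; }
    static void pop() { --top; }
    static void raise(const Exception& e) {   // plays the role of throw
        lastEx = e.clone(safeSpace);
        std::longjmp(stack[--top], 1);        // back to the last try entered
    }
};
std::jmp_buf ExH::stack[8];
int ExH::top = 0;
double ExH::safeSpace[8];
Exception* ExH::lastEx = 0;

void parse(int line) {                        // hypothetical library routine
    if (line < 0) ExH::raise(BadInput(line));
}

// Roughly what  try { parse(n); } catch (BadInput& e) { ... }  could become.
int guarded_parse(int n) {
    if (setjmp(ExH::push()) == 0) {           // try
        parse(n);
        ExH::pop();                           // try block left normally
        return 0;
    }
    else if (ExH::lastEx->type() == BAD_INPUT) {   // catch (BadInput& e)
        BadInput& e = *static_cast<BadInput*>(ExH::lastEx);
        return e.line;
    }
    else {
        return -1000;                         // where else unexpected(); would go
    }
}
```

guarded_parse(5) returns 0 through the normal path, and guarded_parse(-3) returns -3 through the handler, reading the value from the object saved in safeSpace. The final else branch corresponds to the else unexpected(); line required when the last catch has no ellipsis.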

Limitations of This Model
There are two major limitations to this model, compared with the formal definition in The Annotated C++ Reference Manual (ACRM). Probably these are the reasons why exception handling is not already available.

Stack Unwinding
As I said, when longjmp is executed, the next instruction is the one following the associated setjmp. This means that the stack pointer returns to the former value, and no action is taken with the objects created between try and throw. Automatic variables are not such a big problem because they are allocated on the stack, so you lose their values but can reuse the space. On the other hand, objects allocated in the heap (with new or malloc) will remain where they are, generating memory leaks, because you probably lose the pointers to them.
The formal definition of the mechanism prevents this, saying that as control passes from a throw point to a handler, destructors are invoked for all objects fully constructed on the path from try to throw.
Unfortunately, this is not the case here. To use this model you have to be careful that the objects created in between are properly destroyed. A partial solution is to derive dangerous classes from an Undo class with virtual destructor, and do some kind of garbage collection with them.
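For comparison, the behavior required by the formal definition can be observed with a compiler that implements the language mechanism. In this minimal sketch, Guard, Failure, deep, and unwind_demo are illustrative names. Each Guard records its own destruction, so the order of events is visible.

```cpp
#include <string>

static std::string events;

// Illustrative object whose destructor records that it ran.
struct Guard {
    const char* name;
    explicit Guard(const char* n) : name(n) {}
    ~Guard() { events += std::string("~") + name + " "; }
};

struct Failure {};

void deep() {
    Guard g2("g2");
    throw Failure();              // g2 is destroyed during unwinding
}

std::string unwind_demo() {
    events.clear();
    try {
        Guard g1("g1");
        deep();
    } catch (const Failure&) {
        events += "handler";      // runs after both destructors
    }
    return events;
}
```

unwind_demo() returns "~g2 ~g1 handler": both objects are destroyed, innermost first, before the handler runs. With a jump based on longjmp, neither destructor would be invoked, which is exactly the limitation described above.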

Exception Base Class
The second limitation is that you need an Exception base class to get uniform access to the virtual functions size and type. Nevertheless, if exceptions are all declared together in some include file, this can be easily modified when the official mechanism becomes available.

Conclusions
My experience using this implementation is that exception handling is very useful to isolate base library writers from application developers. I gained more in productivity than in code performance. For instance, one of my programs has to read an ASCII file with large tables (200K) used to initialize objects of approximately 20 different classes. The application code looks something like

{
    CdFile in("my_file");
    FuelType ft;
    try {
        in >> ft;
    }
    catch (what you can) {
    }
}
When I wrote operator>> I worked very fast, knowing that on an input error some kind of action surely has to be taken, independently of the user of the class. The alternative was to abort on any error compromising data integrity.
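The division of labor described here can be sketched as follows. This is not the application code: std::istringstream stands in for the CdFile class, FuelType is reduced to a single hypothetical field named enrichment, and InputError and load are illustrative names.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct InputError {};             // illustrative exception type

struct FuelType {                 // hypothetical stand-in with one numeric field
    double enrichment;
};

// Library side: report a bad table entry by throwing, without deciding
// what the application should do about it.
std::istream& operator>>(std::istream& in, FuelType& ft) {
    if (!(in >> ft.enrichment)) throw InputError();
    return in;
}

// Application side: decides how to react to the failure.
std::string load(const std::string& text) {
    std::istringstream in(text);
    FuelType ft;
    try {
        in >> ft;
        return "loaded";
    } catch (const InputError&) {
        return "rejected";
    }
}
```

load("3.25") returns "loaded" and load("abc") returns "rejected". The operator contains no policy about what to do on bad input; that decision stays with the caller.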