Standard C

The Header <setjmp.h>

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest book is The Standard C Library, published by Prentice-Hall. You can reach him at PJP%plauger@.uunet.uu.net, uunet!plauger!pjp.

Introduction

The C programming language does not let you nest functions. You cannot write a function definition inside another function definition, as in:

int f(void)
   {  /* outer function */
   int g(void)
      {  /* NOT PERMITTED */
      .....
The major effect of this restriction is that you cannot hide function names inside a hierarchy. All the functions that you declare within a given translation unit are visible to each other. That is not a major drawback — you can limit visibility by grouping functions within separate C source files that belong to different translation units.

C does, however, suffer in another way because of this design decision. It provides no easy way to transfer control out of a function except by returning to the expression that called the function. For the vast majority of function calls, that is a desirable limitation. You want the discipline of nested function calls and returns to help you understand flow of control through a program. Nevertheless, on some occasions that discipline is too restrictive. The program is sometimes easier to write, and to understand, if you can jump out of one or more function invocations at a single stroke. You want to bypass the normal function returns and transfer control to somewhere in an earlier function invocation. That's often the best way to handle a serious error.

Nonlocal goto

You can do this sort of thing in Pascal. A nested function can contain a goto statement that transfers control to a label outside that function. (A void function in C is called a procedure in Pascal. I use "function" here to refer to Pascal procedures as well.) The label can be in any of the functions containing the nested function definition, as in:

function x: integer; {a Pascal goto example}
   label 99;
   function y(val: integer): integer;
      begin
      if val < 0 then
         goto 99;
      .....
You must declare the labels in a Pascal function before you declare any nested functions so the translator can recognize a nonlocal goto.

A goto within the same function can often simply transfer control to the statement with the proper label. A nonlocal goto has more work to do. It must terminate execution of the active function invocation. That involves freeing any dynamically allocated storage and restoring the previous calling environment. Pascal even closes any files associated with any file variables freed this way. The function that called the function containing the goto statement is once again the active function. If the label named in the goto statement is not in the now-active function, the process repeats. Eventually, the proper function is once again active and control transfers to the statement with the proper label. The expression that invoked the function containing the goto never completes execution.

Pascal uses the nesting of functions to impose some discipline on the nonlocal goto statements you can write. The language won't let you transfer control into a function that is not active. You have no way of writing a transfer of control to an unknown function. Here is one of the ways that Pascal is arguably better than C.

Label Variables

The older language PL/I has a different solution to the problem. That language lets you declare label variables. You can assign a label to such a variable in one context, then use that variable as the target of a goto statement in another context. What gets stored in the label variable is whatever information the program needs to perform a nonlocal goto. (The goto need not be nonlocal — it can transfer control to a label within the current invocation of the current function.)

The PL/I approach is rather less structured than the one used by Pascal. You can write a goto statement that names an uninitialized label variable. Or the label assigned to the variable may be out of date — it may designate the invocation of a function that has terminated. In either case, the effect can be disastrous. Unless the implementation can validate the contents of a label variable before it transfers control, it will make a wild jump. Such errors are hard to debug.

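C implements nonlocal transfers of control by using library functions. The header <setjmp.h> provides the necessary machinery: the type jmp_buf, which holds a saved calling environment; the macro setjmp, which stores the current calling environment in a jmp_buf data object; and the function longjmp, which transfers control back to the place recorded there.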

In this regard, the C mechanism is even more primitive than the unstructured goto of PL/I. All you can do is memorize a place that flow of control has reached earlier in the execution of the program. You can return to that place by executing a call to longjmp using the proper jmp_buf data object. If the data object is uninitialized or out of date, you invite disaster.
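A minimal sketch of the mechanism may help here. The names env, inner, outer, and try_call are hypothetical, chosen only for illustration. The call to longjmp in inner abandons both inner and outer and resumes in try_call, as if setjmp had returned a second time:

```c
#include <setjmp.h>

static jmp_buf env;        /* hypothetical name: holds the memorized place */

static void inner(int bad)
   {  /* two calls below the setjmp */
   if (bad)
      longjmp(env, 1);     /* does not return to outer */
   }

static void outer(int bad)
   {  /* an intermediate invocation that gets bypassed */
   inner(bad);
   }

int try_call(int bad)
   {  /* returns 0 on normal completion, 1 after a nonlocal jump */
   if (setjmp(env) != 0)
      return 1;            /* arrived here via longjmp */
   outer(bad);
   return 0;
   }
```

Calling try_call(0) completes normally. Calling try_call(1) takes the nonlocal route. Note that env is valid only while try_call is still active.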

longjmp and setjmp are delicate functions. They do violence to the flow of control and to the management of dynamic storage. Both of those arenas are the province of a portion of the translator that is extremely complex and hard to write. That part must generate code that is both correct and optimized for space and speed. Optimizations often involve subtle changes in flow of control or the use of dynamic storage. Yet the code generator often works in ignorance of the properties and actions of longjmp and setjmp.

Subtleties

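The C Standard addresses two areas where subtleties often lurk: the expression that contains the setjmp macro, and the dynamic storage declared in the function that executes setjmp.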

In both cases, you will find language in the C Standard that is puzzling. That's because the C Standard attempts to circumscribe dangerous behavior without spelling out the dangers.

One of the dangers lies in expression evaluation. A typical computer has some number of registers that it uses to hold intermediate results while evaluating an expression. Write a sufficiently complex expression, however, and you may exhaust the available registers. You then force the code generator to store intermediate results in various bits of dynamic storage.

Here is where the problem comes in. setjmp must guess how much "calling context" to store in the jmp_buf data object. It is a safe bet that certain registers must be saved. A register that can hold intermediate results across a function call is a prime candidate, since the longjmp call can be in a called function. Once the program evaluates setjmp, it needs these intermediate results to complete evaluation of the expression. If setjmp fails to save all intermediate results, a subsequent return stimulated by a longjmp call will misbehave.

The C Standard legislates the kinds of expressions that can contain setjmp as a subexpression. The idea is to preclude any expressions that might store intermediate results in dynamic storage that is unknown (and unknowable) to setjmp. Thus you can write forms such as switch (setjmp(buf)) ....., if (2 < setjmp(buf)) ....., if (!setjmp(buf)) ....., and the bare expression statement setjmp(buf);

You can write no forms more complex than these. Note that you cannot reliably assign the value of setjmp, as in

n = setjmp(buf)
The expression may well evaluate properly, but the C Standard doesn't require it.
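The following sketch exercises each of the contexts that the C Standard does permit. The function names (form_switch, form_compare, form_not, form_statement, jump_back) are hypothetical. Each function avoids storing the value of setjmp anywhere:

```c
#include <setjmp.h>

static jmp_buf buf;

static void jump_back(int val)
   {  /* helper: return to the most recent setjmp(buf) */
   longjmp(buf, val);
   }

int form_switch(int val)
   {  /* entire controlling expression of a switch */
   switch (setjmp(buf))
      {
   case 0:
      jump_back(val);
      return -1;           /* not reached */
   case 3:
      return 3;
   default:
      return 0;
      }
   }

int form_compare(void)
   {  /* one operand of a relational operator, other operand constant */
   if (2 < setjmp(buf))
      return 1;            /* reached only after longjmp(buf, 3) */
   jump_back(3);
   return 0;               /* not reached */
   }

int form_not(void)
   {  /* operand of unary ! */
   if (!setjmp(buf))
      jump_back(7);        /* direct invocation yielded zero */
   return 1;               /* second return yielded 7 */
   }

int form_statement(void)
   {  /* entire expression of an expression statement */
   static int visits;      /* static storage is unaffected by longjmp */
   visits = 0;
   setjmp(buf);            /* could also be written (void)setjmp(buf); */
   if (++visits < 2)
      jump_back(1);
   return visits;
   }
```

In form_statement the counter has static storage duration, so its value is well defined after the jump. An automatic counter would need a volatile type.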

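The second danger concerns the treatment of dynamic storage in a function that executes setjmp. Such storage comes in three flavors: the parameters you declare for the function, the data objects you declare with the auto storage class (even implicitly), and the data objects you declare with the register storage class.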

The problem arises because the code generator can elect to store some of these data objects in registers. This set of registers is often indistinguishable from the set that can hold temporary intermediate values in an expression evaluation. Hence, setjmp is obliged to save all such registers and restore them to an earlier state on a longjmp call. That means that certain dynamic data objects revert to an earlier state on a subsequent return from setjmp. Any changes in their stored values between returns from setjmp get lost.

Such behavior would be an annoying anomaly if it were predictable. The problem is that it is not predictable. You have no way of knowing which parameters and auto data objects end up in registers. Even data objects you declare as register are uncertain. A translator has no obligation to store any such data objects in registers. Hence, any number of data objects declared in a function have uncertain values if the function executes setjmp and a longjmp call transfers control back to the function. This is hardly a tidy state of affairs.

X3J11 addressed the problem by adding a minor kludge to the language. Declare a dynamic data object to have a volatile type and the translator knows to be more cautious. Such a data object will never be stored in a place that is altered by longjmp. This usage admittedly stretches the semantics of volatile, but it does provide a useful service.
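As a hedged illustration, the sketch below keeps a retry counter in an automatic data object that changes between the setjmp invocation and each longjmp call. The names count_retries, fail, and tries are hypothetical. Without the volatile qualifier, the value of tries after the jump would be indeterminate:

```c
#include <setjmp.h>

static jmp_buf env;

static void fail(void)
   {  /* stands in for any function that detects an error */
   longjmp(env, 1);
   }

int count_retries(int limit)
   {  /* jumps back limit times, then returns the count */
   volatile int tries = 0;  /* volatile: modified between setjmp and longjmp */

   setjmp(env);             /* each longjmp resumes just after this statement */
   if (tries < limit)
      {
      tries = tries + 1;
      fail();               /* does not return */
      }
   return tries;
   }
```

The parameter limit needs no such protection because the function never changes it after the call to setjmp.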

What The C Standard Says

Nonlocal jumps <setjmp.h>

The header <setjmp.h> defines the macro setjmp, and declares one function and one type, for bypassing the normal function call and return discipline. [106]

The type declared is

jmp_buf
which is an array type suitable for holding the information needed to restore a calling environment.

It is unspecified whether setjmp is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the name setjmp, the behavior is undefined.

Save calling environment

The setjmp macro

Synopsis

#include <setjmp.h>
int setjmp(jmp_buf env);

Description

The setjmp macro saves its calling environment in its jmp_buf argument for later use by the longjmp function.

Returns

If the return is from a direct invocation, the setjmp macro returns the value zero. If the return is from a call to the longjmp function, the setjmp macro returns a nonzero value.

Environmental constraint

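An invocation of the setjmp macro shall appear only in one of the following contexts: the entire controlling expression of a selection or iteration statement; one operand of a relational or equality operator with the other operand an integral constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement; the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or the entire expression of an expression statement (possibly cast to void).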

Restore calling environment

The longjmp function

Synopsis

#include <setjmp.h>
void longjmp(jmp_buf env, int val);

Description

The longjmp function restores the environment saved by the most recent invocation of the setjmp macro in the same invocation of the program, with the corresponding jmp_buf argument. If there has been no such invocation, or if the function containing the invocation of the setjmp macro has terminated execution [107] in the interim, the behavior is undefined.

All accessible objects have values as of the time longjmp was called, except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate.

As it bypasses the usual function call and return mechanisms, the longjmp function shall execute correctly in contexts of interrupts, signals and any of their associated functions. However, if the longjmp function is invoked from a nested signal handler (that is, from a function invoked as a result of a signal raised during the handling of another signal), the behavior is undefined.

Returns

After longjmp is completed, program execution continues as if the corresponding invocation of the setjmp macro had just returned the value specified by val. The longjmp function cannot cause the setjmp macro to return the value 0; if val is 0, the setjmp macro returns the value 1.

Footnotes:

106. These functions are useful for dealing with unusual conditions encountered in a low-level function of a program.

107. For example, by executing a return statement or because another longjmp call has caused a transfer to a setjmp invocation in a function earlier in the set of nested calls.
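The rule about the value returned after longjmp can be checked with a small sketch. The names value_seen and jump_with are hypothetical. The switch reports which value setjmp yields without ever assigning it:

```c
#include <setjmp.h>

static jmp_buf env;

static void jump_with(int val)
   {  /* helper: return to setjmp(env) with the given value */
   longjmp(env, val);
   }

int value_seen(int val)
   {  /* reports what setjmp yields after longjmp(env, val) */
   switch (setjmp(env))
      {
   case 0:                 /* direct invocation */
      jump_with(val);
      return -1;           /* not reached */
   case 1:                 /* val was 1, or val was 0 and became 1 */
      return 1;
   case 5:
      return 5;
   default:                /* some value this sketch does not track */
      return -2;
      }
   }
```

A call such as value_seen(0) yields 1, since longjmp cannot make setjmp return zero.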

Using <setjmp.h>

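You use <setjmp.h> whenever you need to bypass the normal function call and return discipline. The nonlocal goto that <setjmp.h> provides is a delicate mechanism. Use it only where you must and only in a few stylized ways. I recommend that you build on a standard pattern: isolate each call to setjmp in a small function of its own; call setjmp from the controlling expression of a switch statement; perform all the actual processing in a function (call it process) that you call from case zero of the switch statement; and, at any point within process or a function it calls, signal an error by calling longjmp with the argument 1 to restart or 2 to terminate.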

You can also add additional case labels to handle other argument values that longjmp can expect.

Here is what the top-level function might look like:

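#include <setjmp.h>
#include <stdio.h>

static jmp_buf jmpbuf;

void process(void);  /* does the real work; may call longjmp(jmpbuf, n) */

void top_level(void)
   { /* the top-level function */
   for (; ; )
      switch (setjmp(jmpbuf))
         {  /* switch on alternate returns */
      case 0:  /* first time */
         process();
         return;
      case 1:  /* restart */
         fputs("error: restarting\n", stderr);  /* placeholder report */
         break;
      case 2:  /* terminate */
         fputs("error: terminating\n", stderr);  /* placeholder report */
         return;
      default: /* unknown longjmp argument */
         fputs("error: unknown longjmp argument\n", stderr);
         return;
         }
   }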
I assume here that all references to jmpbuf are within this translation unit. If not, you must declare jmpbuf with external linkage. (Drop the storage class keyword static.) Alternatively, you must pass a pointer to jmpbuf to those functions that must access it.
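To see the other half of the pattern, here is a hedged, self-contained sketch that supplies a process function. The function read_record, the counter attempts, and the int return value of top_level are all assumptions made for the example. The low-level function fails twice and asks for a restart each time:

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf jmpbuf;
static int attempts;          /* static storage: unaffected by longjmp */

static void read_record(void)
   {  /* hypothetical low-level function that fails twice */
   if (++attempts < 3)
      longjmp(jmpbuf, 1);     /* ask top_level to restart */
   }

static void process(void)
   {  /* all the real work happens below this call */
   read_record();
   }

int top_level(void)
   {  /* returns the number of attempts, or -1 on termination */
   attempts = 0;
   for (; ; )
      switch (setjmp(jmpbuf))
         {  /* switch on alternate returns */
      case 0:  /* first time, and after each restart */
         process();
         return attempts;
      case 1:  /* restart */
         fputs("restarting\n", stderr);
         break;
      default: /* terminate or unknown argument */
         return -1;
         }
   }
```

Each pass through the loop executes setjmp afresh, so jmpbuf always describes the current invocation of top_level.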

Note in this regard that jmp_buf is an array type. If you write the argument jmpbuf, the translator alters it to a pointer to the first element of the array. That's what setjmp and longjmp expect. So even though jmpbuf appears to be passed by value, it is actually passed by reference. That's how setjmp can store the calling environment in jmpbuf.

For consistency, you should declare each parameter as jmp_buf buf and write the corresponding argument as jmpbuf. Don't declare the parameter as jmp_buf *pbuf or write the argument as &jmpbuf. The latter form is clearer but at odds with the long-standing conventions for calling setjmp and longjmp.
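A short sketch of that convention follows, with the hypothetical functions bail_out and run_guarded. The parameter is declared as jmp_buf buf and the argument is written as plain jmpbuf:

```c
#include <setjmp.h>

static void bail_out(jmp_buf buf, int code)
   {  /* buf is really a pointer into the caller's array */
   longjmp(buf, code);
   }

int run_guarded(int code)
   {  /* returns 0 normally, 1 if bail_out jumped back */
   jmp_buf jmpbuf;            /* valid only while run_guarded is active */

   if (setjmp(jmpbuf) != 0)
      return 1;
   if (code != 0)
      bail_out(jmpbuf, code); /* not &jmpbuf */
   return 0;
   }
```

Because jmpbuf is an automatic data object here, no function may call longjmp with it once run_guarded has returned.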

If you choose an alternate form for using setjmp, execute the macro in the smallest possible function you can write. If the translator does not treat setjmp specially, it has less opportunity to surprise you. If it is aware that setjmp is troublesome, it has less code to deoptimize for safety.

Additional caveats apply if you call longjmp from within a signal handler. I will discuss them next month.

Implementing <setjmp.h>

The only reliable way to implement setjmp and longjmp requires functions written in assembly language. You need an intimate knowledge of how the translator generates code. You also need to perform several operations that you cannot express safely in C, if at all.

Listing 1 shows the file setjmp.h. It has proved adequate for a variety of Standard C implementations. It assumes that the calling context can be stored as an array of int. That is usually the case even when the stored context includes data objects of diverse types. I use the internal header <yvals.h> to define the macro _NSETJMP that determines the number of elements in jmp_buf. As an example, the Borland Turbo C++ compiler for PC-compatibles requires that <yvals.h> contain the definition:

#define _NSETJMP 10
Note that <setjmp.h> declares a function named setjmp. It then masks this declaration with a macro that merely calls the function. The only reason for this silly exercise is to keep programs honest. A program should assume that setjmp is a macro. Hence the program cannot redeclare it in a translation unit that includes <setjmp.h>. A program should also assume that the Standard C library defines the name setjmp with external linkage. Hence the program cannot also provide such a definition even if it never includes <setjmp.h>. This implementation of <setjmp.h> endeavors to generate diagnostics for programs that are not maximally portable.
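The sketch below illustrates the shape of such a header. It is not the contents of Listing 1. Every name in it (MY_NSETJMP, my_jmp_buf, my_setjmp, probe) is hypothetical, and the function body is a stub that saves no calling context at all. It shows only the array typedef and the trick of masking a function with a macro of the same name:

```c
/* hypothetical stand-ins; a real <setjmp.h> uses the reserved names */
#define MY_NSETJMP 10                    /* plays the role of _NSETJMP */
typedef int my_jmp_buf[MY_NSETJMP];      /* calling context as an array of int */

int my_setjmp(my_jmp_buf env);           /* declare the function ... */
#define my_setjmp(env) my_setjmp(env)    /* ... then mask it with a macro */

int (my_setjmp)(my_jmp_buf env)          /* parentheses bypass the macro */
   {  /* stub only: a real version needs assembly language */
   env[0] = 0;
   return 0;
   }

int probe(void)
   {  /* the call below goes through the masking macro */
   my_jmp_buf buf;

   return my_setjmp(buf);
   }
```

The macro does not re-expand its own name, so the masked call still reaches the function.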

I will not show implementations of setjmp and longjmp here. Believe it or not, I have written working versions of these two functions in C for the VAX architecture. The idea was to illustrate their basic workings for those not schooled in assembly language. Upon reflection, however, I'm not sure that any good purpose was served by such a perverse exercise. Kids, don't try this at home.

This article is excerpted from P.J. Plauger, The Standard C Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1992).