February 2001/Exception Handling in Embedded C Programs

Embedded Systems

Exception Handling in Embedded C Programs

Yonatan Lehman

The most common way to emulate exceptions in C is through its setjmp/longjmp facility. The approach presented here is less complicated, yet it offers some surprisingly useful features, including a simple form of stack unwinding.

Introduction

Exception handling is a mechanism for handling failures within a system in a graceful and modular manner. A full discussion of exception handling is beyond the scope of this article; however, to help readers appreciate the method, I summarize the main features an exception handling mechanism should provide:

Separation of detection code from recovery code. A low level function that detects a failure of some sort should be able to say, "hang on — we have a problem; let’s abort the current sequence" without knowing how and where the problem will be handled. This is especially important when a function may be called from several places, and in different situations — each of which potentially needs different handling. The function must be able to raise an exception (signal that an error has occurred) from anywhere within the program, regardless of how deep it is in the call stack.

"Unwrapping" of the calling stack. If an exception occurs within a sequence of nested functions (A calls B, which calls C, which calls D etc.), execution should stop immediately in each of these functions; the deepest function should return immediately to its caller, which should return to its caller, and so on — until execution has propagated to a function that is high enough in the call tree to handle the exception.

Local cleanup. Each of the intermediate functions may need to do some cleanup, release resources, reset its state, and so on. This is necessary since each function has stopped in the middle of execution. The process of raising an exception must leave all parts of the system in a state that will allow recovery, by ensuring that no part of the system remains in an inconsistent state.

A way to identify which error occurred. In other words, functions that detect exceptions need to say "hey, we’ve got exception A," and functions that do error cleanup and recovery need to say "if we’ve got exception A, do this; if we’ve got exception B, do that."

Newer languages such as C++ and Java include exception handling as part of the language. In older languages such as C this mechanism must be emulated using more basic features.

The purpose of this article is to show a simple, systematic, unobtrusive, and efficient method for implementing exception handling in C. The method is relevant to embedded systems for two reasons: 1) embedded systems don’t "blue screen"; they try to recover and keep going, and 2) embedded systems are typically both time and memory critical, so the implementation must be both time and memory efficient.

An Added Bonus

In addition to the exception handling features listed above, this method provides a feature not normally found when exception handling is part of the language — it can trace the unwrapping of the stack. In other words, upon return from each of the functions D, C, B, and A, the mechanism can write information to a log. This log shows the calling sequence up to the time the exception occurred. Typical output is as follows:
boo.c[128] RAISE 100
foo.c[64] return 100
moo.c[345] return 100
nu.c[1234] RECOVER 100
This shows that the error was raised in file boo.c line 128. The function where the error was raised was called from line 64 in file foo.c. That function was called from file moo.c line 345. The exception was finally handled in file nu.c line 1234.

This sort of "postmortem" dump shows the full context of the exception, and is often useful, since it eliminates the need to reconstruct the situation that led to the failure. In embedded systems this is particularly important because failures may be timing dependent. It may take hours of running to reconstruct a problem, and simply using a debugger or adding printf statements can change the timing enough to make the problem disappear.

In the code that follows I refer to an abstract LOG operation. This operation can be simply a printf to standard output or to a file, or it can be a write to a memory resident buffer. What is important is that ASCII trace information can be written and it can be viewed at some point.
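As a rough illustration of the memory-resident option, the following sketch appends printf-style trace text to a fixed buffer. The names er_log_write, er_log_buf, er_log_len, and ER_LOG_SIZE are hypothetical (they are not taken from er.h), and the sketch assumes a C99 library that provides vsnprintf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical size and names; tune for the target. */
#define ER_LOG_SIZE 256

static char   er_log_buf[ER_LOG_SIZE];  /* memory-resident trace buffer */
static size_t er_log_len = 0;           /* bytes used, excluding the NUL */

/* printf-style append; text that does not fit is silently truncated. */
static void er_log_write(const char *fmt, ...)
{
    va_list ap;
    int n;
    size_t room = ER_LOG_SIZE - er_log_len;   /* always at least 1 */

    va_start(ap, fmt);
    n = vsnprintf(er_log_buf + er_log_len, room, fmt, ap);
    va_end(ap);

    if (n < 0)
        return;                         /* formatting error: drop the entry */
    if ((size_t)n >= room)
        er_log_len = ER_LOG_SIZE - 1;   /* truncated: buffer is now full */
    else
        er_log_len += (size_t)n;
}

/* The abstract LOG operation used by the macros in this article. */
#define ER_LOG er_log_write
```

A real system would probably use a circular buffer and protect it against concurrent writers; this sketch simply stops accepting text once the buffer is full.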

Using the Method

The best way to understand this exception mechanism is to see how it is actually used. The following is an example of a call tree: function A calls function B, which calls function C, which calls function D:
void A(...) {
   ...
   B(...);
   ...
}

void B(...) {
   ...
   C(...);
   ...
}

void C(...) {
   ...
   D(...);
   ...
}
Function D checks and discovers some sort of exceptional condition that it can’t handle by itself. Since D can’t recover, it signals its caller (function C) that it has had to abort execution. Similarly, C can’t handle the problem; it needs to "bounce" the problem up to function B. Function B hasn’t got the political clout to solve the problem either, so it too signals its caller, A. The buck stops in function A. It takes some recovery action and the system carries on execution.

A walk through a few code excerpts will show how this works in practice.

The first requirement of the method is that every function take an extra parameter called a status parameter. For readability, I give this status a special type, using a typedef as follows:
typedef unsigned long ER_status_t;
At the top level (the main function, or the top level of each task in a multi-threaded system) the program defines a status and initializes it to an OK value (usually zero).
main (...) {
   ER_status_t st = ER_OK;
   ...
}
Every function takes an extra parameter of type ER_status_t *. By convention, this is the last parameter. The parameter is passed from function to function. So the previous example actually looks as follows:
void A(..., ER_status_t *st_p){
   ...
   B(..., st_p); ER_CHK(st_p);
   ...
EXIT :;
}

void B(..., ER_status_t  *st_p) {
   ...
   C(..., st_p); ER_CHK(st_p);
   ...
EXIT :;
}

void C(..., ER_status_t *st_p) {
   ...
   D(..., st_p); ER_CHK(st_p);
   ...
EXIT :;
}
You will have noticed that in addition to the status pointer parameter passed to each function, I have added two additional elements. After invoking each function the code calls a macro ER_CHK, passing it the st_p parameter. In addition, every function has an EXIT label at the end of the function. I’ll get to the EXIT label in a moment.

First, observe what happens in function D, where the error is detected:
void D (..., ER_status_t *st_p)
{
   ...
   if (some condition) {
      ER_RAISE(st_p, error1);
   }

   EXIT : ;
}
ER_RAISE does the following (simplified for the moment):
#define ER_RAISE(ST_P, ST) \
/*1*/ {*(ST_P) = (ST); \
/*2*/   ER_LOG("%s(%d) RAISE %lx\n", \
             __FILE__,__LINE__, *(ST_P));\
/*3*/   goto EXIT;}
This macro has three parts:

It sets the status pointer to a given error value.

It writes a trace with the file and line to the log. (In C99 it could also include the function name, since C99 provides the predefined identifier __func__.)

Execution jumps — yes, via a goto — to the EXIT label at the end of the function.

So function D returns, in effect, from the point where it invoked the macro ER_RAISE.
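To make the raise step concrete, here is a minimal sketch of function D, assuming a C99 compiler so that the function name can be logged through __func__. The error code ER_BAD_INPUT, the input parameter, and the d_steps_done counter are invented for illustration, and ER_LOG is simply mapped to printf.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK        0UL
#define ER_BAD_INPUT 0x100UL   /* hypothetical error code */

#define ER_LOG printf          /* assumption: log to standard output */

/* Variant of ER_RAISE that also logs the function name (C99 __func__). */
#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); \
      ER_LOG("%s(%d) %s RAISE %lx\n", \
             __FILE__, __LINE__, __func__, *(ST_P)); \
      goto EXIT; }

static int d_steps_done;   /* records whether D ran to completion */

void D(int input, ER_status_t *st_p)
{
    d_steps_done = 0;
    if (input < 0) {
        ER_RAISE(st_p, ER_BAD_INPUT);
    }
    d_steps_done = 1;      /* skipped when the exception is raised */

EXIT :;
}
```

Calling D with a negative input leaves ER_BAD_INPUT in the caller's status and skips the rest of the function body; any other input leaves the status untouched.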

This brings us to the ER_CHK macro. This macro is placed after every function that takes a status parameter. It is defined (again, slightly simplified) as:
#define ER_CHK(ST_P) \
/*1*/{if (*(ST_P) != ER_OK) {\
/*2*/  ER_LOG("%s(%d) return %lx\n",\
         __FILE__,__LINE__, *(ST_P));\
/*3*/   goto EXIT;}}
Line 1 checks the status for a non-ER_OK value. If this happens then lines 2 and 3 write to the log and goto EXIT, as in ER_RAISE.
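The propagation described so far can be sketched as a small compilable chain B, C, D. The fail parameter, the ER_ERROR1 code, and the reached bit mask are illustrative assumptions; the macros follow the simplified definitions above, with ER_LOG mapped to printf.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK     0UL
#define ER_ERROR1 0x100UL   /* hypothetical error code */

#define ER_LOG printf       /* assumption: log to standard output */

#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); \
      ER_LOG("%s(%d) RAISE %lx\n", __FILE__, __LINE__, *(ST_P)); \
      goto EXIT; }

#define ER_CHK(ST_P) \
    { if (*(ST_P) != ER_OK) { \
        ER_LOG("%s(%d) return %lx\n", __FILE__, __LINE__, *(ST_P)); \
        goto EXIT; } }

static int reached;   /* bit mask of the code that ran after each call */

void D(int fail, ER_status_t *st_p)
{
    if (fail) {
        ER_RAISE(st_p, ER_ERROR1);
    }
    reached |= 1;           /* skipped if D raised */
EXIT :;
}

void C(int fail, ER_status_t *st_p)
{
    D(fail, st_p); ER_CHK(st_p);
    reached |= 2;           /* skipped if D raised */
EXIT :;
}

void B(int fail, ER_status_t *st_p)
{
    C(fail, st_p); ER_CHK(st_p);
    reached |= 4;           /* skipped if C propagated the error */
EXIT :;
}
```

When fail is nonzero, the log gets one RAISE line from D followed by one return line each from C and B, and none of the code after the calls executes.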

This process continues at each level, until it reaches the brave function A, which wants to handle the error. Typically, what the handler does is something like:
void A(..., ER_status_t *st_p) {
   ...
   B(...,st_p);
   if (*st_p == error1) {
      handle error1;
      ER_RECOVER(st_p);
   } else if (*st_p != ER_OK) {
      default action for other errors;
      ER_CHK(st_p);
   }
   ...
EXIT :;
}
Here, instead of calling ER_CHK, function A explicitly checks *st_p for all values that it knows how to handle. (Only error1 is shown in the example but it could “else if” any number of errors.) Assuming that it can recover from this error, function A calls the macro ER_RECOVER. This macro does something like:
#define ER_RECOVER(ST_P) \
  { ER_LOG("%s(%d) RECOVER %lx\n",\
           __FILE__,__LINE__, *(ST_P));\
    *(ST_P) = ER_OK;}
ER_RECOVER writes a special log trace to mark the end of the trace sequence. It then resets the status to ER_OK so that execution can continue normally.

Back to function A: in cases where it can’t handle the error, it can call ER_CHK to defer the exception handling to an even higher level (similar to "rethrowing" an exception in C++).
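The two outcomes in function A, recover or pass the error on, can be sketched as follows. The codes ER_ERROR1 and ER_ERROR2, the code parameter, and the recoveries counter are assumptions made for the example; B is a stand-in that raises whatever code it is given.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK     0UL
#define ER_ERROR1 0x100UL   /* hypothetical: A knows how to handle this */
#define ER_ERROR2 0x200UL   /* hypothetical: A cannot handle this */

#define ER_LOG printf       /* assumption: log to standard output */

#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); \
      ER_LOG("%s(%d) RAISE %lx\n", __FILE__, __LINE__, *(ST_P)); \
      goto EXIT; }

#define ER_CHK(ST_P) \
    { if (*(ST_P) != ER_OK) { \
        ER_LOG("%s(%d) return %lx\n", __FILE__, __LINE__, *(ST_P)); \
        goto EXIT; } }

#define ER_RECOVER(ST_P) \
    { ER_LOG("%s(%d) RECOVER %lx\n", __FILE__, __LINE__, *(ST_P)); \
      *(ST_P) = ER_OK; }

static int recoveries;      /* counts how often A recovered */

/* Stand-in for B: raises the requested code, or does nothing for ER_OK. */
void B(ER_status_t code, ER_status_t *st_p)
{
    if (code != ER_OK) {
        ER_RAISE(st_p, code);
    }
EXIT :;
}

void A(ER_status_t code, ER_status_t *st_p)
{
    B(code, st_p);
    if (*st_p == ER_ERROR1) {
        recoveries++;           /* "handle error1" */
        ER_RECOVER(st_p);       /* status is ER_OK again */
    } else if (*st_p != ER_OK) {
        ER_CHK(st_p);           /* pass the error up to A's caller */
    }
EXIT :;
}
```

After A(ER_ERROR1, &st) the status is back to ER_OK; after A(ER_ERROR2, &st) the caller of A still sees ER_ERROR2 and must deal with it.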

Time for a Quick Review

1) At the top level of the program (the “main” function), or the top of each task or interrupt handler in a real-time system, the program defines and initializes the status parameter.

2) The pointer to the status is passed to each function.

3) After each function returns, the ER_CHK macro is used to check the status, and if it is not ER_OK, to log the file and line and do a goto EXIT.

4) ER_RAISE is used to raise (start) an exception.

5) ER_RECOVER is used to "close" and reset an exception.

6) Note that the status is initialized only once. Functions that raise no exceptions need not explicitly set the status value to ER_OK; they can just leave the status value alone.

Why goto and Not return?

The problem with the return statement is that it forces multiple exits from the function. If the function needs to do some cleanup before it exits — for example, to release a resource it is holding — in the best case it will contain duplicated code; in the worst case, the programmer will forget to do it and the function will contain a bug!

The EXIT label provides a single point of exit from the function. It also acts as the exception handler for the function (the equivalent to the catch in C++). For example, consider a function to be used in a real-time system, in which a resource must be locked while it is being used:
void X(..., ER_status_t *st_p)
{
   lock();
   ...
   B1(st_p); ER_CHK(st_p);
   ...
   B2(st_p); ER_CHK(st_p);
   ...
EXIT :;
   unlock();
}
In this example X calls functions B1 and B2, within some sort of locking mechanism. Such locking is often needed in real-time systems to ensure that tasks or interrupts don’t interfere with each other. The mechanism could be a semaphore lock, or an interrupt disable, depending on the situation; for this example what matters is that the lock must be matched by an unlock, otherwise the system could become deadlocked.

Putting the unlock after the EXIT label ensures that the unlock is done even when the ER_CHK after B1 or B2 (or after any other function called) detects an error, since ER_CHK always exits via the EXIT label.

The EXIT label thus marks a "cleanup" handler that can do any kind of cleanup, such as releasing a lock, releasing allocated memory, or closing a file, before propagating the error to the calling function. Each function in the call stack has an opportunity to do any cleanup necessary.
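A compilable sketch of the locking case follows. The lock and unlock functions here only count nesting depth so the effect is visible; in a real system they would take a semaphore or disable interrupts. The fail parameter, lock_depth, and work_done are hypothetical names.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK     0UL
#define ER_ERROR1 0x100UL   /* hypothetical error code */

#define ER_LOG printf       /* assumption: log to standard output */

#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); \
      ER_LOG("%s(%d) RAISE %lx\n", __FILE__, __LINE__, *(ST_P)); \
      goto EXIT; }

#define ER_CHK(ST_P) \
    { if (*(ST_P) != ER_OK) { \
        ER_LOG("%s(%d) return %lx\n", __FILE__, __LINE__, *(ST_P)); \
        goto EXIT; } }

static int lock_depth;      /* stand-in for a real lock: 0 means unlocked */
static int work_done;       /* counts completed passes through X */

static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

void B1(int fail, ER_status_t *st_p)
{
    if (fail) {
        ER_RAISE(st_p, ER_ERROR1);
    }
EXIT :;
}

void X(int fail, ER_status_t *st_p)
{
    lock();
    B1(fail, st_p); ER_CHK(st_p);
    work_done++;            /* skipped if B1 raised */
EXIT :;
    unlock();               /* runs on both the normal and the error path */
}
```

Whether or not B1 raises, lock_depth is back to zero when X returns, which is the property that protects against deadlock.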

A More Complete Example

In the example below, four functions are invoked: A, B1, B2, and C. However, only functions B1 and B2 are in the locked region.
void X(..., ER_status_t *st_p) {
   A(st_p); ER_CHK(st_p);
   lock();
   B1(st_p); ER_CHK(st_p);
   B2(st_p); ER_CHK(st_p);
   unlock();
   C(st_p); ER_CHK(st_p);
EXIT :;
  ???
}
The problem here is that if B1 or B2 raises an exception, then X must perform the unlock, but if A or C raises an exception it must not. The solution is to use a local is_locked flag, and test it within the cleanup handler:
void X(..., ER_status_t *st_p) {
/*2*/ int is_locked = 0;
      A(st_p); ER_CHK(st_p);
/*4*/ lock(); is_locked = 1;
      B1(st_p); ER_CHK(st_p);
      B2(st_p); ER_CHK(st_p);
/*7*/ unlock(); is_locked = 0;
      C(st_p); ER_CHK(st_p);
EXIT :;
/*10*/ if (is_locked)
          unlock();
}
The flag is initialized to 0 in line 2, indicating that X has not yet performed the lock. Once X performs the lock, is_locked is set to 1 in line 4. X then calls functions B1 and B2. When X performs the unlock in line 7 it once again clears is_locked. Finally, after the EXIT label, X tests is_locked in line 10 and unlocks only if the flag is set. This guarantees that if ER_CHK jumps to EXIT immediately after A or C, or never jumps at all, the EXIT section does nothing; if EXIT is reached from the ER_CHK after B1 or B2, then the EXIT section does the necessary cleanup.
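The flag technique can be exercised with a small sketch in which a single hypothetical helper, step, stands in for A, B1, B2, and C and fails when its id matches fail_at. The counters lock_calls and unlock_calls are there only to show that every lock is matched by exactly one unlock.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK     0UL
#define ER_ERROR1 0x100UL   /* hypothetical error code */

#define ER_LOG printf       /* assumption: log to standard output */

#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); \
      ER_LOG("%s(%d) RAISE %lx\n", __FILE__, __LINE__, *(ST_P)); \
      goto EXIT; }

#define ER_CHK(ST_P) \
    { if (*(ST_P) != ER_OK) { \
        ER_LOG("%s(%d) return %lx\n", __FILE__, __LINE__, *(ST_P)); \
        goto EXIT; } }

static int lock_calls, unlock_calls;
static void lock(void)   { lock_calls++; }
static void unlock(void) { unlock_calls++; }

/* Stand-in for A (1), B1 (2), B2 (3), and C (4): fails when id == fail_at. */
static void step(int id, int fail_at, ER_status_t *st_p)
{
    if (id == fail_at) {
        ER_RAISE(st_p, ER_ERROR1);
    }
EXIT :;
}

void X(int fail_at, ER_status_t *st_p)
{
    int is_locked = 0;
    step(1, fail_at, st_p); ER_CHK(st_p);   /* A: outside the lock  */
    lock(); is_locked = 1;
    step(2, fail_at, st_p); ER_CHK(st_p);   /* B1: inside the lock  */
    step(3, fail_at, st_p); ER_CHK(st_p);   /* B2: inside the lock  */
    unlock(); is_locked = 0;
    step(4, fail_at, st_p); ER_CHK(st_p);   /* C: outside the lock  */
EXIT :;
    if (is_locked)
        unlock();
}
```

For every value of fail_at from 0 (no failure) to 4, the number of unlock calls equals the number of lock calls, and a failure in A never touches the lock at all.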

Usage Guidelines

1) Never put an ER_CHK after the EXIT label. This causes an infinite loop: the status is still not ER_OK, so ER_CHK jumps back to EXIT, execution reaches the same ER_CHK again, and so on. You will probably fall into this trap at least once, especially if you cut and paste a function call (along with its ER_CHK) after the EXIT. See the source code provided with this article (www.cuj.com/code) for a way out of this problem.

2) Be careful when doing loops or ifs on a single statement. The following code:
while (something)
   some_function(st_p); ER_CHK(st_p);
is wrong. (The ER_CHK is not in the loop.) This is correct:
while (something)
{
   some_function(st_p); ER_CHK(st_p);
}
3) I always put the ER_CHK at the end of the line, where it is unobtrusive; otherwise it can double the number of lines of code, making it harder to read.

4) Don’t forget to initialize the top level status to ER_OK, and to reset it to ER_OK when you recover (unless you use ER_RECOVER).

5) Don’t cheat on the ER_CHK; put one after every function you call. If you don’t you will get a trace which shows not where the function was called but where the ER_CHK was done. This trace will be misleading and you may spend hours trying to figure out how you got a trace in function X from function Y when it never calls that function.

6) If a function raises no exceptions, and calls no functions that affect the exception status, then you can eliminate the status parameter if you really must — but it is better to include it in all functions, since:

a) It saves you remembering when you need a status and when not.

b) When you eventually do call another function that does take a status, it saves you having to add status to a long chain of functions. (What usually happens is that a local status is declared, and exceptions don’t get propagated all the way up.)
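Guideline 1 raises the question of what to do when cleanup code after EXIT must itself call a function that takes a status. One possible approach, offered here as an assumption and not necessarily what er.h does, is to give the cleanup call its own local status, test it with a plain if (no goto), and copy it to the caller's status only when no earlier error is pending. The functions do_work and close_device and the code ER_CLOSE are hypothetical.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK     0UL
#define ER_ERROR1 0x100UL   /* hypothetical: failure in the main work */
#define ER_CLOSE  0x200UL   /* hypothetical: failure during cleanup   */

#define ER_LOG printf       /* assumption: log to standard output */

#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); \
      ER_LOG("%s(%d) RAISE %lx\n", __FILE__, __LINE__, *(ST_P)); \
      goto EXIT; }

#define ER_CHK(ST_P) \
    { if (*(ST_P) != ER_OK) { \
        ER_LOG("%s(%d) return %lx\n", __FILE__, __LINE__, *(ST_P)); \
        goto EXIT; } }

static void do_work(int fail, ER_status_t *st_p)
{
    if (fail) {
        ER_RAISE(st_p, ER_ERROR1);
    }
EXIT :;
}

static void close_device(int fail, ER_status_t *st_p)
{
    if (fail) {
        ER_RAISE(st_p, ER_CLOSE);
    }
EXIT :;
}

void X(int work_fails, int close_fails, ER_status_t *st_p)
{
    do_work(work_fails, st_p); ER_CHK(st_p);
EXIT :;
    {
        ER_status_t cleanup_st = ER_OK;   /* local status: no jump back to EXIT */
        close_device(close_fails, &cleanup_st);
        if (cleanup_st != ER_OK && *st_p == ER_OK)
            *st_p = cleanup_st;           /* report only if nothing failed earlier */
    }
}
```

The design choice here is that the first error wins: if both the work and the cleanup fail, the caller sees ER_ERROR1, and the cleanup failure appears only in the log.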

Exceptions for Embedded Systems

The implementation shown above is fine for systems where memory is plentiful, and the memory overhead of doing ER_CHK after each function is not significant. In embedded systems, more careful coding is necessary. The following snippets show how to use three new macros, ER_DEF_FILE, ER_ENTER, and ER_EXIT, to achieve a more memory efficient implementation.
ER_DEF_FILE("my_file.c");

void A(...,ER_status_t *st_p)
{
   int      i;
   ER_ENTER();
   ...
       B(st_p); ER_CHK(st_p);
   ...
EXIT :;
  ER_EXIT(st_p);
}
The ER_DEF_FILE macro is added once to each file. The ER_ENTER and ER_EXIT macros are, respectively, the first and last statements in each function.

To understand what these macros do, look again at the first implementation of ER_CHK:
#define ER_CHK(ST_P) \
 { if (*(ST_P) != ER_OK) {\
    ER_LOG("%s(%d) return %lx\n", \
    __FILE__,__LINE__, *(ST_P));\
    goto EXIT;}}
This code generates memory overhead in a number of places:

1) In some compilers, __FILE__ generates a (long) full path name. This can waste memory, and also makes the log entries unreadable.

2) Every ER_CHK generates and stores a new copy of the file name (__FILE__).

3) The ER_LOG, taking three parameters, is called by every ER_CHK. (Although this could conceivably slow things down when an exception occurs, that is not the primary concern. The primary concern is that every appearance of ER_LOG generates code to store the parameters, which represents a memory overhead whether the function is called or not.)

4) A copy of the format string is stored for each ER_CHK.

The new macros solve these problems as follows. ER_DEF_FILE is defined as:
#define ER_DEF_FILE(F) \
static char _er_filename[] = F;\
static void _er_log_error(\
   ER_status_t  status,\
   int          line) \
{\
   ER_LOG("%s(%d) error %lx\n",\
      _er_filename, line, status);\
}
This macro does two things. First, it defines a file-scope static array of char, initialized with the name given as the parameter F. This provides a string that can be used anywhere __FILE__ would be used, but the string is allocated only once per file. Second, the macro defines a static function _er_log_error, which takes the status and the line number as parameters, and generates the ER_LOG trace just as ER_CHK did.

Now ER_CHK can be redefined as follows:
#define ER_CHK(ST_P) \
   { if (*(ST_P) != ER_OK) {\
      _er_log_error(*(ST_P), __LINE__);\
      goto EXIT;}}
This eliminates overheads 1, 2, and 4 above and reduces overhead 3, by passing only two parameters instead of three. However, we can do better than that, by using the macros ER_ENTER and ER_EXIT, and by redefining ER_RAISE and ER_CHK.

ER_ENTER opens a block and defines a local variable _er_line:
#define ER_ENTER() {int _er_line;

ER_RAISE and ER_CHK do
   ....\
   _er_line = __LINE__; goto EXIT;...
ER_RAISE and ER_CHK do this instead of calling the log function; each one thus costs an assignment of a constant to a local variable (which is often stored in a register) instead of a function call.

Finally, ER_EXIT is defined as follows:
#define ER_EXIT(ST_P) \
    {if (*(ST_P) != ER_OK) \
    _er_log_error(*(ST_P), _er_line);}}
Note the extra curly brace at the end. This closes the block opened by ER_ENTER.

ER_EXIT tests the status, and if it is non-zero it writes to the log using the static function generated by the ER_DEF_FILE macro, passing it the line number that was saved by ER_CHK or ER_RAISE. This reduces the overhead to a function call per function instead of a function call per ER_CHK.
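Putting the fragments together, the compact scheme might look like the sketch below. This is an assembled illustration, not the contents of er.h: _er_line is initialized to 0 so it is never read uninitialized, ER_LOG is a C99 variadic macro that also counts calls (er_log_count is a hypothetical counter), and ER_DEF_FILE is invoked without a trailing semicolon to avoid an empty declaration at file scope.

```c
#include <stdio.h>

typedef unsigned long ER_status_t;
#define ER_OK     0UL
#define ER_ERROR1 0x100UL   /* hypothetical error code */

static int er_log_count;    /* hypothetical: counts log calls */
#define ER_LOG(...) (er_log_count++, printf(__VA_ARGS__))

/* One file-name string and one logging function per source file. */
#define ER_DEF_FILE(F) \
    static char _er_filename[] = F; \
    static void _er_log_error(ER_status_t status, int line) \
    { \
        ER_LOG("%s(%d) error %lx\n", _er_filename, line, status); \
    }

/* Opens a block that ER_EXIT closes. */
#define ER_ENTER()  { int _er_line = 0;

/* Record the line and jump; no function call at the detection site. */
#define ER_RAISE(ST_P, ST) \
    { *(ST_P) = (ST); _er_line = __LINE__; goto EXIT; }

#define ER_CHK(ST_P) \
    { if (*(ST_P) != ER_OK) { _er_line = __LINE__; goto EXIT; } }

/* One log call per function, made only when an error is pending. */
#define ER_EXIT(ST_P) \
    { if (*(ST_P) != ER_OK) _er_log_error(*(ST_P), _er_line); } }

ER_DEF_FILE("my_file.c")

void D(int fail, ER_status_t *st_p)
{
    ER_ENTER();
    if (fail) {
        ER_RAISE(st_p, ER_ERROR1);
    }
EXIT :;
    ER_EXIT(st_p);
}

void C(int fail, ER_status_t *st_p)
{
    ER_ENTER();
    D(fail, st_p); ER_CHK(st_p);
EXIT :;
    ER_EXIT(st_p);
}
```

With fail set, D and C each write exactly one log line on the way out; with fail clear, nothing is logged and the only cost per ER_CHK is the status test.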

Conclusion

The macros defined here are useful both for implementing exception handling and recovery, and as a diagnostic tool. They are a great help if they are used consistently, which is easy to do if you include them from the beginning and as coding proceeds. Adding these macros afterward, or when you have a specific problem, can be a pain in the neck.

Along with this article I have provided a file er.h, which implements these macros. (See www.cuj.com/code.) The file defines several versions of ER_CHK that trade off different levels of memory compactness against programming convenience. I also include a file er_example.c, which demonstrates use of the exception macros.

Yonatan Lehman has worked as a programmer, software, and system architect for 18 years. He holds a BSc from Imperial College, London. He is currently working for Zen Research, Israel and can be reached at ylehman@ieee.org.