ELF: Exception Handling and Logging Framework

By Sony Antony

Production-support engineers have always complained about the lack of logging in software systems. Logging becomes particularly important when something unexpected happens inside the running processes. As a result, logging is closely related to exception handling in C++.

In this article, I present an easy-to-use technique for managing exception handling and logging in a C++ application. Two of the added benefits of this technique are Java-like exceptions with stacktrace and the ability to monitor method entry and exit in real time (method invocation traceability).

I also describe a modular framework called "ELF" that uses this logging technique to let applications log their events to any number of local or remote computers. Though ELF is written for UNIX-like systems, it is easy to port it to other operating-system environments, since its networking- and OS-specific features are abstracted inside a small number of classes.

The Technique

The technique I present requires that all exceptions thrown inside the application be derived from the common abstract base class Exception (Listing 1), which, in turn, is derived from std::exception (the base class of all exceptions defined in the C++ Standard). Though it is possible to invoke the methods of class Exception directly, most of the time, it is easier to use one of the five macros provided to operate on the Exception object. Exception is a stackable class. This means that an Exception object can contain another Exception object that, in turn, can contain yet another Exception object, and so on. This makes it possible for a method from a higher-level abstraction to catch lower-level exceptions, substituting them with higher-level exceptions without losing the original low-level exception; see Listing 2.

You can use the macro DEFINE_EXCEPTION() to define a new class derived from Exception. Similarly, you can use one of the four THROWxxx macros in Listing 1 to fill in the attributes and to throw an exception.
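The stacking idea can be illustrated with a minimal sketch. It is not the code from Listing 1: the members cause() and clone(), the macro SKETCH_DEFINE_EXCEPTION, and the two example exception types are all hypothetical names chosen for illustration, and the sketch assumes a C++11 or later compiler.

```cpp
#include <exception>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's abstract Exception base class.
class Exception : public std::exception {
public:
    explicit Exception(std::string message,
                       std::shared_ptr<const Exception> cause = nullptr)
        : message_(std::move(message)), cause_(std::move(cause)) {}

    const char* what() const noexcept override { return message_.c_str(); }

    // The nested lower-level exception, or nullptr at the bottom of the stack.
    const Exception* cause() const noexcept { return cause_.get(); }

    // Polymorphic copy, so a caught exception can be nested without slicing.
    virtual std::shared_ptr<const Exception> clone() const = 0;

private:
    std::string message_;
    std::shared_ptr<const Exception> cause_;
};

// Hypothetical macro in the spirit of DEFINE_EXCEPTION(); not ELF's definition.
#define SKETCH_DEFINE_EXCEPTION(Name)                                 \
    class Name : public Exception {                                   \
    public:                                                           \
        using Exception::Exception;                                   \
        std::shared_ptr<const Exception> clone() const override {     \
            return std::make_shared<Name>(*this);                     \
        }                                                             \
    }

SKETCH_DEFINE_EXCEPTION(SocketException);
SKETCH_DEFINE_EXCEPTION(LoginException);

// Low-level operation that fails.
void readPassword() { throw SocketException("connection reset by peer"); }

// Higher-level operation: substitutes its own exception but keeps the cause.
void login() {
    try {
        readPassword();
    } catch (const Exception& low) {
        throw LoginException("login failed", low.clone());
    }
}

// Collects the messages from the outermost exception down to the root cause.
std::vector<std::string> describe(const Exception& e) {
    std::vector<std::string> lines;
    for (const Exception* cur = &e; cur != nullptr; cur = cur->cause())
        lines.push_back(cur->what());
    return lines;
}
```

A caller that catches the exception thrown by login() and passes it to describe() gets the high-level message first and the original low-level message last, so nothing is lost in the substitution.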

I find the following exception-handling policies useful:

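The exception specification for all methods declares the method as throwing an exception of the base class Exception, as in void foo() throw( Exception ). This way, if you are not interested in catching an exception, you can let it escape up the stack, irrespective of the actual type of the Exception. (Dynamic exception specifications of this form were deprecated in C++11 and removed in C++17, so this policy applies only to code built against earlier standards.)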
Every exception thrown should be of a type derived from the base class Exception. Unless customization is required, the macro DEFINE_EXCEPTION(XXXException) can be used to define an exception that could be thrown from methods of the class XXX.
If a method is meant to be included in the stacktrace, or if its entry/exit should be made traceable, it should be defined using the macro pair ELF_FUNCT_DEF()/ELF_FUNCT_DEF_END (Listing 2). I recommend defining at least the most important methods this way.
The macros ELF_FUNCT_DEF() and ELF_FUNCT_DEF_END are defined as in Figure 1.
As can be seen, the trick is to enclose the actual method body within a try/catch block. The catch is used to catch all Exceptions that happen to pass through the method. The exception handler adds the current method's signature to the Exception's StackFrame. The caught Exception is then rethrown to the higher-level callers. The original user code of the method appears as the body of this added try/catch block.
logManager is a Singleton object of type LogManager, the class acting as the entry point for all logging-related activities. FunctCallMonitor is a lightweight class within the scope of LogManager whose constructor logs the method entry if the method's call tracing has been turned on. Similarly, FunctCallMonitor's destructor logs the method's exit. It uses the Singleton logManager object to do the actual logging. (For more information, see the accompanying sidebar entitled "For a More Java-like Stacktrace.")
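The try/catch wrapping and the entry/exit monitor can be sketched roughly as follows. This is not the code from Figure 1: the macros SKETCH_FUNCT_DEF and SKETCH_FUNCT_DEF_END, the class TracedException, and the traceLog() function (an in-memory stand-in for logManager) are hypothetical names, and tracing is always on in this sketch.

```cpp
#include <exception>
#include <string>
#include <utility>
#include <vector>

// In-memory stand-in for the logManager Singleton (an assumption of this sketch).
std::vector<std::string>& traceLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical exception type that accumulates a stacktrace as it propagates.
class TracedException : public std::exception {
public:
    explicit TracedException(std::string message) : message_(std::move(message)) {}
    const char* what() const noexcept override { return message_.c_str(); }
    void addFrame(const std::string& signature) { frames_.push_back(signature); }
    const std::vector<std::string>& frames() const { return frames_; }

private:
    std::string message_;
    std::vector<std::string> frames_;
};

// Logs method entry in its constructor and method exit in its destructor.
class FunctCallMonitor {
public:
    explicit FunctCallMonitor(const char* signature) : signature_(signature) {
        traceLog().push_back(std::string("enter ") + signature_);
    }
    ~FunctCallMonitor() { traceLog().push_back(std::string("exit ") + signature_); }
    FunctCallMonitor(const FunctCallMonitor&) = delete;
    FunctCallMonitor& operator=(const FunctCallMonitor&) = delete;

private:
    const char* signature_;
};

// Opens the function body, creates the monitor, and starts the try block.
#define SKETCH_FUNCT_DEF(signature)                            \
    {                                                          \
        const char* sketchSignature_ = signature;              \
        FunctCallMonitor sketchMonitor_(sketchSignature_);     \
        try

// Adds this method to the exception's stacktrace and rethrows it.
#define SKETCH_FUNCT_DEF_END                                   \
        catch (TracedException& e) {                           \
            e.addFrame(sketchSignature_);                      \
            throw;                                             \
        }                                                      \
    }

void parse()
SKETCH_FUNCT_DEF("void parse()")
{
    throw TracedException("bad input");
}
SKETCH_FUNCT_DEF_END

void load()
SKETCH_FUNCT_DEF("void load()")
{
    parse();
}
SKETCH_FUNCT_DEF_END
```

Because the monitor object lives outside the try block, its destructor runs during stack unwinding after the handler has added the frame, so the exit of each method is logged even when an exception passes through it.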

The ELF Framework

ELF adds a distributed logging mechanism to the technique I just described. All the implementation details are hidden behind the Singleton object logManager, of type LogManager, which acts as the entry point for user code. LogManager can log in three different ways: local logging using files, remote logging using UDP for lightly loaded systems, and remote logging using TCP for heavily loaded systems. logManager is initialized by passing a Properties object, which, like the Java Properties class, encapsulates a set of configuration parameters. There is also a Singleton instance of the Properties class called "appConfig," similar to the one returned by Java's System.getProperties() call. appConfig is expected to be initialized at application startup from the configuration file, so that application-wide configuration is available. A new type of logging can be added to the logManager by using the method addLogWriter(). You can optionally pass a Properties object if the default appConfig Properties is not to be used. This way, if TCP-based logging to two different remote hosts is desired, addLogWriter() can be invoked twice, each time passing a different Properties object containing the IP address and port of one log server.

The class LogManager is centered around the abstract class LogWriter, which acts as the interface for all six kinds of logging: the five data levels (CRITICAL, ERROR, AUDIT, WARNING, and DEBUG) plus function call tracing, each with a similarly named method. Like Exception, LogWriter is a stackable class: a LogWriter can contain another LogWriter, which can contain yet another, and so on. This recursive chaining is how multiple log destinations are supported. Whenever a logging method is called on a LogWriter, it first checks whether it contains a nested LogWriter object. If it has one, the same method is invoked on the nested LogWriter before the outer one does its own logging. LogWriters are instantiated using the factory method createLogWriter(), which uses three implementation classes (FileWriterImpl, UdpLogWriterImpl, and TcpLogWriterImpl) to handle local file-based logging, remote UDP-based logging, and remote TCP-based logging, respectively.
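The chaining behavior might look like the following sketch. It shows only one of the six logging methods, and the names error(), writeError(), and MemoryLogWriter are hypothetical; MemoryLogWriter writes to a vector so that the call order is visible, where the real implementation classes would write to a file or socket. The sketch assumes C++14 for std::make_unique.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a stackable log writer (one level shown).
class LogWriter {
public:
    explicit LogWriter(std::unique_ptr<LogWriter> nested = nullptr)
        : nested_(std::move(nested)) {}
    virtual ~LogWriter() = default;

    void error(const std::string& message) {
        if (nested_) nested_->error(message);  // nested writer logs first
        writeError(message);                   // then this writer logs
    }

protected:
    // Destination-specific output, supplied by each implementation class.
    virtual void writeError(const std::string& message) = 0;

private:
    std::unique_ptr<LogWriter> nested_;
};

// Test double that records what it was asked to log.
class MemoryLogWriter : public LogWriter {
public:
    MemoryLogWriter(std::string name, std::vector<std::string>& sink,
                    std::unique_ptr<LogWriter> nested = nullptr)
        : LogWriter(std::move(nested)), name_(std::move(name)), sink_(sink) {}

protected:
    void writeError(const std::string& message) override {
        sink_.push_back(name_ + ": ERROR " + message);
    }

private:
    std::string name_;
    std::vector<std::string>& sink_;
};
```

For example, constructing a writer named "file" around a nested writer named "tcp" and calling error("disk full") on the outer one records the "tcp" line first and the "file" line second, matching the order described above.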

By using the methods setDataLogLevel() and turnOn/OffFuncCallTracing() of LogManager, it is possible to change the current logging level dynamically. But for a server process that is always running, you must implement a command-receiving mechanism that listens for requests for changing the log level. Such a mechanism could be as simple as a separate thread listening at a well-published port, or as sophisticated as a full-fledged CORBA interface.
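As one possible shape for such a command-receiving mechanism, the sketch below parses a one-line text command and applies it to thread-safe settings. The command syntax ("level DEBUG", "trace on"), the class LogSettings, and the function applyCommand() are all assumptions made for illustration. The sketch also assumes that the five data levels are ordered from least to most verbose, which the text does not state. A listener thread would read each line from its socket and hand it to applyCommand().

```cpp
#include <atomic>
#include <sstream>
#include <string>

// Assumed ordering: later enumerators are more verbose.
enum class DataLogLevel { Critical, Error, Audit, Warning, Debug };

// Hypothetical holder for the settings that the article's LogManager exposes.
class LogSettings {
public:
    void setDataLogLevel(DataLogLevel level) { level_.store(level); }
    DataLogLevel dataLogLevel() const { return level_.load(); }
    void turnOnFuncCallTracing() { tracing_.store(true); }
    void turnOffFuncCallTracing() { tracing_.store(false); }
    bool funcCallTracing() const { return tracing_.load(); }

    // A message is logged if it is no more verbose than the current level.
    bool shouldLog(DataLogLevel level) const { return level <= level_.load(); }

private:
    std::atomic<DataLogLevel> level_{DataLogLevel::Warning};
    std::atomic<bool> tracing_{false};
};

// Applies one command line; returns false if the command is not recognized.
bool applyCommand(LogSettings& settings, const std::string& line) {
    std::istringstream in(line);
    std::string verb, arg;
    if (!(in >> verb >> arg)) return false;

    if (verb == "level") {
        if (arg == "CRITICAL") settings.setDataLogLevel(DataLogLevel::Critical);
        else if (arg == "ERROR") settings.setDataLogLevel(DataLogLevel::Error);
        else if (arg == "AUDIT") settings.setDataLogLevel(DataLogLevel::Audit);
        else if (arg == "WARNING") settings.setDataLogLevel(DataLogLevel::Warning);
        else if (arg == "DEBUG") settings.setDataLogLevel(DataLogLevel::Debug);
        else return false;
        return true;
    }
    if (verb == "trace") {
        if (arg == "on") settings.turnOnFuncCallTracing();
        else if (arg == "off") settings.turnOffFuncCallTracing();
        else return false;
        return true;
    }
    return false;
}
```

Using atomics lets the listener thread change the settings while application threads keep logging, without a lock on the logging path.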

Be warned that all of the UDP-based servers will simply drop the incoming packets under heavy load (including the UNIX syslogd log daemon that uses UDP, if configured to accept remote log clients). So UDP should be chosen only if the log server is expecting a light load, as is the case when function call tracing is turned off. Although UDP is not as reliable as TCP, UDP offers two advantages. First, it does not keep a connection open to the log server, thereby saving the client resources. Second, the client's write() system call will never block, irrespective of the server load. On the other hand, TCP ensures that every bit of data logged by the client application is actually logged by the server. At the same time, if the log server's TCP receive buffer is full and if it exercises flow control, the write() attempt may block until the server has consumed the pending data in its receive buffer. Since a high-efficiency log server, like the one used by ELF, reduces the possibility of a full buffer, it is better to use the LogServer class in the TCP mode for remote logging.

A log server that can operate in either UDP or TCP mode receives the logging data on the remote end. The log server is based on the abstract base class LogServer. A factory method, createLogServer(), instantiates a concrete object of type TcpLogServerImpl or UdpLogServerImpl, depending on the mode in which the log server is launched.

ELF uses SIGHUP to shut down the log server gracefully. Signal handling in C++ is a complex area that can yield unexpected results if not dealt with carefully. Most C++ objects, such as standard IO streams and STL containers, are not reentrant (async-signal safe). Even most of the C library functions, like printf() and exit(), are not async-signal safe [2]. This means that such functions and objects cannot be used inside a signal handler unless you take special care to define execution points within the process where signals are allowed. Such points should not be inside any of the nonreentrant objects and functions. If they happen to be inside a method or object, you should take care not to use the same method or object inside the signal handler. The easiest way to handle this situation is to block the signals for most parts of the program, and then to unblock them only around async-signal-safe blocking system calls. This way, you can use any functions and objects in the rest of the program without regard to their async-signal safety.

LogServer defines two such points around the async-signal safe blocking system calls read() and poll() for UDP and TCP, respectively. To generalize the concept of pre- and post-blocking processing, I define an abstract callback base class called "BlockingOperationSentry." A derived class must override its preBlock() and postBlock() methods. A blocking operation is performed like the following:

BlockingOperationSentry* sentry = ... ;
{
    BlockingOperationSentryGuard guard( sentry ) ;
    blockingOperation() ;
}

BlockingOperationSentryGuard is a simple helper class that is responsible for invoking the preBlock() and postBlock() methods on the sentry object using its constructor and destructor, respectively. In the case of LogServer, preBlock() unblocks the signal SIGHUP, and postBlock() blocks it again. This technique ensures that SIGHUP is delivered to the process only during the blocking system call that gets invoked between the preBlock() and the postBlock() calls.
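A minimal sketch of this arrangement on a POSIX system is shown below. The class names BlockingOperationSentry and BlockingOperationSentryGuard and the methods preBlock() and postBlock() come from the text. The rest is assumed for illustration: the class SighupSentry, the handler onSighup(), the flag shutdownRequested, and the helpers installSighupHandler() and sighupBlocked(). The sketch uses sigprocmask(), which is appropriate for a single-threaded process.

```cpp
#include <signal.h>

// Callback interface described in the text.
class BlockingOperationSentry {
public:
    virtual ~BlockingOperationSentry() = default;
    virtual void preBlock() = 0;
    virtual void postBlock() = 0;
};

// Calls preBlock() on construction and postBlock() on destruction.
class BlockingOperationSentryGuard {
public:
    explicit BlockingOperationSentryGuard(BlockingOperationSentry* sentry)
        : sentry_(sentry) { sentry_->preBlock(); }
    ~BlockingOperationSentryGuard() { sentry_->postBlock(); }
    BlockingOperationSentryGuard(const BlockingOperationSentryGuard&) = delete;
    BlockingOperationSentryGuard& operator=(const BlockingOperationSentryGuard&) = delete;

private:
    BlockingOperationSentry* sentry_;
};

// Hypothetical sentry that opens a window for SIGHUP around a blocking call.
class SighupSentry : public BlockingOperationSentry {
public:
    SighupSentry() {
        sigemptyset(&hup_);
        sigaddset(&hup_, SIGHUP);
    }
    void preBlock() override { sigprocmask(SIG_UNBLOCK, &hup_, nullptr); }
    void postBlock() override { sigprocmask(SIG_BLOCK, &hup_, nullptr); }

private:
    sigset_t hup_;
};

// The handler only sets a flag, which is async-signal safe.
volatile sig_atomic_t shutdownRequested = 0;
extern "C" void onSighup(int) { shutdownRequested = 1; }

void installSighupHandler() {
    struct sigaction action {};
    action.sa_handler = onSighup;
    sigemptyset(&action.sa_mask);
    sigaction(SIGHUP, &action, nullptr);
}

// Reports whether SIGHUP is currently blocked in the calling thread.
bool sighupBlocked() {
    sigset_t current;
    sigemptyset(&current);
    sigprocmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, SIGHUP) == 1;
}
```

In this sketch the program would call postBlock() once at startup to block SIGHUP, then wrap each blocking read() or poll() in a guard and check shutdownRequested after the guard goes out of scope. A SIGHUP that arrives while blocked stays pending and is delivered as soon as preBlock() unblocks it.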

UdpLogServerImpl is a simple iterative server, reading and processing one datagram at a time, while TcpLogServerImpl is based on the high-performance IO multiplexing pattern [4]. TcpLogServerImpl uses the poll() system call to wait for events on any of the connected sockets, or on the original TCP socket listening on the well-published port. UdpLogServerImpl uses only one socket to listen for the client requests and to process them. TcpLogServerImpl, on the other hand, holds six TCP connections for each client that uses the server for logging, because each level of logging (debug, warning, and so on) uses its own independent TCP connection.
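The core of IO multiplexing with poll() can be sketched as a single wait over many descriptors. The function drainReady() below is a hypothetical helper, not part of ELF. It takes any readable file descriptors, waits once, and reads whatever is available from those that are ready. A real server would loop, accept new connections on the listening socket, and handle closed connections.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cstddef>
#include <string>
#include <vector>

// Waits up to timeoutMs for any descriptor to become readable, then reads
// once from each ready descriptor. Element i of the result holds the bytes
// read from fds[i], or is empty if fds[i] was not ready.
std::vector<std::string> drainReady(const std::vector<int>& fds, int timeoutMs) {
    std::vector<pollfd> watched;
    for (int fd : fds) watched.push_back({fd, POLLIN, 0});

    std::vector<std::string> received(fds.size());
    int ready = poll(watched.data(), watched.size(), timeoutMs);
    if (ready <= 0) return received;  // timeout or error: nothing to read

    for (std::size_t i = 0; i < watched.size(); ++i) {
        if (watched[i].revents & POLLIN) {
            char buffer[512];
            ssize_t count = read(watched[i].fd, buffer, sizeof buffer);
            if (count > 0) received[i].assign(buffer, static_cast<std::size_t>(count));
        }
    }
    return received;
}
```

With one descriptor per logging level per client, a single thread can service all of them this way without blocking on an idle connection.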

You can eliminate most of the extra code generated by the macros with the help of some compile-time macros, as explained in the README file. Once the initial development stage is over, and after you have enough confidence in the software's stability, you can selectively remove features like the function call traceability to make the code smaller and faster. (See the accompanying sidebar entitled "Launching the Log Server.")

Conclusion

ELF is a user-friendly framework that gives applications distributed logging, detailed exceptions with stacktrace, and function call traceability. ELF uses a modular design, which makes it possible to include different parts of the ELF framework in a selective fashion. This design lets you maximize the logging capability in the initial stages of software development, then eliminate the extra code when the software is more stable, with the help of some compile-time switches.

Resources

[1] Chaudhry, Puneesh. "A Per-Thread Singleton Class," C/C++ Users Journal, May 2002.

[2] Stevens, W. Richard. Advanced Programming in the UNIX Environment, Section 10.6. ISBN 0201563177.

[3] Stroustrup, Bjarne. The C++ Programming Language, Third Edition. Section 14.4. ISBN 0201889544.

[4] Stevens, W. Richard. UNIX Network Programming, Volume 1, Section 6.11. ISBN 013490012X.

Sony is Chief Technical Officer at Pinnacle Software Solutions in Atlanta. He can be reached at sonyantony@bellsouth.net.