Efficient Thread Coordination

John LaPlante

Improve your thread control logic and make your program faster at the same time.

Introduction

Have you ever written a background thread similar to the following?

extern bool   bExit;
extern IOPort in;
const  int    PERIOD = 100; // ms

while ( ! bExit)
{
  stat = BlockOnInput(in, PERIOD);

  if (Success(stat) && !bExit)
    ReadInputAndDoWork(in);
}

This is a common, but potentially sub-optimal way to write a thread that responds to multiple forms of input. This approach is technically incorrect as well [1]. This thread listens for both incoming I/O using BlockOnInput and cancellation requests from an external thread via global variable bExit. The code relies on rapid polling of the bExit flag in order to respond to cancellation requests.

Performance-wise, there are two problems with this example:

1. Response latency: during the portion of time the worker thread is "blocked" waiting for I/O, there is no way to interrupt it. For any given timeout T, the best average response time we can expect is T/2.

2. Polling overhead: the worker thread is expected to poll variable bExit. This incurs CPU overhead. More significant however is the cost of entering/exiting the blocking routine.

This article presents a C++ class that solves these problems. It provides fast interruption of background threads with virtually no application polling. It allows polymorphic handling of multiple threads and is extensible via template parameters for platform independence and varying I/O models.

Why Control Threads?

Why would you need to interrupt or otherwise externally control a perfectly good background thread? At a minimum, your application would benefit from a graceful and responsive shutdown when complete -- forceful thread termination APIs, when available, are often considered risky [2]. At other times, it may be desirable to have background threads switch state to begin another task. There may be cases where you want to halt processing in order to quickly respond to some sort of emergency event. The design ideas presented here evolved from requirements to respond quickly to shutdown events in large, multithreaded, factory automation systems.

In modern operating environments, as a thread author, you are urged to signal threads with familiar inter-thread communication mechanisms and allow them to change state gracefully themselves. Let's explore how to do this efficiently.

Polling versus Blocking

From your application's point of view, your thread will generally be in one of two states:

1. Running: your thread is actively executing code or APIs that do not suspend your thread -- it is getting CPU time. This corresponds to the ReadInputAndDoWork call in my example.

2. Suspended: your thread is taking little or no CPU time. This is done with the help of the operating system and can take the form of a simple sleep API; wait APIs, which operate on semaphores, mutexes, condition variables, etc.; or blocking I/O APIs such as read, select, etc. [3]. This corresponds to the BlockOnInput call in my example.

You can interrupt a thread that is running by having it regularly poll some intermediate variable or object. You will be forced to face the "Overhead versus Latency Conundrum" (see sidebar) for these portions of your code, so get used to it.

Interrupting a suspended thread is more interesting. A blocked thread can resume in the following ways:

Timeout: the wait times out. This is not interrupting your thread.
Wait satisfaction: the wait is satisfied by receipt of the desired event or data. Again, this is not interrupting the thread but rather normal operation.
Simulate wait satisfaction: arrange your logic to simulate the event for which your thread is waiting, but use a special value that differentiates the event. Alternatively you could force an intentional error condition.
Multiple-input waiting API: satisfy a secondary input channel of a "multiple-input" waiting API. Some operating systems provide assistance with APIs that can "listen" for more than one event or I/O channel simultaneously. The select socket call and Windows' WaitForMultipleObjects are common examples that are exploited here. Your environment may provide other such facilities [4].

Simulating wait satisfaction of your object may not always be possible. Even if it is, it requires you to expose an interface to your blocking object and include special handling of your semaphore message or error condition. This type of logic can obfuscate your code.

A "multiple-input" wait is elegant and more logically correct. The ThreadMediator class presented here relies on the availability of such an API. With this operating system support, you can implement fast and efficient interruption of blocked threads [5] without polling.

Design

I have named the base class ThreadMediator because it is loosely based on the Mediator pattern outlined by the classic GoF Design Patterns book [6]. As described in the book, a mediator:

...is responsible for controlling and coordinating the interactions of a group of objects. The mediator serves as an intermediary that keeps objects in the group from referring to each other explicitly. [7]

In this design, a mediator object derived from ThreadMediator is used as an intermediary for communication between a controller thread and a worker thread. Much like the Mediator pattern, this intermediate object isolates the controller thread from the worker thread and also centralizes the thread synchronization logic. However, ThreadMediator models this coordination between just two objects, rather than a group of objects as intended by the original Mediator pattern.

Also similar to the Mediator pattern, ThreadMediator supports polymorphism, so you can use a different mediator with each background thread. The controller thread simply sees the uniform polymorphic interface to the base class.

Figure 1 illustrates the structure of the sample code (available for download from <www.cuj.com/code>). A mediator object exists for each background thread. Whenever each background thread must perform some lengthy waiting operation, it is done uniformly through calls, such as Wait or Sleep, on the mediator object. This includes waits on thread synchronization objects, waits for I/O, or sleep calls.

When there is a need to signal these background threads, the controller thread sends its intentions via the mediator objects, in this case, with the Signal function.

ThreadMediator responds to this signal by causing an exception to be thrown immediately within the target background thread. It is the responsibility of the programmer to arrange for the proper try/catch blocks to handle this exception in each of the background threads and respond accordingly. ThreadMediator classes include a templated argument for the type of exception thrown, so it can dovetail with existing exception architectures.

The sample code includes three types of threads to illustrate several typical uses for thread mediation. These include:

MonitorThread: periodically awakens to perform a task.
SocketThread: handles input from a socket.
DeviceThread: starts a command on an external device and waits for completion.

MonitorThread is really just a simple timer that is interruptible while asleep.

SocketThread is based on familiar BSD sockets. This example waits for UDP input from any port.

The DeviceThread code emulates an external device with driver semantics that communicate completion status by signaling a Win32 synchronization object handle, such as an event. In the sample code, this is simulated with a Win32 waitable timer handle.

Recoded with this mediator design, my original example boils down to something like this:

try
{
  while (true)
  {
    mediator.Wait(in,INFINITE);
    ReadInputAndDoWork(in);
  }
}
catch(SomeException & ex)
{
  // Handle the signal request
}

Notice that the timeout for the blocking call is now infinite. The code no longer polls. Instead, the operating system is performing this function more efficiently for you. The main execution loop of all three example threads looks very similar to this.

Class Structure

Figure 2 shows a class diagram of the ThreadMediator hierarchy. At the top is an abstract ThreadMediator. This class exposes an interface intended for the controller thread that allows signaling of the threads for interruption. See Listing 1 (ThreadMediator.h) for the implementation.

ThreadMediatorImpl extends ThreadMediator, adding an interface intended for the background threads. This includes all the waiting functions. ThreadMediatorImpl adds little in the way of logic however. Most of the real logic is delegated to a private object parameterized by the template token Sync in Figure 2. (Recall Coplien's Handle/Body and Envelope/Letter idioms, but think compile-time polymorphism for the inner class [8]). Sync is also referred to variously as "synchronizer" throughout the code.

The controller thread treats multiple concrete mediators polymorphically through the ThreadMediator interface. Each background thread can use a different type of mediator object, tailored for the type of synchronization or I/O that is needed. The controller thread only sees the common abstract ThreadMediator base class interface they all share. This is a key underpinning of the formal Mediator pattern, although it is considered optional in simple situations [9].

The ThreadMediator and ThreadMediatorImpl classes constitute the platform-independent portion of the hierarchy. The synchronizer classes isolate platform-dependant code from the mediator classes. Additionally, the synchronizer classes isolate the underlying I/O model, whether it is sockets, events, messages, etc. Think of the synchronizer classes as user-supplied classes that you hand code to extend the Mediator pattern for different I/O models and/or platforms.

The sample code includes two synchronizer classes: SocketSynchronizer and Win32Synchronizer. Listing 2 (Win32Synchronizer.h) shows the implementation of the shorter of the two, Win32Synchronizer.

Win32Synchronizer holds a private HANDLE member _signalObject, which is initialized as an Event. Its Wait function therefore also takes a HANDLE argument as the application object for which to wait. Win32Synchronizer uses WaitForMultipleObjects [10] to perform blocking waits. Calls to WaitForMultipleObjects include the private _signalObject event as the first in the handle array. If there is another object to be waited for, that is added as the second handle in the array.

When an external thread wants to interrupt the background thread, it calls Signal, which calls SetEvent on the _signalObject event. Meanwhile the background thread, presumably already blocked in the WaitForMultipleObjects call, will awaken immediately, WaitForMultipleObjects will return status that _signalObject is signaled, and an exception is thrown. The type of exception thrown is a template parameter to the provided synchronizer classes.

SocketSynchronizer works similarly to Win32Synchronizer except that the private _signalObject member is of type SOCKET, and all blocking waits are performed through select. Its Wait function therefore takes a SOCKET argument as the application object for which to wait. When Signal is called, the _signalObject socket is closed, notifying select that it is ready to read [11], and the whole exception throwing exercise ensues. In a Unix environment, select allows you to wait on file-like handles other than sockets such as external devices, pipes, etc.

Running the Example

Listing 3 (Main.cpp) starts at the top level of a simple example. Three thread objects are created, and references to them are stored polymorphically in an array for convenience. These background threads are started, and then the main thread goes to sleep, awaiting keyboard input from the user.

At this point, the background threads begin to print their current activity to the screen. You will observe them alternating between blocked and unblocked activity, with randomly varying timeouts just to keep things interesting. Upon receipt of an Enter keystroke (newline) from the user, the main thread awakens immediately and calls Signal on the mediator object for every background thread. This is done indirectly through a delegation function on the thread base class. You will observe the response from all threads as they exit quickly.

Extensions and Limitations

There are some variations on this implementation that come to mind. One possible extension is to overload the Wait function to take an array of synchronization objects. Another obvious extension is error handling, which has been ignored in the sample code.

The sample code for SocketSynchronizer uses the brute force technique of simply closing the signaling socket. An alternative implementation would be to create a socket pair (socketpair [12]) and treat _signalObject like a pipe, using simple character writes to signal select.

You can also opt to signal your threads with something other than exceptions, such as return codes. If retaining the exception-based strategy, ensure that your application is relatively exception-compliant by using "Acquisition is Initialization" coding models [13], or rigorously try/catching wherever a resource leak may occur from thrown exceptions.

The sample code was developed under Windows with Visual C++ 6.0 to illustrate the use of both select and the Win32 WaitForMultipleObjects call in the same application. Those portions that are not Win32-centric, such as the base mediator classes and the socket classes, should port pretty cleanly to other environments, but this has been left as an exercise for the reader.

It should also be noted that Win32's CreateWaitableTimer has been used in the device simulations, so this application requires Windows NT 4 or later and will not run under Windows 9x derivatives.

I have used this strategy successfully in two real-world applications. It has improved the efficiency and robustness of both, while isolating the details of control logic. Perhaps your applications can benefit as well.

Sidebar: To Poll or Not to Poll -- That is the Question

Notes

[1] Many would argue it is incorrect because of unsynchronized access to bExit from multiple threads, especially on multiprocessor machines. Technically this is true; however since bExit is a single, atomic byte and is being tested only for zero/non-zero values, I believe this risk vanishes in practice.

[2] Win32's TerminateThread is described by Microsoft as "a dangerous function that should only be used in the most extreme cases." See <http://msdn.microsoft.com/library/en-us/dllproc/ base/terminatethread.asp>.

[3] W. Richard Stevens. Advanced Programming in the Unix Environment (Addison-Wesley, 1992), p. 396.

[4] Some examples here might include ev_receive (pSOS) and ajevwat (AMX). If your thread pumps some form of message, messaging APIs work nicely in this context, such as msgQReceive (VxWorks), MsgReceive (QNX), and GetMessage (Win32).

[5] The terms "fast" and "efficient" are completely relative. What is meant here is fast and efficient relative to the underlying operating system.

[6] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995).

[7] Erich Gamma, et al. p. 274.

[8] James Coplien. Advanced C++: Programming Styles and Idioms (Addison-Wesley, 1992).

[9] Erich Gamma, et al. p. 278.

[10] <http://msdn.microsoft.com/library/en-us/dllproc/base/ waitformultipleobjects.asp>

[11] This is actually correct behavior; see [1], p. 400.

[12] Douglas Comer. Internetworking with TCP/IP: Principles, Protocols, and Architectures, 4th Edition (Prentice-Hall, 2000), p. 416.

[13] Bjarne Stroustrup. The C++ Programming Language, 3rd Edition (Addison-Wesley, 2000), Section 14.4.

[14] See <www.osl.iu.edu/download/mpidc95/papers/html/ nieplocha/polling.html> for a similar discussion.

[15] <http://info.iet.unipi.it/~luigi/polling/>

About the Author

John LaPlante holds two B.S. degrees and has programmed computers for 20 years, including incarnations with Hewlett-Packard, The Learning Company, Akamai, Mindport, Loral Corp., and GSI Lumonics. He currently resides in San Diego, CA with his cat, dog, wife, and three kids, automating DNA analysis hardware for Illumina Inc. and can be reached at jlaplante@illumina.com.