March 1990/Multitasking With Lightweight Threads

Features

Multitasking With Lightweight Threads

Gregory Colvin

Trained in cognitive psychology, Dr. Colvin first learned to program in 1972, in BASIC on a PDP-8. He later had the distinction of being the first Cornell University graduate student to purchase an Apple II with student loan money, and has been happily hacking microcomputers ever since. He has been programming professionally in C since 1983. He welcomes comments and queries at 680 Hartford, Boulder, CO 80303 (303) 499-7254.
Often, and against my better judgment, I contract to create applications under operating systems which do not support multitasking. Lightweight threads can sometimes be used to circumvent this limitation; both Microsoft's Windows and Apple's Multifinder use lightweight threads to retrofit multitasking facilities to single tasking operating systems.
This article presents the ANSI C source for a multitasking kernel based on lightweight threads. I have tried to make this kernel as small, fast, simple, and portable as possible. I have successfully used the predecessor of this kernel to implement a real-time graphics display system and to provide background query processing for a database application.

Threads
A thread of computer execution consists of at least three elements: a memory segment containing executable machine instructions, an instruction pointer register which indicates the next instruction to execute, and a data segment for variable allocation. Most computers also support function calls by providing a stack memory segment; a frame pointer register, which points to the current stack frame; and a stack pointer register, which points to the next available space for a stack frame. A stack frame contains the arguments and local variables for a function, and the instruction pointer and frame pointer of the function that called it (see Figure 1) .
A function call typically creates a new stack frame by pushing the function arguments and the current instruction and frame pointer registers on the stack, moving the current stack pointer to the frame pointer register, decreasing the stack pointer enough to leave room for local variables, and moving the address of the called function to the instruction pointer. Thus nested function calls result in a linked list of stack frames on the stack segment, which is traversed as functions return.

Context Switching
A multitasking operating system can execute several threads "simultaneously" on one machine. Since most machines can only execute a single instruction at a time, only one thread is really executing at any one time, but multiple threads appear to run simultaneously because the O.S. performs context switches between threads at frequent intervals. At each context switch, the O.S. saves the contents of the machine registers for the current thread and restores the state of the registers for another thread. Each thread has its own code, data, and stack segments, so that threads cannot ordinarily interfere with one another.
C, unlike ADA or Modula 2, is single threaded, so that an executing C program has one instruction pointer and one set of memory segments. However, the ANSI Standard C library does provide a pair of functions, setjmp() and longjmp(), for saving and restoring the contents of the machine resisters. I have used (some might say abused) these functions to implement multiple threads within a single C program.
The setjmp() and longjmp() calls are unusual, in that the longjmp() call, when successfull, never returns to the function that calls it, whereas the setjmp() call can return to the same function any number of times. The first call to setjmp(jmp_buf) saves the current state of the machine registers in the jmp_buf structure and returns 0. A subsequent call to longjmp(jmp_buf,int) restores the saved registers, causing the nonzero int specified in the calling argument to be returned by setjmp(jmp_buf) to the function that called it.
Usually longjmp() is used to abort from errors in deeply nested functions without actually returning from all the functions. An error handler is installed with a C statement like:

if (error=setjmp(buf)) e(error); else f();
The call to setjmp() returns false, so that function f() is called. Errors in f() or any functions called within f() can then be handled by calling longjmp (buf,error), which causes setjmp() to return error, so that e(error) is called. To use these functions to support switching among multiple threads requires several jmp_buf structures — at least one for each thread of execution.

Implementation
To implement lightweight threads, the single thread of execution of a normal C program must be divided into independent tasks. Since a C program begins its life with a single stack segment, this segment must be divided into pieces, one for each thread. The ThInit(int n,int size) function does this by first calling setjmp() to mark an entry point for a thread, then calling itself until enough room for a thread (size bytes) has been used on the stock. It repeats this recursive process until the desired number of threads (n) has been created. The saved machine registers for each thread are kept in Threads, a global array of thread structures, one structure for each thread. Having thus divided up the stack, new threads can be created by ThNew(void (*root) (int)), which simply saves the address of the root() function for the new thread in Root, then does a longjmp() to restore the registers set by ThInit(). The reactivated ThInit() calls (*Root) (ThCurr), and the new thread is underway. ThNew() returns the ID of the new thread, which is a non-zero index into the Threads table.
The threads created in this way are lightweight in two senses. First, they share the same code and data segments, and are thus not protected from each other by the operating system's memory management. Second, they are are treated as one process by the operating system, and thus are not automatically switched. Thus, the ThJump(int ID) function is needed so that a thread can cause a context switch to another thread, specified by ID. ThJump calls setjmp() to save the state of the current thread, then calls longjmp() to restore the saved state of the destination thread. The ThJump() function does not return until another thread jumps back to it, in which case it returns the thread ID of the jumper. Figure 2 shows a picture of the stack and the global Threads array while running two threads. Thread 1 is the initial thread (that is, the thread that called ThInit()), and Thread 2 is the currently running thread, which has made several function calls since it was jumped to by Thread 1.

Communication And Deadlock
Since lightweight threads share the same data segment they can communicate easily through global variables and shared memory buffers. On this basis, you may implement semaphores, pipes, messages, or any other communication discipline. Whatever communication method you choose, you must "beware the Jabberwock" of deadlock.
For instance, a simple message passing scheme (not a very efficient one) can be implemented by placing the address of a message into an array of pointers, one for each thread, initialized to zero:

Message = (char **) calloc(N_Threads, sizeof(char *));
and then jumping to the message destination:

void msg_send(char *message,int destination) { int id = ThId(); Message[id] = message; do ThJump (destination); while (Message[id]); }
The ThId() macro is used to get the ID of the current thread. The sending thread waits in a loop until the message is received. The destination thread can then receive a message with:

char *msg_recv () { int id; char *m; do id = ThJump(ThNext()); while (!Message[id]); m -- Message[id]; Message[id] = 0; return m; }
If no thread ever sends a message then the receiving thread will never leave the loop, a condition called starvation. This may not be a problem, since if no message is sent there may be nothing for the waiting thread to do. However, consider the following code, which waits for a message from a particular thread:

char *msg_wait(int godot) { int id; char *m; while (!Message[godot]) ThJump(godot); m = Message[godot]; Message[godot] = O; return m; }
If Thread 1 is in the loop, waiting for a message from Thread 2, and Thread 2 is also in the loop, waiting for a message from Thread 1, then neither thread will ever get out of the loop, and no other threads will get to run. This is a deadlock.

Preventing Deadlock
In general, deadlock can occur whenever threads must block to wait for exclusive use of a particular resource (memory buffer,screen, keyboard,disk controller, printer, etc.) that is in use by another thread. In the examples above I have implemented blocking by busy waiting in a loop. It would be better to add a flag to the thread structure, so that blocking could be handled by the kernel. It would then be possible to centralize deadlock control within the kernel.
The easiest way to prevent deadlock is simply not to share resources. For instance, one thread might handle all file IO, another all printer output. Another easy solution is to arrange resources and threads in a hierarchy, so that there is never a conflict between threads. For instance, one thread might buffer keyboard input, while a second thread reads the keyboard buffer, performs computations, and buffers window output, and a third thread reads the window buffers and displays them on the screen. Other cases are harder, and the solutions tend to be task specific.
For example, database programs can prevent deadlock, and ensure data integrity, with the concept of a transaction. A thread needing resources, such as write access to a set of data records, sets out to acquire them, one by one. If all the needed resources are acquired, the transaction succeeds and releases its resources for the next transaction. If any resource cannot be acquired, the transaction aborts, releasing all its resources. The thread then waits for a while, and tries again. Thus, no thread is ever holding a resource while waiting for another resource.
For other examples on preventing deadlock, be sure to check the relevant literature for the tasks you are implementing. A good general discussion, with C source code and further references, can be found in Andrew Tanenbaum's Operating Systems: Design and Implementation (Prentice Ha11,1987). If you fail to ensure that your application is deadlock free you can look forward to mysterious system hangs and other evidence of Murphy's Law.

Caveats
I have tested the code presented here with Microsoft C 5.0 on my 386 clone and with MPW C 3.0 on my Macintosh SE. On my clone the kernel compiles to under one K of code, and executes over 80,000 jumps per second. (Ed: The code is available from the CUG Library; see New Releases.) Be sure to design and test very carefully if you implement a similar kernel. Although the setjmp() and longjmp() functions are portable, this implementation depends on non-portable details of stack implementation. Be especially careful not to overrun the stack areas set up for each thread. I have provided a simple ThProbe() macro that exits with a message if an overrun is detected, but I have succeeded in crashing threads with printf() between calls to ThProbe(). For a more powerful approach to ensuring stack integrity, see the article "A Stack Checking Function" by Eric White in The C Users Journal (Volume 7, Number 3, April 1989). If you exercise proper care, you will find the concept of lightweight threads to be a useful addition to your tool kit.

Listing 1

Listing 2