November 1995/Locking Without Deadlocks

Multithreading

Locking Without Deadlocks

John Rogers

JR (John Rogers) has spent most of the last 17 years doing systems programming, often in multiprocessing or multitasking systems. He was born and raised on a multitasking IBM mainframe, worked on multiprocessing air-traffic control systems, moved up to UNIX (multi-user), then Ada (which has tasking built in), and most recently worked on the multithreaded OS/2 and Windows NT systems at Microsoft. He can be reached at 72634.2402@CompuServe.com.

Introduction
I want to open people's eyes about unsynchronized resource sharing. I've included some hypothetical horror stories, using OS/2 and other current operating systems for relevance. This article then discusses a number of locking issues, including deadlocks. It gives a number of little known deadlock prevention techniques and discusses one set of locking primitives in detail: the new POSIX semaphores. Lastly, it shows how to implement higher level lock operations (a readers/writer lock package) using the POSIX semaphores.
Microcomputer operating systems have been evolving away from their simple origins. Back in the old days we had one user running one program on one processor at a time. Now we have various options to consider:

multiple processes (tasks)

multiple processors (CPUs)

multiple users

multiple threads within a process

networking
Each of these options allows the sharing of resources. (A resource could be almost anything: a record in a database, something in main memory, a directory entry, an account, a disk file, a tape drive, a communications device, a given robot in a factory, the scheduling information for a college course, etc.) Updating a resource without synchronizing with other users can crash your system. There are, luckily, many locking techniques that you can use to avoid such crashes. Misuse of locking, however, can deadlock (hang) your system.

Some Horror Stories
To get you motivated, here are some hypothetical horror stories, each caused by a race condition of some kind. A "race condition" is when two or more accesses occur to something at the same time (or interleaved in varying orders), where the results depend on the order of execution. Race conditions are notoriously hard to reproduce and debug. I have a strong preference for designing systems so as to avoid race conditions entirely. These descriptions of race conditions cover a wide range of systems and situations. Don't worry about the specifics of any one in particular; I'm just trying to give you a feel for the kinds of things that can go wrong.
Imagine a spreadsheet program implemented as an OS/2 process containing two threads. One thread (let's call it Interface) handles user interface stuff. The other thread (call it Calc) just recalculates out-of-date portions of the spreadsheet. The Interface thread would tell the Calc thread what to do via a work queue implemented as a linked list. What happens if there is no synchronization of updates to pointers in the linked list? Assume that Interface is running and is about to add an item to the linked list. The code in Interface could update the head pointer, making it point at the new entry. Suppose that, before Interface has a chance to update the chain pointers in the linked list, that thread's time slice ends and the Calc thread runs instead. If Calc tries to walk the list at this point, Calc will run into a garbage chain pointer somewhere. In OS/2 terms, this can result in a General Protection fault (GP fault).
Bernstein et al., in their book on concurrency control [1], give the example of a bank-account balance error caused by two interleaved transactions. Consider an account starting with a balance of $1,000 and two transactions each trying to update that account at the same time. The first transaction gets the current balance ($1,000), adds a deposit (of $100), and computes a new balance of $1,100. At the same time, a second transaction gets what it thinks is the current balance ($1,000), adds a deposit ($100,000), and writes what it thinks is the new balance ($101,000). Unaware of this, the first transaction then writes what it thinks is the new balance ($1,100). Regardless of which write happens first, the system loses one of the updates.
Imagine an air-traffic control system maintaining flight plans in a disk-based tree. A node somewhere in the middle starts out with records for flights "CONTINENTAL950" through "DELTA412." A pilot files a new flight plan ("CONTINENTAL976"), causing a split of that node. At the same time, a message arrives directing to delete the entry for another flight plan in that same node, as that flight has landed safely. Since these nodes contain pointers to entire trees of flight plans, it isn't hard to imagine a whole set of flight plans disappearing when the two updates get interleaved. People could die from this lack of synchronization.

Some Synchronization Techniques
Many different synchronization techniques exist out there. In researching this article, I found nine important ones: semaphore, readers/writer lock, mutex (mutual exclusion), test and set, file and record locking via fcntl/flock/lockf, ENQ/DEQ, disabling interrupts, monitors, and rendezvous. In my almost 20 years as a programmer, I can recall using six of these.
Unfortunately, most of these techniques can be misused or accidentally ignored. Just because you decide to control a resource (say a table) with some kind of lock (say a semaphore) doesn't mean that you can't accidentally access the table somewhere without using the semaphore. These synchronization techniques usually emphasize speed rather than robustness. Forgetting to lock or unlock, or unlocking something you didn't lock in the first place, are things to beware. The term "loaded gun" often applies. Improvements are available in some programming languages, though: monitors and rendezvous. In Ada, a rendezvous is part of the language (with a "begin"and "end"), so you can't forget to unlock at the end — it does it for you. I recommend Ada for large systems; this is one of many areas where its careful language design has avoided an entire class of programming bugs.
In various projects in my career, I have used my favorite synchronization technique: a readers/writer lock. It offers simplicity and efficiency. A readers/writer lock allows, at any one time, zero or more readers (sharing the resource without updating it), or exactly one writer (who has exclusive control of the resource and may update it). As one example, the POSIX. 1 fcntl function [5] provides a readers/writer lock for files. Unfortunately, I'm not aware of any readers/writer lock package that can't be misused by forgetting to lock or unlock.

How Can Deadlocks Happen?
Merriam-Webster's dictionary [6] has a great definition of deadlock: "a state of inaction or neutralization resulting from the opposition of equally powerful uncompromising persons or factions." In computer terms, all the members of a group have deadlocked if each member holds a lock needed by another member, and each member is waiting for another member to free a lock. Deadlock can occur, for instance, among processes in a system, or among threads in a single process. Note that there may be more than two members in the deadlocked group. Some other terms I've heard for deadlock are "hang" and "deadly embrace."
There are three deadlock prevention techniques that have existed for decades, so I call them the "classical" techniques. All of them involve static restrictions on the order of locking things (or collecting lock requests together), which the designer must take into account when creating the system.
One classical deadlock prevention technique requires locking all applicable resources at the same time, before any processing begins. In other words, this is likely to be massively inefficient. Yuck!
The second classical technique is a little more flexible but still not very efficient. Before locking anything, all previous locks must be unlocked. The new set of locks must all be obtained at the same time.
Both of those techniques may have made sense back in the days of batch-oriented mainframes, but I'm not aware of any current system using either of them.
On the other hand, I've had much success with lock hierarchies, the third classical deadlock prevention technique. All possible locks are pre-ordered (probably by the software designer). Additional items may be locked as time goes by, without needing to free previous locks, as long as the order of obtaining the locks does not violate the hierarchy. For instance, I set up a hierarchy of about nine locks in the Windows NT file-replicator service. Lock hierarchies also make up the basis for some interesting tree-locking algorithms.
Beyond these classical techniques, there are a number of deadlock detection and resolution algorithms. These are dynamic techniques (as opposed to the static deadlock prevention I mentioned earlier). The dynamic techniques are beyond the scope of this article, however. Goscinski's distributed operating systems book [3] has some 60 pages on the dynamic techniques.

Something New: POSIX Semaphores
POSIX defines an optional set of semaphore functions for use by C programs. Note that these are completely different from UNIX System V semaphores. POSIX semaphores come in two flavors: named and unnamed. Since POSIX. 1b [4] itself doesn't provide threads, under that standard semaphores are used only to synchronize processes. However, another POSIX standard for threads (POSIX.1c) is in progress as of this writing. I assume the newer standard will extend the influence of semaphores to work on individual threads as well.
Even though the POSIX semaphores are defined in a real-time standard, POSIX.1b (formerly POSIX.4), they aren't limited to realtime systems. To test the code in this article, I wrote a set of wrappers which use the Win32 semaphores to partially implement the POSIX semaphores. I use these wrappers under Windows NT, which is not particularly a real-time system. Those wrappers are beyond the scope of this article.
Each semaphore, also called a "counting semaphore," has a numeric value and usually locks a set of countable resources. A positive semaphore value means there are that many resources available at that time. In some implementations, a negative value of the semaphore means there are no more resources available, and the absolute value of the semaphore is the number of callers blocked waiting for those resources. In these implementations a zero value means no available resources and no blocked callers waiting for them. In other implementations a zero value means no available resources (with zero or more callers blocked waiting for them) and negative values are not possible.
As for the operations on semaphores, various terms have cropped up to name the two basic ones. The operation to lock a semaphore has been called "down" (in Algo168 and the Tanenbaum operating system books), "wait" by POSIX and others, and "P" by Dijkstra (the inventor of semaphores). The unlocking operation has been called "post" by POSIX, "release" in the Win32 APIs, "signal" by Andy Yuen, "up" (Algo168 and Tanenbaum), and "V" by Dijkstra.
If the POSIX semaphores are implemented in a given system, they are declared in the <semaphore.h> header. The type sem_t contains the semaphore itself. Except for sem_open, all of the POSIX semaphore functions return an int value of zero if no error occurred, or return -1 and set errno if an error occurred. Semaphores and shared memory are both optional POSIX components. The POSIX feature test macros _POSIX_SEMAPHORES and _POSIX_SHARED_MEMORY_OBJECTS are the ones to check.
POSIX shared memory and semaphores are somewhat persistent. I say "somewhat" because they exist until the next boot, until someone deletes the semaphore via sem_unlink or sem_destroy, or until someone frees the memory (shared or otherwise) containing the sem_t object. Gallmeister's book [2] is full of good advice on writing robust applications using these; he makes the point, "persistence is more likely to be a problem than a feature." One limitation of my Win32 wrappers is that they aren't likely to be as persistent as the POSIX standard requires.

POSIX Named Semaphores
There are three functions that are specific to named semaphores. sem_open creates a sem_t handle, sem_close closes it, and sem_unlink gets rid of the semaphore by name once all handles to it have been closed. The sem_t handle to a named semaphore may be passed to any of the common semaphore functions (including ones to lock and unlock it), but it must never be passed to any of the unnamed semaphore functions.
The sem_open function is declared as:

sem_t * sem_open( const char * Name, int Open_Flags, ...);
Here Name points to the name to be used for this semaphore (see below for naming suggestions). Open_Flags may have multiplebit flags ORed in: O_CREAT and O_EXCL. If O_CREAT is specified, then two additional arguments must be passed: a mode_t value with UNIX-style file permissions (described below), and an unsigned int giving the initial value for this semaphore. A call with O_CREAT but without O_EXCL will open an existing semaphore. On the other hand, if O_EXCL is specified with O_CREAT, then sem_open will fail if called to create an existing semaphore. On success, sem_open returns a pointer to a sem_t object; on error it returns (sem_t *) - 1 and sets errno to the applicable error code.
For POSIX named semaphores (as well as POSIX shared memory and various other things), here are some rules for coming up with portable names.
(1) Use a slash (/) as the first character.
(2) Avoid having any other slashes in the name.
(3) Avoid using the same name for two different kinds of things.
(4) Avoid possible conflicts with things that might be in the root directory (for example, /tmp).
(5) To avoid Win32 compatibility problems, avoid having any backslashes (\) in the name as well.
When you create a semaphore, you must pass a mode_t value to sem_open. This is given as UNIX-style user-group-other permission bits. The <sys/stat.h> header has equates for these, as required in the POSIX.1 standard. Being somewhat paranoid, I use S_IRWXU, which gives the user read-write-execute permissions and forbids any access by anyone else.
The sem_close function is the only way to close the sem_t handle obtained from sem_open:

int sem_close(sem_t * Semaphore_Pointer);
The sem_unlink function schedules the semaphore with the given name to be deleted once any open handles to it have been closed via sem_close. The declaration for sem_unlink is:

int sem_unlink(const char * Name);

POSIX Unnamed Semaphores
The other flavor of POSIX semaphores is unnamed semaphores. For unnamed semaphores to be useful, an application either needs shared memory (another optional POSIX.1b component) or threads support. I believe the standardization of POSIX threads is still in progress, so I won't go into detail on that in this article.
Besides the four functions common to both flavors of POSIX semaphores (described later), there are two routines that are specific to unnamed semaphores: sem_init and sem_destroy. sem_init is declared as:

int sem_init(sem_t* Semaphore_To_Initialize, int Shared_Flag, unsigned int Value);
Here Semaphore_To_Initialize points to an area to be used as an unnamed semaphore, which must be at least sizeof (sem_t) bytes long, and could be in shared or regular memory. Shared_Flag must be zero to use the semaphore in the current process exclusively, or 1 to allow sharing this semaphore with other processes. Value is the initial value of the semaphore, as described earlier. Be aware that with my Win32 wrappers, sharing one sem_t for an unnamed semaphore between processes may be much slower than having named semaphores with the same names but not sharing the sem_t data itself between the processes.
The sem_destroy function cleans up after an unnamed semaphore. (Even if a given sem_t object is in shared memory, sem_destroy must only be called once to take care of it.) This function is declared as:

int sem_destroy(sem_t * Semaphore_Pointer);
Here Semaphore_Pointer must point to an unnamed semaphore (that is, one created via the sem_init function). Also, that semaphore must not have any active waiters.

Common POSIX Semaphore Functions
There are four functions that may be used on either flavor of semaphore: sem_post, sem_wait, sem_trywait, and sem_getvalue. The first three are the important semaphore operations. The two wait (lock) functions are declared:

int sem_wait(sem_t * Semaphore_Pointer); int sem_trywait(sem_t * Semaphore_Pointer);
The sem_wait function tries to lock the supplied semaphore (by examining and updating the semaphore value), and if unable to lock that semaphore, blocks (waits) until it is able to do so. The sem_trywait function is identical to sem_wait, except that rather than waiting, sem_trywait returns - 1 and sets errno to EAGAIN.
The semaphore unlock (sem_post) function is declared:

int sem_post(sem_t * Semaphore_Pointer);
The sem_post function updates the value of the semaphore indicated by Semaphore_Pointer. If applicable, sem_post may unblock one waiter for this semaphore.
The sem_getvalue function is declared:

int sem_getvalue(sem_t * Semaphore_Pointer, int * Value);
sem_getvalue sets the integer at *Value to the current value of the indicated semaphore. Note that the possibly negative values may occur (see earlier discussion). Also, beware the value of the semaphore changing after this call!
Gallmeister's book describes sem_getvalue as "provided for use in debugging semaphore code." I wonder if there might be a use for this function in production code, under carefully controlled circumstances, In particular, I hope to come up with better readers/writer lock algorithms using it. On the other hand, in my Win32 semaphore wrappers, sem_getvalue turned out to be the hardest one to implement. I couldn't find a way to query the count of a Win32 semaphore, so I had to maintain my own count too.

Creating Readers/Writer Locks
I have encapsulated a set of lock operations into a readers/writer lock package. This package provides a portable interface to C programs (not just realtime or POSIX programs). For this article, I have included one implementation of this package, built using the POSIX semaphores and shared memory. Listing 1 (rwlock.h) contains the interface being exported to other code, and Listing 2 (rwlock.c) contains the actual code.

References
[1] P.A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems (Addison-Wesley, 1987).
[2] Bill O. Gallmeister, POSIX.4: Programming for the Real World (O'Reilly & Associates, Inc., 1995). This is a great book on the POSIX C functions for real-time synchronization, including semaphores and shared memory.
[3] A. Goscinski, Distributed Operating Systems: The Logical Design (Addison-Wesley, 1991).
[4] IEEE Std 1003.1b-1993. IEEE Standard for Information Technology — Portable Operating System Interface (POSIX) — Part 1: Application Programming Interface (API) [C Language] — Amendment: Realtime Extensions. This is POSIX.lb, formerly known as POSIX.4.
[5] ISO/IEC 9945-1: 1990. Information Technology — Portable Operating System Interface (POSIX) — Part 1: System Application Programming Interface (API) [C Language]. This is POSIX.1
[6] Merriam-Webster's Collegiate Dictionary, 10th ed., 1993.