The .NET Framework supports the ability to create multiple threads of execution within a single program. As such, this facility is available to all languages executing in that environment. In this article, I'll show how threads are created and synchronized. (It is important to note that Managed C++ does not support synchronization of threads in different applications.) I'll also show how shared variables can be guarded against compromise during concurrent operations.
In Listing 1, the primary thread creates two other threads, and the three threads run in parallel without synchronization. No data is shared between the threads and the process terminates when the last thread terminates.
Let's begin by looking at the first executable statement in Listing 1, in case 3a. Here I create an object having the user-defined type Th01. Th01 has a constructor, an instance function, and three fields. I call the constructor passing it a start and end count and an increment amount, which it stores for later use in controlling a loop.
In case 3b, I create an object of the library type Thread, which is from the namespace System::Threading. A new thread is created using such an object. However, before a thread can do useful work, it must know where to start execution. I indicate this by passing to Thread's constructor a delegate of type ThreadStart, which supports any function taking no arguments and returning no value. (Being a delegate, it could encapsulate multiple functions. However, in my examples, I'll specify only one.) In this case, I identify that the thread is to begin by executing instance function ThreadEntryPoint on object o1. Once started, this thread will execute until this function terminates. Finally, in case 3c, an arbitrary name is given to this thread by setting its Name property.
In cases 4a, 4b, and 4c, I do the same thing for a second thread, giving it a different set of loop control data and a different name.
At this point, two thread objects have been constructed; however, no new threads of execution have yet been created, and both thread objects are inactive. To make a thread active, you must call Thread's function Start, as shown in cases 5 and 6. This function starts the new thread executing by calling its entry-point function.
The two new threads each display their names and then proceed to loop and display their progress periodically. Since each of these threads is executing its own instance function, each has its own set of instance data members.
All three threads write to standard output, and, as you can see from the following, the output from the threads is intertwined:
Primary thread terminating
t1: i = 0
t1: i = 200000
t2: i = -1000000
t2: i = -800000
t1: i = 400000
t1: i = 600000
t1: i = 800000
t1: i = 1000000
t1 thread terminating
t2: i = -600000
t2: i = -400000
t2: i = -200000
t2: i = 0
t2 thread terminating
Of course, the output might be ordered differently on subsequent executions.
In this output, the primary thread terminated before either of the other two started running. This demonstrates that although the primary thread was the parent of the other threads, the lifetimes of all three threads are unrelated. (You can make a child thread's lifetime dependent on that of its parent by making the child a background thread.)
Although the entry-point function used in this example is trivial, that function can call any other function to which it has access.
If you want different threads to start execution with different entry-point functions, you simply define those functions in the same or different classes, as you see fit.
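The thread-creation pattern above carries over to other environments. Here is a rough standard C++ analogue (not Managed C++): each "thread object" carries its own loop-control state, and the entry-point function runs against that object's data. The type and member names are illustrative, not from Listing 1.

```cpp
#include <thread>

// Each Counter plays the role of a Th01 object: it stores its own
// loop-control data, and ThreadEntryPoint operates on that data.
struct Counter {
    int start, end, step;
    long long iterations = 0;          // per-object state, like Th01's fields

    void ThreadEntryPoint() {          // the thread's entry point
        for (int i = start; i <= end; i += step)
            ++iterations;              // stand-in for the periodic output
    }
};

// Launch two counters on their own threads (like Start in cases 5 and 6)
// and wait for both to finish before reading their results.
long long RunTwoCounters() {
    Counter c1{0, 1000000, 200000};    // loop-control data, as in case 3a
    Counter c2{-1000000, 0, 200000};   // different data, as in case 4a
    std::thread t1(&Counter::ThreadEntryPoint, &c1);
    std::thread t2(&Counter::ThreadEntryPoint, &c2);
    t1.join();
    t2.join();
    return c1.iterations + c2.iterations;
}
```

Because each thread runs against its own object, no synchronization is needed here; the join calls simply wait for both threads to terminate.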
In Listing 2, I have two threads accessing the same Point. One of them continually sets the Point's x- and y-coordinates to some new values while the other retrieves these values and displays them.
Even though both threads start executing the same entry-point function, you can make each thread behave differently by passing a value to their constructors. The function Sleep suspends the calling thread for the given number of milliseconds.
The potential for conflict arises from the fact that one thread can be calling Move in case 3 while the other is (implicitly) calling ToString in case 4. Since both access the same Point, without synchronization Move might update the x-coordinate, but before it can update the corresponding y-coordinate, ToString could run and display a mismatched coordinate pair. When the appropriate blocks of code are synchronized, as shown in cases 1 and 2, the coordinate pairs always match.
Since Move and ToString are instance functions, when they are called on the same Point, they share a common lock for that Point. To get exclusive access to an object's lock, you call Monitor::Enter, passing it a pointer to that object. Then if Move is called to operate on the same Point as ToString, Move is blocked until ToString is completed, and vice versa. As a result, the functions spend time waiting on each other, whereas without synchronization, they both run as fast as possible. For the sake of discussion, groups of statements that are used for synchronization will be referred to as lock blocks.
Once control of an object's lock is obtained, you are guaranteed that only one instance function from that class can be executing its critical code on that object at any one time. Of course, an instance function in that class that uses no lock pays no mind to what any of its synchronized siblings are doing, so you must be careful to use locks as appropriate. Lock blocks in instance functions that are operating on different objects do not wait on each other. A lock is released when the code associated with a lock block terminates normally or an exception is thrown from within it. Therefore, the lock is in place while code within a lock block calls any and all other functions.
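The per-object-lock idea can be sketched in standard C++, where a mutex member plays the role of the CLR's built-in per-object monitor. This is an analogue of Listing 2's Point, with assumed names; the guard objects are the lock blocks, so a reader can never observe a half-updated coordinate pair.

```cpp
#include <mutex>
#include <thread>
#include <utility>

// One mutex per Point stands in for the CLR's per-object lock.
// Move and Coordinates both lock it, so the pair (x, y) is always
// read and written as a unit.
class Point {
    int x = 0, y = 0;
    mutable std::mutex lock;           // this Point's own lock
public:
    void Move(int newX, int newY) {
        std::lock_guard<std::mutex> guard(lock);   // like case 1's lock block
        x = newX;
        y = newY;                      // both coordinates change together
    }
    std::pair<int, int> Coordinates() const {      // stand-in for ToString
        std::lock_guard<std::mutex> guard(lock);   // like case 2's lock block
        return {x, y};
    }
};
```

A writer thread calling Move(i, i) and a reader thread calling Coordinates concurrently will never see x differ from y, which is exactly the mismatched-pair problem the lock blocks in Listing 2 prevent.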
It is the programmer's responsibility to avoid a deadlock: that situation in which thread A is waiting on thread B, and vice versa.
Consider Listing 3. When the lock block begins execution in case 2, the lock for the array pointed to by array is engaged, thereby blocking all other code that also needs to synchronize on that array, such as case 3, when both functions are called to operate on the same array.
A lock block can contain another lock block for the same object, since the thread already holds a lock on that object. In this case, the lock count is simply increased; it must decrease to zero before that object can be operated on by another synchronized statement in another thread. A lock block can also contain a lock block for a different object, in which case, it will be blocked until that second object becomes available. Here's an example:
static void CopyArrays(int list1 __gc[],
                       int list2 __gc[])
{
/*4*/   Monitor::Enter(list1);
/*5*/   Monitor::Enter(list2);
        Array::Copy(list1, list2,
                    list1->Length);
        Monitor::Exit(list2);
        Monitor::Exit(list1);
}
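Note that this two-lock pattern is exactly where deadlock lurks: if one thread calls CopyArrays(a, b) while another calls CopyArrays(b, a), each can acquire its first lock and wait forever for the second. In standard C++, this can be sketched with std::scoped_lock, which acquires multiple mutexes using a deadlock-avoidance algorithm; the per-array mutex wrapper here is an assumption of the sketch, since standard C++ objects have no built-in monitor.

```cpp
#include <mutex>
#include <vector>

// An array bundled with its own mutex, standing in for the CLR's
// built-in per-object lock.
struct LockedArray {
    std::vector<int> data;
    std::mutex lock;
};

// Acquire both locks at once; std::scoped_lock orders the acquisitions
// internally so that CopyArrays(a, b) and CopyArrays(b, a) running
// concurrently cannot deadlock each other.
void CopyArrays(LockedArray& src, LockedArray& dst) {
    std::scoped_lock guard(src.lock, dst.lock);   // both locks, no deadlock
    dst.data = src.data;                          // like Array::Copy
}
```

Without a facility like scoped_lock, the usual discipline is to acquire multiple locks in one globally agreed order.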
The obvious choice of lock object is the instance on which the parent function was invoked. However, you can also invent lock objects and synchronize on them without those objects actually containing any information (see Listing 4).
In Listing 4, class Th04 has a lock object called fileLock that contains no data and is never initialized or used in any context except a lock block. Functions M1 and M2 each contain a statement that must be blocked while the other runs, and vice versa.
If a class function (rather than an instance function) needs synchronizing, the lock object is obtained by using the __typeof operator. There is one lock object for each class (as well as one for each instance of that class). A lock on a class means that only one class function's lock block for that class can execute at a time, as shown in Listing 5.
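A standard C++ analogue of a class-wide lock is a static mutex shared by all instances, so that only one static member function's lock block runs at a time. The class, counter, and function names here are illustrative, not from Listing 5.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// One static mutex for the whole class plays the role of the lock
// obtained via __typeof: every call to Register, from any thread,
// serializes on the same lock.
class Registry {
    static std::mutex classLock;       // one lock per class, not per instance
    static int count;
public:
    static int Register() {
        std::lock_guard<std::mutex> guard(classLock);  // class-wide lock block
        return ++count;                // safe: only one thread at a time here
    }
};
std::mutex Registry::classLock;
int Registry::count = 0;
```

Four threads each calling Register a thousand times will always end with count at exactly 4000, because the class-wide lock serializes every increment.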
You can control synchronization of threads directly by using a number of functions in classes Monitor and Thread. For example, consider a buffer into which one thread writes and another thread reads. The two threads are synchronized such that you can't process the contents of the buffer until the creator has put something there, and the creator can't put another message there until the previous one has been processed.
Although space limitations prohibit me from showing an example, the functions involved in such an arrangement are Monitor::Pulse and Monitor::Wait. Basically, the writer thread gets ownership of the buffer, writes to it, and then calls Pulse and releases its hold on the buffer's lock. Some thread waiting (via Wait) for that lock to become available then gets ownership, reads from the buffer, and then releases its lock. Calling PulseAll allows all waiting threads to continue, rather than just one of them.
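The same hand-off can be sketched in standard C++, where Monitor::Wait maps roughly to condition_variable::wait and Monitor::Pulse to notify_one. This single-slot buffer, with assumed names, alternates between writer and reader exactly as described above.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <thread>

// A one-message buffer: Put blocks until the previous message has been
// taken, and Take blocks until a message is available.
class OneSlotBuffer {
    std::mutex lock;
    std::condition_variable changed;
    std::optional<std::string> slot;   // empty means "already processed"
public:
    void Put(const std::string& msg) {
        std::unique_lock<std::mutex> guard(lock);
        changed.wait(guard, [this]{ return !slot.has_value(); }); // like Wait
        slot = msg;
        changed.notify_one();                                     // like Pulse
    }
    std::string Take() {
        std::unique_lock<std::mutex> guard(lock);
        changed.wait(guard, [this]{ return slot.has_value(); });  // like Wait
        std::string msg = *slot;
        slot.reset();
        changed.notify_one();                                     // like Pulse
        return msg;
    }
};
```

As with Wait in the Monitor class, condition_variable::wait releases the lock while blocked and reacquires it before returning, which is what lets the other thread make progress in between.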
The function Thread::Sleep can also be useful, as it allows a thread to be suspended for a given number of milliseconds.
The field modifier volatile tells the compiler that no one thread controls all aspects of this field. Specifically, one or more other threads might be reading from and/or writing to this variable asynchronously. Essentially, this modifier forces the compiler to be less aggressive when performing optimization. Consider the following:
volatile int i = 0;
/*1*/ i = 10;
/*2*/ i = 20;
/*3*/ if (i < 5 || i > 10) {
// ...
}
int copy = i;
/*4*/ if (copy < 5 || copy > 10) {
// ...
}
In the absence of volatile, case 1 could safely be ignored, since you immediately overwrite the value of i in case 2. However, given the volatile modifier, the compiler must perform both store operations.
In case 3, the compiler must generate code to fetch the value of i twice. However, its value might change between fetches. To make sure you are testing the same value, you have to write something like case 4 instead. By storing a snapshot of i in the non-volatile variable copy, you can safely use the value of copy multiple times knowing that its value cannot be changing behind the scenes.
By using volatile, you can avoid explicit synchronization for certain kinds of variable access.
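The snapshot idiom from case 4 carries over to standard C++, with one caveat worth flagging: plain C++ volatile gives none of the cross-thread guarantees that .NET volatile does, so this sketch uses std::atomic in its place. The variable and function names are assumptions of the example.

```cpp
#include <atomic>

// Shared between threads; std::atomic plays the role that volatile
// plays in the article's Managed C++ fragment.
std::atomic<int> sharedValue{0};

// The "case 4" pattern: fetch the shared value exactly once into a
// local snapshot, then perform both range tests against that snapshot.
// Another thread updating sharedValue between the two tests cannot
// make them disagree.
bool OutOfRange() {
    int copy = sharedValue.load();     // one fetch; the snapshot
    return copy < 5 || copy > 10;      // both tests see the same value
}
```

Testing `sharedValue` directly twice, as in case 3, would allow another thread to change it between the two comparisons.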
When writing a multithreaded application, it can be useful to have variables that are specific to a particular thread. Although this is not a standard feature of C++, Managed C++ supports it via an attribute that allows Thread-Local Storage. For example, consider the following program:
#using <mscorlib.dll>
using namespace System;
using namespace System::Threading;
__gc class Th07
{
/*1*/ int f1;
/*2*/ static int f2 = 20;
/*3*/ [ThreadStatic] static int f3 = 30;
f1 is an instance field, so each instance of type Th07 has its own copy, and that copy exists for the life of its parent object. On the other hand, f2 is a class field, so there is only one occurrence of it for the class, regardless of the number of instances of that class. In theory, this field exists until the application terminates. Neither of these fields is specific to a thread. With the appropriate constructs, both kinds of fields can be accessed by multiple threads.
Simply stated, thread-local storage is memory that is owned by a particular thread, and that memory is allocated when a new thread is created and deallocated when that thread terminates. It combines the privacy of local variables with the persistence of static variables. A field is marked as being thread-local by attaching to it the attribute ThreadStatic, as shown above in case 3. Like a local or static field, a thread-local static field can have an initializer.
public:
Th07()
{
f1 = 10;
}
void TMain()
{
String *threadName =
Thread::CurrentThread->Name;
/*4a*/ ++f1; ++f1; ++f1; ++f1; ++f1;
/*4b*/ Monitor::Enter(__typeof(Th07));
++f2; ++f2; ++f2; ++f2; ++f2;
int f2LocalCopy = f2;
Monitor::Exit(__typeof(Th07));
/*4c*/ ++f3; ++f3; ++f3; ++f3; ++f3;
Console::WriteLine(S"Thread {0}: f1 = {1}, f2 = {2}, f3 = {3}",
    threadName, f1.ToString(), f2LocalCopy.ToString(),
    f3.ToString());
}
};
Function TMain is the entry point for new threads. This function simply increments the three fields, f1, f2, and f3, five times each and prints their current values. The lock block starting in case 4b makes sure that no other thread can concurrently access the shared variable f2 while its value is being incremented or copied.
int main()
{
/*5*/ Thread::CurrentThread->Name = S"t0";
Thread *t1 = new Thread
(new ThreadStart(new Th07(),
&Th07::TMain));
t1->Name = S"t1";
Thread *t2 = new Thread
(new ThreadStart(new Th07(),
&Th07::TMain));
t2->Name = S"t2";
t1->Start();
/*6*/ (new Th07())->TMain();
t2->Start();
}
The primary thread sets its own name to t0 in case 5 and then creates and starts two threads. It also calls TMain directly, as a regular function rather than as part of thread creation and startup. Here are two examples of the output that can result. (The only difference between the possible outputs is the order in which the threads do their incrementing and printing.)
Thread t0: f1 = 15, f2 = 25, f3 = 35
Thread t1: f1 = 15, f2 = 30, f3 = 5
Thread t2: f1 = 15, f2 = 35, f3 = 5

Thread t1: f1 = 15, f2 = 25, f3 = 5
Thread t0: f1 = 15, f2 = 30, f3 = 35
Thread t2: f1 = 15, f2 = 35, f3 = 5
Each of the three threads has its own instance of f1, which is initialized to 10, so it is no surprise that each has the value 15 after being incremented five times. In the case of f2, all three threads share the same variable, so that one variable is incremented 15 times.
The threads t1 and t2 go through the thread-creation process, each getting its own version of f3. However, these thread-local variables take on their default value zero, rather than the initializer 30 shown in the source code. Beware! Then after being incremented five times, each has the value 5. Thread t0 exhibits different behavior. As you can see, this thread was not created by the same machinery as the other two threads. As a result, its f3 does take on the explicit initial value, 30. Also note that in case 6, TMain is being called as a regular function, not as part of the creation of a new thread.
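Standard C++ has a direct counterpart to [ThreadStatic]: the thread_local storage class. One notable difference from the CLR behavior just described is that C++ guarantees every thread's copy is initialized from the declared initializer, so there is no "default zero" surprise. The names below mirror f3 and its initializer of 30 for illustration.

```cpp
#include <thread>

// Each thread gets its own copy of f3, and every copy starts at 30
// (unlike the CLR's [ThreadStatic], where non-creating threads saw 0).
thread_local int f3 = 30;

// Increment this thread's copy five times, as TMain does in case 4c.
int IncrementFiveTimes() {
    for (int n = 0; n < 5; ++n)
        ++f3;                          // touches only this thread's copy
    return f3;
}
```

Calling IncrementFiveTimes on two different threads yields 35 on each, and neither thread's increments are visible to the other.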
Consider the following scenario: an application has multiple threads executing in parallel with each thread having write access to some shared integer variable. Each thread simply increments that variable by one as follows:
++value;
Well, that looks harmless enough. After all, this looks like an atomic operation, and on many systems it is, at least from the point of view of a machine instruction. However, Managed C++'s execution environment does not universally guarantee this.
To demonstrate this, Listing 6 has three threads, each concurrently incrementing such a shared variable ten million times. It then displays that variables final value, which, in theory, should be thirty million. The resulting application can be run in one of two modes: the default mode is unsynchronized and uses the ++ operator; the alternate mode, indicated by using a command-line argument of Y or y, uses a synchronized library increment function instead.
When the standard ++ operator is used, five consecutive executions of the application resulted in the following output (when run on an Intel Pentium-based system):
After 30000000 operations, value = 20672006
After 30000000 operations, value = 19852180
After 30000000 operations, value = 27631609
After 30000000 operations, value = 25205827
After 30000000 operations, value = 25429970
and as you can see, the reported total falls far short of the correct answer. Simply stated, between 20 and 30 percent of the increments went unreported. When the same program runs in synchronized mode (that is, using Interlocked's Increment instead), all thirty million increments are performed and reported correctly, as follows:
After 30000000 operations, value = 30000000
Class Interlocked also has a Decrement function.
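The same experiment can be sketched in standard C++, where std::atomic's fetch_add plays the role of Interlocked::Increment. The function name is an assumption, and the counts are scaled down from the article's ten million per thread for brevity; a plain int incremented the same way would exhibit exactly the lost updates shown above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread performs perThread atomic increments on a shared counter.
// fetch_add is an atomic read-modify-write, so no increment is lost,
// matching the behavior of Interlocked::Increment in synchronized mode.
long long CountAtomically(int threads, int perThread) {
    std::atomic<long long> value{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&]{
            for (int n = 0; n < perThread; ++n)
                value.fetch_add(1);    // atomic; ++plainInt would lose updates
        });
    for (auto& th : pool)
        th.join();
    return value.load();
}
```

With three threads and 100,000 increments each, the result is always exactly 300,000, whereas the unsynchronized version of the same loop would usually report less.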