Features


Handling Time-Consuming Windows Tasks

Andy Yuen


Andy Yuen has a Master and a Bachelor degree in Electrical Engineering from Carleton University in Ottawa, Canada. He has been working in the software industry for over 14 years. He is currently a senior software engineer with a company in Sydney, Australia, which specializes in automation and network management. Andy can be reached by e-mail at auawyl@grieg.sdi.oz.au.

Introduction

Microsoft Windows 3.X and OS/2 Presentation Manager (PM) programmers are probably all aware of the "1/10 second rule" — an application should take no longer than 1/10 of a second to process a message. Any task which takes longer can be considered a "big job." And a big job degrades the system response. Sometimes, it even brings the system to a standstill while it is running. This happens because there is only one input message queue for all Windows or PM applications. Once you hold up this queue, other applications cannot run. OS/2, to some extent, is less affected by a big job than Windows due to its true multitasking capability; other non-PM tasks can still run. Also, OS/2 has a lot more built-in facilities to combat this problem. (For example, see the sidebar on OS/2's object windows.)

The usual work-around under Windows is to have PeekMessage loops scattered all through a program. This article proposes an alternate solution. It describes the small C++ class Cschlr which provides the functions of a simple scheduler. It manages thread creation, deletion, and synchronization by using counting semaphores and timed functions. A Windows program can create one or more threads to handle time-consuming tasks and then pass control back to Windows quickly via the message loop. Thread interactions can be synchronized by using counting semaphores and timed functions. Although this C++ class has been designed from the ground up for Windows, it can actually be used unmodified for OS/2. I also discuss a number of the design/implementation issues, and illustrate its use with a complete example.

Design

The design criteria of Cschlr are based on the KISS (Keep-It-Simple-Stupid) principle. Consequently, Cschlr must be simple, small, and portable. It is intended to work in the following fashion:

1. Threads can be created before entering the main PeekMessage loop or can be created anywhere in an application.

2. Threads can terminate themselves anywhere within an application.

3. Parameters can be passed to threads during thread creation.

4. PeekMessage needs only be used once in WinMain.

5. Counting semaphores and timed functions provide synchronization among threads.

The interface to the Cschlr class (C++ header file) can be found in Listing 1. Member functions can be found in Listing 2. The constructor and destructor are self-explanatory. I give below a brief description of the rest of the methods:

int CreateThread(THDFN func, int stacksize, void *param);
Used to create a thread, where func is the thread procedure, stacksize is the size of the stack for the thread and param is a pointer to user parameters to be passed to the thread. This method is modelled after the BeginThread function provided by C compilers for OS/2. The returned integer is the thread number.

void Suicide();
The counterpart of CreateThread. A thread can kill itself by either calling Suicide explicitly or by letting itself to fall through the end of the thread function.

Csemq* CreateSem(long lValue);
Used to create a counting semaphore and to initialise its value to lValue. It returns the semaphore object on success or a null pointer on failure.

void Signal(Csemq *sem, long lMaxCount = 256);
Used to signal a semaphore. Sem is a semaphore object created by CreateSem. lMaxCount is the maximum semaphore count allowed. If the current semaphore count is greater than or equal to lMaxCount, its count will not be increased any further. The calling thread always gets back control after calling Signa1. (Refer to the sidebar for a description of the operation of counting semaphores)

void Wait(Csemq *sem);
Used to wait on a semaphore. If the semaphore count is smaller than 1, the calling thread will be blocked until signalled by another thread.

void Preempt();
Used by the calling thread to give up the CPU voluntarily.

void Sleep(long 1Seconds);
Used by the calling thread to go to sleep for lSeconds. Another ready-to-run thread will be despatched.

void GetSemStates(Csemq *sem, long &lCount, int &fWait);
Used to retrieve the current states of the semaphore object sem. On return, lCount contains the semaphore count and fWait is set to TRUE if there is at least one thread waiting on the semaphore and FALSE otherwise.

void Run();
Used to dispatch threads and to give them a slice of the CPU. This method is normally only invoked in the main PeekMessage loop.

Implementation

Cschlr implements the high-level thread and semaphore management services and provides the interface to the user. It maintains a ready-to-run queue for threads and keeps tracks of the identity of the running thread. The state transition diagram of Cschlr is shown in Figure 1. Preempt handles the task switch. It moves the running thread to the ready-to-run queue and schedules another thread from the ready-to-run queue for execution. Wait allows the running thread to wait in a semaphore queue while another ready-to-run thread is dispatched. Signal moves a thread waiting in a semaphore queue back to the ready-to-run queue. And Sleep and Wakeup provide time-delay services to all threads. Please note that Wakeup is used only internally by Cschlr.

In order to implement the above, Cschlr calls on the services of two private classes: Csemq (see Listing 3 and Listing 4) and Cthread (Listing 5 and Listing 6) . Consequently, it is easy to understand how Cschlr works once one knows the internals of Csemq and Cthread. Hence, I'll start with these and come back to Cschlr later.

Csemq is a private class of Cschlr. This means that all of its methods and data are private to the containing class. Only Cschlr can access them, besides Csemq itself, because Cschlr has been declared as a friend of Csemq.

Csemq provides semaphore services to Cschlr. There is a counter and a queue associated with each counting semaphore. Most C++ compilers come with class libraries which provide common objects like lists, stacks, queues, etc. However, if I had used any one of these class libraries, the code would no longer be portable among different compilers. Also, in examining the requirements, and sticking to the small-is-beautiful design philosophy, I decided to implement my own queue management routines because I don't need the full functions of these class libraries anyway.

I use a long integer to represent both a queue and a counter. If the MSB (Most Significant Bit) is a 0, the long integer is a counter, which means that the counter is at most 31 bits long. When the MSB is 1, the long integer is a queue. Each set bit (non-zero bit) in the long integer represents a thread. For example, a value of 0x8003 contains two threads: bit 0 is thread #0, and bit 1 is thread #1. This implies that there are at most 31 threads possible. This should be more than enough for our purpose.

If multiple threads are waiting on a semaphore, which thread should we resume when a signal has been received? I decided that a somewhat round-robin algorithm should be implemented. I said "somewhat" because it is not possible to implement a truly round-robin algorithm using the queue implementation described above — such a queue does not keep the order in which threads are queued. (Someone might point out that this is a contradiction in terms, and that it is not really a queue if it does not maintain the order. Maybe I should have called it an unordered list. Anyway, I'll refer to it as a queue throughout this article.)

In order to simulate a round-robin effect, each Csemq object has a private variable priority associated with it. Priority is set to CSC_NO_THREAD (whose value is 31) when a Csemq object is created. Method Dequeue returns a thread number for Cschlr to schedule its execution. It starts scanning the semaphore queue one bit to the left of the bit position recorded in priority. Since priority is set to 31, it starts scanning from bit 0 (wraparound) moving to bit 1, bit 2, etc.

When a thread is found, its position is saved in priority. For example, if thread 4 (bit 4) is the first thread found waiting on the semaphore, Dequeue starts scanning from bit 5 the next time it is invoked. The reason that I go to such trouble to implement this scheduling algorithm is to prevent a thread from monopolizing the CPU. Take for example the following case: thread 0 and thread 1 are waiting on the same semaphore. Assume that thread 0 starts execution and later waits on the same semaphore. If I start scanning always from bit 0, thread 0 will always get awakened when the semaphore gets signalled. Thread 1 will never get a chance to run. By implementing the aforementioned "somewhat" round-robin scheduling algorithm, this situation could be avoided.

Semaphore operations such as Signal and Wait should be atomic or indivisible. They cannot be interrupted in the middle of an operation. If the operation is not atomic, one might get into the situation where two threads try to perform an operation on the same semaphore concurrently. It may happen that while one operation is half-way through the operation, the operating system switches to another thread which then updates the semaphore count or queue, rendering the semaphore structure inconsistent. Under Windows, this would never happen because of the cooperative multitasking model used: a Windows procedure only gives up the CPU voluntarily. There is no chance of having two truly concurrent threads of execution under Windows. The PeekMessage loop is the mechanism to multiplex the CPU among a number of pseudo-concurrent threads in an application which uses Cschlr.

Class Cthread

Cthread is also a private class of Cschlr. It provides thread objects to Cschlr. In order to keep this class implementation portable, I avoided the use of inline assembly since the syntax is different among compilers. The most portable way to implement this is to use the Standard C library's setjmp and longjmp functions. These were originally designed for error recovery purposes and people are starting to question their usefulness under C++, which has exception handling. However, these two funcitons prove to be invaluable in implementing the Cthread class.

Unfortunately, the use of setjmp and longjmp do not make Cthread 100 per cent portable, due to the layout of the jmp_buf structure. jmp_buf is used to save the context of execution when setjmp is called. The execution context consists of various CPU registers. The only time that I need to know which registers go where is during the creation of a thread. One needs to set the CS:IP (Code Segment:Instruction Pointer) and the SS:SP (Stack Segment:Stack Pointer) to simulate a setjmp during thread creation.

Cthread maps the jmp_buf to a data structure defined within CTHREAD. CPP. The typedef statement:

typedef struct {
    long ip; //CS:IP
    short filler; //don't care
    long sp; //SS:SP
} *mapptr;
defines a pointer type for this purpose. It works for the Zortech C++ Version 3 compilers for Windows and OS/2, which I am using for the development of Cschlr. If you are using a different compiler, you need to change this type definition to reflect the difference in implementation of these functions.

There are two ways to find out the structure of your compiler's jmp_buf. The first is to examine the sources for the C library if they are available. The second method, and probably the easier of the two, is to write a very simple C program which calls setjmp. Compile and link it with debugging information. Then use the source-code debugger to compare the register and jmp_buf values before and after executing setjmp to find out where the CS:IP and SS:SP registers get saved.

The other thing to remember is that Cthread is intended to be compiled using the large memory model. The magic number WORDSTOREMOVE is defined as 3. Different values are needed for different memory models. The magic number is 3 because the C library function setjmp has the form:

int setjmp(jmp_buf env);
A thread pushes env onto the stack, and then calls setjmp, thus saving the 32-bit return address on the stack. setjmp then saves the BP (Base Pointer) register on the stack (the standard C function prologue). When longjmp is called to restore the stack environment, it discards the saved BP and return address on the stack. For the large memory model, the return address occupies four bytes while BP occupies two bytes, for a total of three 16-bit words. Since this function relies on the standard C prologue to work properly, the compiler must be instructed to generate the standard C function prologue instead of the one for Windows. Consequently, you will notice that there are two different compile option macros defined in the Makefile (
Listing 8) for the example program: one for compiling Cschlr, Csemq, and Cthread, the other for compiling the Windows main program.

Cthread saves the context of a thread in the data structure THD. The member Context is the jmp_buf. TotalLen is the total length of the THD structure in bytes. This value is needed for the destructor of a thread object, which frees the allocated storage for the stack. OverFlowed is a flag set to zero to guard against stack overflow. Stack is the stack for the thread. Since the stack grows downward in a X86 processor, if overflow occurs Overflowed will hopefully be overwritten with a nonzero value. Cthread refuses to switch to a thread with a nonzero OverFlowed flag when the Transfer method is invoked.

The implementation of Cschlr is quite straightforward except for CreateThread, which passes a user parameter to the thread, and the implicit termination of a thread by falling through the thread function. Again, Cschlr is designed for the large memory model. CreateThread creates a new thread by:

task[i] = new Cthread(func, stacksize, (int *) &retaddr, 4);
The argument value 4 specifies the number of 16-bit words to copy to the new thread's stack. In order to force a thread to kill itself when falling through the thread function body, a return address is copied to the stack, together with the user parameter to be passed to the thread. This return address points to Kill, a static function within CTHREAD.CPP. See Figure 2 for the stack layout during thread creation. When the thread exits the thread function, it returns to Kill, which calls Suicide to terminate itself. The address of Kill and the user parameter pointer take up four words when the large memory model is used.

Note also that there are 33 slots reserved:

Cthread *task[CSC_NO_THREAD + 3];
in the Cschlr object although there can only be 31 threads. One extra slot is used for the Windows main program while the other is for termination of threads. When Run, the method that multiplexes the CPU among various threads, is called within the PeekMessage loop, its context gets saved in the slot task[MAIN]. When a thread gives up the CPU voluntarily, control is passed back to MAIN so that it can continue with the PeekMessage loop. Consequently, a Wait or a Sleep must not be called within the PeekMessage loop. Otherwise it will bring back the big-job problem that we meant to avoid.

When Sleep is called, the thread is put to sleep in the dummy semaphore queue WaitQ. All sleeping threads are maintained in chronological order in the form of a linked list. During each task switch, the table is scanned and the threads that have reached the end of their sleep are moved to the ready-to-run queue ReadyQ. If there is no ready-to-run thread in ReadyQ, control is returned to MAIN immediately.

The slot task[DUMMY] is used for thread termination. It is required because a Cthread object always saves the context of the running thread during a task switch. task[DUMMY] is used for saving the context of the terminating thread temporarily. The context saved in it never needs to be restored because that thread no longer exists after committing suicide.

An Example

The best way to understand Cschlr is by example. Charles Petzold describes the Salvage benchmark as an example of a "big job" [2]. I'll use the same example to contrast the Cschlr solution to the big job to the solutions described in his article. Instead of starting from scratch, I'll use Gpf Systems Inc.'s OS/2-based Gpf code generator. My version of Gpf is Version 1.3, which runs on OS/2 V1.3. It is a tool very similar to Microsoft's Visual C++. However, it generates Windows, OS/2 16-bit, and 32-bit C code.

It was ahead of its time when I first got it in early 1992. I believe the latest version is Gpf 2.0. Instead of going into details as to how it works, let me just say that it generates the framework for a Windows applications and creates the necessary resources. I am just using it as an event dispatcher, which means that whenever an event occurs — the user clicks on a menu item, for example — Gpf will call my function for handling that event.

The example program is called BIGJOB.CPP. [Note: BIGJOB.CCP is not listed here because of its size. It will be on the monthly code disk. See BIGJOB.H, Listing 7, which implements the main logic of the example. — mb] BIGJOB has only two menu items: Repetition and Action. Repetition allows a user to select the number of iterations to run the Salvage benchmark and Action allows a user to either Start or Abort its execution. Running it for 10,000 iterations may take quite a while depending on the speed of your system. BIGJOB displays a progress report every five seconds and shows the total time taken for running the benchmark to completion. While the benchmark is running, a user can switch to any Windows application at will without appreciable delay. An application which is not designed to execute a big job will lock up the computer for the duration of the benchmark. This example clearly demonstrates the power of Cschlr.

As I have said before, I am using Gpf as an event dispatcher. Whenever an event occurs, one of my functions gets called. The only thing I need to change in the Gpf generated source code is to replace the generated main message loop with my PeekMessage loop. In order to expedite the development, the body of my event handling functions (the file BIGJOB.H) is included in BIGJOB.CCP by an #include directive. People may frown at this method, but it is actually suggested in the Gpf manual [4].

Another thing that needs to be done is to rename the generated file BIGJOB.C to BIGJOB.CPP in order to use C++ classes. Some may find Gpf's generated comments to be excessive. Other may find that, for such a small application, the generated code is more complex than necessary and many functions are included although they never get used. This is no longer true if you use Gpf to develop a large application. After all, there is always a price to pay for the convenience one gets by using Gpf.

The events that I hook into include:

WM_CREATE

Gpf calls CreateThread(VOID) to create two threads: Timer and Big. Timer is responsible for the periodic progress report when the benchmark is running. It invalidates the main window to force PaintWindow(pGpfParms) to repaint the screen. Note that Gpf usually passes a pointer of type PGPFPARMS, which contains all windows-related parameters like handles, messages, etc. to a user-supplied function for event handling.

Big is the thread that actually carries out the benchmark. Notice that Big gives up the CPU voluntarily after each invocation of Salvage by calling mtask.Preempt(). This allows the main PeekMessage loop to regain control so that other Windows applications may run. It also examines fContinue to see whether it should abort the benchmark and enable the Start item in the application menu.

WM_PAINT

Gpf calls PaintWindows(pGpfParms) which displays the appropriate message on the screen.

WM_COMMAND

Gpf calls SelectRep(pGpfParms) which sets the number of repetitions and puts a check mark in the selected item by calling the Gpf function GpfMenuTick.

Gpf calls StartSalvage(pGpfParms) to disable the item Start to avoid further selection. The function then enables the Abort item, sets the boolean variable fContinue to TRUE, and signals the Big thread to carry out the benchmark.

AbortSalvage(pGpfParms) sets fContinue to FALSE, disables the Abort item, and enables the Start item by calling the Gpf function GpfMenuGray.

Semaphores semtimer and semjob are used to synchronize the actions among the Window procedure and the threads Timer and Big.

PeekMessageLoop is the function that replaces the normal GetMessage loop in WinMain.

Cross-platform Development

Cschlr works under both Windows and 16-bit OS/2 PM. Since I am using compilers from the same vendor, I don't even have to make any change to the mapptr type definition in Cthread because the jmp_buf structure remains unchanged for both Windows and OS/2 compilers. Cschlr works under OS/2 because OS/2's PM also has the same architecture as Windows: it uses a single message queue for all PM applications.

Cschlr does not take advantage of the true multi-tasking (multi-threading) capabilities of OS/2. It dispatches threads by multiplexing the CPU with Run() in the main PeekMessage loop in Windows and the WinPeekMsg loop in OS/2. But it has one advantage that true OS/2 multi-tasking does not have. OS/2 threads cannot directly force an event to occur by calling APIs like WinInvalidateRect if the calling thread is not the message thread (the thread where the message queue is created). Cschlr allows you to do that because all threads it creates are actually part of the message thread.

Although one can use Gpf to define the screen layout and use the same Gpf file to generate both Windows and OS/2 C source codes, the user-supplied event handling functions still need to be changed if they use GUI-specific calls, due to the differences between Windows and PM APIs. This brings one to ask the question: is there a tool which allows you to develop an application once and port it to other platforms without any source changes? The answer is both yes and no. Yes, there are some tools in the market which claim to do that. No, they may not do what you intend them to do.

Most of these tools take the common denominator approach, which limits you to use only a small subset of features available from a particular GUI. Although they usually provide ways for you to handle GUI-specific features, once you've done that, the application is no longer portable. Some class libraries attempt to encapsulate GUIs in C++ and force you to learn a totally new set of functions and programming model to gain portability. Some of them have over 150 classes with thousands of methods. The learning curve is steep even if they can do what they claim to do.

My friends and I did some extensive research on cross-platform tools sometime ago. The verdict: package the non-GUI-related functions in the most portable way possible and use the best tool you can find to generate the particular GUI that you are interested in. If the non-GUI part — the heart of your application — is well packaged, it should be possible to interface the two components together. Of course, it is easier said than done. The best way to do this varies depending on the application. Since Cschlr works unmodified under both Windows and OS/2, I think it should qualify as a cross-platform tool.

Conclusions

I have presented a simple C++ class, Cschlr, to handle time-consuming tasks under Windows, and demonstrated its use by a complete example. It should be evident from the example that Cschlr helps in limiting the proliferation of PeekMessage loops that are quite commonplace in applications that are CPU bound. It reduces the total number of PeekMessage loops to exactly one and provides counting semaphores and timed functions for synchronisation.

All these functions are provided without increasing the memory usage significantly, due to the small size of Cschlr. The use of Cschlr also improves the readability of source code by reducing the proliferation of PeekMessage loops and by providing a facility for the user to organize a Windows application by breaking it up into smaller and more manageable pieces in the form of threads. And a Windows application written using threads should make porting to OS/2 less painful.

References

1. Dror, A., Lafore, R. 0S/2 Presentation Manager Programming Primer. McGraw-Hill 1990.

2. Petzold, C. Utilizing OS/2 Multithread Techniques in Presentation Manager Applications. Microsoft Journal, March 1988.

3. Comer, D., Fossum, T. V. Operating System Design Vol. 1. The Xinu Approach. Prentice-Hall 1988.

4. Gpf Systems, Inc. Gpf. Gpf Systems Inc. 1992.