Debugging


Don't Blow Your Stack

Jerzy Tomasik


Jerzy Tomasik is president of Effective Experimentation Experts, a firm that specializes in statistics and scientific programming. You may contact him at 159 Laurel Ave., Brea, CA 92621, or as jtomasik on telepath. His CompuServe ID is 70562,1372.

If your programs never terminate with the run-time error "stack overflow" and never crash mysteriously, you may skip the rest of this article. Even if you have mastered the compiler and linker switches required to avert the dreaded "stack overflow" message, you may still be playing with fire. Though you may know the amount of stack space your own code takes, you can't be sure how much space is used by the library routines you've included, especially those third-party GUI libraries without the source code.

Despite the proliferation of C books and magazines, stack management has not received the coverage necessary for programmers who come to C without experience in assembly language. The only book I know that covers the topic fairly well is Compiler Design in C by Allen Holub. Even this text is a little short of practical solutions for everyday programming problems.

The few, simple functions that I'll present in this article have allowed me to trim the stack to the absolute minimum, without any guesswork. The "high-water marking" of the stack requires only two function calls. Even better, you can set up the functions to report the amount of unused stack every time your program terminates.

Memory Organization Of C Programs

Before you can solve your stack problems, you need a good understanding of the memory organization in a typical C program. Begin by examining MEMORG.C in Listing 1. Compiling this program under Microsoft C using

cl /Fs /Fm /AS memorg.c
produces MEMORG.LST and MEMORG.MAP in addition to the MEMORG.OBJ and MEMORG.EXE files. The information that tells what the compiler decides to do with different variables is placed at the end of the .LST file. For example, the compiler calculated the size of str1 (an initialized variable) to be 28 bytes. The compiler computed the file's total data size, however, as 114 bytes. The discrepancy can be found in the five printf format strings. The compiler places each printf's 17-byte string in the static data segment. Bear in mind that these 114 bytes become embedded in your executable program.

Since str2 has not been initialized to anything explicitly, the compiler has no reason to store it as part of the executable file. During load time, however, 2048 bytes will be filled with zeros and reserved for str2. For historical reasons this is called a BSS (for Block Starting with Symbol) area. Finally, str3, though unititialized like str2, has not been placed in the BSS segment. In fact, str3 is not part of any of the segments listed at the end of the .LST file. str3 has been declared with the static keyword.

A variable declared as static maintains its value between function calls because the compiler puts it in a fixed place in memory for the duration of the program, This location does not change between different calls to the same function. What may be less known is that the keyword static also makes the variable visible only within the source file in which it has been declared. This explains why you sometimes see the line

#define PRIVATE static
to improve the readability of source code.

In simple terms, the static keyword tells the compiler everything it must know about a variable. Since a C compiler works with only one source file at a time, it cannot resolve physical placement of global variables, but a variable declared static can only be accessed by the current source files. The compiler, then, is free to determine the placement of static data.

Since str2 is not static, but global by default, it is visible to other files. Only the linker has sufficient knowledge of all the modules comprising the entire program to resolve global variables; the compiler cannot. But near the end of MEMORG.MAP (Listing 3) are 01200H bytes named c_common, placed in the BSS class. str3 (4096 or 1000H bytes) ends up here, in addition to 200H bytes allocated for the C start-up code and libraries.

Local Symbols

Immediately following the ending brace of main in Listing 2, you'll find a list of Local Symbols. Notice that the variables are listed only by name and offset. The offset is relative to the current stack pointer (SP), which is resolved when the routine is executed. (MEMORG has only one function. More complex programs would contain local symbols for each function.)

The compiler places the local (auto) variables on the stack as needed. If a given function can be called from different points in the program, more likely than not the value of SP will be different at these points and local variables will occupy different physical space. Also, the compiler does not zero out the stack space at the end of a function call. So the values of local variables are not only not maintained between function calls, but are practically guaranteed to contain garbage. This also explains why variables declared inside a function cannot be accessed from outside the function — other functions cannot use a fixed address to access them.

Passing Arguments

The compiler also uses the stack as a "scratch pad." C passes function arguments by value, meaning that the arguments' values must be copied to a place that both the calling and the called routine can agree upon. In C this is the stack. Program FUNC_ARG.C in Listing 4 illustrates stack usage during function calls. Again I compiled with the /Fs switch to generate the compiler listing shown in Listing 5.

The only difference between the two functions by_value and by_address is how the big_struct argument is passed. by_value copies the entire structure (1024 or 400H bytes) onto the stack, while by_address only copies the structure's address (two bytes in small memory model). If you compare the offsets of dummy in these two functions, you will find a difference of 1022 bytes. It is easy to see why passing a large variable's address is more desirable than copying the variable itself. In fact, FUNC_ARG.EXE will not run with the default stack size of the Microsoft C compiler.

Up to this point, I haven't mentioned the effects on the stack created by allocating memory dynamically. In Listing 1, the variable heap_str is allocated at run-time. A call to the library function malloc allocates 512 bytes from the heap, which are later released with a call to free. From MEMORG.LST you can see that heap_str takes up only two bytes of stack space. This size is in sharp contrast to auto_str, which occupies 512 bytes of stack space.

Don't forget that the space claimed by malloc remains in use until it is freed. You can return the address of an allocated buffer from a function (this is exactly how the strdup function works), but returning the address of a stack-based variable can and will lead to the most frustrating C bugs. (Remember, storage on the stack gets recycled after the function terminates).

High-Water Marking The Stack

Even if you had the patience to examine each .LST file of your real-life application, you could not look at the libraries that come without the source. In addition, the number of times a recursive function will be called frequently depends on data (for example in qsort). In such cases you would have to analyze the algorithm to predict stack usage at run-time.

What I have shown you so far is only the background you need for dealing with stack problems, and guidelines for coding that will make the problems less likely. Now I'll develop tools to measure exactly how much stack space a program has used, even after calling third-party library routines.

The principle of marking the stack at its "highest" level is very simple. You simply write a known byte pattern over the entire allocated stack area at program initialization, and then determine the last place where the program overwrote the pattern. The actual implementation is not only compiler specific, but is different for each memory model. (You'll need to compile CHCKSTCK.C for the memory model you use.) Since I use only Microsoft C and Turbo C, the package supports only these two compilers. See Listing 6.

Fortunately, using the package is considerably simpler than understanding its internals. I have included a trivial program TESTSTCK.C (Listing 7) to demonstrate how to use the two functions spray_stack and unused_stack. Although you are not required to call spray_stack at the beginning of the program, it is prudent to do so, since any stack problems prior to calling spray_stack may escape your attention. In fact, you can call spray_stack more than once in order to collect stack usage data for different parts of the program. Note that you can call the complementary function, unused_stack, at any point to pinpoint the stack hogs.

Implementing spray_stack

The Microsoft C compiler (more correctly the startup code) defines a global variable called end, the address of which points to the end of the data segment. end also coincides with the bottom of the stack segment. If the stack pointer ever dips below the location of end, your program is almost certainly destined to crash! spray_stack uses end to determine the bottom of the stack.

Microsoft C has a unique feature — the library routine stackavail returns the amount of free stack space. spray_stack therefore uses this function to determine the number of bytes of the stack to mark. The variable end and the function stackavail make implementing the Microsoft-compatible version simple. (This has little relation to the question of which compiler is better.) spray_stack only has to find the address of end (bottom of the stack) and the size of the free stack:

stack_top = stackavail 0 + &end
in order to mark to the top with the signature byte. The function unused_stack finds the bottom of the stack and walks the stack until it finds something other than the signature byte. At this point, we know how many bytes of the stack were left unused.

Turbo C startup code contains a global variable _stklen that you normally initialize inside your code to override the default stack size. The Turbo C version of spray_stack uses this variable to determine the size of the stack. Turbo C provides a very convenient access to 80x86 registers via predefined variables (register name prepended with an underscore). This allows us to find the current stack pointer (SP) simply as _SP. Although this method is extremely non-portable, it's a blessing when you need it. Note that I extensively use compiler-predefined symbols for the memory model.

If you need to port these routines to another compiler, I'll briefly mention that you can always find (approximately) the current stack pointer under any C compiler. Remember that auto variables are placed on the stack. So look at the following code fragment:

unsigned short *get_SP(void)
{
unsigned short sp;
return( &sp );
}
This function returns the current stack pointer with an accuracy of a few bytes, depending on the details of the implementation. Note that you can use the same method to observe the stack during debugging. I routinely open the register window in both Turbo Debugger and CodeView. Keeping an eye on SP will let you know which functions use inordinate amounts of the stack. You can also open a "dump" window to watch the bottom of the stack. Any overwrites by errant pointers will be quickly noticed.

One final implementation detail is that by registering unused_stack with the atexit handler, you can get the stack usage report every time your program terminates. You can also check for an environment variable and report only if the variable is set. Your final product can thereby contain your stack checking routines and help you in on-the-phone trouble-shooting. My current application includes a hot-key to display a window with current SP and stack-usage information.

Conclusion

There are different ways to implement and use spray_stack and unused_stack. If you are a skilled assembly-language programmer, you may want to modify your compiler's startup code to make the functions an integral part of your development environment, though doing so would make the code even more non-portable. You would probably have to modify it for each new release of the compiler. I have used my implementation through upgrades of both compilers and haven't had to make any modifications. I believe this arrangement strikes a balance of simplicity, maintainability and resistance to minor changes between compiler versions.