Heap Checking

A pair of libraries for handling heap-related bugs

Steve Oualline

Steve is author of C Elements of Style (M&T Books, 1992) and can be contacted at sdo@crash.cts.com.


Because they don't cause immediate problems that are easy to spot, heap errors are among the most difficult and frustrating programming bugs to root out. Heap errors are sneaky. They quietly modify random data, causing other portions of your program to fail and you usually don't have any idea what causes the problem.

In comparison, logic errors are easy to find. Suppose, for example, you make a logic error in a check-balancing program--you subtract, instead of add, deposits. The resulting negative balance is obvious and easy to correct.

With heap errors, however, the balance might end up being something 86,#*2,*(#.%%. Unquestionably garbage is being written into memory--but how? In this case, memory used for data was changed by bad code. The code could just as easily have changed code memory or DOS's memory. Since DOS doesn't like having it's memory monkeyed with, your system will likely crash even to the point that Ctrl-Alt-Del doesn't do the job, forcing a hard reset.

This article presents two libraries which address problems such as these. The SafeHeap library intercepts all the heap-related calls (malloc, free, and the like) and performs extensive error checking before actually executing the operation. The LogHeap library prints out a log message every time the heap changes. These log messages can then be used help locate heap problems, such as memory leaks.

Borland's Heap-checking Routines

Borland recognizes the difficulties of heap debugging and provides you with a number of library routines designed to ensure the integrity of the heap. These functions, see Table 1, give you a tremendous set of tools for finding program errors. For example, if you try to free the same block twice, heap corruption will occur. You can check for this problem by simply using heapchecknode to make sure that the pointer you're freeing is really allocated; see Example 1.

By putting calls to the Borland checking routines in strategic places in your code, you can detect heap problems early. There are some problems with this approach, however. First, you must modify your source code. You must also decide where to put the checks, and you may make the wrong choice. Additionally, the C library makes your job harder by burying heap allocation inside library routines such as strdup. Finally, this method requires work that you want to avoid whenever possible.

The SafeHeap and LogHeap Libraries

To address the problem of heap errors and fill in some of the gaps with Borland C, I wrote the SafeHeap and LogHeap libraries. This set of debugging tools includes code that checks memory leaks, data integrity on each call, and the like. I've also written a group of sample programs that illustrate how to use the library. The source code and executables for the library and sample programs are available electronically; see "Availability," page 3.)

To use the library, you need to divide your program into two parts--one part for the heap routines, the rest for the program. See Figure 1(a). You can also add another layer between the heap and program for debugging. This layer, which I call the "paranoid" layer, checks all heap requests for sanity and report errors the moment they happen. See Figure 1(b). Borland has helped in the construction of this layer by supplying the source code to their run-time library. First, edit the module farheap.asm, renaming the functions as in Figure 2. The prefix r_ indicates that these are the "real" routines. (You don't have to worry about calloc since it actually calls malloc to allocate the memory.)

Now you can write the paranoid layer. Your version of malloc looks Figure 3, although in practice it's not this simple. All this version tells you is that you have an error. It doesn't give you vital information--who or what caused the problem, for instance.

Getting the Return Address

To get the return address, you first need the address of the calling procedure. One way of getting this is to run the program under the debugger, put a breakpoint at the error message, and display the call stack. (Unfortunately, some programs are so large and complex that they resist debugging. Run a program normally and you get a pointer error. Run it under the debugger and you crash the debugger and everything else.)

Here's another way of getting calling procedure's address. Looking at the assembly code for a far call, you see it starts with a CALL FAR instruction, see Figure 4(a), which pushes the return address segment and offset onto the stack. The initialization code of a C procedure looks like Figure 4(b). The first instruction saves the bp register on the stack. Next the current stack pointer is saved in the register bp. This sets up the bp register so that it points to space for the local variables. Finally the stack pointer is adjusted to allocate stack space for these variables.

The result of all this is that the bp register points to a block of memory that contains the information in Figure 5. You can get a pointer to the return address with char **ret_ptr = MK_FP(_SS, _BP+2);. This gives you the absolute return address. You need to transform this address into one that's relative to the beginning of the program. The variable _psp contains the segment of the Program Segment Prefix (PSP). This is a 0x100 byte data area at the beginning of each program. You can use this variable to help transform our return address into something useful.

Start by breaking the address apart into a segment and offset; see Figure 6(a). You then adjust the segment by the value of the PSP segment. One final adjustment is needed for the PSP, 0x100 bytes, or 0x10 paragraphs; see Figure 6(b).

Turning an Address into a Line Number

Once now you've got the caller address, you need its location in the code. The link map comes to the rescue. The tlink /v option or the bcc -lv option generates a link map containing line numbers. Just scan the listing for an address close to the one in the log file to determine the source line containing the error.

For example, when you run the program BAD_FREE (one of the sample programs available electronically), you get the message heap error(free) 023E:0058 20E0:0004 (2,4). The first number in this message is the address call the caused the problem. Looking through the map file for BAD_FREE, you come across Figure 7. You return address (023E:0058) is between line 023E:004D (line 16) and 023E:005A (line 18). Thus, you've narrowed down the problem to line 16 where there's an illegal call to free.

Memory Leaks

A memory leak occurs when you allocate, but never free, a block of memory. This can cause you to run out of memory. To find memory leaks, you need to use a different version of the intercept library. Instead of merely reporting errors, this version reports all heap allocation and deallocation calls.

The result is a complete (and somewhat large) log file you can use for locating memory leaks. All you have to do is pair up the malloc calls with the corresponding frees. Anything else is a memory leak.

Rather than do this manually, I've included the utility H_CHECK. First run your program with the LogHeap library, then run H_CHECK, and you'll get a list of allocated memory that was never freed.

Technical Details

The compiler provided a number of surprises when creating this set of debugging tools. First, the function fopen calls malloc. This normally isn't a problem, unless you're trying to call fopen from your own "intercept" malloc. The problem is that you've introduced an infinite recursion. The function malloc detects an error, calls fopen to open the log file, which calls malloc which detects an error which calls . . . and on, and on.

The solution is to use the log routine to turn off logging while inside the library. If the in_log flag is set, you can ignore any recursive calls.

The other problem concerns NULL pointer checking. The first time I ran BAD_NULL, I found location 0000:0000 was being changed in two places. This was surprising since I'd only changed it once. Who was the mystery player?

As it turns out, the startup code installs a new divide-by-zero handler, although when the program exits, it restores the old value. This was the restoration I was logging. Consequently, I changed NULL pointer checking to not complain if interrupt vector 0 is restored to it's original value (contained in _Int0Vect).

C++ Heap Checking

C++ uses new and delete to allocate memory. In Borland C++, these routines are front ends for malloc and free. Borland C++ defines the symbol __BCPLUSPLUS__ when compiling C++ code. The SafeHeap and LogHeap libraries use this symbol to automatically compile their own version of new and delete. To create a C++ version of the libraries, edit the makefile to include the Borland C++ option -P (force C++ compile).

Conclusion

SafeHeap and LogHeap provide a set of tools for finding many different heap problems. They can't find everything, but they will catch most errors. Better yet, they catch errors early, before they have time to corrupt memory and cause other parts of the program to fail.

Table 1: Borland library routines that deal with the heap.

Routine      Description
heapcheck     Walks through the heap and checks each block's critical
                 attributes such as link pointer and size.
heapcheckfree Checks the free space in the heap to make sure that each
                 word contains the same value. This value can be set by the
                 function heapfillfree.
heapchecknode Checks a given pointer to make sure it points to an
                 allocated block in the heap.
heapfillfree  Fills the free blocks in the heap with a constant.

Example 1: Using the heapchecknode function

Figure 1: (a) Heap routine and main program layers; (b) "paranoid" layer of program added for debugging purposes

Figure 2: Renaming functions in farheap.asm.

Figure 3: The "paranoid" layer version of malloc.

Figure 4: (a) Getting the calling procedure's address using a CALL FAR; (b) initialization code of a C procedure.

Figure 5: The bp register points to a block of memory that contains this information.

Figure 6: (a) Breaking the address apart into a segment and offset; (b) adjusting the segment by the value of the PSP segment.

Figure 7: Line numbers for bad_free.obj (bad_free.c) segment BAD_FREE_TEXT


Copyright © 1993, Dr. Dobb's Journal