Debugging


Debugging In C — An Overview

Wahhab Baldwin


Wahhab has 25 years' experience in software development and is currently owner of Baldwin Software Services, a small company specializing in helping large companies apply new technology and improve their software development process. He can be reached at 1011 Union St., Manchester, NH 03104.

Despite the advantages C offers in the way of flexibility, expressiveness, portability, and broad library support, debugging must be counted as one of its disadvantages. Every C programmer has experienced the hours of frustration trying to find a bug that would never have occurred in a language such as BASIC or Modula-2.

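There are four fundamental approaches to debugging: avoiding bugs, using tools to analyze your program, instrumenting your source code, and using an external debugger.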

In this article I will cover each approach in turn.

Avoiding Bugs

While avoiding creating bugs sounds like a platitude, it is probably the most effective form of debugging. The best method of bug avoidance is to know what is expected of your program and to develop clearly written specifications before writing the code. In addition, writing a comment block for each function, describing its input, output, and assumptions, will not only prevent bugs but also make them much easier to find, both now and three years in the future.

A second technique for avoiding bugs is to use function prototypes. A common cause of pernicious bugs is disagreement between the parameters passed to a function and those the function expects. The problem might be a long-short mismatch, a signed-unsigned mismatch, or forgetting to take a variable's address. In any case, using fully specified prototypes in a header file will catch or correct all these.
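A minimal sketch of what prototypes buy you, using two hypothetical functions, scale and get_count. With the prototypes in scope, a short or int argument to scale is converted to long automatically, and passing an int where get_count expects an int * draws a compile-time diagnostic instead of corrupting memory at run time.

```c
/* These prototypes would normally live in a header (hypothetically,
   scale.h) included by every file that calls the functions. */
long scale(long value, int factor);
void get_count(int *count);

/* Definitions, normally in their own source file. */
long scale(long value, int factor)
{
    return value * factor;
}

void get_count(int *count)
{
    *count = 42;
}

/* With the prototypes in scope:
 *   scale(100, 3)   the int 100 is converted to long automatically
 *   get_count(n)    with int n: diagnosed, since int is not int *
 *   get_count(&n)   correct
 */
```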

The Microsoft C compiler offers a switch, /Zg, which will generate function prototypes from an existing C program. If you have inherited some program without prototypes, creating header files using this option is an excellent first step.

Hungarian notation can prevent bugs. Named after the nationality of its inventor, Microsoft employee Charles Simonyi, this technique involves placing a prefix in front of a variable name to indicate its type. Examples would be n for count, a for array, p for pointer, s for short, l for long, b for boolean, ch for character, sz for null-terminated strings, or h for handle. These are combined, as in

unsigned short *pusVar;
Since each variable carries its own type information, you are less likely to inadvertently use a variable of the wrong type.

Another frequent cause of bugs is uninitialized data. In C, auto variables and memory allocated by calls to malloc are uninitialized, so you must make a point of initializing every auto variable yourself. Using calloc (which clears the memory it returns) or memset will help avoid problems that arise from uninitialized allocated memory.
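A minimal sketch of both habits, using a hypothetical struct record: calloc and memset give all-zero storage, and the auto variable sum gets an explicit initializer.

```c
#include <stdlib.h>
#include <string.h>

/* A hypothetical record type, used only for illustration. */
struct record {
    int  count;
    long total;
    char name[32];
};

/* calloc: every byte of the new array starts out as zero. */
struct record *new_records(size_t n)
{
    return calloc(n, sizeof(struct record));
}

/* malloc followed by memset gives the same all-zero result. */
struct record *new_record_zeroed(void)
{
    struct record *r = malloc(sizeof *r);
    if (r != NULL)
        memset(r, 0, sizeof *r);
    return r;
}

/* Auto variables are never cleared for you. */
long total_of(const struct record *recs, size_t n)
{
    long sum = 0;   /* without "= 0", sum would start as garbage */
    size_t i;
    for (i = 0; i < n; i++)
        sum += recs[i].total;
    return sum;
}
```

All-bits-zero is a reliable zero for the integer and char members used here; pointer and floating-point members are more portably set by explicit assignment.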

Using Tools To Analyze Your Program

lint is an excellent tool for both preventing and finding bugs. I use PC-Lint by Gimpel. Born of the UNIX environment, lint is a program that scans your source code, looking for any conceivable semantic error. You may be shocked at the sheer volume of error messages it produces (though PC-Lint allows you to disable certain messages or classes of message). Developing the habit of writing code that passes lint's scrutiny will prevent many a bug. Using function prototypes in a header, as I discussed earlier, will catch some of the errors for which C programmers used to rely on lint. Still, there are many other situations in which lint is invaluable.

One of the simplest tools for program analysis is cb, the UNIX C Beautifier, and its DOS offspring. I use Source Print, by Powerline Laboratories. This product not only indents your source code rationally (in your choice of three styles), but prints output with structure blocks marked. It also provides boldfaced keywords, a table of contents and an index to your functions and/or globals. Aside from making other people's source code easier to read, Source Print has found bugs such as

if (a)
   if (b)
      funb();
else
   fun_noa ();
where the else clause is actually applied to the if (b) line, not the first line as intended.
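Braces make the compiler's reading match the indentation. In this minimal sketch, funb and fun_noa are hypothetical stubs that just count their calls, so the effect of the braces can be checked.

```c
/* Hypothetical stand-ins that record how often each branch runs. */
static int funb_calls = 0;
static int fun_noa_calls = 0;

static void funb(void)    { funb_calls++; }
static void fun_noa(void) { fun_noa_calls++; }

/* Braces tie the else to if (a), as the indentation intended:
   fun_noa runs only when a is zero. */
void dispatch(int a, int b)
{
    if (a) {
        if (b)
            funb();
    } else {
        fun_noa();
    }
}
```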

There are several tools available that can build cross-reference listings, structure charts, etc. Stewart Nutter's cp program in the August, 1988, Dr. Dobb's Journal draws a tree diagram of function calls within your C program. [Also see Eric Bergman-Terrell's article on page 33 of this issue. pjp] CLEAR+ for C, from CLEAR Software, constructs flow charts, tree charts, formatted source listings, and function cross references. Source Doc by Intelligent Solutions not only provides documentation reports, but also prepares function comments which it inserts into your code.

On another level, your compiler's assembly language output is the ultimate documentation of what your program does. Even the best compilers have bugs of their own, and occasionally it is only by following the assembly code that a program's behavior becomes comprehensible. Most compilers have a switch for producing a mixed C-assembly listing. The F3 key in Codeview lets you toggle between source, assembly, and both.

Instrumenting Your Source Code

Probably the best known and still one of the most useful techniques for debugging C is the venerable printf statement. printf statements placed at various points in your code will indicate where execution has reached in your program and will display the values of suspect variables as your program runs. In many cases, this information will show which variable is creating a wrong choice or value, leading you to a quick fix, or to the next routine which must be inspected.

Many tools and techniques can improve your use of printfs. ANSI C provides the __LINE__ and __FILE__ macros, which will indicate the location of the debug statement in your code. You can best use these identifiers by combining them in a macro, such as

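/* debugstr is a global int flag; nonzero enables the output */
#define DEBUGSTR(var) do { if (debugstr) \
   printf("DEBUGSTR Module:\n%s Line: %d String " \
   #var " = '%s'\n", __FILE__, __LINE__, var); } while (0)
This macro definition uses the new ANSI "stringizing" operator # to put the variable name into the debug statement, where it is joined with the neighboring string literals into a single format string. Wrapping the body in do { ... } while (0) lets DEBUGSTR be used like an ordinary statement, even just before an else. The macro will produce output such as

DEBUGSTR Module:
DEMO.C Line: 162 String szName = 'Jones, Bill'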
This macro demonstrates another feature useful in instrumenting your program: the use of switches to turn debugging statements on and off. A command-line parameter can set the flag (debugstr here) that controls whether the debugging statements produce output. It's best to leave your debugging code in place until your program is fully debugged.
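One way to tie the switch to the command line is sketched below. The -d option and the debug_init function are hypothetical; debugstr is the global flag that DEBUGSTR tests.

```c
#include <stdio.h>
#include <string.h>

/* Global switch tested by DEBUGSTR; off unless -d is given. */
int debugstr = 0;

#define DEBUGSTR(var) do { if (debugstr) \
   printf("DEBUGSTR Module:\n%s Line: %d String " \
   #var " = '%s'\n", __FILE__, __LINE__, var); } while (0)

/* Hypothetical convention: a "-d" argument anywhere on the
   command line turns the debugging output on. */
void debug_init(int argc, char *argv[])
{
    int i;
    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], "-d") == 0)
            debugstr = 1;
}
```

A program would call debug_init(argc, argv) at the top of main; after that, DEBUGSTR(szName) prints only when the program was started with -d.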

The assert macro, also a part of ANSI C, is a powerful tool in your debugging arsenal. Written as

assert (expression)
assert will print a debugging message if expression is false, such as

Assertion failed: expression, file xxx, line nnn
and then abort the program. The assert macro also adds the nice touch that if you define NDEBUG, either at the command line or through a #define statement placed before <assert.h> is included, no assert code is generated.

Assertions are best used to catch "impossible" cases. For example, say that flag always carries the value 'T' or 'F'. You might write

if (flag == 'T')
   process_true();
else {
   assert(flag == 'F');
   process_false();
}
Developing the habit of using assert will catch those "impossible" conditions that are so hard to find otherwise.

By the way, programs that run in a windowing environment give you a choice of where to send debugging information: to a file or to a debug window. Each method has its benefits, and the choice can be made with a command-line switch. If you direct output to a file, use unbuffered file output to ensure you get the last message in case of a program crash.
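A minimal sketch of unbuffered file output, assuming a hypothetical log opened by debug_open: the standard setvbuf call with _IONBF, made right after fopen and before any other operation on the stream, turns off stdio buffering so each message is handed to the operating system as soon as it is written.

```c
#include <stdio.h>
#include <stdarg.h>

/* Debug log stream; stays NULL until debug_open succeeds. */
static FILE *debug_fp = NULL;

/* Open the log and switch off buffering. setvbuf must be the
   first operation performed on the stream. */
int debug_open(const char *path)
{
    debug_fp = fopen(path, "w");
    if (debug_fp == NULL)
        return -1;
    setvbuf(debug_fp, NULL, _IONBF, 0);
    return 0;
}

/* printf-style logging; silently does nothing if no log is open. */
void debug_log(const char *fmt, ...)
{
    va_list ap;
    if (debug_fp == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(debug_fp, fmt, ap);
    va_end(ap);
}

void debug_close(void)
{
    if (debug_fp != NULL) {
        fclose(debug_fp);
        debug_fp = NULL;
    }
}
```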

Managing Memory

There are probably three key causes of bugs endemic to C:

Memory errors can arise since C allows you to allocate and free your own memory.

Writing your own front end to malloc, calloc, realloc, and free can be a great help in debugging memory errors. The sample code in Listing 1 serves this purpose. [Also see Robert Ward's article on page 40 of this issue. pjp] By adding CDEBUG.OBJ to a library you are using, you can modify your program's behavior with a compiler command-line switch. By defining DEBUG (in Microsoft C, use the /DDEBUG switch on the command line), the debugging versions of malloc, calloc, etc., will be invoked. When your code has been thoroughly debugged, you can recompile without this switch and all debugging code will be deleted.

To use the code in Listing 1, place the line

#include "memdebug.h"
in your source code. When DEBUG is defined, all calls to malloc, calloc, realloc, and free will refer to your debug versions, and will pass the __LINE__ and __FILE__ macros. You can leave the #include "memdebug.h" in your program even when DEBUG is no longer defined, since the header then generates no code. The real work is done by the replacement functions in memdebug.c (see Listing 2).

The modified malloc checks that the request is not for zero bytes. It will exit the program with a message if sufficient memory is not available. It also adds the allocated memory address to a linked list of memory allocations. The new free checks that it is not being passed a null pointer, and also that the address to free is in the linked list built by malloc. A call to d__memlist at the end of your program shows what memory has not been freed. Additional protection is given by placing two-byte sentinels at the beginning and end of requested memory. The free request checks that these bytes have not been overwritten. This catches common errors such as

char *str1 = "A string";
char *str2;
str2 = malloc(strlen(str1));   /* bug: no room for the '\0' */
strcpy(str2, str1);
Here, malloc requests one too few bytes, since strlen does not include the length of the null terminator on the string. Such a bug can cause problems that are stubbornly intermittent and that show up somewhere completely unrelated in the program.
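The checks described above can be sketched as follows. This is not the code from Listings 1 and 2, only a minimal illustration; the names d_malloc, d_free, d_memlist, mem_block, and GUARD_SIZE are hypothetical. Each block gets a two-byte sentinel on either side and a node in a linked list, so the free routine can reject unknown addresses and detect the off-by-one copy shown above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 2   /* two-byte sentinel at each end of a block */
static const unsigned char guard[GUARD_SIZE] = { 0xA5, 0x5A };

/* Bookkeeping for one live allocation. */
struct mem_block {
    struct mem_block *next;
    unsigned char    *user;   /* address returned to the caller */
    size_t            size;   /* bytes the caller asked for */
    const char       *file;   /* where the request was made */
    int               line;
};
static struct mem_block *live_blocks = NULL;

void *d_malloc(size_t size, const char *file, int line)
{
    struct mem_block *b;
    unsigned char *raw;

    if (size == 0) {
        fprintf(stderr, "%s(%d): request for zero bytes\n", file, line);
        return NULL;
    }
    b = malloc(sizeof *b);
    raw = malloc(size + 2 * GUARD_SIZE);
    if (b == NULL || raw == NULL) {
        fprintf(stderr, "%s(%d): out of memory\n", file, line);
        exit(EXIT_FAILURE);
    }
    memcpy(raw, guard, GUARD_SIZE);                      /* leading sentinel */
    memcpy(raw + GUARD_SIZE + size, guard, GUARD_SIZE);  /* trailing sentinel */
    b->user = raw + GUARD_SIZE;
    b->size = size;
    b->file = file;
    b->line = line;
    b->next = live_blocks;                               /* push onto the list */
    live_blocks = b;
    return b->user;
}

/* Returns 0 if all is well, -1 for a bad pointer or damaged sentinel. */
int d_free(void *p, const char *file, int line)
{
    struct mem_block **link, *b;
    int status = 0;

    if (p == NULL) {
        fprintf(stderr, "%s(%d): free of a null pointer\n", file, line);
        return -1;
    }
    for (link = &live_blocks; *link != NULL; link = &(*link)->next)
        if ((*link)->user == p)
            break;
    if (*link == NULL) {
        fprintf(stderr, "%s(%d): free of an address never allocated\n", file, line);
        return -1;
    }
    b = *link;
    if (memcmp(b->user - GUARD_SIZE, guard, GUARD_SIZE) != 0 ||
        memcmp(b->user + b->size, guard, GUARD_SIZE) != 0) {
        fprintf(stderr, "%s(%d): block allocated at %s(%d) was overwritten\n",
                file, line, b->file, b->line);
        status = -1;
    }
    *link = b->next;                                     /* unlink and release */
    free(b->user - GUARD_SIZE);
    free(b);
    return status;
}

/* Report every block that has not been freed; returns the count. */
int d_memlist(void)
{
    int n = 0;
    struct mem_block *b;
    for (b = live_blocks; b != NULL; b = b->next, n++)
        fprintf(stderr, "not freed: %lu bytes from %s(%d)\n",
                (unsigned long)b->size, b->file, b->line);
    return n;
}

/* A memdebug.h-style header could redirect the standard names, e.g.
 *   #ifdef DEBUG
 *   #define malloc(n) d_malloc((n), __FILE__, __LINE__)
 *   #define free(p)   d_free((p), __FILE__, __LINE__)
 *   #endif
 */
```

With redirecting macros like these in effect, the strlen example above would be reported when str2 is freed, because the terminating null written by strcpy lands on the trailing sentinel.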

Another useful instrumentation technique is to insert a macro call at the beginning and end of every function. For example:

void my_func(int i, char *str)
{
   char *fn = "my_func";
   BEGIN_FUNC(fn);
   ... /* Code goes here */
   END_FUNC(fn);
}
These macros can be eliminated by using

#define BEGIN_FUNC(x)
#define END_FUNC(x)
Or they can be converted into function calls. You can use this technique not only to provide a backtrace from some point in your program, but to generate counts for each function call. Sherlock, by Edward K. Ream, a commercial tool (now available in the CUG Library), helps you instrument your programs. Sherlock provides a variety of C macros and support routines called by those macros. It places macro calls at the beginning and end of your functions as described above.
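Converted into function calls, the two macros can maintain a stack of active function names (giving a backtrace) and a table of call counts. The sketch below is hypothetical: begin_func, end_func, backtrace_print, call_count, and the fixed table sizes are illustrative choices, not Sherlock's interface.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 64    /* deepest call chain recorded */
#define MAX_FUNCS 128   /* distinct functions counted */

/* Names of the functions currently active, outermost first. */
static const char *call_stack[MAX_DEPTH];
static int depth = 0;

/* Per-function call counts, found by linear search on the name. */
static struct { const char *name; long calls; } counts[MAX_FUNCS];
static int nfuncs = 0;

void begin_func(const char *fn)
{
    int i;
    if (depth < MAX_DEPTH)
        call_stack[depth] = fn;
    depth++;
    for (i = 0; i < nfuncs; i++)
        if (strcmp(counts[i].name, fn) == 0)
            break;
    if (i == nfuncs && nfuncs < MAX_FUNCS)
        counts[nfuncs++].name = fn;        /* first call: add an entry */
    if (i < nfuncs)
        counts[i].calls++;
}

void end_func(const char *fn)
{
    if (depth == 0) {
        fprintf(stderr, "END_FUNC(%s) without BEGIN_FUNC\n", fn);
        return;
    }
    depth--;
    if (depth < MAX_DEPTH && strcmp(call_stack[depth], fn) != 0)
        fprintf(stderr, "END_FUNC(%s) does not match BEGIN_FUNC(%s)\n",
                fn, call_stack[depth]);
}

/* Print the active call chain, innermost function first. */
void backtrace_print(FILE *out)
{
    int i = (depth < MAX_DEPTH ? depth : MAX_DEPTH) - 1;
    for (; i >= 0; i--)
        fprintf(out, "  %s\n", call_stack[i]);
}

long call_count(const char *fn)
{
    int i;
    for (i = 0; i < nfuncs; i++)
        if (strcmp(counts[i].name, fn) == 0)
            return counts[i].calls;
    return 0;
}

#define BEGIN_FUNC(x) begin_func(x)
#define END_FUNC(x)   end_func(x)

/* A hypothetical instrumented function; the recursion shows up
   in the backtrace printed at the deepest level. */
int my_func(int i)
{
    char *fn = "my_func";
    int result;

    BEGIN_FUNC(fn);
    if (i == 0) {
        backtrace_print(stdout);
        result = 0;
    } else {
        result = my_func(i - 1) + 1;
    }
    END_FUNC(fn);
    return result;
}
```

Storing only the name pointers is safe here because string literals have static storage; a name held in an auto array would need to be copied instead.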

External Debuggers

The last fundamental approach to debugging involves using an external debugger. There are two types: hardware and software. Hardware debuggers (or emulators) can debug code that normal software debuggers can't reach, such as device drivers, interrupt handlers, real-time programs, memory-resident utilities, and non-MS-DOS programs. They offer speed advantages when watching for changes to a location in memory. By using a breakout switch, you can debug a program after it has locked up the computer. Because hardware debuggers often use their own memory space, you can use them where there is no room to fit a conventional debugger. On the downside, they tend to be expensive, and in my experience they are harder to use than software debuggers.

I have used the Atron Source Probe debugger with its optional support for PLINK86 Plus-linked files. I was unable to use a software debugger because of MS-DOS's 640K barrier. However, I later switched to Microsoft's Codeview under OS/2. While Atron is now out of business, Periscope offers a range of hardware debuggers that support source-level debugging, including local variables for Microsoft C, support for popular linkers, and support for Microsoft Windows applications.

The most ancient of software debuggers for MS-DOS is of course Debug, included with MS-DOS until version 4. Since Debug does not provide source-level support for C, Microsoft developed Symdeb. Later, Microsoft developed Codeview, which provides full-screen debugging. Grafted on top of its predecessors, Codeview was a great step forward at the time. To use Codeview at the source level requires compiling and linking your program with special options. Special debugging information is placed in the .EXE file.

Programmers debugging very large programs may find MagicCV, by Nu-Mega Technologies, to be valuable. MagicCV lets Codeview run in a separate virtual machine, taking only 4-8K from the address space of your application. Nu-Mega Technologies also makes a special Windows version, which has become less significant since the release of Windows 3.0. MagicCV runs on 80386s and can coexist with programs like ramdrive or vdisk, but not with memory managers like QEMM-386 or 386^Max. However, MagicCV provides its own memory manager with a Load High facility so that drivers and TSRs can be hidden above 640K.

Nu-Mega Technologies also offers SoftIce, which uses the virtual 8086 mode of the 80386 to provide many of the same features as a hardware debugger. It works well with Codeview and MagicCV. Although it is somewhat weak in ease of use, I found SoftIce invaluable in debugging TSRs. The company's most recent product is Bounds-Checker, which catches invalid memory use in MS-DOS programs.

Other vendors of C compilers offer their own source debuggers, the most noteworthy of which is probably Borland's Turbo Debugger. Using a more modern and friendlier user interface than Codeview, Turbo Debugger offers a number of additional features such as 80386 Virtual Machine support, remote debugging, and special support for hardware debuggers. Borland also offers a utility, TDCONVRT, which converts programs with Microsoft Codeview debugging information to work with its debugger.

Another notable debugger is Logitech's Multiscope Debugger for OS/2 (requires Release 1.1 or above). This modern, window-oriented debugger runs either in text mode or under Presentation Manager. It also includes a post-mortem debugger, in which a special monitor program creates a memory-dump file if your program crashes. The file can then be investigated to discover what went wrong. This feature is especially valuable if users in the field are testing your program. They can send you the dumps they encounter if the program should crash. Multiscope Debugger versions for MS-DOS and Windows are also available.

As I mentioned earlier, I have found the OS/2 environment very helpful for debugging programs which I intend to deliver under MS-DOS. OS/2's protected mode automatically catches invalid pointers. Instead of the machine locking up, as it would under MS-DOS, Codeview will highlight the line in error and display a message such as Segmentation Violation. Another OS/2 advantage is the absence of MS-DOS's 640K memory limitation; under MS-DOS, both the debugger and the program must fit in that relatively small space. Finally, I was able to purchase OS/2 versions of the add-on libraries I used in my programs, or I could recompile the source code for them under OS/2 and get them to work.

Another software approach towards debugging is to use a C interpreter. I have worked with C-terp, by Gimpel Software, and Instant-C, by Rational Systems. Both products provide an integrated development environment, support the Microsoft C libraries, support third-party libraries, and support powerful debugging commands. Instant-C includes a DOS extender, making a 16MB address space available for the interpreter.

Interpreters offer clear benefits over debuggers. Each function you write can be immediately tested with a variety of parameters, to make sure all the boundary cases work properly. Programs can be modified on the fly and tested again without a lengthy compile-link-test cycle. However, for both of the interpreters mentioned, using third-party libraries requires some set-up time, and in some cases I have been unable to get library functions to work. Another minor gripe is that the interpreter environment restricts you to using its editor, which you may not like. However, if you are writing for a vanilla environment or doing a lot of work in a stable environment, a C interpreter may be the debugging environment of choice.

Products Mentioned

Bounds-Checker, by Nu-Mega Technologies, P.O. Box 7607, Nashua, NH 03060-7607. $249.

C-terp, by Gimpel Software, 3207 Hogart Lane, Collegeville, PA 19426. (215) 584-4261. DOS $298, UNIX $398.

CLEAR+ for C, by CLEAR Software, Inc., 385 Elliot St., Newton, MA 02164. (617) 965-6755. $199.95.

Codeview, by Microsoft. Packaged with Microsoft C Compiler and other language products.

Instant-C, by Rational Systems, 220 N. Main St., Natick, MA 01760. (508) 653-6006. $495.

MagicCV, by Nu-Mega Technologies, P.O. Box 7607, Nashua, NH 03060. (603) 888-2386. $199. Also Windows version, $199, or $298 for both.

Multiscope Debuggers, by Logitech, 1235 Pear Ave., Suite 111, Mountain View, CA 94043-1444. (800) 999-8846 or (415) 968-4892. FAX (415) 968-4622. DOS version, $179. OS/2 version, $449. Windows version, $379.

PC-Lint, by Gimpel Software, 3207 Hogart Lane, Collegeville, PA 19426. (215) 584-4261. $139.

Periscope Debuggers, by Periscope, 1197 Peachtree St., Atlanta, GA 30361. (800) 722-7006. From $195 to $4,000.

Sherlock, by Edward K. Ream, was until recently a commercial product. It is now available in the C Users Group Library.

SoftIce, by Nu-Mega Technologies, P.O. Box 7607, Nashua, NH 03060. (603) 888-2386. $386; $300 with MagicCV.

SourceDoc, by Intelligent Solutions, Suite 17E, 10740 Lyndale, Minneapolis, MN 55420. (612) 884-0200. $345 single user.

Source Print, by Powerline Software, Inc., 826 Douglass St., San Francisco, CA 94114. (800) 257-5773. $99.

Turbo Debugger, by Borland International, P.O. Box 660001, Scotts Valley, CA 95066. (800) 331-0877. Included with Turbo C and other compilers.