Debugging


An Alternative Debug Function Macro

Jon Jagger


Jon Jagger currently works for National Transcommunication Limited, Manchester, England. He has six years of C programming experience. Jon received a B.Sc. 1st in Computer Science from University College Cardiff in 1989. You can reach him by phone: +44 (0) 1962 822788, by fax: +44 (0) 1962 822409, by e-mail: academic@accu.org.

Why do we need another debugging macro? Aren't the ones we have good enough? Well, maybe not. Consider the common debug function macro:

#define malloc(x) \
   DB_malloc(x, __FILE__, __LINE__)
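While often very useful, this technique cannot cope with the thorny old problem of variable argument functions (such as printf). If you dig a bit deeper you also find that it fails in two other cases:

1. When the function name is parenthesized in the call.

2. When the function is called via a function pointer.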

In this article I present a little-known alternative macro technique that overcomes these three limitations. The technique is completely generic. Throughout the article I use printf as an example, and utilize macros and functions from check.h and check.c (first versions, shown in Listing 1 and Listing 2 respectively).
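The limitations of the function-like macro can be seen in a minimal sketch. The names twice, DB_twice, last_line, and count_intercepted are hypothetical and do not come from the article's listings. Of three calls, the macro intercepts only the ordinary one; the parenthesized call and the call through a pointer go straight to the real function. (The variable argument limitation is not shown, since a C89 function-like macro cannot accept a variable number of arguments at all.)

```c
/* Hypothetical example: not taken from the article's listings. */

static int last_line = 0;   /* line recorded by the debug version */

/* Debug version: records the line it was called from. */
static int DB_twice(int x, const char *file, int line)
{
    (void)file;             /* unused in this sketch */
    last_line = line;
    return 2 * x;
}

/* The real function. Defined before the macro so that its own
   definition is not macro-expanded. */
static int twice(int x)
{
    return 2 * x;
}

/* The common function-like debug macro. */
#define twice(x) DB_twice((x), __FILE__, __LINE__)

/* Returns how many of the three calls the macro intercepted. */
static int count_intercepted(void)
{
    int (*pf)(int) = twice; /* no '(' follows the name: not expanded */
    int intercepted = 0;

    last_line = 0;
    (void)twice(1);         /* ordinary call: intercepted */
    if (last_line != 0)
        ++intercepted;

    last_line = 0;
    (void)(twice)(2);       /* parenthesized name: macro bypassed */
    if (last_line != 0)
        ++intercepted;

    last_line = 0;
    (void)pf(3);            /* call via pointer: macro bypassed */
    if (last_line != 0)
        ++intercepted;

    return intercepted;
}
```

Calling count_intercepted() from main returns 1, since a function-like macro is only expanded when its name is immediately followed by an opening parenthesis.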

A Simple Macro

Listing 3 and Listing 4 show first versions of printf.h and printf.c. These listings demonstrate that you can immediately eliminate all three problems mentioned above by using a simple macro:

#define printf DB_printf
int DB_printf(const char format[], ...) { ... }
DB_printf as defined here makes a number of checks and issues warnings where appropriate. For example, DB_printf checks if the format string is null. It warns if the '%' character does not occur in the format string, if you are using "%ul" in a format string, and about any use of "%p". After this bit of paranoia, DB_printf makes a printf-equivalent call using vprintf, and finally warns if the call to vprintf failed.

These checks may seem strange, but they all derive from mistakes I have made in the past. For example, I often do this:

printf("Hello world\n");
Since there is no % in the format string, DB_printf reminds me to consider using fputs instead. It's probably faster, and if this is the only way printf is used, replacing it with fputs will probably make for smaller executable code as well.

When I print an unsigned long, I remember that the format specifier uses a u (for unsigned) and an l (for long), but I often seem to get them the wrong way round. I write

unsigned long ul = 23;
printf("%ul\n", ul);
but the format specifier should be "%lu". DB_printf catches this for me.

When I use "%p" in a format string I often do this:

struct foobar FB;
printf("%p\n", &FB);
I like to be warned about this; strictly speaking, &FB must be explicitly cast to a void*

printf("%p\n", (void*)&FB);
since the variable part of printf's argument list carries no prototype information, so no conversion to void* takes place automatically.

Different people make different mistakes, but everyone is likely to repeat their own mistakes. When I track down a bug, one small extra check often ensures that when (not if) I repeat that mistake it is automatically detected.
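The checks described in this section might be sketched as follows. This is not the article's Listing 4: the warn helper and the warning_count counter are hypothetical stand-ins for the WARNING_IF machinery in check.h, and the substring tests are deliberately crude.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int warning_count = 0;   /* hypothetical: counts warnings issued */

/* Hypothetical stand-in for the article's WARNING_IF macro. */
static void warn(const char *message)
{
    fprintf(stderr, "warning: %s\n", message);
    ++warning_count;
}

int DB_printf(const char format[], ...)
{
    va_list args;
    int result;

    if (format == NULL) {
        warn("null format string");
        return -1;              /* calling vprintf here would be undefined */
    }
    if (strchr(format, '%') == NULL)
        warn("no conversion in format string; consider fputs");
    if (strstr(format, "%ul") != NULL)
        warn("found \"%ul\"; did you mean \"%lu\"?");
    if (strstr(format, "%p") != NULL)
        warn("\"%p\" expects an argument cast to void*");

    /* The printf-equivalent call. */
    va_start(args, format);
    result = vprintf(format, args);
    va_end(args);

    if (result < 0)
        warn("vprintf failed");
    return result;
}
```

With this sketch, DB_printf("Hello world\n") prints the text and issues the fputs reminder, while DB_printf("%d\n", 42) passes silently.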

__FILE__ and __LINE__

__FILE__ and __LINE__ are compiler-supplied macros that evaluate to the current source file name and line number respectively. The warnings would be much more useful if they reported the source file and source line of each rogue printf. This is possible. Start by redefining the printf macro:

#define printf Ptr_printf()
and defining Ptr_printf to be a function that returns a pointer to DB_printf:

typedef int PrintfLike( const char format[], ... );
PrintfLike * Ptr_printf( void )
      {
      return DB_printf;
      }
You can now use this extra level of indirection by passing __FILE__ and __LINE__ as arguments to Ptr_printf, which stores them in file scope variables, srcFile and srcLine (so that modified versions of WARNING_IF and UNDEFINED_IF can see them):

#define printf Ptr_printf(__FILE__, __LINE__)

const char * srcFile = NULL;
int srcLine = 0;

typedef int PrintfLike( const char format[], ...  );
PrintfLike *Ptr_printf( const char file[], int line )
      {
      srcFile = file;
      srcLine = line;
      return DB_printf;
      }
However, there are three minor problems with this:

1. Standard C says that __LINE__ evaluates to the line number of the current source line as a decimal constant. But the type of a decimal constant depends on its value, so a large __LINE__ may not be an int. One solution is to convert (with great care) the value of __LINE__ to a string constant:

#define printf Ptr_printf(__FILE__, STRx2(__LINE__))
#define STRx2(tokens) STRx1(tokens)
#define STRx1(tokens) #tokens
2. The technique fails when using an & to take the address of a function. For example,

int main( void )
      {
      int (*pf)(const char format[], ...) = &printf;
      return pf("hello world\n");
      }
is valid, but when the printf macro is visible it becomes a compiler error, because the & is then applied to the value returned by the call to Ptr_printf, which is not an lvalue. This problem can be overcome with

#define printf (*Ptr_printf(__FILE__, STRx2(__LINE__)))
Now the superfluous * "cancels" any superfluous &.

3. The number of arguments to DB_Trap (via UNDEFINED_IF and WARNING_IF) is unwieldy. I overcome this by wrapping srcCall, srcFile, and srcLine inside a struct. I also split the srcCall into three parts: the return value, the function name, and the parameters. (The reason for this will become apparent later in Listing 11, dbmeta.hi.)

struct Func
      {
      const char * ret;
      const char * name;
      const char * parms;
      const char * file;
      const char * line;
      };
struct Func func =
      {
      "int",
      "printf",
      "(const char format[], ...)",
      NULL,
      NULL
      };
Tying this all together produces new versions of check.h, check.c, printf.h, and printf.c in Listing 5, Listing 6, Listing 7, and Listing 8 respectively. Finally, this code could benefit from a little polishing.
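The pieces introduced in this section fit together roughly as in the sketch below. It is not a copy of Listings 5 through 8: the checks are omitted, demo is a hypothetical driver function, and the #undef is an added precaution in case the implementation's stdio.h already defines printf as a macro.

```c
#include <stdarg.h>
#include <stdio.h>

/* Two-level stringization, so that __LINE__ is expanded first. */
#define STRx2(tokens) STRx1(tokens)
#define STRx1(tokens) #tokens

struct Func
{
    const char *ret;
    const char *name;
    const char *parms;
    const char *file;
    const char *line;
};

static struct Func func =
{
    "int", "printf", "(const char format[], ...)", NULL, NULL
};

/* Checks omitted: a real version would report func.file and func.line. */
static int DB_printf(const char format[], ...)
{
    va_list args;
    int result;

    va_start(args, format);
    result = vprintf(format, args);
    va_end(args);
    return result;
}

typedef int PrintfLike(const char format[], ...);

/* Records the call site, then hands back the debug function. */
static PrintfLike *Ptr_printf(const char file[], const char line[])
{
    func.file = file;
    func.line = line;
    return DB_printf;
}

#undef printf   /* precaution: stdio.h may define printf as a macro */
#define printf (*Ptr_printf(__FILE__, STRx2(__LINE__)))

/* Hypothetical driver: returns the total number of characters printed. */
static int demo(void)
{
    PrintfLike *pf = &printf;   /* the & and the * cancel */
    int n = 0;

    n += printf("%d\n", 1);     /* ordinary variable argument call */
    n += (printf)("%d\n", 2);   /* parenthesized call */
    n += pf("%d\n", 3);         /* call via a function pointer */
    return n;
}
```

All three calls in demo reach DB_printf, and after the first two func.file and func.line hold the call site as strings. The call through pf does not update them, a point the Disadvantages section returns to.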

Final Polishing

A minor irritation of this technique (exemplified by Listing 7) is that it pollutes the global namespace with two identifiers per debug function: one for the typedef (e.g. PrintfLike) and one for the function (e.g. Ptr_printf). However, we can reduce this pollution to one global name per debug function, since

typedef int PrintfLike (const char format[], ...);
PrintfLike *Ptr_printf(const char file[], const char line[]);
can be rewritten as

int (*Ptr_printf(const char file[], const char line[]))
    (const char format[], ...);
Removing the typedefs in this manner makes the function declarations somewhat tricky. Listing 9 shows dbmeta.h, a header file defining two generic macros, DB_DECL and DB_CALL, that simplify the creation of printf.h. Listing 10 is the final version of printf.h that uses dbmeta.h.

Finally, Listing 11 shows dbmeta.hi, a header file defining a generic macro that simplifies the creation of printf.c. Listing 12 is the final version of printf.c, which uses dbmeta.hi.
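One way to convince yourself that the typedef-free declaration earlier in this section is equivalent to the typedef version is to put both in the same translation unit: a C compiler rejects redeclarations with incompatible types. The sketch below does this, and adds a third form generated by HYP_DECL, a hypothetical helper macro invented here for illustration. It is not the DB_DECL macro of Listing 9, whose definition is not reproduced in this text. The DB_printf stub is also a placeholder.

```c
#include <stddef.h>

/* Form 1: the typedef-based declaration. */
typedef int PrintfLike(const char format[], ...);
PrintfLike *Ptr_printf(const char file[], const char line[]);

/* Form 2: the same declaration with the typedef removed. */
int (*Ptr_printf(const char file[], const char line[]))
    (const char format[], ...);

/* Form 3: generated by a hypothetical macro (an assumption of this
   sketch, not the article's DB_DECL). */
#define HYP_DECL(ret, name, parms) \
    ret (*Ptr_##name(const char file[], const char line[])) parms
HYP_DECL(int, printf, (const char format[], ...));

/* Placeholder debug function: returns 1 for a non-null format. */
static int DB_printf(const char format[], ...)
{
    return format != NULL;
}

/* The definition must agree with all three declarations above. */
int (*Ptr_printf(const char file[], const char line[]))
    (const char format[], ...)
{
    (void)file;
    (void)line;
    return DB_printf;
}
```

If any of the three declarations differed in type from the others, the translation unit would fail to compile.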

Advantages

To recap the advantages of this technique:

1. It works for variable argument functions.

2. It works for parenthesized function calls.

3. It works for a call made via a function pointer.

4. It works if you do not have the function source code.

5. DB_XXXX macros ease the creation of .h and .c files.

Disadvantages

There is a price to pay for this versatility:

1. File scope function pointer initializations may break. For example,

int (*pf)(const char format[], ...) = printf;
at file scope (i.e., outside a function definition) is valid. However, with the printf macro visible it becomes a compiler error, since the initializer now contains a call to Ptr_printf and a file scope initializer must be a constant expression. This is an inherent limitation of the technique.

2. Code that uses the # preprocessor operator can silently change. Consider the following:

#define STRx2(tokens) STRx1(tokens)
#define STRx1(tokens) #tokens
const char Func[] = STRx2(printf);
Normally this probably preprocesses to

const char Func[] = "printf";
Using the debug printf macro, it preprocesses to:

const char Func[] = "(*Ptr_printf(\"GLOOP.C\",\"122\"))";
Note, however, that you have to try hard to achieve this; the more common

#define STR(tokens) #tokens
const char Func[] = STR(printf);
is not affected.

3. It is possible to write convoluted code such that the __FILE__ and __LINE__ details are incorrect. However, once again you have to try hard to achieve this. For example,

#include <stdio.h>
#include "printf.h"
int main(void)
      {
      int (*fp1)(const char format[], ...);
      int (*fp2)(const char format[], ...);
      fp1 = printf;
      fp2 = printf;
      fp1("Hello world\n");
      fp2("Hello world\n");
      return 0;
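      }
Here both calls report the line of the second assignment, because Ptr_printf records the file and line when the expression containing printf is evaluated (at each assignment), not when the call through the pointer is made.

4. This technique is not thread-safe, since Ptr_printf stores the file and line in shared static storage that another thread can overwrite before DB_printf reads it.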
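Disadvantages 2 and 3 can be reproduced with a small sketch. To keep it short, this version passes only the line (not the file) to Ptr_printf. The names reported, calls, one_level, two_level, second_assignment_line, and stale_line_demo are hypothetical and exist only to make the behavior observable.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define STRx2(tokens) STRx1(tokens)
#define STRx1(tokens) #tokens

static const char *srcLine = NULL;  /* set by Ptr_printf */
static const char *reported[2];     /* line seen by each of two calls */
static int calls = 0;

static int DB_printf(const char format[], ...)
{
    va_list args;
    int result;

    if (calls < 2)
        reported[calls++] = srcLine;    /* what a warning would show */
    va_start(args, format);
    result = vprintf(format, args);
    va_end(args);
    return result;
}

/* Simplified: records the line only. */
static int (*Ptr_printf(const char line[]))(const char format[], ...)
{
    srcLine = line;
    return DB_printf;
}

#undef printf   /* precaution: stdio.h may define printf as a macro */
#define printf (*Ptr_printf(STRx2(__LINE__)))

/* Disadvantage 2: one level of # leaves the name alone,
   two levels stringize the expanded macro. */
static const char one_level[] = STRx1(printf);
static const char two_level[] = STRx2(printf);

static const char *second_assignment_line = NULL;

/* Disadvantage 3: the line is captured at assignment, not at the call. */
static void stale_line_demo(void)
{
    int (*fp1)(const char format[], ...);
    int (*fp2)(const char format[], ...);

    fp1 = printf;
    fp2 = printf; second_assignment_line = STRx2(__LINE__);
    fp1("Hello world\n");
    fp2("Hello world\n");
}
```

After stale_line_demo runs, both entries of reported hold the line number of the fp2 assignment, even though the calls themselves are on later lines, and two_level contains the text of the expanded macro rather than "printf".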

Thanks

I'd particularly like to thank Kevlin Henney, and everyone I know in The Association of C and C++ Users (ACCU, http://www.bach.cis.temple.edu/accu/) for their helpful advice and dedicated professionalism in C and C++.