Columns


Standard C

Bugs

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest books are The Standard C Library, published by Prentice-Hall, and ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press. You can reach him at pjp@plauger.com.

Introduction

For the past two years, I have devoted this column to the Standard C library. A side effect of that effort has been a book. (See P.J. Plauger, The Standard C Library, Prentice Hall, 1992.) The book includes most of the earlier columns on the subject. More recently, these columns have been excerpts from the book. I still have a few more episodes to present here before I turn my attention to other matters.

Among other things, The Standard C Library contains a fairly complete implementation of that library, itself written in Standard C. (You can buy the code disk from The C Users Group. Look for the ad near the center of this magazine.) I've shown as much as I can of the code in these columns. I know that seeing working code always helps me understand a specification better. Other programmers report the same thing. The code doesn't have to be the only way to implement the spec. It doesn't have to be the best way. So long as it is fairly exemplary, it can be a big help.

Of course, it's also nice if the code is correct. That's why I tested the code I presented fairly extensively. I had access to the popular C Validation Suite from Plum Hall. That ensured that the code conformed fairly closely to the C Standard. I also had customers help me test the math functions fairly extensively. That let me assert that these functions never lose more than two bits of precision. I even made my own set of (far less ambitious) test programs.

Nevertheless, the code still appeared with quite a few errors. I am a mere mortal, like most programmers. Try as I might, I couldn't test the code nearly as well as it required. I have kept finding errors at a steady rate long after the book hit the streets. Worse, other people keep finding errors as well. They have been uniformly polite about reporting the errors to me, which I deeply appreciate. (I appreciate both the politeness and the reports, for different reasons.) But I wish the bugs weren't there to find.

I figure it's time to issue a new release of the code disk. (Version 1.1 is now available from The C Users Group. Existing customers can upgrade at reduced cost.) I also figure that I owe my readers a measure of disclosure. Just what sort of errors does a putative expert (me) make when trying to write good code? Here's your chance to find out.

I lump the errors I made into four broad categories: formatting errors, C++ incompatibilities, C Standard compatibility problems, and outright botches. I address these categories in order.

Formatting Errors

The first category, formatting errors, ranges from the trivial to the important. At least, you can call the errors trivial if all you care about is code execution. Inconsistent spacing or inappropriate comments don't hurt the final product, but they do cause maintenance problems. I made a number of such errors, which I won't bother to present in detail here.

I have always been religious about formatting source code uniformly. I believe that makes code more readable. It makes it more maintainable by others. And it makes it easier to scan with various text-processing tools. The price you pay in finicky typing and editing is repaid many times over.

The style I adopted for this project differs from my earlier styles. I insisted, for example, that no single source file spread over more than two facing pages. (That's 110 lines of code.) The code is particularly dense as a result. The limit also led me to indulge in heroics from time to time. Fixing a large file without exceeding my self-imposed limit often cost hours of thought. That's a price worth paying only for a tutorial presentation. A typical coding project should avoid such nonsense.

Some of my formatting errors came from learning this new style. The first files I wrote had the most lapses. Others I can ascribe to a steady evolution of the style. I didn't always go back and catch everything that needed changing. A few arose in the normal course of rewriting code to make it better. Sometimes I ended up leaving dead wood behind. The most flagrant example, to me at least, is this expression statement from malloc:

   _Aldata._Plast = qb ? qb : NULL;
This, of course, reduces to:

   _Aldata._Plast = qb;
The second form not only executes a tad faster, it is much more readable.

My coding style naturally emphasized portability. I was annoyed, therefore, to find that my development C compilers were a bit lax on type checking. (I started out with Borland's Turbo C++ v. 1.0 and the Gnu C compiler on the VAX.) Both let me be cavalier about matching pointer types, for example. Only when I moved to other, more ANSI-compliant compilers did I pick up a spate of diagnostics. Again, these caused no execution errors on the platforms I exercised. But they did make the code less portable.

One violation of my coding standards created an accident waiting to happen. The function tmpfile originally contained the following code:

   char fname[L_tmpnam];
   
   tmpnam(fname);
   if ((str = fopen(fname, "wb+")) != NULL)
       {    /* file successfully opened */
       str->_Tmpnam = malloc(sizeof (fname) + 1);
       strcpy(str->_Tmpnam, fname);
       }
Note what happens when the malloc fails. The code bulls ahead and copies the temporary file name using a null pointer for the destination. It also leaves the temporary file open. Bad news. My improved version instead reads:

   char fn[L_tmpnam], *s;
   
   if ((str = fopen((const char *)tmpnam(fn),
       "wb+")) == NULL);
   else if ((s = (char *)malloc(sizeof (fn) + 1))
       == NULL)
       {    /* free the stream on allocation failure */
       fclose(str);
       str = NULL;
       }
   else
       str->_Tmpnam = strcpy(s, fn);
This is much more hygienic.

By rough count, over two dozen files have turned up formatting errors to date. I believe that all were well worth fixing.

C++ Incompatibilities

A major regret is that I didn't make the library code C++-compliant before I froze the book. I had a C++ compiler handy, but I was running out of time. (Correction — I was running out of extensions. I had already run out of time.) It turned out that an hour's work did the job, but I didn't dare risk the effort before the book was done.

C++-compliant code is C code that also meets the rules of C++. It's what Tom Plum and Dan Saks call "typesafe C." I consider that term a bit stretched, but it's accurate enough. C++ is somewhat pickier than C in compiling even the common dialect. Code that survives both kinds of translators is bound to be more conservative than even Standard C requires.

I had another reason to launder my library code for C++ land. The Standard C library is a significant component of your typical C++ library. It has already been blessed as such by X3J16/WG21, the joint ANSI/ISO committee developing the C++ standard. I wanted to make sure that the library I wrote works well with current and future implementations of C++. And I didn't want to require that it be compiled as C code.

Finally, I can't help but observe that C++ is a common fixture in more and more C compiler packages. For many, the distinction between the two dialects is already blurring. (As editor of CUJ, I often see articles accompanied by code that the author says is C. But the comments are delimited by C++-style double slashes.) I figure the less the library code cares about C/C++ distinctions, the more widely usable it will be.

So for all these reasons, I stuffed the library through a C++ compiler and shook out the nits. Two language differences led to numerous changes. One was the meaning of void pointers. The other was the use of braces in initializers.

ANSI C introduced type pointer to void to solve a specific problem. With the inclusion of function prototypes, the language could now both check and convert arguments to library function calls. But that led to problems with, for example, a number of the string functions. Typical existing C programs have been pretty diverse in the types of pointers used in calls to, say, memcpy. The tighter type checking would require all sorts of diagnostics where none had occurred before. We on the ANSI committee X3J11 knew that would be unacceptable for a C language standard.

So we defined pointer to void as a generic data-object pointer type. It has the same representation as a character pointer, but much more lax conversion rules. In fact, you can convert between a void pointer and any other data-object pointer type without writing a type cast. That gives the behavior we wanted for function arguments and return values.

In this way, we intentionally differed from the emerging C++ practice. Stroustrup envisioned pointer to void as the root of a pointer type system. It thus serves as a generic pointer, to be sure. But the conversion rules are generous only in one direction. You have to type cast a void pointer before you can assign it to any other data-object pointer. Thus, C and C++ work much the same way for function arguments. But for function return values, you have to write many more type casts to reassure C++.

The primary culprits are the memory-allocation functions malloc and calloc. I had to change essentially every one of these calls to include a type cast. The string functions whose names begin with mem also tend to return void pointers. That was the second largest batch of changes.

The third largest batch was data initializers. I found that I had much less latitude in writing braces in C++ than in C. Partially bracketed initializers are a no-no. But so too are braces around a scalar initializer, as in:

static const double sqrt2 =
    {1.41421356237309505};    /* BAD C++ */
I still don't know whether this is a C++ requirement or a peculiarity of Turbo C++. I haven't read the draft C++ standard in sufficient detail to find out. In either event, I figure that the tighter rules are probably a good idea. C compilers have been notoriously inconsistent in how they interpret initializers with elided braces. Even the C Standard seems to confuse more people than it helps in this area.

There were two places where I ran afoul of more specific dialect differences. I was foolish enough to define the rename function as:

int (rename)(const char *old, const char *new)
Since new is a C++ keyword, this line produced mysterious diagnostics. I changed the argument names to oldnm and newnm.

The second problem occurred when I defined an enumeration inside a structure:

typedef const struct {
    const char *_Name;
    size_t _Offset;
    enum {L_GSTRING, L_NAME, L_NOTE, L_SET,
    L_STATE, L_STRING, L_TABLE, L_VALUE
    }_Code;
    .....}
In C, this is a compact way of defining the enumeration constants and the structure both at once. In C++, however, the enumeration constants go out of scope at the end of the structure declaration. I had to pull the enumeration outside and give it a name:

enum _Lcode {
    L_GSTRING, L_NAME, L_NOTE, L_SET,
    L_STATE, L_STRING, L_TABLE, L_VALUE
    };
typedef const struct {
    const char *_Name;
    size_t _Offset;
    enum _Lcode _Code;
    ....}
I wasn't crazy about writing all the extra type casts that C++ requires. Otherwise, I felt that the changes I had to make in this category were for the better.

C Standard Compatibility

I thought I knew C pretty well. The Plum Hall Suite showed up several places where I was wrong. Several nasty customer programs showed a few more. Even so, I've learned about another handful in the past year.

The header <float.h> tells you all about the properties of the floating-point representation. I made a point of deriving all the macros from a minimum number of parameters. I spent hours getting all the expressions just right, twisty as they are. Still, I ignored some learned advice about how to compute the number of effective decimal digits (DBL_DIG, FLT_DIG, and LDBL_DIG). I was wrong to do so.

If m is the number of bits in a binary mantissa, you'd think at first blush that the number of decimal digits is m*log10(2). Not so. The actual number is (m-1)*log10(2). That led to numerous corrections to the file xfloat.c, where I kept the values used by <float.h>, and to the test program tfloat.c.

I developed this neat little language for specifying locales. Unfortunately, I didn't make it easy to specify the value of the macro CHAR_MAX, defined in <locale.h>. You need that to specify items such as mon_grouping in struct lconv. So I tossed in the caret ^ as a symbol with that value in locale-file expressions.

When any of the printf functions fail, they are supposed to return EOF. I overlooked that subtlety and just returned the count of characters transmitted. I had to change a macro definition in the file xprintf.c to fix it.

Another printf problem arose with the behavior of the call:

   printf("|%#x|", 0);
Should this produce |0x0| or simply |0|? Whatever your sense of aesthetics may conclude, the C Standard calls for the latter form. (It takes what X3J11 likes to call "a careful reading of the Standard" to find this out.) Luckily, I was able to add a simple qualifier to a test to change from the first result to the second.

Finally, I was surprised to learn that asctime is weirder than I thought. It wants to print dates as "Sun Dec  2 06:55:15 1979\n", not as "Sun Dec 02 06:55:15 1979\n". Unfortunately, the more general function strftime can only produce the second form. I had to introduce a nonstandard extension to strftime to suppress the leading zero on days of the month. I also had to change several other files, including the test program ttime.c, to deal with the consequences.

All in all, however, I feel that I came pretty close to conforming to the C Standard from the outset. I'm happy and relieved to conform even more closely now.

Botches

The last category is the most embarrassing. I can't argue that I was learning a new style or a new language. I can't argue that I missed a subtle point. I just plain made mistakes.

Interestingly enough, the bulk of the errors took one of two forms: something written backwards (a test, a subscript, or a subtraction), or a failure return left unhandled.

Each of us is a sucker for certain kinds of errors. I think I know what sorts of botches to look for more carefully in future in my own code.

Here are the nasties I (and others) have caught to date in the library code:

The function setlocale ends by constructing a name for the current locale. If the locale is mixed, the name has different category components separated by semicolons. I botched the test for no subcategories (writing (n == 0) instead of (n == 1)). I also failed to skip over the semicolons as I inserted them into the locale name.

I wrote this really neat additional function called _Fmtval. It uses the detailed locale information to format numbers and currencies for you. The heart of the function is a two-dimensional array of format strings. Unfortunately, I got the subscripts backwards between the declaration and the references. Don't ask me how that botch survived testing — I don't know myself.

Locales need lots of translation tables. The function _Makeloc allocates them as needed. It also marks allocated tables to be freed if the locale proves to be flawed. I got cute and recycled a table element also used sometimes by calls such as isalpha(EOF). The effect was to muck up the behavior of such calls sometimes. I had to get even cuter to rescue my cute trick. (You'd think that I'd know better by now than to overload fields this way. I don't.)

The sort function qsort is hard to get right. I rewrote it several times before it started sorting sanely. I should have rewritten it once more. A careful reader discovered that I was shortening a sort interval too aggressively. Once again, my testing proved inadequate. I failed to throw sets of random data at qsort. Otherwise, I would have found the error much sooner.

The UNIX version of system did a laughably poor job of invoking the shell. I got the arguments all wrong. How the code survived testing is another mystery. Probably I ended up linking in the standard version of system instead of my own.

strtok is a function with an ugly interface that can be implemented with surprisingly little code. That's if you call the other string functions to do all the work. Unfortunately, I chose to call strpbrk, which can sometimes return a null pointer, instead of its hardier cousin strcspn. Rather than add checks for a null pointer return, I rewrote the code to call the better function.

The function strxfrm translates a string into a form suitable for performing locale-dependent orderings. My version calls another function _Strxfrm to do the actual translation. On each call, it passes the space left in the user-supplied buffer. Only I passed the negative of the proper amount. Thanks to unsigned arithmetic, that happens to work right most of the time. If the translation doesn't fit in the buffer, however, it writes beyond the end.

I also got a subtraction backwards in difftime:

   return (t0 <= t1 ? (double)(t1 - t0)
       : -(double)(t1 - t0)); /* WRONG */
The complex expression is designed to work right even if the times (of type time_t) are unsigned. But the negative case is obviously wrong, if you think about it. Obviously, I didn't think about it enough. The correct version reads:

   return (t0 <= t1 ? (double)(t1 - t0)
       : -(double) (t0 - t1));
My last bogus subtraction occurred in computing local times. I followed the usual convention in specifying time zones — offsets increase going West from Greenwich. Unfortunately, I corrected local time as if offsets increase going East. Luckily, I was able to change the sign of just one return value, in function _Tzoff, to fix everything.

That bug led me to my last (or latest) discovery. Seems the code I wrote to parse the environment variable TZ was nonsense. It got all sorts of things wrong. I rewrote it (and tested it) just before I froze v. 1.1 of the code disk.

Conclusion

The lessons to learn from this experience are hardly new. Programmers are not very good at testing their own code, however experienced they may be. Validation testing can catch the obvious conformance bugs, but not always the subtle ones. And it is not very helpful in testing the internal logic of a given implementation.

I was at my best when I took time to write good tests. I was at my worst when I was in a hurry. I was helped immeasurably by having tests developed by independent agencies.

On the bright side, I can report that the rate of new bug discoveries is dropping off. (I like to think that that doesn't reflect a growing disinterest. I have reasons to believe otherwise.) Most of the reports I've received the past few months have either been repeats or misunderstandings on the part of the reporters. I appreciate even those. At the least, they show where the presentation or the code is confusing.

I look forward to feedback on the new code disk. Keep those cards and e-mail rolling in, folks. And thanks.