Columns


Questions & Answers

Pete Becker

The Value of Code Walks

A second pair of eyes is invaluable in detecting questionable code. Test your troubleshooting talents by joining Pete in a code review.

To ask Pete a question about C or C++, send e-mail to pbecker@wpo.borland.com, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.


Q

Function money in your March '96 column has some errors:

1. You don't check for amount==0 when you prepend a comma. Thus, outputs like $,100,000.00 are possible.
2. You don't check consistently for buffer overflow; for example, money(-10000, buf, 8) tries to write past the beginning of buf.

Here's my attempt. I submit it both to illustrate some "morals" and because I've been picking on you and Bobby Schmidt of late. I thought you two deserved a chance to pick on my code for a change. :-)

#include <stdlib.h>
#include <stdio.h>
static char* prepend (char *current, char *buf, char c)
{
     if (current == NULL ||
         current <= buf)
    return NULL;
    *--current = c;
    return current;
}    //line 9
#define PREPEND(c) ( current= prepend(current,buf,(c)) )
const char *money (long amount, char *buf, int max_len)
{
    char *current;  //line 13
    int comma;
    int negative;
    
    *current-- = '\0';  //line 17
    negative = 0;
    if(amount < 0) {
    negative = 1;
    amount = -amount;
    }
    
    PREPEND('\0');    //line 24
    PREPEND('0' + amount%10);
    amount /= 10;
    PREPEND('0' + amount%10);
    amount /= 10;
    PREPEND('.');
    comma = amount<10000? 6: 3;
    do {
    PREPEND('0' + amount%10);
    amount /= 10;
    if (--comma == 0 && amount != 0) {
    comma = 3;
    PREPEND(',');
    }
    } while(amount != 0);
    if(negative)
    PREPEND('-');
    PREPEND('$');
    return current;
}
    
int main(void) {
    long i;
    //line 47
    enum {max_len = 100};
    (void) printf ("%s\n", money(0, buf, max_len));
    for (i=1; i<=1000000000L; i*=10) {
    (void) printf("%s\n", money(i, buf, max_len));
    (void) printf("%s\n", money(-i, buf, max_len));
    }
    
    {
    const char* s = money(-10000, buf+8, 8);
    if (s && s < buf+8)
    (void) printf ("overflow:%s\n", s);
    else if (s)
    (void) printf ("%s\n", s);
    else
            
       (void) printf ("NULL\n");

    }
    
    return 0;
}

Morals:

1. Always test for buffer overflow.
2. Make sure your test is right.
3. Centralize common logic in one place. (function prepend)
4. In C, the element one past an array can always be addressed. Not so the element one before.
5. When testing, check all boundary cases. (Like $,100,000.00)
6. (Related) Beware off-by-one errors.

I also benefited from running my code through gcc with all the warnings I know of turned on. Maybe there's another moral: Use picky compilers when developing.
—Jim Stern

A

Thanks for your insightful letter. I certainly missed the possibility that a zero amount could remain at the point of a comma insertion. I think it's important to note, however, that your code as it was received by me does not compile. I say this not to gloat (well, maybe a little), but to remind myself and you that it's much easier to find problems in other people's code than to fix them in our own. That's one reason that code reviews and structured walkthroughs are so important to successful development efforts: getting more people to look at the code makes it more likely that the code will work correctly and be maintainable. With that said, let's walk through your code example.

Here are the messages I got when I compiled it with Borland C++ 5.0:

D:\TEST>bcc32 -c test
Borland C++ 5.0 for Win32 Copyright (c) 1993, 1996
Borland International
TEST.CPP:
Warning TEST.CPP 17: Possible use of 'current' before
definition in function money(long,char *,int)
Warning TEST.CPP 25: Conversion may lose significant
digits in function money(long,char *,int)
Warning TEST.CPP 27: Conversion may lose significant
digits in function money(long,char *,int)
Warning TEST.CPP 32: Conversion may lose significant
digits in function money(long,char *,int)
Warning TEST.CPP 43: Parameter 'max_len' is never used
in function money(long,char *,int)
Error TEST.CPP 49: Undefined symbol 'buf' in function
main()
*** 1 errors in Compile ***

The first warning is the compiler's slightly muddled way of telling us that current has not been initialized. If we look at the code, the compiler is indeed correct: current is defined at line 13 without any initialization, then dereferenced at line 17. The fix is simple: initialize it to point to the last character location in the buffer, like this:

char *current = buf + max_len - 1;

The next three warnings all arise inside the macro PREPEND, in the call to the function prepend. They all involve an implicit conversion of a long value to a char. If you expand the macro by hand you'll see that each of these macro invocations does something like this:

current=prepend( current, buf, '0'+amount%10 )

The compiler warning comes up because amount is a long, so amount%10 is also a long, and therefore '0'+amount%10 is a long. Since prepend takes a char as its third parameter the compiler has to truncate the long value to fit in a char, and it is warning us that we're doing something that might not be right. In this case we know what we're doing and we know that the truncation is harmless, so we can use a cast to tell the compiler that it's okay. I think it's a close call whether the cast belongs in the macro PREPEND or in the code that invokes the macro. I lean slightly to making it explicit at the point of the macro invocation, so I'd change each of the three lines that generate warnings to look like this:

PREPEND((char)('0'+amount%10));

The final warning is another symptom of the missing initialization of current. Adding that initializer eliminates the warning.

Finally, the error message at the end comes up because buf has not been defined anywhere. The solution is, of course, to define it. I've added the definition right after the definition of max_len at line 47:

char buf[max_len];

With these changes, the code compiles and runs as expected. But that's not enough: to ensure maintainability and efficiency we really should go through it line by line to see what it does and why. So let's start at the top.

Lines 3-9 define the static function prepend, which inserts a character at the beginning of the current text. That's a good idea. I didn't do it in my original code because I thought it made the presentation of the algorithm more complicated. In production code this sort of thing ought to be isolated in its own function unless there's a good reason not to.

The other thing that prepend does is check whether we've run off the bottom of the buffer each time we insert a character. If so, prepend returns NULL. Since the return from prepend is used as the buffer pointer in the next call to prepend, this error condition propagates forward, and the result of such an underflow is that money eventually returns NULL.

There's a legitimate design question here, concerning how best to check for edge cases like this. In your code you check each time a character is inserted to make sure there's room for it. That's a perfectly reasonable way to do it, provided you've considered the performance and maintainability tradeoffs that this choice imposes. In my code I made a check once at the beginning of the money function to see whether the buffer was large enough to hold the minimum number of characters that the function could generate. In hindsight that clearly should have been a test for the maximum number of characters.

Making this check once takes less time than checking each time we insert a character, so the single check is faster. It has a cost in flexibility, though, because it requires the user to pass in a buffer that is large enough for any possible amount of money, even when the user knows that only a few characters will actually be needed. It results in slightly cleaner code, because insertions are not cluttered up with range checks. I'd opt for the single check, despite its cost in generality. I think the cost is small compared with the gain in speed and readability. I haven't run any actual speed comparisons, but I'd be willing to bet that there's enough of a difference between your version and mine that in some applications the difference would be noticeable. In those cases, of course, the single check wins out.

Line 17 inserts a terminating zero into the string buffer. Line 24 then prepends a terminating zero to the string buffer, resulting in two zeros. Of course, only one is necessary. Not a big deal, but it's something that will puzzle maintenance programmers.

Line 47 defines an enumeration, max_len, which is used to specify the size of the buffer. That's good: avoid the beginner's habit of using magic numbers for things like this. Don't write it this way:

char buf[100];
money(i,buf,100);

That's asking for trouble. When you change the size of buf you have to hunt down every one of those 100s and change them to the new size, except for the 100s that don't refer to the size of the buffer, of course. There are two ways to avoid this problem: create a named constant, as you did, or use sizeof, as I did:

enum { max_len = 100 };
char buf[max_len];
money(i,buf,max_len);

or

char buf[100];
money(i,buf,sizeof buf);

The choice really depends on what other uses you need to make of that constant. I tend to use sizeof when I know that the buffer and its size will always be used together. In a small example program like this that's easy to know. In larger applications it's more likely that you'll have additional uses for the size of the buffer, and using sizeof everywhere will be confusing. In that case, use the enumeration technique. Or switch to C++, where you can create a true constant without a #define:

const max_len = 100;
char buf[max_len];
money(i,buf,max_len);

Another thing in main merits comment: those casts on the calls to printf. If you're using a lint program that demands them, then they're okay. The last time I used lint it was smart enough to know that very few people pay attention to the return from printf (indeed, very few people even know that printf returns a value or what that value means), and it doesn't generate warnings when this return value is not used. The casts are a significant distraction, so unless your tools require their use I recommend getting rid of them.

I don't know if the code as I received it reflects the actual indentation style that you use. I've reproduced it exactly as I received it, but it's possible that some indentation was squashed out in the course of your transcription or in transmission. But for my readers, please think carefully about the effect of indentation on readability. I find this code hard to read, because I can't see where blocks and conditions begin and end.

    comma = amount<10000? 6: 3;
    do {
    PREPEND('0' + amount%10);
    amount /= 10;
    if (--comma == 0 && amount != 0) {
    comma = 3;
    PREPEND(',');
    }
    } while(amount != 0);

Here's how I would indent it:

    comma = amount<10000? 6: 3;
    do  {
        PREPEND('0' + amount%10);
        amount /= 10;
        if (--comma == 0 && amount != 0)
            {
            comma = 3;
            PREPEND(',');
            }
        } while(amount != 0);

This makes it much easier to see where the do-loop ends and which code is controlled by the if. I also changed the location of the opening brace of the if statement to fit my personal preference. There's lots of room to disagree here, and I don't claim that this is the only reasonable way to handle braces in if statements. The indentation of the controlled code, however, is important: it makes the flow of the code much easier to follow.

While we're on the subject of style, I can't help noticing that there are no comments in your code. That's always a mistake: comments tell future maintenance programmers what's happening in the code. They provide a map that helps maintain perspective when someone starts to get lost in the details. In my original version there were a few comments, but they didn't have anywhere near the level of detail that I would have liked. I cut them short because of the limitations of writing a column: the more comments in the code, the less space there is for my own words. When you're writing code you don't have that limitation. Use comments generously.

Finally, I like the test case generator you wrote in main. It hit those edge cases that I overlooked. For functions with fairly restricted sets of input parameters this sort of technique works quite well. Be careful about extending this sort of automatic testing to more complex cases: it's easy to run headlong into a combinatorial explosion of input cases, and produce a set of test cases that takes hours to run. In cases like that, a well chosen subset of the full range can be a better alternative. That, of course, is where testing becomes an art.

Please don't take my comments as personal criticism. Part of the discipline of programming is to be able to accept constructive criticism and improve your coding habits. I'll be more careful about checking for edge cases, and will write more thorough test cases. Thanks for your letter.

Q

In the March issue you answered the perennial question about how to put commas into a formatted number. Your answer was a C function, and you mentioned the usual problems associated with the buffer.

You can get around those limitations by using a C++ class. Design a class similar to this:

class fmtNum {
public:
    fmtNum(long n);
    const char* c_str() const;
private:
    char buff[buf_len];
};

The buffer is filled in appropriately in the constructor, as in your function money. The cast operator just returns a pointer to the buffer. The size of the buffer is set by the designer, who will know the maximum length the string needs, so the user doesn't need to worry about it at all. It is easy to use:

cout << fmtNum(space_remaining).c_str() << endl;
sprintf (results,
        "Candidate %s received %s votes out of %s cast.",
        candidate.Name(),
        fmtNum(candidate.Votes()).c_str(),
        fmtNum(total_votes).c_str());

This has three advantages over a standard C function: 1) you do not need an external buffer (which also eliminates worry about its size), 2) it can be used multiple times in one statement, and 3) it is thread-safe.

I have implemented a version of this class, called formatComma, and it is posted in the Borland BCPP forum on CompuServe (the C++ section, file FMTCOMA.ZIP). Now, it doesn't handle float numbers, and it doesn't handle money, but it could be adapted to do so without too much trouble. It does use the appropriate thousands separator, which it reads either from the C locale information or by calling a Windows system function.

The concept was so simple that I felt stupid that I hadn't found it much earlier. And it was simple; I designed it during lunch, without shortchanging either the code or the enjoyment of the meal.
— Jim Howell

A

That's the great thing about good ideas: they're so simple that when you see them you know they're right. When this happens you shouldn't feel stupid: you should get a feeling of triumph at having accomplished a major simplification of your code. Encapsulation is one of the things that C++ does quite well. By encapsulating the buffer inside a class, you get complete control over its size and its accessibility. Written correctly, this wrapper class prevents multiple uses of a single buffer, which is why it's thread safe. This also means that you can't accidentally use the same buffer more than once, which makes it much easier to use than my C version.

I've changed your code a bit, to make your accessor function look more like the one that will be in the string class in the C++ Standard library. It's named c_str, to suggest that what it returns is a C-style string. If you're tempted to use a conversion operator instead, think again: automatic conversions are trouble. For example, suppose your class had a conversion to const char *:

class fmtNum {
public:
    fmtNum(long n);
    operator const char*() const;
private:
    char buff[buf_len];
};

It could still be used in pretty much the same way as before:

cout << (const char *)
    fmtNum(space_remaining) << endl;

But it would be rather dangerous, because the compiler can use that conversion even when it isn't explicitly called out by your code:

const char *str = fmtNum(space_remaining);

Here, the compiler creates a temporary object of the type fmtNum, applies the conversion to const char *, then destroys the temporary object. If the conversion was written in the obvious way, to simply return a pointer to the internal buffer, the result would be disastrous: str would point to somewhere in the stack, which could conceivably be in the middle of some new object. That's why the C++ standard avoids conversion operators: in many cases they can lead to trouble.o

Pete Becker is Senior Development Manager for C++ Quality Assurance at Borland International. He has been involved with C++ as a developer and manager at Borland for the past six years, and is Borland's principal representative to the ANSI/ISO C++ standardization committee.