Stupid IOStream Bugs


Assume you have a string that contains the following "08-MAY-2000". Now, let's say you want to parse this string into its components so you can create a Date object. Do you think the following should work, or not?

using namespace std:
istringstream istrm("08-MAY-2000");
int day, year;
char sep;
string month;
istrm >> day >> sep >> setw(3) >> month >> sep >> year;
cout << "Day=" << day << ", Month=" << month << ", Year=" << year << endl;

I tried this on several different platforms, and what I got really annoyed me.

On one popular platform, I got

Day=0, Month=-MA, Year=-2000

This particular library apparently treated the leading zero as indicating that the number was in octal notation. When it encountered the "8", which is not a valid octal digit, the scan stopped, leaving "day" set to zero. This is clearly incorrect. Even if you think this behavior makes sense (and at one point I did), the Standard is clear that in the absence of an explicit base setting, input streams are suppose to be treated as decimal.

On another platform, I got

Day=8, Month=MA, Year=-2000

In this case, the library only read two characters into "month" even though the manipulator setw clearly specifies that it should read three characters. This is the kind of behavior expected if reading into a character array. If I specify

char month[4]
istrm >> day >> sep >> setw(4) >> month >> sep >> year;

the date parses correctly. The fact that operator>>(char*) will read one less character than the maximum specified field width (to guarantee that the resulting character array can be properly null terminated) is well documented behavior. This is not the way that string is specified to work. This library is also clearly incorrect.

These bugs are more than just annoying. The first bug cost one company that I consult for a week's effort when a working program was ported to a different platform. The port had gone smoothly and appeared to work fine with initial test data. When tested with a full scale input load, the program would abort after some time. This behavior was actually quite lucky — the Date object that was being constructed from the parsed data had assert statements internally to ensure that input data made sense. Fortunately, these assert statements had not been disabled and were rudely rejecting a date that had a "day" value of zero.

Even though he knew exactly where the program was failing, the poor programmer responsible for the port spent four days trying to figure out why the vast majority of dates would parse just fine, only to eventually fail for no apparent reason. (Note only day values "08" and "09" will fail.) When I was asked to look at the problem, one look at the failing data immediately caused me to recall this bug, which I had encountered in this library more than a year before on another project. The fix then involved hunting down all places in the code that input a number and explicitly setting the base to "decimal"

istrm >> dec >> day ...

Sadly, it wasn't just luck that I was asked to look at the problem — I had written the code that was failing. I was a little annoyed that I had not been more defensive in my programming, but only a little annoyed. After all, the bug is in the library, and I had first discovered it over a year ago. It should have been fixed a long time ago.

I have not explicitly named the libraries that contain these bugs for a number of reasons. Maybe the bugs have been fixed, but the platforms I used do not have the latest versions. Or maybe the library vendors have fixed the bugs, but the compiler vendors who are redistributing the libraries have yet to release the new version of the libraries. This latter situation is particularly annoying. It means that users may know that fixes are available, but be unable to obtain them in a timely fashion because their maintenance agreements are with the compiler vendor instead of directly with the library vendor.

My real reluctance is more personal, however. I have made my share of stupid programming mistakes, some of which actually made it out the door and into the hands of paying customers. But I can guarantee you that they were not still there over a year after someone found them.

This situation is just flatly not acceptable. In one of the cases above, the library vendor would not accept bug reports from users who were not direct customers — if the library came bundled with a compiler, then the bug report had to be submitted to the compiler vendor. The compiler vendor in turn would not accept bug reports from anyone other than the site support contact. On one hand, I can understand the business sense of this — without some filtering, companies can get inundated by stupid programmer errors mistakenly reported as bugs. On the other hand, it completely ties the hands of someone like myself who might be called in as a consultant to do a port as part of an evaluation of a new platform. The evaluation reveals bugs in the compiler and/or library, but until the company is willing to commit to using them anyway, they have no leverage with the vendor to get the bugs fixed. In many cases, they do not have any channels (other than sales) by which to even report the results of the evaluation. As a result, I write memos documenting the bugs I find and then work around them. I know it is unlikely my bug reports will even make it back to the sales representative, let alone get forwarded to the vendor's technical support group, and even more unlikely that they will get fixed. And so they are still there months or even years later waiting to pounce on some unsuspecting programmer. Likewise, the workarounds may not be portable and may bite the unwary sometime in the future.

As programmers, we depend on our compilers to correctly implement the language and the standard library to work in the specified manner. We do not need compiler/library bugs compounding our own mistakes. One of the reasons we have a C++ Standard is so that it is not up to individual vendors to determine how their library should work in cases like the above. I am reasonably sure that most vendors would willingly fix their libraries, if they just find out about the problems. But, since the bugs are their fault in the first place, they need to take the initiative in fixing them. How they do that I will leave up to them. The rest of us should make it a point not to let bugs like this slip between the cracks and hang around to repeatedly infect new code.