March 2000/Standard C/C++

C/C++ Contributing Editors

Standard C/C++: Testing Conformance

P. J. Plauger

It's impossible to prove conclusively that an implementation conforms to a standard, but you can at least build some level of confidence.

Introduction

"If you are sitting in an exit row and you cannot understand this card or cannot see well enough to follow these instructions, please tell a crew member." — from a United Airlines safety card

I write these words on my way from London to Boston, aboard a United Airlines Boeing 767. Every time I fly United, which is pretty often by civilian standards, I marvel at the wording of that safety card. On the face of it, it is sheer nonsense. People who can understand the first sentence, shown above, are doubtless well equipped to understand the rest of the safety card and follow its instructions. People who cannot understand the first sentence are not likely to disqualify themselves — they don't even know they're supposed to do so. Looks like a vacuous request, doesn't it?

Of course, like most airline speak, it doesn't mean exactly what it says. (How can you "preboard" an airplane? Do you get on it before you get on it? And how can "flight attendants prepare for landing?" If the cockpit crew can't handle that important chore themselves, the people in back aren't likely to be of much help, now are they? And the list goes on.) I eventually concluded that the mysterious sentence is actually airline speak for the union of three messages:

1) If you are sitting in an exit row and you cannot fully understand this card, or cannot see well enough to follow these instructions completely, or are feeling generally bloody-minded or oppositional, insist that the flight crew trade you up to a better seat.

2) If you are sitting in an exit row and the bloke between you and the window exit looks like a real survival risk in the event of an emergency landing, whisper an anxious warning to the flight crew and hope they do something about it.

3) If you are sitting in an exit row and you freak out during an emergency landing, the attorneys for the airline will do all they can to shift responsibility to you for any deaths or injuries, having warned you of your duties with this safety card.

In each of these scenarios, the safety card has achieved a useful goal (for the airline, at least) without having to state the goal too baldly. Not bad, for a piece of apparent nonsense.

I have learned to be more sympathetic to the anonymous author of that safety card. Over the past several years, I've found myself dealing with more and more test code. Test code that checks the basic sanity of a compiler and its library has to get pretty introspective at times. Some of the tests appear tautological, others simply mysterious. I wouldn't want to have to explain them to a flight attendant, to be sure. On the face of it, some of those tests are as nonsensical as that safety card. That's self-assessment for you.

My company, Dinkumware, Ltd., develops the libraries that accompany standard-conforming C and C++ compilers. We have also developed a core Java library, though it's harder to talk about standards with that language. In any event, we have to test the libraries we produce before we ship them to customers. We license existing test suites from others, where possible, but we also develop our own, to be sure of adequate coverage. We even license these test suites, which we call our Dinkum Proofers, to others, be they customers for our libraries or not. I am the initial author of some of the test code, while Bobby Schmidt initially wrote the rest. Pete Becker has subsequently worked things over, so we all now share credit or blame for the final products.

Of course, there's more to conformance testing than just poking at the libraries. The C and C++ Standards (and the Java core language specification) each come in two roughly equal parts. The first half defines the language that should be recognized by a conforming translator; the second half defines the library that should accompany the translator. Being in the library business these days, I naturally focus on the second half, but that emphasis is a bit unusual. Particularly in the case of Standard C++, the traditional emphasis has been on the language proper. The Standard C++ library did not begin to take its final form until just over five years ago. Tracking the evolving language specification has been a popular indoor sport among C++ cognoscenti for well over a decade. Only recently has the library received anywhere near as much attention.

Validation Suites

But whether you focus on the language proper or the standard library, you face the same generic problem. How do you determine whether a given implementation conforms to an ISO standard, or indeed to any reasonably precise specification? Standard C, Standard C++, and Java are each defined by hundreds of pages of dense prose. Reducing every requirement to a test program is a daunting proposition. Yes, there are a large number of requirements, but that's only the beginning. Distilling out the relevant test for each requirement involves more creativity than you might imagine.

Moreover, even the simplest requirement can seldom be tested to exhaustion. Perhaps you can verify that, say, the Standard C library function isdigit gives the proper answer for every possible value of type char, but it might still fail. The tests probably assume that calls of the form isdigit(ch) and isdigit(lo | hi << 4) are equivalent. Test for the former, and the latter must assuredly also work properly. But I once saw an implementation of the form:
#define isdigit(x) _ctyptab[x + 1]
Omitting parentheses around a macro argument is a stupid mistake, but simple tests of the form isdigit(ch) would never catch it.
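To see why the simple test misses the bug, here is a minimal sketch that sets the unparenthesized macro beside a corrected one. The names ctyptab, BAD_ISDIGIT, GOOD_ISDIGIT, init_ctyptab, and compound_argument_exposes_bug are invented for illustration, and the values of lo and hi assume an ASCII execution character set, where 0x35 is the digit '5'.

```c
#include <stdio.h>

/* Hypothetical table: slot c + 1 is nonzero when c is a decimal digit.
   Slot 0 is reserved for EOF (-1). */
static unsigned char ctyptab[257];

/* The flawed form: the argument is not parenthesized. */
#define BAD_ISDIGIT(x)  ctyptab[x + 1]
/* The corrected form. */
#define GOOD_ISDIGIT(x) ctyptab[(x) + 1]

static void init_ctyptab(void)
{
    int c;

    for (c = '0'; c <= '9'; ++c)
        ctyptab[c + 1] = 1;
}

/* Returns 1 when the two macros agree on a plain variable argument
   but disagree on a compound expression (ASCII assumed). */
int compound_argument_exposes_bug(void)
{
    int lo = 0x5, hi = 0x3;
    int ch = lo | hi << 4;      /* 0x35, which is '5' in ASCII */
    int simple_agrees, compound_differs;

    init_ctyptab();

    /* With a plain variable, both macros index the same slot. */
    simple_agrees = BAD_ISDIGIT(ch) == GOOD_ISDIGIT(ch);

    /* With a compound argument, the flawed macro expands to
       ctyptab[lo | hi << 4 + 1], which parses as lo | (hi << 5). */
    compound_differs =
        BAD_ISDIGIT(lo | hi << 4) != GOOD_ISDIGIT(lo | hi << 4);

    printf("simple argument agrees: %d, compound argument differs: %d\n",
           simple_agrees, compound_differs);
    return simple_agrees && compound_differs;
}
```

A compiler with warnings enabled may complain about the precedence in the expanded expression, which is rather the point.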

You can take the easy position that testing for conformance is impossible, so there's no use trying. That's a nice purist position, but it doesn't wash in the real world. What practical programmers settle for is something that at least creates some level of confidence. Maybe you can't test everything, but you can test the obvious. The obvious usually involves:

whether a feature is present at all

whether it can be used in the most obvious, simple way

whether the corner cases work right

whether any of the commonest bugs are present

I list these obvious tests in increasing order of thoroughness.
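As a rough illustration of those four levels, here is a sketch that applies each of them to isdigit. The CHECK macro and the function check_isdigit are invented for this example and come from no particular suite, and the compound-argument check runs only when the character set looks like ASCII.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

static int failures = 0;

/* Count and report a failed check; the test keeps running afterward. */
#define CHECK(cond) \
    do { \
        if (!(cond)) { \
            ++failures; \
            printf("FAILED line %d: %s\n", __LINE__, #cond); \
        } \
    } while (0)

/* Returns the number of failed checks. */
int check_isdigit(void)
{
    int (*pf)(int) = &isdigit;  /* 1. present at all, as a real function */
    int lo = 0x5, hi = 0x3;
    const char *s = "7x";

    failures = 0;
    CHECK(pf != 0);

    /* 2. usable in the most obvious, simple way */
    CHECK(isdigit('0') != 0);
    CHECK(isdigit('a') == 0);

    /* 3. corner cases: EOF, the largest argument, the edges of the range */
    CHECK(isdigit(EOF) == 0);
    CHECK(isdigit(UCHAR_MAX) == 0);
    CHECK(isdigit('0' - 1) == 0);
    CHECK(isdigit('9' + 1) == 0);

    /* 4. common bugs: unparenthesized or twice-evaluated macro arguments */
    if ('5' == 0x35)            /* only meaningful in ASCII */
        CHECK(isdigit(lo | hi << 4) != 0);
    CHECK(isdigit(*s++) != 0 && *s == 'x');

    return failures;
}
```

A driver would call check_isdigit from main and treat any nonzero result as a failure. Note that the first check fails at compile time, not at run time, if the function is missing altogether.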

A useful tool is thus a validation suite — a set of tests that covers all aspects of conformance to some more or less uniform degree of thoroughness. To use the suite, you compile each separate test program and run it. If it fails to compile, you've learned something, usually about some missing or malformed feature. If it runs and reports errors, you've learned something else. And if it runs to successful completion reporting no errors, you've gained a bit more confidence.

Listing 1 shows the simplest kind of validation testing. It is the one we supply with our Dinkum C Library to test the basic workings of the header <wctype.h> (added with Amendment 1 to the C Standard). As simple as it is, it often shows up errors in a Standard C library. In fact, very few C compilers pass this test completely.
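For readers who want a feel for what such a test exercises, here is a much smaller sketch in the same spirit. It is not Listing 1. The CHECK macro and the function check_wctype are invented for this example, and the expected results assume the default "C" locale.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Count and report a failed check; the test keeps running afterward. */
#define CHECK(cond) \
    do { \
        if (!(cond)) { \
            ++failures; \
            printf("FAILED: %s\n", #cond); \
        } \
    } while (0)

/* Returns the number of failed checks. */
int check_wctype(void)
{
    int failures = 0;
    wctype_t digit = wctype("digit");
    wctrans_t to_upper = wctrans("toupper");

    /* The fixed classification functions. */
    CHECK(iswdigit(L'5') != 0);
    CHECK(iswalpha(L'a') != 0);
    CHECK(iswspace(L' ') != 0);
    CHECK(iswalpha(WEOF) == 0);

    /* The fixed case mappings. */
    CHECK(towupper(L'a') == L'A');
    CHECK(towlower(L'A') == L'a');

    /* The extensible forms, looked up by property name. */
    CHECK(digit != 0 && iswctype(L'5', digit) != 0);
    CHECK(to_upper != 0 && towctrans(L'a', to_upper) == L'A');
    CHECK(wctype("no such class") == 0);

    return failures;
}
```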

Who uses validation suites? There are several distinct customers:

Original implementers use them as a kind of checklist, and as a debugging aid.

Maintainers use them to catch problems that arise in adding enhancements and fixes to apparently unrelated bugs, or in porting code to new environments.

Reviewers use them to check the degree of conformance of an implementation.

Developers use them to ensure that the tools they buy adequately conform to standards.

Each of these customers puts slightly different demands on a test suite. Making a suite that meets all of these needs well involves a number of tradeoffs, between thoroughness and speed, between subtlety and ease of use.

Validation suites tend to be large, and involve a highly varied set of tests. Obviously, you don't want to have to run each test by hand. That could take all night, in some cases, and even for a small number of tests it's altogether too easy to miss a test or two. Equally, you don't want to have to wade through pages of output to see if all the tests pass. So a good validation suite is one that you can run as a single step, either by typing a single command to an interactive shell or building a single project in an integrated development environment. And you can quickly check for the presence of any errors once the test is run.
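One way to get that single-step behavior is a table-driven driver, sketched below. The two tiny tests and the names test_case, suite, and run_suite are invented for illustration; a real suite would have far more entries, and quite possibly a separate program for each test.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Each test returns the number of checks that failed. */
typedef int (*test_fn)(void);

struct test_case {
    const char *name;
    test_fn run;
};

static int test_ctype(void)
{
    int bad = 0;

    if (!isdigit('7')) ++bad;
    if (isdigit('x')) ++bad;
    if (toupper('a') != 'A') ++bad;
    return bad;
}

static int test_string(void)
{
    int bad = 0;

    if (strlen("abc") != 3) ++bad;
    if (strcmp("abc", "abd") >= 0) ++bad;
    if (strchr("abc", 'c') == NULL) ++bad;
    return bad;
}

static const struct test_case suite[] = {
    { "ctype", test_ctype },
    { "string", test_string }
};

/* Runs every test, prints one line per failing test plus a final
   summary line, and returns the number of tests that failed. */
int run_suite(void)
{
    size_t i, n = sizeof suite / sizeof suite[0];
    int failed_tests = 0;

    for (i = 0; i < n; ++i) {
        int bad = suite[i].run();

        if (bad != 0) {
            ++failed_tests;
            printf("FAILED %s (%d checks)\n", suite[i].name, bad);
        }
    }
    printf("%s: %d of %d tests failed\n",
           failed_tests ? "***** FAILURE" : "SUCCESS",
           failed_tests, (int)n);
    return failed_tests;
}
```

Returning the failure count lets main pass it along as the exit status, so a shell script or a makefile can check the result without reading any output.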

A good validation suite has other virtues:

It should not take too long to run, or you'll tend not to rerun it often enough. This argues for an economy of tests. Repeating exactly the same test 100 times does not build confidence; it just uses up wall clock time.

It should catch most errors with at least one test failure. This argues for a breadth of tests. A forgiving validation suite does not build confidence either.

It should avoid subjective tests, such as for "adequate" performance or capacity. If people can rightly object to the result of a test, the test fails to raise confidence regardless of a particular outcome.

As you can well imagine, making a good validation suite is not easy. Even a weak one is of some use, but a good one is worth a lot more. Wise programming managers know to invest in as many good validation suites as possible. Each suite typically has a different emphasis, so the union of two or more good suites is more powerful than any one alone.

History

Once upon a time, the most important programming-language standards were promulgated by the National Bureau of Standards (then NBS, now NIST). They published FIPS (Federal Information Processing Standards). A FIPS-certified compiler required only half a ton of paperwork to sell to a government agency, a real savings over the five tons of paperwork normally required for a software sale. NBS developed a validation suite for each language. To get certified, a compiler had to pass this validation suite.

Well, actually what happened was that IBM usually handed NBS both the draft language standard and the validation suite on a silver platter. (Their compilers were particularly good at conforming to both, as luck would have it.) IBM would even throw in a few warm bodies for a year or so to help NBS whip the goods into proper shape. NBS didn't seem to mind — they got their work done for them and didn't have to shell out a dime.

Then along came the C programming language. It was the first language with compilers from multiple vendors instead of just one big player. Yes, AT&T developed C along with Unix, but they were never a major force in the commercial C marketplace. C was also the first language to attract commercial implementers of validation suites. Two major players quickly emerged, Plum Hall Inc. (www.plumhall.com) and Perennial, Inc. (www.peren.com). To this day, you can license a validation suite from either, or both, to check the conformance of a Standard C implementation. And, of course, both have offerings for the newly completed revision to the C Standard, formerly known as C9X and now unofficially dubbed C99.

When the time came for NIST to make a FIPS standard, they had the ANSI C Standard as a pretty complete document. But when the time came for NIST to make a validation suite, the dynamics were quite different from the old days. There was no IBM eager to donate software and staff, just to expedite the sale of compilers to the US government. Eventually, NIST struck a deal with Perennial, which became purveyor of validation suites for those seeking FIPS certification. And Plum Hall won the blessing of BSI in the United Kingdom and ITSCJ in Japan, as the validation suite to pass for certification in those countries.

But the times they keep a-changing. With a robust industry providing a broad range of software products, the US government has seen the light. They now favor "commercial off-the-shelf software" (or COTS, in government-ese). The days of hundred-page software contracts may not be completely gone, but they are indeed fading. Government workers can even buy software with credit cards now. So the need for FIPS certification, or ISO certification in other lands, has diminished considerably. The world still needs validation suites, the more the merrier for my money, but government certification no longer drives their development.

The C++ standardization effort began rather late in this period of rapid change. People needed validation suites to track the ever-changing draft C++ Standard. Naturally, Plum Hall and Perennial rushed to fill the need. But their early efforts focused almost exclusively on the C++ language itself. Remember that the library was a latecomer. Moreover, C++ is a big language, compared to C at least. So a language validation suite alone is still a hefty chunk of code. When the need for a C++ library validation suite came along, later in the process, it was natural for both major vendors to treat it as a separate entity.

I wrote the original library validation suite licensed by Plum Hall. When we parted ways, I substantially overhauled all those test cases I wrote to make the Dinkum C++ Proofer. But I also practice what I preach. Dinkumware cross-licenses all our library products with all the relevant validation suites from Perennial. That way, we know that a major vendor is beating on our libraries all the time while proving in their validation suites. And we have multiple suites to run against our libraries in-house. I can tell you that the suites catch errors after practically every nontrivial change we make to a library. And we often find that one suite catches errors that the others overlook.

In Times to Come

I present the basics of validation suites here because conformance is once again a popular topic. It's been over two years since the last substantive changes were made to the C++ Standard. More and more vendors are beginning to claim conformance. Those claims, if they are backed up at all, are perforce made in terms of behavior under some validation suite. It helps to have some perspective in analyzing such claims.

Less exciting, but still of interest, is the emergence of a new C Standard. Nobody has yet claimed complete conformance to C99, at least not to my knowledge, but that too should start happening soon. Once again, it will be the validation suites that serve as the nearest thing to an impartial referee.

Watch this space.

P.J. Plauger is Senior Editor of C/C++ Users Journal and President of Dinkumware, Ltd. He is the author of the Standard C++ Library shipped with Microsoft's Visual C++, v5.0. For eight years, he served as convener of the ISO C standards committee, WG14. He remains active on the C++ committee, J16. His latest books are The Draft Standard C++ Library, Programming on Purpose (three volumes), and Standard C (with Jim Brodie), all published by Prentice-Hall. You can reach him at pjp@plauger.com.