Kevin D. Weeks has been programming, primarily (and preferably) on micros, for over ten years. He has written programs ranging from a Radioactive Waste Inventory Management System to a control program for a therapeutic bed. He is currently employed as a software engineer by Electrotec Concepts, Inc. in Knoxville, TN. He can be contacted on CompuServe where his account number is 70262,2051.
Failure to prevent bugs in software can have catastrophic results. For example, in November of 1991 a bug caused the AT&T phone system in the Northeast to shut down for nine hours, resulting in millions of dollars in business losses. And in Canada a computer-controlled medical diagnostic machine killed two people. Although it is probably impossible, given most real-world constraints, to completely eliminate bugs from a computer program, it is possible to significantly reduce the number of bugs shipped in a program. This article presents a number of simple rules and techniques for identifying and eliminating many of the most common software bugs during both the development and the maintenance phases of a program's life.
Unbiased Technique
I first developed my interest in software quality several years ago when I was assigned the task of writing a control program for a Cervical Manipulation Therapeutic Bed. This was a device intended to replace a physical therapist, and its specific function was to move a patient's head about any of three axes, thus manipulating the patient's neck (cervix). Now keep in mind that the patients are undergoing treatment because they've already been injured. Theoretically, it was possible for this device to permanently paralyze a patient! I did everything I could think of to make sure the software was bug-free, but you can't imagine how relieved I was when the company making the things went out of business before they sold any. Looking back now, some four years later, my testing technique had more holes in it than a sheet of fan-fold paper.
Typically the programmer who writes a piece of code is a poor choice for testing that code. "...it is extremely difficult, after a programmer has been constructive while designing and coding a program, to suddenly, overnight, change his or her perspective and attempt to form a completely destructive frame of mind toward the program."(Myers 1979) It is the purpose of the tester to demonstrate that a body of code does not work. A successful test is one that uncovers an error thus improving the code's quality.
The question, then, is: Given our natural bias as programmers, how can we successfully test our own code? My solution is to make the testing process as mechanistic as I can wherever I can. To do so, I simply follow a set of rules for writing test code and thus take my own attitudes out of the equation. I also realize that I'm simply not psychologically equipped to perform some forms of testing and so, whenever possible, I rely on others for that.
Error Sources
There are five primary sources of software errors. These are:
- external factors (OS/compiler/hardware)
- syntax errors
- logic errors
- design errors (system and implementation design)
- analysis errors
Each of these, with the possible exception of syntax errors, is worthy of discussion, but I will concentrate on logic errors since they are the most amenable to a mechanistic approach. I define a logic error as a failure, by the software, to perform in the manner intended by the programmer.
Please note that this definition implies that it is quite possible for a function or module to perform exactly as the programmer intended and still fail to perform as required. However, as the implementor I'm not responsible for errors resulting from an incorrect specification. The purpose in making these distinctions between error sources is not to assign blame but to refine techniques for ferreting out particular classes of errors. Logic errors are particularly detectable with glass-box testing.
We're all familiar with the term black box. This refers to a device which receives input and produces output without the user knowing what processes took place in between. For most of us a photocopier is a black box. A glass box (or white box) is a device whose inner processing the user knows intimately. No one knows the "innards" of a function better than the programmer who wrote it.
Code Format
I am extremely distrustful of embedded, in-line test code. First, pointer errors are often position sensitive and I would rather not have them shift after I've decided the code works. Second, embedded test code makes the target source code harder to read. Third, decisions (relational tests) are a prime source of errors in their own right. A statement such as
#if !defined(PARTIAL_TEST) could accidentally remain enabled following a final, hurried test just prior to release. To avoid these problems I place test code at the bottom of a module under a single conditional
#if defined( TEST ) on which all other conditionals depend. By including the test code in the module I am testing, I have complete access to all static variables and functions. This reduces the need for in-line test code. Eliminating embedded test code means that pointers don't move just because they're being observed (my favorite example of Heisenberg's Uncertainty Principle). If you dislike the added bulk of including the test code with the target code, you can write a separate test code module which you then conditionally #include in the source module.
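As a minimal sketch of that layout (the module and its SetCursorRow/GetCursorRow functions are hypothetical, not taken from the article's listings), the scaffold sits at the bottom of the module behind the single TEST conditional:

```c
/* screen.c -- sketch of a module with its test scaffold at the bottom */
#include <stdio.h>

static int cursorRow = 0;       /* static data the test code can still reach */

void SetCursorRow(int row)
{
    if (row >= 0)               /* ignore invalid input */
        cursorRow = row;
}

int GetCursorRow(void)
{
    return cursorRow;
}

/* ---- test scaffold: everything below compiles only under TEST ---- */
#if defined( TEST )

int main(void)
{
    SetCursorRow(5);
    if (GetCursorRow() != 5)
        printf("FAIL: expected row 5, got %d\n", GetCursorRow());

    SetCursorRow(-1);           /* invalid input must leave state alone */
    if (GetCursorRow() != 5)
        printf("FAIL: invalid row changed state\n");

    printf("screen.c tests complete\n");
    return 0;
}

#endif /* TEST */
```

The test build is compiled with something like cc -DTEST screen.c; the production build simply omits -DTEST, so no test code ships.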
Statement Coverage
Robert Frost once wrote a poem entitled "The Road Not Taken." In testing, one wants to be sure every road is taken. This is referred to as statement coverage. I write test statements designed to exercise each path through a function. Then I use a source-level debugger and simply walk through the test code and its target, using the debugger to visually verify coverage. (Most programmers debug this way.) However, during this walk-through I have an ulterior motive: I want to spot areas of the target code for which I may have failed to develop effective test cases.
Listing 1 demonstrates simple statement coverage. I wrote test code to execute both possible paths in the target function. This example is certainly trivial, but there are cases that will test your ingenuity. Listing 2 is such an example, and it presents two difficulties. First, the first branch depends on the return value from another function call (calloc, in this case). We can overcome this difficulty by creating a dummy function whose return value we can control. Second, the function is called more than once depending on the results of earlier calls. To solve this, I created a wrapper function called testCalloc (Listing 3 and Listing 4) which will call calloc the number of times specified in a previous call to SetCalloc and then fail.
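Listings 3 and 4 are not reproduced here, but a wrapper of the kind described might look like the following sketch. SetCalloc and testCalloc are the article's names; the counter and the exact failure behavior are my assumptions:

```c
/* testCalloc: succeeds the number of times set by SetCalloc, then fails */
#include <stdlib.h>

static int callsUntilFailure = -1;   /* -1 means never fail */

void SetCalloc(int successfulCalls)
{
    callsUntilFailure = successfulCalls;
}

void *testCalloc(size_t nmemb, size_t size)
{
    if (callsUntilFailure == 0)
        return NULL;                 /* simulate an out-of-memory condition */
    if (callsUntilFailure > 0)
        --callsUntilFailure;
    return calloc(nmemb, size);      /* otherwise defer to the real calloc */
}
```

Under TEST, the target module can be routed through the wrapper (for instance with a #define calloc testCalloc in the target's source, but not in the wrapper's own file, which must still reach the real calloc).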
Once I've completed a module, I use a third-party tool such as Borland's Turbo Profiler to verify independently that my test code does indeed execute every line of target code.
Decision Coverage
Obviously statement coverage, although essential, is insufficient. The complexDecision function in Listing 5 contains a complex decision, so we must also make sure we execute each path through the decision itself. Figure 1 shows the cases we must test. (The last five test cases may seem redundant, but they're useful for finding erroneous parenthetic groupings.) As you can see, the first statement in the complexDecision function requires eleven test cases. In a real-world situation the number of test cases can grow nearly exponentially, especially when we add in boundary tests (discussed next). To simplify the effort, I created a test structure, testParameters, and then an array of test cases that can simply be looped through. Again, Listing 5 provides an example. I've defined a structure that contains the input values, the expected results, and even an error message which serves the double duty of documenting a particular test case. The test structure also simplifies adding and deleting test cases as the function evolves.
Our new requirement, then, is to execute every line of code and to exercise every decision.
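Listing 5 itself isn't reproduced here, but a minimal analogue of the table-driven scheme looks like this. The decision being tested, the structure members, and the MAX boundary of 15 are illustrative assumptions, not the article's actual code:

```c
/* Table-driven test cases: each entry documents itself via its message */
#include <stdio.h>

#define MAX 15                       /* an application-imposed boundary */

/* stand-in target: true when a is in range and either flag is set */
static int complexDecision(int a, int b, int c)
{
    return (a >= 0 && a <= MAX && (b || c)) ? 1 : 0;
}

typedef struct {
    int a, b, c;                     /* inputs */
    int expected;                    /* expected return value */
    const char *msg;                 /* doubles as test-case documentation */
} testParameters;

static const testParameters cases[] = {
    {  0, 1, 0, 1, "a on lower boundary"    },
    { -1, 1, 0, 0, "a below lower boundary" },
    { 14, 1, 0, 1, "a just inside MAX"      },
    { 15, 1, 0, 1, "a on MAX boundary"      },
    { 16, 1, 0, 0, "a just above MAX"       },
    {  7, 0, 0, 0, "both flags clear"       },
    {  7, 0, 1, 1, "second flag alone"      },
};

/* loop through the table; report and count any mismatches */
int runDecisionTests(void)
{
    int failures = 0;
    size_t i;
    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int got = complexDecision(cases[i].a, cases[i].b, cases[i].c);
        if (got != cases[i].expected) {
            printf("FAIL: %s (got %d)\n", cases[i].msg, got);
            failures++;
        }
    }
    return failures;
}
```

Adding a test case is then a one-line change to the array, which is what keeps the table maintainable as the function evolves.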
Boundary Conditions
Listing 5 contains more test cases than I listed in Figure 1 because I've combined decision coverage test cases with boundary condition test cases. A boundary condition is the point at which the rules governing a parameter's behavior change. For instance, natural boundary conditions occur at 0 for all integers and at the maximum values of the signed types: 127 for chars, 32767 for 16-bit ints, and 2147483647 for 32-bit longs. This change in behavior tends to make boundary conditions weak points in a program, so the test cases in Listing 5 make much use of MAX, 0, and -1. There are, of course, other boundaries. On an IBM PC there's an address boundary at 65535. Many computers can only address objects at even bytes. On top of that, the application itself may impose boundaries; in the example, 15 is such a boundary. Testing a boundary requires three test cases: one within the range, one on the boundary itself, and one outside the range. In the case of an integer's zero boundary we need test cases for -1, 0, and +1. In the case of the number 15 we're interested in 14, 15, and 16. Fortunately, in testing the boundaries we can usually presume that all values in the range included and excluded by a particular boundary pair will behave the same way as our test cases. In other words, if 1 and 14 work, then it's reasonable to assume that 2 through 13 will also.
Before moving on there is one additional point. Although complexDecision returns what should be the c variable's current value, I still explicitly confirm it. I never believe anything a function being tested reports. All operations and side-effects should be independently verified, if at all possible. If a function repositions the cursor, then I check the hardware for confirmation. If a function writes to disk, then the test code reads from disk whatever was written. When testing you must always be explicit about the results you expect and then make absolutely sure those are the results you got.
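As a sketch of that read-back principle (the file name and both function names are hypothetical), the test code reopens whatever the target wrote and compares it against what should be there, rather than trusting the target's return value:

```c
/* Verify a side effect independently: read back what was written */
#include <stdio.h>
#include <string.h>

/* stand-in target: writes a record to a file, returns 0 on success */
static int writeRecord(const char *path, const char *rec)
{
    FILE *fp;
    int ok;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    ok = fputs(rec, fp) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* test code: reopen the file and compare against the expected contents */
int verifyRecord(const char *path, const char *expected)
{
    char buf[128];
    FILE *fp;

    buf[0] = '\0';
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    if (fgets(buf, sizeof buf, fp) == NULL)
        buf[0] = '\0';
    fclose(fp);
    return strcmp(buf, expected) == 0;
}
```

The same pattern applies to any side effect: cursor moves are confirmed against the hardware, heap operations against a memory monitor, and so on.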
Tools
There are a number of tools that can be of great help in testing your code. I mentioned Borland's profiler above for testing statement coverage, and I've seen ads for other products that provide statement coverage testing. In the October 1991 issue of The C Users Journal, Robert Ward authored an article entitled "Debugging Instrumentation Wrappers for Heap Functions" in which he discussed using a memory monitor (Ward 1991). Please, use a memory monitor of some sort. If you don't want to code your own, or you want more sophisticated capabilities, there are products such as MemCheck from StratosWare. I started using such a tool several years ago. Since then, I've twice performed maintenance on programs I'd written before getting a memory checker, and in both cases I found out-of-bounds memory writes and memory leaks.
The biggest drawback to the type of testing I've described is providing user interaction for the user-interface portions of the code. It helps to isolate such code to a few modules. Ultimately, though, you need to test that code also. This poses some problems. One of my goals is to automate the testing as much as possible, but if I'm required to provide input and confirm output then I'll eventually get lazy and not do it. In this case, use something like Dr. Taylor's Test from Vermont Creative Software or Test from Microsoft. These tools are also invaluable later during the integration phase for automating regression testing.
Summing Up
Effective glass-box testing depends to a large degree on proper software construction. Design your code, don't hack it. If, in addition, you design with testing in mind, you'll find the job easier yet. I know you've heard it before, but let me reiterate: don't use global variables! When a module accesses a global variable, the number of potential paths through that module goes up significantly. Keep your module cohesion high and the coupling between modules low. I highly recommend an object-oriented approach, even in C.
Implement and test the module incrementally. I write a function, then write the test scaffold, and then test the function. Once I'm satisfied with the first function, I move on to the next. If I have a function pair such as SetCursor and GetCursor, I'll implement and test them together (keeping in mind that I don't trust either function to verify the other). An incremental approach makes the burden of writing the tests easier and also allows one to build the module on a solid foundation.
As you add functions, continue to run the tests for earlier functions. This is known as regression testing and will allow you to immediately spot any bugs your newest code may have introduced into already-tested code. It seems like half the errors I see result from side-effects in previously-tested code that wasn't thoroughly re-tested.
Test each module in isolation. Provide dummy functions for calls outside of the module so that you can control the results of the calls. Once the module has been checked out in isolation, link in the outside functions and run the tests again. This usually requires some conditional compilation in your test code but better there than in the target code.
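A sketch of such a dummy follows. The deviceStatus call and the USE_REAL_DEVICE switch are my inventions: the isolated test build compiles a controllable stand-in, and the real function is linked in only when the switch is defined:

```c
/* Isolating a module: a dummy stands in for a call outside the module */

#if !defined(USE_REAL_DEVICE)

static int dummyStatus = 0;

void SetDummyStatus(int status)      /* test code controls the outside world */
{
    dummyStatus = status;
}

int deviceStatus(void)               /* dummy replacing the real driver call */
{
    return dummyStatus;
}

#else
int deviceStatus(void);              /* the real function, linked in later */
#endif

/* target code under test: its path depends on the outside call's result */
int deviceReady(void)
{
    return deviceStatus() == 0;
}
```

Because the test code chooses the dummy's return value, every branch of deviceReady can be forced at will; rebuilding with USE_REAL_DEVICE defined reruns the same tests against the real function.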
Glass-box testing sounds like a lot of work, but it's actually not that bad. The statistics I've run on my own efforts show that in a completed module somewhere between 55% and 60% of the statements are test code. (These numbers are close to those noted by Marc Rettig in his article "Testing Made Palatable" in the May 1991 Communications of the ACM.) However, much of the test code is the same thing over and over with just the parameters changed, so I only spend about 30% of my time writing test code. And that 30% produced an estimated 50% reduction in time spent integrating the modules. I've only written one complete, non-trivial program using these techniques, so I don't yet have any numbers on post-release bugs.
As professionals we need to address the problems of software quality proactively and not reactively. Test, don't debug.
References
Myers, Glenford J. 1979. The Art of Software Testing. New York, NY: John Wiley & Sons, Inc.
Rettig, Marc. May 1991. "Testing Made Palatable," Communications of the ACM, pp. 25-29.
Ward, Robert. October 1991. "Debugging Instrumentation Wrappers for Heap Functions," The C Users Journal.