The Black Art of GUI Testing

Automated testing in an event-driven environment

Laurence R. Kepple

Dr. Kepple is president of Segue Software. He can be reached at kepple@segue.com or on CompuServe at 71670,467.


When developing an application with a character-based user interface (CUI), the standard automated-test strategy is to use record/playback to drive the application and bitmaps to validate the application's state. Graphical user interfaces change this, however. The richness of the GUI and the complexity of its object-oriented, message-passing paradigm have increased the difficulty of the testing problem to the point that testing GUI-based software can be as hard as developing it. In fact, GUI testing is so technically demanding that software developers are taking on the role of software tester when they're asked to create tests that validate program modules and test code. In turn, software testers are becoming test-code developers just to keep up with the magnitude of the tasks before them. This article describes how the shift from CUIs to GUIs affects test automation, and why programming, rather than record/playback, is a superior solution.

The CUI Test-tool Paradigm

In the standard test-automation strategy for CUI, the tester records a live-interaction session with the target software, later playing back the recording using bitmaps (taken at recording time) to validate the application state after or during playback. CUI-paradigm tools bypass the logical information known by the GUI about application objects. Instead, CUI tools rely on bitmaps to provide information about the application to the tester. Both recordings and bitmaps expect application components to remain at the same screen location over time--a fairly reasonable assumption for most CUI applications. In the CUI environment, one app owns the screen (often writing directly to it), the arrangement of screens is fixed, and graphical elements such as fonts are weak or missing altogether.

Context Sensitivity

Because software recording captures a live-interaction session between the tester and the target software, it is "context sensitive"--it captures the context that existed when the recording was made. The total context that a recording captures is extensive, consisting of timing, screen location, fonts, and the like.

However, context sensitivity is a problem in GUI environments. Ironically, in GUI software recording most of the information you record actually works against you at playback time. Application-object attributes such as screen location and font are constantly changing, yet a recording that captures all of this temporary context at creation time naturally replays the same context at playback time. Context identity between record time and playback time is a special case in GUI environments--it can happen. But the general case is that the playback context will differ from the creation context. The resulting context conflict limits the usefulness of the automated recording approach when applied to GUIs; see Figure 1.

Instead of depending on context-sensitive components such as bitmaps, the GUI paradigm demands an approach that focuses on "logical object functionality"--what an object essentially does, rather than how it happens to look on the screen. For example, when given a valid filename, a typical File Open dialog box brings up the specified file in a new window. The File Open dialog box retains this essential functionality no matter where it appears on the screen, no matter what system font happens to be selected, and no matter which color scheme the user may currently have selected. A recorded test (especially one validated by bitmaps) buries this logical object functionality under irrelevant contextual data that relates to the temporary screen appearance of the object. Context conflict can cause playback failure or false indications of error; it is the single biggest obstacle to effective GUI test automation using record/playback technology.

Test-tool manufacturers who are trying to retrofit GUI compatibility onto test systems originally designed for CUI environments have devised several means to cope with context sensitivity. Often, however, these strategies are complex, error prone, and resource intensive. For example, some manufacturers compensate for the variable screen location of objects by scanning screen bitmaps and finding the wayward objects in their new locations.

Synchronization Strategies

The second major obstacle in adapting CUI test systems to GUI environments is synchronization. Test-tool manufacturers have circumvented the problem in two ways. In the first, the tester may direct the test tool to "sleep" at various points during the test. These sleep intervals break a recording into short spurts of activity surrounded by long periods of inactivity. Thus, timing differences between a recording's creation and playback contexts are obliterated by the long waits. But hardcoding timing assumptions into automated tests is a poor practice, leading to failure-prone, unmaintainable code. In addition, this approach dramatically decreases the speed of automated testing.
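A minimal Python sketch of what such a sleep-padded recording amounts to (the `replay` helper and the event strings are illustrative, not any real tool's API):

```python
import time

def replay(events, log):
    # Stand-in for a recorder's playback engine; just records what was sent.
    log.append(tuple(events))

def sleep_padded_script(log, sleep_fn=time.sleep):
    """A recording broken into spurts of activity separated by hard-coded waits."""
    replay(("click File", "click Open"), log)
    sleep_fn(5)    # guess: hope the Open dialog is up within 5 seconds
    replay(("type report.txt", "click OK"), log)
    sleep_fn(10)   # guess: hope the file finishes loading within 10 seconds
    replay(("click Edit", "click Find"), log)
```

On a slower machine the guesses fail; on a faster one the suite still wastes the full 15 seconds per run, regardless of actual response time.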

Another strategy for overcoming timing problems is to use bitmaps to "pace" a recording at playback. Pausing playback until the application's screen matches a stored bitmap compensates for inherent timing incompatibilities. Like the "sleep" strategy, however, bitmap pacing dramatically slows testing speed. Bitmaps are large objects, and constant bitmap loads and compares are expensive, slow operations. Since bitmaps are highly context sensitive, changes to screen appearance render stored bitmaps useless for pacing. Therefore, automation that depends on rigid stability of screen appearance puts those who depend on it at great risk in real-world GUI projects.
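The pacing loop itself is simple; the cost lies in the repeated full-screen compares, and any pixel-level change defeats it. A sketch, assuming a hypothetical `grab_screen` callable that returns the current screen as raw bytes:

```python
import time

def pace_on_bitmap(grab_screen, stored, timeout=30.0, poll=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Block playback until the live screen matches the bitmap stored at record time."""
    deadline = clock() + timeout
    while True:
        if grab_screen() == stored:   # full bitmap compare on every poll: expensive
            return True
        if clock() >= deadline:
            return False              # screen never matched; the context has changed
        sleep(poll)
```

A single changed pixel--a different color scheme, a shifted font--means the stored bitmap never matches and playback stalls until the time-out.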

Both the sleep and bitmap pacing strategies make it impossible to use the resulting test automation for performance testing. This is because both approaches deliberately slow the target application down so much that the inherent timing incompatibilities between record and playback are overwhelmed. Consequently, it's impossible to use such automation to time the performance of the target or to see how fast it can process input.

Programming a Response

While traditional test tools grind away at bitmap analysis, the GUI holds the very information the test tool needs--the current location of the desired object. A simple call to the GUI can determine the current location of this screen object, but using this and similar strategies means rethinking the test tool in terms of the GUI paradigm.

Programming languages such as C and C++ provide facilities to name GUI objects and drive and validate their operation. In my case, however, I've written a higher-level language called "4Test" (part of my QA Partner test tool). 4Test is an object-oriented language that interacts with GUI objects via a class library that defines the properties and methods associated with each class of GUI object. 4Test uses a suite of GUI drivers that turn the logical test actions requested by the test programmer into the object- and GUI-specific event streams needed to drive and validate the tests. Before acting on an object, the GUI driver asks the GUI for the current location of the target object. Since the GUI is the ultimate authority on object location in real time, the test tool always knows where to find an object.

Checking for the object's current location also allows the test tool to perform positive object identification. This means that the test tool will perform, in effect, an assertion check on each object named in the test program. Is it available? Is it in the right state for the desired test action? Even a simple click on the OK button of a dialog box involves extensive state validation to determine whether the right dialog box is up and whether the OK button is clickable or grayed out. This powerful and automatic state-checking mechanism is an invaluable aid to testers drilling down through layers of menus and dialog boxes in complex GUI applications.
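Positive object identification amounts to an assert-then-act discipline. A sketch in Python, with a hypothetical `gui` facade standing in for the real GUI's object-query calls:

```python
class ObjectNotFound(Exception):
    pass

class WrongState(Exception):
    pass

def click_button(gui, name):
    """Query the GUI for the named object, validate its state, then act on it."""
    btn = gui.find(name)              # ask the GUI itself, not a stored bitmap
    if btn is None:
        raise ObjectNotFound(name)    # wrong dialog up, or object not yet created
    if not btn["enabled"]:
        raise WrongState(f"{name} is grayed out")
    x, y = btn["location"]            # current location, fetched in real time
    gui.post_click(x, y)
```

Every action carries its own state check, so a test fails at the exact step where the application diverged rather than blundering on with misdirected clicks.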

Eliminating Synchronization Problems

The two-tiered architecture, comprising the test-program language process and the driver process, also allows GUI-paradigm test programs to be event driven. After the test program requests an action against an application object (a click on an OK button, for example), the test-program process is suspended until the GUI notifies the test driver that the target object is available and that the desired test action was successfully executed. This event-driven architecture eliminates the synchronization problem: the tester simply decides on an acceptable time-out interval beyond which the test tool should not wait for an object to become available. When that interval expires, the test program awakens with an error.
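In Python terms, the suspend-until-notified mechanism looks like a wait on a per-object event with a time-out (the `driver` object here is a hypothetical stand-in for the two-tier driver process):

```python
import threading

def act_on_object(driver, name, action, timeout=10.0):
    """Suspend until the driver signals the object is available, then act on it."""
    ready = driver.ready_event(name)        # set when the GUI reports the object up
    if not ready.wait(timeout):             # event-driven wait: no polls, no sleeps
        raise TimeoutError(f"{name!r} not available within {timeout}s")
    driver.perform(name, action)            # object is known good; drive it
```

The test resumes the instant the object appears, so it runs as fast as the application allows, yet it never acts before the application is ready.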

This triggered-on-object synchronization frees test programs from timing dependencies. The same GUI test suite that runs on a 25-MHz machine will run on a 66-MHz PC without changes, making it possible to reuse a GUI application's test suite across a wide range of systems, as part of a standard system-validation process, for example. By accessing GUI objects solely through the medium of the GUI, event-driven test programs are safe at any speed.

The event-driven approach also allows any regression suite to become a performance test without any additional work. By simply setting the time-out interval to a desired threshold and rerunning the regression suite, a tester can determine if system response time, at any point during the test, falls below the specified threshold.
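A sketch of the idea: rerun the same steps with the time-out set to the response threshold, and collect any step whose response time exceeds it (the step/measure structure is illustrative instrumentation, not a real tool's API):

```python
def run_as_performance_test(steps, threshold):
    """Rerun regression steps, flagging any whose response time exceeds threshold.

    steps: list of (label, measure) pairs, where measure() performs the step and
    returns its observed response time in seconds (hypothetical instrumentation).
    """
    failures = []
    for label, measure in steps:
        elapsed = measure()
        if elapsed > threshold:          # same test, now a performance assertion
            failures.append((label, elapsed))
    return failures
```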

Conclusion

Effective software development for GUI environments requires tool-supported, automated testing strategies grounded in the GUI paradigm. Tests should be event driven and focused at the level of logical object functionality, not temporary screen appearance. Test portability should be a major concern and will pay off handsomely with increased reusability of tests across both GUI and hardware boundaries.

Capture/Playback Techniques

George J. Symons

George is vice president at Software Research and can be contacted at symons@soft.com.


Windows-based applications have become the norm, but they have also complicated application testing. Graphical interfaces give the user more options and allow those options to be invoked in any order, which creates a more complex testing environment; inconsistencies across platforms are now possible in colors, fonts, screen size, and general look-and-feel.

Capture/playback tools can be operated in a variety of modes, and no vendor implements all of these modes today. It is important to understand the strengths and weaknesses of each mode because testing is not a single task--it is a process that goes on throughout the life of an application, and each mode has its benefits at different times during that process. The following are the three capture/playback modes.

True Time

With true time, keyboard and mouse inputs are replayed exactly as recorded by the tester. Playback timing is duplicated using the window server's own timing mechanism, allowing tests to be run as if executed by a real user. The results of the tests indicate any variance from the baseline cases, permitting the tester to determine the implication of those differences.

Therefore, if a button moved to a different location in the window, it would be flagged as an error, and the tester would then determine its significance. For instance, the movement of a button affects documentation, even though the program still runs as it did before.

Character Recognition

Character recognition allows the test to search for items that may have moved or fonts that may have changed since a previous version of the application was tested. It helps extend the life of a test script by allowing the script to adjust for minor changes in window layout or fonts. The downside is that character recognition requires additional time to create the scripts. It may also pass a test when an error should have been reported: in that case, a moved button may not be caught, and the documentation will go out unchanged. Character recognition can also take a portion of a screen image and convert it to ASCII characters to be saved in a file for printing or for comparing with other values as part of the test-verification procedure.

Widget Playback

The final mode is widget, or object-level, playback. With widget playback, the X and Y coordinates on the screen are no longer significant, as the application's widgets are activated directly. Widget testing is the only reasonable way to do portability testing.

The same test script can run on multiple hardware and operating-system platforms. Such tests will not check for GUI correctness, but will check that the application's engine ran successfully. With widget testing, tests might pass despite conditions in which a user could not operate the application interface, such as a command button being hidden behind a window. Therefore, even if widget testing has been run, it is still important to do user-level testing, either manually or with the true-time capture/playback mode.
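A sketch of the difference: widget playback fires the widget's action by logical identity, so it succeeds even when the widget is obscured on screen (the `app` structure here is hypothetical):

```python
def activate_widget(app, widget_id):
    """Object-level playback: drive the widget directly, ignoring screen x/y."""
    w = app["widgets"][widget_id]   # looked up by logical id, not coordinates
    w["callback"]()                 # fires even if the widget is hidden behind a window
    return w["visible"]             # caller can still check what a user would see
```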

Figure 1: Record/playback context conflict.


Copyright © 1994, Dr. Dobb's Journal