Dr. Dobb's Journal February 1997
Graphical user interfaces may be a dream for users, but they can be a nightmare for testers. As Java development takes off, so does the need to test GUI applications written in Java. The newness of Java and Java-oriented testing tools makes this a challenge, however. This is as true for development groups within Sun (such as the WorkShop Products group, of which I'm a part), as for third-party developers. In this article, I'll examine Java GUI testing issues in general, then present some of the solutions we've adopted for testing Java GUI software.
Though automated test suites can never entirely eliminate the need for manual testing, they are invaluable as an efficient way of:
For applications with command-line interfaces, test automation can, as a rule, be readily accomplished with the help of I/O redirection, which feeds the application input from a file (rather than from the keyboard) and sends the application's output to one or more files (rather than to the screen). With a scripting language that makes it easy to perform initial setup, you can also run individual tests, and evaluate and report results.
Automated GUI testing, on the other hand, is as problematic as it is essential. Special tools are needed to simulate input, which ordinarily comes in through the GUI. Similarly, graphical output can be captured and analyzed only with special-purpose tools. For Java GUI applications, the need to use special testing tools is unusually problematic -- because of Java's newness, the necessary tools are just beginning to appear.
Since a major attraction of Java is its platform independence, testing Java applications (or applets) tends to require testing on multiple platforms. On the face of it, therefore, the tools need to work on all targeted platforms. A colleague facetiously suggested that there is really no need for multiplatform tools for Java testing, because a Java program verified to work properly on one platform is guaranteed to work identically on all others. "Write once, run anywhere" is indeed one of the promises of Java, but remember that Java is still a relatively immature language and is by no means perfect. Early versions of the AWT (Abstract Window Toolkit), in particular, behave differently (and at times quite improperly) depending on the operating system used. Presumably, these problems will be significantly diminished in JDK 1.1, but for the foreseeable future, serious testing of Java applications will require testing on all major platforms.
There are two basic ways of creating the required sort of multiplatform Java GUI test tool. The first option, adding Java capability to a preexisting non-Java tool, has the advantage of inheriting the strengths of the preexisting tool. On the other hand, such a tool is not guaranteed to run on all platforms. The second option, of course, is to create a Java-based tool from scratch that can be expected to migrate automatically to new platforms along with the application being tested. In the short run, however, the new software is likely to lack refinement. Our search for test-development tools turned up one candidate of each kind -- Segue's QA Partner and SunTest's JavaSTAR.
The most recent version of Segue's QA Partner (http://www.segue.com/), a well-established GUI testing tool running on UNIX, Windows, OS/2 and Macintosh, incorporates a "Web-aware" capability which, among other merits, provides the ability to test Java applets and applications. Given its non-Java roots, QA Partner naturally achieves this capability not by interacting with AWT components directly, but by interacting with associated native GUI components on the various platforms, such as Motif widgets on UNIX and Windows controls on Windows 95/NT.
For Java testing, this strategy has pros and cons that need to be considered according to your needs. Basing testing on native components is attractive for testing applets running under a browser such as Netscape Navigator, which itself uses the native GUI. With a test tool that uses native components, Netscape Navigator events can be captured, and Navigator interface properties tested, along with the events and interface properties of the Java applet (or more precisely, along with those of the native implementation of the applet's GUI elements).
On the other hand, the tool is not directly aware of the Java components, specifically at the AWT level. Therefore trouble may result from inability to recognize Java (AWT) component events and properties, or from inability of a single test script to cope with the differing native manifestations of the same Java component from one platform to another.
For example, the JDK 1.0.2 AWT FileDialog has different native implementations on Windows and UNIX, each containing native components (such as text fields) that do not match up with corresponding components at the AWT level. The AWT interface allows interaction with the FileDialog as a whole, rather than with particular items (such as native text fields) within the FileDialog implementation.
Thus, if a tool that is aware of only the native components is used to record interaction with a FileDialog on UNIX, the resulting test script will not be able to handle the corresponding interaction on Windows, since the Windows components will differ. Consequently, at least the FileDialog interaction will have to be rerecorded. For a tool that operates at the AWT level, on the other hand, the details of the native components from which the FileDialog is constructed are ignored, so the differences in native implementations will not prevent a test script recorded on one platform from playing back unaltered on another.
Because we primarily wanted to test applications such as Java WorkShop which are written entirely in Java, the ability to handle Netscape Navigator or other native browser events was not a significant benefit in our situation. (Though Java WorkShop is browser based, it uses an adaptation of the HotJava browser, which is written in Java.) Also, in our particular case, QA Partner's lack of Macintosh Java support was a hindrance. (Segue has expressed interest in implementing Macintosh Java capabilities if there is sufficient demand.)
JavaSTAR is a Java GUI test tool from SunTest, a Sun unit that develops Java testing tools; see http://www.suntest.com/, As a purely Java-oriented tool, it is aware of AWT components, recognizing them the same way regardless of platform. The corresponding drawback, of course, is that JavaSTAR is unaware of non-Java components and thus cannot be used to directly control or test them. Again, this was not a problem for us because our target application (Java WorkShop) was written entirely in Java.
However, Java WorkShop's installation and startup are not Java based; on Solaris, for instance, installation uses tar. Windows installation employs InstallShield. Thus JavaSTAR cannot help us test Java WorkShop installation, except indirectly by determining whether Java WorkShop works after it has been installed. Since installation is relatively straightforward, this limitation was not a serious concern.
Platform-specific startup mechanisms were a more serious issue. On UNIX, for example, Java WorkShop uses a wrapper script to arrange the proper execution environment; on Windows, it uses an executable (jws.exe). Though details of getting the environment right -- setting paths and the like -- tend to differ from one platform to another, you may wonder why the housekeeping cannot, nevertheless, be done with Java, since a Java program can determine which operating system it is running on (with System.getProperty("os.name");) and should therefore be able to tailor the setup accordingly. However, there would still be a residual problem of invocation complexity. On Windows, for instance, users should not be forced to type a relatively complicated command such as "java sun.jws.Main"; plain "jws" is obviously preferable. Although platform-dependent configuration can be handled by Java, existence of at least a minimal wrapper script or executable is, at least for the time being, unavoidable. Given that there must be a wrapper in any event, it is hard to make a compelling case that the tool must handle configuration elsewhere. Though we lacked a way to involve startup wrappers in automated test runs, we determined that we could nevertheless keep JavaSTAR tests fully platform independent by reimplementing the necessary path setting operations in a Java class that could be called from the test harness.
Another factor in favor of JavaSTAR was that, because it was written in Java, we should be able to use it on any platform on which Java WorkShop might be supported in the future. To be sure, there will be new startup issues to resolve in moving to additional platforms, but these problems shouldn't be insurmountable.
On the other hand, an obvious drawback of JavaSTAR (when we began evaluating it in August, 1996) was that it was new, and in many respects, rudimentary. For example, paths to golden directories were generated using forward slashes, with the consequence that they worked only on UNIX, not on Windows. However, such glitches were fixed with successive releases.
Being able to construct a GUI test that runs without human intervention is a big step. However, automated testing requires running collections of tests, with convenient mechanisms for controlling the set of tests to be run and for controlling the conditions under which the tests are run (such as the set of binaries used). Consequently, a test "harness" is required to manage the tests. I won't go into the ins and outs of test harnesses. Instead, I refer you to a minimal harness I wrote that provides basic functionality (available electronically; see "Availability," page 3). This Java test harness is a work-in-progress and is provided for illustrative purposes only.
Having decided to use JavaSTAR to construct our tests, our next challenge was to choose a "real" harness. The only one we found for running Java tests was JavaTest from JavaSoft with its Java API test suite (also known as the Java Compatibility Kit, or JCK) harness. At the outset, JavaTest presented two advantages. First, it had been in use for some time and thus was relatively mature. Second, we had already used it as a framework for a non-GUI benchmark test suite we had constructed, and we wanted to avoid a proliferation of harnesses that would complicate the job of running any arbitrary subset of our test suites. Another advantage was that JavaTest was written in Java and thus could be expected to run on all platforms.
However, during our initial review in August 1996, JavaTest was deficient in its ability to manage GUI tests. For instance, it was designed to manage suites of API tests, a task that differs in important respects from managing GUI test suites. A typical API test is a short program, perhaps 20 lines, written with the specific aim of testing a particular API feature. JavaTest assumes that the test developer has total control over the test program, and requires that the test program interface with the harness in specified ways, the most straightforward of which is that the test program's "main" method must return one of a limited set of test-status indicators (PASSED, FAILED, NOT_RUN, and so on). On the face of it, this won't work when the test program is large, preexisting, and not under the test creator's control.
You can envision getting around this restriction by arranging for the program JavaTest regards as the test to be a wrapper program that conforms to the JavaTest interface requirements and runs the application you actually want to test. However, this approach does not solve the fundamental problem that only meager results are passed back to the harness, with the result that, on the face of it, the harness will be able to generate only a limited report of test results. Clearly it is not enough simply to run tests automatically; reports that make it reasonably easy to pinpoint the causes of failures are also essential. When the test consists of just a few lines of code designed to exercise a single limited bit of functionality, a simple PASS or FAIL report is fine. A GUI test that brings up a large, complex application and exercises it with a stream of inputs to test a particular condition, on the other hand, has many potential points of failure. Consequently, the report of a failing test needs to indicate where in the elaborate sequence of procedures the trouble occurred.
A closely related issue is subtest-handling capability. The overhead of bringing up a large application such as Java WorkShop and manipulating it into a particular state for performing a set of tests makes it highly undesirable to restart that application for each individual test. The obvious solution is to bring it up once per test, but to conduct several subtests within that test. In other words, as we exercise the application, we'll check that various conditions are satisfied along the way -- for instance, we may want to check that after a given button is pressed, a certain dialog pops up, with a field filled in with a certain string. If the dialog does appear properly, we may then want to exercise it and check further results. If we conduct enough subtests as we exercise our application, we may not need much detail in reporting results for each subtest. The nature of the subtest and our knowledge of preceding subtest results should give us a pretty good idea what went wrong. Unfortunately, just as API-oriented harnesses such as JavaTest fail to allow for detailed reporting of results of each test, they also tend to fail to report the results of subtests within an elaborate test. This is not to say that these harnesses are faulty, but that test developers need to recognize that harnesses designed for API testing (or for any testing based on running small programs written solely to test a particular limited feature), cannot be expected to have the special capabilities needed for large-scale GUI testing.
Additional features we wanted, but that JavaTest lacked, included filtering capability and reporting of golden-file differences. We would like, for example, to create an interface with the Java WorkShop GUI builder, generate source code for the interface, and then compare the generated source with previously generated source that has been established to be correct, or "golden." We found that although JavaTest allowed for golden-file comparisons, the resulting report merely indicated whether the comparison succeeded or failed, not how the actual output differed from the golden version.
Filtering is needed when a result (such as the content of a text field) includes data that varies depending on the conditions under which the test is run. The obvious example is the full path of a file; for example, a file storing information about the project currently active in a development environment being tested. The first part of such a path will differ depending on where in the file system the test is being run. In comparing such a result with the expected value, we need to remove the variable part before doing the comparison to avoid wasting time with false failure reports. Again, this sort of capability was not a priority for the API testing for which JavaTest was designed, so JavaTest did not provide it.
Because JavaTest did not possess the needed GUI capabilities, I began work on an alternative harness designed with GUI testing in mind. This effort ultimately resulted in development of "JavaHarness," a general-purpose harness for Java testing that SunTest makes available with JavaSTAR and JavaScope (a Java coverage tool). JavaHarness is, of course, written in Java, so it will run wherever needed for Java testing. Because it was initially designed for GUI testing, JavaHarness naturally addresses some of the key requirements neglected by JavaTest. JavaHarness reports subtest results, reports how golden-file comparisons failed, and provides filtering capabilities.
For the WorkShop Products group, an important attraction of JavaHarness is that it enables us to use a single test harness to run all our test suites, including those already written for JavaTest. JavaHarness can function as a meta-harness as well as a harness, allowing it to start up a JavaTest test suite instead of, or in addition to, one or more JavaSTAR test suites. The main JavaHarness configuration file, JavaHarnessSetup.ini, can include an essentially unlimited number of sections, each containing specifications for running a particular test suite. For a JavaSTAR test suite, the specifications refer to a separate file containing a list of tests to be run. For a JavaTest test suite, the specifications point to the ".jtp" file containing the information JavaTest needs to set up and run a set of tests.
The all-too-familiar pattern with test harnesses is that each is developed with a particular sort of testing need in mind, limiting its usefulness for other sorts of testing. JavaHarness attempts to be more flexible, providing interfaces that let users tailor its capabilities to a particular kind of testing. For instance, when API testers wanted to try using JavaHarness to run their tests directly (rather than via JavaTest), it was easy to create a new JavaHarness test type, CompileAndRun. The harness, when given a list of source files, then compiles and runs each in turn, reporting how each test program's output compares with the corresponding golden file.
The chief drawback of JavaHarness, at this writing, is that it is still in an early stage of development. Though it provides the most essential harness capabilities needed to get a serious Java GUI testing program off the ground, we look forward to considerable refinement.
Though problematic, automated testing is essential to guarantee high quality in substantial Java applications with complex GUIs. Though tools to help automate Java GUI testing are still immature and will doubtless become much more attractive with future refinement, they can already provide significant benefit.
In choosing tools for Java GUI testing, it is important to consider whether testing will be confined solely to Java GUIs or whether there will be a mix of Java and non-Java user interfaces. If the testing must involve the use of non-Java interfaces, as in testing the running of applets under the control of a browser such as Netscape Navigator or Microsoft Explorer (which are not written in Java), have a look at Segue's QA Partner. For testing purely Java applications (or applets with Java-based browsers or players such as HotJava or AppletViewer), tools written in Java are preferable due both to their direct awareness of AWT components and to the fact that their portability will automatically match that of the application being tested.
DDJ