July 1995/Testing Sort Functions

Software Tools

Testing Sort Functions

K. B. Williams

K. B. Williams received a B.A. in Mathematics from California State University, Sacramento, in 1955. He is a veteran of forty years in the scientific applications programming trenches and is the founder of King Baker Enterprises, of Stillwater, Oklahoma. He consults in applied numerical techniques. He can be reached at 802 South Ridge Drive, Stillwater, OK 74074, or via e-mail at Kbwms@aol.com.

Introduction
Sort user: beware. If you plan to use a sort function, what do you know about its performance? If you are uncertain or ignorant you could be suffering your career ship to go upon rocks and shoals. This article presents a program that subjects a sort function of your choosing to a variety of test targets. Its output consists of activity counts and timing data. The activity counts include numbers of comparisons required on each sort target and, when the sort function source is available, numbers of exchanges. (Counting exchanges requires insertion of a software "probe" into your sort function.)
This program is the product of years of bitter experience with incautious implementations of Quicksort. Even C.A.R. Hoare, who invented Quicksort, didn't get it right [1, 2]. In 1982, I wrote my first C version of Quicksort, using a mid-element pivot strategy. I used this strategy on the advice of Software Tools [3, p. 116] which warned that "pivoting on the end element is a bad thing to do." The early versions of my test program tested Quicksort only on three targets: random-uniform, already-sorted, and reverse-sorted.
I soon added first-item-out-of-order to the list of targets. This happened after I nearly lost the friendship of a valued colleague when I recommended Quicksort but forgot the caveats. He was using two-way merge sort, which requires 2*N memory locations to sort N items, and was running short of memory. The day after he tried his new Quicksort I got an earful of invective. His application involved adding small amounts of data to an already-sorted list of several thousand items. This is just the thing for insertion sort, totally wrong for a poorly-implemented Quicksort. (For guidelines in implementing a good Quicksort, consult Sedgewick [8].)
Over the years I've added new targets to the test program and included additional sort functions in the software package. The principle test function, tstsort, appears in Listing 1. The entire source for this utility, with included C source files, runs to about 1,000 lines — too long to list here, but it's available on this month's code disk. The utility has grown up primarily under Microsoft C; the most recent is C/C++ Version 7.0. Unfortunately, I have been unable to compile it with Borland C++.
The software on the code disk includes ten sort functions, including four Quicksorts and one each of Shellsort [4], Heapsort [5 & 6], Two-Way Merge Sort [7], Bubble Sort, Insertion Sort, and Selection Sort. I've tested seven of these functions extensively using tstsort. The results appear in a later section, "Sample Test Results."

Description Of Test Program
A sort-test program as envisioned here consists of the following:
1. A function tstsort, that grinds out test targets, calls the sort function under test, and prints performance statistics.
2. A main function (and possible satellite functions) that invokes function tstsort.
To build the test program you link tstsort with main.

Function tstsort
Function tstsort resides with its local functions in source file tstsort.c. (Listing 1 shows function tstsort only; the complete file is on the code disk.) This file contains a suite of ten target generators that can be used to verify the performance of virtually any sorting function. You must, however, arrange the sort function under test to use the same argument list as the standard library function, qsort. The test suite will generate the sort targets shown in Table 1. The code for one of the more interesting generators, Reverse-Sorted-by-Halves, appears in Listing 2.
To call tstsort, the main program passes to it a pointer-to-struct of type SORT_TEST_STRU. The definition of SORT_TEST_STRU appears near the end of Listing 3. This structure provides essential parameters to tstsort, including the address of the sort function under test, and command-line parameters. The main program that calls tstsort must therefore be able to accept command-line arguments.
An example of a feature that has outlived its usefulness is member OKCodes in data type SORT_TEST_STRU. It holds a string of character codes that tell tstsort which targets to use (codes listed in Table 1) . At one time some of the sort targets were just too large or complex for the sort function under test. If OKCodes is assigned either a null pointer or an empty string, tstsort will use all sort targets.
Listing 4 shows a program that tests Microsoft's qsort. All main needs to do is fill in the blanks, so to speak, then turn the dirty work over to tstsort. main's parameter list must receive the standard command-line arguments. These arguments are passed to tstsort via its struct argument.
To test your own sort function, tack main onto it. When you get everything put together here is what you have:
Your main
 tstsort
   &
 friends
sort under
   test
The code disk contains twelve sort drivers. Ten consist of a sort function with a main appended. These are shown in Table 2. The other two drivers contain only function main, relying as they do upon Borland's and Microsoft's native qsort functions.
A makefile accompanies the program source files on the code disk. If you make a change to one of the source program files, just enter
nmake
at the command line and wait. Function tstsort drives each sort function under test using test targets that can be up to 65,535 unsigned long elements in length. Therefore, programs are compiled using the Microsoft C compiler's huge model. The command-line arguments for invoking the compiler are geared for C/C++ version 7.0.

Running the Test Program
The test program has a simple command-line interface. You invoke the program by typing the name of the program that calls tstsort, followed by arguments. These arguments enable you to control a few test parameters. For example, the test program typically sorts multiple instances of each target type. Successive instances increase in size, from a starting to a stopping size that you specify. This enables you to plot sort function performance vs. target size. Other command-line parameters let you control which target types to use, and whether or not to employ a millisecond timer. If you type <sortname> with no parameters you get a quick set of usage instructions, where <sortname> is the name of your executabale program. Instructions for using the command-line interface are included on the code disk.
Function tstsort produces two kinds of test outputs — an activity recap and plot-timing data. Activity recaps are printed on stderr and contain the following data:

range of sort targets requested

name of sort function under test

setting of Plot-Timer option

one or more sets of exchange/comparison counts, one for each suite in the range of sort targets

name and command-line abbreviation of each target

running time for each suite of targets

total running time
tstsort prints an activity recap for each target and for each suite of targets.
tstsort produces plot-timing data when the millesecond timer is enabled from the command line. Plot-timing data goes to stdout.

Sample Test Results
When I ran tests on the candidate sort functions, I wanted enough data to mean something, so I ran the functions at or near the upper range of target size. I selected a size of 50,000. Some of the sort utilities took an inordinate amount of time and one wouldn't run at all. I ran the full suite on eight of the candidates and estimated the time on the ninth. I also made runs on selected targets in the suite, all with a target size of 50,000 items.
Table 3 shows the total execution time required to sort 50,000 four-byte items over all target types. I could not run Borland Quicksort on targets larger than 16,384 four-byte items so I had to estimate the time.
Table 4 shows performance statistics to sort 50,000 four-byte items over one target type, random-uniform data.
On data that are not random, quite another story emerges. For example, the performance figures to sort 50,000 four-byte items of reverse-sorted-by-halves data are as shown in Table 5.
To make some sense of all this, I provide an analysis of the various sort functions employed:

Two-Way Merge Sort
I threw in this sort function for grins. It is my translation of D.E. Knuth's Algorithm S (Straight two-way merge sort) [2, p. 11]. My implementation is not coded efficiently so the stated running time is bloated; my only interest was comparing its efficiency with other algorithms.
Two-way merge sort is a solid (stodgy but guaranteed) O(NLgN) algorithm, which means that its timing performance is proportional to N times the binary logarithm of N. (So, for that matter, is Heapsort [5, 3].) Two-way merge sorting requires a companion storage area the size of the original sort target. This requirement frustrated early computer users and led to a flurry of development papers on sorting in the 1950s and early 1960s.
Note that in both single-target runs Knuth's Two-Way Merge Sort uses the least number of comparisons. Keep in mind that the 800,000 exchanges tabulated are not really exchanges, but represent moves of the data from area to area, whereas each exchange (a.k.a. swap) used by the other methods requires three moves of the data.

Fat-Pivot Quicksort
Fat-Pivot Quicksort uses fat-pivot detection [3, p. 116] and median-of-three pivoting [8, p. 186] plus sifting (Insertion Sort) on small segments as suggested by C.A.R Hoare [9, p. 11] and Knuth [7, p. 116]. Its behavior on the 50,000 item test suite is proportional to 0.6NLgN. This is not a wholly accurate measure because the fat-pivot algorithm especially favors four of the targets (Q, E, A, B). On the other six targets its behavior is proportional to 0.95NLgN.

Sedgewick Quicksort
Sedgewick's Quicksort performance was a pleasant surprise. His article [8] showed that it should perform well and it does. Sedgewick's Quicksort is a solid O(NLgN) procedure. Comparison-plus-exchange counts run at about 13.3NLgN on a run of 10,000 items in the test suite and 13.99NLgN for 50,000 items. The timing factor on the 50,000-item case is 80 µsec/item which yields an estimate of 10.63 seconds for the run of 10,000 items — very close to the actual time of 10.49 seconds.

Hoare's Quicksort
This implementation of Hoare's Quicksort might have brought tears to its originator. It performs credibly on six targets (R, S, V, H, F, T), with a timing behavior of O(NLgN). The other four targets (Q, E, A, B) present quite another story, where its behavior is O(N²) with a vengeance.

My Standard C Quicksort
The timing data (Table 3) for my version of Standard C Quicksort [10] illustrates how poorly this implementation fares. The execution times are longer than the original version by about six percent.
The timing behavior for this implementation is roughly proportional to 3.025N² + 4.3NLgN on the test suite. It exhibits poor performance, O(N²), on targets S, V, F, T, Q, E, A, and B.
Of the remaining two targets, My Standard C Quicksort showed fair performance on the random-uniform target (as previously shown) and sub-par performance on the Reverse-Sorted-by-Halves target. This is clearly the poorest performing Quicksort of the lot — at least on this test suite — and should be avoided.

Microsoft Quicksort
Microsoft's Quicksort on the test suite exhibits behavior roughly proportional to (61/36)N² + 4NLgN + 6N. Using a timing factor of 4.5 µsec/item closely approximates the time required in seconds on my 33 MHz 80486 CPU. This is another utility that should be avoided until the known O(N²) behavior can be eliminated. This utility performed poorly, i.e. exhibited O(N²) behavior, on targets V, F, T, A, and B.

Borland Quicksort
The timing for Borland Quicksort is an estimate based on runs of three populations — 5,000, 10,000, and 15,000. It exhibits behavior on the test suite proportional to (N²)/8 + 7NLgN + 6N. The O(N²) behavior stems from its inability to handle the sort target First-Item-Out-of-Order (F) efficiently. The timing factor used for Microsoft's Quicksort (4.5 µsecs/item) did a credible job of estimating the running times on the three populations. I could not compile the test programs under Borland C++ to sort targets greater than 16,384 four-byte items despite using their compiler option for huge data, -mh.

Heapsort
The performance of Heapsort on the entire test suite is proportional to 21NLgN. The timing factor for this algorithm is about 6.94 µsecs/item based on the run made on 50,000 items. These parameters predict that Heapsort running the suite on 10,000 items will take 19.4 seconds. This is close to the actual time of 19.6 seconds.

Shellsort
The Shellsort algorithm as implemented here is guaranteed to have a running time factor no greater than O(N^1.5) [11, pp. 17-26]. That is, the algorithm has an asymptotic upper bound proportional to N^1.5. My tests show that it runs at about O(N^1.272) on random data. (Shell estimated the timing to be proportional to N^1.226 on his data [4, p. 3].) But on other kinds of data I am at a loss. And I am not alone [7, p. 91; 12, p. 110]. The controlling mechanism is the gap-setting algorithm that seems to make Shellsort work less efficiently for some starting gap sizes (Shellsort works by breaking the target into subsets, separated by "gaps."). This is evident, for example, when the test program runs Shellsort on a suite with 29,519 items and then on 29,520 items. The starting gap for 29,519 items is 1,093 whereas the starting gap for 29,520 items is 3,280. Shellsort performs much better on the latter.
The Shellsort function can be compiled to use one of two kinds of diminishing gap increments: (2^k - 1) or (3^k - 1)/2. My tests indicate that the preferred increment sequence overall is produced by (3^k - 1)/2 despite the anomalous behavior just described.

Conclusions and Recommendations
This program can help applications programmers determine whether the sort function provided in the runtime library performs adequately on a variety of sort targets. For example, this program reveals that Microsoft's qsort exhibits O(N2) performance in five of the ten tests. Borland's qsort, claiming a median-of-three implementation, exhibits O(N2) behavior on one of the tests. Judging from these tests. Borland's qsort also employs "fat-pivot" detection.
This program can be invaluable to developers of runtime libraries who are unable to spend the time to perform extensive tests that a sort function demands.
tstsort's targets come from a variety of sources. The traditional ones are Random-Uniform, Already-Sorted, Reverse-Sorted, Reverse-Sorted-by-Halves, and All-Items-Equal. Each of the others I either stumbled over in practice, such as First-Item-Out-of-Order and Boresight-Simulation, or dreamed up because I knew that one of the sort methods would struggle over it. You can put your own sort target in the arsenal either by replacing one that exists or by adding your own.
Quicksort, properly implemented is a high-speed sorting algorithm. Its behavior is still O(N²) in the worst case but a carefully implemented algorithm should behave this way very rarely. Users who do not want to bet the farm on a contingency of "very rarely" should fall back on ultra-reliable Heapsort or Shellsort.

References
[1] C.A.R Hoare. "Algorithm 63: Partition," and "Algorithm 64: Quicksort," Communications of the ACM 4 July: 1961, p. 32.
[2] C.A.R Hoare. "Quicksort," Computer Journal 5 April: 1962, pp. 10-15.
[3] Brian W. Kernighan and P. J. Plauger. Software Tools (Addison-Wesley, 1976)
[4] Donald L. Shell. "A High-Speed Sorting Procedure," Communications of the ACM 2 July: 1959, 604.
[5] J.W.J. Williams. "Algorithm 232 (HEAPSORT)," Communications of the ACM 7 June: 1964, pp. 347-348.
[6] Robert W. Floyd. "Algorithm 245 (TREESORT 3)," Communications of the ACM 7, December: 1964, p. 701.
[7] Donald E. Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching (Addison-Wesley, 1973)
[8] Robert Sedgewick. "Quicksort with equal keys," SIAM Journal Computing 6 June: 1977, pp. 240-267.
[9] Thomas N. Hibbard. "An Empirical Study of Minimal Storage Sorting," Communications of the ACM 6 May: 1963, pp. 206-213.
[10] P.J. Plauger. The Standard C Library (Prentice Hall, 1992)
[11] Vaughan Ronald Pratt. Shellsort and Sorting Networks, Ph.D. Thesis (Garland Publishing, Inc., New York, 1979)
[12] Robert Sedgewick. Algorithms in C (Addison-Wesley, 1990)

Further Reading
Alfred V. Aho, Brian W. Kernighan, and Peter W. Weinberger. The Awk Programming Language (Addison-Wesley, 1988)
Thomas E. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms (McGraw-Hill, 1990)
Robert W. Floyd. "Algorithm 113 (TREESORT)," Communications of the ACM 5 August: 1962, p. 434.
A.F. Kaupe. "Algorithms 143 and 144 (TREESORT)," Communications of the ACM 5 December: 1962, p. 604.
Richard C. Singleton. "Algorithm 347: An Efficient Algorithm For Sorting With Minimal Storage (SORT)," Communications of the ACM 12, March: 1969, pp. 185-187.