Mark Nelson is a programmer for Greenleaf Software in Dallas. Mark works on Greenleaf's line of C libraries for the MS-DOS, OS/2, UNIX, Xenix, and VMS operating systems.
Sorting is a fundamental problem in computer programming and the quicksort, as described by Hoare[1] and Knuth[2], is generally recognized as one of the best general-purpose sorting algorithms. Quicksort is useful enough that a standard library routine, qsort(), has been included in many C programming libraries such as the UNIX C, Borland Turbo C and Microsoft C libraries. qsort() is now in the ANSI C standard. These facts have encouraged C programmers to treat the quicksort as a "black box" and to be relatively unconcerned about how it operates.
qsort() has its limitations. It lacks the flexibility we might want from a library routine. qsort() operates only on fixed-width arrays in RAM. To sort records in a file, for example, you would have to sort an index array in RAM using qsort() and then rearrange the records based on the index array. In the MS-DOS world, qsort() can only handle indices of 16 bits or less. Sorting a list with 250,000 records would not be possible.
In some cases, though, qsort() is too flexible. Almost all implementations of qsort() require a call to a user-supplied comparison routine, which is definitely a detriment to performance. Hand coding a sort routine for a specific application will frequently produce a sort routine that runs much faster than qsort().
It's reasonable to want a general-purpose source code version of the quicksort algorithm that could be inserted into programs needing something more versatile than qsort(). This article describes such a routine.
Terminology
Here are my assumptions. The sorting operations I discuss are designed to sort arrays of records. A record is a collection of data, but the record does not need to be fixed length. Records are accessed using indices, the first index being 0. The sort operation will sort N records, so the indices will run from 0 to N-1. The sort will rearrange the records in ascending order, based on a comparison function. The comparison function can use any criteria you want to determine which record should precede another. However, in general, records will be sorted on one or more keys, which are usually a single element in the record. Finally, the sort routine also needs a swap function which, given their indices, can swap two records.

A truly general-purpose sorting routine only needs three input parameters: N, to tell it how many records to sort; a comparison function, to tell it when to swap; and a swap function, to perform the swap. Note that the ANSI qsort() function does away with the swap function by assuming that the data will always consist of fixed-length records in RAM. This makes the code more efficient, but it does away with some of the flexibility a good sorting routine needs.
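As a rough sketch of what that three-parameter interface might look like in C (these names are my own, not taken from the article's listings), a sort can be driven entirely through a record count and two callbacks. Here a deliberately simple placeholder sort stands in for the quicksort, just to show the interface:

```c
/* Hypothetical three-parameter interface: the sort needs only a
   record count, a comparison function, and a swap function. */
typedef int  (*compare_fn)(int i, int j);
typedef void (*swap_fn)(int i, int j);

/* Example client: an int array accessed only through the callbacks. */
static int data[] = { 5, 2, 9, 1 };

static int cmp_ints(int i, int j) { return data[i] - data[j]; }

static void swap_ints(int i, int j)
{
    int t = data[i]; data[i] = data[j]; data[j] = t;
}

/* A placeholder exchange sort; any algorithm built on these three
   parameters -- including a quicksort -- plugs in the same way. */
static void generic_sort(int n, compare_fn cmp, swap_fn swap)
{
    int i, j;

    for (i = 0; i < n - 1; i++)
        for (j = i + 1; j < n; j++)
            if (cmp(j, i) < 0)
                swap(i, j);
}
```

Because the callbacks hide the record layout, the same sort body could be pointed at records in a file or at variable-length records without modification.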
Quicksort
The reasons for using the quicksort algorithm are fairly well established. It runs much faster than the simpler bubble and insertion sorts. These two sorts can both be coded in just a few lines, but they have the unfortunate property of running in N² time. Quicksort runs in N*log(N) time, which is considerably faster as N becomes very large. Other N*log(N) sorting algorithms exist, but at least in the general case, the quicksort will usually run faster.

The idea underlying the quicksort is simple. To sort a group of records, you first select an arbitrary record. The initial quicksort algorithm described below selects record number 0. Then you partition the entire array of records into three distinct groups by a series of exchanges. The first group consists of all the records having keys below that of the selected record. The second group consists of the selected record by itself. The third group consists of all the records with keys above that of the selected record. When you place these three groups in order in the array, the arbitrarily selected record will be in its proper place in the list. All the records in the first group are going to stay in the first group, and all of the records in the third group are going to stay in the third group.
This completes the first portion of the quicksort algorithm. Now, the same algorithm will be applied recursively to the lower and upper partitions. The process continues until the recursion creates partitions that are only one record long.
In principle, this seems simple enough. You pick an arbitrary record in your array, and put it exactly where it belongs in the final sorted array. At the same time, you make sure that all records that should be lower than the arbitrary record are positioned below it in the array, and all of the records that belong above it are positioned above it. A simple idea, but the implementation is not necessarily intuitive.
Traditionally, you perform the first part of the quicksort algorithm by using a pair of pointers that start at either end of the array and work their way towards the middle. For example, if your selected record was at position 0, you'd start the low pointer at position 1 and the high pointer at position N-1 (see Figure 1).
The partitioning process starts by increasing the low pointer. The record pointed to by the low pointer is compared to the key record. If the record is lower than the key record, the low pointer can be bumped up a notch. You repeat the process until either the low pointer is pointing to a record higher than the key record or has gone to the end of the list. In Figure 1, the low pointer would be advanced to position 5.
After the low pointer has been pushed as far up as it can go, you start lowering the high pointer, using a similar decision rule. If the high pointer is pointing to a record that is higher than the key record, and the high pointer is not below the low pointer, the high pointer is decremented. In Figure 1, the high pointer will be bumped down two notches to record 8.
At this point every record above the high pointer also belongs above the key record, and every record below the low pointer belongs below the key record. If there are still some records between the high pointer and the low pointer, then we haven't finished partitioning the array into the higher and lower sections. To continue, we exchange the two records pointed to by the high pointer and the low pointer, putting them into their proper places in the array. We can then continue the process of moving the low pointer up and the high pointer down. Figure 2 shows the array before and after the switch. As you can see, all the names above the high pointer belong above KURT in the list, and all the names below the low pointer belong below KURT.
Figure 3 shows that after another increment/decrement cycle, the high pointer has fallen below the low pointer, meaning the high pointer now points to the spot in the array where the key record KURT belongs. An exchange is made, and now KURT is in the right spot.
Not only is KURT in his correct spot, but the main array has now been split into two smaller arrays. All of the records above KURT will stay above KURT. All of the records below KURT will stay below KURT. Now you have two smaller arrays to sort.
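The pointer walk just described can be sketched as a standalone C partition function over an int array (my own sketch, not one of the article's listings):

```c
/* Partition arr[first..last] around the key arr[first]; returns the
   key's final position.  Mirrors the low/high pointer walk above. */
static int partition(int arr[], int first, int last)
{
    int key = arr[first];
    int low = first + 1, high = last;

    while (low <= high) {
        while (low <= last && arr[low] < key)   /* push low pointer up    */
            low++;
        while (arr[high] > key)                 /* pull high pointer down */
            high--;
        if (low < high) {                       /* out-of-place pair: swap */
            int t = arr[low]; arr[low] = arr[high]; arr[high] = t;
            low++; high--;
        }
    }
    /* high has fallen below low; it marks where the key belongs */
    arr[first] = arr[high];
    arr[high]  = key;
    return high;
}
```

Everything left of the returned index is below the key, and everything right of it is above the key, which is exactly the state the recursive step needs.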
Now comes the recursive part of the algorithm. After you've located the position of the key record, you can apply the quicksort algorithm to the three high records and the six low records and continue to subdivide the array recursively into smaller and smaller partitions until all have been sorted.
Figure 4 shows the array after the first sort. Note the resulting division of the array into partitions one and two. The algorithm then moves on to sort partition one. Figure 4 shows the array after partition one has been divided into partitions three and four.
Figure 5 shows the array after partition three is sorted. There are now four remaining partitions to sort, but they will remain unchanged. The quicksort algorithm effectively broke the sorting operation into several smaller sorts. The first operation yielded partitions one and two, which held six and three records respectively. The second operation broke partition one into partitions three and four, which had four and two records. Finally, partition three was subdivided into a two-record and a one-record partition.
The array in Figure 5 still must go through four small sort operations. Partitions five and four will each be sorted once. Partition two will be broken down into two more partitions, one of which will have to be sorted again.
The Algorithm
Figure 6 shows a pseudo-code description of the algorithm; quicksort is both attractive and easily defined in a language that supports recursion. That's not to say you can't (or shouldn't) implement it in languages that don't support recursion. Note that Figure 6 defines the algorithm, not the implementation.

The algorithm is based on ordinary comparisons between array (key) elements, such as ARRAY[HIGH] <= ARRAY[FIRST], where HIGH and FIRST are record indices. (The algorithm definition should be rewritten with a generic comparison function replacing the specific comparison.)
Analysis
The quicksort algorithm runs faster than the N² time characteristic of slower methods because of its divide-and-conquer philosophy. If an array has 100 records, dividing it into two 50-record sorts may speed things up. After all, 100 times through an array of 100 records yields 10,000 potential exchanges (100*100 = 10,000), but 50 times through 50 records in two partitions produces only 5,000. Even better would be to split the file into four arrays, since 4*25*25 = 2,500. So, if the splitting algorithm is efficient, quicksort can outperform N² sorts. As it happens, the partitioning method described in the algorithm is efficient, at least for large numbers of records. For sorts of fewer than 10 or so records, however, the algorithm slows down.

During the sort operation of our first example, the quicksort algorithm performed 11 exchanges and 46 comparisons. Performing a bubble sort on the same data takes 21 exchanges and 55 comparisons. The gap between the two algorithms will continue to widen as the size of the array increases. As a test, I increased the array size to 100 elements and did another comparison. The quicksort algorithm performed 169 exchanges and 1,045 comparisons. The bubble sort performed 2,360 exchanges and 4,950 comparisons. The insertion sort did a little better, with 1,280 exchanges and 2,459 comparisons.
The Code
There are at least three good reasons to code your own quicksort routine:

- The qsort() routine supplied with your compiler is too slow.

- The qsort() routine supplied with your compiler won't work with your problem because you have too many records, because they're stored in a file, or because they're not fixed length.

- You're working in a language other than C.

The code I will develop below addresses all three issues.
The minimal quicksort routine in Listing 1 works fairly well. In fact, if you write a short main() that sets up a big array of random integers, you'll probably find this routine runs faster than the qsort() supplied with your compiler. Performing the comparison function in-line with the code gives this minimal quicksort a big advantage.
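Since Listing 1 isn't reproduced here, a minimal routine along the same lines might look like this sketch (comparisons done in line, assuming int records):

```c
/* A minimal recursive quicksort over arr[first..last].  The key is
   arr[first]; comparisons are coded in line rather than through a
   function pointer, which is where the speed advantage comes from. */
static void my_qsort(int arr[], int first, int last)
{
    int key, low, high, t;

    if (first >= last)                     /* 0- or 1-record partition */
        return;

    key = arr[first];
    low = first + 1;
    high = last;
    while (low <= high) {
        while (low <= last && arr[low] < key)
            low++;                         /* advance the low pointer  */
        while (arr[high] > key)
            high--;                        /* retreat the high pointer */
        if (low < high) {
            t = arr[low]; arr[low] = arr[high]; arr[high] = t;
            low++; high--;
        }
    }
    arr[first] = arr[high];                /* drop the key into place  */
    arr[high] = key;

    my_qsort(arr, first, high - 1);        /* sort the lower partition */
    my_qsort(arr, high + 1, last);         /* sort the upper partition */
}
```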
To compare the performance of this program to qsort(), write a short driver program that creates a big array of integers (see Listing 2). To use your compiler's qsort(), you'll also need to create a compare routine to pass to it. Unfortunately, C compilers lack standardization in their implementation of time/timer functions. My driver program uses the Microsoft C timer functions to give a rough look at the elapsed time.
My timing routines have only a one-second resolution, so be sure your data and ARRAY_SIZE constant are set up for a sort of at least ten or so seconds. Otherwise the jitter in the timer will cause unacceptably high error levels.
Using a test program nearly identical to the one shown in Listing 2, I obtained the test results shown in Figure 7. The quicksort algorithm exhibits nearly straight-line behavior, not quadratic. This test graphically demonstrates the superiority of the quicksort algorithm. An interesting exercise would be to plot the behavior of the bubble and insertion sorts over the same range of array sizes.
I recompiled the program using the my_qsort() routine instead of qsort(). I changed the line

qsort( test_array, ARRAY_SIZE, sizeof(int), compare );

to read

my_qsort( test_array, 0, ARRAY_SIZE-1 );

I compiled the program with optimization turned off. Figure 8 shows the results of the same series of timing tests. Once again, you can see that quicksort performance is nearly linear. However, note the improved performance gained by coding the comparison function in line. Calling the comparison function indirectly adds a heavy burden to qsort(), and almost doubles the run time.
This my_qsort() routine is one to tuck into your toolbox. However, there are some improvements that can be made to the algorithm.
Recursion: Friend or Foe?
There is something appealing about using recursion in an algorithm. When used properly, it can simplify a definition. However, a "good" algorithm definition does not necessarily make "good" code. my_qsort() uses a recursive call to itself, and this can cause several problems.

First, while C and Pascal (and some BASIC implementations) support recursion, other languages don't. Implementing recursion in assembly language can be simple or extremely difficult depending on your target machine.
Secondly, recursion can place a big burden on the runtime stack. In the MS-DOS implementations of C, the stack size is usually relatively small. A recursive program that goes a few hundred levels deep may very well blow up with a "stack overflow" message. Each call to my_qsort() burdens the stack with a return address, a few parameters, and a few automatic variables. A slightly different implementation can eliminate most of this baggage.
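One such implementation replaces the recursive calls with a small explicit stack of partition bounds. The sketch below (my own, assuming int records) always pushes the larger partition and loops on the smaller one, which keeps the stack depth near log2(N):

```c
/* Quicksort with recursion replaced by an explicit stack of
   partition bounds.  Pushing the larger partition and looping on
   the smaller bounds the stack depth at roughly log2(N). */
#define STACK_DEPTH 64      /* ample for any 32-bit array size */

static void my_qsort_iter(int arr[], int n)
{
    int first_stack[STACK_DEPTH], last_stack[STACK_DEPTH];
    int sp = 0;

    first_stack[sp] = 0;
    last_stack[sp++] = n - 1;

    while (sp > 0) {
        int first = first_stack[--sp];
        int last = last_stack[sp];

        while (first < last) {
            /* partition arr[first..last] around arr[first] */
            int key = arr[first], low = first + 1, high = last;

            while (low <= high) {
                while (low <= last && arr[low] < key) low++;
                while (arr[high] > key) high--;
                if (low < high) {
                    int t = arr[low]; arr[low] = arr[high]; arr[high] = t;
                    low++; high--;
                }
            }
            arr[first] = arr[high];
            arr[high] = key;

            /* push the larger partition, loop on the smaller one */
            if (high - first > last - high) {
                first_stack[sp] = first; last_stack[sp++] = high - 1;
                first = high + 1;
            } else {
                first_stack[sp] = high + 1; last_stack[sp++] = last;
                last = high - 1;
            }
        }
    }
}
```

Two small arrays of ints on our own terms replace the return addresses, parameters, and automatic variables that each recursive call would have pushed onto the runtime stack.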
A Quicksort Shortcoming
As you will recall, quicksort managed to improve on N² performance by repeatedly partitioning the data array into smaller arrays. A little mathematical analysis shows that the best way to partition the array is to divide it into equal halves, repeatedly. 100*100 gives us 10,000. A single partition into two equal parts yields 50*50 + 50*50, which is only 5,000. Another partition gives (25*25) + (25*25) + (25*25) + (25*25), which is down to 2,500.

Unfortunately, there is one case where even quicksort degenerates to N² performance: when working on an already-sorted array!
My code arbitrarily picks record 0 to start the quicksort. The low pointer starts at element 1 and will not advance, since it already points to an element greater than element 0. The high pointer is then decremented all the way down through the array until it reaches element 0, where the partitioning is complete. Since the high pointer has decremented all the way down to 0, the next partition to be sorted is now only one element smaller than the previous one. This process continues, over and over, with the partition decrementing by only one element each time.
It's safe to assume that any qsort() routine will encounter sorted data at least some of the time. When it does, the quicksort routine will yield exceptionally long sort times. You can address this problem in a couple of ways.
One way is to check if the data array is already sorted. A quick scan through the data won't take long, and if it indicates an already sorted array, you can simply exit the routine without doing anything. From the caller's point of view the quicksort will appear to run lightning fast on sorted data. Microsoft has taken this approach in its runtime library.
Checking to see if the array is already sorted will drastically improve your sort times for the fully sorted array, but it won't help a bit in the other pathological case, the reverse sorted array. And a "very nearly" sorted array will also perform extremely poorly.
To demonstrate Microsoft's poor design, I rewrote the test program shown in Listing 2 to sort the array again after it has finished the first sort. To make the array "nearly sorted", I just changed a single element of the test array so that it won't pass the "already sorted" test.
The resulting test program is shown in Listing 3. Once you adjust ARRAY_SIZE so the first sort takes a reasonably long time, you'll find that the second sort seems to take forever. At one point, my random array sorted in about three seconds, but the nearly sorted array took 300 seconds!
There are several ways of tackling this problem, and the method you choose will depend on the assumptions you make about your data. One good assumption to make is that your data is in one of two states: either "fairly sorted", or random. Fairly sorted data consists of an array that has been tinkered with only slightly after a previous sort, or is still sorted. I won't even try to rigorously define "random data," other than to say that it doesn't show any apparent organization.
If your goal is to choose a key record such that the record nearly bisects the array when sorted, you can see where our current algorithm is flawed. On a random data set, record[0] is just as likely to end up in the center of the sorted array as any other record. But in a partially sorted array, record[0] is more likely to end up on the near or far end of the array. So what would be a better choice for a first record? The natural choice would be the record in the middle of the partition to be sorted.
You can do this by simply performing an exchange of record[0] and record[size/2] at the start of the sort. In a random set of data, the middle record is as good a choice as any, and in a nearly sorted array, the middle record ought to nearly partition the array down the middle. In fact, if we are using this version of the quicksort algorithm, it is somewhat of a challenge to devise an array ordering that will generate worst-case timing.
Choosing the "median of three" for the key record is somewhat more sophisticated. You compare record[0], record[size/2], and record[size-1], and move the record located between the other two (the median) to position 0 where it serves as the key record. "Median of three" requires more code than the previous approach, but it tends to work better in some less pathological cases, and probably improves the speed of the sort on many sets of data.
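A sketch of median-of-three key selection over an int array (my own illustration, not Borland's code or one of the article's listings) might look like:

```c
/* Find the median of arr[first], arr[mid], and arr[last], and swap
   it into position first, where it will serve as the key record. */
static void median_of_three(int arr[], int first, int last)
{
    int mid = first + (last - first) / 2;
    int a = arr[first], b = arr[mid], c = arr[last];
    int t;

    if ((a <= b && b <= c) || (c <= b && b <= a)) {
        /* middle record holds the median value */
        t = arr[first]; arr[first] = arr[mid]; arr[mid] = t;
    } else if ((a <= c && c <= b) || (b <= c && c <= a)) {
        /* last record holds the median value */
        t = arr[first]; arr[first] = arr[last]; arr[last] = t;
    }
    /* otherwise arr[first] is already the median; nothing to do */
}
```

Calling this at the top of each partitioning pass costs two or three extra comparisons but defeats both the sorted and reverse-sorted pathological cases.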
For simplicity's sake, my quicksort routine uses the first option. Borland's documentation states that Turbo C uses the "median of three" approach, and my tests show that the Borland qsort() routine does very well on sorted data. In fact, it tends to sort a nearly sorted array faster than a random one, behaving almost like most human beings do.
The Final Enhancement
Earlier, I pointed out that the quicksort algorithm tends to run in N*log(N) time as N increases, while the bubble sort and insertion sort algorithms tend to run in N² time. However, when N is quite small, say under 25, the insertion sort will actually run faster than a quicksort over the same size data set.

Therefore, as the quicksort routine recursively develops smaller and smaller partitions, you'll reach a point where an insertion sort should sort the resulting partition. You can do this by changing the last few lines of the sort program in Listing 1 to look like Listing 4.
Insertion sorts are frequently described by analogy as the process of picking up playing cards from a stack and forming a sorted hand with them. Each time a new card is selected, you find its correct spot in the hand. All the cards that will reside above it in the hand are moved up, and you insert the card in its correct spot. Listing 5 shows a code fragment that accomplishes this.
The insertion sort routine is short and simple. The main loop goes through all the possible values of i, where i represents the card we'll pick up. The variable j represents the position you're testing as the insertion point. First you try to insert the new card at the highest possible position. Then you work your way down to the lowest position, and shuffle all the rejected cards up by one to make room for the card to be inserted.
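Listing 5 isn't reproduced here; an insertion sort along the lines just described might look like this sketch:

```c
/* Insertion sort over arr[0..n-1], following the card-hand analogy:
   pick up arr[i], shuffle larger cards up, drop it into place. */
static void insertion_sort(int arr[], int n)
{
    int i, j, card;

    for (i = 1; i < n; i++) {
        card = arr[i];                   /* pick up the next card   */
        for (j = i; j > 0 && arr[j - 1] > card; j--)
            arr[j] = arr[j - 1];         /* shuffle bigger cards up */
        arr[j] = card;                   /* insert at the open spot */
    }
}
```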
Ironically, this final implementation of quicksort does most of the actual sorting by the insertion sort! The quicksort routine simply creates several smaller partitions to be sorted with a different method.
Knuth credits R. Sedgewick with one final refinement of the quicksort. Rather than performing an insertion sort on each small partition as qsort() spits it out, it's more efficient to leave the small partitions unsorted until qsort() has finished. Only at this point should the entire array be subjected to what amounts to one big insertion sort over the entire range. Although you'll still have to do exactly the same number of comparisons and exchanges, you'll reduce the overhead, since the insertion sort will only be called once.
The final question is: what is a good value of SMALLEST_QSORT_PARTITION? After running test programs on various data sets, I settled on a value of 15. Any value between about 5 and 25 gave consistently good results. Bear in mind, though, that the optimum value will change when the relative costs of comparisons and exchanges differ from the ones I used here.
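Putting the pieces together, here is a sketch of the Sedgewick refinement (my own code, not Listing 6): the quicksort pass simply ignores partitions below the cutoff, and a single insertion-sort pass over the whole array finishes the job.

```c
#define SMALLEST_QSORT_PARTITION 15

/* Quicksort pass that leaves partitions below the cutoff unsorted. */
static void quick_pass(int arr[], int first, int last)
{
    int key, low, high, t;

    if (last - first < SMALLEST_QSORT_PARTITION)
        return;                          /* leave the small stuff alone */

    key = arr[first]; low = first + 1; high = last;
    while (low <= high) {
        while (low <= last && arr[low] < key) low++;
        while (arr[high] > key) high--;
        if (low < high) {
            t = arr[low]; arr[low] = arr[high]; arr[high] = t;
            low++; high--;
        }
    }
    arr[first] = arr[high];
    arr[high] = key;
    quick_pass(arr, first, high - 1);
    quick_pass(arr, high + 1, last);
}

/* Full sort: quicksort down to the cutoff, then one big insertion
   sort to clean up the nearly sorted result. */
static void my_qsort_final(int arr[], int n)
{
    int i, j, card;

    quick_pass(arr, 0, n - 1);
    for (i = 1; i < n; i++) {
        card = arr[i];
        for (j = i; j > 0 && arr[j - 1] > card; j--)
            arr[j] = arr[j - 1];
        arr[j] = card;
    }
}
```

After quick_pass() finishes, no record is more than a small partition's width from its final position, so the closing insertion sort never shuffles a card very far.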
Conclusion
The final, well-adjusted version of my_qsort() is shown in Listing 6. In this version, I incorporate the refinements suggested in the article, but the result is still a fairly short piece of code. When coded with in-line comparisons and exchanges, my_qsort() handily beats the qsort() routines that come with my compilers. You should also find it relatively easy to translate into other languages, if you need to.
Bibliography
[1] Hoare, C. A. R. "Quicksort." Computer Journal 5, 1962, pp. 10-15.
[2] Knuth, Donald. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, Reading, Mass., 1973, pp. 105-139.