Columns


Doctor C's Pointers ®

Using The Quicksort Function

Rex Jaeschke


Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at uunet!aussie!rex or aussie!rex@uunet.uu.net.

For quite some time, the C library has had a sort function called qsort. According to ANSI C, qsort now lives in the header stdlib.h. Prior to ANSI C, this header did not exist so qsort was often declared in some other header or more likely, explicitly declared by the programmer each time it was used.

Introduction

qsort uses the quicksort method to sort the elements of an array into ascending order. Its prototype is:

void qsort(void *base, size_t
   nmemb, size_t size,
   int (*compar)(const void *,
   const void *));
where base is the address of the first element of the array to be sorted; nmemb is the number of elements to be sorted starting at address base; size is the number of bytes in each element in the array; and compar is a pointer to a user-supplied function to be used in element comparisons. (You may recall from previous discussions that size_t is a type declared in a number of standard headers besides stdlib.h; it is the unsigned integer type returned by the sizeof operator.) The comparison function returns a negative value if the first argument tests less than the the second, a zero if they are equal, else a positive value. This is just like the library function strcmp.

Since a void pointer is assignment-compatible with any data pointer, the first argument to qsort may be the address of any data type. When qsort is actually called, the pointer passed will be converted, as if by assignment, to a void pointer. (For most systems, such a cast results in no code being generated since all their pointer types have the same representation.)

qsort walks through an array, given the base address, element size, and element count, using some interesting casts. Note that base is not a const void * since qsort typically must modify some or all of the elements in the array passed. However, both arguments to compar are pointers to const void since the comparison function should not modify the data it's comparing.

Sorting A List Of Integers

To ease into using qsort we'll begin by sorting an array of six ints into ascending order.

The array of six ints has automatic storage duration. Historically, such arrays were not permitted to have initializer lists, although ANSI C now supports this capability. If your compiler complains, simply add the static keyword to the declaration.

After you define the comparison function cmpia, calling qsort is quite straightforward.

cmpia compares two ints whose addresses are passed. However, cmpia takes two void pointers, not int pointers. Since you cannot dereference a void pointer, a private copy of each argument is made in an int pointer. Note that without the const qualifier on the int pointers, your compiler may issue a warning since assigning a pointer to const to a pointer to non-const can lead to undefined behavior. Using two int pointers, you can easily compare the underlying ints and return the appropriate indicator, as shown.

I have developed the habit of using return 0 at the end of main, otherwise my compiler warns me I've failed to return a value from a non-void function. I find this message annoying so rather than ignore it, or worse yet, declare main (incorrectly) as a void function, I take the trouble of actually returning this exit status code even if it is never used. On some systems (VAX/VMS, for example), zero might not signify a success value, in which case you should either return the stdlib.h macro EXIT_SUCCESS or some other passive value.

The variables pi1 and pi2 are really unnecessary and can be replaced by a cast as follows (although you could argue the code is not as clear):

int cmpia(const void *pe1, const void *pe2)
{
   if (*(int *)pe1 < *(int *)pe2)
      return -1;
   else if (*(int *)pe1 == *(int *)pe2)
      return 0;
   else
      return 1;
}
An even simpler solution eliminates the test as well.

int cmpia(const void *pe1, const void *pe2)
{
   return *(int *)pe1 - *(int *)pe2;
}
To sort the elements into descending order, you simply reverse the sense of the comparison function as follows:

int cmpid(const void *pe1, const void *pe2)
{
   return *(int *)pe2 - *(int *)pe1;
}
which produces the output:

descending integer order

array[0] = 25
array[1] = 24
array[2] = 22
array[3] =  3
array[4] =  3
array[5] = -5

The Need For Custom Comparisons

The comparison function is called directly from qsort, not from the user-written program. One consequence of following the prescribed interface is that you need a custom-built version of the comparison function for each type of comparison. cmpia and cmpid can handle only ascending and descending sorts for signed ints. They cannot handle such orders for arrays of type char, short, or long int. Each type requires its own version.

Consider the case where the input array contains unsigned integer values (of any type). In such cases we must revert to the if tests shown in Listing 1 since subtracting one unsigned value from another can never result in a negative value.

If you called the comparison function directly, you could make it more general-purpose so as to handle ascending and descending orders for a given type. Doing so would require an extra argument, however, and qsort calls the comparison function with only two. Of course, you could always use a global flag. In any event, the tradeoff would be using several smaller and probably faster, individual comparison functions versus using a few larger, more general-purpose ones that make decisions on which way to sort.

Sorting A List Of Strings

qsort is described as sorting the elements in an array. However, what if a given array does not actually contain the data to be sorted, but rather pointers to that data? Listing 2 contains code in which the strings are pointed to by an array of pointers to char.

Now this is very useful. qsort doesn't necessarily sort based on the contents of the array elements. In fact, qsort simply traverses the underlying array a pair of elements at a time, swapping those elements as directed by the comparison function. And since we write the source to that function, we can chose to interpret the data directly or to treat the element as a pointer (to some level) to the data to be sorted.

In this example, cmpstra is given the addresses of two char pointers, hence the (char **) casts on pe1 and pe2. The expression *(char **)pe1 has type char * and its value is the address of the first string. Since strcmp will do the job nicely, we call it directly.

A function that simply calls another function and does no other work should raise a red flag — that perhaps the first function should really be a macro. However, you cannot take the address of a macro and qsort needs a function address as its last argument.

To sort the strings in descending order, you simply negate the result returned from strcmp.

int cmpstrd(const void *pe1, const void *pe2)
{
   return -strcmp(*(char **)pe1, *(char **)pe2);
}
The output will be

descending string order
array[0] = xyz
array[1] = abc
array[2] = Xy
array[3] = Abc
array[4] = 1234

Sorting Structures

Sorting an array of structures is a little more complicated. In this example each element in the array of structures contains an object count and description for some arbitrary objects, in this case furniture pieces. It might be useful to sort the structures into either ascending or descending order by count or by description, at the user's pleasure. Listing 3 contains one solution.

The array cmpfun is a list of function pointers which correspond to the option entered, making it easier to setup the last argument to qsort.

The validation of the input option is hardly robust, but adequate for this example.

Using The offsetof Macro

Note how each comparison function in Listing 3 is locked into the structure type. The explicit struct pointer cast is needed to access the correct member within that structure. Consider the case where each structure contains a number of members of the same type. It should be possible to use a general-purpose comparison function that works for all of them. This arrangement, however, would require a global variable the user could set to indicate the offset of the member to be compared.

By using the address passed into the comparison function, you could perform some fancy pointer casts to access any given type of object at any offset from the structure's beginning. In fact, given the structure address as a void pointer and the offset, you don't even need to know the structure's type. That is, the comparison function would work for all structure types containing a given member type. You would compute the offset value using the offsetof macro in stddef. h. Sounds great, let's give it a try.

The function cmpstia in Listing 4 contains some real pointer expressions. The program actually works and in the bargain, I've finally found a use for offsetof. In fact, I'm sure the code is portable, as well as ANSI-conforming.

This approach can, in fact, be used to sort any ints; they don't even have to be in a structure. To sort an array of ints, simply define offset to be zero so that the pointers passed to the comparison function are treated as actually pointing to an int rather than to the base of a structure containing an int.

Multikey Sorts

qsort sorts elements according to one key. To sort elements using two keys, you must first sort them on the primary key. Then for each unique primary key value, you count up the number of duplicates and use qsort to sort them once you make base point to the first in that sublist. You then repeat the process for other keys.

Consider the case of an array of pointers to strings. You wish to search first on columns 1-4 then on 10-15. You could use strncmp for the comparisons for both keys, but you would need a global variable to notify strncmp where to begin comparing.

The code that lets your comparison function know where to start is relatively straightforward although I'm not sure this method is the best way to write an efficient multikey sort.

Special Sorting Orders

There are other sorting sequences than those used here. For example, a dictionary sort would ignore case and possibly punctuation as well. A telephone directory sort might treat names beginning with 'Mac' and 'Me' the same. A phonetic sort would ignore certain letter patterns resulting in more duplicates. How could you implement these with qsort?

The comparison function must make a private copy of the converted input data before it performs the comparison. In no case is the function permitted to modify the elements to which its arguments point. (They have the const protection) And in any event, you don't want to modify them since you could not restore them later on.

If you could know the maximum length of the strings, you could allocate a local buffer (automatic if the code is to be reentrant, else static is OK) to store converted strings prior to comparison. If this size was not known, you could use strlen and malloc to allocate memory dynamically. However, allocating and freeing memory every time the comparison function is called could be quite expensive. And an even bigger issue is converting the same input strings many many times, since the conversion is done each time the comparison function is called.

A much better solution would be a two-dimensional array of pointers. The first pointer in each row would point to the original string, and the second would point to the converted string. Then you sort based on the second pointer finally traversing the array via the first pointer to get the sorted strings. That is, the comparison function is not involved in the string conversion at all.