Columns


Doctor C's Pointers®

Data Structures, Part 3

Rex Jaeschke


Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at rex@aussie.com.

This month I'll continue looking at the tradeoffs between arrays of pointers and multi-dimensional arrays.

Dynamically Allocated Arrays

As you saw in Part 1 (Doctor C's Pointers, CUJ, Volume 9, Number 4) you can allocate memory at runtime for any data structure you can declare at compile-time. Allocating memory for one-dimensional arrays is trivial — you define a pointer, assign it the value returned from malloc, and subscript it to get at the underlying elements. It is not much more difficult for n-dimensional arrays. The only trick is that you must use a pointer to an array, not a pointer to the first element in an array. (For a detailed discussion on pointers to arrays, see Doctor C's Pointers, CUJ, Volume 8, Number 5.) For example, the program in Listing 1 allocates an n x 10 array of ints where n is specified by the user.

This approach has limitations. To subscript pl to 2 levels, you must declare pl as a pointer to an array of 10 columns. That is, the number of columns must be hard-coded in the pointer declaration. Therefore, you must create a new pointer type for each possible row size needed in a program. (Alternatively, you could use void pointers and explicitly cast them to pointers to arrays of a given type. However, since the casts must have the array sizes hard-coded in them, you still don't have a general solution.)

The example in Listing 2 uses arrays of pointers instead. The version in Listing 2 could be easily enhanced to handle a variable row size and, therefore, be much more useful. However, both versions are limited to handling a specific base data type, in this case long int. It would be nice to have a way to allocate a 2D array of arbitrary size and any base data type. The code in Listing 3 achieves this, although it is not maximally portable. By displaying the addresses of each element, the program reveals that each row is not necessarily contiguous with the logically adjacent one.

This solution should work on most implementations. It is not maximally portable because it assumes that all pointers have the same size and representation. When the rows are allocated, an expression of the form:

aryadr[i] = malloc(...);
is used. However, aryadr is declared as a pointer to pointer to void. As such, the expression aryadr[i] (which is equivalent to *(aryadr + i)) involves pointer arithmetic. The offset i is scaled by the size of the underlying element, in this case a pointer to void. When aryadr is returned to the calling function it is assigned to a different type. For example, in

pl = alloc2da(...);
the pointer to pointer to void returned is assigned to a pointer to pointer to long int, pl. If you then subscript pl (as in pl[i]), the scaling factor for the offset i is the size of a pointer to long int. Because C permits each pointer type to have its own unique size and representation, subscripting pl can produce a different result than subscripting aryadr.

As I stated earlier, this solution is not maximally portable. However, unless you are running on a word architecture or a segmented memory architecture you should be safe. Even on the Intel segmented architecture it will work if you don't mix memory models. All word-architecture-based compilers I've used also make all pointer types the same size and representation. In reality then, the code will be highly portable.

Shuffling Data Vs. Pointers

Thus far, you have been dealing with lists composed of constant data. Of course, this is not always the case. If you are sorting a list of objects, the list entries must be modified at runtime. You can have a default list at program startup, but based on events that occur, you wish to substitute different values for one or more list nodes. You might also wish to replace a node with some null value indicating that option is not currently available.

Listing 4 shows a simple sort of a list of strings. In this case, the list is an array of pointers. While you must go through those pointers to compare the underlying strings, when you need to reorder the list, you can swap the two pointers. This is a very cheap exercise — the cost of the swap is independent of the length of the strings being compared. Essentially, you are pretending the strings have been swapped. Because the strings themselves are never moved, they need not be stored in writable memory. Listing 5 shows a 2D array version of this.

Because there are no pointer objects, you cannot simply swap pointers around. Instead, you must swap two whole rows. Swapping two whole rows is considerably more expensive because you must allocate a temporary array instead of just a pointer, and you must call strcpy three times to make the swap. The cost of the swap at runtime is directly proportional to the length of the strings — the longer the strings, the more time it takes. You also have the overhead of the three function calls. Because the strings themselves are moved, they must be stored in writable memory.

Conclusion

You can use arrays of pointers to achieve the same effect as multi-dimensional arrays. However, because of their ability to "bait and switch" the underlying data by swapping pointers, arrays of pointers are more flexible. The amount of memory used by each approach depends on whether rows are variable length or not. In any event, even if arrays of pointers require more space, their flexibility often outweighs this cost.