September 1991/Doctor C's Pointers®

Columns

Doctor C's Pointers®

Data Structures, Part 6

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at rex@aussie.com.
The two previous installments implemented a linked list in an array. This installment will construct the list using nodes allocated by malloc instead. To accommodate the new approach and to make the project more manageable, the program has been completely reorganized. It does, however, still have much the same flavor as before.

The Revised Problem
Both approaches strive to create a list of nodes and to perform various operations on individual nodes or on the list as a whole. However, rather than have each node simply contain some integer data, we make it deal with strings. Each node contains a unique string and an occurrence count. The list is doubly linked and the nodes are stored in alphabetical order. That is, the forward pointer from one node points to the node containing the string in the next highest order alphabetically. Likewise, the backward pointer points to the next lowest string entry. Of course, the strings being input could arrive in any order. Given the input strings 'Green', 'Red', and 'Blue', the corresponding nodes created should be linked in order of string value 'Blue', 'Green', and 'Red'. Note that 'black' comes after 'Red' since (at least in the ASCII character set) uppercase letters have lower values than lowercase letters.
The structure template for a node is declared as follows:

typedef struct node { /* ptr to next node in list */ struct node *pfwd; /* ptr to prev node in list */ struct node *pbwd; /* ptr to node's string value */ char *pstring; /* occurrence count */ unsigned count; } Node;
Unlike in the previous solution, the forward and backward links are real pointers, not just array indices. In each case, they point to objects of the same type as the object in which they are contained. While C does not permit self-referential structures, it does allow this construction. Note that an object of type struct node can contain pointers to like objects but not to actual instances of that object type. The type struct node is still an unknown size when these pointer members are declared. However, the compiler isn't interested in that type, just the size of a pointer to that type. And the size of a pointer to a structure is independent of the type of that structure.
typedef was used to make the type a bit more logical. The idea is that programmers should use the more logical name Node instead of struct node. (Even though structure and union tag names are in a different name space than typedef names, I chose to spell the two differently to avoid possible programmer confusion.) However, since the name Node is not defined until the end of that declaration, we must use a tag name for the structure so we can refer to that type in the first two member declarations.
If we did not need pointers such as this, the structure tag name could have been omitted. As we have seen, the tag name is necessary in this case. As such, we run the risk of a user actually using struct tag instead of the more logical Node. Unfortunately, this style defeats the purpose of an information hiding mechanism such as typedef or #define, which is to let the user deal with just the logical name without the underlying details. If they do this, the underlying details can be changed with no resulting ill effects.
Note that the string corresponding to each node is not actually stored in that node. Rather, a pointer to the start of the string is stored there. This allows strings to be of arbitrary length. However, we must then allocate space for each new string as it arrives. If we knew the maximum length of the strings, we could have made pstring an array of char instead. However, each node would then have contained that whole array even if the strings in the array were much smaller.
This version of the program has less options than in previous installments. For example, there is no provision for inserting a node since this version eliminates that need by storing the node in alphabetical order after adding it. That is, the user doesn't say where to insert the node since the program determines its location automatically based on the string's value. Since the list is already maintained in sorted order there is no need for the sort option either. Also, there is no need for the options that search for the first and last occurrences since each node's string is unique. The other missing option is one for changing a node's data. This could have been provided but requires some interesting decisions to be made. The possibilities will be discussed in a future column.

The Code
As in previous installments, the code is heavily commented. Therefore, I'll comment only on fragments of special interest. Note that although the program will be shown in separate parts, all were originally compiled in one source file.
In Listing 1, the pointers proot_node and ptail_node keep track of the beginning and end of the list. pfree_node points to the start of a list of free nodes. The counter nodes_in_use is incremented each time we allocate memory for a new node and decremented when a node is removed. It is only used with the count_nodes option. In fact, nodes_in_use could be eliminated altogether if you were willing to count the actual number of nodes each time count_nodes was requested instead.
In order to simplify user-command processing, commands are selected by number. If the user selects a number less than zero or greater than the maximum, the selection is invalid. To avoid the switch/case construct (used in previous installments) or a nested if/else construct, we now use a jump table. A jump table is a table of addresses each of which points to the corresponding function to be called to service that particular command.
The table, called actions, is an array of function pointers. Since the dimension was omitted, the compiler determines it from the initializer list, allowing us to easily add (or remove) entries from the table. Since each element of an array must have the same attributes, each function pointer must point to a function having the same argument list and return type. In this case, all processing functions have no arguments or return values.
The array actions is defined as follows:
static const void (*actions[])(void) = {
..
};
By giving it a storage duration of static, we ensure that the array is allocated and initialized before main begins execution. (Most compilers do this at compile time). Making it automatic would have been okay too. However, if the table were in a function called alot, it would be expensive because the program would allocate and initialize the table every time that function was called. Also, by using the const type qualifier, we prohibit our program from modifying the elements in the table.
The while loop appears infinite. However, if the exit option is selected, the exit-processing function calls the library routine exit() , which terminates the program.
The single statement used to actually invoke the selected action is as follows:
(*actions[code])( );
The code entered is used as an index into the jump table, producing a pointer to the corresponding function. We dereference that pointer, giving us an expression which designates that function. We then call that function using the function-call operator (). The grouping parentheses (in both the expression and the table definition) are necessary since without them we would appear to have an array of functions, which is not possible in C.
Listing 2 shows the header linklist.h. The exit option simply informs the user if any nodes still exist in the list and gives them the opportunity to exit or not. In Listing 3, the macro EXIT_SUCCESS (which was invented by ANSI C) is defined in stdlib.h.
The help function in Listing 4 is quite obvious. Note, however, that the option numbers displayed must match the order of initializers in the jump-table actions defined in main.
Listing 5 shows how to add a node. The process for adding a node is quite complicated since the node being added could be the first node in the list, a new first or last node, or a duplicate of an existing node.
The function locate_node, in Listing 6, is called by various functions to either see if the given string already exists in the list, and if so, at what address. If the string doesn't exist, the function finds the address of the node preceding where such a string would be inserted.
To get a new node we could simply call malloc each time. Then when a node is removed, we could free up the space it occupied using free. However, in Listing 7, I chose not to free up such memory, but instead to add a pointer to it at the front of a free-node list much like in the previous installment. Therefore, we only malloc a new node if none is left in the free node list.

Next Month
The other options will be the subject of next month's installment. Also, I will show a way to allow the user to enter commands as case-insensitive text, bypassing the numeric menu options used so far.