August 1991/Doctor C's Pointers®

Columns

Doctor C's Pointers®

Data Structures, Part 5

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at rex@aussie.com.
Last month in Part 4 I showed the first half of a program to implement a singly linked list in an array of structures. This installment presents the rest of that example. See Listing 1.

Some Commentary
Maintaining a linked list requires more work than maintaining a simple array. Not only must you allocate memory for the links between nodes, must also make sure these links are correct when nodes are inserted or deleted. However, such insertions and deletions typically only affect their immediate neighbor nodes and not the whole list.
One property you lose by going the linked list route is the ability to index into a given node directly. To get to the nth node, you must traverse the list starting from the root, counting nodes as you go. This makes the Change node, Display node, Insert node, and Remove node operations more expensive in terms of coding, program size, and execution time.
I have made the node pointer's type signed int. Standard C guarantees that an int can represent values at least in the range -32767 to 32767. Consequently, lists up to 32K entries are allowed, which should be adequate for most applications. Because the value stored in a link pointer is actually an array subscript, the 32K negative possibilities are never used. I have chosen a value of -1 to represent EOLIST, although I could have chosen any negative value. If, however, you wish to make these pointers an unsigned integer type, you can no longer use -1. Instead, you must chose something like the value all bits set, and make sure your array never actually contains this many elements.
You could maintain the nodes in ascending sorted order, thereby eliminating the need for the Sort node option. This would also eliminate the need for the Insert node option, and change the Add node to mean "add this new node in the correct place." Currently, duplicates are treated as different entries. Instead, you could insert them in the order they arrived or based on the value of some secondary key. Alternatively, you could add a duplicate count field to each node. Then, when a duplicate value arrived, you would increment the counter rather than insert a new node. However, this would only work for fixed data. If data values can change at runtime, each duplicate must be maintained in separate nodes because their values may change independently. (This is also true for nodes containing more than one data member.)
The array ary has automatic storage duration. It could just have easily been static. It could also have been allocated dynamically at runtime via malloc. As noted in Part 1 (CUJ, April 1991), with careful design the timing of allocation should be transparent. One interesting aspect is that if you allocate the array using malloc and you fill it up you could extend it using realloc. If necessary, realloc will move the old block of memory to a new location. Because all your node forward pointers are relative offsets from the array base you shouldn't care. (This would not be true if these were real C pointers.) If you need more dynamic memory than you can get , you may try writing the list to disk and recycling the memory it occupies. If you do this repeatedly, you finish up with a set of disk files, each containing a sorted subset of nodes. By merging these files you can get one sorted list, as a file.
A major advantage of more advanced linked lists is that the nodes can be linked in more than one list. You can have several lists threading their way through the same set of nodes, but in different orders. Only one copy of the node data need exist. If you update a node's data by locating that node via one list, the change is immediately reflected when that node is located via any other lists that include it. In fact, the number of links leading to and from a node in a list depends entirely on your requirements. Implementation is simply a matter of adding more links to each node's structure definition and making them point to the right place.
The functions that maintain the free node list could easily be made more general. Instead of dealing with fixed names for the array of nodes and the various pointers, all this information could be passed in via arguments. As such, these functions can maintain any list they are asked to. Similarly, the options to manipulate the used node list can also be made general.

Still To Come
You could revise this example in a number of other interesting ways. For example, instead of using a fixed size array of structures to represent the list you could allocate each node at runtime as it was needed. Then when a node is removed instead of freeing it, you simply add it to the free node list. Also, depending on the amount and type of data stored in each node, you might wish to store the actual data elsewhere and only store a pointer to it in the node directly.
We will look at these and other issues such as doubly linked lists, circular lists, and lattice constructs in future installments.