July 1991/Doctor C's Pointers®

Columns

Doctor C's Pointers®

Data Structures, Part 4

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at rex@aussie.com.
In the three installments thus far, I have used arrays to implement my lists. Although this works, there are trade-offs. For example, an array is an efficient way to store a list if the nodes need not be in any particular order or if their logical order exactly corresponds to their physical ordering. When inserting or removing a node, to maintain the logical/physical order relationship I had to shuffle all subsequent nodes up or down. This process can be expensive, particularly for long lists or ordered lists having frequent node insertions and deletions.
I need a list construct that allows insertion and deletion of nodes with minimal impact on existing or remaining nodes. This requires the physical order of nodes in a sorted list to possibly be different from their logical order. With an array you intuitively know the nodes are stored contiguously, so it is easy to go to the next subscript. As such, there is no overhead required to logically join the nodes together — you just use the properties of an array. However, if logically adjacent nodes are not going to be physically adjacent, you need some way of linking consecutive nodes. Therefore, I will use a construct that is a linked list.

Linked Lists
A linked list is a set of links (or nodes) that are somehow linked together to form a list. The nodes may be linked in the order in which they were created, in some sorted order, or even in no particular order at all.
In addition to having space to store the data corresponding to that node, a node must also contain information about how to get to the next node in the list. Each node contains some overhead to help manage and traverse the list. (In fact, in this month's example, the overhead takes up the same amount of memory as the actual data stored, so each node is twice as big as it was in Part 1, CUJ, April 1991.) A list in which each node contains only one link to the next node is called a singly-linked list. Typically, the nodes will be linked in the forward direction but they could be linked in the backward direction instead. If each node had room for two links it could simultaneously point to both its predecessor and its successor. This is called a doubly-linked list.
You can define a node to have both data and link information. The obvious data type to use is a structure. In the example, it is declared as

struct node { int value; /* the node's data value */ int fwd_ptr;/* ptr to next node in list */ };
Each structure of this type has room for an integer value and a pointer to the successor node. (I use the term "pointer" here in a generic sense. The pointer does not actually have a C pointer type.) Each node's forward pointer points to the next node in the list, with the last node's forward pointer having the special value EOLIST to indicate end-of-list. The whole list of nodes is represented in an array of structures of this type, as in

struct node ary[MAX_NODES];
Now I am combining arrays and linked lists. I have a linked list of nodes stored (possibly) in logical order, but that list is actually represented as elements in an array.

The Example
In Part 1, I used a lengthy example to implement lists using arrays. I have recycled that same example and have made numerous changes to accommodate the linked list approach. (Due to a high volume of mail suggesting how I could eliminate the use of goto from the End case, I have changed that approach too.) From an overall point of view, the program remains largely intact. Only the details of the list management have been changed.
Along the way, I have been developing a new commenting style, one which I'm sure will generate even more reader mail. (Remember, nothing is ever a complete waste — it can always serve as a bad example.) I'll bravely debut it publicly here for your comment.
With regard to program readability, one of the most important thing a programmer can do is to use white space effectively. By liberally sprinkling spaces, tabs, and blank lines throughout, you can give the reader a fighting chance at seeing the overall shape of the logic and the individual source tokens used. The next problem then is program understandability — the ability of the reader to understand what the source tokens mean once he can recognize them. The key to program understandability is using meaningful (but not unwieldy) identifiers.
When writing nontrivial programs in any language, I always use comments. However, I find that comments can be distracting, particularly when placed at the end of many consecutive source lines. Some statements are quite long, and there's no room for a comment. Some programmers spend as much time worrying about their comments lining up as they do writing the code in the first place. Some comments document the very obvious. Others don't get updated when the code to which they apply gets revised.
To avoid many of these issues, I have placed very small comments at the start of source lines. Because I have rigorously adopted a tab indenting scheme, the first eight characters are otherwise wasted, so why not use them. At least then I don't have to worry about comments not lining up. Of course, I'm not really talking about detailed comments here. In fact, my comments are simply labels that refer to detail in a comment block preceding the fragment of code containing the label(s). It's like having pseudo code in the source.
Now, back to the problem. As in Part 1, the nodes will be stored in a fixed size array. However, in this version it will be an array of structures, not ints. Initially, there is no data so the list is empty. As a new node is needed, a spare entry in the array is allocated for it, and the new entry is inserted in the appropriate place in the list of used nodes. At any time, the number of nodes in use plus the number of nodes free is constant and equals the number of nodes in the underlying array.
Just how does the program know which nodes are free? Well, not only does it maintain a singly-linked list of used nodes, it has a singly-linked list of free nodes as well. Initially, all nodes are part of the free node list. To maintain these lists, there are root and tail pointers (root_node and tail_node, respectively) to the start and end of the used node list, and a root pointer (free_node) to the start of the free node list. In all cases, pointer values of EOLIST indicate end-of-list. (The only reason I have a tail pointer for the used node list is to assist adding a new node to the end of that list.)
Listing 1 shows the code. Note the comments I've added concerning fragments of special interest.

Next Month
Five more options remain. They are: Change node, Search for first occurrence of a node with a given value, Sort nodes, Insert a node before a given node, and Remove node. The source and explanation for these will be the subject of next month's installment, so insert that issue in your reading list now!