Columns


Doctor C's Pointers®

Data Structures, Part 14: Trees

Rex Jaeschke


Rex Jaeschke is an independent computer consultant, seminar leader, and author of several books, including The Dictionary of Standard C (Professional Press, Horsham, PA). Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at rex@aussie.com.

In previous installments, this column has looked at arrays, linked lists, stacks, and queues. All of these data structures are linear. This month I begin a tour of a non-linear structure, the tree. The concept of a tree, with its root, branches, and leaves (or terminal nodes), is common in representing data organized as a hierarchy. By convention, however, you tend to view the tree as being inverted: the root is located at the top and the branches spread out in a downward direction. The whole set of nodes and branches below a given left or right child branch is known as a subtree.

While a tree can have many branches and each branch can, itself, have many other branches, I will initially concentrate on a special kind of tree, the binary tree. A (non-empty) binary tree consists of a root node. This node may have zero, one, or two branches. A node is often referred to as a parent. As such, the two branches from a node lead to that node's left and right child nodes. Since each node may have up to two children and each child is a node, a binary tree can be arbitrarily deep.

Representing Algebraic Expressions

One of the most common instances of trees in computing is algebraic expressions. (Refer to installment #9 in the February 1992, issue for an introduction to this subject.) Consider the expression

((A + B) * (C - (D / E)))
Since every operator is binary (that is, it takes two operands), you can easily represent this expression as a binary tree, as in
Figure 1.

This particular tree is reasonably well balanced. That is, the depth of each branch (as indicated by (1), (2), etc.) from the root is about the same. For the most part it is useful to keep tree depths shallow and the tree balanced.

Throughout this series I have shown that data structures can be represented in memory either as elements in an array or in some form of dynamically allocated list. Trees are no exception. Listing 1 implements the tree in Figure 1 as a statically allocated array of structures.

In this simple tree, each node stores only one character of data, though it could easily store more within the structure, or else contain pointers to larger data structures. What the nodes contain is of no consequence here.

The left and right child pointers are actually indices into the array that lead to the corresponding child, if any. If no child is present, the node's left_child and right_child contain the value zero. Because of this tactic, this scheme cannot use element zero of the array. I could have used some other sentinel value, like -1, but I would have been forced to use a signed rather than unsigned index variable, thereby losing half the possible range of index values.

From a C programmer's point of view, it is more natural to write the left and right child branches as pointers. A NULL can then indicate "no child" and avoid wasting element zero of the array. In fact, using pointers completely frees you from using an array. The program in Listing 2, which uses pointers, statically creates and initializes the tree, and proceeds to traverse it in three possible ways: prefix, infix, and postfix orders. Depending on the data you are using the tree to represent, you may wish to maintain the nodes in one of these three orders.

Note that the codes uses a stack as well as a tree. The stack is a useful tool in traversing trees.