July 1998/A Binary Search Tree with Iterators

Features

A Binary Search Tree with Iterators

Mike Rumore

If you need to access the elements in a tree structure in more than one order, you need a tree that defines more than one kind of iterator.

Introduction

Binary search trees are among some of the most important data structures used in computing. They form the basis for many data structures requiring a hierarchical ordered structure. This article describes an implementation of binary search trees I wrote about two years ago. The algorithms for creating and maintaining binary trees are described in many data structure books [1, 2, 3]. In addition, these sources describe algorithms for visiting and processing the data stored in a binary tree.

Some of the most important tree traversal algorithms are preorder (also known as depth-first), postorder, inorder, and level-order (also known as breadth-first). This article will primarily describe the implementation of these traversal algorithms using iterators [4, 5].

Binary Tree Data Model

A binary search tree is a recursive data structure, each node of which contains a data item and two pointers. One pointer, called the left pointer, points to another node containing data that is "less than" the data in the current node. Similarly, the second pointer is called the right pointer, and it points to a node that is "greater than" the current node. When the nodes contain non-numeric data, the criterion for what is less-than or greater-than is typically determined by a programmer-provided function.

These left and right nodes, in turn, have pointers pointing to nodes that satisfy the same conditions as before: left-node data is less than current-node data, and right-node data is greater than the current-node data. This structure allows for efficient key-value searches.

Like other well-known data structures, the binary search type includes a small set of standard operations which help to define the data type. These operations include inserting a new data element, deleting an element, retrieving the data associated with a given key, checking whether the tree is empty, and deleting all data in the tree. These operations also include the various traversal methods listed above. They are defined recursively using the pseudo-code in Listing 1.

Since these algorithms use recursion, they must use a stack to keep track of nodes visited. On the other hand, level-order traversal requires a queue, not a stack. In a level-order traversal the order that nodes are visited depends on their distance from the root node. That is, the root is visited first, then both left and right nodes of the root. Then the left and right nodes belonging to the root's left and right nodes are visited, and so on. All nodes at the same level are visited before visiting nodes on the next level.

Implementation Approach

The implementation presented here uses a different approach than that used in the Standard Template Library (STL). I use inheritance and virtual functions to define my binary search tree interface, and templates to allow the storing of different data types. STL, by constrast, standardizes the interface to the data structures. It makes very little, if any, use of virtual functions and inheritance (see reference [5]).

The STL defines const and non-const iterators. Const iterators do not allow the data in the underlying data structure to be modified via the iterator; they return data values by value (copies of the data). Non-const iterators return data by reference and thus allow modification directly to the data. This implementation supports const iterators since data is returned by value. Also, the STL allows deletes on the underlying data structure. Deleting items from a container can leave an iterator in an invalid state ([6, p. 133]). In the sidebar I discuss different approaches to solving this problem.

The major classes in this implementation are as follows:

AbstractTree — defines the interface to the binary tree class. The binary tree class is derived from this class (Listing 2).

AbstractIterator — defines the interface for all iterator classes. All iterators are derived from this class (Listing 3).

BinaryTree — contains all the algorithms for inserting, deleting, and retrieving data from a binary tree. This class (Listing 4) is derived from AbstractTree.

BinaryTreeNode — defines the actual structure used to construct the binary tree. It declares BinaryTree and all iterator classes as friend classes. Its internals are all declared protected (see Listing 4).

BinaryTreeIterator_PreOrder — derived from AbstractIterator, it defines the functions operator++(int), reset(), etc. It is initialized from another preorder iterator or from a binary tree (see Listing 4). This iterator uses a stack to track the nodes that still need visiting.

BinaryTreeIterator_PostOrder and BinaryTreeIterator_InOrder — both are derived from class BinaryTreeIterator_PreOrder. Each class overrides the operator++(int) and reset() functions. Each inherits done(), operator()(), and isEmpty() from the base class (see Listing 4).

BinaryTreeIterator_LevelOrder — derived from AbstractIterator, it defines the functions operator++(int), reset(), etc. It is initialized from another level-order iterator or from a binary tree (see Listing 4). This iterator uses a queue to track which nodes still need visiting.

The iterators require two other classes for their implementation: Queue_List, a queue, and Stack_List, a stack. Both are implemented using a doubly-linked list.

The code for the creating and maintaining a binary tree is available from the CUJ ftp site (see p. 3 for downloading instructions). See reference [1] for all the details on inserting and deleting data from a binary tree.

Iterator Implementation

Iterators are basically handler classes for collection classes. They provide a method for accessing the contents of a collection without exposing the collection's internal implementation. Also, iterators allow different access methods to be defined for a collection class. More than one iterator can be created at any one time, which allows for multiple and simultaneous traversals of a collection.

There are two basic questions to be answered when designing an iterator. The first question is whether elements in the collection can be added or deleted while an iterator exists. The second question is whether the data in the collection can be modified directly via the iterator. A simple way to restate the second question is: is an object accessed via an iterator returned by value (no direct modification) or by reference (direct modification possible)?

The iterators described in this article are read-only; they return the data by value. As shown in the example, it is possible to reset an iterator and start from the beginning again if the binary tree was modified (via inserts or deletes). The full source code contains sections (not shown here) that compile if the conditional #define MUTABLE_ITERATOR is defined. If this conditional is defined, the iterators track whether or not the underlying binary tree was modified. It is then possible to call the conditionally defined function resetByValue to reposition the iterator at the node it referenced prior to modification.

Listing 5 presents the code for the preorder iterator. There are two constructors and a destructor. The first constructor is the copy constructor. The second one uses a BinaryTree & as an initializer. The default constructor and the assignment operator are declared private and are not provided. Reset reinitializes the iterator to the beginning. isDone returns true if the iterator has completely traversed the tree, or false if not. operator++(int) advances the iterator to the next node in the binary tree, and operator() returns the node's value.

Listing 6 presents the code for the inorder iterator. New constructors and a destructor are provided. The inorder iterator overrides the reset function and operator++(int) from the base preorder class. The postorder iterator (not listed) is implemented similarly.

Finally, Listing 7 presents the code for the level-order iterator. This iterator is not based on the preorder iterator since it requires a queue, not a stack. It provides the same functionality (reset, operator++(int), etc.) as the other iterators.

Iterator Performance

STL iterators are required to allow access to a data item in O(constant) time (amortized), with N accesses for N data items [6]. The iterators described here provide the same performance. For example, access via the preorder iterator (Listing 5) involves one call to the push, pop, and top stack operations once for each node in the binary tree. Since the push, pop, and top operations are O(constant) in time, access time is also O(constant).
The inorder iterator has O(constant) performance, if the the total amount of push, pop, and top are amortized over all data accesses. The for-loop in the preorder operator++(int) can execute more than once on a given call, but in the long run, each node is still pushed, popped, and topped only once. The net result is an average of O(constant) for each access (amortized). The level-order traversal iterator also has O(constant) performance. The postorder iterator is the least efficient, but it still maintains an amortized O(constant) performance.

An Example

Listing 8 shows an example of a binary tree used as a dictionary. The code inserts a few strings into the dictionary, and then uses an inorder iterator to print the strings alphabetically. Then, several strings are deleted, some new ones are inserted, and the inorder iterator is reset to start a new traversal of the binary tree. Again the strings are listed in alphabetical order.

The example code calls the increment operator++(int) every time it advances the iterator to the next element in the binary tree. It calls IsDone to determine whether the iterator has finished traversing the binary tree. At each node, the program calls operator() to get the current value through the iterator. Note the call to reset in the head of the second for-loop. This call reinitializes the iterator since nodes have been deleted and inserted, rendering it invalid.

Changing the Implementation

It's relatively easy to change the current implementation to support non-const iterators. The recommended changes are shown below. Supporting inserts and deletes is also possible but more difficult. If you're interested, see the sidebar.

To support non-const iterators:

1) Referring to Listing 3, change the return type of operator() from DataType (return by value) to DataType & (return by reference).

2) Referring to Listing 4, note that all iterators contain a constructor of the form BinaryTreeIterator_XXOrder(const BinaryTree<DataType> &), where XX is Pre, In, Post, or Level. Remove the const modifier from the formal parameter. Next, change all return types of operator() from DataType to DataType &. Last, in the constructors BinaryTreeIterator_PreOrder and BinaryTreeIterator_LevelOrder, change the data pointer from const BinaryTree<DataType> * to just BinaryTree<DataType> *.

Summary

In this article I've presented an implementation of a binary search tree that uses templates to support node data of any type. I've implemented iterators that perform standard tree traversal operations (namely preorder, postorder, in-order, and level-order). These iterators are implemented as a class hierarchy that uses inheritance and virtual functions. In this respect my implementation differs considerably from STL's way of doing things. However, like STL iterators, my iterators provide access in O(constant) time (amortized).

Since I wrote this binary tree class, I have added a height-balanced binary tree class to the library for those cases where a large and degenerate tree would be very expensive. I have found that this class is very useful as a dictionary when parsing large log files.

References

[1] Alfred V. Aho, John E. Hopcroft, and Jeffery Ullman. Data Structures and Algorithms (Addison-Wesley, 1983).

[2] Timothy A. Budd. Classic Data Structures in C++ (Addison-Wesley, 1994).

[3] Mark Allen Weiss. Data Structures and Algorithm Analysis (Benjamin/
Cummings, 1994).

[4] Robert B. Murray. C++ Strategies and Tactics (Addison-Wesley, 1993).

[5] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995).

[6] David R. Musser and Atul Saini. STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library (Addison-Wesley, 1996).

Mike Rumore has a Ph.D. in Nuclear Physics from the University of Colorado, Boulder. He has been working for Lucent Technologies (formerly AT&T) Bell Labs for 11 years as a software engineer. He has been using C++ for over six years for tool generation. His current interests are data structures and numerical algorithms (linear inversion methods applied to radiation spectra).