Using Multiway Trees for Fast Access


Figure 2 shows the layout of a multiway tree. The nodes of a multiway tree contain more than one key, and the keys in a node are always in sorted order. Both parent and child nodes have these properties.

Figure 2 illustrates the following pattern:

All the keys in Child0 are less than Key0
All the keys in Child1 are greater than Key0 but less than Key1
All the keys in Child2 are greater than Key1 but less than Key2
... All the keys in Childn-1 are greater than Keyn-2 but less than Keyn-1
All the keys in Childn are greater than Keyn-1.

When the nodes of a mutliway tree are not full, considerable storage is wasted. Despite this possible waste of storage, multiway trees are frequently used since the monetary cost of storing data on an external direct-access device like a disk is very inexpensive.

Search efficiency can be increased by increasing the multiway tree order, the maximum number of children at each node. Given that the tree must hold a certain number of keys, increasing the order of the tree decreases its depth. The shorter the tree, the less nodes that must be accessed during a search. As noted before, accessing a node in external storage takes a lot of time, so a tree with a higher order leads to more efficent searching.

Listing 8 shows a partial definition of the DskMBtree class.

I define the following constants for use with the multiway tree:

enum { MAXKEYS=101, MID=MAXKEYS/2 };

MAXKEYS specifies the multiway tree order as 101. Class BTNode (Listing 8) is a class that holds a multiway tree node via its data member, BTnodeStruct, which is repeated here:

typedef struct BTnodeStruct {
    short keycount;
    short numtrees;
    unsigned long key[MAXKEYS];
    streampos data[MAXKEYS];
    streampos child[MAXKEYS+1];
};

The key field holds the OIDs; the data field holds the positions of the objects in the object file. The child field contains pointers to child nodes. Since the multiway tree is maintained in external storage, the block of storage that makes up a node must be read into memory before any of its fields (keycount, numtrees, child) can be accessed.

The DskMBTree member function open_root opens a multiway tree file. get_root reads a root node from a multiway tree file; if no root node exists, it calls makeroot to contruct the root node.

DskMBTree contains two find functions.

find(int& pos, slist<PATHNODE>& path,
   const unsigned long key)

This function finds a key, starting the search from the root node. Upon return, pos will contain an index for use with the key, data, and child arrays contained in the node.

find(int& pos, BTnode& node, slist<PATHNODE>& path, 
   const unsigned long key)

This function finds a key by starting the search from a specified node, which may or may not be a root node.

When searching a node, these two functions put all nodes encountered into a single-linked list of type slist<PATHNODE>, where a PATHNODE is defined by the following typedefs:

typedef pair<streampos, int> NODEINDEX;
typedef pair<BTnode, NODEINDEX> PATHNODE;

Storing nodes in the linked list makes it easy to find a node's parent, simply by examining the previous node in the path. A second, and expensive external search is not required.

Some other functions of class DskMBTree include:

child(streampos, BTnode&);

This function reads a node from external storage.

makenode(const BTnode&);

This function writes a node to the disk file.

The multiway tree is built from the bottom up. When the bottom node is full it is split into two nodes, and its middle key is promoted up to its parent, using a function named split.

The function deleten is used to delete a node from the multiway tree; it is a relatively big function because deleten not only deletes a node but also tries to keep the tree as balanced as possible.

Functions split and deleten are not shown here, but they are available with the full source code listings at www.cuj.com/code.

The code to manage the multiway tree also defines a class BTnodeAvl, which is similar in function to the objAvl class seen before. BTnodeAvl is a class that performs disk storage management. However, its implementation is much simpler than objAvl's, because nodes of a multiway tree are of fixed size. BTnodeAvl acts as a stack. Newly deleted nodes are pushed on top of the space available list, and empty nodes are popped off the top when additional space is needed.

Although this implementation uses a separate file to store the information contained in the BTnodeAvl object, it would be relatively easy to embed this information in the multiway tree file, similar to the way the objAvl object is embedded in the object file.