Features


A Recursive Visit Template

Marc Briand

Navigation is always a difficult art. Distilling out the common operations in a reusable template can help organize the job.


A template function is a nice mechanism for encapsulating an algorithm. Ideally, a programmer can implement an algorithm in terms of a template function without knowing the types the algorithm will operate on. The user of the algorithm can then just fill in the types and go. Things don't always work out so neat and clean, of course. A lot depends on the nature of the algorithm.

One algorithm that can be used with this approach is what I call a recursive visit algorithm. This is an algorithm for visiting the elements of a recursive composite structure — a structure that may contain both simple elements and (sub)elements of its own kind. A file-system directory is a good example. It may contain both files (simple elements) and more directories.

Using a template function, it is possible to encapsulate the behavior of walking through such a structure, without knowing what type of structure it will be. What's more, the template function can be written to apply some function to each element as it is visited — again, without knowing at the time the template function is written what operation is to be performed.

In this article I present an implementation of such a template function, and show how it used to apply operations to recursive composite structures. It is a fairly simple template function. As a plus, it is not in itself recursive. Recursive functions can often be replaced by iterative algorithms, with the help of a good stack container or two. That is what I've done here [1].

Template Function rVisit

The definition of template function rVisit appears in Figure 1. Template parameter CompositeType is the type that will be used to represent the composite structure. ProcessType is a type that encapsulates the operations to be applied to the elements of CompositeType. ProcessType has a couple of responsibilities. First, it must supply a pair of functions, processComposite and processSimple, to be applied to each composite and simple element of CompositeType, respectively. Second, ProcessType must supply functions to navigate within the composite structure. Specifically, it must supply the following:

This implies that ProcessType must be intimately familiar with CompositeType. It might seem that CompositeType should be responsible for supplying the functions for navigating itself. I made it the responsibility of ProcessType to allow plugging in existing implementations of the CompositeType class without modification. ProcessType thus acts as a sort of adaptor between the CompositeType and rVisit. I show how to create a ProcessType class later. For now, I focus on the template function.

Function Operation

rVisit makes use of two stacks, implemented in terms of STL deques. cdeq is a stack of CompositeHandle; ndeq is a stack of int. Each element of ndeq represents how many unexpanded CompositeHandles are on the cdeq stack at a particular level. In this implementation the code pushes elements to the stack by calling the deque's push_back member function, and it pops the stack by calling pop_back. The top element of each stack is represented by the deque function back.

The recursive visit function operates as a simple state machine with two states, searching and not searching. In the searching state, the function calls the ProcessType function getHandles, which loads up the vectors tempcvec and svec with handles to the composite and simple elements at the current level. The size of tempcvec (the number of composite handles found) is pushed to ndeq and the elements of tempcvec are pushed to the cdeq stack in reverse order. Thus, the top portion of cdeq represents the composite handles left to be expanded at the current level. The top element of ndeq represents the number of composite handles contained in the top portion of cdeq. After pushing to the stacks, rVisit calls the functions supplied by ProcessType to be applied to the composite and simple elements.

In the not-searching state, the function checks how many composite handles are left on cdeq at the current level. If there are any unexpanded composite handles at the current level (ndeq.back() > 0) the function tries to move down a level. It attempts to move to the composite element indicated on the top of the cdeq stack, by calling ProcessType::moveDownTo. If all the composite handles at the current level have been expanded (ndeq.back() == 0), the function attempts to move back up a level, by calling ProcessType::moveUp. However, if it's time to move up and the depth is already zero, there's nowhere to go and the algorithm has completed its task.

A Usage Example

Figure 2 shows a Directory class. Directory has a single data member, curr_path, which indicates a directory on the file system. Directory's sole purpose in life is to return lists of the subdirectory and filenames found in curr_path. Directory doesn't claim exclusive access to a file system (so it doesn't need to be a singleton). It just tells you what's in the directory indicated by curr_path. You can read and change curr_path through the functions getCurrentPath and setCurrentPath. The functions getSubdirNames and getFileNames retrieve the directory and filenames found at curr_path. The function pathIsValid checks whether curr_path is a valid path on the file system. These last three functions contain code specific to the compiler and operating system. Their implementations are not shown here, but are available on the CUJ ftp site (see p. 3 for downloading instructions). I implemented these functions to work with C++Builder 3.0 and Windows 95.

Directory is pretty simple, and you can't do a lot with it. But by supplying a ProcessType class, you can use the rVisit template function to visit every file and directory under Directory::curr_path. You can apply the function of your choice to each directory encountered, and another function of your choice to each file.

Figure 3 shows the class DirAdapter. It will be used as the base class for a ProcessType class. It contains a single data member, separator, which holds the separator character to be used in pathnames on the file system. DirAdapter declares the two public typedefs, CompositeHandle and SimpleHandle, which are required by the rVisit function. In this case the handles are just strings. DirAdapter supplies the three navigation functions mentioned earlier. The implementations are also shown in Figure 3.

To apply specific operations to the elements of a directory, derive a class from DirAdapter and add member functions processComposite and processSimple. Figure 4 shows two classes defined to perform two different operations. NamePrinter, when used as the ProcessType class with the rVisit template function, will print all the directories and files found under Directory::curr_path to an output text file. The usage is as follows:

int main()
{
   NamePrinter np("outfile.txt");
   Directory dir("c:\\some_dir");
     
   rVisit(dir, np);
}

rVisit deduces the template parameters Directory and NamePrinter from its arguments.

If you use FileCounter as the ProcessType template parameter, the template function will count all the files under Directory::curr_path:

int main()
{
   FileCounter fc;
   Directory dir("c:\\some_dir");
     
   rVisit(dir, fc);
   cout << "Number of files is "
        << fc.getCount() << endl;
}

Conclusion

Template function rVisit encapsulates an algorithm for walking a recursive data structure. It separates the visitation algorithm — which will always be the same, regardless of element type — from type-specific operations, such as movement from element to element. By supplying an adaptor class, you can use this template function to apply operations to every element of a recursive data structure, without having to write any recursive code.

This technique could easily be adapted to encapsulate other visitation patterns. It may also be possible to use polymorphism in the adapter class for more versatility. With a little luck and ingenuity, we need never dread working with tree-like data structures again.

Note

[1] I have nothing against recursive function calls except that they have a tendency to blow the program stack when the nesting levels get too deep. That has happened to me often enough that I just don't care to mess with recursion. Besides, I think the iterative code is easier to understand.

Marc Briand is Editor-in-Chief of C/C++ Users Journal. He loves programming, writing, and too many other things for his own good. However, he hates to work, which is why he is an editor. He may be reached at mbriand@mfi.com.