Features


A Class for Scanning Directories

Michal Niklas

Expanding wildcard filenames is something you want to do uniformly across many utilities and different operating systems.


Like many other programmers, I write many simple file utilities. My first utilities operated on the standard input, or on a list of files named on the command line. As I learned more about Unix, I found that the shell can do some work for me. If I write *.c, for example, the shell expands it into a sorted list of the names of all the C source files in the current directory. Of course I wanted much the same capability for MS-DOS tools, so I read about the MS C library function FindFirst and family. With a little work, I got to where both Unix and MS-DOS versions of a utility accepted similar filename shorthand on the command line.

Later I found that it would also be useful to scan subdirectories, so that one pattern can describe sets of files in multiple subdirectories. So I wrote functions that call FindFirst recursively to do this job too.

Along the way, I noticed that I would often copy all the source code of a utility, just to make a few small changes in it for different environments. But C++ taught me to write code that minimizes the need to make such changes, and C++ compilers have become smarter and smarter over time. So I decided to create a C++ class to serve as the backbone for all my text utilities and eliminate the need for multiple versions. I present here a small but quite useful class, called CFileList, that can automate common directory scanning and file operations.

The main idea is to assist tools that are often called with sets of files that have similar names. For example, in MS-DOS you want to be able to write:

myrm -r *.o *.bak

and have myrm operate on all the files whose name ends in .o, followed by all the files whose name ends in .bak. The Unix shell does this for you, as I mentioned above. But if you want to get these "wildcard" names past the shell so that CFileList does the job, you have to write under Unix:

myrm -r "*.o" "*.bak"

In either case, as the command name suggests, this program should remove all files with the .o and .bak extensions from the current directory and from all its subdirectories.

Listing 1 shows the header cdir.h, which declares class CFileList. Listing 2 shows the source file cdir.cpp, which defines many of the member functions for the class. The class also uses functions from the package Simple FileList Regular Expression, based on functions described by Brian Kernighan and Rob Pike in an earlier article (Dr. Dobb's Journal, April 1999). These functions are included in the source file sflre.cpp, shown in Listing 3.
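To give a feel for what such a matcher does, here is a minimal sketch of a recursive wildcard match in the spirit of the Kernighan and Pike approach. It is not the code from sflre.cpp: the name WildMatch, its signature, and the choice to support only * and ? are assumptions made for illustration.

```cpp
#include <cctype>

// Hypothetical helper, not the code from sflre.cpp. Returns true when
// `name` matches `pattern`, where '*' matches any run of characters
// (including none) and '?' matches exactly one character.
bool WildMatch(const char* pattern, const char* name, bool ignoreCase = false)
{
    for (; *pattern != '\0'; ++pattern, ++name) {
        if (*pattern == '*') {
            // Collapse consecutive stars, then try every possible tail.
            while (*pattern == '*') ++pattern;
            if (*pattern == '\0') return true;    // trailing '*' matches the rest
            for (; *name != '\0'; ++name)
                if (WildMatch(pattern, name, ignoreCase)) return true;
            return false;
        }
        if (*name == '\0') return false;          // pattern left, name exhausted
        if (*pattern != '?') {
            unsigned char p = static_cast<unsigned char>(*pattern);
            unsigned char n = static_cast<unsigned char>(*name);
            if (ignoreCase) {
                p = static_cast<unsigned char>(std::tolower(p));
                n = static_cast<unsigned char>(std::tolower(n));
            }
            if (p != n) return false;
        }
    }
    return *name == '\0';                         // both must end together
}
```

With this sketch, WildMatch("*.c", "main.c") succeeds and WildMatch("*.c", "main.cpp") fails, because the whole name must be consumed by the pattern.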

In my CFileList class, the member function ProcessFiles is where most of the work is done. Here I use the Posix readdir family of functions to scan directories. I also use the Posix stat function to check whether a name corresponds to a normal file or a directory. I use STL vectors to save the file and directory names. After a whole directory is scanned, I call the user-supplied function for each file. If the user has specified recursive mode, I do the same for all subdirectories.
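The scan just described can be sketched roughly as follows. This is a free function written for illustration under stated assumptions, not the ProcessFiles member from Listing 2: the names ScanDir and FileFunc, the callback signature, and the use of "/" as the path separator are all hypothetical.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <string>
#include <vector>

// Hypothetical callback type: called once per regular file with its path.
typedef void (*FileFunc)(const std::string& path, void* userData);

// Scan `dir`, call `fn` for each regular file, then optionally recurse
// into the subdirectories. Returns false if `dir` could not be opened.
bool ScanDir(const std::string& dir, bool recursive, FileFunc fn, void* userData)
{
    DIR* d = opendir(dir.c_str());
    if (d == NULL) return false;

    // Collect names first, so the directory is closed before any callback runs.
    std::vector<std::string> files, subdirs;
    struct dirent* e;
    while ((e = readdir(d)) != NULL) {
        std::string name = e->d_name;
        if (name == "." || name == "..") continue;
        std::string path = dir + "/" + name;
        struct stat st;
        if (stat(path.c_str(), &st) != 0) continue;   // skip unreadable entries
        if (S_ISDIR(st.st_mode)) subdirs.push_back(path);
        else if (S_ISREG(st.st_mode)) files.push_back(path);
    }
    closedir(d);

    for (std::vector<std::string>::size_type i = 0; i < files.size(); ++i)
        fn(files[i], userData);
    if (recursive)
        for (std::vector<std::string>::size_type i = 0; i < subdirs.size(); ++i)
            ScanDir(subdirs[i], true, fn, userData);
    return true;
}
```

Note that stat follows symbolic links, so a link that points back up the tree could make a recursive scan loop; a more defensive version might use lstat instead. Filtering the collected names against a wildcard pattern is left out to keep the sketch short.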

The CFileList class is designed to serve as a base class, but it can be useful as it stands. In the constructor, you can specify a wildcard pattern that describes the names of the files you are interested in. The member function ParseArgs can then parse command-line arguments. It recognizes the flag -r for a recursive directory scan and -i to ignore case when matching names. Case-sensitive matching may be useful on Unix-style systems, where filenames are case sensitive, but you probably don't want it under MS-DOS, where they aren't. To do extra work while parsing command-line arguments, just create a new class derived from CFileList and write your own version of ParseArgs.
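As an illustration of the kind of parsing involved, the sketch below separates the -r and -i flags from the filename patterns. It does not reproduce the real ParseArgs, whose signature appears only in Listing 1; the struct ScanOptions and the function ParseScanArgs are hypothetical names invented for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the state that a ParseArgs-style function fills in.
struct ScanOptions {
    bool recursive;                       // set by -r
    bool ignoreCase;                      // set by -i
    std::vector<std::string> patterns;    // every non-flag argument
    ScanOptions() : recursive(false), ignoreCase(false) {}
};

// Treat "-r" and "-i" as flags and every other argument as a pattern.
ScanOptions ParseScanArgs(int argc, const char* const argv[])
{
    ScanOptions opts;
    for (int i = 1; i < argc; ++i) {      // argv[0] is the program name
        std::string arg = argv[i];
        if (arg == "-r") opts.recursive = true;
        else if (arg == "-i") opts.ignoreCase = true;
        else opts.patterns.push_back(arg);
    }
    return opts;
}
```

For the command line myrm -r "*.o" "*.bak", this sketch would set recursive and collect the two patterns in order. A derived class that needs extra flags could handle its own options first and pass the rest along in the same way.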

To create a program that uses CFileList, just create a function that takes as an argument a filename (with full path information) and, if necessary, any additional data you might need in a derived class. The additional data can also be useful if you want to save some information between parsing calls. For example, your utility may choose to query a user before removing a file, as in (broken to fit column):

remove cdir.o ? (y-yes, n-no, a-all,
   c-cancel):

If the user types a, you can set a special flag in a structure passed to your file function. The next time, your function checks this flag and no query is printed.

To show how simple it is to use this class, I wrote my own version of remove. This program recognizes the recursive, silent, and querying options. Listing 4 shows the source file myrm.cpp, which implements this utility. Here, I derive a new class from CFileList. Its member function ProcessFiles lets you pass a pointer to access the ask and silence flags.

I have tested this class on Linux using egcs 1.1, and under Win95 using Borland C++ 5.02.

Michal Niklas has a Master of Computer Science from the Technical University of Szczecin, Poland. For three years, he has worked at HEUTHES (http://www.heuthes.pl), developers of financial and banking applications. His interests are Internet applications, cryptography, telephony apps, smart cards, and various low-level class libraries. You can contact him at michal.niklas@heuthes.pl.