Toby Popenfoose is a systems analyst developing image processing applications for a large commercial printer. Prior to that, he was an Air Force instructor pilot. Toby has a B.S.E.E. from Purdue University and an M.S.C.S. from Midwestern State University. You may contact him at 5720 East 200 North, Warsaw, IN 46580.
I have often been asked, "How many C source modules do we have in our Image Processing system?" or "Is there an easy way to delete all of the *.SAV files in all 40 subdirectories?" These two questions motivated me to develop a general-purpose wildcard subdirectory search utility function.
I had several design goals. First, I wanted to make the function general enough to put in my utility library. Second, I wanted to keep wildcard search capability. Third, I wanted a switch to control whether subdirectories are searched. Fourth, I wanted to minimize machine dependencies so I could port the function to my AMIGA workstation. Last, I wanted a switch to print totals.
Design
Looking at the available functions in Microsoft C v5.1, I decided to use the _dos_findfirst and _dos_findnext functions because they support wildcard searches. They also fill in a file information block (a find_t structure) with the matching file's name. The drawback of this choice is that these functions are machine dependent. Two other non-portable, machine-dependent functions that I used are _splitpath() and _makepath(). Note that all four machine-dependent functions used in Listing 1 begin with an underscore. The ANSI standard reserves names with external linkage that begin with an underscore for the implementation's "secret names" [1]. These names are not really secret, however, or they would not have been documented in [2].

Looking at my other target platform, the AMIGA, I had two choices. I could use the AMIGA-DOS built-in functions Examine and ExNext, which would require a directory Lock call at each subdirectory level (since the AMIGA is a true multitasking workstation), or I could use the well-respected public domain resident library ARP.LIBRARY (see sidebar, [3], [4], [5], [6] and [7]) with FindFirst and FindNext. I chose the latter because it also gives me wildcard capability, performs vertical subdirectory searches, and more closely mimics the Microsoft MS-DOS _dos_findfirst and _dos_findnext.

I designed my utility around a recursive main that returns an int value to itself. I did this to show that main can be recursive: it is a function and behaves as any other function would [8]. The pseudo code follows.
    if (subdirectory switch) {
        push all subdirectory names onto a FIFO;
        while (subdirectories left) {
            get subdirectory off FIFO;
            recursively search that subdirectory;
        }
    }
    find wildcard matches within this directory;
    call user function with path argument;

My first pseudo code used a stack for the subdirectory names, but I actually implemented a FIFO (first in, first out) buffer. If the subdirectories have been sorted in some order, that order is maintained as the function recursively searches each subdirectory.

For the switches, I elected to use argv[2] with /S/T: /S includes subdirectories in the search, and /T prints totals. For output, my main routine calls an external function subfunc(char *path) with a path pointer as its argument. I decided that 127 subdirectories per directory would be the upper limit of the FIFO buffer.
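Parsing the /S/T switch string takes only a few lines. The sketch below is an assumption about how such parsing might look, not the code in Listing 1: parse_switches is a hypothetical name, and accepting lowercase letters, in keeping with MS-DOS conventions, is a design choice of this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser for a switch string such as "/S/T" (argv[2]).
 * Sets *subdirs for /S and *totals for /T; letters are accepted in
 * either case. Returns 0 on success, -1 on an unrecognized switch. */
int parse_switches(const char *arg, int *subdirs, int *totals)
{
    *subdirs = 0;
    *totals = 0;
    if (arg == NULL)
        return 0;                       /* no argv[2]: both flags off */

    while (*arg != '\0') {
        if (*arg != '/' || arg[1] == '\0')
            return -1;                  /* each switch is '/' plus one letter */
        switch (toupper((unsigned char)arg[1])) {
        case 'S': *subdirs = 1; break;  /* include subdirectories */
        case 'T': *totals = 1;  break;  /* print totals */
        default:  return -1;
        }
        arg += 2;                       /* move to the next switch */
    }
    return 0;
}
```

In main, a call such as parse_switches(argc > 2 ? argv[2] : NULL, &subdirs, &totals) would leave argv[1] free for the wildcard pattern, which is presumably where the pattern goes.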
With some trial and error, I molded the code into Listing 1. After my first attempt at coding it, I realized that this recursive function should take its array storage from the heap (malloc) rather than the stack (auto variables). Every active level of recursion holds its own copy of each auto array, so a _MAX_PATH-byte array per level uses up the stack quickly as the search descends the directory tree. A static array will not work for this application either, because a new array is needed at each recursive call. For example, at the start of main in Listing 1 I declare char *path and then allocate its space with path = malloc(_MAX_PATH); this puts my path array data on the heap. My first attempt used the stack, with a declaration such as char path[_MAX_PATH].
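The following is a minimal sketch of that pattern in modern ANSI C, not the code from Listing 1. It descends a hypothetical chain of nested directory names, giving each recursion level its own malloc'd path buffer and freeing it on the way back up. MAX_PATH_LEN is an assumed stand-in for _MAX_PATH with an arbitrary value, the function name deepest_path_len is hypothetical, and snprintf (a C99 function) is used only to guard against overflow.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PATH_LEN 256   /* assumption: stand-in for _MAX_PATH */

/* Descend a NULL-terminated chain of nested directory names, building
 * each level's full path in its own heap buffer instead of a
 * char[MAX_PATH_LEN] auto array. Returns the length of the deepest
 * path, or -1 on allocation failure or overflow. */
int deepest_path_len(const char *parent, const char **names)
{
    char *path;
    int len;

    if (*names == NULL)
        return (int)strlen(parent);      /* no deeper level: report this path */

    path = malloc(MAX_PATH_LEN);         /* one buffer per recursion level */
    if (path == NULL)
        return -1;

    /* Join parent and the next name with a backslash, MS-DOS style. */
    if (snprintf(path, MAX_PATH_LEN, "%s\\%s", parent, *names) >= MAX_PATH_LEN) {
        free(path);
        return -1;
    }
    len = deepest_path_len(path, names + 1);  /* recurse with this level's path */
    free(path);                               /* release it on the way back up */
    return len;
}
```

Only one pointer per level lives on the stack; the MAX_PATH_LEN bytes of each path live on the heap until that level returns.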
The first line of Listing 1 declares main to return an int. You may be surprised to see a program use the value returned by its own main(). This int keeps track of the running total as main traverses the subdirectories.
The code fragment shown in Listing 2 deserves a more detailed explanation. Both _dos_findfirst and _dos_findnext return 0 if a match is found, so my if statements use the logical negation of the return value, which evaluates to TRUE when there is a match. _dos_findfirst needs the _A_SUBDIR attribute argument to include subdirectory names in its search. If a match is found, the file info block (fib) is filled in; this find_t structure is defined in DOS.H. Next, I check that the matched name is not one of the two special MS-DOS directory entries . or .. and that it is in fact a subdirectory. To see whether it is a subdirectory, I test the attribute field of the file info block against _A_SUBDIR. Because the bitwise & has higher precedence than the logical &&, this test requires no parentheses. If I do have a valid subdirectory, its name is pushed into the FIFO buffer. This continues until there are no more directory entries or no more room in the FIFO subdirectory buffer. Finally, I terminate the FIFO with a NULL and reset my FIFO pointer to the start of the FIFO.
Now, while there are more subdirectories, I pop one out and recursively search it with the call

    total += main(argc, argv);
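The selection logic of Listing 2 can be illustrated with a portable sketch. Because _dos_findfirst and _dos_findnext are MS-DOS specific, the directory entries here come from a hypothetical array of struct entry values standing in for the name and attribute fields of a find_t. ATTR_SUBDIR is a hypothetical name holding 0x10, the MS-DOS directory attribute bit that _A_SUBDIR represents, and fill_fifo is likewise hypothetical. It applies the same tests as the text: skip . and .., require the subdirectory bit, stop when the FIFO is full, and terminate with NULL.

```c
#include <stddef.h>
#include <string.h>

#define ATTR_SUBDIR 0x10   /* MS-DOS directory attribute bit (_A_SUBDIR) */

/* Hypothetical stand-in for the name/attrib fields of a find_t entry. */
struct entry {
    const char *name;
    unsigned attrib;
};

/* Copy valid subdirectory names from entries[0..count-1] into fifo,
 * stopping after max names, then NULL-terminate. fifo must hold
 * max + 1 pointers. Returns the number of names pushed. */
size_t fill_fifo(const struct entry *entries, size_t count,
                 const char **fifo, size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < count && n < max; i++) {
        /* & binds tighter than &&, so no extra parentheses are needed. */
        if (entries[i].attrib & ATTR_SUBDIR
            && strcmp(entries[i].name, ".") != 0
            && strcmp(entries[i].name, "..") != 0)
            fifo[n++] = entries[i].name;
    }
    fifo[n] = NULL;   /* terminate the FIFO */
    return n;
}
```

The precedence argument has a limit: & binds more loosely than == and !=, so a test written as attrib & ATTR_SUBDIR != 0 would compute attrib & (ATTR_SUBDIR != 0), which checks bit 0 rather than the directory bit.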
Alternative Implementations
I have also implemented this recursive main as a recursive function, int recurs(). It takes as arguments a character string for the wildcarded file names to match and two Boolean flags (see Listing 6). These flags select whether subdirectories are included in the search and whether totals are printed, which allows the command line argument logic to be contained in a void main(). A more elegant approach to command line arguments would be a variation of [9].

I developed and tested Listing 1, Listing 3, Listing 4, and Listing 6 on a PC clone with an INTEL 386 CPU and IBM-DOS v3.30 using Microsoft C v5.1. I developed and tested Listing 3, Listing 4, and Listing 5 on an AMIGA 1000 with a Motorola 68000 and AMIGA-DOS v1.3 using MANX AZTEC C v5.0.
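The recurs() idea of a search function driven by a pattern and two flags can be sketched without _dos_findfirst or ARP.LIBRARY by walking a hypothetical in-memory directory tree. This is not Listing 6: struct dir_node, visit_fn, print_match, search_dir, and MAX_SUBDIRS are hypothetical names, and matching a filename suffix such as ".SAV" is a simplification standing in for full wildcard matching. The function follows the pseudo code: when recurse is nonzero it queues the subdirectories in a heap-allocated FIFO, searches each one in queue order, then reports the matches in the current directory through a callback playing the role of subfunc(), and returns the count a /T switch would print.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SUBDIRS 127   /* per-directory FIFO limit, as in the text */

/* Hypothetical in-memory stand-in for a directory. */
struct dir_node {
    const char *name;
    const char **files;          /* NULL-terminated file names, or NULL */
    struct dir_node **subdirs;   /* NULL-terminated subdirectories, or NULL */
};

/* Callback playing the role of subfunc(); may be NULL. */
typedef void (*visit_fn)(const char *dir, const char *file);

/* Example callback: prints the directory's name and the file as dir\file. */
void print_match(const char *dir, const char *file)
{
    printf("%s\\%s\n", dir, file);
}

/* Returns the number of files in dir (and, if recurse is nonzero, in
 * its subdirectories) whose names end in ext. If malloc fails, this
 * sketch simply skips the subdirectories. */
int search_dir(const struct dir_node *dir, const char *ext,
               int recurse, visit_fn visit)
{
    int total = 0;
    size_t ext_len = strlen(ext);

    if (recurse && dir->subdirs != NULL) {
        /* A fresh heap FIFO for each level, with room for the NULL. */
        const struct dir_node **fifo = malloc((MAX_SUBDIRS + 1) * sizeof *fifo);

        if (fifo != NULL) {
            const struct dir_node **head;
            int n = 0;

            /* Push subdirectories in their existing order. */
            while (n < MAX_SUBDIRS && dir->subdirs[n] != NULL) {
                fifo[n] = dir->subdirs[n];
                n++;
            }
            fifo[n] = NULL;

            /* Pop first in, first out, and search each one recursively. */
            for (head = fifo; *head != NULL; head++)
                total += search_dir(*head, ext, recurse, visit);
            free(fifo);
        }
    }

    /* Then report the matches in this directory. */
    if (dir->files != NULL) {
        const char **f;
        for (f = dir->files; *f != NULL; f++) {
            size_t len = strlen(*f);
            if (len >= ext_len && strcmp(*f + len - ext_len, ext) == 0) {
                if (visit != NULL)
                    visit(dir->name, *f);
                total++;
            }
        }
    }
    return total;
}
```

Called with print_match, it reports each subdirectory's matches before those of the directory itself, in the order the pseudo code describes; because the FIFO preserves insertion order, sorted subdirectories are visited in sorted order.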
The only trouble porting was due to non-ANSI functions that were machine/operating system/compiler dependent.
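One way to shrink that dependence is to do the name matching in portable ANSI C and leave only directory reading to the operating system. The following is a minimal sketch of such a matcher, a hypothetical wild_match() that is not part of Microsoft C or ARP.LIBRARY. It uses Unix-style semantics: * matches any run of characters, ? matches exactly one, and comparison is case-sensitive. MS-DOS matching differs (it ignores case, and *.* also matches names with no extension), so a real port would need to adjust it.

```c
#include <stddef.h>

/* Hypothetical portable wildcard matcher. Returns nonzero if name
 * matches pat, where '*' matches any run of characters (including
 * none) and '?' matches exactly one character. */
int wild_match(const char *pat, const char *name)
{
    const char *star = NULL;   /* position of the last '*' seen in pat */
    const char *retry = NULL;  /* name position that '*' currently absorbs up to */

    while (*name != '\0') {
        if (*pat == '*') {
            star = pat++;      /* remember the star; first let it match nothing */
            retry = name;
        } else if (*pat == '?' || *pat == *name) {
            pat++;             /* single-character match */
            name++;
        } else if (star != NULL) {
            pat = star + 1;    /* backtrack: let the last '*' absorb one more char */
            name = ++retry;
        } else {
            return 0;          /* mismatch with no '*' to fall back on */
        }
    }
    while (*pat == '*')        /* trailing stars match the empty remainder */
        pat++;
    return *pat == '\0';
}
```

Backtracking only ever returns to the most recent *, so the matcher needs no recursion and, at worst, runs in time proportional to the product of the pattern and name lengths.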
Here are a few examples of how I have incorporated this wildcard subdirectory search. I wrote a carriage return/line feed corrector for porting ASCII source to and from IBM-DOS. I have also used Listing 1 in a zaptime utility that zeroes the time stamp on existing files, and I have modified Listing 1 to remove empty directories. I used Listing 6 to modify the SGREP source supplied on [10], also adding the ability to write output to a file with a ~ prepended to its extension. This allowed me to convert over 700 C source files in over 33 subdirectories in less than 15 minutes, changing the two lines

    #include <proto.h>
    #include "proto.h"

to

    #include <libproto.h>
    #include "locproto.h"

This was much faster than the time it took to put the prototype includes in the first time.
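The two conversions above, and the ~ naming of the output file, can be sketched in a few lines. This is not the modified SGREP code: tilde_name and rewrite_include are hypothetical helpers, rewrite_include assumes the caller has already stripped the trailing newline from each line, and the handling of a name with no extension is an assumption of this sketch. It also does not enforce MS-DOS 8.3 name limits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: "MAIN.C" -> "MAIN.~C". A name with no extension gets
 * ".~" appended (an assumption). Returns 0, or -1 if out is too small. */
int tilde_name(const char *in, char *out, size_t outsize)
{
    const char *base = in;     /* start of the final path component */
    const char *p, *dot;
    int n;

    for (p = in; *p != '\0'; p++)
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;
    dot = strrchr(base, '.');  /* extension dot, ignoring dots in directory names */

    if (dot == NULL)
        n = snprintf(out, outsize, "%s.~", in);
    else
        n = snprintf(out, outsize, "%.*s.~%s", (int)(dot - in), in, dot + 1);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}

/* Hypothetical: apply the two include conversions to one line
 * (trailing newline already removed); other lines pass through. */
const char *rewrite_include(const char *line)
{
    if (strcmp(line, "#include <proto.h>") == 0)
        return "#include <libproto.h>";
    if (strcmp(line, "#include \"proto.h\"") == 0)
        return "#include \"locproto.h\"";
    return line;
}
```

Writing to a ~ file rather than in place leaves each original source untouched until the converted copy has been checked.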
Bibliography
[1] Plauger, P. J. "Library Ground Rules." The C Users Journal, August 1990.
[2] Microsoft C 5.1 Run-Time Library Reference, 1987.
[3] Manx Software Systems. Aztec C Reference Manual Version 5.0, 1989.
[4] Commodore Business Machines. AMIGA ROM Kernel Reference Manual: Exec, Addison-Wesley, 1986.
[5] Berry, John Thomas. Inside the AMIGA with C, Howard W. Sams, 1988.
[6] Peck, Robert A. Programmer's Guide to the AMIGA, SYBEX, 1987.
[7] Anderson, Rhett and Randy Thompson. Mapping the AMIGA, Compute!, 1990.
[8] Kernighan, Brian W. and Dennis M. Ritchie. The C Programming Language, Prentice Hall, 1978.
[9] Colner, Don. "An Object-Oriented Approach to Command Line Options." The C Users Journal, July 1990.
[10] C USERS GROUP disk #236, Highly Portable Utilities (CUG Starter Disk).
Table 1