Mike Cornelison is a systems programmer and occasional consultant working in Santa Clara, CA. He may be reached through the internet at micro@Ix.netcom.com, by phone at 408-970-5294, or by fax at 408-942-9180.
Introduction
Have you ever had to switch to a new software development environment, only to find that some of your favorite tools were missing? This has happened to me several times (I have been programming since 1966). Most recently, I moved from VAX/VMS to Microsoft Windows and Windows/NT, and I found myself missing the powerful VMS string matching and file searching functions.
The VMS string compare function (str$match_wild) can handle any number of wildcard characters and find any possible match with a candidate string. It recognizes the wildcard character '*', which matches any number of characters (including zero), and '%', which matches any single character. For example, the wildcard string "*abc*def*gh%" would match all of the following candidate strings: "abcdefghi", "XXabcXXdefXXXghX", and "abcXdefXXghiXXXabcdefghi" (the last one is subtle: "gh%" must match the final three characters, so the earlier "ghi" in the middle of the string cannot be used).
The VMS directory search function (lib$find_file) accepts the same wildcard notation, both in the directory tree being searched (the path) and in the file name being searched for within that path. In addition, the notation "..." may be used to indicate an arbitrary number of subdirectories within the path. For example, the directory and file specification "[aa*...*bb...]ccc.*" means: search all directories with names beginning with "aa", and their subdirectory trees, for directories with names ending in "bb"; then search these directories, and their subdirectory trees, for file names matching "ccc.*".
As it turns out, it is not very difficult to make similar tools for the Windows and NT environments. That is the subject of this article.
The MatchWild Function
This function compares a candidate string to a wildcard string (one containing any number of the wildcard characters '*' and '?'). It returns TRUE if a match is found, or FALSE if not (see Listing 1). MatchWild makes no use of system libraries, so it will work on any system that has a C compiler.
The logic works as follows: the two strings are scanned in parallel. For each segment between '*'s in the wildcard string, MatchWild must find a corresponding matching segment in the candidate string. If a comparison fails and the wildcard segment began with '*', the function may still search for a matching segment later in the candidate string. This fairly simple logic will discover any possible match.
My first version of MatchWild used recursion and avoided the unfashionable goto. It was, however, no simpler than the current version, and likely much slower. Some benchmark execution times are included in Listing 1.
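For comparison with a recursion-based approach, here is a recursive formulation of the same matching (my own sketch; the author's first version is not shown in the article). Each '*' either absorbs nothing or absorbs one more candidate character, and the function explores both possibilities. It is short, but with several '*'s in the pattern the number of recursive calls on a failing match can grow very quickly, whereas the iterative sketch keeps only one backtrack point. The name match_wild_rec is hypothetical.

```c
#include <stdbool.h>

/* Recursive formulation of '*' / '?' matching, for comparison only. */
bool match_wild_rec(const char *wild, const char *cand)
{
    if (*wild == '\0')
        return *cand == '\0';         /* both strings must end together */
    if (*wild == '*')                 /* '*' absorbs nothing, or one more char */
        return match_wild_rec(wild + 1, cand) ||
               (*cand != '\0' && match_wild_rec(wild, cand + 1));
    if (*cand != '\0' && (*wild == '?' || *wild == *cand))
        return match_wild_rec(wild + 1, cand + 1);
    return false;
}
```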
The SearchWild Function
This function (Listing 2) searches a directory tree for a desired file or set of files. The directory tree is specified as a path name, using the traditional notation of DOS, Windows, and Windows/NT. SearchWild's objective is to find all possible files in all possible paths that match a given path and file name, where both the path and file names may contain the wildcards '*' and '?'. This function also allows a third wildcard notation, the VMS-style "..." to indicate any number of nested subdirectories. The path and file name notation is best clarified with an example:
d:\aaa*...\*bbb...\cc*.d??
This means: disk drive d, all top-level directories matching "aaa*", all underlying subdirectories matching "*bbb" (with any number of in-between subdirectories), all underlying files matching "cc*.d??" (again with any number of in-between subdirectories). The following two files would match this specification:
d:\aaa\bbb\cc.d12
d:\aaaxx\xxxx\yyyy\xxbbb\xxx\ccxx.d23
At first glance, SearchWild seems simple to implement. After all, DOS, Windows, and Windows/NT all offer basic directory search functions (FindFirstFile and FindNextFile under Win32, and equivalent find-first/find-next calls under DOS) that are capable of some wildcard handling. Specifically, these functions accept wildcards in the last name of a path, which may be a file name or another directory name. A program could use these functions to search down several levels of directory names containing wildcards, one level at a time. Using the above example, the search would start with "d:\aaa*" to find the desired top-level directories. Within each directory found (e.g. "aaaxx"), the search would then progress to the directories and files at the next level down (e.g. "d:\aaaxx\*"), and these could be matched against the next desired name, "*bbb", and so on. It would all be easy, if not for the "..." notation, which requires a more sophisticated approach.
SearchWild uses recursion to turn this messy logic into something almost simple. Its entire logic is summarized as follows:
1. Replace any "\xxx...\" notation with the equivalent "\xxx\...\" ("xxx" may also contain the simpler wildcards '*' and '?').
2. If no wildcards are found except in the last name (the file name), use the OS-provided search function to find the files and return all of them to the caller. Done.
3. Truncate the path name after the first name containing any of the wildcards '*', '?', or "\...\".
4. If the wildcard is not "\...\":
a. Call the OS search function to get all matching names at this level.
b. Substitute each of these names for the wildcard name, and append the remaining path\file names truncated in step 3.
c. Call SearchWild with each of these path\file names, and return all found files to the caller. This is a recursive call, since SearchWild calls itself.
5. If the wildcard is "\...\":
a. Replace "\...\" with "\". This covers the case of zero subdirectory levels, since "\...\" can mean zero or more levels.
b. Call SearchWild (recursively) with this name, and return all found files to the caller.
c. Replace "\...\" with "\*\...\". This covers one or more levels.
d. Call SearchWild (recursively) with this name, and return all found files to the caller.
SearchWild returns one file per call until no more files are found, then it returns NULL. To carry out the logic above across calls, SearchWild resumes at its saved position in the code each time the caller calls it again.
The recursive calls to SearchWild result in efficient execution, since only the necessary directories are searched, and no others.
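The path side of the algorithm can be exercised without touching the disk. The sketch below is my own illustration, not Listing 2: instead of enumerating directories with the OS search functions, it tests whether one concrete path matches a specification. It applies step 1 by splitting a name such as "aaa*..." into "aaa*" and "...", and step 5 by trying "..." first as zero levels and then as one directory level followed by "..." again. The names path_spec_match, split_path, path_match, and name_match, the 260-character buffer (the Windows MAX_PATH value), the segment limit, and the case-sensitive comparison are all assumptions made for brevity; real DOS and Windows file names compare case-insensitively.

```c
#include <stdbool.h>
#include <string.h>

#define MAXSEG 32                     /* hypothetical limit on path segments */

/* Case-sensitive name match: '*' = any run of characters, '?' = one character. */
static bool name_match(const char *w, const char *c)
{
    if (*w == '\0')
        return *c == '\0';
    if (*w == '*')
        return name_match(w + 1, c) || (*c != '\0' && name_match(w, c + 1));
    return *c != '\0' && (*w == '?' || *w == *c) && name_match(w + 1, c + 1);
}

/* Split a backslash-separated path in place into at most MAXSEG segments.
 * Step 1: in a pattern, "xxx..." becomes the two segments "xxx" and "...". */
static int split_path(char *buf, const char **seg, bool is_pattern)
{
    int n = 0;
    for (char *tok = strtok(buf, "\\"); tok != NULL && n < MAXSEG - 1;
         tok = strtok(NULL, "\\")) {
        size_t len = strlen(tok);
        if (is_pattern && len > 3 && strcmp(tok + len - 3, "...") == 0) {
            tok[len - 3] = '\0';
            seg[n++] = tok;
            seg[n++] = "...";
        } else {
            seg[n++] = tok;
        }
    }
    return n;
}

/* Match pattern segments against the segments of one concrete path. */
static bool path_match(const char **pat, int np, const char **path, int nq)
{
    if (np == 0)
        return nq == 0;
    if (strcmp(pat[0], "...") == 0) {
        /* Steps 5a-b: "\...\" -> "\" (zero directory levels). */
        if (path_match(pat + 1, np - 1, path, nq))
            return true;
        /* Steps 5c-d: "\...\" -> "\*\...\" (one level, then "..." again). */
        return nq > 0 && path_match(pat, np, path + 1, nq - 1);
    }
    return nq > 0 && name_match(pat[0], path[0]) &&
           path_match(pat + 1, np - 1, path + 1, nq - 1);
}

/* True if the concrete path matches the wildcard specification. */
bool path_spec_match(const char *spec, const char *path)
{
    char sbuf[260], pbuf[260];        /* 260 = Windows MAX_PATH */
    const char *pseg[MAXSEG], *qseg[MAXSEG];

    if (strlen(spec) >= sizeof sbuf || strlen(path) >= sizeof pbuf)
        return false;
    strcpy(sbuf, spec);
    strcpy(pbuf, path);
    int np = split_path(sbuf, pseg, true);
    int nq = split_path(pbuf, qseg, false);
    return path_match(pseg, np, qseg, nq);
}
```

With the article's specification, both example files match, while a path with no directory ending in "bbb", such as "d:\aaa\cc.d12", does not. Trying the zero-level case first mirrors the order of steps 5a through 5d.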
I have tested the code in Listing 2 under Windows/NT. It should work under DOS or Windows 3.1 with only minor adjustments. Note that the OS functions FindFirst/FindNext support multiple simultaneous search contexts, which this method requires, since each level of recursion keeps its own search open. SearchWild itself does not support multiple contexts; that is, only one search can be in progress at a time. This could be done using dynamically allocated memory for each new context. However, if the caller abandons a search before it is completed, that memory would be lost. Hence, another call type would be needed, to allow the caller to abandon a search and recover the dynamic memory.
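One way to lift the one-search-at-a-time restriction is to keep all search state in a structure that the caller owns. The sketch below illustrates only that calling convention, using hypothetical names (search_open, search_next, search_close) and a hypothetical in-memory tree (Node) standing in for the disk; it does no wildcard matching. It also swaps SearchWild's recursion for an explicit stack, so that each context is a single heap allocation that search_close can free at any point. In a real Windows version, each stack level would presumably hold the handle returned by FindFirstFile, and search_close would also need to call FindClose on every handle still open.

```c
#include <stdlib.h>

#define MAXDEPTH 32                   /* hypothetical nesting limit */

/* Hypothetical in-memory stand-in for a directory tree. */
typedef struct Node {
    const char *name;
    const struct Node *kids;          /* entries of a directory; NULL for a file */
    int nkids;
} Node;

/* All state for one search lives here, so searches are independent. */
typedef struct SearchCtx {
    const Node *dir[MAXDEPTH];        /* directories currently being walked */
    int next[MAXDEPTH];               /* index of the next entry at each level */
    int depth;                        /* number of active levels */
} SearchCtx;

/* Start a search: each call allocates a fresh context. */
SearchCtx *search_open(const Node *root)
{
    SearchCtx *ctx = calloc(1, sizeof *ctx);
    if (ctx != NULL) {
        ctx->dir[0] = root;
        ctx->depth = 1;               /* next[0] is already 0 from calloc */
    }
    return ctx;
}

/* Return the next file name, or NULL when the tree is exhausted. */
const char *search_next(SearchCtx *ctx)
{
    while (ctx->depth > 0) {
        int d = ctx->depth - 1;
        const Node *dir = ctx->dir[d];
        if (ctx->next[d] >= dir->nkids) {
            ctx->depth--;             /* directory finished: pop it */
            continue;
        }
        const Node *n = &dir->kids[ctx->next[d]++];
        if (n->kids == NULL)
            return n->name;           /* a file: ctx already points past it */
        if (ctx->depth < MAXDEPTH) {  /* a directory: push it */
            ctx->dir[ctx->depth] = n;
            ctx->next[ctx->depth] = 0;
            ctx->depth++;
        }                             /* levels beyond MAXDEPTH are skipped */
    }
    return NULL;
}

/* End a search, finished or abandoned, and release its memory. */
void search_close(SearchCtx *ctx)
{
    free(ctx);
}
```

Two contexts opened on the same tree advance independently, and a caller that loses interest after the first few files simply calls search_close.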
I welcome your questions or suggestions. Please contact me at the phone number or e-mail address shown in my bio.