Mike Gilson develops C software for VAX/VMS, UNIX, and MS-DOS platforms, for the Allen-Bradley Co. He can be reached at Internet address gilsonm@edn.lcd.ab.com.
make, a tool available for virtually every operating system, speeds the building of systems by recompiling and linking only those elements of a program that have had source code changes since the program was last built. This article presents a small version of the make utility based on a graph algorithm called Depth-First Search. Figure 1 contains a directed graph, or digraph, that represents the dependencies of the files which comprise hello.exe. (The term directed graph, or digraph, refers to the one-way relationship between the files. An executable file is built from the source code. The source code is not built from the executable.)
How make Works
In make terminology, hello.exe is referred to as the target and the object files are its dependencies. make must determine whether hello.exe is up to date with respect to its dependencies. make also verifies that the object files are up to date with respect to their dependencies. hello.exe is newer than either of the object files, making it current with respect to its dependencies. main.c has been modified since main.obj was rebuilt, so main.obj must be rebuilt, making it newer than the target, hello.exe. So make rebuilds the target. make does not rebuild hello.obj since its last change occurred before the last change to the target.
Using a Makefile
A makefile contains statements that describe the file dependencies and the commands required to build objects and executables. A programmer uses a makefile to translate the graph in Figure 1 into something meaningful to the make program.Listing 1 contains the makefile for hello.exe. hello.exe, the target, begins in column 1 of the makefile, followed by hello.exe's dependency list on the same line after the colon. The next line links the two files, creating the hello.exe executable. The third line copies the executable to another directory.
Program Structure
This implementation of make contains four parts: the main function, the command-line parser, the makefile parser, and the executive. The main function (Listing 2) calls the makefile parser and the executive and then exits. The fatal_error routine prints an error message, when necessary, then exits the program.The command-line parser processes any arguments that are passed to the program when invoking make. This program allows the user to pass two arguments to make: -f (filename) and -t (target). -f allows the user to specify an alternative to the default filename, Makefile. -t specifies which target to build. The program creates the default from the first target label appearing in the makefile (in this case hello.exe).
The makefile parser reads the makefile and stores the dependencies in a data structure called an adjacency list. An adjacency list is a two-dimensional linked list. The vertical dimension is a linked list of all n vertices in the graph, where each vertex is a target or dimension: its dependency list and its command list. Either or both of these lists may be null lists. Figure 2 shows an adjacency list of hello.exe and its dependencies. proto.h is an example of a vertex with null dependency and command lists.
Listing 3 shows the declarations of the three structures used to implement the adjacency list representation of the digraph of a makefile. These structures are TARGET, DEP, and CMD. They correspond to each target and its lists of dependencies and commands, respectively.
The executive traverses the target's dependency digraph, rebuilding files as necessary. Using the adjacency list, the executive systematically examines the modification dates of files on which the target depends. The program rebuilds out-of-date files based on their command lists.
Command-Line Parser
parse_options (Listing 4) directs that all options are preceded by a minus sign and followed by an argument. The option flag and argument are separated by whitespace. An example command line when using the makefile in Listing 1 might be
make -t main.objIt specifies that the target main.obj be built (if necessary). The next command line redefines both the makefile and the target:
make -f testing.txt -t some.objThis command line renames the makefile to testing.txt, opens it, and requests the target some.obj be built by parse_makefile.
Makefile Syntax
You must understand the makefile syntax before you can parse the makefile. In this implementation of make, targets always begin in column 1 and are followed by whitespace and a colon. The colon may optionally be followed by a list of dependencies. The list is terminated by a newline character. The format for a target line is
target<whitespace>:[<dependency>][<depenecy>]...<\n>Dependencies may be other targets or filenames, but are always preceded by whitespace. The format for a dependency is
<whitespace><label | filename>Targets may be followed by a list of commands which specify how to build the label. Each command is preceded by whitespace and terminated by a newline character. The list of commands is terminated by the next target in the makefile or the end of file character. The format for a command line is
<whitespace>[<shell command>]<newline>Notice that a shell command is optional. This allows for blank lines in the makefile to improve readability.(In Figure 1, proto.h does not need to be rebuilt, so it has no commands. In fact, it does not need to appear as a target at all. It is included merely for completeness.)
Parsing the Makefile
Listing 5 shows the routines used to parse the makefile and build the adjacency list. add_target, insert_cmd, and insert_dependency are straight-forward, linked-list insertions. In insert_cmd, the order of commands in a command list must be preserved, so commands are always added to the end of the list. In the makefile in Listing 1, hello.exe is copied to another directory after it is built by the link command. insert_cmd preserves this order. skip_ws is used to skip the whitespace that precedes a command-shell command.parse_makefile receives three arguments from main: argc and argv, and the address of a character buffer that will receive the name of the target. parse_makefile uses these arguments to determine whether the user specified a makefile or target on the command line. If not, the defaults are used. The default filename for the makefile is makefile. The default target is the first target found in the makefile. The program calls parse_options to process the command line arguments. parse_options copies the user-defined makefile name and target into parse_makefile's buffers.
After calling parse_options, parse_makefile opens the appropriate makefile, then reads and processes each line of the makefile. If the first character is whitespace, it considers the line a command line, otherwise it must be a target. Since a command line must be preceded by a target, the adjacency list must be non-empty, that is, head must not be NULL. If head is NULL, the program reports a fatal error, then exits. If head is not NULL, the program adds the command to the current target's command list by a call to insert_cmd.
The program processes a target line by repeated calls to strtok and inserts the resulting token into the appropriate list. The first call to strtok returns the target, which is inserted in the adjacency list by a call to add_target. Before performing an insertion, add_target checks for duplicate targets. A duplicate target results in a fatal error. The target token should be followed by a colon token, then (optionally) by a list of dependencies. Dependencies are added to the target's dependency list by a call to insert_dependency. When finished parsing the makefile, parse_makefile returns the address of the head of the dependency list.
Executive
The executive, Listing 6, builds the target. process_dependency starts the execution process by calling search_target_list to return a pointer to the target in the adjacency list. search_target_list, a simple linked-list search routine, returns NULL if it doesn't find the target. The program will then report a fatal error and exit. If search_target_list finds the target, the program calls build_target.build_target, the heart of this make program, is a recursive algorithm based on the Depth-First Search (DFS) algorithm. DFS is used to search both graphs and digraphs. (See [1] in references.)
In the structure definitions in Listing 3, the TARGET structure has a enum member, color.build_target uses this field to determine which vertices have been visited. WHITE indicates that a vertex has not been visited; GRAY that the vertex is in the process of being visited; BLACK that the vertex has been visited sometime in the past. add_target initializes this field to WHITE when building the adjacency list. build_target changes the color to GRAY and eventually to BLACK.
The build_taget visit to a vertex can be logically divided into four parts: initialization, checking the vertex's dependency list, executing the command list, and cleanup. During initialization build_target sets the color of the vertex it is visiting from WHITE to GRAY. It also saves the timestamp of the vertex's target in two variables: newest_t and current_vertex_t.
build_target then checks the vertex's dependency list, a linked list, by stepping through from the head of the list until it reaches a NULL pointer. It then searches the list of targets to determine whether the dependency is another target or simply a file. As noted in the case of proto.h above, some files may appear as dependencies and not as targets when no commands are needed to build them. If it is a dependency, it must have one of three colors: WHITE, GRAY, or BLACK. If r->color is WHITE, build_target visits r by a recursive call. If r->color is GRAY, then r is currently being visited by an earlier call to build_target, meaning the current vertex is a dependency of r. GRAY indicates a recursive dependency, since r is a dependency of the current vertex, by being in the current vertex's dependency list. This causes the program to report a fatal error, then exit. (Listing 7 is an example of a makefile with a recursive dependency.) If r->color is BLACK, r has been visited before and its timestamp is taken by a call to stat.
build_target must ensure that the current vertex's file is newer than the files of its dependents and all their dependents, etc. If not, it must rebuild the current vertex. It must then compare the timestamp for the current vertex, saved in current_vertex_t, to that of its dependents. At the beginning of the visit, it assumes the current vertex is newer than its dependents, so its timestamp is also saved in newest_t. During a recursive call to build_target, a newer dependent could be found and build_target writes this dependent's timestamp into dependent_t. After a return from a recursive call to build_target, make compares dependent_t to newest_t. If it is newer, the program assigns
newest_t = dependent_t;The make program repeats this comparison after every recursive call to build_target, ensuring that the final value of newest_t is the timestamp of the newest file among the current vertex and its dependents. Using this information, build_target performs the comparison
if (newest_t > current_vertex_t) { exec_cmd_list(p->cmdlink); if (FNFERR == (stat(p->target, &statbuf)) fatal_error("target not built"); newest_t = statbuf.st_mtime; }In other words, if any of its dependents are newer than the current vertex, the program rebuilds the current vertex, takes a timestamp of the rebuilt file, and assigns that to newest_t. Otherwise, the current vertex is already the newest file, so the program assign its timestamp to newest_t.During the cleanup phase the program sets the color of the current vertex to BLACK, indicating it has been visited, and it reports its timestamp to the calling routine.
Extensions
The implementation of make presented here is a small subset of make functionality. Simple extensions to this program include: macro definition and expansion, continuation of target line to allow long dependency lists, printing (not executing) the commands, printing the dependency list, and allowing make to search for files along a directory path if not found in the current directory.This program was developed using Borland C++, MS-DOS 5.0 and Windows 3.0. However, the code is portable to systems with ANSI C compilers. The header file stat.h may be in a different location on other systems, which requires a change to the #include directive in the executive (Listing 6) of
#include <sys\stat.h>Listing 8, Listing 9, and Listing 10 comprise the source code for the hello.exe program that has been used as an example throughout this article.
References
[1] Carmen, Leiserson, and Rivest. Introduction to Algorithms. The MIT Press, 1990.[2] Knuth, Donald E.. The Art of Computer Programming, Vol. 1. Addison-Wesley, 1973.
[3] The SunOs Reference Manual, Vol. 1., Sun Microsystems, Inc., 1989.