July 2002/Identifying and Managing Cyclic Physical Dependencies/Sidebar

Graphs and Graph Traversal

A graph is “a set of nodes together with a set of edges that connect pairs of distinct nodes with at most one edge connecting any pair of nodes” [5]. In a connected graph, a path exists between every node. In a directed graph, the edges are drawn as arrows to indicate that one node logically follows another node. A cycle is a path in a directed graph that starts and ends at the same node.

A graph can be represented as data in two ways: 1) as a two-dimensional matrix, and 2) as a set of lists. As a matrix, the node identifiers are presented as the row and column indices. Any two nodes are connected if the intersection of the row and column is non-zero. (The diagonal is non-zero to indicate that every node is connected to itself). Figure 4 shows a matrix graph of the files composing the project as it exists in Figure 3. The matrix is asymmetrical to indicate that it is a directed graph. In this representation, a file in a row includes (depends on) a file in a column if the intersection is non-zero. (Note: UML diagrams typically depict logical components with each node representing all of the files that make up the translation unit. However, the output of this program is such that each file has its own list of dependencies and its own paths, so the graph nodes are individual files instead of translation units.)

As a set of lists, each node is presented as a member of a set, and each of these nodes is associated with a list of the nodes to which it is connected. (The output of the make depend utility uses this representation in a makefile to show all of the dependencies of each of the object files.) The matrix representation is more efficient with dense data, and the set/list representation is more efficient with sparse data. I chose the set/list representation because it lends itself more easily to recursive traversal, and the concept of directed edges is more intuitive than in a matrix. Figure 5 shows the set/list representation of the project as it exists in Figure 3. Each file in the set is shown, followed by a list of files that are included in it.

A graph is traversed recursively using either a breadth-first search or a depth-first search. Both methods eventually result in all nodes in the graph being visited. The breadth-first search starts with a node and visits all of the nodes to which it is connected and then performs the same on each of connected nodes. A depth-first search starts with a node, visits one of its connected nodes, then visits one of that node’s connected nodes, etc., until it reaches a node with no other connection. The unvisited connecting nodes are visited on the way back up the path. I chose a depth-first search because the path is printed as the subsequent nodes are visited from its start node all the way up the path until it reaches a dead end or a cycle.