Edward has been developing software professionally since 1983. Currently he is working as a team leader for a company that develops software for the financial community. Ed can be reached at 4830 S. Salida Ct., Aurora, CO 80015; or through CompuServe 72617,2624.
The graph is one of the more versatile data structures. In its simplest form, a graph consists of a collection of vertices connected by edges, and a cost or weight is often associated with each edge. An example use of such weighted graphs can be found in the classic "Shortest Network" (also known as the "Traveling Salesman") problem whereby the goal is to find the shortest way to connect all of the vertices of a graph. Given a list of cities, for example, design the shortest possible highway system that will connect them.
The key to solving this problem is to create imaginary vertices in the midst of the real ones. These imaginary vertices are called "Steiner" points after their inventor, the nineteenth-century mathematician Jakob Steiner. The equilateral triangle in Figure 1(a) provides a good demonstration of the value of Steiner points. Assuming that each edge is 100 units long, without Steiner points it takes a total of 200 units to connect the three vertices. However, it takes only 175 units to connect the vertices if a single Steiner point S is placed in the middle of the triangle and the edges are redrawn so A, B, and C each have a single edge to S. Figure 1(a) shows the original graph, and Figure 1(b) shows the graph after adding the S vertex and redrawing the edges.
With a bit more work, a solution can be found that connects the vertices using only 173.2 nits. (It can, in fact, be shown that for any equilateral triangle the shortest path is N multiplied by the square root of 3 where N = length.) Graphs have a wide variety of other applications. Their uses range from modeling complicated network-flow systems to simply representing sets of related elements.
An important property of graphs is that it is not necessary for all of the vertices to be connected. A collection of vertices that are connected to each other is called a "connected component," while a series of edges that allow one vertex to reach another is called a "path." Thus, a graph is made up of a collection of connected components, which are in turn made up of a collection of connected vertices. Figure 2(a), Figure 2(b), and Figure 2(c) each show a single graph.
From these figures, it is apparent that connected components provide a natural representation for disjoint sets (none of the elements in one set appear in any other set). In addition, indirect relations between set elements are also represented. Figure 2(a) provides a good example of this. In this graph, both vertex 1 and vertex 7 are directly related (that is, have an edge to) vertex 5. Although vertices 1 and 7 are not directly related, it is possible to determine that both belong to the same set because a path exists that connects them.
A variety of common questions can be asked of disjoint sets:
At my workplace, graphs are used in just this fashion. Our processing system creates databases where sets of related objects are grouped together. The first section of the system generates a large, complex graph. It does this by comparing two objects at a time, outputting the pair of object numbers if the objects are determined to match. In this fashion, a graph is generated one edge at a time. Because the databases often have over 1,000,000 objects, the graphs generated by the system often have over 1,000,000 vertices and are comprised of several million edges. The only way the system can represent such a huge graph is by saving in a file the list of the graph's edges. We then use this huge graph to determine what sets of records should be grouped together (question 8 above). This is where a problem arises.
The problem is to find an efficient algorithm for determining if a path exists that connects a pair of vertices for all possible pairs of vertices (and do it before the sun burns out or the entire universe collapses into a black hole).
Using the graph shown in Figure 2(c) as an example, it is easy to answer this question for vertices 1 and 2 because the vertices are directly related by the edge (1, 2). However, answering this question for the next pair of vertices, 1 and 3, is much more difficult. To determine if a path exists that connects this pair of vertices, you must this time "traverse" several of the edges in vertex 1's connected component. In the case of the next pair of vertices, 1 and 4, verifying that a path does not exist requires one to traverse all of the edges in vertex 1's connected component. Algorithms used to answer this and other questions of connectivity are known as "union-find" algorithms.
Warshall1 developed an algorithm that requires only a single operation to determine if a path exists that connects a pair of vertices. It works by loading the graph into a Boolean "adjacency matrix" and then finding the "transitive closure" of the matrix. (Refer to Sedgewick (1988) for a complete discussion of adjacency matrices and transitive closure.) Figure 3(a) illustrates the adjacency matrix for the graph shown in Figure 2(c). Figure 3(b) shows the adjacency matrix after the transitive closure has been found.
111 111
123456789012 123456789012
1 X 1 XX XX
2X XX 2X X XX
3 X 3XX XX
4 XX 4 XX
5 X X 5XXX X
6 XX X 6XXX X
7 X X 7 X X
8 X X 8 X X
9 9
10 10
11 X 11 X
12 X 12 X
3a 3b
With the finished matrix, each vertex has an edge to all of the other vertices in the same connected component. For example, the adjacency matrix shown in Figure 2(b) now depicts the edge (1, 3). Thus, it now only takes a single operation (if table [1,3] = X ... ) to determine if two vertices are in the same connected component.
There are two significant drawbacks with this algorithm, however. The first is that there is a tremendous amount of overhead in finding the transitive closure of the graph. This is done with the brute-force method shown in Example 1.
for y:= 1 to N do
for x:= 1 to N do
if matrix[x,y] = TRUE then
for i:= 1 to N do
if matrix[y,i] = TRUE then
matrix[x,i] = TRUE;
From the three nested loops, it is apparent that this process is O(N3) (where N indicates the number of vertices). (Refer to Sedgewick (1988) for a complete discussion on "Big O-Notation".) The second drawback is that the matrix requires an enormous amount (N2) of memory. Even if a matrix of bits is used, the quadratic expansion of the matrix rapidly makes it impractical for use with larger graphs. Table 1 illustrates this.
Number of Verticies Memory Requirements
----------------------------------------
12 18.00 Bytes
1 thousand 125.00 Kbytes
10 thousand 12.50 Megabytes
100 thousand 1.25 Gigabytes
1 million 12.50 Terabytes
A second common solution is to convert the graph into a "forest" of "spanning trees." Each spanning tree represents a single connected component of the graph. Both Prim2 and Krusk l3 have invented algorithms for finding "minimum" spanning trees in a weighted graph. Simpler forms of either of these algorithms can be used for finding the spanning trees of a conventional graph. Worst-case traversal times of the trees can be substantially reduced by using techniques of weight/ height balancing and path compression on the trees. Although the overhead operations of these algorithms are more complex than Warshall's, both execute significantly faster. In addition, the forest of spanning trees will require much less memory. Even so, graphs 1,000,000 vertices in size could easily consume tens of megabytes of memory with this solution.
Because of the tremendous size of the graphs involved, it was apparent that both of these existing solutions would require a virtual memory mechanism. It was also apparent that, even with a cache mechanism installed, neither solution would have an acceptable execution time. It was time to develop a new algorithm.
For the time being, I decided to ignore implementation issues and focus solely on the high-level algorithm. In doing so, I had a working solution relatively quickly. The new algorithm I developed requires only one pass through the file. In addition, neither sorting nor ordering of the vertices is required. Example 2 shows the high-level pseudocode. As is often the case, the algorithm itself is disarmingly simple.
1. Read the edge (A,B)
2. Determine if the A vertex has been seen before.
3. Determine if the B vertex has been seen before.
4. BRANCH
a. Neither seen before = create a new set.
b. Only A seen before = append B to A's set.
c. Only B seen before = append A to B's set.
d. Both seen before = BRANCH
1. Determine if both vertices are already in the same
set.
2. BRANCH
a. Both in same set = do nothing.
b. Each in different set = merge the two sets.
5. If not EOF, goto 1
The key section of this algorithm is determining if both vertices are already in the same set. The entire graph must be kept in memory if this is to be done with any kind of efficiency. Thus, my new algorithm was faced with the same issue as the existing ones -- that is, how to efficiently represent the graph in memory. The challenge of this approach ended up being in the implementation of the algorithm, not in its development.
One of the most common data structures used to represent a graph is the adjacency list. With adjacency lists, each vertex is the head of a linked list. The linked list contains all of the vertices that are adjacent to (for instance, have an edge directly to) the vertex at the head of the list. Arrays are often used to store the heads of these linked lists, thus allowing for very efficient lookups for a given vertex. Figure 3 illustrates the adjacency lists for the graph shown in Figure 2(c).
The appeal of this structure, in addition to its simplicity, is its efficiency in some cases. To determine if two vertices are adjacent, you just have to follow the first vertex's adjacency list until either the second vertex or the end of the list is found. However, this structure is inefficient for the more general task of determining if two vertices are in the same connected component. This is because the adjacency list of each vertex encountered must also be searched. In this case, spanning trees offer better efficiency both in terms of lookup times and memory usage. This fact only added to my surprise when I found a solution based not on spanning trees but, rather, on the humble adjacency lists.
I redrew the adjacency list for one of the connected components of the graph in Figure 2(c), this time vertically lining up the vertices, see Figure 5(a). Then I realized that if each linked list was "overlaid" on the other and the Nil pointers canceled out, a circular linked list resulted (see Figure 5(b)).
The significance of this is that a circular linked list can be represented as an array. Further, because each overlaid adjacency list represents a disjoint set, all of them can be implemented in the same, single array. Figure 6(a) shows the entire graph from Figure 2(c) represented as a series of overlaid adjacency lists. Figure 6(b) illustrates the array implementation of the structure.
Now that the entire graph array could be efficiently represented in memory, the only issue left was how to determine which vertices had been seen before. Once again, the array provided the answer. Note that array[9]and array[10] are empty. Because vertices 9 and 10 have not been encountered, their array positions have never been filled in. Thus, it can be determined if a vertex has been seen before by simply seeing if the vertex's array position is empty or not. With this final mechanism in place, a detailed design of my algorithm's implementation could now be done.
Armed with this new data structure, designing the implementation of my algorithm was almost trivial. Example 3 shows the detailed pseudocode. Because of the loop in section 4d1, it is apparent that this algorithm is O(N2) in the worst case.
ASSUME: Max Vertex Value = 8
"Empty" = 0
array[1..Max Vertex Value] has been filled with "Empty"
1. ReadEdge(A,B)
2. if (array[A] <> "Empty") then A_Seen = TRUE else A_Seen = FALSE.
3. if (array[B] <> "Empty") then B_Seen = TRUE else B_Seen = FALSE.
4. BRANCH
a. if (NOT(A_Seen OR B_Seen)) then create a new set.
array[A] = B
array[B] = A
b. if (A_Seen AND (NOT B_Seen)) then append B to A's set.
array[B] = array[A]
array[A] = B
c. if (B_Seen AND (NOT A_Seen)) then append A to B's set.
array[A] = array[B]
array[B] = A
d. if (A_Seen AND B_Seen) then
1. Determine if both vertices are already in the same set.
temp = array[A]
while ((temp <> A) AND (temp <> B)) do
temp = array[temp]
if (temp = B) then Same_Set = TRUE else Same_Set = FALSE
2. BRANCH
a. if (Same_Set) then do nothing
b. if (NOT Same_Set) then merge the two sets
temp = array[A]
array[A] = array[B]
array[B] = temp
5. If not EOF, goto 1
To illustrate this algorithm's use, I have included a program that uses the algorithm to determine how many connected components a graph is comprised of (see Listing One). The program accomplishes this by simply incrementing a counter when a new set is created and decrementing the counter when two sets are merged. Although the implementation actually used at my workplace is written in assembly, the listing shown with this article is in Pascal for the sake of clarity and space. For these same reasons, the Pascal version always allocates an array 10,000 vertices in size instead of calculating the size and dynamically allocating the array from the system. Thus, this version of the program can handle only graphs 10,000 vertices in size and smaller.
The assembly language version of the program (see Listing Two) was written with Phar Lap's 386 | ASM compiler and linked with Phar Lap's 386 | DOS Extender. Because of the DOS Extender, this version of the program is capable of allocating arrays of several megabytes, and thus is capable of processing graphs millions of vertices in size. Interested readers can obtain this source listing from DDJ's CompuServe Forum or the listings disks. All of the empirical analysis discussed in the next section was done using the Phar Lap version of the program.
To test my implementation of the algorithm, I created a "worst-case" data set. By far the most expensive section of the algorithm lies in merging two sets. The worst-case data, therefore, should exercise this section as much as possible. The most convenient way I have found to do this is to read a two-way tree from the bottom up. After the bottom level of the tree has been read, each edge of every remaining level will cause two sets to be merged. Conversely, the best case is where a two-way tree is generated from the top down. Figure 7(a) and Figure 7(b) illustrate the worst case and best case, respectively, for a max vertex value of eight.
For my test, I used a max vertex value of 220 (1,048,576). The resulting two-way tree generated 220-1 (1,048,575) edges, 219-1 (524,287) of which would cause sets to be merged. In order to isolate the algorithm's performance from that of the disk I/O, I did two things: I wrote an I/O routine that generated the edges directly instead of physically reading them from disk, and I timed the execution of just this phony I/O routine. By subtracting this time from the total execution time of the program I was able to isolate the execution time of the algorithm.
I ran the test on four different 80386 machines, using the 32-bit protected-mode version of the program. Table 2 summarizes the machines' configurations and my algorithm's performance on each (all times are in seconds).
Brand Speed Cache I/O Algorithm Total execution time ------------------------------------------------------------ Generic 16MHz 64K 6.48 29.06 35.54 seconds Generic 20MHz 0K 6.37 15.82 22.19 seconds ALR 25MHz 64K 3.74 11.97 15.71 seconds Compaq 33MHz 64K 2.72 9.37 12.08 seconds
As a final test, I created an actual file of the two-way tree of 220-1 edges and timed the execution of the entire program. For this test I chose the 20-MHz machine, which had 8 Mbytes of RAM and a 16-millisecond Wren VI hard disk. Even including all disk I/O time, the program processes over 1,000,000 vertices in less than 30 seconds. Although my new algorithm was O(N2) in the worst case, it proved to be a fast N2.
I named the algorithm Graph Array Decomposition because an array is key to the implementation. With this algorithm, you can quickly find answers to common connectivity questions such as those identified earlier. It differs from most other union-find algorithms in several respects:
M.W. Bern and R.L. Graham, "The Shortest Network Problem," Scientific American, January, 1989, 84-89. Bern and Graham provide an excellent introduction to the Shortest Network problem in this article. They include a brief history of the developments in this area as well as discussing derivations of the problem.
R.E. Sedgewick, Algorithms, second edition, Addison-Wesley, Reading, Mass., 1988. Sedgewick manages the impressive feat of covering many classic algorithms and problems without burying the material in academia. His text is as readable as it is thorough. I highly recommend it to anyone serious about software development.
G. Brassard and P. Bratley, Algorithmics: Theory & Practice, Prentice-Hall, Englewood Cliffs, N.J., 1988. Brassard and Bratley take a much more mathematically oriented perspective in their descriptions and analysis of algorithms. With this perspective, algorithms and their analysis are pursued in the depth and detail one usually finds in formal papers in the field of computer science.
_GRAPH DECOMPOSITION_ by Edward Allburn[LISTING ONE]
(***************************************************************************
* GAD.Pas
* Program: GAD.Exe
* Author: Edward Allburn (September, 1990)
* Compiler: Turbo Pascal 5.0
*
* Description: This program demonstrates the Graph Array Decomposition
* algorithm described in the JAN '91 issue of "Dr. Dobb's Journal".
* It uses the algorithm to determine how many connected components
* a graph is comprised of. Both this program and the algorithm it
* demonstrates are released into the public domain.
*
* Usage: GAD NNN
* Where: NNN = Max vertex value of graph (ok if > than actual max val).
*
* IN Files: "GAD.In" - List of edges that make up the graph. Each vertex
* of an edge is a 4-byte DWORD. Thus, the total
* record length of each edge is 8 bytes.
* OUT Files: None.
*
* Abbreviations:
* "Garray" - Refers to the array used to hold the graph.
* "OAlist" - Refers to a single "Overlayed Adjacency List" in the Garray. Each
* OAlist corresponds to a single connected component of the graph.
***************************************************************************)
Program Graph_Array_Decomposition_Demo;
uses Dos;
const
cMaxVertex = 10000; (* Graph must have less than 10,001 vertices.*)
cEmpty = cMaxVertex + 1; (* Use invalid vertex value as "empty" flag. *)
type
tVertex = longint; (* Vertices are 4-byte values. *)
tEdge = record (* An edge is comprised of 2 vertices. *)
a,
b :tVertex;
end;
var
in_file :file of tEdge;
edge :tEdge;
Garray :array[0..cMaxVertex] of tVertex;
max_vertex,
temp,
a,b :tVertex;
A_Seen,
B_Seen,
Same_Set :boolean;
total,
result, i :integer;
begin
(* Print the title. *)
writeln('GAD.Exe 1.0 Copyright (c) 1990 by Edward Allburn');
writeln('------------------------------------------------');
(* Get the max vertex value from the command line. *)
val(paramstr(1), max_vertex, result);
if (result <> 0) or (paramcount <> 1) then begin
writeln(' This program demonstrates the Graph Array Decomposition ');
writeln('algorithm described in the NOV ''90 issue of "Dr. Dobb''s Journal".');
writeln('It uses the algorithm to determine how many connected components ');
writeln('a graph is comprised of. Both this program and the algorithm it ');
writeln('demonstrates are released into the public domain. ');
writeln;
writeln('Usage: GAD NNN ');
writeln('Where: NNN = Max vertex value of graph (ok if > actual max val).');
halt(255);
end
else if max_vertex > cMaxVertex then begin
writeln('Max vertex valued allowed is ', cMaxVertex);
halt(255);
end;
(* Initialize array & open file. *)
total := 0;
for i:=0 to cMaxVertex do Garray[i] := cEmpty;
assign(in_file, 'GAD.In');
reset(in_file);
(* Use Graph Array Decomposition to determine if the graph is connected. *)
repeat
(* Read next edge & determine if vertices have been seen before *)
Read(in_file, edge);
with edge do begin
if (Garray[a] <> cEmpty) then A_Seen := TRUE else A_Seen := FALSE;
if (Garray[b] <> cEmpty) then B_Seen := TRUE else B_Seen := FALSE;
if NOT(A_Seen OR B_Seen) then begin {create a new set.}
Garray[a] := b;
Garray[b] := a;
total := total + 1;
end
else if A_Seen AND(NOT B_Seen) then begin {append B to A's set.}
Garray[b] := Garray[a];
Garray[a] := b;
end
else if B_Seen AND(NOT A_Seen) then begin {append A to B's set.}
Garray[a] := Garray[b];
Garray[b] := a;
end
else begin
{Determine if both vertices are already in the same set.}
temp := Garray[a];
while ((temp <> a) AND (temp <> b)) do
temp := Garray[temp];
Same_Set := (temp = b);
if NOT Same_Set then begin
(* Merge the two sets into a single set *)
temp := Garray[a];
Garray[a] := Garray[b];
Garray[b] := temp;
total := total - 1;
end;
end;
end;
until eof(in_file);
close(in_file);
writeln('Total connected components = ', total);
if total = 1 then
writeln('The graph is CONNECTED.')
else
writeln('The graph is NOT connected.');
end.
(**************************** End of GAD.Pas ********************************)
[LISTING TWO]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; GAD.Asm
; Program: GAD.Exe
; Author: Edward Allburn (September, 1990)
; Compiler: Phar Lap 386|ASM with Phar Lap 386|LINK
; Build Cmds: 386asm GAD.Asm -FullWarn -80386P
; 386link GAD -FullWarn -80386 -maxdata 0
;
; Description: This program demonstrates the Graph Array Decomposition
; algorithm described in the JAN '91 issue of "Dr. Dobb's Journal".
; It uses the algorithm to determine how many connected components
; a graph is comprised of. Both this program and the algorithm it
; demonstrates are released into the public domain.
;
; Usage: GAD NNN
; Where: NNN = Max vertex value of graph (ok if > than actual max val).
;
; IN Files: "GAD.In" - List of edges that make up the graph. Each vertex
; of an edge is a 4-byte DWORD. Thus, the total
; record length of each edge is 8 bytes.
; OUT Files: None.
;
; Abbreviations:
; "Garray" - Refers to the array used to hold the graph.
; "OAlist" - Refers to a single "Overlayed Adjacency List" in the Garray. Each
; OAlist corresponds to a single connected component of the graph.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ASSUME CS:_CodeSeg, DS:_DataSeg, ES:_DataSeg, SS:_StackSeg
_StackSeg SEGMENT PARA STACK USE32 'STACK'
db 10240 dup (?) ; A 10K stack is more than enough.
_StackSeg ENDS
_DataSeg SEGMENT PARA PUBLIC USE32 'DATA'
; Global Constants
cEmpty equ 0ffffffffh ; Signifies that a Garray element is empty.
cEOF equ 1 ; Signifies that current rec is last in file.
cFatalError equ 255 ; Return code of program when error occurred.
cBell equ 7 ; Misc. string constants...
cCR equ 0dh
cLF equ 0ah
cEndOfStr equ '$'
; Global Data
maxVertexVal dd 0 ; Max vertex value of graph.
inFile dw ? ; File handle of input file.
buffSize dd ? ; File buffer size, in bytes.
bytesInBuff dd ? ; Num bytes of input data in file buffer.
buffEOF db cEOF - 1 ; Contains cEOF when End Of File reached.
_DataSeg ENDS
_CodeSeg SEGMENT PARA PUBLIC USE32 'CODE'
Main PROC NEAR
call ProcCmdLine ; Get the max vertex value from the command line.
call AllocMemory ; Allocate memory for the Garray and file buffer.
call OpenInputFile
call ReadGraph ; Read/Decompose the graph from the input file.
call PrintResults ; Print the total # of connected components in graph.
Main ENDP
mPrn MACRO msg
; Outputs the "$" terminated string to std out.
; in: msg - Is a "$" terminated string.
; out: nothing.
; dstryd: ax, edx
mov ah, 09h
mov edx, offset &msg&
int 21h
ENDM
mHalt MACRO msg, returnCode
; Displays the halt message to the user and halts the program.
; in: msg - Is a "$" terminated string containing the message.
; returnCode - Return code the program is halted with.
; out: nothing.
; dstryd: ax
mPrn &msg& ; Print the halt message.
mov ah, 4Ch ; Halt the program
mov al, &returnCode& ; with the specified return code.
int 21h
ENDM
ProcCmdLine PROC NEAR
; Processes the command line, displaying a help screen if the command line is
; not valid.
; in: nothing.
; out: maxVertexVal - Contains max vertex value of the graph in "GAD.In"
; dstryd: eax, ebx, ecx, edx
jmp #lStart ; Jump past local data to the start of the procedure.
titleMsg db cCR
db'GAD.Exe 1.0 Copyright (C) 1990 by Edward Allburn '
db cCR, cLF
db'------------------------------------------------ '
db cCR, cLF, cEndOfStr
helpMsg db cCR
db'Desc: This program demonstrates the Graph Array Decomposition '
db cCR, cLF
db' algorithm described in the NOV ''90 issue of "Dr. Dobb''s Journal".'
db cCR, cLF
db' It uses the algorithm to determine how many connected components '
db cCR, cLF
db' a graph is comprised of. Both this program and the algorithm it '
db cCR, cLF
db' demonstrates are released into the public domain. '
db cCR, cLF
db cCR, cLF
db' Use: GAD NNN '
db cCR, cLF
db'Where: NNN = Max vertex value of graph (ok if > than actual max val). '
db cCR, cLF
db cCR, cLF
db' IN: "GAD.In" - List of edges that make up the graph. Each '
db cCR, cLF
db' vertex of an edge is a 4-byte DWORD. Thus, the '
db cCR, cLF
db' total record length of each edge is 8 bytes. '
db cCR, cLF
db' OUT: Nothing. '
db cCR, cLF, cEndOfStr
cmdEnd dd ?
#lStart:
; Find the command line in the Disk Transfer Area (DTA).
mPrn titleMsg ; Display the title.
mov ah, 2Fh
int 21h ; Get the current DTA.
mov ecx, 0
mov cl, es:[ebx] ; Determine how many chars were entered on command line.
cmp cl, 2 ; Were less than 2 chars entered on command line?
jl lShowHelp ; Yes, obviously wrong so show help screen.
; Verify that a single, unsigned value was entered on the command line.
add ecx, ebx ; Determine the end of the command line.
mov ds:[cmdEnd], ecx
mov eax, 0
inc ebx ; Skip 1st blank of the command line.
mov ecx, 0
lNextDigit:
; Get the next char & verify that it is a valid digit ['0'..'9'].
inc ebx ; Advance to the next char in the command line.
mov cl, es:[ebx] ; Get the next char.
sub cl, '0' ; Convert the char to a digit.
cmp cl, 9 ; Is this a valid digit?
ja lShowHelp ; No, show the help screen.
; Append the digit to the running total.
mov edx, 10 ; Make room for new digit in 1's position,
mul edx ; by multiplying the total by 10.
add eax, ecx ; Append the digit to the total.
cmp ebx, ds:[cmdEnd] ; Was this char the last one of the command line?
jl lNextDigit ; No, process the next digit.
; Save the max vertex value of the graph & return.
mov ds:[maxVertexVal], eax
ret
lShowHelp:
mHalt helpMsg, 0
ProcCmdLine ENDP
AllocMemory PROC NEAR
; Allocate & initialize memory for the Garray. Then allocate all of the
; remaining memory for use as a file buffer.
; in: maxVertexVal - Contains max vertex value of the graph.
; out: buffSize - Size of file buffer, in bytes.
; FS - Points to the file buffer.
; GS - Points to the Garray.
; dstryd: eax, ebx, ecx, edx, es, edi
jmp #lStart ; Jump past local data to the start of the procedure.
allocStartMsg db ' Allocating/Initializing memory...', cEndOfStr
allocFinishMsg db ' Finished.', cCR, cLF, cEndOfStr
allocFailMsg db ' FAILED! Not enough memory.', cBell, cCR, cLF, cEndOfStr
#lStart:
; Calculate the size of the array needed to hold the entire graph.
; Garray size in bytes = (maxVertexVal + 1) * 4
mPrn allocStartMsg
mov ebx, ds:[maxVertexVal]
inc ebx ; +1 to allow for 0..maxVertexVal.
inc ebx ; +1 again to allow for sentinel.
shl ebx, 2 ; *4 each vertex needs 4 bytes of memory.
; Allocate the array.
mov ah, 48h
shr ebx, 12 ; Convert the array size into 4K (4096) pages.
inc ebx ; Add 1 more page as a safety margin.
int 21h
jc lAllocFailed
; Save a pointer to the Garray & initialize it with "cEmpty".
mov gs, ax ; Save a pointer to the Garray.
mov ecx, ds:[maxVertexVal] ; Load the counter with # vertices to scan.
inc ecx ; +1 to allow for 0..maxVertexVal.
inc ecx ; +1 again to allow for sentinel.
mov es, ax ; Set up ES & EDI to scan from
mov edi, 0 ; the start of the Garray
cld ; forward,
mov eax, cEmpty ; filling it with "cEmpty".
rep stosd ; Fill the Garray.
; Allocate all of the remaining memory for a file buffer.
mov ah, 48h ; Determine how much memory is left.
mov ebx, 0ffffffffh
int 21h
mov ah, 48h ; Claim all of it for the file buffer.
int 21h
jc lAllocFailed
; Save pointer to file buffer as well as determine/save its size (in bytes).
mov fs, ax ; Save a pointer to the file buffer.
shl ebx, 12 ; Convert the 4K pages to bytes.
sub ebx, 8 ; Leave room for 1 extra record.
mov ds:[buffSize], ebx ; Save for future reference.
; Announce that this routine has finished & return.
mPrn allocFinishMsg
ret
lAllocFailed: mHalt allocFailMsg, cFatalError
AllocMemory ENDP
OpenInputFile PROC NEAR
; Opens the input file & loads the first buffer's worth of input data.
; in: Nothing.
; out: inFile - Contains handle of opened input file.
; dstryd: ax, cx, edx
jmp #lStart ; Jump past local data to the start of the procedure.
cNormalFile equ 0
cWriteOnly equ 1
cReadWrite equ 2
inFileName db 'GAD.In', 0
openStartMsg db 'Loading first input file buffer...', cEndOfStr
openFinishMsg db ' Finished.', cCR, cLF, cEndOfStr
openFailMsg db 'Could not open input file "GAD.In".'
db cBell, cCR, cLF, cEndOfStr
#lStart:
; Open the input file & save its handle.
mPrn openStartMsg
mov ah, 3dh
mov al, cNormalFile
mov edx, offset inFileName
int 21h
jc lOpenFailed
mov ds:[inFile], ax ; Save the file handle
; Load the first block of input data into the file buffer & return.
call BuffLoad
mPrn openFinishMsg
ret
lOpenFailed: mHalt openFailMsg, cFatalError
OpenInputFile ENDP
ReadGraph PROC NEAR
; Read the graph from the input file, decomposing it while keeping count of the
; total connected components along the way.
; NOTE: Vertex values greater than the "maxVertexVal" specified on the
; command line are NOT checked for.
; in: buffEOF - Does not equal cEOF.
; GS - Points to the "cEmpty"-initialized Garray.
; out: buffEOF - Equals cEOF.
; GS - Points to the Garray containing the decomposed graph.
; edi - Total number of connected components of the graph.
; dstryd: eax, ebx
jmp #lStart ; Jump past local data & macros to the start of the procedure.
readStartMsg db ' Reading/Decomposing graph...', cEndOfStr
readFinishMsg db ' Finished.', cCR, cLF, cEndOfStr
mReadEdge MACRO
; Reads the next edge (i.e., record) from the file buffer.
; in: bytesInBuff - Contains the number of bytes in the file buffer.
; esi - Buffer pointer to the next edge.
; FS - Points to the file buffer.
; out: buffEOF - Equals cEOF if edge being returned is last in file.
; esi - Buffer pointer to next edge.
; eax - A vertex value of edge.
; ebx - B vertex value of edge.
; dstryd: Nothing.
; Read the next edge, advancing the buffer pointer.
mov eax, fs:[esi]
mov ebx, fs:[esi + 4]
add esi, 8
; If this edge is the last one in the buffer, load the next buffer.
cmp esi, ds:[bytesInBuff] ; Last edge in file buffer?
jb lReadEdgeEnd ; No, just return.
call BuffLoad ; Yes, load the new buffer.
cmp ds:[bytesInBuff], 0 ; Is the new buffer empty?
jg lReadEdgeEnd ; No, just return.
mov ds:[buffEOF], cEOF ; Yes, last edge in file, so set EOF flag.
lReadEdgeEnd:
ENDM
mNeitherSeen MACRO
; Neither A nor B seen before. Create a new OAlist.
; in: GS - Points to the Garray.
; eax - A vertex (NOT seen before).
; ebx - B vertex (NOT seen before).
; edi - Total connected components of the graph thus far.
; out: GS - The Garray has been updated with the new OAlist.
; edi - One more connected component has been added to the total.
; dstryd: Nothing.
mov gs:[eax], ebx ; Point A to B.
mov gs:[ebx], eax ; Point B back to A.
inc edi ; Increment total connected components.
ENDM
mOnlyAseen MACRO
; A seen before, so append B to A's OAlist by doing a standard linked-list
; insertion.
; in: GS - Points to the Garray.
; eax - A vertex (seen before).
; ebx - B vertex (NOT seen before).
; out: GS - The B vertex has been appended to the A vertex's OAlist.
; dstryd: ecx
mov ecx, gs:[eax]
mov gs:[ebx], ecx ; Point B to what A is currently pointing to.
mov gs:[eax], ebx ; Point A to B.
ENDM
mOnlyBseen MACRO
; B seen before, so append A to B's OAlist by doing a standard linked-list
; insertion.
; in: GS - Points to the Garray.
; eax - A vertex (NOT seen before).
; ebx - B vertex (seen before).
; out: GS - The A vertex has been appended to the B vertex's OAlist.
; dstryd: ecx
mov ecx, gs:[ebx]
mov gs:[eax], ecx ; Point A to what B is currently pointing to.
mov gs:[ebx], eax ; Point B to A.
ENDM
mBothSeen MACRO
; If A & B aren't already in the same OAlist, merge their OAlists.
; in: GS - Points to the Garray.
; eax - A vertex (seen before).
; ebx - B vertex (seen before).
; edi - Total connected components of the graph thus far.
; out: GS - The 2 OAlists have been merged into a single OAlist.
; edi - One connected component has been subtracted from the total.
; dstryd: ecx, edx
; Determine if vertex B is already in vertex A's OAlist.
mov ecx, eax ; Save starting place in OAlist.
lTestNextVertex:
mov eax, gs:[eax] ; Get the next vertex in A's OAlist.
cmp eax, ebx ; Is B already in A's OAlist?
je lDoNothing ; Yes, so just return.
cmp eax, ecx ; Have we come full circle thru A's OAlist?
jne lTestNextVertex ; No, so see if the next vertex is B.
; Merge the 2 OAlists by swaping the pointers.
mov ecx, gs:[eax]
mov edx, gs:[ebx]
mov gs:[eax], edx
mov gs:[ebx], ecx
dec edi ; Decrement total connected components.
lDoNothing:
ENDM
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Start of procedure ReadGraph
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lStart:
mPrn readStartMsg ; Anounce that the routine has started.
mov edi, 0 ; Initialize total connected components.
lReadNextEdge:
mReadEdge ; Returns A in EAX and B in EBX.
shl eax, 2 ; *4 to convert vertex values to their
shl ebx, 2 ; actual byte offsets into the Garray.
; Determine if A and/or B have been seen before & call appropriate routine.
cmp dword ptr gs:[eax], cEmpty ; Was A seen before?
jne lBothPossible ; Yes, it's possible both have been seen.
cmp dword ptr gs:[ebx], cEmpty ; No, but was B seen before?
jne lOnlyB ; Yes, so only B was seen before.
mNeitherSeen ; No, neither has been seen before.
jmp lTestEOF
lOnlyB:
mOnlyBseen ; Only B was seen before.
jmp lTestEOF
lBothPossible:
cmp dword ptr gs:[ebx], cEmpty ; Was B seen before?
jne lBoth ; Yes, so both A & B were seen before.
mOnlyAseen ; No, only A was seen before.
jmp lTestEOF
lBoth:
mBothSeen
lTestEOF:
cmp ds:[buffEOF], cEOF ; Are we at EOF?
jne lReadNextEdge ; No, so process the next edge.
; Anounce that the routine has finished & return.
mPrn readFinishMsg
ret
ReadGraph ENDP
BuffLoad PROC NEAR
; Loads the next buffer's worth of data from disk into the file buffer.
; in: inFile - File handle of the input file.
; buffSize - Size of buffer, in bytes.
; FS - Points to the file buffer.
; esi - Buffer offset of next record location.
; eax - A vertex of last record in previous buffer.
; ebx - B vertex of last record in previous buffer.
; out: bytesInBuff - Contains the number of bytes actually in file buffer.
; esi - Points to the first rec in the file buffer.
; dstryd: Nothing.
jmp #lStart ; Jump past local data to the start of the procedure.
readFailMsg db 'Disk read error.', cBell, cCR, cLF, cEndOfStr
#lStart:
; Load the next buffer from disk.
pushad
mov ah, 3fh
mov bx, ds:[inFile]
mov ecx, ds:[buffSize]
mov dx, fs
push ds
mov ds, dx
mov edx, 0
int 21h
pop ds
jc lReadError
; Resore the current record & return.
mov ds:[bytesInBuff], eax ; Save number of bytes actually in buffer.
popad ; Restore registers.
mov esi, 0 ; Re-set the pointer to start of the buffer.
ret
lReadError:
mHalt readFailMsg, cFatalError
BuffLoad ENDP
PrintResults PROC NEAR
; Closes the input & output files.
; in: inFile - Contains handle of opened input file.
; edi - Total number of connected components of the graph.
; out: Nothing.
; dstryd: ax, bx
jmp #lStart ; Jump past local data to the start of the procedure.
totalMsg db cCR, cLF
db 'Total connected components = ', cEndOfStr
connectMsg db cCR, cLF
db 'The graph is CONNECTED.', cCR, cLF, cEndOfStr
notConnectMsg db cCR, cLF
db 'The graph is NOT connected.', cCR, cLF, cEndOfStr
mPrnEDI MACRO
; Prints the number in EDI.
; in: edi - Number to be printed.
; out: edi - Number to be printed.
; dstryd: eax, ebx, ecx, edx
; Push each digit onto the stack.
mov eax, edi
mov edx, 0
mov ebx, 10
mov ecx, 0
lGetNextDigit:
inc ecx
div ebx ; Determine the next digit.
push edx ; Save the digit.
mov edx, 0
cmp eax, 0 ; Was the last digit just processed?
jg lGetNextDigit ; No, get the next one.
; Pop each digit off the stack & print the ASCII version of it.
lPrnNextDigit:
pop edx ; Get the next digit.
add dl, '0' ; Convert the digit to ASCII.
mov ah, 02h ; Print it.
int 21h
loop lPrnNextDigit ; If any digits left, print the next one.
ENDM
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Start of procedure PrintResults
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lStart:
; Close the input file.
mov ah, 3eh
mov bx, ds:[inFile]
int 21h
; Print the results.
mPrn totalMsg
mPrnEDI
cmp edi, 1
jg lNotConnected
mHalt connectMsg, 0
lNotConnected:
mHalt notConnectMsg, 0
PrintResults ENDP
_CodeSeg ENDS
END Main
;;;;;;;;;;;;;;;;;;;;;;;;;;;;; End of GAD.Asm ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[GRAFDUMP.PAS]
(***************************************************************************
* GrafDump.Pas
* Program: GrafDump.Exe
* Author: Edward Allburn
* Compiler: Turbo Pascal 5.0
*
* Description: Displays list of edges in the file "GrafSimp.In".
*
* Usage: GrafDump
* Where: Nothing.
*
* IN Files: "GAD.In" - List of edges that make up the graph. Each vertex
* of an edge is a 4-byte DWORD. Thus, the total
* record length of each edge is 8 bytes.
* OUT Files: None.
*
* Abbreviations:
* None.
***************************************************************************)
Program GrafDump;
uses Dos;
type
tVertex = longint; (* Vertices are 4-byte values. *)
tEdge = record (* An edge is comprised of 2 vertices. *)
a,
b :tVertex;
end;
var
in_file :file of tEdge;
edge :tEdge;
begin
(* Print the title. *)
writeln('GrafDump.Exe 1.0 Copyright (c) 1990 by Edward Allburn');
writeln('-----------------------------------------------------');
(* Print the list of edges contained in the file. *)
assign(in_file, 'GAD.In');
reset(in_file);
repeat
Read(in_file, edge);
with edge do writeln(a, ',', b);
until eof(in_file);
close(in_file);
end.
(************************** End of GrafDump.Pas *****************************)
[GAD.DOC]
----------------------------------------------------------------------------
- GAD.Doc
- Program: GAD.Exe
- Author: Edward Allburn (September, 1990)
-
- Description: This program demonstrates the Graph Array Decomposition
- algorithm described in the JAN '91 issue of "Dr. Dobb's Journal".
- It uses the algorithm to determine how many connected components
- a graph is comprised of. Both this program and the algorithm it
- demonstrates are released into the public domain.
-
- Usage: GAD NNN
- Where: NNN = Max vertex value of graph (ok if > than actual max val).
-
- IN Files: "GAD.In" - List of edges that make up the graph. Each vertex
- of an edge is a 4-byte DWORD. Thus, the total
- record length of each edge is 8 bytes.
- OUT Files: None.
----------------------------------------------------------------------------
FILES IN "GAD.Zip":
------------------------
GrafDump.Exe - Displays all of the edges in "GAD.In" to StdOut.
GrafDump.Pas - Pascal source (Turbo Pascal 5.0) for GrafDump.Exe.
GAD.Asm - Assembly source (Phar Lap 386|ASM) for GAD.Exe.
GAD.Doc - This file.
GAD.Exe - Compiled from Turbo Pascal 5.0 version of program.
GAD.In - Sample input file of graph shown in figure 7a of article.
GAD.Pas - Pascal source (Turbo Pascal 5.0) that accompanied article.
USE OF "GAD.Exe":
-----------------
The file "GAD.In" must exist in the current directory. The max vertex
value in this file must be specified on the command line. This value is used
to determine how large of an array to allocate. No harm is done if a value
greater than the actual max vertex value is specified. For example, the sample
input file supplied is the graph shown in figure 7a of the article which has a
max vertex value of 8. To process this graph, the command line would read:
"GAD 8"
A help screen is displayed if no/invalid paramaters are specified.
The program has an overhead of about 100K of memory. Beyond that, the
memory requirements for this program depend on the max vertex value specified
on the command line. Because each vertex is 4 bytes in size, the amount of
memory required for the array is (max vertex value * 4) bytes. For example,
the total amount of memory required by the program for the previous example is:
100K + (8 * 4) = 100.032K
The total memory required by a graph 1 million vertices in size will be about
4.1 megabytes. After memory for the array has been allocated, all of the
remaining memory on the system is allocated for use as a file buffer.
FILE RECORD STRUCTURE:
----------------------
The input file "GAD.In" contains the list of edges that make up the
graph. Each vertex of an edge is an unsigned DWORD (four bytes). Thus, the
total record length of each edge is eight bytes. The following type
definitions were taken from the file "GAD.Pas" and illustrate the input
file layout:
type
tVertex = longint; (* Vertices are 4-byte values. *)
tEdge = record (* An edge is comprised of 2 vertices. *)
a,
b :tVertex;
end;
YOUR FEEDBACK:
--------------
I am interested in hearing any suggestions or observations you have. You
can contact me via my CompuServe account 72617,2624 or by my mailing me at:
4830 S. Salida Ct.
Aurora, CO 80015
-------------------------------- End of GAD.Doc ------------------------------
Copyright © 1991, Dr. Dobb's Journal