Wilbon Davis holds a B.S. and an M.S. in physics and mathematics from Southwest Texas State University, and is a professor in the Computer Science department of SWTSU. He also teaches a wide range of computing topics in industry. He can be reached by mail at SWTSU, 601 University Drive, San Marcos, TX 78666-4616, or by Internet at wd01@WTEXAS.bitnet.
Introduction
Precise formulas for estimating run time, memory consumption, I/O traffic, and other values of interest would make choosing among algorithms simpler, but such formulas are rarely available when needed. Fortunately, there is a way to make crude but useful estimates to guide these choices. You estimate the cost function associated with each algorithm to see how its execution time grows with problem size.

By way of example, I will present the evolution of a sorting program and the performance gains that arise from several refinements. Then I will discuss a general strategy for estimating performance differences among choices of algorithms.
The Bubble Sort
One of my favorite instructional films, "Sorting Out Sorting" [4], points out that bubble sort is an algorithm whose fame exceeds its virtue. However, two things have kept bubble sort from disappearing from my computing lexicon: bubble sort is easily understood, and it provides an excellent example when discussing the efficiency of algorithms.

Bubble sort works simply. You repeatedly scan a list to be ordered, comparing adjacent items. When the items in a pair are out of order, swap them. When a scan results in no swaps, the list is sorted. C code for a simple version of bubble sort appears in Listing 1.
I compiled the code in Listing 1 without optimization and for 16-bit operation with the Zortech C compiler. The second row of Table 1 contains the results of timing the program as it sorted various sizes of arrays of randomly generated integers.
Sorting the smaller arrays went quickly, but the array of 8,000 elements took 277.6 seconds, a bit over 4.5 minutes and a bit beyond my patience. Look at the second row of Table 1. Note that the time increased by a factor of 4.08 going from 1,000 to 2,000 items and increased by a factor of 3.99 in going from 2,000 to 4,000 items. When going from 4,000 to 8,000 items, the time increased by a factor of 277.60/67.95, or 4.09. It seems reasonable to guess that doubling the size of the array produces about a four-fold increase in run time.
As a matter of fact, a good predictor of bubble sort's performance is to increase sort time by the square of the ratio of the numbers of items in the arrays. The data in the table nicely fits a quadratic, giving
Bubble_time(n) = .878 - .972n + 4.445n^2

where n is the list size in thousands, as an approximation of the sort time in seconds. This formula is an example of a cost function. That is, Bubble_time is a function defined on the natural numbers and is never negative. In this case, the cost of sorting was measured in seconds, but many other measures, such as bytes of memory or dollars, might be appropriate.
If you step through the code in Listing 1, you will soon see that after the first scan, the largest item in the list will be in its permanent position at the end of the list. Likewise, after the second scan, the next largest element will be in the next-to-last position, and so on.
The code in Listing 2 incorporates this idea of scans of decreasing lengths, and yields a program that is almost twice as fast as the first version. However, the results in Table 1 show the same quadrupling of execution time with each doubling of array size. The data movements have not changed; only some unneeded comparisons are eliminated.
Tuning the Bubble Sort
To test the effectiveness of improved compiler technology, I invoked Zortech's global optimization feature and recompiled the code in Listing 2 to run under a 32-bit DOS extender. The resulting program is indeed faster, as the fourth column of Table 1 indicates. The optimized version consistently runs in slightly less than half the time required by the unoptimized 16-bit version.

So far, the various combinations of code and compilation strategies have yielded ratios of 1 : .49 : .20 for the run times of the three versions discussed.
Other ideas come to mind: use a faster machine, switch to assembly language, even employ a better programmer. But no matter which of these ploys is used, the formula for the time will always contain a term with a factor of n^2. Thus, while these measures can affect the constant factors in the formula, none will prevent a doubling of problem size from causing an approximate quadrupling of computation time.
The difficulty is that without a fundamental change in the algorithm, the same items are swapped in the same sequence. Experimenting with paper and pencil shows one problem to be that items which are far out of position may require a large number of passes to finally come to rest.
The Shell Sort
The code in Listing 3 modifies bubble sort using Shell's idea of giving elements that are far apart special consideration. The code first rearranges the array so that items that are far apart, on the order of half the array length, are in the correct order. After each execution of the inner bubble-sort loop, the distance between compared items is halved. After several passes, the distance between compared elements becomes 1 and the algorithm becomes identical to the original bubble sort. By then, however, most of the large data movements have already taken place.

Clearly, the code in Listing 3 involves extra overhead. Is the overhead worthwhile? Judge for yourself by examining the results of repeating the bubble sort experiments with Shell sort in Table 2. (Using the formulas in the last row of Table 1, compute the values omitted in the table and you will understand my reluctance to fill in the blank spots experimentally.)
As you can see, the speed increase is remarkable. Exchanging a pair of items that are far apart takes no longer than exchanging an adjacent pair, yet a single distant exchange replaces many adjacent exchanges. Thus, Shell sort has a fundamentally different response to changes in problem size than bubble sort. This change is reflected in the approximation formula for the run time:
Shell_time(n) = .210n^1.351

Cost Functions
A single term in each of the cost functions considered so far dominates the function's values for large values of n. In the expression
Fastest_bubble_time(n) = .941 - .672n + .959n^2

for the fastest of the bubble sort versions, the term .959n^2 is already over 100 times the size of the next largest term when n equals 70. So, using
Fastest_bubble_time(n) = .959n^2

will not be terribly inaccurate for large values of n.

Consider 4.57n^0.649, the ratio of the approximate times for the fastest bubble sort and Shell sort (.959n^2 divided by .210n^1.351). Whether the coefficient is 4.57 or any other positive value, the ratio becomes greater than 1 for large enough n. No matter how fast your machine, no matter how bright your programmer, no matter what language you use, if you use bubble sort for large enough problems, you will lose a sorting race to someone using Shell sort programmed in interpreted BASIC on a 4.77MHz PC.
Fortunately, you have ready access to information to take advantage of this lesson. Simplified cost functions can be derived from a pseudocode design as easily as from code. The cost functions for most well-known algorithms are easily found in references. These simplified cost functions can give guidance about choosing among alternative algorithms very early in the design process as well as later when code is written.
The topic of estimating cost functions for algorithms is usually called the study of algorithmic complexity. (Remember that time complexity is the context of this article, but the principles apply equally well to other measures of cost.) Other phrases that are associated with the study are "asymptotic complexity," "order analysis," and "order arithmetic."
Functional Dominance
The key to isolating the portion of the cost function that is related to the nature of the algorithm from the other factors lies in the idea of functional dominance. Simply put, given a cost function f, by what simpler function can it be replaced while ensuring that the replacement does not underestimate the cost?

Suppose an algorithm X has as its cost function
12n^2 + 3n + 5

Can this cost function be replaced by a term Cn^2, where C is a constant? Simplicity is in the eye of the beholder, but assume that this single term satisfies that condition. A little algebra applied to the inequality
12n^2 + 3n + 5 < Cn^2

yields
12 + 3/n + 5/n^2 < C

and lets us see how C depends on n. If I substitute 1 for n, I find that choosing C = 21 will make the inequality
12n^2 + 3n + 5 < 21n^2

true whenever n, the problem size, is 1 or greater. So 21n^2 is an approximation to the cost function of X for any positive n, and the approximation never underestimates. Of course, the approximation overestimates the run times somewhat. Suppose we say that "large enough" starts at n = 100 instead of at 1. Then the inequality will be true for C > 12.0305 provided n > 100. For larger problems, 12.0305n^2 is certainly a better approximation than 21n^2. However, even approximate values of C and n are rarely needed. For my purposes, the important portion of the simplified cost function is the n^2 part.
"Big O" Notation
Formally, the discovery of values for C and n in the previous paragraph has satisfied the definition of functional dominance. A cost function f is dominated by a cost function g if and only if for some positive constant C and for all sufficiently large n, f(n) < Cg(n). The notation O(g) (read "big Oh of g") denotes the set of all functions that are dominated by the cost function g. So,
12n^2 + 3n + 5 ∈ O(n^2)

expresses the notion that n^2 dominates the left-hand side. Often,
f ∈ O(g)

is read as "f is of order g," or "f is O(g)." When two cost functions f and g have the reciprocal property that f ∈ O(g) and g ∈ O(f), either of the functions can be used to estimate the response of the other to an increase in problem size. For example,
n^2 ∈ O(12n^2 + 3n + 5)

and
12n^2 + 3n + 5 ∈ O(n^2)

Thus for large n, the ratios of the squares of the problem sizes estimate the ratios of the costs of executing an algorithm. If n1 = 1000 and n2 = 1500, the ratio of the respective run times should be about
1500^2/1000^2 = 2.25

The problems of the precision of the approximation and the problem sizes covered seldom cause much trouble. Furthermore, the exact value of the constant C is not important. In the first place, C depends on the environment rather than solely on the algorithm. And in the second place, when the approximations are used to estimate the effects on ratios of run times for problems of various sizes, the constant C divides out anyway.
Handy Rules
Several theorems, or handy rules if you prefer, make simplifying a cost function easier. For example, any polynomial whose highest power is k is of order O(n^k). That is, a_k n^k + a_(k-1) n^(k-1) + ... + a_1 n + a_0 is of order n^k. So, when the cost function for an algorithm is a polynomial of degree k, then for "large" problems, an m-fold increase in problem size will result in about an m^k-fold increase in run time.

Given two cost functions f and g, if f dominates g or vice versa, denote the dominant function of the pair by dom(f,g). Then some other helpful rules are:
1. The order of f + g is O(dom(f,g)).
2. If k is a constant, the order of kf is O(f).
3. The order of fg is O(f)O(g).
These theorems can be used to simplify cost functions readily.
The expressions in the above theorems occur naturally in the analysis of code or pseudocode. For example, suppose two sections of code S1 and S2 are executed in sequence, the time to execute S1 is f(n), and the time to execute S2 is g(n). Then, the time to execute
S1; S2;

is simply f(n) + g(n). By rule 1, the order of f(n) + g(n) is O(dom(f(n),g(n))), so that is also the order of the above sequence.
Rules for Code Sequences
Other rules for deriving complexity estimates for code are illustrated in the following:
Rule:

    if (cond)   /* assume time to evaluate cond doesn't depend on n */
        S1;     /* S1 ∈ O(f) */
    else
        S2;     /* S2 ∈ O(g) */

O(dom(f,g)) is the order of the if-then-else statement.
Rule:

    while (cond)    /* loop is executed O(g) times */
        S1;         /* O(f) is the order of the body of the loop */

O(fg) is the order of the entire loop.

Now for a real example. Consider the following pseudocode for a selection sort, where x is an array of n values:
    for i = 1 to n-1
        find the index j of the smallest value
            in the array between positions i and n-1;
        interchange the items at positions j and i;
    endfor;

Note that the loop is executed n-1 times, which is O(n). The section that searches the array for the smallest value will search a different-sized segment on each iteration. The longest segment searched is (n-1) elements long and the shortest has length 2. So, the average length of the segments searched is (n+1)/2, which is of order O(n). Since code of order O(n) is inside a loop that is executed O(n) times, the order of the algorithm is O(n^2).
Best and Worst Case Behavior
The previous analyses have assumed "average" behavior. However, algorithms can also be analyzed for worst-case or best-case behavior. For example, bubble sort is superb if used to sort an already sorted array. The algorithm discovers no swaps on its first pass and exits, so its order in that case is O(n). Quicksort with naively chosen pivots is of order O(n^2) for previously sorted arrays and so can be a very poor choice in some circumstances.

Some common algorithms and their average-case orders are given in Table 2.
As a final example, look at the flowchart in Figure 1 of a module that contains four other modules. The numbers on the lines indicate the number of times that path is traversed. The question is, where should limited resources be concentrated to improve the performance of the code for large problems?
To find the bottleneck, compute the contribution of each module to the overall order of the code. The contribution to the if-then-else construct by A is O(n × n) = O(n^2). B contributes O(n^2 × n) = O(n^3) to the if-then-else. The dominant of the two orders is
dom(n^2, n^3) = O(n^3)

so the order of the if-then-else can be traced directly to B. C is executed O(n) times and contributes O(n^1.5) to the process. But C's contribution is dominated by B's. D is executed O(n) times and has order O(n log(n)), so its contribution is O(n^2 log(n)). So, since
dom(n^3, n^2 log(n)) = O(n^3)

the culprit is B. To improve the efficiency of the program, work to decrease the order of B and to decrease the number of times B is executed.
When to Analyze
You have two opportunities to influence the execution time of a software product. The first, and perhaps most important, is during the design phase, when you choose the data structures and their accompanying algorithms to represent the objects in the application environment. Your primary tool at this point is complexity analysis as discussed here.

Your second opportunity is during the coding phase. Wisely choosing languages, hardware, and other such factors can have a profound effect on software performance. But my best advice is to resist the temptation to write code that sacrifices simplicity for speed. Strive instead for understandable, correct code. Then, when you have working code, use a profiler to determine the sections of code that contribute most to run time and work on those sections. Numerous studies have shown that the majority of the time to execute a program is spent in a small minority of the lines of code. Clearly, time spent recoding small sections of critical code for speed brings greater rewards than time used optimizing rarely executed code.
Remember that the phrase "for large problems" was used throughout the discussion. How large must a problem be before the above discussion is relevant? The answer depends on the problem, the algorithms considered, the machine, and the user of the program. The information gleaned from the order analysis of time complexity is almost always of great value, but there are occasional exceptions. For example, I suspect that if I had to maintain a small, sorted table of ten or so items in an operating system, I would find a simple insertion sort to be faster than Quicksort because of the overhead.
A brief example, admittedly extreme, will illustrate the relative benefits of compiler improvements, hardware speed improvements, recoding, and other such pursuits versus finding better algorithms. Suppose a problem of size 64 could be solved in 24 hours by each of two algorithms A and B, where the order of A is O(n^2) and the order of B is O(2^n). A machine is available that is 100 times faster than current equipment. How large a problem can now be solved in 24 hours? Using algorithm A, a problem of size 640 can be solved in a day. Using algorithm B, the capacity goes from 64 to only 70.
So a final lesson to be learned from this article is that improvements in hardware will never solve all computing problems. In some cases, only new algorithms will do.
References
1. Algorithms, Robert Sedgewick, Addison-Wesley, Reading, MA, 1984.
2. Discrete Mathematics in Computer Science, Donald Stanat and David McAllister, Prentice-Hall, Englewood Cliffs, NJ, 1977.
3. Handbook of Algorithms and Data Structures, G. H. Gonnet, Addison-Wesley, Reading, MA, 1984.
4. Sorting Out Sorting (film), Computer Systems Research Group, University of Toronto, 1981.
5. Writing Efficient Programs, Jon Bentley, Prentice-Hall, Englewood Cliffs, NJ, 1982.