A traveling salesman has to make sales calls in ten different cities, but only has one tank of gas. Is there a route that will allow him to visit all ten cities on that single tank of gas? See Figure 1.
This variation on the classic traveling-salesman problem is formally known as the "Hamiltonian circuit problem." You can think of the map as a connected graph, where the cities are nodes and the roads between them are edges. Is there a single path connecting all the nodes whose total edge length is less than a given amount?
The data structure for this problem is a matrix (see Figure 2), where element (A,B) is the distance between City A (Albuquerque) and City B (Bakatapur). The challenge is to find a set of array elements, such that there is exactly one element in each row and one element in each column, and such that the sum of all the elements is less than a given amount.
While this problem is easy to solve with only ten cities, it rapidly approaches the impossible as the number of cities increases because there's no efficient algorithm to solve the problem. The best known approach is to guess a solution and then refine the guess based on the results. In short, mathematics can't give an efficient algorithm for this problem.
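For a handful of cities, the brute-force guess-and-check approach is easy to write down. The sketch below (with an invented four-city distance matrix; the city count and distances are hypothetical) simply tries every ordering of the cities. The number of orderings grows factorially, which is why this approach collapses as the city count grows:

```python
# Brute-force search for the shortest route on a hypothetical 4-city
# instance of Figure 2's distance matrix. All distances are invented.
from itertools import permutations

INF = float("inf")

# dist[i][j] = distance between city i and city j; INF = no direct road.
dist = [
    [0,   120, INF,  90],
    [120,   0,  60, 200],
    [INF,  60,   0,  75],
    [90,  200,  75,   0],
]

def shortest_route(dist):
    """Try every ordering of the cities -- O(n!) routes in the worst case."""
    n = len(dist)
    best, best_route = INF, None
    for perm in permutations(range(1, n)):       # fix city 0 as the start
        route = (0,) + perm
        length = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if length < best:
            best, best_route = length, route
    return best, best_route

print(shortest_route(dist))  # (225, (0, 3, 2, 1)): 90 + 75 + 60
```

With ten cities there are 9! = 362,880 orderings to try; with thirty cities there are more orderings than a computer could ever enumerate.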
Unfortunately, mathematics can't prove that there isn't an efficient algorithm, either. The best a mathematician can say is, "I can't find an efficient algorithm, but neither can all these famous mathematicians."
Complexity theory provides a methodology for analyzing the computational complexity of different programming algorithms to solve problems such as that of the traveling salesman. Using complexity theory, you can compare the efficiency of different algorithms and determine which is faster.
The computational complexity of an algorithm is expressed in what is called "big O" notation, the order of magnitude of the computational complexity. The order of magnitude of the complexity is just the term of the complexity function that grows the fastest as n gets larger; all constant and lower-order terms are ignored.
For example, if the computational complexity of a given algorithm is 4n^2 + 7n + 12, then the computational complexity is on the order of n^2, expressed as O(n^2).
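You can watch the lower-order terms fade numerically. Assuming a hypothetical algorithm that performs exactly 4n^2 + 7n + 12 steps:

```python
# The lower-order terms of 4n^2 + 7n + 12 quickly become negligible,
# which is why only the fastest-growing term matters for big-O.
def steps(n):
    return 4 * n**2 + 7 * n + 12

for n in (10, 100, 10_000):
    print(n, steps(n) / n**2)   # the ratio settles toward the constant 4
```

At n = 10 the ratio is 4.82; by n = 10,000 it is 4.0007 — the 7n + 12 has vanished into the noise, and even the constant 4 is dropped in big-O notation.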
The advantage of measuring complexity in this way is that it is system independent. You don't have to know the exact timings of various instructions, the number of bits used to represent different variables, or even the speed of the processor. One computer might be 50 percent faster than another and a third might have a data path twice as wide, but the order-of-magnitude complexity of an algorithm remains the same. This isn't cheating; with algorithms as complex as these, such constant factors are negligible compared to the order-of-magnitude complexity.
What this notation allows you to see is how the time and space requirements are affected by the size of the input. For example, if T = O(n), then doubling the size of the input doubles the running time of the algorithm. If T = O(2^n), then adding one bit to the size of the input doubles the running time of the algorithm.
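Both scaling claims can be checked with step counts rather than wall-clock time:

```python
# Step counts (not wall-clock times) illustrating the scaling claims:
# doubling n doubles an O(n) count, while a single extra bit of input
# doubles an O(2^n) count.
def linear_steps(n):
    return n

def exponential_steps(n):
    return 2**n

assert linear_steps(2_000) == 2 * linear_steps(1_000)        # double n -> double the work
assert exponential_steps(51) == 2 * exponential_steps(50)    # n+1 -> double the work
```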
Generally, algorithms are classified according to their time or space complexities. An algorithm is constant if its complexity is independent of n, namely, O(1). An algorithm is linear, O(n), if its complexity grows linearly with n. Algorithms can also be quadratic, cubic, and so on. All these algorithms are polynomial; their complexity is O(n^t), where t is a constant. Algorithms that have a polynomial complexity class are called "polynomial-time" algorithms.
Algorithms that have complexities of O(t^f(n)), where t is a constant and f(n) is some polynomial function of n, are called "exponential." Exponential algorithms quickly get computationally impossible, so they are often unusable. Cryptographers like to base the security of their ciphers on these algorithms, though.
As n grows, the complexity of an algorithm can make an enormous difference in whether or not the algorithm is practical. Table 1 shows the running times for different algorithm classes for various values of n. Notice how fast the complexity grows in the exponential case. If n is equal to 1 million, and if a computer can perform one iteration per microsecond, it can complete a constant algorithm in a microsecond, a linear algorithm in a second, and a quadratic algorithm in 11.6 days. It would take 32,000 years to complete a cubic algorithm; not terribly practical, but a computer built to withstand the next ice age would eventually deliver a solution. Solving the exponential algorithm is futile, no matter how well you extrapolate computing power, parallel processing, or the heat death of the universe.
Consider the problem of trying to break a cryptographic algorithm. The time complexity of a brute-force attack (trying every possible key) is proportional to the number of possible keys, which is exponential in the key length. If n is the length of the key, then the complexity is O(2^n). For example, DES has a 56-bit key. Against a 56-bit key, a brute-force attack takes 2^56 attempts (2,285 years, assuming 1 million attempts per second); against a 64-bit key, it's 2^64 attempts (580,000 years, with the same assumptions); and against a 128-bit key, 2^128 attempts (10^25 years). The first is on the edge of possibility; the last is ridiculous to even contemplate.
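The arithmetic behind those estimates is straightforward; the 1-million-attempts-per-second rate is the assumption used above:

```python
# Back-of-the-envelope brute-force search times,
# assuming 1 million key trials per second.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_search(bits, trials_per_second=1_000_000):
    """Years needed to try all 2^bits keys at the given rate."""
    return 2**bits / (trials_per_second * SECONDS_PER_YEAR)

for bits in (56, 64, 128):
    print(bits, years_to_search(bits))
# 56 bits -> about 2,285 years; 64 bits -> about 585,000 years;
# 128 bits -> about 10^25 years.
```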
Complexity theory also classifies problems according to the algorithms required to solve them. The theory looks at the minimum time and space required to solve the hardest instance of the problem on a Turing machine--a finite-state machine with an infinite read/write tape of memory. It turns out that a Turing machine is a realistic model of computation on real computers.
Problems that can be solved with polynomial-time algorithms are called "tractable" because they can usually be solved in a reasonable amount of time for reasonably sized inputs. (The exact definition of "reasonable" is dependent on circumstance.) Problems that cannot be solved in polynomial time are "intractable," because calculating their solution becomes infeasible. Intractable problems are sometimes just called "hard" because they are.
It gets even worse. Alan Turing proved that some problems are undecidable. It is impossible to write an algorithm to solve them, let alone a polynomial-time algorithm.
Problems can be divided into complexity classes, depending on the complexity of their solutions. On the bottom, class P consists of all problems that can be solved in polynomial time. Class NP consists of all problems that can be solved in polynomial time on a nondeterministic Turing machine. This is a variant of a normal Turing machine that makes guesses. The machine guesses the solution to the problem and checks its guess in polynomial time. Mathematically, you assume the nondeterministic Turing machine always guesses correctly. In practice, of course, this doesn't happen.
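The "guess, then check in polynomial time" idea is concrete for the traveling-salesman decision problem: verifying that a claimed route visits every city exactly once and stays under the bound takes only linear time, even though finding such a route is hard. The four-city matrix below is a made-up example:

```python
# Checking a guessed solution is easy even when finding one is hard.
# dist is a hypothetical 4-city distance matrix; all values are invented.
dist = [
    [0,   120, 300,  90],
    [120,   0,  60, 200],
    [300,  60,   0,  75],
    [90,  200,  75,   0],
]

def verify(dist, route, bound):
    """Polynomial-time check: does the route visit every city once,
    with total length at most the bound?"""
    n = len(dist)
    if sorted(route) != list(range(n)):            # each city exactly once
        return False
    total = sum(dist[a][b] for a, b in zip(route, route[1:]))
    return total <= bound

print(verify(dist, [0, 3, 2, 1], 250))  # True: 90 + 75 + 60 = 225
print(verify(dist, [0, 1, 2, 3], 250))  # False: 120 + 60 + 75 = 255
```

This asymmetry — checking a certificate is cheap, finding one appears expensive — is exactly what membership in NP captures.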
Class NP includes class P, because any problem solvable in polynomial time on a deterministic Turing machine is also solvable in polynomial time on a nondeterministic Turing machine. If all NP problems are solvable in polynomial time on a deterministic machine, then P = NP. Although it seems incredibly obvious that some problems are much harder than others (a brute-force attack against a cipher versus encryption of a random block of plaintext), it has never been proven that P ≠ NP. Mathematicians suspect that this is the case, though.
Stranger still, a polynomial-time solution for certain specific NP problems would yield a polynomial-time solution for all of NP. A famous mathematical result (Cook's theorem) proves that no NP problem is harder than one particular NP problem, the satisfiability problem, and some other NP problems have been proven to be just as hard. This means that if the satisfiability problem is solvable in polynomial time, then all NP problems are solvable in polynomial time; and if any NP problem is proved to be intractable, then the satisfiability problem is also intractable. Since Cook's result, numerous problems have been shown to be equivalent to the satisfiability problem. This club of maximally hard NP problems is called "NP-complete." If any NP-complete problem is in P, then every NP-complete problem is in P, and P = NP.
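A brute-force satisfiability checker makes the problem's exponential nature concrete. This minimal sketch (the formula at the bottom is invented for illustration) tries all 2^n assignments of the variables:

```python
# Brute-force satisfiability: is there a true/false assignment of the
# variables that makes every clause true? Trying all 2^n assignments
# is the only known general approach.
from itertools import product

def satisfiable(formula, num_vars):
    """formula is a list of clauses; each clause is a list of literals,
    where +i means variable i is true and -i means variable i is false."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True                 # found a satisfying assignment
    return False

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
print(satisfiable([[1, -2], [2, 3], [-1, -3]], 3))  # True, e.g. x1=x2=True, x3=False
print(satisfiable([[1], [-1]], 1))                   # False: x1 can't be both
```

A polynomial-time replacement for that loop — for this one problem — would collapse all of NP into P.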
NP-complete problems are considered the hardest problems in NP. Currently, the fastest known algorithms for solving any of them have exponential worst-case complexities. When (or, more realistically, if) someone finds a polynomial-time solution to any of them, it will be a major breakthrough in mathematics.
Further out in the complexity hierarchy is PSPACE, in which problems can be solved in polynomial space, but not necessarily polynomial time. PSPACE includes NP, but some problems in PSPACE are thought to be harder than NP. Of course, this isn't proven either. There is a class of problems, called "PSPACE-complete," such that if any one of them is in NP, then PSPACE = NP; and if any one of them is in P, then PSPACE = P. Finally, there is the class of problems called EXPTIME, problems solvable in exponential time.
In Computers and Intractability: A Guide to the Theory of NP-Completeness (W.H. Freeman, 1979), Michael Garey and David Johnson compiled a list of over 300 NP-complete problems.
Figure 2: Data structure for the traveling-salesman problem.
Table 1: Running times for different algorithm classes (one step per microsecond).

Class        Complexity  n=10        n=20       n=30        n=40        n=50          n=60
Constant     O(1)        .000001 s.  .000001 s. .000001 s.  .000001 s.  .000001 s.    .000001 s.
Linear       O(n)        .00001 s.   .00002 s.  .00003 s.   .00004 s.   .00005 s.     .00006 s.
Quadratic    O(n^2)      .0001 s.    .0004 s.   .0009 s.    .0016 s.    .0025 s.      .0036 s.
Cubic        O(n^3)      .001 s.     .008 s.    .027 s.     .064 s.     .125 s.       .216 s.
Exponential  O(2^n)      .001 s.     1.0 s.     17.9 min.   12.7 days   35.7 years    366 cent.
Exponential  O(3^n)      .059 s.     58 min.    6.5 years   3855 cent.  2×10^8 cent.  1.3×10^13 cent.

s. = seconds; cent. = centuries
Copyright © 1994, Dr. Dobb's Journal