For as long as I can remember, I've been fascinated by the issue of understanding.
As long as I don't ask myself to define my terms too finely, I can convince myself that I sort of understand how we understand. Surely, I tell myself, understanding is a process of reducing surface complexity without losing the information in the depths. Identifying minimal-surface-area packing schemes for information. Something like that. And I even think I know some of the techniques we use when we attempt to understand something.
There's abstraction, there's seeing parallels, there's organization. In particular, we discover organization in - or impose organization on - the data: We rank and file the raw phenomena, we identify the properties and impose a taxonomy. This month's column is about an effort to taxonomize algorithms for parallel processing.
Imposing a Property Taxonomy

Two years ago computer scientists from several universities organized a workshop in Santa Fe, New Mexico, to discuss the possibilities of developing a taxonomy of parallel algorithms. What the organizers came away with from the workshop was not a taxonomy but an increased respect for the size of the taxonomic task.
A neat array of pigeonholes in which to put parallel algorithms, the workshoppers concluded, was an edifice that the field of computer science was not yet ready to erect. First some spadework would have to be done. There were at least these preliminary problems:

1. Identifying the properties or attributes of algorithms that actually had something to do with parallelism.

2. Untangling the relationships among problem, algorithm, and architecture in parallel processing.

3. Clearing away some of the inconsistencies in terminology that arose from the fact that parallel processing was really several different paradigms.
Some workshops produce proceedings. (And some publish the proceedings before the workshop has taken place; these documents should perhaps be called precedings. But I digress.) This workshop brought forth, after a gestational hiatus, an interesting book, The Characteristics of Parallel Algorithms, Leah H. Jamieson, Dennis B. Gannon, and Robert J. Douglass, editors; The MIT Press, Cambridge, Mass., 1987. The book is not the patchwork proceedings of a workshop, but a well-organized building plan for a taxonomy of parallel algorithms.
The book consists of three sections: The first attempts to identify characteristics of algorithms relevant to parallel processing, the second examines the characteristics of concurrency from the point of view of applications, and the third looks at how a taxonomy of parallel algorithms would affect the organization of software engineering tools for the programmer. I'll discuss here some of what the authors attempt in the first section.
The book opens at a paradigmatic level. In the past this column has generously given out several definitions for the word paradigm, and will not hold back on offering another: For present purposes let's let the word paradigm - or better, the expression the paradigmatic challenge - refer to the mapping of problems onto algorithms and architectures. That's a little ethereal, but the book is not all theory. It digs into the analysis of parallel algorithms, the characterization of parallel algorithms and architectures, and the benchmarking of parallel architectures.
Philip Nelson and Lawrence Snyder of the University of Washington open the book with a discussion of the importance of the paradigmatic perspective for getting some understanding of parallel processing. They say "In the case of parallel computation... paradigms generally encapsulate information about useful communication patterns." In other words, the paradigm selected specifies the useful patterns of communication, and efficient communication is precisely the problem in parallel processing architecture.
Three Paradigms

Nelson and Snyder consider three broad programming paradigms: divide-and-conquer, compute-aggregate-broadcast (CAB), and the systolic and pipelined paradigm.
Divide-and-conquer was probably first cogently described as a programming paradigm by Robert Floyd in his Turing Award lecture, "The Paradigms of Programming." (You can read Floyd's description in ACM Turing Award Lectures: The First Twenty Years: 1966-1985, ACM Press/Addison-Wesley, 1987.) The idea is that the problem is divided up into two or more smaller problems to be solved independently, and the solutions are combined. If the smaller problems are just smaller versions of the original problem, the paradigm is recursion. Nelson and Snyder shed new light on divide-and-conquer as a parallel processing paradigm and attempt to identify just how it differs from other parallel processing paradigms.
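The shape of the paradigm can be sketched in a few lines. Here is a minimal divide-and-conquer illustration of my own (not an example from the book): summing a list by splitting it in half, solving each half independently, and combining the results. The two halves could be handed to separate processors; here they run sequentially for clarity.

```python
def dc_sum(xs):
    # Base case: a trivially small problem is solved directly.
    if len(xs) <= 1:
        return xs[0] if xs else 0
    mid = len(xs) // 2
    left = dc_sum(xs[:mid])    # a smaller version of the same problem...
    right = dc_sum(xs[mid:])   # ...solved independently of the other half
    return left + right        # combine the sub-solutions

print(dc_sum(list(range(10))))  # 45
```

Because the subproblems are smaller versions of the original, this is also an instance of recursion, just as Floyd's description suggests.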
One conclusion they draw is that the divide-and-conquer paradigm is often (although not always) characterized by a binary n-cube communication structure. (Of course, given a predisposition to see communication structure as the problem in parallel processing, it is not surprising that it is communication structures they pull out as the distinguishing marks of individual parallel processing approaches. I'm not suggesting that Nelson and Snyder are wrong in this, but just pointing out one example of how your paradigm conditions the questions you think to ask.)
Algorithms in the CAB paradigm have three phases: a compute phase; an aggregate phase, in which local values are combined from processes into (fewer) global values; and a broadcast phase, which returns global values to the processes. The authors look at several such algorithms, but I was unable to find a general conclusion that they drew about the paradigm. Maybe I just missed it.
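The three phases can be made concrete with a toy CAB round, sketched sequentially; the names and the choice of sum-as-aggregation are my own illustration, not the book's.

```python
def cab_round(local_data):
    # Compute phase: each "process" works on its own chunk of data.
    local_values = [sum(chunk) for chunk in local_data]
    # Aggregate phase: many local values combine into (here, one) global value.
    global_value = sum(local_values)
    # Broadcast phase: the global value returns to every process.
    return [global_value for _ in local_data]

print(cab_round([[1, 2], [3, 4], [5]]))  # [15, 15, 15]
```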
Nelson and Snyder lump systolic and pipelined approaches into one paradigm, characterized by "the decomposition of the problem into subcomputations that are assigned to dedicated processes with the data 'flowing' through the processes, visiting all or an appropriate subset of processes to complete the computation for that input." One example algorithm is the band matrix multiplication algorithm of Kung and Leiserson. The "systolic and pipelined paradigm" may sound more like an architecture than a paradigm. Indeed, parallel paradigms are tied tightly to architectures as well as to algorithms, and sometimes you wonder which is the cart and which is the horse. As we'll see momentarily, some of the book's contributors directly attack the problem of the relationships among paradigms, algorithms, and architectures in parallel processing.
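A sequential sketch conveys the flavor of that decomposition: each stage stands in for a dedicated process, and every input visits each stage in order. The particular stages (scale, offset, clamp) are invented for illustration.

```python
def run_pipeline(stages, inputs):
    results = []
    for x in inputs:
        for stage in stages:  # the datum "flows" through each dedicated stage
            x = stage(x)
        results.append(x)
    return results

stages = [lambda x: x * 2,       # stage 1: scale
          lambda x: x + 1,       # stage 2: offset
          lambda x: min(x, 10)]  # stage 3: clamp

print(run_pipeline(stages, [1, 3, 7]))  # [3, 7, 10]
```

In a real pipelined machine the stages would run concurrently, each working on a different input at the same moment; the sequential loop above only shows the path each datum takes.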
The distinguishing characteristic of the systolic and pipelined paradigm seems to be the idea of flow, a (there you go jumping ahead of me again) communication property. The authors present a rigorous test of the "flow property," which, unfortunately, does not always identify putatively systolic algorithms as having the flow property. Nevertheless, it appears to test an interesting property common to many such algorithms.
The flow test works like this: You select an arbitrary communication edge and "radioactively tag" a single communication across that edge. As the computation progresses... let all values computed with one or more radioactive values become radioactive. Then define the "contaminated region" as the set of processes and communication channels touched by some radioactive value. An algorithm passes the flow test if the edge over which the initial value was transmitted is on the boundary of the region; otherwise it fails the test.
The idea is that the tagged value flows out to define the contaminated region.
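The flow test can be simulated on a toy communication graph. The encoding below - processes as numbered nodes, channels as directed edges, contamination spreading along reachable edges - is my own simplification, not the authors' formal definition.

```python
def flow_test(edges, tagged_edge):
    src, dst = tagged_edge
    contaminated = {dst}  # the radioactively tagged value arrives at dst
    changed = True
    while changed:        # spread contamination along outgoing channels
        changed = False
        for a, b in edges:
            if a in contaminated and b not in contaminated:
                contaminated.add(b)
                changed = True
    # Pass: the tagged edge lies on the boundary of the contaminated
    # region, i.e. its source process is never itself contaminated.
    return src not in contaminated

# A linear pipeline 0 -> 1 -> 2 -> 3: contamination flows only forward.
pipeline = [(0, 1), (1, 2), (2, 3)]
print(flow_test(pipeline, (1, 2)))  # True

# A cycle lets contamination flow back to the source, so the test fails.
cycle = [(0, 1), (1, 2), (2, 0)]
print(flow_test(cycle, (0, 1)))     # False
```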
Mapping Algorithms to Architectures

It seems that if we could neatly map algorithms to architectures, we could reduce the dimensionality of the paradigmatic challenge. Because we had defined the word paradigm to mean the mapping of problems to algorithms and architectures, the challenge then would be one of mapping the problem to the combined algorithm/architecture rather than one of solving a three-body problem. In fact, Leah Jamieson, a Purdue University computer scientist and the book's lead editor, talks about just this: how to map parallel algorithms to parallel architectures.
Jamieson introduces an interesting model, involving the algorithm life cycle and virtual algorithms. And she identifies the characteristics of parallel algorithms that she thinks are worthy of attention in taking a paradigmatic approach to parallel algorithms:
But the performance of an algorithm depends on the architecture on which it is implemented.

Benchmarking Parallel Architectures

If, as Nelson and Snyder maintain, communication is the parallel processing problem, then the analysis of parallel algorithms requires some means of measuring communication performance in parallel implementations - both existing implementations and those implementations we'd build if we knew that the payoffs in performance would justify the costs. But the communication performance of a parallel algorithm will depend more (and in more complex ways) on the machine architecture on which it is realized than is the case for sequential algorithms on sequential machines.
Steven Levitan of the University of Massachusetts says that there exists no general theory of communication complexity, meaning that we can't predict performance in parallel processing. We can observe performance in real implementations, but we can't predict performance in architectures not yet built. We can't model; we must experiment. That's why the literature on parallel processing contains so many experimental case studies: "We implemented the following algorithm on the following architecture and obtained the following performance speedup over the baseline performance of the best-known sequential algorithm on a single-processor version of the same architecture. The experimental subject was then sacrificed and its cortex weighed." Excuse me; a momentary rat-lab relapse.
Communication complexity combines components of algorithm complexity and architecture complexity. Levitan wants to separate the warp from the woof to characterize the complexity component attributable to the architecture so that he can factor it out. To do this he needs to develop a metric for an architecture's communication complexity, and that's what he sets out to do in his chapter.
He begins by presenting several candidate metrics, including:
He tests the metrics on some typical algorithms and gets mixed results. Some algorithm/architecture combinations are bandwidth limited, some diameter limited, and so on. Some perform in ways not well predicted by any of the metrics.

What he does next is interesting: He runs the algorithms themselves as metrics, taking care not to include the case in which an algorithm is used both as the measuring device and the object measured, and finds the algorithms uniformly better average predictors of machine performance than any of the abstract metrics. He concludes that the best metric for the problem is a good suite of benchmarks, covering fundamental parallel processing tasks, and he presents one.
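To give a feel for what such abstract metrics look like, here are the standard diameter and bisection-width formulas for two common interconnection networks, a ring and a binary n-cube. The formulas are textbook facts; their use here as stand-ins for Levitan's candidate metrics is my own illustration.

```python
import math

def ring_diameter(n):
    # Longest shortest path between any two of n processors on a ring.
    return n // 2

def ring_bisection(n):
    # Cutting a ring in half always severs exactly two links.
    return 2

def hypercube_diameter(n):
    # n must be a power of two; the diameter is the number of dimensions.
    return int(math.log2(n))

def hypercube_bisection(n):
    # Half the processors, each with one link crossing the cut.
    return n // 2

for n in (16, 64):
    print(n, ring_diameter(n), hypercube_diameter(n), hypercube_bisection(n))
```

Even this toy comparison shows why an algorithm's communication pattern matters: an algorithm limited by diameter and one limited by bandwidth will rank these two networks very differently.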
His suite of benchmarks consists of these tasks:
The Analysis of Paradigms

Can we, Nelson and Snyder ask (or seem to me to be asking, because it's a question very much on my mind as I read their chapter), develop a computer science discipline that deals with the analysis of paradigms in a way that is analogous to the analysis of algorithms? They think so and suggest that such a discipline would teach techniques such as contraction analysis, which enables you to solve a problem for the paradigm and have the solution work for all algorithms that follow the paradigm. Here's Jamieson, talking about algorithms, and in the process arguing that paradigms can be identified by shared characteristics of algorithms, and therefore can be analyzed:
The rationale behind the "characteristics-based" approaches is that many algorithms do possess an identifiable structure. For example, at the data dependency level, many algorithms share similar communications patterns. At the process level, algorithms based on the same paradigm - that is, divide-and-conquer - may exhibit similar communications requirements. Algorithms that operate on similar data structures may lend themselves to execution on similar architectures. In the area of digital signal processing, it is possible to identify a set of canonical algorithm structures (that is, second order section, FFT, autocorrelation, convolution); algorithms that can be expressed in terms of these structures can be constructed from a set of basic building blocks (e.g., multiplications, complex multiplications, butterflies, sums-of-products, address arithmetic). A concise description of the algorithm in terms of a basic set of features allows selection of an appropriate machine configuration and can facilitate the mapping process by relating the characteristics of the current algorithm to known layout patterns.
The paradigms examined in the book are mostly familiar. Nelson and Snyder acknowledge that those they discuss do not provide the parallel programmer with a full toolkit of paradigms to guide him or her in developing new algorithms. But they conclude their chapter of the book by saying, "We expect more paradigms to be discovered." But that's an issue of a different context.
The Context of Discovery

Just fifty years ago the philosopher of science Hans Reichenbach drew a distinction between what he called the context of discovery and the context of justification in science. The former deals with the origins of theories; the latter takes theories as existing artifacts and investigates their properties.
The fact that Kepler arrived at his views on the structure of the solar system from a consideration of parallels with the Holy Trinity, Reichenbach maintained, was a fact in the domain of history or possibly of psychology, but definitely not in the domain of science or the philosophy of science. It belonged to the context of discovery, and science (actually he used the word epistemology) is concerned only with the context of justification.
As difficult as it may be to develop a rigorous computer science course on the analysis of paradigms, it's at least a challenge within the context of justification. Much less likely to be rigorously codified in the near future is the issue of how we discover a paradigm, map the paradigm to the problem, see the applicability, find the new rules. This ability resides in the context of discovery, the domain of serendipity and sudden insight. It's the art of the science.
It's easy to convince yourself that there are no rules for discovering rules, that seeing the connections is all luck or unanalyzable genius, that the art of the science is outside science, that this art is really artlessness. It's convenient to say that perhaps the Holy Trinity is as reasonable a place as any to look for inspiration in developing programming paradigms.
But this is a mistake. The context of discovery is accessible to the study of psychology, and although there may be some debate about the extent to which psychology is a science, it is at least a discipline with rules. As George Polya says, "There are rules and rules."
Polya has written the definitive book on discovery in problem solving (Mathematical Discovery, two volumes, John Wiley & Sons, 1965). In this extraordinary work Polya discusses very broad paradigms for problem solving and sets down the ten commandments for nurturing serendipity.
"Solving problems," Polya says, "is a practical art, like swimming, or skiing, or playing the piano: you can learn it only by imitation and practice." Polya presents in his book several examples for imitation, and discusses how to discover the pattern in example solutions, so that you can apply it to similar problems. He presents several such useful patterns, and by pattern he means nothing more or less than paradigm. He identifies several mathematical paradigms, but also such broad problem solving paradigms as problem reduction (two kinds) and guess-and-test.
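Guess-and-test is simple enough to sketch in a few lines; this tiny example is my illustration, not Polya's: finding an integer square root by generating guesses and testing each one.

```python
def guess_and_test_sqrt(n):
    for guess in range(n + 1):   # generate a candidate guess
        if guess * guess == n:   # test it against the problem's condition
            return guess
    return None                  # no guess passed the test

print(guess_and_test_sqrt(1369))  # 37
```

The paradigm's power lies not in any one guess but in the systematic generate-and-check loop, which is why it transfers so readily from arithmetic puzzles to much larger search problems.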
But Polya goes beyond merely identifying problem-solving paradigms; he also enters the context of discovery and gives rules for discovering solutions to problems. These are rules of the sort found in practical arts: general guidelines that make intuitive sense and seem to work well. They include, somewhat rephrased here, the following rules of problem solving. (If they seem too obvious to mention, I suggest you examine your own methods the next time you attempt to solve a problem; perhaps you're ignoring the "obvious"!)
If it happens that you find yourself designing the paradigm requirement for a computer science department curriculum for the 1990s, I suggest that you consider putting Mathematical Discovery on the reading list.