PROGRAMMING PARADIGMS

Parallel Processing, Object-Oriented Programming, and a Reading List

Michael Swaine

Several of the key difficulties of my original text cluster about the concept of a paradigm.. .One sympathetic reader.. .prepared a partial analytic index and concluded that the term is used in at least twenty-two different ways.

----Thomas Kuhn

Authors who use the word paradigm seem to permit themselves to mean by it a large number of different, and perhaps contradictory. things; and as I am committing myself to dealing with paradigms each month, I will honor this wise and liberal tradition.

A column called Programming Paradigms should, though, at least, deal with the issues Robert Floyd brought up in his 1978 Turing Award Lecture, "The Paradigms of Programming" (even though Floyd didn't define paradigm either). Floyd deplored the segmentation of computer science into narrow communities, "each speaking its own language and using its own paradigms. . .well-defined schools of LISP programming. APL programming. ALGOL programming, and so on." Such communities develop not just around languages but around any broadly applicable problem-solving technique, including such diverse techniques as recursion, stepwise refinement, generate-and-test, and data-flow design.

Programmers advance in their careers by learning a community's shared techniques, terminology, values, prejudices, model problems, and concrete examples of how to solve such problems what Thomas Kuhn, who investigated scientific paradigms, calls puzzle solving. This is the process of adopting a paradigm, and it prepares the programmer to answer an ad for a "C programmer" or a "software engineer" or a "knowledge engineer" or a programmer "with experience in object-oriented design."

A column called Programming Paradigms should offer some insight to the atypical programmer for whom this is not enough, the programmer who wants to add to his or her repertoire of paradigms.

This column will attempt to do that by exploring techniques from parallel processing, programming in logic, functional programming, object-oriented programming, and other alternative models. The focus will be on pointing out the alternatives to familiar, conventional methods. I can't think of a better place to begin the exploration than in the wilderness of MIMD parallel processing.

Parallel Paradigms

When I conducted an informal survey through the pages of the magazine last year, parallel processing came out at the top of the list of topics readers would like to see DDJ cover more often. It's not surprising; parallel processing promises increased throughput independent of other speedup techniques, and parallel processing raises new problems to be explored---a lot of them---the parallel world is bigger than the sequential world. There are many paradigms of parallelism, some better mapped than others. One of the least explored is MIMD parallelism.

The term MIMD means multiple instruction, multiple data. Logically, there are four such terms, representing the application of parallelism to instructions, data, neither, or both. The terms can be applied to both architectures and algorithms. SISD (single instruction, single data) is the familiar sequential Von Neumann model for computer architecture and programming; and MISD, in which multiple instructions are applied to single data items, turns out to be in practice indistinguishable from SISD. That leaves two broad paradigms of parallel processing: single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD).

SIMD breaks down into two lowerlevel paradigms, each with its community of practitioners, traditions, model problems, and typical solutions. These are array processing and vector processing.

The array-processing paradigm (see Figure 1, page 102) involves a sequence of instructions applied concurrently to disjoint sets of data. The ICL Distributed Array Processor, which consists of a control unit and a 64 x 64 array of processors, each with its own local memory, is one example of this paradigm. In it, the control unit drives the parallel processors, which all perform the same operation in lockstep on the data in their local memory. This paradigm is natural for matrix mathematics, graphics processing, and other uniform operations on arrays of data---hence the name array processing.

The vector-processing paradigm, which is also called pipelining, involves instructions overlapped on disjoint sets of data (see Figure 2, page 103). With pipelining, additional operands can be fed into an operation before it has finished with the first because the operations are broken down into stages and the stages are processed in parallel.

You can see architecture designed for pipelining in supercomputers such as the Cray-1, which are often called vector processors because they process a vector of operands in parallel. You can also see pipeline architecture in the microprocessors of desktop computers. The Motorola 68030, for example, can fetch both instructions and data from on-chip caches, refresh the caches from offchip memory, and ready the address for an off-chip fetch, all in parallel. Pipelining in CPU architecture and operating system design is a well-established technique. Extending this technique to a computer system as a whole presents a new paradigm, presently seen chiefly in supercomputers.

Both vector processing and array processing are single-instruction, multiple-data parallelism, and consequently they exhibit the traits of SIMD. SIMD approaches typically involve a high degree of parallelism; that is, many array or vector elements are processed at once. They restrict this parallelism to a low level, such as the level of the operation. They exhibit central control and strong synchronization. As a result of these traits, SIMD approaches concentrate most of the need for special algorithms at the system level and present no special problems in communication among the parallel elements.

Figure 1: Array processing applies the same operations to many data sets at the same time, achieving a lockstep parallelism.

MIMD Parallelism

In contrast, MIMD approaches usually involve fewer but more powerful processing elements, with a medium degree of parallelism and parallelism at a higher level-the level of the task. Control is distributed, and synchronization is only occasional. As a result, MIMD paradigms raise difficult problems in communication among the parallel tasks as well as higher-level problems in the design of algorithms.

In an MIMD architecture, different parts of an application program are given to different, independent, networked processors, each with its own instruction set, each capable of performing a sequence of instructions without supervision.

At the algorithmic level, MIMD parallelism means decomposing the problem into components that can be attacked in parallel via separate processes. This is a true paradigm shift; you envision the problem differently in MIMD parallelism from the way you envision it in SIMD parallelism or in a sequential paradigm. SIMD is a paradigm shift away from SISD, but MIMD is a shift away from SIMD and it's a bigger shift.

If Henry Ford's assembly line is the metaphor for vector processing, then the array-processing metaphor is all those WACs plugging away at elementary arithmetic operations in lockstep back during World War II. MIMD parallelism is closer to the model of research within a scientific community, where, in pursuit of a common goal, individual researchers handle their own tasks, sharing information when it seems mutually beneficial-ARPAnet, including the networkers.

MIMD architectures (see Figure 3, below) are to date mostly experimental. Shared memory and nonshared memory architectures have been tried, as have been various topologies for the linking of the processors. The Denelcor Heterogeneous Array Processor and Cray X-MP are two commercial MIMD machines, and CalTech's Hypercube is a model that has been generating a lot of interest this year. But MIMD architectures are moving onto the desktop, and it is already possible to experiment with MIMD parallelism for the price of a fully loaded personal computer.

Figure 2: Vector processing, or pipelining, achieves parallelism by overlapping the components of a multicomponent process, much as automobiles are built in parallel on a Detroit assembly line.

Figure 3: MIMD (multiple instruction, multiple data) parallelism is typically asynchronous, with independent processes communicating with one another to achieve occasional synchronization.

Foreseeing this possibility, Les Hecord, of Hound Hock, Texas, sent me the following suggestion:

"Your discussion of problems in parallel computing brought to mind an idea for a major DDJ project: using minimal hardware, say 6 to 8 Z80s, interconnected in a simple network, present a series of articles using parallel techniques to solve simple (?!?) problems. If you don't want to present a hardware article (one every 2-3 years doesn't hurt), then perhaps some OEM might contract to build a minimal parallel system. Does this sound feasible? I think the hardware could be low-cost. The pay off would be exposure to software---hands-on! I might even be able to help."

I especially like that last sentence. I'm passing Les' suggestion along to Tyler and the gang. But if the goal is exposure to the programming problems of MIMD parallelism rather than the architectural issues, wouldn't it make sense to buy the hardware off the rack? This is now possible with the INMOS transputer.

Transputers

Interest in the transputer is growing. Last summer, the Macintosh SIG of the Software Entrepreneur's Forum polled its members for topics for future meetings, and the handsdown winner was the INMOS transputer.

The transputer is a 32-bit RISC chip designed by INMOS Ltd, for parallel processing. It includes a processor, local memory, and four dedicated I/O ports. Several transputers in a simple network can be used to implement MIMD parallelism. The transputer can switch between parallel tasks in a microsecond.

Transputers can be linked in a network, with a great deal of flexibility in the topology of the network and in the physical location of the nodes: transputers on the same network need not occupy the same circuit board, the same system bus, or the same city. Inter-transputer links are point to point, so the size of the network is not limited by contention problems as it would be on a common bus. And the number of transputers in the network can be increased or decreased without altering the parallel program running on the network. (See Figure 4, page 105.)

Figure 4: One transputer processor. Independent processes can run in parallel on individual transputers, communicating with one another via the four channels each transputer has.

The transputer is already being used in commercial products. Two companies have announced transputer-based boards that will significantly speed up laser printers. Both CSS Laboratories and Eidolon will be showing laser printer controller boards this year, each employing one T414 transputer as a coprocessor for throughput in the 820 page-per-minute range. Eidolon has also announced a two-transputer controller that uses parallelism to get 40 ppm throughput.

HRC Micro Organization Ltd, has a CAD product called Intervision that uses transputers to achieve a 10 to 40 times speed improvement. Atari demonstrated a prototype of a transputer-based product, the Abaq computer. Last November (at Comdex, a terrible place to introduce genuinely new ideas). The Abaq will use T800 transputers (it has space for a dozen) and run Helios, a Unix-like operating system, with an MS-DOS emulation mode that should run DOS programs faster than an AT. Penguin Software is developing Unix-based transputer development tools. And Levco is selling transputer boards for personal computers.

It's Levco's boards I had in mind when I alluded to building a parallel-processing system from off-the-rack parts. Levco allows you to turn a Mac II or SE into a parallel-processing system with a package it calls TransLink. TransLink consists of a bus card. transputer modules, and either MPW-compatible development software (C compiler, transputer assembler, loader, linker) or the occam development system from INMOS. The transputer modules include 256K to 4 Mbytes of RAM in four SIMM sockets and one transputer (T414 15 MHz, T414 20 MHz, or T800 20 MHz with 64-bit IEEE floating-point support). The bus card for the Mac II can hold up to four transputer modules, whereas the SE card can hold only two.

Fully loaded with five TransLink Nubus-compatible boards, holding a total of 20 20-MHz T-800 transputer modules, each with from 256K to 4 Mbyte of its own EAM, a Mac II would reach a throughput of nearly 200 MIPS. (If only I weren't already in debt up to my neck for the Mac II. ...)

Coming down from the clouds, I should explain that Levco's boards provide an opportunity for developers to explore the largely unknown territory of MIMD algorithms.

MIMD Programming

The domain of MIMD algorithms is still wide open. There are many hard problems to be solved, including questions of appropriate topology, general techniques for functional decomposition, and specific MIMD algorithms for familiar problems. Small-scale functional decomposition, with just a few processes running in parallel, can produce performance unattainable in any other way.

It is only fair to point out, though, that there are those who doubt the benefits of large-scale MIMD parallelism. Their arguments rest on a strict upper bound on those benefits: the limit set by Amdahl's law.

Sources

I am looking forward to my first issue of a journal devoted to LISP,called LISP and Symbolic Computation: An International Journal, edited by Richard Gabriel and Guy Steele. You can order it from Jan Zubkoff, IASC, Lucid Inc., 707 Laurel St., Menlo Park, CA 94025; 415-329-8400. Steele and Gabriel are, among other things, the authors of two important books on LISP-Common LISP: The Language by Guy L. Steele, Jr. (Digital Press, 1984) and Performance and Evaluation of LISP Systems by Richard P. Gabriel (MIT press, 1985).

The Prolog language gets its name from the exaggerated notion that you are programming in logic when you use it. A new book removes some of the exaggeration by showing how to use and extend Prolog to do true logic programming; it's Computing with Logic: Logic Programming with PROLOG by David Maier and David S. Warren Benjamin/Cummings, 1988).

A journal on neural network technology is available from (TK). A conference on neural nets will also be held in San Diego, Calif., from July 24-27. It's sponsored by the IEEE and you can get more information from Nomi Feldman at 619-453-6222.

The big conference for the artificial-intelligence community is AAAI, held this year in St. Paul, Minn., from August 22-26. Call 415-328-3123.

Philosopher of science Thomas Kuhn made the term paradigm mean so much in his landmark book The Structure of Scientific Revolutions (University of Chicago Press, 1962). The disclaimer quoted at the beginning of this column is from the postscript in the second edition of this book, published in 1970. Robert Floyd's Turing Award lecture appears in ACM Turing Award Lectures, The First Twenty Years: 1966-1985 (ACM Press, 1987)---a wonderful book. Most of the lectures in it not only deserve reading but also reward rereading. ---M.S.

Amdahl's law states that parallel speedup (the ratio of the speed of processing using any parallel technique whatsoever to the speed of the strictly sequential approach) is bounded above by 100/[(100-f)+f/p], where f is the percent of the total work that can be done in parallel and p is the number of processors. The value of f cannot be 100 because some communication and control overhead is necessarily not parallelizable. This means that, with five processors and an algorithm that is 75 percent parallelizable, the maximum speedup attainable is only 2.5. Adding more processors will only increase the speedup gradually toward a final limit (for this particular algorithm( of 4.0. Given that communication overhead can increase rapidly with the number of processors in an MIMD system, the value of additional processors drops quickly to nothing in this scenario.

Nevertheless, even large-scale MIMD parallelism is worth looking into. The problem with the scenario just discussed was the 75 percent parallelizability. As the value of fin Amdahl's equation approaches 100, the entire expression approaches linearity; that is, n processors yield an n-fold speedup. There are experimental results showing that in interesting cases near linearity is in fact attainable. For one example, see McBurney's comments in PAJILE, Parallel Architectures and Languages Europe (Springer-Verlag. 1987).

Relative speedup, which is the speedup due to parallelization minus overhead, approaches linearity as the number of processes (that is,tasks, not processors increases (but it has to increase a lot). Of course, it is exactly the nonparallelizable overhead that sets the limit in Amdahl's equation; but when, as is the case in a wide variety of problems discussed by McBurney, the overhead cost becomes negligible with increasing problem size, then even largescale MIMD parallelism suddenly makes sense.

MIMD puts more burden on the application programmer to partition the program into appropriate components for parallel execution. Languages have been developed to facilitate this partitioning, some of them deriving from a model developed by C.A.R. Hoare called Communicating Sequential Processes, or CSP.

CSP

Hoare's model is of two or more independent processes, each consisting of a sequence of instructions executed sequentially. The processes are no different from programs written in any conventional sequential programming language, except when they must communicate with one another. In Hoare's model, this can only happen when each of two processes desires to communicate with the other process, and he calls this synchronization of desires a rendezvous.

Hoare's rendezvous is a strict synchronization, and by itself it would defeat one of the canons of parallelism, which is to keep all processes busy as much of the time as is possible. To free a process that might otherwise be locked up waiting for a rendezvous, Hoare adopted nondeterministic alternation and repetition constructs from Dijkstra.

These nondeterministic constructs require something called a guard.which is a Boolean expression that precedes a command. Only if the guard is true is the command executed, but its truth does not guarantee execution. The guarded alternation construct causes at most one of a set of guarded commands to be executed, with that one selected at random from those commands with true guards. The guarded repetitive construct causes all commands with true guards to be executed until no guards are true.

Guarded alternation and repetition allow a process, for example, to pass on data to any other available process or to get data for processing any time any other process has data to deliver.

Together, the concepts of sequential processes communicating via rendezvous and nondeterministic alternation and repetition make up a model for MIMD parallelism that is both provable and a solid basis for a programming language. One such language, and the natural one for developing transputer-based software, is occam.

Occam

"Occam's Razor slices things down to simplest causes. Single causes have a fair chance of being right."---from "Occam's Scalpel" by Theodore Sturgeon (Worlds of IF, August 1971).

Occam's razor (or sometimes Ockham's razor is the principle of ontological economy, attributed to William of Ockham (or Occam), circa 1285-1349, an English Franciscan, heretic, and philosopher best known among philosophers for his antirealist interpretations of universals. To the rest of us, if he is known at all, it is for Occam's razor, which states that, in explaining nature, entities should not be multiplied beyond necessity: the simplest explanation is the best.

The programming language occam (with a lowercase o) was designed by people at INMOS for writing parallel programs to run on the INMOS transputer processors. Occam supports parallel processing with only a few new programming entities. So clean is the occam implementation that major portions of the occam development system have been formally proved correct.

Occam implements the key concepts of CSP directly by the use of constructors. Sequences of commands (also called primitive processes) are grouped into sequential processes using the constructor seq. concurrency among such processes is indicated by the constructor par. and nondeterminism is introduced with the constructor alt. Nondeterministic alternation and repetition are implemented using the alt constructor in conjunction with Pascal-like alternation and repetition constructs such as if and while. Communication between processes follows the CSP rendezvous model, with two processes choosing a common channel on which to signal or pass data.

Although occam is a high-level language, it is also in some sense the machine language of the transputer, which was designed using occam. The two are closely linked. In particular, occam processes are run on individual transputers, and occam channels map directly onto the physical transputer links.

In Example 1, page 115. I give a taste of occam code. The three processes INPROC, BUFFPROC, and OUTPROC run in parallel, with INPROC gathering data and passing it to BUFFPROC, which in turn passes it to OUTPROC for output. Communication between processes in CSP occurs at a rendezvous and there are two in this example: when BUFFPROC calls INPROC and vice versa, and when BUFFPROC calls OUTPROC and vice versa. Note that indentation is not optional in occam.

I hope this taste of parallel paradigms has been useful and entertaining. Thinking through parallel approaches can be enlightening even if you never expect to work on a parallel system. Several programmers have pointed out the possibilities for learning new sequential approaches from studying parallel techniques. And it adds to your paradigmatic breadth.

WHOOPS, or What is Object-Oriented Programming?

One programming paradigm that has become extremely popular of late is object-oriented programming.

Object-oriented is the current vogue word, displacing structured as synonymous with whatever is good and true and beautiful in programming. Smalltalk, Actor, Simula67, and C++ are described as object-oriented; there have been claims that Ada, LOOPS, and APL are object-oriented languages; and there is much talk of object-oriented design in Pascal, Modula-2, C, and Forth. Just what is the object-oriented paradigm? What features does a language need to have in order to be said to support the paradigm? And what does it mean to say that one is programming in the paradigm?

Bjarne Stroustrup has come up with a set of definitions that help to clarify the issue, at least if you accept them. I do, chiefly because they give a clear picture of object-oriented programming as a paradigm indicating what kinds of puzzles the universe sets an object-oriented programmer.

Stroustrup takes pains to distinguish the object-oriented paradigm from the paradigms of data hiding and data abstraction. Data hiding, he says, boils down to using modules. You can achieve the effect of data hiding in C, but Modula-2 makes the module a fundamental language construct. Stroustrup says that Modula-2 supports data hiding but C only enables it.

Data abstraction according to Stroustrup means programming with user-defined types. Any programming language that provides the means for doing this supports the data abstraction paradigm-Ada and C++, for two examples.

One thing that the data-abstraction paradigm does not permit is expressing a distinction between the properties of a type and the properties of instances of the type. Object-oriented languages. Stroustrup says, are those that support expressing this distinction, such as Smalltalk. The mechanism for doing this is inheritance.

Object-oriented programming is, for Stroustrup, just programming using inheritance, and it is, roughly, a superset of the other paradigms. Smalltalk is certainly an object-oriented language, or rather an object-oriented environment. Strictly speaking, no language can be object-oriented by itself; object-oriented programming requires support from a programming environment as well as support from a language. I'm not sure where this leaves HyperTalk, which seems to have an inheritance structure but which does not allow creation of new types of objects. In any case, he breaks down the object-oriented paradigm further, as follows:

decide which classes you want

provide a full set of operations for each class

make commonality explicit by using inheritance

The benefits of the object-oriented paradigm come from the exploitation of the commonality among types, and identifying commonality in the problem is the chief task that the paradigm sets the programmer.

Example 1: Communicating processes in CSP.

INPROC::
    ...
    (* input X*)
SUFFPROC : X
    ...
OUTPROC::
    ...
    BUFFPROC ? X
    (* output X *)

Stroustrup's definition of object-oriented programming is not the same as David Robson's in Robson's classic Byte article of August 1981, in which he presents the class/instance distinction (and consequently inheritance) as optional. But Robson I seems to be defining a concept rather than a paradigm. It also does not match Brad Cox's view of the centrality of the software IC metaphor to object-oriented programming. And it is at variance with Geoffrey Pascoe's view, expressed in Byte in August 1986 that information hiding, data abstraction, dynamic bin ding, and inheritance are all defining features of object-oriented programming. But Stroustrup argues that information hiding and data abstraction are subsumed within inheritance.

One thing that Stroustrup's definition does provide is a picture of the kind of puzzles the object-oriented programmer faces as a result of being an object-oriented programmer.

The object-oriented programmer selects a set of classes, provides operations for each class, and then sets out to identifying commonality in the problem in order to make it explicit by using inheritance.

It's explicitly a definition of object-oriented programming as a paradigm.

Vote for your favorite feature/article. Circle Reader Service No. 6.

Sources

A column such as this can't begin to deal with its subject matter in the depth that, say, a column on C programming for MS-DOS can. I have only touched on many important issues in this installment. So here are some places where you can find the issues treated in greater depth.

You can find a good overview of parallel programming techniques and architectures in Parallel Programming by R.H. Perrott (Addison-Wesley, 1987). David Harel's excellent Algorithmics: The Spirit of Computing (Addison-Wesley, 1987) has a good section on some of the issues in parallel processing.

I also found several books from Springer-Verlag very useful, including WOPPLOT 86, Parallel Processing: Logic, Organization, and Technology (Springer-Verlag, 1987) and PARIE, Parallel Architectures and Languages Europe (Springer-Verlag, 1987), which report on conferences held, respectively, in Neubiberg, Federal Republic of Germany, in 1986 and in Eindhoven, the Netherlands in 1987.

The best source of information about the INMOS transputer is INMOS itself, through various technical notes. The U.S, contact is INMOS Corp., P.0. Box 16000. Colorado Springs, CO 80935; 303-630-4000.

Levco has video training and a developer group for learning about parallel processing, the transputer, and Levco's Macintosh boards. Levco is located at 6160 Lusk Blvd., Ste. C-100, San Diego, CA 92121; 619-457-2011. Levco's boards are not the only transputer implementation you might want to investigate; Definicon is another company that is doing interesting things with the devices. Definicon is located at 1100 Business Center Circle, Newbury Park, CA 91320; 805-499-0652.

Dick Pountain and David May give a very readable introduction to the transputer's high-level machine language, occam, in their A Tutorial Introduction to Occam Programming (INMOS/BSP Professional Books, 1987). Occam is still nearly undiscovered; this was one of only four books devoted to occam that I was able to find in a search of the entire University of California library system. Perrott's book also gives a clear introduction to Hoare's CSP and to occam as an implementation of CSP's central concepts as well as showing how parallel processing is implemented in a number of other languages, including Ada and Pascal Plus.

For explicit definitions of object-oriented programming, I drew upon "What Is Object-Oriented Programming," an invited lecture by Bjarne Stroustrup reprinted in ECOOP '87: European Conference on Object-Oriented Programming (Springer-Verlag, 1987); Object-Oriented Programming: An Evolutionary Approach by Brad Cox (Addison-Wesley, 1986); and the August 1981 and August 1986 issues of Byte.

But object-oriented programming is better defined by example in such sources as Smalltalk-80, the Language and its Implementation by Adele Goldberg and David Robson (Addison-Wesley, 19831; The C + + Programming Language, by Bjarne Stroustrup (Addison-Wesley, 1986); and the documentation for Digitalk's Smalltalk-V and The Whitewater Group's Actor. Digitalk Inc, is located at 9841 Airport Blvd., Los Angeles. CA 90045; and The Whitewater Group is at Technology Innovation Center, 906 University Place, Evanston, IL 60201.

Among the sources of continuing education for object-oriented programmers are OOPSLA and the new Journal of Object-Oriented Programming, which you can get from SIGS Publications, 310 Madison Ave., Ste. 503, New York, NY 10017; 212-972-7055. OOPSLA, the main conference on object-oriented programming, takes place this year in San Diego, Calif., September 25-29. The contact person is Barbara Noparstak, Digitalk Inc.; 213-645-1083.

This column had nothing to say this month about artificial-intelligence paradigms, such as functional programming, logic programming, and neural nets. The least I can do is to mention some good sources in these areas.