"IBM Neural Network Chips Make Everything Obsolete" blared the headline. In the April 4, 1988, InfoWorks, columnist John Gantz blew the lid off IBM's new strategy for eliminating the need for programmers with trainable neural network computers.
Of course, it was an April Fools' gag. You just knew it had to be, even though it was three days late for April Fools', because Gantz is too skeptical to fall for such wild claims. Sort of like DDJ readers. When it comes to choosing between blue-sky prognostication and solidly grounded technique, DDJ readers typically put ground over sky.
But Why, Mike?
So why, in an informal survey I conducted last year, was parallel processing the topic most readers wanted to see more of in the magazine? Sort of blue-sky, isn't it? It was hardly a topic of great immediate practical value to a professional programmer, considering how few parallel-architecture machines there are.
The trick, I think, is the word "immediate." After all, to a long-distance runner, the only activity of immediate practical value is performing well in races. But no runner spends as much time competing as he or she does practicing and working out. I try to address that need in this column: to point out the new exercise equipment, and (when I can) to give some suggestions for its use.
In the first two installments, I touched on (superficially) several programming paradigms: object-oriented programming, logic programming, and communicating sequential processes. Although the intent here is to explore new intellectual tools rather than to develop mastery of familiar ones, I hope to get deeper into each of these and other paradigms in subsequent columns.
This month I'm going to show an algorithm for a new paradigm of programming. But first, I have to talk about current research in cognitive psychology.
Well, first, to show what's entailed in a clash of paradigms. I did graduate work in cognitive psychology a few years ago, and since that time a new paradigm has come into prominence in that field. This new paradigm challenges some operating assumptions of the field, assumptions that I took seriously back then. I believe that, in the small revolution that is going on today in cognitive psychology, there is a useful example of what happens when paradigms collide. Useful, because there are programming paradigms on a collision course today.
Second, to give an application-oriented perspective on the aforementioned algorithm. You see, the new psychological paradigm is also the new programming paradigm, neural networks. As it happens, from a paradigmatic point of view, Gantz's joke may not be so funny after all.
What's This Paradigm Stuff?
Before the psychology, a reminder is in order about paradigms and why we should care about them. In 1962, philosopher of science Thomas S. Kuhn published a book titled The Structure of Scientific Revolutions, which enjoyed a great vogue in the late sixties and early seventies in undergraduate and graduate curricula. DDJ associate editor Ron Copeland says he read Kuhn for four different classes. I think I only read him three times, but I read him carefully.
Kuhn advanced the then-controversial thesis that the kind of science a community of scientists will produce depends on their shared values, terminology, techniques, model problems, and concrete examples of how to solve such problems. He subsumed these things under the term "paradigm," and described how disorienting it was to move from one paradigm to another, and how difficult it was for those working within one paradigm to communicate with those working in another.
With the pain came gain. Kuhn said, "Paradigm shifts, troubled periods in which basic assumptions of a discipline are being upset, are one of the means by which science progresses."
It was Kuhn's insight to apply this concept of paradigms to the growth of science. Its applicability to an engineering discipline such as software development is more obvious. It's not really controversial to note that those working with different tools and assumptions and languages will see different problems to solve and will create different kinds of artifacts. It's not controversial, but it is important.
The differences will be no less real if they are invisible to the user. If one programmer cooks a spaghetti-code general-ledger program in Basic and another programmer models the elements of a pre-existing paper general ledger in Smalltalk, the delivered goods may look identical to the user. (Not likely, but it is possible.) But the programs will, in fact, be very different things, as will be evident to other programmers hired to maintain them.
The fact that the program that you create depends on the paradigm within which you create it should matter to you. I'm sure it does. But it also implies that you should know what paradigms are available, understand them well enough to know what problems they solve, and have the flexibility to move from paradigm to paradigm at will.
Another reason to understand other paradigms is communication. As I have mentioned before, I think that we are seeing a paradigmatic broadening of the discipline of programming. If programmers continue to be educated (and to educate themselves) within narrow paradigm boundaries, it will become increasingly difficult for programmers to learn from one another.
Paradigm differences run deeper than just different programming languages and algorithms. Learning object-oriented programming, for example, is not just a matter of picking up some new techniques. If you've spent your professional life thinking that programming is really a matter of finding the right algorithm and implementing it efficiently, object-oriented programming will seriously warp your thinking.
This is something similar to what some psychological researchers are facing today.
A Crash Course in Cognition
The text for this month's exploration into psychology is Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volumes 1 and 2, by David E. Rumelhart, James L. McClelland, and the PDP Research Group (A Bradford Book; MIT Press, 1986). I'll synopsize it, along with the mainstream of cognitive psychology against which it flows.
Rumelhart et al. present a class of models (called parallel distributed processing, or PDP, models) in which the mechanism of information processing is assumed to be the interaction of large numbers of simple processing elements, each sending excitatory and inhibitory signals to other units. The units may correspond to various mental entities (hypotheses, goals, or potential actions) or to aspects or features of such entities.
PDP is more or less equivalent to neural networks. The models the authors subsume under the name PDP all have the following: a set of processing units; a state of activation; an output function for each unit; a pattern of connectivity among the units; a rule for propagating patterns of activity through the network; an activation rule for combining a unit's inputs with its current state; a learning rule by which patterns of connectivity are modified by experience; and an environment within which the system must operate.
This is really a model of the network of real neurons that make up the brain, although the neurons of the models are highly idealized (see Figure 1). One of the important ways in which the individual models differ is in the mechanism of learning. But they do agree in this: learning in a PDP model is a purely local phenomenon. No "supervisor" rewards the elements for good choices. All levels of learning are built from simple, strictly local feedback mechanisms.
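The model just described can be caricatured in a few lines of code. The sketch below is my own, not the authors' (the names are invented for illustration): simple units exchange weighted excitatory and inhibitory signals, and learning is a strictly local adjustment of the weights, here a Hebbian rule, one of several rules the individual models use, with no supervisor anywhere.

```python
# A toy PDP-style network, for illustration only. Two mechanisms are
# shown: propagation (units fire on weighted input from their
# neighbors) and a purely local, unsupervised learning rule.

def step(activations, weights, thresholds):
    """One synchronous update: unit j fires iff the weighted sum of
    the signals reaching it exceeds its threshold. A positive weight
    is an excitatory connection, a negative one inhibitory."""
    n = len(activations)
    return [
        1 if sum(weights[i][j] * activations[i] for i in range(n)) > thresholds[j]
        else 0
        for j in range(n)
    ]

def hebb(activations, weights, rate=0.1):
    """Local learning: strengthen a connection when the units at both
    of its ends are active together. No global error signal is used."""
    n = len(activations)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * activations[i] * activations[j]
    return weights
```

With two mutually excitatory units (a weight of 1 in each direction, thresholds of 0.5), an active first unit switches the second unit on in a single step, and repeated co-activation then strengthens the connection further.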
One problem that any theory of cognitive processes must face is the apparently vast gap of levels of processing from the cognitive down to the neurological.
On the one hand, it doesn't make much sense to develop models of thought that are in conflict with what little we know about the functioning of neurons in the central nervous system.
But on the other hand, to assume that the same principles of organization that explain the interaction of neurons will also explain how thoughts interact when we read a novel seems like wishful thinking (like imagining that colleges could replace chemistry courses with courses in physics). Chemists may accept that their science is in some sense implicit in the science of physics, but they still feel that chemical problems require chemical solutions.
No research psychologist today rejects the reductionist principle that complex cognitive behavior is based on the functioning of neurons in the central nervous system. But in the 1970s, many scientists began to question whether there was any practical significance in that principle. Over the past two decades, cognitive psychology has increasingly sought purely cognitive models for cognitive processes.
This is tough. It's tough to bootstrap a science. As a graduate student, I designed an experiment to distinguish between storage and retrieval techniques in the comprehension of structured information. The experiment was cleanly designed, I think. Because of necessary temporal sequences, its results should have clearly established whether certain things were happening during the storage or the retrieval of the information from memory.
What my experiment lacked (I discovered too late) was a solid theoretical foundation. Yes, it distinguished some sort of storage effect from some sort of retrieval effect, but without some accepted model of what was stored and retrieved, I couldn't communicate any useful information to other researchers. There were models in the light of which I could have interpreted my results. Since I had not designed my experiment with any of those models in mind, my results were hard to compare or connect with the results that those models produced. I was in a giddy state of "paradigmlessness." The words "skew" and "incommensurable" come to mind.
Fortunately others had better luck than I with cognitive psychology. But my experience is suggestive of what some researchers may be feeling today.
Figure 1: Idealized neuron of a neural net model.
The PDP model does assume that the same principles of organization that explain the interaction of neurons will also explain how thoughts interact when we read a novel. In this way, it is in conflict with the cognitive psychology I learned. But conflict is the wrong word. Some of the kinds of experiments that the PDP researchers are doing produce results that are incommensurable with other results, and this makes it hard for those working within a different paradigm even to talk with PDP researchers.
Programming Paradigms

Right. So how does all this tie into programming? Aside from the putative benefit of understanding what paradigm clashes mean in a different context, do neural networks have any real interest for programmers? I think they do, and I think they may one day raise some disturbing questions about just what it means to be a programmer.
Many people are now beginning to take neural networks seriously as a programming technique. A 1986 conference sponsored by the American Institute of Physics in Snowbird, Utah, drew 160 people. A friend told me while I was writing this column that his chief programmer had just quit to develop neural networks.
Figure 2: A NAND gate can be simulated by a neural network node with a threshold of -3 and two inputs of weight -2.
What kinds of problems can and can't be solved by neural network techniques? That's pretty straightforward. Neural networks can solve any conventional computational problem. The proof of this is also straightforward. First, note that a computational problem can be represented by a set of Boolean functions. Second, recall that any Boolean function can be built entirely from two-input NAND gates. If a NAND gate can be simulated by a component of a neural network, it follows that neural networks are formally capable of solving any problem solvable by computer.
This happens to be the case. Figure 2 shows how to model a NAND gate in a neural network.
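The construction in Figure 2 is easy to verify in code. The sketch below is mine (the function names are invented): a unit that fires when its weighted input exceeds its threshold computes NAND when both weights are -2 and the threshold is -3, and NAND units compose into any Boolean function; XOR, for instance, takes four of them.

```python
def nand_unit(a, b, weight=-2, threshold=-3):
    """The threshold unit of Figure 2: the unit fires (outputs 1)
    iff the weighted sum of its two inputs exceeds the threshold.
    With both weights -2 and threshold -3, only the input (1, 1)
    drives the sum to -4, below threshold, so the unit computes NAND."""
    return 1 if (weight * a + weight * b) > threshold else 0

def xor(a, b):
    """Any Boolean function can be wired from NAND gates alone;
    XOR is the classic four-gate construction."""
    c = nand_unit(a, b)
    return nand_unit(nand_unit(a, c), nand_unit(b, c))
```

Checking all four input pairs confirms that nand_unit reproduces the NAND truth table, and that the four-unit network reproduces XOR, which is the whole universality argument in miniature.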
All right, but what kinds of problems can a neural network approach solve efficiently?
The first limitation is an absolute one. Neural networks are instances of parallel processing. They can be expected to produce benefits in those areas in which parallel processing is potentially beneficial, but they can produce no gain where parallelism is not beneficial. If processing power is more precious than time, a sequential solution is the right solution.
When time is a factor, the best that a neural network (or any parallel approach) can do is to reduce processing time by a factor equal to the number of processors. For structured problems (i.e., problems with relatively short algorithms), Abu-Mostafa argues that the efficiency of neural networks is likely to be much less than this. For problems requiring long algorithms (what are called random problems), the efficiency may be reasonable (i.e., polynomial in the number of processors).
A tentative conclusion is that neural networks are more useful for large, random problems.
This is supported by the fact that neural networks are built of very simple processing units, so the processors should ultimately be cheap and plentiful. They could be particularly cheap and plentiful if implemented in optical-device technology, for which neural networks look like an ideal candidate. The size-of-problem issue gets more definite when you consider the algorithm embedded in a neural network solution. Abu-Mostafa says that the time complexity of the problem is accommodated by the number of steps, the space complexity by the number of processing units, and the Kolmogorov complexity (or complexity of the algorithm) by the degrees of freedom (or information capacity) of the synaptic connections. These measures of complexity are not independent; in fact, the Kolmogorov complexity will be very large if the space complexity is large.
Thus, a problem that is demanding in terms of space complexity, but modest in terms of Kolmogorov complexity, will waste a great deal of information capacity. Exponential-time problems (like the Traveling Salesman Problem) similarly waste capacity in neural nets.
What this means is that problems requiring a lot of computation time or memory, but having simple algorithms, will use neural networks very inefficiently. Problems that require very large algorithms will make better use of neural networks. Pattern recognition in natural environments is one example of the latter kind of problem.
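To make the waste concrete, here is a toy calculation. The figures are mine, not Abu-Mostafa's; his argument concerns asymptotic complexity, not these particular numbers.

```python
# Illustrative arithmetic only. In a fully connected net, every
# ordered pair of distinct units gets a weighted connection, so the
# adjustable degrees of freedom (the information capacity) grow with
# the square of the number of units.

def weights_in_full_net(n_units):
    """Count the directed connections in a fully connected net."""
    return n_units * (n_units - 1)

# A problem that is space-hungry but algorithmically simple: suppose
# it needs 1,000 units of working storage, yet its algorithm would
# fit in a few lines. The net still carries 999,000 adjustable
# weights, and such a problem leaves nearly all of that capacity
# unused.
units = 1000
capacity = weights_in_full_net(units)
```

The point is the mismatch: the capacity is forced up by the space requirement, while a short algorithm needs almost none of it.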
Another area in which neural networks could be useful is in the unsupervised search for solutions to open problems. The hope of neural network research in psychology is that very complex and high-level processes (i.e., complex algorithms) can be built up from simple mechanisms without direction or supervision.
In psychology, this amounts to a very strong claim about the nature of human learning. In programming, it brings us to the edge of a potential paradigm clash.
To many of us, computer programming has long meant selecting or developing the algorithm or algorithms to solve the problem. The object-oriented programming and declarative programming paradigms challenge that view by putting the emphasis on modeling the problem accurately. But how would we feel if we only had to state the problem and wait while the system developed an algorithm? And what if we couldn't understand the algorithm?
Remember the reaction among mathematicians to the unorthodox computer-assisted proof of the four-color theorem? Neural network technology will never make programmers obsolete, but it may one day present them with an identity crisis. There's one elementary point about the efficiency of neural networks that I haven't stated explicitly: neural network techniques require parallel architectures. There are few computers capable of supporting parallel processing today, and neural net algorithms like the one presented here are of absolutely no practical value without true parallelism.
Well, there is one "practical" value. They're good workout room equipment for learning about new paradigms and flexing your intellectual muscles.