"IBM Neural Network Chips Make Everything Obsolete" blared the headline. In the April 4, 1988, InfoWorks, columnist John Gantz blew the lid off IBM's new strategy for eliminating the need for programmers with trainable neural network computers.
Of course, it was an April Fools' gag. You just knew it had to be, even though it was three days late for April Fools', because Gantz is too skeptical to fall for such wild claims. Sort of like DDJ readers. When it comes to choosing between blue-sky prognostication and solidly grounded technique, DDJ readers typically put ground over sky.
But Why, Mike?
So why, in an informal survey I conducted last year, was parallel processing the topic most readers wanted to see more of in the magazine? Sort of blue-sky, isn't it? It was hardly a topic of great immediate practical value to a professional programmer, considering how few parallel-architecture machines there are.
The trick, I think, is the word "immediate." After all, to a long-distance runner, the only activity of immediate practical value is performing well in races. But no runner spends as much time competing as he or she does practicing and working out. I try to address that need in this column: to point out the new exercise equipment, and (when I can) to give some suggestions for its use.
In the first two installments, I touched on (superficially) several programming paradigms: object-oriented programming, logic programming, and communicating sequential processes. Although the intent here is to explore new intellectual tools rather than to develop mastery of familiar ones, I hope to get deeper into each of these and other paradigms in subsequent columns.
This month I'm going to show an algorithm for a new paradigm of programming. But first, I have to talk about current research in cognitive psychology.
Well, first, to show what's entailed in a clash of paradigms. I did graduate work in cognitive psychology a few years ago, and since that time a new paradigm has come into prominence in that field. This new paradigm challenges some operating assumptions of the field, assumptions that I took seriously back then. I believe that, in the small revolution that is going on today in cognitive psychology, there is a useful example of what happens when paradigms collide. Useful, because there are programming paradigms on a collision course today.
Second, to give an application-oriented perspective on the aforementioned algorithm. You see, the new psychological paradigm is also the new programming paradigm, neural networks. As it happens, from a paradigmatic point of view, Gantz's joke may not be so funny after all.
What's This Paradigm Stuff?
Before the psychology, a reminder is in order about paradigms and why we should care about them. In 1962, philosopher of science Thomas S. Kuhn published a book titled The Structure of Scientific Revolutions, which enjoyed a great vogue in the late sixties and early seventies in undergraduate and graduate curricula. DDJ associate editor Ron Copeland says he read Kuhn for four different classes. I think I only read him three times, but I read him carefully.
Kuhn advanced the then-controversial thesis that the kind of science a community of scientists will produce depends on their shared values, terminology, techniques, model problems, and concrete examples of how to solve such problems. He subsumed these things under the term "paradigm," and described how disorienting it was to move from one paradigm to another, and how difficult it was for those working within one paradigm to communicate with those working in another.
With the pain came gain. Kuhn said, "Paradigm shifts, troubled periods in which basic assumptions of a discipline are being upset, are one of the means by which science progresses."
It was Kuhn's insight to apply this concept of paradigms to the growth of science. Its applicability to an engineering discipline such as software development is more obvious. It's not really controversial to note that those working with different tools and assumptions and languages will see different problems to solve and will create different kinds of artifacts. It's not controversial, but it is important.
The differences will be no less real if they are invisible to the user. If one programmer cooks a spaghetti-code general-ledger program in Basic and another programmer models the elements of a pre-existing paper general ledger in Smalltalk, the delivered goods may look identical to the user. (Not likely, but it is possible.) But the programs will, in fact, be very different things, as will be evident to other programmers hired to maintain them.
The fact that the program that you create depends on the paradigm within which you create it should matter to you. I'm sure it does. But it also implies that you should know what paradigms are available, understand them well enough to know what problems they solve, and have the flexibility to move from paradigm to paradigm at will.
Another reason to understand other paradigms is communication. As I have mentioned before, I think that we are seeing a paradigmatic broadening of the discipline of programming. If programmers continue to be educated (and to educate themselves) within narrow paradigm boundaries, it will become increasingly difficult for programmers to learn from one another.
Paradigm differences run deeper than just different programming languages and algorithms. Learning object-oriented programming, for example, is not just a matter of picking up some new techniques. If you've spent your professional life thinking that programming is really a matter of finding the right algorithm and implementing it efficiently, object-oriented programming will seriously warp your thinking.
This is something similar to what some psychological researchers are facing today.
A Crash Course in Cognition
The text for this month's exploration into psychology is Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volumes 1 and 2, by David E. Rumelhart, James L. McClelland, and the PDP Research Group (A Bradford Book; MIT Press, 1986). I'll synopsize it, along with the mainstream of cognitive psychology against which it flows.
Rumelhart et al. present a class of models (called parallel distributed processing, or PDP, models) in which the mechanism of information processing is assumed to be the interaction of large numbers of simple processing elements, each sending excitatory and inhibitory signals to other units. The units may correspond to various mental entities (hypotheses, goals, or potential actions) or to aspects or features of such entities.
PDP is more or less equivalent to neural networks. The models the authors subsume under the name PDP all have the following: a set of processing units; a state of activation; an output function for each unit; a pattern of connectivity among the units; a rule for propagating patterns of activity through the network; an activation rule for combining a unit's inputs with its current state; a learning rule by which patterns of connectivity are modified by experience; and an environment within which the system must operate.
This is really a model of the network of real neurons that make up the brain, although the neurons of the models are highly idealized (see Figure 1). One of the important ways in which the individual models differ is in the mechanism of learning. But they do agree in this: learning in a PDP model is a purely local phenomenon. No "supervisor" rewards the elements for good choices. All levels of learning are built from simple, strictly local feedback mechanisms.
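The model just described can be caricatured in a few lines of code. The sketch below is my own, not the authors' (the names are invented for illustration): simple units exchange weighted excitatory and inhibitory signals, and learning is a strictly local adjustment of the weights, here a Hebbian rule, one of several rules the individual models use, with no supervisor anywhere.

```python
# A toy PDP-style network, for illustration only. Two mechanisms are
# shown: propagation (units fire on weighted input from their
# neighbors) and a purely local, unsupervised learning rule.

def step(activations, weights, thresholds):
    """One synchronous update: unit j fires iff the weighted sum of
    the signals reaching it exceeds its threshold. A positive weight
    is an excitatory connection, a negative one inhibitory."""
    n = len(activations)
    return [
        1 if sum(weights[i][j] * activations[i] for i in range(n)) > thresholds[j]
        else 0
        for j in range(n)
    ]

def hebb(activations, weights, rate=0.1):
    """Local learning: strengthen a connection when the units at both
    of its ends are active together. No global error signal is used."""
    n = len(activations)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * activations[i] * activations[j]
    return weights
```

With two mutually excitatory units (a weight of 1 in each direction, thresholds of 0.5), an active first unit switches the second unit on in a single step, and repeated co-activation then strengthens the connection further.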
One problem that any theory of cognitive processes must face is the apparently vast gap of levels of processing from the cognitive down to the neurological.
On the one hand, it doesn't make much sense to develop models of thought that are in conflict with what little we know about the functioning of neurons in the central nervous system.
But on the other hand, to assume that the same principles of organization that explain the interaction of neurons will also explain how thoughts interact when we read a novel seems like wishful thinking (like imagining that colleges could replace chemistry courses with courses in physics). Chemists may accept that their science is in some sense implicit in the science of physics, but they still feel that chemical problems require chemical solutions.
No research psychologist today rejects the reductionist principle that complex cognitive behavior is based on the functioning of neurons in the central nervous system. But in the 1970s, many scientists began to question whether there was any practical significance in that principle. Over the past two decades, cognitive psychology has increasingly sought purely cognitive models for cognitive processes.
This is tough. It's tough to bootstrap a science. As a graduate student, I designed an experiment to distinguish between storage and retrieval techniques in the comprehension of structured information. The experiment was cleanly designed, I think. Because of necessary temporal sequences, its results should have clearly established whether certain things were happening during the storage or the retrieval of the information from memory.
What my experiment lacked (I discovered too late) was a solid theoretical foundation. Yes, it distinguished some sort of storage effect from some sort of retrieval effect, but without some accepted model of what was stored and retrieved, I couldn't communicate any useful information to other researchers. There were models in the light of which I could have interpreted my results. Since I had not designed my experiment with any of those models in mind, my results were hard to compare or connect with the results that those models produced. I was in a giddy state of "paradigmlessness." The words "skew" and "incommensurable" come to mind.
Fortunately others had better luck than I with cognitive psychology. But my experience is suggestive of what some researchers may be feeling today.
Figure 1: Idealized neuron of a neural net model.
The PDP model does assume that the same principles of organization that explain the interaction of neurons will also explain how thoughts interact when we read a novel. In this way, it is in conflict with the cognitive psychology I learned. But conflict is the wrong word. Some of the kinds of experiments that the PDP researchers are doing produce results that are incommensurable with other results, and this makes it hard for those working within a different paradigm even to talk with PDP researchers.
Programming Paradigms

Right. So how does all this tie into programming? Aside from the putative benefit of understanding what paradigm clashes mean in a different context, do neural networks have any real interest for programmers? I think they do, and I think they may one day raise some disturbing questions about just what it means to be a programmer.
Many people are now beginning to take neural networks seriously as a programming technique. A 1986 conference sponsored by the American Institute of Physics in Snowbird, Utah, drew 160 people. A friend told me while I was writing this column that his chief programmer had just quit to develop neural networks.
Figure 2: A NAND gate can be simulated by a neural network node with a threshold of -3 and two inputs of weight -2.
What kinds of problems can and can't be solved by neural network techniques? That's pretty straightforward. Neural networks can solve any conventional computational problem. The proof of this is also straightforward. First, note that a computational problem can be represented by a set of Boolean functions. Second, recall that any Boolean function can be built entirely from two-input NAND gates. If a NAND gate can be simulated by a component of a neural network, it follows that neural networks are formally capable of solving any problem solvable by computer.
This happens to be the case. Figure 2 shows how to model a NAND gate in a neural network.
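The construction in Figure 2 is easy to verify in code. The sketch below is mine (the function names are invented): a unit that fires when its weighted input exceeds its threshold computes NAND when both weights are -2 and the threshold is -3, and NAND units compose into any Boolean function; XOR, for instance, takes four of them.

```python
def nand_unit(a, b, weight=-2, threshold=-3):
    """The threshold unit of Figure 2: the unit fires (outputs 1)
    iff the weighted sum of its two inputs exceeds the threshold.
    With both weights -2 and threshold -3, only the input (1, 1)
    drives the sum to -4, below threshold, so the unit computes NAND."""
    return 1 if (weight * a + weight * b) > threshold else 0

def xor(a, b):
    """Any Boolean function can be wired from NAND gates alone;
    XOR is the classic four-gate construction."""
    c = nand_unit(a, b)
    return nand_unit(nand_unit(a, c), nand_unit(b, c))
```

Checking all four input pairs confirms that nand_unit reproduces the NAND truth table, and that the four-unit network reproduces XOR, which is the whole universality argument in miniature.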
All right, but what kinds of problems can a neural network approach solve efficiently?
The first limitation is an absolute one. Neural networks are instances of parallel processing. They can be expected to produce benefits in those areas in which parallel processing is potentially beneficial, but they can produce no gain where parallelism is not beneficial. If processing power is more precious than time, a sequential solution is the right solution.
When time is a factor, the best that a neural network (or any parallel approach) can do is to reduce processing time by a factor equal to the number of processors. For structured problems (i.e., problems with relatively short algorithms), Abu-Mostafa argues that the efficiency of neural networks is likely to be much less than this. For problems requiring long algorithms (what are called random problems), the efficiency may be reasonable (i.e., polynomial in the number of processors).
A tentative conclusion is that neural networks are more useful for large, random problems.
This is supported by the fact that neural networks are built of very simple processing units, so the processors should ultimately be cheap and plentiful. They could be particularly cheap and plentiful if implemented in optical-device technology, for which neural networks look like an ideal candidate. The size-of-problem issue gets more definite when you consider the algorithm embedded in a neural network solution. Abu-Mostafa says that the time complexity of the problem is accommodated by the number of steps, the space complexity by the number of processing units, and the Kolmogorov complexity (or complexity of the algorithm) by the degrees of freedom (or information capacity) of the synaptic connections. These measures of complexity are not independent; in fact, the Kolmogorov complexity will be very large if the space complexity is large.
Thus, a problem that is demanding in terms of space complexity, but modest in terms of Kolmogorov complexity, will waste a great deal of information capacity. Exponential-time problems (like the Traveling Salesman Problem) similarly waste capacity in neural nets.
What this means is that problems requiring a lot of computation time or memory, but having simple algorithms, will use neural networks very inefficiently. Problems that require very large algorithms will make better use of neural networks. Pattern recognition in natural environments is one example of the latter kind of problem.
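To make the waste concrete, here is a toy calculation. The figures are mine, not Abu-Mostafa's; his argument concerns asymptotic complexity, not these particular numbers.

```python
# Illustrative arithmetic only. In a fully connected net, every
# ordered pair of distinct units gets a weighted connection, so the
# adjustable degrees of freedom (the information capacity) grow with
# the square of the number of units.

def weights_in_full_net(n_units):
    """Count the directed connections in a fully connected net."""
    return n_units * (n_units - 1)

# A problem that is space-hungry but algorithmically simple: suppose
# it needs 1,000 units of working storage, yet its algorithm would
# fit in a few lines. The net still carries 999,000 adjustable
# weights, and such a problem leaves nearly all of that capacity
# unused.
units = 1000
capacity = weights_in_full_net(units)
```

The point is the mismatch: the capacity is forced up by the space requirement, while a short algorithm needs almost none of it.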
Another area in which neural networks could be useful is in the unsupervised search for solutions to open problems. The hope of neural network research in psychology is that very complex and high-level processes (i.e., complex algorithms) can be built up from simple mechanisms without direction or supervision.
In psychology, this amounts to a very strong claim about the nature of human learning. In programming, it brings us to the edge of a potential paradigm clash.
To many of us, computer programming has long meant selecting or developing the algorithm or algorithms to solve the problem. The object-oriented programming and declarative programming paradigms challenge that view by putting the emphasis on modeling the problem accurately. But how would we feel if we only had to state the problem and wait while the system developed an algorithm? And what if we couldn't understand the algorithm?
Remember the reaction among mathematicians to the unorthodox computer-assisted proof of the four-color theorem? Neural network technology will never make programmers obsolete, but it may one day present them with an identity crisis. There's one elementary point about the efficiency of neural networks that I haven't stated explicitly: neural network techniques require parallel architectures. There are few computers capable of supporting parallel processing today, and neural net algorithms like the one presented here are of absolutely no practical value without true parallelism.
Well, there is one "practical" value. They're good workout room equipment for learning about new paradigms and flexing your intellectual muscles.