PROGRAMMING PARADIGMS

Background on Backprop

Michael Swaine

Last month I reported on a conversation with engineer Hal Hardenbergh about his interest in neural networks. I've also been talking with Hardenbergh's software cohort in neuraldom, Tom Waite, as well as with neural net algorist Dave Parker and engineer/programmer/writer Jurgen Fey about neural nets, transputers, and the Occam language. Next month I'll report on those conversations and take a more algorithmic look at neural nets.

This month I'm stepping out of the interview format to present some background that I hope will put last month's and next month's columns in perspective. Last month's column touched only tantalizingly on some issues, such as the present-day practical uses of neural net technology. There are some remarkably mundane as well as some cutting-edge uses to which neural net insights have been put in the past twenty-five years.

Also, the interview format may have made it hard to be sure, in reading last month's column, just what was historical or technological fact and what was Hardenbergh's perspective (however valid and interesting that perspective might be). This month's neural net backgrounder should clear that up. That a hardware engineer would get interested in what has been referred to as "the parapsychology of AI," which I presented as somewhat surprising, has a history of its own, and in the perspective of that history it is not so surprising after all.

Lately in this column I have been asking software developers (and last month an engineer) why they are pursuing certain approaches to software development rather than other approaches. This month I guess I'm asking myself "Why neural networks?" You'll find only one reference listed at the end of this column, because all the articles I drew upon can be found in the omnibus collection by Anderson and Rosenfeld. I recommend it to anyone interested in the history and present state of neural network research and development.

Where Did It Begin?

The perceptron model proposed by F. Rosenblatt in 1958 was the beginning of all modern neural network research. It already contained most of the interesting elements present in today's neural nets.

It described an artificial nervous system. It combined cells into several connected layers: A "retina" where input signals arrived, an "association layer" where retinal cells connected, and a "response layer." Connections between association-level and response-level cells were bidirectional, permitting feed-back that allowed the perceptron to learn. The goal of the operation of the perceptron was to learn to activate the right response layer for the given input.

Rosenblatt also focused on the kind of problem that occupies most neural networks today: The classification of interesting patterns of inputs. It was a sufficiently difficult problem that it still challenges neural net developers; it was also sufficiently difficult to cause serious problems for Rosenblatt's perceptron when two AI experts subjected the perceptron to rigorous analysis.

Where are the Practical Applications?

As I was working on this column, I got a call from Hal Hardenbergh. You know what new explosives detector they're using at JFK International Airport? he wanted to know. Yep. Well, it's a neural net. Coming as it did two days before I would be standing in a line at JFK to board a flight to Europe, Hardenbergh's bulletin took on a personal significance for me.

There are more neural net devices in use than meet the eye. A fair amount of neural net research and development is DoD work, and we either don't hear about it or hear about it only obliquely. Sometimes presentations at neural network conferences seem to be delivered in an obscure code. What is this about recognizing faces in the fog? Wait, if you substitute "smoke" for "fog" and "tank" for "face," does it begin to make more sense?

One of the biggest success stories to come out of neural network research is adaptive switching circuits, described by Bernard Widrow and Marcian E. Hoff in 1960. Hoff is Ted Hoff, the inventor (if you live west of the Rio Grande) of the microprocessor.

The perceptron learned by changing its coupling coefficients in response to error feedback regarding its immediate past classifications. But many of the proposed perceptron learning rules were impractically slow in converging to the coefficients that would give correct classifications. Widrow and Hoff developed what they called an adaptive neuron, related to perceptrons, that converged to correct classification quickly. One novel feature of Widrow and Hoff's neuron was that it continued to learn even when it was emitting correct responses.

Widrow and Hoff built a lunchbox-sized adaptive pattern classification machine to demonstrate their adaptive neuron's learning behavior. They originally called the box Adaline, which stood for either Adaptive Linear Neuron or Adaptive Linear Element, depending on how comfortable they felt about neural net research when they were discussing it.

But it was not learning lunchboxes that proved the value of Widrow and Hoff's technique. The error correction algorithm they used is called "least mean squares," or LMS, because it involves minimizing the mean of the squared error. LMS has been used extensively in signal processing and has seen wide use in error correction in modems.
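For readers who want to see the Widrow/Hoff rule at work, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of the Adaline hardware: the names lms_train and classify, the learning rate, the number of passes, and the choice of logical AND with inputs coded as +1 and -1 are all made up for the example. Note that the weights are adjusted in proportion to the error of the raw weighted sum, so the unit keeps learning even when its thresholded response is already correct.

```python
def lms_train(samples, rate=0.05, epochs=200):
    """Train one linear unit with the Widrow/Hoff (LMS) rule.

    samples is a list of (inputs, target) pairs. The names, the rate,
    and the epoch count are illustrative assumptions.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * (n_inputs + 1)  # the last weight acts as a bias
    for _ in range(epochs):
        for inputs, target in samples:
            x = list(inputs) + [1.0]  # constant input feeds the bias weight
            output = sum(w * xi for w, xi in zip(weights, x))
            error = target - output  # error of the raw sum, not the thresholded response
            # Move each weight in proportion to the error and to its own input.
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


def classify(weights, inputs):
    """Threshold the weighted sum to get a +1 or -1 response."""
    x = list(inputs) + [1.0]
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else -1


# Logical AND, with true coded as +1 and false as -1 (an assumed coding).
AND_SAMPLES = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]

weights = lms_train(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "target", target, "response", classify(weights, inputs))
```

Because the rule chases the smallest mean squared error rather than merely a correct classification, the weights settle near the least-squares fit to the targets instead of stopping at the first set of weights that happens to classify correctly.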

Widrow and Hoff also laid some groundwork for current neural net development. Neural net research focuses on how to hook up networks of artificial neurons so they can learn from experience. The best-known current algorithm for implementing "experience" in neural nets is back propagation, which is a generalization of the Widrow/Hoff rule. As Hardenbergh pointed out here last month, back propagation had to be discovered three times before the discovery stuck.

Why Three Times?

Rosenblatt first described the perceptron in 1958, and the machinery he proposed then is still in use in neural network research and development. For years the perceptron was a hot area of research, with hundreds of papers published.

Then, suddenly, the bottom dropped out, and so did the funding. The very use of the word "neuron" became unpopular in AI work. The reason was the fundamental inability of elementary perceptrons to classify certain kinds of patterns. Psychological researchers called the relevant kind of pattern classification problem concept attainment, and the interesting patterns in concept attainment were those involving discontiguous sets, exclusive ors, and problems that could not be solved by partitioning an input space with planes. This limitation of simple perceptrons was brilliantly spelled out in Minsky & Papert's 1969 book Perceptrons.

Minsky and Papert made their point clearly and emphatically. Inability to handle problems of the concept formation-type was a serious problem, and they treated it as such: They dismissed the bulk of the hundreds of perceptron papers as "without scientific value."

In retrospect, it appears that Minsky and Papert were a little precipitous in concluding that the problem they identified could not be solved. Simple single-layer perceptrons may have been proved to be without scientific value, but the same could not be said for multilayer perceptrons. Adding a couple more layers would allow perceptrons to classify all sorts of problems, although it complicated the learning problem seriously. What was needed was a practical learning algorithm for multilayer perceptrons. But so effective was Minsky and Papert's demolition job that nobody took the discovery of such an algorithm seriously at first. Or at second.
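The exclusive-or limitation is easy to reproduce. The Python sketch below trains a single threshold unit with a simple error-correction rule, one common textbook form of perceptron learning and not necessarily any of the rules Rosenblatt proposed. The function name train_perceptron, the +1/-1 coding, and the cap of 100 passes are assumptions made for the illustration.

```python
def train_perceptron(samples, max_epochs=100):
    """Train one threshold unit with a simple error-correction rule.

    Returns the weights once a full pass produces no mistakes, or None
    if that never happens within max_epochs passes.
    """
    weights = [0.0, 0.0, 0.0]  # two input weights plus a bias weight
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            x = list(inputs) + [1.0]  # constant input feeds the bias weight
            total = sum(w * xi for w, xi in zip(weights, x))
            response = 1 if total > 0 else -1
            if response != target:
                mistakes += 1
                # Nudge the weights toward the target, on mistakes only.
                weights = [w + target * xi for w, xi in zip(weights, x)]
        if mistakes == 0:
            return weights
    return None


# Inputs and targets coded as +1 (true) and -1 (false), an assumed coding.
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

print("AND learned:", train_perceptron(AND) is not None)
print("XOR learned:", train_perceptron(XOR) is not None)
```

A single unit of this kind divides the input plane with one straight line. AND can be split that way; exclusive or cannot, so no number of passes will ever produce a mistake-free pass on XOR.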

Is Backprop the Algorithm of Choice?

It is if you are doing multilayer neural net work. The only other algorithm that works with multilayer nets, the Boltzmann machine, is much slower.

Backprop is a generalization of the Widrow/Hoff error correction rule. The Widrow/Hoff rule compared the actual output with what the output was supposed to be and used the magnitude of the error to adjust strengths of the connections between cells. For situations in which the correct response was known, it worked well. Adding layers introduces a difficulty: How do you compute the correct output for the hidden intermediate layers in order to adjust the connection strengths that led to these outputs? The problem is complicated by the fact that adjusting the connection strengths actually changes the topology of the network.

The solution used in back propagation is to run the connections backward, to ascertain the strengths. Back propagation involves a forward pass through the layers to estimate the error and a backward pass to modify the connection strengths and decrease the error.
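The two passes can be sketched in a few lines of Python. What follows is a minimal illustration under stated assumptions, not anyone's published implementation: a network with two inputs, two hidden cells, and one output cell, a sigmoid response function, a squared-error measure, and starting weights picked arbitrarily. The function names, the dictionary layout of the network, the learning rate, and the number of passes are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, inputs):
    """Forward pass: return the hidden-layer activations and the output."""
    x = list(inputs) + [1.0]  # constant input feeds the bias weights
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in net["hidden"]]
    output = sigmoid(sum(w * h for w, h in zip(net["output"], hidden + [1.0])))
    return hidden, output


def backward(net, inputs, target):
    """Backward pass: gradient of 0.5 * (target - output)**2 for every weight."""
    x = list(inputs) + [1.0]
    hidden, output = forward(net, inputs)
    # Error signal at the output cell.
    delta_out = (output - target) * output * (1.0 - output)
    grad_output = [delta_out * h for h in hidden + [1.0]]
    grad_hidden = []
    for j, h in enumerate(hidden):
        # Send the output error back along the connection it arrived on.
        delta_hidden = delta_out * net["output"][j] * h * (1.0 - h)
        grad_hidden.append([delta_hidden * xi for xi in x])
    return grad_hidden, grad_output


def train(net, samples, rate=0.5, epochs=2000):
    """Repeat forward and backward passes, stepping each weight downhill."""
    for _ in range(epochs):
        for inputs, target in samples:
            grad_hidden, grad_output = backward(net, inputs, target)
            net["output"] = [w - rate * g for w, g in zip(net["output"], grad_output)]
            net["hidden"] = [[w - rate * g for w, g in zip(ws, gs)]
                             for ws, gs in zip(net["hidden"], grad_hidden)]


# Exclusive or, coded as 0 and 1. The starting weights are arbitrary.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
net = {"hidden": [[0.5, -0.4, 0.1], [-0.3, 0.6, -0.2]], "output": [0.7, -0.5, 0.2]}

train(net, XOR)
for inputs, target in XOR:
    print(inputs, "target", target, "output", round(forward(net, inputs)[1], 3))
```

Nothing guarantees that a net this small will settle on a good answer from every starting point; gradient descent can stall, which is one reason the choice of starting weights and learning rate matters in practice.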

Backprop currently looks like one of the most promising areas of neural net research, if not the most promising, and could generate some interesting results. One of the intriguing ideas about neural nets, particularly Boltzmann machines and backprop nets, is the notion that deep insights into the nature of the information being processed and the effective representation of it can be derived from looking at the internal layers. A neural net that learns to classify patterns effectively, it is argued, contains in its hidden layers a representation of the input. If the output classifications are adequate to our needs, then the hidden-layer representations are also adequate, and we could send only these representations, dispensing with the input.

One benefit could be the use of backprop neural nets to develop new data compression algorithms.

Why is This of Interest to a Hardware Engineer?

It's not so odd that Hardenbergh, an engineer, was attracted to this domain of artificial intelligence. Perceptron research and neural network research have always had enormous appeal for engineers. "Much of the later work on perceptrons and successors was done by engineers and physicists," Anderson and Rosenfeld say, "a situation still true today in research on neural networks." The perceptron was a learning machine, potentially capable of complex adaptive behavior. It's easier to conceive of it as a device than as an approach to developing AI software systems; and until you see the algorithms spelled out, it's easier to see neural nets as a mathematical or engineering challenge than as a programming problem.

Hardenbergh and Waite see them as all of the above. Although I don't mean this to be a plug for Vicom or its employees, there are several reasons why I am going to keep a journalist's eye on Hardenbergh and Waite and Vicom.

The canonical problem for neural networks is the classification of visual figures, pattern classification. The earliest work in the neural net tradition, Walter Pitts and Warren S. McCulloch's research in the 1940s, focused on problems like recognizing squares wherever they appeared in the visual field. The basic problem remains unsolved today. It's particularly apparent to anyone who has to process image data. Vicom is in the image-processing business.

Image-processing companies are at something of an impasse: They all have the same algorithms, and nobody has any technological edge. The time is ripe for a new approach.

Vicom is not exclusively wrapped up in DoD work, so smokescreens will not obscure their results.

The amount of money required to fund a real breakthrough in neural nets for image processing is probably not enormous, not beyond the reach of potential customers of a company like Vicom.

I like the way Waite and Hardenbergh are approaching this. They are pragmatic enough to be discussing using neural nets as a component of an image processing system, not over-loading the network, not forcing it to solve problems for which there are already good image processing solutions. They are bringing hardware and software knowledge into the process at the start. And they seem focused, which is good if their approach is the right one.

Next month: Tom Waite (and others) on back propagation (and other topics).

Reference

Anderson, James A. and Rosenfeld, Edward, Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA, 1988.

Copyright © 1989, Dr. Dobb's Journal