PROGRAMMING PARADIGMS

Two Early Neural Net Implementations

Michael Swaine

There is exciting work being done in neural networks, such as the artificial retina project that Francis Crick is involved with. Unfortunately, most of the readily accessible implementations of neural network algorithms are unimpressive. The reason is that, for most of us, a "readily accessible" implementation of an algorithm is one that runs on a sequential machine, and neural networks are inherently nonsequential. They are, as Dave Parker defined them here last month, parallel implementations of minimization algorithms. The algorithms themselves can be interesting to examine, and last month we looked at some of these algorithms, especially Parker's own PC implementation of the back propagation algorithm. But we did not address the other half of the issue: The parallel implementation.

This month we'll look at two early implementations of neural nets. Both were remarkable successes in their target domains. Both, interestingly, used analog devices to implement parallel algorithms in largely discrete systems. Both addressed the canonical neural net problem of visual pattern classification, but not in the same way. And both pose some challenges for current neural net implementations.

MINOS II

MINOS I and its successors, MINOS II and III, were built at SRI under contract from the U.S. Army over a period from 1958 to 1967. Little was ever published in the general press regarding these machines, but they were described in contemporary documents, all unclassified and available to the curious. That point is worth making because of the controversy over Minsky and Papert's attack on neural net research in their 1969 book Perceptrons. Information on the MINOS project was at least accessible to Minsky and Papert before they wrote Perceptrons, and there is evidence that Minsky had visited the SRI labs and was aware of the project's objectives and results in the 1960s.

The stated objective of the MINOS work was "to conduct a research study and experimental investigation of techniques and equipment characteristics suitable for practical application to graphical data processing for military requirements." I take that to mean "to find workable algorithms and architectures for military image processing." Or: "See if you can build us a machine to spot tanks in aerial recon photos." The SRI team approached the problem by building artificial neurons out of multi-aperture magnetic cores and linking them in a network under the control of a learning algorithm. Or: They implemented a minimization algorithm in a parallel architecture. Or: They built a neural net.

The organization of the machines is clearest in MINOS II, illustrated in Figure 1. MINOS II consists of four units: An optical preprocessor, an adaptive unit, a training/comparator unit, and an output unit. The preprocessor (see Figure 2) takes the data in the form of a series of static image frames from slides, film, or a TV camera, and compresses each frame to a 100-bit word, which it passes to the adaptive unit. The adaptive unit, a neural net, performs the classification, and is called adaptive because it learns.

There are two phases in the operation of the machine: Training and classifying. The training/comparator unit is only active during the training phase, when it accepts correct responses as input, compares them to the responses generated by the adaptive unit, and adjusts parameters called weights in the adaptive unit, based on the comparison. The output unit displays the results.

The preprocessor contains a 32 x 32 array of lenses in front of a photographic plate, each lens reading in parallel from a storage tube. The tube gets its data from a television camera (there are also provisions for reading from slides and film), so the preprocessor is essentially reading a digitized real-world image off a "dumb" retina. As Figure 3 shows, the photographic plate has a mask for each of the 1024 images, and a photocell associated with each image/mask pair generates a binary signal based on the amount of light transmitted. The preprocessor then employs some algorithm, usually a task-specific one, to reduce the 1024 bits to a single 100-bit code for input to the adaptive unit. The algorithm is generally not very sophisticated, because its goal is not true pattern classification, which comes later, but simply gross data reduction that preserves the features the classification will need.

The adaptive unit is made up of threshold logic units (TLUs) in two layers. Each TLU computes a weighted sum of its binary inputs and generates a binary output, which is 1 if the weighted sum exceeds some threshold, and -1 otherwise. Physically, these TLUs are magnetic cores. The inputs to the 63 TLUs in the first layer are the 100 bits of the code word supplied by the preprocessor; that is, the preprocessed representation of an input image frame. These 63 TLUs feed the nine TLUs in the second layer, each second-layer TLU taking input from seven first-layer TLUs and computing a majority-rule function, outputting a 1 only if a majority of its inputs are 1s. The use of these second-level "committee" TLUs facilitates having the machine respond correctly without requiring that every TLU do so.
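The two-layer arrangement just described can be sketched in a few lines of Python. This is only an illustration of the structure, not a reconstruction of the SRI hardware: the function names, the random weights, the zero threshold, the +1/-1 encoding of the input bits, and the assumption that each committee takes seven consecutive first-layer TLUs are all mine.

```python
import random

def tlu(weights, inputs, threshold=0.0):
    """Threshold logic unit: +1 if the weighted sum exceeds the threshold, else -1."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else -1

def majority(votes):
    """Committee TLU: +1 only if a majority of its inputs are +1, else -1."""
    ones = sum(1 for v in votes if v == 1)
    return 1 if ones > len(votes) // 2 else -1

def forward(first_layer_weights, code_word, group_size=7):
    """Pass one code word through the first layer, then through the committees."""
    hidden = [tlu(w, code_word) for w in first_layer_weights]
    # Assumption: committees are disjoint groups of consecutive first-layer TLUs.
    groups = [hidden[i:i + group_size] for i in range(0, len(hidden), group_size)]
    return [majority(g) for g in groups]

# 63 first-layer TLUs, each with 100 weights (random here, purely for illustration).
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(100)] for _ in range(63)]
code_word = [rng.choice((-1, 1)) for _ in range(100)]  # stand-in for a preprocessed frame
output = forward(weights, code_word)  # nine values, each +1 or -1
```

With 63 first-layer units and groups of seven, `forward` returns nine values, one per committee, which together form the 9-bit classification.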

The nine second-level TLUs provide for 2⁹, or 512 different outputs, so the machine is capable of classifying its input patterns into any of 512 categories. During the training phase, the correct 9-bit classification is input for each input pattern, and the training/comparator unit compares the output of the adaptive unit with this correct answer. If they are not equal (as they usually will not be early in training) the training algorithm traces back through the adaptive unit's TLU outputs to adjust weights so that the correct response will result the next time this input pattern is presented. It first looks at second-level TLUs, then at the first-level TLUs feeding the incorrect second-level TLUs. The algorithm works so as to change as few weights as possible, and to change those weights that have to be changed by the smallest amount. (The weights, it should be noted, are analog values.)
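One way to read "as few weights as possible, by the smallest amount" is sketched below for a single committee of seven first-level TLUs. It is a guess at the spirit of the procedure, not the documented MINOS algorithm: the names, the `margin` parameter, the zero threshold, the fixed majority vote at the second level, and the choice of which TLUs to adjust are all assumptions.

```python
import random

def weighted_sum(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

def committee_output(weight_rows, inputs):
    """Majority vote of the first-level TLUs feeding one second-level TLU."""
    votes = [1 if weighted_sum(w, inputs) > 0 else -1 for w in weight_rows]
    return 1 if sum(votes) > 0 else -1

def train_step(weight_rows, inputs, target, margin=0.1):
    """Correct one committee on one pattern, changing as little as possible.

    Returns the number of first-level TLUs whose weights were adjusted.
    """
    sums = [weighted_sum(w, inputs) for w in weight_rows]
    votes = [1 if s > 0 else -1 for s in sums]
    agree = sum(1 for v in votes if v == target)
    needed = len(votes) // 2 + 1 - agree  # extra votes required for a majority
    if needed <= 0:
        return 0  # the committee already answers correctly
    # Wrong voters closest to threshold come first: they need the smallest change.
    wrong = sorted((i for i, v in enumerate(votes) if v != target),
                   key=lambda i: abs(sums[i]))
    norm = sum(x * x for x in inputs)  # assumes a nonzero input pattern
    for i in wrong[:needed]:
        # Smallest step along the input that pushes the sum just past zero.
        step = (abs(sums[i]) + margin) / norm
        weight_rows[i] = [w + target * step * x
                          for w, x in zip(weight_rows[i], inputs)]
    return needed

rng = random.Random(1)
rows = [[rng.uniform(-1, 1) for _ in range(100)] for _ in range(7)]
pattern = [rng.choice((-1, 1)) for _ in range(100)]
target = -committee_output(rows, pattern)  # deliberately demand the opposite answer
changed = train_step(rows, pattern, target)
```

After the call, the committee gives the demanded answer for this pattern, and only the TLUs that were nearest to flipping have been touched. Whether earlier patterns stay correct is a separate matter, which is why training takes repeated passes through the set.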

Testing of the complex MINOS machines must have been trying. The reports on the work mention experiments terminated because of things like malfunctioning slide projectors. The hard-earned results are interesting.

In a test that explored its ability to pick out objects against a noisy background, MINOS II was given aerial photos, some of which showed tanks, and was trained to pick out the photos with tanks. It learned to classify a set of 50 photos after 28 iterations of the set. It was then presented with a new set of photos, and classified 32 out of 34 of them correctly, at least one of its two errors being "reasonable," that is, one a human classifier might have made as well.

Another test employed more categories, requiring MINOS II to classify standard military map symbols, presented in a variety of orientations, into 15 categories. With a total of 30 symbols, MINOS II was trained to infallible performance on this set in 40 iterations. With 75 symbols, which meant five different orientations for each symbol, it had not yet learned the classifications in 75 iterations, at which time a slide projector problem terminated the experiment.

How good was MINOS II? It's hard to say. William Huber, who was the project monitor for the Army for the duration of the project, cited one complicating factor in his 1967 paper on the work: The system's tendency to "train around" defective operations, much as the human brain relearns over new pathways after cells are damaged. One day a power plug fell out of the wall and the machine was short one power supply; the only evidence of this was that training took a couple more iterations than usual. Measures of performance are also fairly meaningless when the exact task cannot easily be reconstructed on a conventional computer.

One thing is certain: MINOS I knew, in 1960, how to solve the XOR problem that typified Minsky and Papert's devastating critique of neural networks in Perceptrons. In fact, the MINOS team used a generalization of the XOR problem as a routine test for the machine.

The generalization of the XOR problem was to differentiate a horizontal bar from a vertical one anywhere in the pixel figure, and the retina was taken to be an 8 x 8, toroidally connected field, with the right edge contiguous with the left, the top with the bottom. The toroidal extension allowed eight, rather than five, positions in each orientation.
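For readers who want to set up the test on their own net, here is a small sketch that generates the patterns. The bar length is not given above; I assume four cells, which is what the five-versus-eight count implies on an 8-cell axis (8 - 4 + 1 = 5 without wraparound). The function names and the +1/-1 labels are also my own.

```python
SIZE = 8        # the retina is 8 x 8, per the text
BAR_LENGTH = 4  # assumption: inferred from "eight, rather than five, positions"

def bar(row, col, horizontal, wrap=True):
    """Cells lit by a bar starting at (row, col), or None if it falls off the edge."""
    cells = set()
    for k in range(BAR_LENGTH):
        r, c = (row, col + k) if horizontal else (row + k, col)
        if wrap:
            r, c = r % SIZE, c % SIZE  # toroidal retina: edges are contiguous
        elif r >= SIZE or c >= SIZE:
            return None
        cells.add((r, c))
    return cells

def positions_along_axis(wrap):
    """Count the start offsets available along the bar's own axis."""
    return sum(1 for c in range(SIZE) if bar(0, c, True, wrap) is not None)

def dataset(wrap=True):
    """All (cells, label) pairs: +1 for horizontal bars, -1 for vertical bars."""
    data = []
    for horizontal in (True, False):
        for r in range(SIZE):
            for c in range(SIZE):
                cells = bar(r, c, horizontal, wrap)
                if cells is not None:
                    data.append((frozenset(cells), 1 if horizontal else -1))
    return data

def coverage(data, label):
    """How many bars of the given class light each cell."""
    counts = {}
    for cells, lab in data:
        if lab == label:
            for cell in cells:
                counts[cell] = counts.get(cell, 0) + 1
    return counts
```

Under these assumptions `positions_along_axis(False)` gives 5 and `positions_along_axis(True)` gives 8. On the torus, `coverage` shows every cell lit by exactly as many horizontal bars as vertical ones, so the weighted sums of a single TLU total the same over both classes and no one threshold can put all of one class above it and all of the other below. That is the sense in which the task generalizes XOR.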

There's one challenge for any current neural net: Solve the toroidal XOR problem.

ADAM I

The history of the ADAM project is so poorly documented as to be lost in legend; I couldn't find the date on which the project was begun, there are no reliable contemporary descriptions from that period, and there have been acrimonious debates over the genesis of the device. The ADAM I has always been a favorite topic in the popular press, though, and it has been reexamined recently in the computer science literature in the light of the resurgence of interest in neural networks. ADAM I was one of the most successful neural net implementations ever. It was also extremely complex; current neural net implementations tend to be much simpler, and this is probably appropriate, but there is much that can be learned from a study of the ADAM I architecture.

Visual images are generated in ADAM I by a lens that projects scenes onto a grid of photoreceptive cells. These cells, along with two layers of other cells connected in a network, form the receptor net, which in turn connects to a highly parallel central processing system that in a sequential machine would be called the CPU, but in ADAM I is called the CNS (central neural system?).

There are actually two separate and redundant receptor nets to provide two images for further processing. This system of organization provides the raw data for the perception of depth.

There is a high degree of organization and differentiation at the level of the receptor net. There are two kinds of photoreceptive cells, responding to different wavelengths of light, and the receptors have many-to-one and one-to-many connections with cells in the next level. These intermediate cells usually have many-to-one but sometimes have one-to-many connections with the last layer of cells in the receptor net. Then there are cells that carry signals between two photoreceptive cells, allowing interaction and inhibition, and there are also cells that carry signals back to the photoreceptors from lower cells.

This complex organization permits a lot of processing and transformation of the image in the receptor net itself, yet it does not alter the overall topography of the initial image. The main purpose of the receptor net, as with MINOS II's preprocessor, is data reduction. By the time the data gets through the receptor net, the number of bits transmitted has been reduced by a factor of 1000. The receptors are extremely sensitive, responding to one quantum of light energy, and despite the 1000-fold data reduction, the entire system is also very responsive: Six quanta striking six different receptors can be enough to trigger an output response.

ADAM I was not the first device to employ this general kind of receptor-net organization: The earlier-still FROG I showed a high degree of specialization in cells in its receptor net, with four or five kinds, each extracting certain local features from its bitmap input. One such feature was the center-surround pattern, which responds to a pattern consisting of a small dark patch on the image, surrounded by a lighter area. Other specialized cells responded to changes from one image to another; that is, movements of objects in the imaging field. In FROG I the CNS reflected the four or five cell types of the receptor net and allowed the system to recognize four or five classes of visual phenomena. In a sense, that's all the discriminating it could do.
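The center-surround feature is easy to imitate on a sequential machine. The sketch below is a toy illustration, not a model of the FROG I cells: the 3 x 3 neighborhood, the brightness scale of 0.0 to 1.0, the threshold, and the function names are all assumptions.

```python
def center_surround(image, row, col):
    """Dark-center response at (row, col): mean of the eight neighbors minus the center."""
    neighbors = [image[r][c]
                 for r in range(row - 1, row + 2)
                 for c in range(col - 1, col + 2)
                 if (r, c) != (row, col)]
    return sum(neighbors) / len(neighbors) - image[row][col]

def detect(image, threshold=0.5):
    """Interior positions where a dark patch sits on a lighter surround."""
    rows, cols = len(image), len(image[0])
    return [(r, c)
            for r in range(1, rows - 1)
            for c in range(1, cols - 1)
            if center_surround(image, r, c) > threshold]

# A light 5 x 5 field (1.0 = light) with one dark cell (0.0) in the middle.
image = [[1.0] * 5 for _ in range(5)]
image[2][2] = 0.0
hits = detect(image)
```

Only the dark cell in the middle produces a response above the threshold; its neighbors, which are light cells next to a single dark one, respond slightly negatively.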

Some such specialized cells were retained at the lower levels in later models, including ADAM I, but in the evolution of the architecture, many such functions migrated higher in the system, to the CNS. In ADAM I, there is a faithful reconstruction at the CNS level of the topology of the received image, but only a few of the cells in the CNS use the information in this form. Most CNS cells, in several levels deep, do more complex processing of features of the image. They do edge detection and center-surround identification, for example.

ADAM I, which has not yet gone out of production, could solve some remarkably difficult classification problems. One problem that the ADAM I architecture solves is to count the number of ADAMs in a complex scene displayed in low resolution. I'll let you solve the problem for yourself.

You might try that recognition task on your favorite neural net.

Then again, maybe you just did.