PROGRAMMING PARADIGMS

Nasal Nets

Michael Swaine

In a short story by Nikolay Gogol, an officer misplaces his nose, which is subsequently spotted here and there about Petersburg, masquerading as an officer of higher rank. While the officer is trying to place an advertisement offering a reward for his nose's return, the press official solicitously but injudiciously offers him snuff. Insulted, the officer retorts that he is missing precisely what is required to appreciate snuff, and besides he would never use that cheap brand.

The point is ... but first I think I need to backtrack a bit.

Slow Learner

Hal Hardenbergh called to clarify a point in the interview with him that was published in the June issue. In that interview, I left the speed issue ambiguous: It's not the execution speed of trained neural nets, but the convergence of the learning algorithm, that is slow. One other correspondent knowledgeable in neural net techniques also caught the slip.

In Hardenbergh's words, "While it's true that multi-level, perceptron-based artificial neural networks are extremely slow, this is only true for the training phase. When you actually run them, they are much faster. This is even true for toy problems. I have been playing with neural nets on my three Atari STs and have achieved speedups from training to practice of one million to one."

Tom Waite, who is working on neural nets with Hardenbergh at Vicom in San Jose, Calif., pointed out that the performance of the system as it learns is dramatic. Unlike adaptive filters, which just get better as they learn, neural nets do a flip-flop to which it is hard not to attach anthropomorphic labels. You see the system spontaneously "get the idea" and develop "insight" into the problem. "It will be stuck in a rut and all of a sudden go wild and drop into the right answer," Waite said.

But that insight may take a while.

Slow or not, multi-level, perceptron-based artificial neural networks converge to solutions more rapidly than some competitors. Hardenbergh said, "The Boltzmann machine training method is generally regarded as more robust than multi-level perceptrons, but it is also considered to be two orders of magnitude slower than back propagation, so it's not used for practical work." Anderson and Rosenfeld, in their massive and important book Neurocomputing, say "[Backprop] is ... the most popular learning algorithm for working with multi-layer networks. It is considerably faster than the other learning algorithm that is successful with multi-layer networks, the Boltzmann machine."
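A full treatment of back propagation must wait for that promised column, but for the impatient, here is a minimal sketch of the idea: a two-layer network trained by back propagation on the exclusive-OR problem. This is my own toy illustration, not Hardenbergh and Waite's code; the hidden-layer size, learning rate, and epoch count are arbitrary choices.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR training set: (inputs, target)
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

H = 3  # hidden units -- an arbitrary small choice
# Each hidden unit has 2 input weights plus a bias; the output unit
# has H weights plus a bias. All start at small random values.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
W2 = [random.uniform(-1, 1) for _ in range(H + 1)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
    o = sigmoid(sum(W2[j] * h[j] for j in range(H)) + W2[H])
    return h, o

def total_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in DATA)

def train(epochs=20000, lr=0.5):
    for _ in range(epochs):
        for x, t in DATA:
            h, o = forward(x)
            # Delta rule at the output: error times the sigmoid's derivative.
            d_o = (t - o) * o * (1 - o)
            # Back-propagate the output delta through the output weights.
            d_h = [d_o * W2[j] * h[j] * (1 - h[j]) for j in range(H)]
            for j in range(H):
                W2[j] += lr * d_o * h[j]
            W2[H] += lr * d_o  # output bias
            for j in range(H):
                W1[j][0] += lr * d_h[j] * x[0]
                W1[j][1] += lr * d_h[j] * x[1]
                W1[j][2] += lr * d_h[j]  # hidden bias

err_before = total_error()
train()
err_after = total_error()
```

The slow part Hardenbergh describes is exactly that training loop: tens of thousands of passes over four patterns, whereas running the trained net is a single call to forward().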

So, if your response to the interview was that you were missing precisely what was required to appreciate the stuff (algorithms, code, hands-on examples), and besides you would never use such a slow technique, I hope I have corrected the impression I gave about the speed of neural nets. As for the algorithms, code, and hands-on examples, that will have to wait until I can devote a full column to it, because a backprop algorithm doesn't make any sense without a fairly complete description of the network to which it is applied. At the end of this column, though, are references to a book and an article where backprop is well exemplified.

The Explosive Market for Neural Nets

Hardenbergh also brought up to date a reference in that interview to a possible development in the commercial neural net market. "At the time of the publication of your interview with me there were no commercial applications of multi-level artificial neural nets. Since then SAIC has signed a multimillion-dollar contract to provide explosives detectors to airports.

"The way their device works is that the explosives are subjected to bombardment from a gamma ray source and the molecules making up the explosive substance have characteristic signatures under gamma rays. This results in a diffraction pattern that identifies the explosive. And SAIC uses a multi-level, perceptron-based artificial neural network to detect the characteristic patterns.

"I call a one-hundred-million-dollar contract real activity." So do we, Hal.

Closing the Circle

The management at Vicom is supportive of Waite and Hardenbergh's neural net explorations because Vicom is in the image processing business, and the applications of neural nets to image processing are evident. But there are many powerful techniques already developed for processing images, dating back at least to Fourier's 1807 memoir laying the groundwork for the Fourier transform, which has since found applications Fourier could not have imagined, such as transforming medical diagnosis by way of the CAT scan.

Waite and Hardenbergh see neural nets as needing to fit in with existing, more conventional algorithms, so they find themselves regularly thinking in different paradigms. Lately, Hardenbergh has been treading the well-trod path of efficient algorithms for circle drawing.

"The past three weekends I've been playing with plotting curves. Back when I was [working on another project] I plotted curves and pulled out three curve-plotting algorithms, all in fact from DDJ. When I looked at them recently, I found to my surprise that all of them used multiplication for each point. Then I came across Bresenham's algorithm."

In fact, DDJ published an implementation and discussion of Bresenham's algorithm in September, 1987. It was by Jim Blinn, a graphics expert at JPL. I used Blinn's version of Bresenham's algorithm in my book on HyperTalk, to show beginning HyperTalk programmers three ways to draw a circle, but of course drawing circles in an interpreted language doesn't tell you a whole lot about algorithmic efficiency. I was interested in Hardenbergh's take on Bresenham's algorithm, because 1. Hardenbergh is interested in algorithmic efficiency and usually has an interesting timing result or two to report, and 2. I knew how he hated wading through academic explanations of technical issues. I suspected that, if he hadn't seen the Blinn piece, he had only seen academic explications, and would have impatiently taken off on his own, possibly interesting tangent.

Sure enough, he told me, "The Bresenham algorithm doesn't use multiplication, but Bresenham is an academician and so the article I read was full of derivations and math. Normally, people doing this kind of work keep it as a trade secret, but Bresenham is an academic, trying to make PhD Brownie points, so he just made it obscure.

"So I put away the book and set about trying to write a good circle-drawing algorithm myself. Then I found out that Vicom does not know how to do fast ellipses, so I worked on ellipses. Now I have a circle algorithm, an algorithm that does ellipses oriented to the axes, and also a tilted ellipse algorithm, all without multiplies."

At that point Hardenbergh went back to the books. "I picked up the book in which I had read the Bresenham algorithm and found to my surprise that I had not reinvented Bresenham; my algorithm is just not Bresenham's. Mine is simpler. It is ridiculously simple. And the ellipse and tilted ellipse algorithms are just extensions of the circle algorithm, so I suspect that my ellipse algorithms are also different from Bresenham's. I tried to check this but could find no book there that contained the Bresenham algorithm [for ellipses]."

Hal didn't give me his algorithms, but he did give me a number to test against: one thousand radius-100 circles drawn in 5.84 seconds with a 0.02 second loop overhead. The hardware was an Atari ST.
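Hardenbergh kept his algorithms to himself, so for comparison here is the standard midpoint (Bresenham-style) circle algorithm, which shows how the inner loop can get by on additions and shifts (the ×2 below is a shift) with no general multiplications per point. This sketch is my own, not Hardenbergh's or Blinn's, and a Python version obviously says nothing about Atari ST timings:

```python
def circle_points(r):
    """Integer midpoint circle: walk one octant, mirror it 8 ways.
    The decision variable err is updated by additions only; the
    mirroring is just sign flips and coordinate swaps."""
    pts = set()
    x, y = r, 0
    err = 1 - r  # decision variable for the midpoint test
    while x >= y:
        # Plot the point in all eight octants.
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            pts.add((px, py))
        y += 1
        if err < 0:
            err += 2 * y + 1          # midpoint inside: stay at this x
        else:
            x -= 1
            err += 2 * (y - x) + 1    # midpoint outside: step x inward
    return pts
```

Extending the same incremental-decision trick to axis-aligned and tilted ellipses, as Hardenbergh describes, is a matter of carrying a richer decision variable, still without per-point multiplies.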

The 48th Parallel Processor

In the Bavarian countryside outside Munich, I met with Jurgen Fey, almost exactly on the 48th parallel, roughly the same latitude as Poltava, where the young student Gogol was mercilessly teased for his beak-like nose a century and a half earlier.

Jurgen turned up his nose when I told him that Hardenbergh had criticized the Transputer for its lack of registers (Jurgen has developed a Transputer board). "It's true that it's not a register machine. You can't do parallel processing on a register machine. I challenge him to find a register-based machine that supports parallel processing."

But Jurgen did agree that the Transputer's "native" language, occam, was on its way out. There are now, he explained, several more or less viable alternatives in the works, including "Lisp, SC-Prolog, Parallel Prolog, Modula-2, C, C, C, C, C, and C, Ada, assembler, and Forth." Jurgen is working on something compatible with Turbo Pascal 4.0.

Antisocial Behavior in the Society of Mind

What has been called "the Minsky smoke bomb" continues to smell.

In The Structure of Scientific Revolutions, Thomas Kuhn wrote: "But paradigm debates are not really about relative problem-solving ability, though for good reasons they are usually couched in those terms. Instead, the issue is which paradigm should in the future guide research on problems many of which neither competitor can yet claim to resolve completely ... that decision must be based less on past achievement than on future promise."

And Kuhn wrote: "... the defenders of traditional theory and procedure can almost always point to problems that its new rival has not solved but that for their view are no problems at all.... if a new candidate for paradigm had to be judged from the start by hard-headed people who examined only relative problem-solving ability, the sciences would experience very few major revolutions."

And: "... The member of a mature scientific community is, like the typical character of Orwell's 1984, the victim of a history rewritten by the powers that be."

In Perceptrons, Marvin Minsky and Seymour Papert presented an impeccably reasoned argument showing certain limits on the results you could get out of any single-layer perceptron network, along with something quite different: An "intuitive judgment that the extension [to multi-layer networks] is sterile."

Together, the reasoned argument and the unsupported (and mistaken) intuitive judgment undermined support for this neural net paradigm that was contending for AI research funding with their favored Lisp-based work at MIT. In 1969, Minsky and Papert were the hard-headed establishment of artificial intelligence, and neural nets were the new candidate paradigm that their efforts wrote out of the history of AI work. Two decades later, the dramatic comeback of neural nets is one of those heartening success stories, like "Nikolay Gogol, the odd, beak-nosed boy, had turned obscurity and failure into triumph, and was well on his way to becoming one of the most famous authors in Russia" (Dick Penner). Heartening once you get to the success part, but discouraging through the years of obscurity and failure. Were those years unnecessary?

Some interesting public documents recently came into my possession, and anyone interested in the history of invention in this area ought to take a look at them.

The problem that formed the crux of Minsky and Papert's denunciation of perceptron-based neural net research was linear separability: basically, the exclusive-OR problem of classifying unlike objects into the same category. For example, given a collection of red and green squares and circles, you are to classify the red squares and green circles together in one group, and the red circles and green squares in the other group. You can't form the grouping with a straight line through the space whose dimensions are red-green and circle-square. But it's a perfectly logical grouping, expressed by the exclusive-OR expression "red or circle (but not both)." The first group contains all objects satisfying the expression, the second group all objects not satisfying it. The expression perfectly assigns the objects to the required groups. Single-layer perceptrons, and thus neural nets, Minsky and Papert said, couldn't do that.
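Minsky and Papert's argument was mathematical, but its flavor can be seen in a brute-force search. The following is my illustration, not their proof: no single threshold unit, over a grid of small integer weights, reproduces exclusive-OR, while AND and OR fall out immediately; and a second layer fixes everything, since XOR(x1, x2) = OR(x1, x2) AND NAND(x1, x2).

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def unit(w1, w2, t):
    """A single threshold unit: fires when w1*x1 + w2*x2 >= t.
    Returns its truth table over the four Boolean inputs."""
    return tuple(int(w1 * x1 + w2 * x2 >= t) for x1, x2 in INPUTS)

# Every truth table a single unit can realize with small integer weights.
grid = range(-3, 4)
realizable = {unit(w1, w2, t) for w1, w2, t in product(grid, grid, grid)}

XOR = (0, 1, 1, 0)
AND_ = (0, 0, 0, 1)
OR_ = (0, 1, 1, 1)

# A second layer solves it: feed an OR unit and a NAND unit into an AND.
hidden_or = unit(1, 1, 1)        # (0, 1, 1, 1)
hidden_nand = unit(-1, -1, -1)   # (1, 1, 1, 0)
two_layer = tuple(int(a + b >= 2) for a, b in zip(hidden_or, hidden_nand))
```

The grid search only shows that no small-integer unit works; the classic argument shows it for all real weights: XOR would require t > 0, w1 >= t, w2 >= t, yet w1 + w2 < t, a contradiction.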

Although Minsky and Papert clarified the full consequences of the problem with single-layer perceptrons in a way that had never been done, this basic problem had been perceived years earlier, and many people had given thought to its solution.

At the very least, the problem was under attack. Here is what one researcher wrote in 1963: "Two layers were used ... because one layer can only be trained successfully on linearly separable functions [Minsky and Papert's point].... Having two layers of variable weights was our initial intention, but we do not yet know of an algorithm leading to a convergent training process." The writer went on to discuss his group's search for such a convergent algorithm, as well as practical measures they had taken.

How close were neural net researchers to solving the problem? Had the problem been solved, perhaps accidentally, nearly a decade before Minsky and Papert published their book? And was Minsky aware of the solution? I don't have the answers, but the issue is apparently more complex than anyone has yet admitted. I refer you to the work on graphical data processing done at SRI in the early 1960s (references are at the end of this article).

Recommended Reading

Hal Hardenbergh has been buying all the books he can find on neural nets. He called me to recommend strongly Neural Computing: Theory and Practice, by Philip D. Wasserman, Van Nostrand Reinhold, 1989.

"I now know enough about back propagation to judge that the chapter on back propagation is good, so by extension I suspect that the rest is good. I'm particularly interested in the chapter on stochastic methods. I know someone who knows the author and who says he's more interested in the Cauchy machine than in back propagation. He says the training is better with simulated annealing. The Cauchy machine is another stochastic method based on the Boltzmann machine, so I'm currently studying the stochastic chapter in Wasserman's book."

Hardenbergh considers the Wasserman book a good place to start learning about neural nets: The right level and the right size. "It's not a large book. The problem is that when you are starting to get into a new field you need to cover a lot of ground so you need a big book but you're not really ready to handle a big book."

A much more elementary but very broad presentation of this area can be found in Cognizers: Neural Networks and Machines that Think, by R. Colin Johnson and Chappell Brown, John Wiley, 1988. Light reading.

Hardenbergh and Waite have published their own version of the back propagation algorithm in the June issue of Programmer's Journal.

The SRI work mentioned had to do with the development of several pieces of parallel processing hardware (including MINOS I and MINOS II) and both theoretical and practical work on multi-level neural net algorithms. The work is described in several reports under SRI project number 3192. Quarterly Reports 4 and 5 are particularly interesting, showing MINOS I doing exclusive-OR classification. The quote is from the final report, Report No. 12, by A.E. Brain.

Many Russian writers have written stories about noses. Gogol's The Nose, however, may be the only one in which the curious Russian fascination with noses has survived translation. It was published in The Diary of a Madman and Other Stories, Nikolay Gogol, translated by Andrew R. MacAndrew, New American Library 1960; and was reprinted in Fiction of the Absurd: Pratfalls in the Void, edited by Dick Penner, New American Library, 1980. The latter book contains several other works that offer insights into the software development process, including a selection from Catch-22 and Camus' essay on software engineering, The Myth of Sisyphus.


Copyright © 1989, Dr. Dobb's Journal