"Paradigm is a rather unlovely word, which is commonly used in technical writings when the author wants to obscure the fact that there are no facts in his writings. Psychologists and psychiatrists and M&T Publishing's writers are especially fond of the word."
-- Hal Hardenbergh
Hal Hardenbergh's writing is generally strong on factual content or at least on empirical content. He is wont to speak in what philosophers of science call "highly falsifiable assertions." He'll make an outrageously bold claim and phrase it in such specific terms that it is empirically testable on several grounds. Hence falsifiable, though not necessarily false.
Rarely false, in fact. When you press him on one of his points -- or sometimes even without pressure -- he trots out facts and figures, names and dates to support his position. When Hardenbergh fingers his culprit, he usually has a Zapruder film in his pocket.
He's also wont to pounce on errors or excesses in the use of the English language, as I observed with chagrin the last time I used the word "wont" in print. Gremlins had slipped an apostrophe into the word as it went to press, and it was Hal Hardenbergh who called the error to my attention. And he has suggested that I use the word "paradigm" too much. I think he's had his eye on me since I misspelled his name some years back.
His own publication, DTACK GROUNDED, was always good reading and embodied Hardenbergh's philosophy of no-nonsense, pedal-to-the-metal, get-the-HLL-out-of-my-machine computing. He wrote and produced that newsletter continuously from July 1981 to September 1985 (and occasionally thereafter with the byline "The Junk Mail Flyer"). It went out mainly to customers of his company, Digital Acoustics, and carried twenty-some pages a month of incisive industry analysis and juicy gossip, tightly reasoned technical discussions, and code. The magazine's bias was speed; it was a computer hardware hacker's hot-rodding magazine, and Hardenbergh's pet peeve was application programs written in high-level languages. DTACK GROUNDED was a far cry from this column, and while writing it, Hardenbergh would no doubt have scoffed at the inefficiency of Smalltalk and Prolog and at many of the other topics discussed in this space. Digital Acoustics and DTACK GROUNDED, alas, are history, but Hardenbergh has been gainfully employed for the past year at Vicom Systems, an image-processing company in San Jose, Calif.
When I heard that Hardenbergh had got into neural networks I was surprised. Neural networks represent a branch of artificial intelligence work that some might consider antithetical to the hard-nosed and the hands-on. The whole point of neural nets was to remove from human hands a great deal of the control over what the machine was doing. And demonstrable, practical results of neural network research were hard to find. Furthermore, existing neural net implementations were slow. Agonizingly slow. It seemed anything but an area in which you'd expect to find an inveterate bit-twiddler. I decided to go see Hardenbergh and find out what the attraction was.
The plan, as usual in this column, was to explore a new paradigm by examining the thinking that led one computer professional to embrace that paradigm. Be forewarned that you will probably not agree with everything Hal Hardenbergh says. But there is usually much to be learned from watching a sharp mind slice through an interesting problem. Hardenbergh's view of the history and present value of neural net paradigms is worth examining.
When I arrived at Vicom, Hardenbergh led me to a conference room and offered me the choice of an interview or a dissertation. I told him to start rolling and that I'd break in when I got lost. In translating the resulting discussion to the pages of DDJ I have found it desirable to break in a little more often than was actually the case during the interview. Or dissertation. But to the best of my transcriptive ability, Hardenbergh's words are Hardenbergh's words.
Swaine: What's the attraction of neural networks for a hardware engineer?
Hardenbergh: I can't say that what I'm doing here at Vicom is dull. Realtime video processing is hardly boring. But neural nets let me feel like I'm pushing the envelope a little.
Swaine: But how did you settle on neural nets, rather than some other envelope-pushing paradigm?
Hardenbergh: When it comes to AI and machine learning, you have four paradigms. One is the symbolic approach, typically using Lisp, that Minsky and Papert championed out of the MIT AI Lab. A lot of money has gone down that rat hole, and now people have stopped pouring because they noticed that it wasn't coming back up. The second is expert systems. If you want to invest some money in AI and have a reasonable expectation of getting something back, that's where you invest it. The third paradigm is both very old and very new, and that's neural networks. And the fourth is fuzzy logic. To the best of my knowledge, these are the four; if you're going into AI, you'll have to tackle one of them. What the other three have in common is that they require an enormous amount of programming to do anything. The potential advantage of neural networks is that they program themselves.
Swaine: How did you first get into neural nets?
Hardenbergh: [Vicom co-worker] Tom Waite was looking into neural nets and one day I asked Tom to teach me about them. He put some equations with integrals in front of me. For an engineer I'm a pretty decent mathematician, but I told Tom, "I know how to add and subtract and multiply and divide with a computer, but I don't know what to do with this." But eventually I got the equations into pseudo-Basic so I could understand them.
Swaine: You've been at it less than a year, then. But you've done more than code some integrals.
Hardenbergh: I've been taking classes, reading books, and Tom and I have submitted an article on neural networks to one of your competitors. (At the time of the interview, the Hardenbergh and Waite article was scheduled for the June issue of Programmer's Journal, a magazine that Hardenbergh often writes for.)
Cutting Through the Crap
Swaine: Neural networks is an exploding area of research and development. There's a lot of information to wade through: I have several rather thick books on neural nets, and there are different models -- the relationships among which I frankly don't understand. You've apparently found a path through it all to the information you want.
Hardenbergh: I recommend an article by Lippmann in the April 1987 ASSP Magazine -- that's the IEEE acoustics and signal processing publication -- it's a tutorial on neural nets that by-passes all the associative memory crap.
Swaine: "Associative memory crap"?
Hardenbergh: Some of what people talk about when they talk about neural networks is of interest from a historical viewpoint, but not from the viewpoint of artificial intelligence as I understand it. One of these things is associative memory. Associative memory maps ones and zeros into ones and zeros, and it doesn't even do that reliably. If you think ones and zeros have a lot of intelligence, you'll love associative memory. Then there's adaptive bidirectional associative memory, or adaptive resonance theory (ART), by Grossberg, who has a patent in this area. There's a story about why he's working on ART, rather than something useful like a multilevel perceptron.
Multilevel Perceptrons
Swaine: I gather that you've concluded that, for your purposes at least, the multilevel perceptron is the only approach worth pursuing.
Hardenbergh: Multilevel perceptrons are my idea of a real-world neural network.
Swaine: Tell me about how you narrowed your own search down to perceptrons.
Hardenbergh: Lippmann does a taxonomy. He talks about Hopfield nets and Hamming nets and ART, which, like the other two, is of historical interest only. And he describes the single-level perceptron.
Swaine: That would be Rosenblatt's perceptron, from back in the late 1950s.
Hardenbergh: The perceptron was the start of all the neural network work. In 1958 Rosenblatt was doing research on natural neural nets, the wet stuff, and he developed a model of a simplified neuron. There were certain things that it could do. There are still certain things that it can do. And it generated a lot of interest in AI in 1958. The people in the MIT AI Lab became disturbed about funding moving over to perceptrons, and Minsky and Papert decided to do something about it. What they did was to start writing papers, culminating in a book called Perceptrons, with a copyright date of 1969. The book demonstrated that there were certain things that a single perceptron cannot do. One of the things that a single perceptron cannot do is the exclusive-OR problem.
Swaine: That's not a trivial limitation. Back when I was doing research in cognitive psychology, studying the process of concept formation, we found that modeling the human ability to form concepts of the nature of "A or B but not both" was very difficult, but we felt that we didn't have a model of concept formation without that exclusive OR.
Hardenbergh: Oh, the difficulties were real ones. In the meantime Rosenblatt had suggested that one solution would be to use many perceptrons, perhaps arranged in layers, but it was only a suggestion, because in 1969 a mathematical method of adjusting the weights did not exist.
Swaine: Explain about adjusting the weights.
Hardenbergh: You have the desired output, call it the target. You compare the actual output to the target and measure the error. Then you propagate the error up through the net and use it to adjust the weights [on the connections].
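Hardenbergh's description reduces, for a single unit, to a few lines of code. The sketch below is mine, not his, and it shows only the single-level case; a multilevel perceptron needs back propagation to carry the error backward through the hidden layers. In Python rather than pseudo-Basic:

```python
# A single Rosenblatt-style unit: weighted sum of inputs, hard
# threshold on the result. Training follows the recipe above:
# compare the actual output to the target, measure the error, and
# nudge each weight in proportion to the error and its input.

def step(x):
    return 1 if x >= 0 else 0

def train(samples, epochs=50, rate=0.1):
    w = [0.0, 0.0]   # connection weights
    b = 0.0          # bias (the threshold, learned like a weight)
    for _ in range(epochs):
        for (x0, x1), target in samples:
            out = step(w[0] * x0 + w[1] * x1 + b)
            error = target - out          # measure the error
            w[0] += rate * error * x0     # adjust the weights
            w[1] += rate * error * x1
            b += rate * error
    return w, b

# Logical AND is linearly separable, so one unit suffices.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
results = [step(w[0] * x0 + w[1] * x1 + b) for (x0, x1), _ in AND]
print(results)  # the learned unit reproduces AND: [0, 0, 0, 1]
```

Because AND is linearly separable, the unit converges; swap in XOR's truth table and the same loop wanders forever, which is exactly the limitation Minsky and Papert seized on.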
Swaine: So certain connections get stronger over time, and the network responds more and more appropriately as this training proceeds.
Hardenbergh: But you couldn't train the damn thing, so nobody built one, or if they did, it didn't work, so they didn't write about it. Minsky and Papert's book Perceptrons was then, and is today, highly regarded, except for the last chapter. Because they had proven that perceptrons could not solve certain real-world problems, they concluded that nothing along this line would ever be useful. The book crushingly discredited neural nets and funding dried up completely.
The Politics of Discovery
Swaine: What happened next?
Hardenbergh: The next events generally known occurred in the 1980s, but in 1974 an event occurred that was known only to two people. As part of his Ph.D. research a Harvard graduate student, Paul Werbos, developed the mathematical technique required to train multilevel perceptrons. His adviser was Stephen Grossberg.
Swaine: This must be the Grossberg story.
Hardenbergh: Right. Grossberg was well aware of Minsky and Papert's work on perceptrons, and he told this student that his work was of no value. And indeed it proved of no value, because it was pigeonholed and that was it.
Swaine: That was the technique of back propagation?
Hardenbergh: Yes, I guess it would be hard to do work on multilevel perceptrons after derailing the discovery of the technique that makes them feasible. But back propagation is known now, and people are doing work on multilevel perceptrons.
Swaine: What happened?
Hardenbergh: In the 1980s, about a dozen years later, things began to happen. One of the things that happened was Hopfield nets. Another was that Rumelhart and others formed the PDP group (at the Institute for Cognitive Science at the University of California at San Diego). The PDP group attracted some interesting people to work with them, including [DNA co-discoverer] Francis Crick. But in 1982, another event took place that nobody knew about. A 22-year-old Stanford student independently invented the mathematical theory of back propagation.
Swaine: That would be David Parker. I interviewed him last year, but I'm planning to talk with him again soon.
Hardenbergh: Parker discovered the theory and went to people who were funding AI activities and asked for funding. They asked, "Is this an expert system?" He didn't get the funding and eventually went off on his own.
Swaine: And then?
Hardenbergh: In 1984 there was the Hopfield net, and in 1985 there was the first public report of a neural net that worked -- barely. That was the Boltzmann machine and its author was G.E. Hinton, and it was slow even for a neural net, and neural nets have a deserved reputation for being slow. Then in 1986 there was the publication by Rumelhart, et al. of "Learning Internal Representations by Error Propagation," the third invention of back propagation. This one led to the current explosion of interest in neural nets. Since then there's been a tremendous amount of activity.
Swaine: So back propagation was independently discovered three different times? What made the difference the third time around?
Hardenbergh: Rumelhart was well known; Parker wasn't.
Smoke Without Fire
Swaine: There certainly has been, as you put it, an explosion of interest in neural nets, but to date it looks like a lot of smoke and very little fire. Expert systems really are making money for people and solving real-world problems. That particular AI technology, while it may not deserve all the hype it's received, does have unarguable success stories to tell. Why haven't we seen any breakthrough practical applications of neural net technology?
Hardenbergh: Unfortunately, neural nets are slow. You can't do neural nets on a PC. And nobody's doing the $4,000 parts-cost solution. So you need to get Uncle Sugar to give you the latest Cray full-time for a month.
Swaine: Your mention of a $4,000 parts-cost solution sounds like Vicom has a neural net board in the works.
Hardenbergh: There's interest, but no commitment to a product yet. This is something Tom and I are pursuing on our own. But management doesn't object to my meeting with an editor in the conference room on company time to discuss neural nets. They're supportive.
Swaine: What are the hardware issues?
Hardenbergh: Floating-point chips these days are so good the real problem is the memory system. You can't use static RAM. You need to interleave DRAM, use multiported memory. All the hardware cards for neural nets are from software companies trying to do neural net work. None of them are very good designs.
Swaine: What do you think of the Transputer?
Hardenbergh: It's fundamentally flawed as a concept. It has no register-to-register add. It has only three registers, arranged in a stack. Floating Point Systems and the British government lost a lot of money on the Transputer; Thorn EMI went through about $300,000,000.
Swaine: And the Occam programming language developed for concurrent programming of networks of Transputers?
Hardenbergh: Occam is a failure.
Swaine: I'll be talking to an Occam programmer in a few weeks, so I'll let him defend the language then. But is there no good work going on in neural nets? Are there no success stories?
Hardenbergh: All the good stuff is classified. Chevron owns the seismic research. Parker may be doing something interesting, but he is not publishing.
Swaine: But you think there are things that you could do with neural nets, given the right hardware?
Hardenbergh: In combination with image-processing methods. You pre-process with DSP or whatever, and don't overload the network. Don't make it do what we already have good algorithms for. The company is very interested in the possibilities. I'm enjoying myself.
Joke or Not, I Kept the Money
Swaine: The Lippmann article says that three levels of perceptrons not only solve the exclusive-OR problem but are sufficient for any arbitrary classification problem. But you said there were certain things that single perceptrons could do.
Hardenbergh: One spinoff of Rosenblatt's work was the adaptive linear filter. That was a success, and all of the high-speed modems use it. The reason the Telebit modem is so successful is ALFs. ALFs are used in phone lines to cancel noise. An ALF is just a single perceptron. Wait here. [Hardenbergh got up and left the room. A moment later he returned and put twenty cents into my hand.] It's a pair of dimes. I've been wanting to do that for a long time.
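Whether you count three levels or two depends on whose numbering convention you use, but the exclusive-OR problem itself yields to a single hidden layer. A minimal sketch of my own, with the weights picked by hand rather than trained:

```python
# XOR with one hidden layer. Hidden unit h1 computes OR, hidden
# unit h2 computes AND, and the output unit computes "h1 and not
# h2" -- which is exclusive OR. No single perceptron can do this,
# because no one line separates XOR's true cases from its false ones.

def step(x):
    return 1 if x >= 0 else 0

def xor_net(a, b):
    h1 = step(a + b - 0.5)          # OR:  fires if either input is on
    h2 = step(a + b - 1.5)          # AND: fires only if both are on
    return step(h1 - 2 * h2 - 0.5)  # OR and not AND = XOR

truth = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(truth)  # [0, 1, 1, 0]
```

The hidden units carve the input plane with two lines where a single perceptron gets only one; the output unit merely combines the two regions.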
Toward the end of the interview, Tom Waite came into the room. Waite and Hardenbergh told me how they thought neural nets could supplement existing image-processing techniques, and Waite shared some ideas about neural net algorithms. It was Tom Waite who gave me the characterization of neural nets as the parapsychology of artificial intelligence, a characterization that he does not agree with. I will look at some of the algorithms next month. I'll also be talking with neural net algorist David Parker again soon, and within the next two months I hope to report on that discussion, as well as on a follow-up interview with Jurgen Fey regarding the Transputer board he has designed specifically to support neural networks.
References
Lippmann, R.P. "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, April 1987.
Minsky, M. and Papert, S. Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969.
Parker, D.B. "A Comparison of Algorithms for Neuron-like Cells" in J.S. Denker (ed) AIP Conference Proceedings 151, Neural Networks for Computing, Snowbird, Utah, AIP, 1986.
Rosenblatt, F. Principles of Neurodynamics. Spartan Books, New York, 1962.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. "Learning Internal Representations by Error Propagation" in D.E. Rumelhart and J.L. McClelland (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations. MIT Press, 1986.
Copyright © 1989, Dr. Dobb's Journal