HAL's Legacy and Large-Scale Software Design

Dr. Dobb's Journal, May 1997

By Gregory V. Wilson

Greg is the author of Practical Parallel Programming (MIT Press, 1995), and coeditor with Paul Lu of Parallel Programming Using C++ (MIT Press, 1996). Greg can be reached at gvwilson@interlog.com.

HAL's Legacy: 2001's Computer as Dream and Reality
David G. Stork, editor
376 pp., $22.50
MIT Press, 1997
ISBN 0-262-19378-7

Large-Scale C++ Software Design
John Lakos
846 pp., $39.76
Addison-Wesley, 1996
ISBN 0-201-63362-0

The Elements of E-Mail Style
David Angell and Brent Heslop
156 pp., $12.95
Addison-Wesley, 1994
ISBN 0-201-62709-4

I recently unpacked a box of science-fiction paperbacks I bought as a teenager, and spent a couple of hours reliving the way the future was. Do you remember when there were going to be hotels in space? When computers were going to be able to talk, but people would still write down instrument readings using pencil and paper? Do you remember 2001: A Space Odyssey?

The contributors to HAL's Legacy certainly do. When Arthur Clarke wrote the novel, he gave HAL's birth date as January 12, 1997 (it was 1992 in the film). To commemorate that "anniversary," HAL's Legacy brings together luminaries from artificial intelligence (AI) and other areas of computer science to look at how close we are to being able to build HAL today. The discussion ranges from a rather dry examination of supercomputer architectures, written by compiler maven David Kuck, through speech recognition and knowledge representation, to musings on electronic emotions and digital morality by philosophers such as Daniel Dennett. The book is illustrated with stills from the motion picture, and each chapter has its own minibibliography.

My favorite chapters were those on computer chess and language comprehension. Murray Campbell, the author of the first of these, is one of the creators of Deep Blue, the only machine that has ever beaten the (human) world champion in a regulation game. The chapter's starting point is the game between Poole and HAL. Campbell concludes that, unlike today's chess-playing programs, HAL cannot be using brute-force search alone, because it lays a trap for its human opponent. This requires more than just tactical skill -- it also requires an understanding of your opponent's mental state, and indeed, of the fact that your opponent has a mental state.

Roger Schank makes this same point at greater length in his chapter on HAL's use of language. Schank first saw the film while in graduate school, and writes:

I felt sure...that computers could eventually do what HAL does...Today I am less sure. Thirty years of research...has taught me what I did not know in 1968: that understanding natural language depends on a great deal more than simply understanding words.

Schank argues that it is impossible to understand what someone is saying without some comprehension of her interests, concerns, and desires. While programs can emulate such comprehension in limited domains, Schank is pessimistic about our being able to write a program that truly understands what is said to it.

Other contributors to this book disagree, of course, and stick to the "soon, but not yet" orthodoxy of mainstream artificial intelligence. Doug Lenat believes that the key is to give computers "common sense," by which he means "a database of facts so trivial that human beings have never written them down." While this seems plausible, he never explains where or how he draws the line between common sense and unprincipled hacks. For example, consider the statement "If people do something for recreation that puts them at risk of bodily harm, then they are adventurous." Was it added to the database as part of some methodical procedure, or was it needed because a particular demo wouldn't run properly without it? CYC (a common-sense knowledge base developed by Cycorp; see www.cyc.com) might turn out to be a useful resource for building text-retrieval engines, but Lenat's chapter reminds me that when I did my master's degree in artificial intelligence in the mid-1980s, my fellow students measured "bogusness" in milli-Lenats.

Most of the chapters in this book fall between these extremes, and provide accessible surveys of particular domains, including computer lip-reading, fault tolerance, speech synthesis, and planning. There are also interviews with Marvin Minsky, one of the fathers of AI, and with Stephen Wolfram, the author of Mathematica. Overall, the book is an enjoyable stroll through the more speculative reaches of computing, and should appeal to anyone who ever snuck one of Clarke's novels into a chemistry lecture.

Large-Scale C++ Software Design, by John Lakos, is much more down-to-earth than HAL's Legacy. Almost all books on object-oriented programming talk about design issues; this is the first I have seen devoted to the problems that arise in actually implementing large programs in C++. For example, most programmers define a preprocessor symbol inside each header file to prevent problems arising from multiple inclusion, as in Example 1(a). Lakos points out that if a C++ source file includes this header file twice, most compilers will still open and scan it twice, even though the guard discards its contents the second time. If header files include other header files recursively, the time required by this superfluous scanning can quickly add up. Lakos recommends guarding the file at the point of inclusion as well (see Example 1(b)), so that even if a source file includes a file multiple times (Example 1(c)), the scanning cost will only be paid once.
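The listings for Example 1 are not reproduced here. As a rough illustration of the idea only, the sketch below packs both kinds of guard into a single file: the first part stands in for the body of a hypothetical header, point.h, with its internal guard, and the second part shows a redundant guard at the point of inclusion. The names point.h, INCLUDED_POINT, Point, manhattan, and checkPoint are assumptions made for this sketch, not taken from Lakos's book.

```cpp
#include <cassert>

// ---- Body of a hypothetical header "point.h", pasted inline ----
// Internal guard: the contents are compiled only on the first pass.
#ifndef INCLUDED_POINT
#define INCLUDED_POINT

struct Point {
    int x;
    int y;
};

// Manhattan distance of p from the origin.
inline int manhattan(const Point& p) {
    int ax = p.x < 0 ? -p.x : p.x;
    int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}

#endif
// ---- End of "point.h" ----

// ---- In a client file: redundant guard at the point of inclusion ----
// INCLUDED_POINT is already defined above, so this whole group is
// skipped and the #include directive is never carried out.
#ifndef INCLUDED_POINT
#include "point.h"
#endif

// Small usage check (illustrative only).
inline void checkPoint() {
    Point p = {3, -4};
    assert(manhattan(p) == 7);
}
```

Because INCLUDED_POINT is already defined when the second guard is reached, the preprocessor skips the #include directive without opening the file at all, which is the saving Lakos describes. With only the internal guard, the file would be opened and scanned to its closing #endif every time it was named.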

This example is one of the first, and simplest, in the book. Lakos' other tips are more substantial, but each is clearly described and illustrated with examples. A strength of the book is that Lakos quantifies the benefits of his suggestions using simple measurements on dependency graphs.

Some of Lakos' techniques go against the grain of most object-oriented doctrine. He argues, for example, that it is sometimes better to repeat code than to introduce a compilation dependency between otherwise separable program components. He also recommends removing private data members from classes where possible: Even if a class's interface doesn't change, he argues, changes in implementation may require changes in header files, which, in turn, may necessitate recompilation and retesting of other program components.
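One common way to keep private data members out of a header is the opaque-pointer (or "pimpl") idiom. The sketch below is a minimal single-file illustration of that idiom, assuming the hypothetical names Counter, CounterImpl, d_impl, and checkCounter; it is not taken from Lakos's book, and the comments mark which parts would live in the header and which in the implementation file.

```cpp
#include <cassert>

// ---- Would live in a hypothetical header "counter.h" ----
class CounterImpl;  // forward declaration only; no data layout is exposed

class Counter {
  public:
    Counter();
    ~Counter();
    void increment();
    int value() const;

  private:
    // Copying is declared but not defined, to keep the sketch short.
    Counter(const Counter&);
    Counter& operator=(const Counter&);

    CounterImpl* d_impl;  // the only private member that clients ever see
};

// ---- Would live in a hypothetical "counter.cpp" ----
class CounterImpl {
  public:
    CounterImpl() : count(0) {}
    int count;  // the real data; it can change without touching the header
};

Counter::Counter() : d_impl(new CounterImpl) {}
Counter::~Counter() { delete d_impl; }
void Counter::increment() { ++d_impl->count; }
int Counter::value() const { return d_impl->count; }

// Small usage check (illustrative only).
inline void checkCounter() {
    Counter c;
    c.increment();
    c.increment();
    assert(c.value() == 2);
}
```

With this layout, adding or changing members of CounterImpl alters only the implementation file, so clients that include the header need not be recompiled. The price is an extra heap allocation per object and one pointer indirection per call.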

Some people might question why such elaborate program-construction techniques are necessary when modern C++ compilers can churn through thousands of lines of source code in a second. However, just as Moore's Law predicts that chips will keep getting faster, Wilson's Corollary predicts that programs will keep getting bigger and more complex. Keeping build times manageable is therefore likely to become more, rather than less, important as time goes by. Large-Scale C++ is the closest thing I have yet seen to an engineering handbook for C++ programmers; anyone working on a large C++-based project should take the time to go through it.

Another strength of Large-Scale C++ is that, like most of the books in Addison-Wesley's C++ series, it is well written. Many programmers neglect studying the use of English while learning their own discipline. They then find themselves writing memos, documentation, or business plans that look and sound wrong, but are unable to "debug" their prose. A professional proofreader would turn to an authoritative work such as Fowler's Modern English Usage (Oxford University Press, revised by Gowers), but books of this ilk are often too cumbersome or technical for nonspecialists. A popular alternative for over 50 years has been Strunk and White's pithy The Elements of Style (Macmillan). Not only does it explain how to avoid common technical errors, it also shows the way to a clearer, more direct style.

My father (a retired English teacher) and I recently came across a useful companion to Strunk and White for people who record and transmit information electronically: David Angell and Brent Heslop's The Elements of E-Mail Style. Along with information on such things as the structure of messages, the use of ASCII, common acronyms, and smileys, it offers advice on eliminating unnecessary words and clichés, and quick tips on common errors in grammar and punctuation. Judging by much that is on the Internet, many people should also take the time to read what the authors have written about "Tone, Rhythm, Persuasion...and Flame Control." While it is not complete enough to be authoritative, it is small enough to be slipped into a briefcase for quick checking, and the contents are clearly presented. As the authors point out: "By focusing on the 20 percent of English grammar, usage, and mechanics issues that cause 80 percent of the problems in writing e-mail, you can quickly and dramatically improve your e-mail messages."

DDJ