STRUCTURED PROGRAMMING

Thinking Big, Talking Small

Jeff Duntemann, KI6RA

There's a definition of the word "legendary" that I favor: Something that everybody talks about but which has never had any basis in fact. (The legendary Loch Ness monster comes to mind, as well as that legendary IBM service and support.) There's a computer language that comes perilously close to being legendary, and that language is the (almost-legendary) Smalltalk.

Amidst the dusty stacks of computer magazines filling my two walls of Hundavad bookshelves is the October 1980 issue of the late and lamented Creative Computing. On the cover is a Halloween witch boarding her broomstick, cackling in a cartoon balloon: "Come with me on a journey to the mysterious world of Smalltalk!" Such is the stuff of which legends are made.

Though it tried gamely, Creative Computing did little to chase the smoke surrounding the language. At best, they made it sound like an infix Forth with a graphics user interface, and that comes closer to the truth than the PARC folks would care to admit. The problem was that Ted Nelson and the other gurus of the time were so taken by the ivory tower PARC mystique and the dazzlingly precocious Xerox graphics workstations that they mistook the user interface for the language, muttered things about animation scripts and notebook-sized Dynabooks, and never really got around to answering the serious question: Why is Smalltalk special?

It is special because it is the ultimate object-oriented language. It was easily 15 years ahead of its time in many ways, and now that a world -- screaming for Object-Oriented Anything (OOA) -- is ready for Smalltalk, the language revealed falls far short of the mystical, magical otherworldliness that ten years of yearning have coated it with. If you've never actually got down and hacked in Smalltalk, what I'd like you to do is adopt the Firesign Theatre attitude toward it: Everything you know is wrong. I'm going to try to explain Smalltalk from the other direction -- as a perfectly ordinary language within a perfectly remarkable framework -- and in the process take a stab at showing you what object-oriented programming is about.

At the Language Nudist Park

You'd recognize Smalltalk if you ran into it at the language nudist park, stripped bare of overlapping windows and mouse cursors and all that other folderol. Here's an assignment statement in Smalltalk:

  fudgeFactor := 42.

Man, it's just like back home in Pascalville! A numeric variable called fudgeFactor takes on a value of 42. The period is the local equivalent of Pascal's notorious (if not legendary) semicolon, and indicates the end of a statement. Like Pascal and C and Basic and all but the most bizarre languages like Prolog, a Smalltalk program is nothing more than a series of statements that do something in sequence.

Smalltalk, it seems, has everything all the other languages have, and most of its parts look familiar in an odd, polyglot kind of way. Generating the character equivalent of an integer is done this way:

  68 asCharacter

Excuse me, Mr. Forth ... no, it really is Smalltalk, and Smalltalk at the lowest level is just another collection of railroad diagrams, like any language you could name. The syntax is new, but the concepts are utterly traditional. If you're looking for a FOR loop, look no further:

  4 timesRepeat:[
      Turtle
        go: 100;
        turn: 90]

Smalltalk sets off compound statements within pairs of square brackets, just as in Pascal we use BEGIN/END and in C {/}. Saying 4 timesRepeat: [] is just about precisely equivalent to FOR I := 1 TO 4 DO BEGIN END. Nothing magical or legendary about that at all.
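
For readers who think in a present-day language, the same loop can be sketched in Python. Python is an outside choice made for illustration here, and the Turtle class below is a hypothetical stand-in for Smalltalk's Turtle: it only tracks position and heading, handles right-angle turns, and assumes that turn: 90 means a counterclockwise quarter turn.

```python
class Turtle:
    """Hypothetical stand-in for Smalltalk's Turtle: no drawing, just bookkeeping."""

    # Unit steps for the four right-angle headings (in degrees).
    _STEPS = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}

    def __init__(self):
        self.x, self.y, self.heading = 0, 0, 0
        self.path = [(0, 0)]          # every corner the turtle has visited

    def go(self, distance):
        dx, dy = self._STEPS[self.heading]
        self.x += dx * distance
        self.y += dy * distance
        self.path.append((self.x, self.y))

    def turn(self, degrees):
        self.heading = (self.heading + degrees) % 360


turtle = Turtle()
for _ in range(4):        # 4 timesRepeat: [ ... ]
    turtle.go(100)        #   go: 100;
    turtle.turn(90)       #   turn: 90
```

After four passes the turtle has walked a square and is back where it started, facing its original heading.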

The Message is the Medium

Yet another reason Smalltalk leans toward the legendary is that the PARC people, in designing it, made up new names for many traditional computer science concepts. What in most languages we'd call passing parameters is, in Smalltalk, called "passing messages." The distinction makes sense once you've grokked the fullness of the language, but for newcomers the term message passing promises more exotica than it delivers, and the result is gross confusion.

Here's why: The Smalltalk expression mentioned earlier, 68 asCharacter, returns the ASCII character 'D.' However, in the Smalltalk jargon, what is happening is that a message called asCharacter is passed to the value 68. The value 68 responds by sending back a message consisting of the value 'D.' Does this confuse you? It sure confused the hell out of me when I first encountered it. In Pascal you'd pass the value 68 to the standard function Chr (as in Chr(68)) and the standard function would return the value 'D.' In Smalltalk, it seems like you pass the procedure to the parameter, rather than the other way around. Bizarre? No more so than Forth, and some of my best friends use Forth all the time.
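
The two points of view can be put side by side in a rough Python sketch. Python is used here only for illustration; the Int wrapper, its as_character method, and the send helper are all hypothetical names invented for the example, not part of Smalltalk or of any library.

```python
class Int:
    """Hypothetical wrapper: an integer that knows how to answer messages."""

    def __init__(self, value):
        self.value = value

    def as_character(self):
        # The receiver does the conversion itself.
        return chr(self.value)


def send(receiver, selector, *args):
    """Hypothetical helper: look the message name up on the receiver, then run it."""
    return getattr(receiver, selector)(*args)


pascal_style = chr(68)                           # Chr(68): data handed to code
smalltalk_style = send(Int(68), "as_character")  # 68 asCharacter: message handed to data
```

Both lines produce the same character. What changes is who is considered to be in charge: the function in the first case, the value in the second.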

Forth uses postfix notation ("reverse Polish" -- now there's a legend for you!) because it serves the pathologically stack-centered architecture of the Forth language. Smalltalk uses its message-passing notation because message passing serves the architecture of Smalltalk, which, like Forth, differs from that of Pascal, C, and Basic. The thing to remember is that, like Forth, everything about Smalltalk makes perfect sense if you take it in the spirit of the language.

On a statement level, then, Smalltalk is just a variation on a common theme. The uncommon aspects of Smalltalk appear when it starts to put its clothes back on. The magic is in the framework, not the syntax.

Who's the Boss?

Smalltalk's architecture is not easily described in the legendary 25 words or fewer. I like a challenge, so I'll try. These are Smalltalk's three architectural principles:

  1. Data Is Boss.
  2. Data Knows What To Do.
  3. Data Bequeaths Everything It Has To Its Children.

First of all, Data Is Boss. To Pascal programmers, Data Is Clay, and we spend all our time fiddling up code-ish widgets to squeeze, shape, spindle, and mangle that data. Smalltalk moves data to center stage. We tend to think of a Pascal program as a series of code-clumps passing control from one to another. Data gets passed around as well, passively being beaten about the ears and pounded into new shapes and sizes. In a Smalltalk program, the active parties are not sequences of code but items of data.

Weird? Well, consider ... what's the more valuable and lasting entity: The act of a dog barking or the dog itself? As Confucius might say: There is still a dog even when the dog is silent. Smalltalk leans toward a philosophy which says: Mind the dog, and the bark will take care of itself.

When is Data More than Data?

This is why the statement 68 asCharacter is seen as a message being passed to a data item. The data item is the active party, because Data Knows What To Do. Data in Smalltalk is more than data. For every data item in Smalltalk, there is a list of actions that the data item can take. One of the things that a number knows how to do, for example, is to convert itself to an ASCII character and pass the character back as a message. That's what the number 68 does when it receives the message asCharacter. Other messages (three out of a great many) that an integer value understands, and can respond appropriately to, are:

These may look and sound like procedures to you, and they are in fact the "code" portion of a Smalltalk program. But the critical difference is this: They are considered to be part of the integer value. You cannot somehow reach in and pull a procedure called factorial out of an integer value. The two are welded together at the hip, hand in hand together for all time, forever and ever amen.

This more-than-data concept in Smalltalk has its own name, and that name is object. An object in Smalltalk is a piece of data and the things it knows how to do.

This is only weird for the first 13 minutes you think about it. (I timed it.) Why weld the code to the data? Easy: To keep the code out of trouble. Can a dog whistle? Can a teakettle bark? No. Yet we Pascal guys get in trouble with the compiler all the time, trying to pass character values to the Abs standard function, trying to take the cosine of True, things like that. Matching data types to code that can legally manipulate data of that type is lots of trouble -- so Smalltalk ends the problem by gluing the two together.

Those actions an object can take in response to messages are called "methods." Every object has its crisply-defined suite of methods. And in Smalltalk, everything (bar nothing!) is an object.

This is what I was hinting at when I implied that Smalltalk statements looked normal -- until they started putting their clothes on. The arithmetic expression 17 + 42 looks the same in Basic, Fortran, Pascal ... and Smalltalk. However, through Smalltalk-colored glasses, this is what's really happening: The + message (arithmetic addition) is sent to the value 17. Hot on the heels of the + message is an argument: in this case, the value 42. The + message tells 17, "add yourself to the next thing coming your way, and return the sum." The next thing down the pike is the value 42. 17 adds itself to 42, and sends the value 59 back out again.

The value 59 is an object too. So, if you have something like this:

  (17 + 42) * 3

Smalltalk sees it as sending the + message and the argument 42 to the value 17, and then sending the * (arithmetic multiplication) message and the argument 3 to the value 59, which was obligingly returned by the 17.
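
Python, used here purely as an outside illustration, happens to treat arithmetic in a roughly similar way: an operator on two integers is dispatched to a method belonging to the left-hand value. The sketch below spells out those method calls by hand. The variable names are mine, and Python's full rules (which include a fallback to the right-hand operand) are glossed over.

```python
# Send "+" with the argument 42 to the value 17.
first_reply = (17).__add__(42)

# Send "*" with the argument 3 to whatever came back.
second_reply = first_reply.__mul__(3)

# The ordinary infix spelling gives the same answer.
infix_reply = (17 + 42) * 3
```

The intermediate value is a full-fledged object in its own right, which is why it can receive the next message.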

Cynics might argue that this is all word play, and that an arithmetic expression is an arithmetic expression, not a bunch of numbers playing ping-pong with plus signs. And I'd have to admit, it is word play -- just as any computer language is an interplay of a set of symbols, a set of syntactic rules, and a semantic architecture. Smalltalk uses standard symbols (unlike some truly weird languages like APL) and a familiar set of syntactic rules. What's different is the semantic architecture -- but if you refuse to accept that architecture at face value, you're not playing by the rules, and I can only advise you to go sit in the corner.

My Object All Sublime

A lot of Smalltalk's reputation for weirdness comes from this tendency to anthropomorphize things like integer values. Objects know what to do -- their suite of methods is the collection of things they can accomplish -- and they do what they do in response to messages sent to them. One conjures up visions of a little purple number 7 reading a telegram and doing some quick pocket calculator work before sending a sum back by return wire. Just as we sometimes use the metaphor of a stack of china plates when speaking of stack-oriented languages like Forth, in Smalltalk we use the anthropomorphic zap on inanimate (nay, disembodied) entities like forty-twos and screen windows and text editors. It's a mnemonic device to remind us that data now runs things and takes action through the code, and not the other way around.

The anthropomorphic metaphor was stronger in the old days, when languages like Smalltalk were called actor languages because objects were seen as actors, each performing a script on cue. The term "actor" has fallen into disuse except in academe and in the title of another object-oriented language that I'll deal with in a future column.

No, the term at the center of the maelstrom these days is object. An object is the same concept in Smalltalk, Actor, C++, or the new Quick Pascal and Turbo Pascal 5.5: A data structure consisting of some number of fields (rather like the fields of a record) bound up with a suite of procedures that act on or with those fields in performing the work that the object must accomplish.

Bugs Sealed in Amber

This welding together of code and data is called "encapsulation." In Smalltalk the term is quite literal: An object's fields (called "instance variables" in Smalltalk jargon) are so thoroughly encapsulated within the object that other objects cannot directly perceive them. The closest familiar analog is the implementation section of a Pascal unit, where data can be defined that cannot be perceived from outside the unit, but only accessed by the code contained in the unit.

Smalltalk enforces this as strictly as Pascal units do. Only an object's methods may even know the names of an object's instance variables. To read the value of some instance variable, a method must be defined to return a copy of that instance variable, and a message must be sent to the object requesting a copy of that variable. No method, no copy, no knowledge that the instance variable even exists!

Now that's encapsulation.
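
A minimal Python sketch gives the flavor, with the loud caveat that Python is an outside choice for illustration and its walls are far thinner than Smalltalk's: the double-underscore field below is merely renamed behind the scenes, so a determined caller can still reach it. The Account class and its method names are hypothetical.

```python
class Account:
    """Hypothetical class whose balance is meant to be reached only via methods."""

    def __init__(self, opening_balance):
        self.__balance = opening_balance   # name-mangled "private" field

    def balance(self):
        """Accessor method: answers the current value on request."""
        return self.__balance

    def deposit(self, amount):
        self.__balance += amount


account = Account(100)
account.deposit(50)
reported = account.balance()       # the sanctioned route

# The direct route fails: from outside, no field by that name is visible.
try:
    account.__balance
    hidden = False
except AttributeError:
    hidden = True
```

Without the balance method there would be no sanctioned way to learn the value at all, which is the point of the paragraph above.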

Other object-oriented languages, as I'll explain in later columns, do not erect quite such impenetrable walls around their objects' inner fields. The reason is pretty simple: Speed. Smalltalk imposes a tremendous amount of overhead in enforcing encapsulation. The benefits are significant -- side effects and "sneak paths" almost literally cannot exist in Smalltalk code -- but the costs in performance are high.

Encapsulation in Smalltalk is rather like potting instance variables in milspec black epoxy resin. You get into and out of the module through its terminal strip, period. C++ and Object Pascal do something a little more like blister-packaging under transparent plastic: The goodies can be seen and felt by the consumer, but direct manipulation is discouraged.

Family Resemblances

Encapsulation is a nice idea, but there's nothing in it (at least in the C++/Object Pascal sense) that can't be accomplished by traditional C implementations and good extended Pascals like Turbo Pascal. Code and data can be combined in Turbo Pascal just by placing procedural types as fields in a record along with data fields. This works well, and I've used it as a means of organizing programs in the past.
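
The record-with-procedural-fields trick looks something like the following when sketched in Python rather than Turbo Pascal. Python is used only for illustration; the Shape record and the two area routines are hypothetical examples, not code from any of my programs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Shape:
    """A plain record: two data fields plus one field that holds a procedure."""
    width: float
    height: float
    area: Callable[..., float]


def rect_area(shape):
    return shape.width * shape.height


def tri_area(shape):
    return shape.width * shape.height / 2


rect = Shape(4, 3, rect_area)
tri = Shape(4, 3, tri_area)

# Each record carries its own code, but must be handed to it explicitly.
areas = [shape.area(shape) for shape in (rect, tri)]
```

Code and data travel together, yet nothing here is inherited from anything else, which is the gap the next section addresses.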

What really sets Smalltalk and other true object-oriented languages apart from the old school is that third Smalltalk architectural principle: Data Bequeaths Everything It Has To Its Children. This is the notion of inheritance, and I'd call it the single most important aspect of object-oriented programming.

Pascal has something a little like inheritance. When you want to limit a data type to some subset of the values of another type, you can define a subrange:

  TYPE
     CharCode = 0..255;

Here, we've defined a subrange of type Integer that embraces only the first 256 non-negative integer values. Values of type CharCode, however, really are integers, in that they may take part in integer calculations and be passed as actual parameters to formal parameters defined as Integer. CharCode variables inherit their integer-ness from type Integer, while taking on a new characteristic specific to type CharCode: The limiting of values to those between 0 and 255.
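
The same idea can be approximated in Python by deriving a type from the built-in integer and adding the range check. This is a sketch under assumptions: Python is an outside choice for illustration, and unlike a Pascal subrange the check below happens only when a value is created, so results of arithmetic fall back to plain integers.

```python
class CharCode(int):
    """Hypothetical subrange of int, limited to 0..255 at creation time."""

    def __new__(cls, value):
        if not 0 <= value <= 255:
            raise ValueError(f"{value} is outside 0..255")
        return super().__new__(cls, value)


code = CharCode(68)
is_integer = isinstance(code, int)   # integer-ness is inherited
total = code + 1                     # takes part in ordinary integer arithmetic

try:
    CharCode(300)
    rejected = False
except ValueError:
    rejected = True                  # the new, CharCode-specific restriction
```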

Now, broaden this notion by an order of magnitude and you'll begin to get the idea. A Smalltalk object can have child objects that inherit everything the parent object has. Typically, however, child objects either add to or somehow modify the instance variables or methods of the parent objects. You literally write code in Smalltalk by choosing an existing object or objects as the foundation of your application and creating child objects that modify the parent objects in a way that gets your work done.

Where Pascal has data types, Smalltalk has object classes, and inheritance works on classes rather than on individual objects. The real work of Smalltalk programming lies in defining new classes and writing their methods. A new class defined on the foundation of an existing class is called a "subclass"; the class from which a subclass is defined is the subclass's "superclass."

A class, like a data type in Pascal, is a template. You create Smalltalk objects by grabbing a class template and whacking out a new instance of that object class. That's Pascal-think, though -- in keeping with Smalltalk's anthropomorphic metaphor, it's more correct to say that new instances are created by sending a message to the class in question, requesting that it create a new instance of itself. Poof! The new instance happens.
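
Here is roughly what a superclass, a subclass, and a freshly requested instance look like in a Python sketch. Python is used only for illustration, and the Animal and Dog classes are hypothetical. Calling the class plays the part of sending it a message asking for a new instance of itself.

```python
class Animal:
    """Hypothetical superclass: general behavior lives here."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return "..."

    def describe(self):
        # Written once; uses whatever speak() the actual object provides.
        return f"{self.name} says {self.speak()}"


class Dog(Animal):
    """Hypothetical subclass: inherits everything, then adds and modifies."""

    def speak(self):            # modifies inherited behavior
        return "Woof"

    def fetch(self):            # adds behavior the parent never had
        return f"{self.name} fetches"


rex = Dog("Rex")                # ask the class for a new instance
description = rex.describe()    # inherited method, subclass bark
```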

Inheritance allows a second-level structure to be imposed on a program. Objects themselves are structures, and object classes are related to one another within a structure-of-structures called an "object class hierarchy." A portion (a small portion) of the Smalltalk object class hierarchy is shown in Figure 1. At the top of the tree is the class Object. Everything in Smalltalk is descended from Object.

Figure 1: A portion of Smalltalk/V's class hierarchy

  Magnitude
    Association
    Character
    Date
    Number
      Float
      Fraction
      Integer
        LargeNegativeInteger
        LargePositiveInteger
        SmallInteger
    Time

One such something is Magnitude, including all objects that may take values that may be equal to, greater than, or less than other values of a similar class. The children of Magnitude include characters, numbers, times, and dates.

Distributed Functionality

This is all very handy for showing relationships among classes, but what is actually handed down through the hierarchy? The answer is object behavior, primarily the methods that dictate what an object may do. "Object behavior is distributed throughout the object hierarchy at appropriate levels." This is a subtle, sneaky concept that won't necessarily make the lights come on until you've done some work in Smalltalk. Generalized behavior is defined early on in the hierarchy, up near the top. Objects modify the behavior of their parent classes as they need to, but modifying only what they need to, leaving general behavior intact where it is still valid.

As an example, consider Magnitude. Its methods define ordering and comparing functions that embrace anything that can be said to take on values that may be greater than or less than one another. One date or time can be greater than another, as can one number. The general behavior that all magnitudes can share is defined for class Magnitude. Behavior specific to dates or times is defined within classes Date and Time. Numeric functions such as reciprocal, cosine, tangent, and so on would be meaningless as applied to time or date values, so they are defined in class Number and inherited by the different numeric classes such as Float and Integer.
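
A stripped-down Python sketch shows how that distribution might look. It is an illustration only, not Smalltalk/V's actual source: the class names follow Figure 1, but the method names (less_than, max, between, reciprocal) and their bodies are simplified assumptions of mine.

```python
class Magnitude:
    """General ordering behavior, written once in terms of less_than."""

    def less_than(self, other):
        raise NotImplementedError("each kind of magnitude supplies its own")

    def max(self, other):
        return other if self.less_than(other) else self

    def between(self, low, high):
        return not self.less_than(low) and not high.less_than(self)


class Date(Magnitude):
    def __init__(self, year, month, day):
        self.ymd = (year, month, day)

    def less_than(self, other):           # date-specific comparison
        return self.ymd < other.ymd


class Number(Magnitude):
    def __init__(self, value):
        self.value = value

    def less_than(self, other):           # number-specific comparison
        return self.value < other.value

    def reciprocal(self):                 # meaningful only for numbers
        return Number(1 / self.value)


later = Date(1989, 7, 1).max(Date(1980, 10, 1))
in_range = Number(3).between(Number(1), Number(5))
quarter = Number(4).reciprocal()
```

The max and between methods are never rewritten for dates or numbers, and a Date has no reciprocal because that behavior sits lower in the tree, on Number alone.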

The idea is not to duplicate any code needlessly. Internally, Smalltalk looks a lot like a threaded-code Forth system. Methods perform specific behavior, and then call parent methods to perform more general behavior, after which the parent methods call their parent methods to perform even more general behavior, and so on. As with Forth, there is a kernel of primitive methods written in assembly language upon which the rest of the language is built.
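
The chain of calls from specific to general can be sketched in Python as follows. Python is an outside choice for illustration, and the three window classes are hypothetical; each method does its own bit of work and then defers to its parent for the rest.

```python
class Window:
    def describe(self):
        # Most general behavior: the end of the chain.
        return ["window"]


class TextWindow(Window):
    def describe(self):
        # Do the text-specific part, then let the parent do its part.
        return ["text"] + super().describe()


class EditorWindow(TextWindow):
    def describe(self):
        # Do the editor-specific part, then hand off up the hierarchy.
        return ["editor"] + super().describe()


steps = EditorWindow().describe()
```

The resulting list records the order in which the methods ran, most specific first.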

There are other, even more subtle consequences of inheritance such as polymorphism, which may in fact require a column all to itself. I'll come back to inheritance again and again; it is the backbone of object orientation and has more wrinkles than a cotton shirt in a hot dryer.

Talking Small

With very little fanfare, a product called Methods appeared in 1985 from Jim Anderson's Digitalk Inc. in L.A. Methods was, remarkably enough, a text-based implementation of Xerox's Smalltalk-80 specification. It may have been the first low-cost object-oriented language to ever appear on the PC, and not one programmer in a hundred had ever heard of it.

Methods grew into graphics overshoes and became Smalltalk/V a couple of years later. (The V is a vee, not a five ... ) At $99 it remains the least expensive object-oriented language of which I am aware. (Rumor holds that Quick Pascal will come in at $99, but I have no hard information on it yet.) The Smalltalk/V manual is excellent, and I think that the product represents one of the best ways to come to understand object-oriented programming. It's graphics-based, and the demo programs are very visual and lots of fun, with animated dogs (of which I am inordinately fond) bouncing around the screen in response to messages sent from the keyboard.

The Smalltalk/V product is 86-generic and runs on any DOS machine with CGA, EGA, VGA, or Hercules graphics. Digitalk also has a more advanced Smalltalk product for 286 and 386 machines, Smalltalk/V286, which provides better performance and a richer feature set (including much more room to work) and sells for $199. A Mac version is available, and provides an intriguing portability path between the two hostile camps.

The only other DOS-based Smalltalk that I know of is offered by ParcPlace Systems, a Xerox spinoff that is finally making some effort to put a Xerox Smalltalk implementation in the hands of the DOS developer. The Smalltalk-80 Development System runs steep ($995) and requires a 386. In fairness, I must admit that I've been using Smalltalk/V for two years and the ParcPlace product for only about a month, so I'll refrain from detailed comparisons. The price alone (and I feel price is important) takes the ParcPlace product down a few notches in my esteem. It's good, but it's not a thousand dollars good. It's a workstation product, ported from Unix, that has to stoop a little to make it under DOS, whereas Smalltalk/V was designed from the ground up to run in a DOS environment.

On the other hand, for those who care, Smalltalk-80 is the Real Thing, born out of the primordial soup that Xerox continually cooks but rarely allows others to taste. Its adherence to the Smalltalk-80 books (see product box) is closer than that of Smalltalk/V, and in fact ParcPlace considers those books its "real" user documentation. (The 3-ring binder document, sold with the product, is reference-oriented, heavily technical, and fragmented.)

If you want a taste of Smalltalk, or a taste of OOP, pick up Smalltalk/V. It's cheap and it works like a charm. The 286 product is there if you want more room and more speed. I'm hoping (but not expecting) that ParcPlace will port Smalltalk-80 to Presentation Manager soon, at which point the price becomes less of an issue. Anything that manages the complexity of an API like PM's is valuable, and for PM development Smalltalk-80 would almost certainly be worth the considerable price.

Those Legendary Smalltalk Books

Smalltalk is an anomaly in that it had superb documentation on the market long before there was an implementation that anyone could buy. A series of three books from Addison-Wesley appeared in the early 1980s, and two of the three are required reading for anyone interested in Smalltalk. The first, Smalltalk-80: The Language and its Implementation is the "white book" for Smalltalk, written by Adele Goldberg and David Robson. Beautiful, interesting, literate, and huge, the book defines the language and puts you on a sound theoretical footing. The other book, Smalltalk-80: The Interactive Programming Environment, by Adele Goldberg, describes the standard Smalltalk-80 environment implemented completely by the ParcPlace product and closely by Digitalk's. It's about browsers and editors and form tools, and is essential if you intend to work in the language. The Goldberg/Robson text, on the other hand, is sufficient if you're interested only in familiarizing yourself with the language's principles.

A third book, Smalltalk-80: Bits of History, Words of Advice, by Glenn Krasner, is meta-Smalltalk, that is, smalltalk about Smalltalk. It provides some fascinating history about the origins of the language, and liberal doses of hacker-heavy lore on how to bring up your very own implementation -- which is not something I would try to do in ten thousand years. The book is notable for its photo of the PARC NoteTaker machine, which was a 256K 8086-based spitting image of the Osborne 1, in regular use in 1978. Xerox really did invent and throw away the personal computer ... over and over and over again.

Addison-Wesley has since published a few additional Smalltalk texts, but none of them come close to any of these three in quality or completeness. Highly recommended.

The Downside

There's a lot more to say about Smalltalk, and I'll touch on it from time to time in these columns. I like the language a lot, and I credit it with preparing me technically for the arrival of this crowd of OOP steamships that I cataloged last issue.

On the other hand, Smalltalk will probably never cross the line to become a mainstream language, as the weavers of its legend have been harping for many years. The reason is purely practical: Smalltalk is by nature an interpreter, and unless everybody has the interpreter, the grass-roots critical mass of support among recreational hackers and part-time programmers just won't be there. IBM put Basic on the map by throwing a solid interpreter in the box with every machine they sold. Had they done that with Smalltalk, Smalltalk might be where Basic is today, or close.

Unfortunately, it's tough to write a Smalltalk, just as it's fairly easy to write a Basic. Furthermore, Smalltalk is sluggish on 8088 machines, in part because of its interpreted nature, but mostly because it is inescapably graphics-based. Digitalk might have done well to preserve their Methods product in dry ice for the current OOP craze as a text-based $49 loss-leader to get people knowledgeable about OOPs in general and Smalltalk in particular. (You listening, Jim?) Given a $99 price and some superb documentation from Addison-Wesley, Smalltalk/V is perhaps the best current environment to learn OOP principles ... but to apply those principles broadly you're going to have to move to a mass market language like Turbo Pascal.

Smalltalk does well as a prototyping tool, rather like a thinking man's Dan Bricklin for graphics apps. And if you've got a fast machine and you're in a position to work entirely within the Smalltalk environment, you can create a lot of powerful tools quickly. Still, a mainstream language it isn't, and I caution those of you who have been dazzled by the legend not to expect effort-free programming. Smalltalk is a computer language, really. Tools is tools. The magic, if anywhere, is in you.

Products Mentioned

The Smalltalk-80 Programming System
ParcPlace Systems
1550 Plymouth Str.
Mountain View, CA 94043
415-691-6700
$995.00 (requires 386)

Smalltalk/V
Digitalk, Inc.
9841 Airport Blvd.
Los Angeles, CA 90045
213-645-1082
General 86-family version $99
286/386 version $199

Smalltalk-80: Bits of History, Words of Advice
by Glenn Krasner
Addison-Wesley, 1983
ISBN 0-201-11669-3
Softcover, 344 pp. $19.95

Smalltalk-80: The Interactive Programming Environment
by Adele Goldberg
Addison-Wesley, 1984
ISBN 0-201-11372-4
Hardcover, 516 pp. $29.95

Smalltalk-80: The Language and its Implementation
by Adele Goldberg and David Robson
Addison-Wesley, 1983
ISBN 0-201-11371-6
Hardcover, 714 pp. $34.95


Copyright © 1989, Dr. Dobb's Journal