"I am an American, Chicago-born; Chicago, that somber city."
So said the hero in the opening line of Saul Bellow's novel The Adventures of Augie March -- words that I may say with equal truth. I thought of poor Augie last week as I stumbled out of the State Street subway entrance into a howling 30 MPH wind and a -40 degree windchill factor.
And somber; you don't know somber until you live under one of those garage-floor gray skies that descend in November and don't lift again until May. I lived under those skies for 26 years, and it has now been 12 years since I have spent more than an odd weekend there.
It's a sobering experience, trying to go home again. You can't, of course; nor am I the first one who has said so. The Chicago I had called home was gone, and although familiar pieces were scattered all over the map, the general impression was that the Windy City had simply blown itself away.
Whole blocks of downtown where I used to repair Xerox machines have been razed, to make way for massive postmodern skyscrapers and stupendously ugly government monuments. Familiar stores have vanished, and new strip malls are everywhere. The Jefferson Park subway, which I rode on its very first day in 1970, now looks grimy, battered, and tired.
I ate Bay's English muffins and Salerno cookies, and had lunch at Superdawg on Milwaukee Avenue, and little by little realized that home had better be where you are, or it's nowhere. It isn't just that home is gone, for it is -- but the you that lived there is gone too, continually reshaped into another being by forces that work gradually and never quite show themselves. The little house on Clarence Avenue where I grew up now looks tiny; its entire first floor would fit neatly inside my Scottsdale garage. It's no smaller than it was when I lived there, and I'm no larger ... but my sense of perspective has been forever altered by the hills of Baltimore and the cliffs of Santa Cruz.
Portability Nostalgia
There's an increasingly vocal contingent in our field that's been demanding that we all go home again, where home is that fabled academician's Erewhon, portability. I've fielded some interesting threads on the nets, hollering that if that nasty old Turbo Pascal hadn't messed with the pristine Pascal Standard, all our code would be portable and we'd all be happily Home.
It's characteristic of this argument (which has been cluttering up discussions of programming for many years) that what we want is always referred to as "portability." Nobody ever says that what we want is for one piece of source code to compile and run identically on all compilers for all machines -- even though when pressed, most will admit that that's what "portability" is supposed to represent. I don't know about you, but spelled out it sounds pretty dicey to me.
In this interpretation, portability in Pascal is impossible, period -- unless you limit yourself to programs that don't do much, like the programs you generally write in college. College programming exercises are throwaways that teach a lesson and then become extraneous. Sure, you can write a program in ISO Standard Pascal that creates a linked list, sorts it, and then writes the sorted list to Output. But I dare you to do this: Open files INPUT1 and INPUT2 for input and OUTPUT1 for output, and then merge the two input files to the output file. You can't, because Standard Pascal predefines only two logical files, Input and Output. Not to mention the fact that Standard Pascal has no way to associate a physical filename with a logical file once the program has begun running.
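In Turbo Pascal's dialect, by contrast, the dare is a few minutes' work, precisely because of the nonstandard Assign procedure that binds a logical file to a physical filename at run time. A sketch (the filenames are invented for illustration):

```pascal
PROGRAM MergeFiles;
{ Merge two sorted text files into a third.  Assign, a Turbo
  Pascal extension, binds each logical file to a physical
  filename at run time -- exactly what Standard Pascal lacks.
  The filenames here are invented for illustration. }
VAR
  In1, In2, Out1 : Text;
  Line1, Line2   : String;
  Have1, Have2   : Boolean;

PROCEDURE Advance(VAR F : Text; VAR L : String; VAR Have : Boolean);
BEGIN
  Have := NOT Eof(F);
  IF Have THEN ReadLn(F, L);
END;

BEGIN
  Assign(In1, 'INPUT1.TXT');   Reset(In1);
  Assign(In2, 'INPUT2.TXT');   Reset(In2);
  Assign(Out1, 'OUTPUT1.TXT'); Rewrite(Out1);
  Advance(In1, Line1, Have1);
  Advance(In2, Line2, Have2);
  WHILE Have1 OR Have2 DO
    IF Have1 AND ((NOT Have2) OR (Line1 <= Line2)) THEN
      BEGIN
        WriteLn(Out1, Line1);        { take the smaller line }
        Advance(In1, Line1, Have1);
      END
    ELSE
      BEGIN
        WriteLn(Out1, Line2);
        Advance(In2, Line2, Have2);
      END;
  Close(In1); Close(In2); Close(Out1);
END.
```

Nothing in this little program is exotic -- and none of it compiles under the ISO Standard.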
Let's not even talk about doing a binary search on a sorted index file on disk. Seek? What's that? Not Standard, mon.
Forgive me for railing. I just want those two-bit book floggers to put a sock in it and quit praising Standard Pascal as the quick road Home. Portability is an intriguing topic that deserves better treatment than the nutcase discussions I've heard. Let's explore the notion for a bit.
Syntax and Semantics
For starters, what would it take to realize the ideal of portability? What would we have to have to allow one single source code file to compile and run identically on all compilers of a given language on all machines? Too many people place all the blame on the language itself, but the problem is much, much bigger than that. In my analysis, what we need are two things: standard syntax across language implementations, and standard semantics across platforms.
Syntax first. The biggest barriers to syntactical portability in structured languages, oddly enough, are often the designers of those languages. I've gotten some nastygrams from the nutcases for criticizing Niklaus Wirth in these columns, but whether he realizes it or not, Dr. Wirth is as much to blame for the lack of Pascal portability as anyone else.
It comes down to this: He designed a language, and stopped there. He did not specify a set of standard libraries. There are a handful of fundamental omissions in Pascal, mostly connected with file I/O. (The addition of Assign, Seek, and Erase to ISO Pascal would quell about 40 percent of my objection to that nonlanguage.) But most of the problems in providing syntactic portability in Pascal lie not with the language itself but with the absolutely essential libraries that provide things like access to the underlying system, string support, and detailed file management.
Wirth has stated that he expects the programmer to develop his own libraries and to recompile them on every platform he moves his application to, and does not see any particular need for any set of standard libraries.
This is unrealistic. String support, time/date support, and file management are so universally needed that forcing every programmer to create them from scratch is a titanic waste of manpower. Language vendors recognize this, and that's why Turbo Pascal comes with its own units such as DOS and Crt.
If Wirth had simply spent a few more weeks and defined a spec for libraries containing the most needed procedures and functions in common programming tasks, Pascal would be a great deal more portable today than it is.
Half a Loaf
Modula-2 people are reminding me inside their heads right now that Pascal was just an exercise for Wirth to prove the value of structured programming, something so ingrained today it seems incredible that anyone would ever doubt it. In defining Modula-2, Wirth did in fact define a few standard libraries, making Modula-2 infinitely more amenable to syntactic portability than Pascal. However, the emphasis here is on few. Modula-2's standard libraries are strictly half a loaf. What we need, in fact, is something on the order of the standard function libraries defined for ANSI C. As much as it galls me to admit it, ANSI C and C++ 2.0 are now much more portable than any flavor of Pascal or Modula-2, largely because of the breadth of the ANSI standard library spec.
My own research in C++ led me right to that conclusion: Early on I wrote some programs in Zortech C++, and when Turbo C++ came along I ported even the biggest one from Zortech to Turbo in about half an hour.
Going for Syntactic Portability
Achieving some degree of syntactic portability can be done according to a few time-honored principles: Stay within the language standard wherever you can; keep vendor-specific libraries out of your application modules; and quarantine whatever platform dependencies remain in an intermediate layer of your own.
This isn't an especially good prescription for Modula-2 programmers, and it's simply beyond hope for Pascal, because in Pascal there's neither a useful language standard nor any standard libraries at all. Principle #3 still has some validity, however, and if you choose to incorporate syntactic portability into your project as a design goal, you might consider this strategy: Isolate every platform-dependent call in an intermediate layer module of your own.
In other words, to reposition the cursor, don't call Turbo Pascal's GotoXY routine directly. Create a routine in the intermediate layer named CursorXY, and then implement CursorXY this way:
PROCEDURE CursorXY(X, Y : Integer);
BEGIN
  GotoXY(X, Y);
END;
The intermediate layer will use Turbo Pascal's Crt unit, but the modules comprising the standard portion of the application will make no reference to Crt at all. All video and DOS access will be through the intermediate layer. Later on, when you implement the intermediate layer module for another platform, replace the GotoXY call in CursorXY with the platform-specific call that positions the cursor on the destination platform. Your application only calls CursorXY, and the intermediate layer handles the translation to the specifics of the current platform.
I've seen this done effectively in moving between DOS Turbo Pascal and character-mode Unix Pascal. The downside is that the layer can eat performance significantly if carelessly done, and will always slow you down at least a little. And for ambitious applications, that intermediate layer module can get enormous. It's a clunky thing to do. But it may be the only thing you can do.
The Platform Problem
Sounds grim, this reaching for syntactic portability. But wait, it gets worse. Syntax, in fact, is a minor problem, solvable by doing enough somersaults and sticking sufficient mediation between the standard language and the machine it runs on. The real headaches come from elsewhere, notably, the fact that not all platforms are created equal. To ease into that discussion, a little history:
Long ago, there was a brave attempt at ideal portability called the "P System." It came out of the University of California at San Diego (UCSD), was really big for about half an hour in the middle of the CP/M era, then pretty much died its first death once the IBM PC appeared on the scene. In the mid-eighties it was purchased by a new vendor and resurrected for a while, but its second death soon followed.
The P System was pretty amazing in its day. The vendors could implement it on any damfool machine they got their hands on in only a week or two, and it was available for a lot of different machines. And lo! You could take object code compiled on any P System machine and run it on any other P System machine, regardless of CPU or how different the two hardware implementations were.
The P System was in fact an operating system, but more than that, it was an operating system written for a virtual CPU; that is, a CPU that exists only as a software simulation written to run on real silicon CPUs. Its registers were memory locations and its microcode was implemented in the instruction set of the host CPU. In effect, the "P-Machine" (P for "pseudo") executed an interpreted assembly language. The P-Machine supported a suite of virtual opcodes, and these opcodes were executed by calling short sequences of silicon opcodes that taken together provided the function of the virtual opcode.
Alas, I've long since dumpstered my P System documentation, but as I recall, most of the virtual instructions took two or more silicon opcodes to implement. For example, if the P Machine's pseudoregisters were kept in memory locations, then executing the virtual opcode to move one register to another would require executing the silicon opcodes that moved one memory location to another -- which for the 86-family meant moving memory into a register and then moving the register back out into memory. This proved the undoing of the P System; it invariably gobbled about 50 percent of the performance of the machine in an age when the machines were none too powerful to begin with.
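The shape of the trick is easy to show in a few lines of Pascal. Nothing below comes from the real P-Machine -- the opcodes, pseudoregisters, and program are all invented -- but it illustrates how a virtual CPU keeps its "registers" in ordinary memory and spends several silicon operations on each virtual one:

```pascal
PROGRAM TinyPMachine;
{ A toy "virtual CPU": pseudoregisters live in ordinary
  variables, and each virtual opcode is carried out by several
  real ones.  Opcodes and program are invented for illustration. }
CONST
  opLoadA = 1;   { load literal into pseudoregister A }
  opMovAB = 2;   { copy pseudoregister A to B          }
  opAddBA = 3;   { add B into A                        }
  opHalt  = 0;
VAR
  Code : ARRAY[0..7] OF Integer;
  PC, RegA, RegB : Integer;
  Running : Boolean;
BEGIN
  { "P-code" for: A := 5; B := A; A := A + B }
  Code[0] := opLoadA; Code[1] := 5;
  Code[2] := opMovAB;
  Code[3] := opAddBA;
  Code[4] := opHalt;
  PC := 0; RegA := 0; RegB := 0; Running := True;
  WHILE Running DO
    CASE Code[PC] OF     { fetch and dispatch one virtual opcode }
      opLoadA : BEGIN RegA := Code[PC+1]; PC := PC + 2; END;
      opMovAB : BEGIN RegB := RegA; PC := PC + 1; END;
      opAddBA : BEGIN RegA := RegA + RegB; PC := PC + 1; END;
      opHalt  : Running := False;
    END;
  WriteLn('RegA = ', RegA);
END.
```

Every trip around that WHILE loop costs a comparison, a jump, and the body of the CASE arm on top of the work the virtual opcode actually does -- which is where the P System's 50 percent went.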
Nonetheless, this made for tremendous portability, executed at what amounted to the microcode level. The P System's compilers and other utilities were binary files of virtual opcodes or pseudocode (a term generally shortened to P-code) meant to be executed by the P-Machine. The P-Machine was the only part of the system specific to a particular silicon CPU. Porting the P-Machine to a new silicon CPU only required rewriting the P-Machine's "microcode" as required by the new silicon CPU.
(An interesting sidenote to the P System concerns Western Digital's late seventies attempt to speed P-code execution by creating a multichip silicon CPU whose instruction set was in fact identical to the P-Machine's virtual instruction set. This "Pascal MicroEngine" eliminated the interpreter layer and allowed P-code to execute directly on the CPU as native code. Alas, the MicroEngine was faster than an interpreted P-Machine, but turned out to be quirky and only about as fast as a good CP/M machine -- while costing about twice as much. No one seemed to think portability was worth a 100 percent cost premium, and I hardly blame them.)
I'm spending a lot of time on the P System because it's a good illustration of a solution to the portability problem -- and a warning to people who gloss over the importance of performance competitiveness in our industry. (And also because somebody is trying the very same thing again today, in an unexpected way, and with considerably better chance for success. See if you can guess who, and for what language, before I describe the effort later in this column.)
The P System succeeded at its nominal goal of providing binary-file portability across all platforms. Most people credit this success to the P System's use of a common, identical language syntax (UCSD Pascal) on all these platforms. This isn't quite half true. A common language syntax helped, but what really made the P System work was its way of providing identical platform semantics on all the supported platforms. Therein hangs a lesson few people have learned.
Platform Semantics
Compared to language syntax (which is just an orderly convention for hanging language elements together) language semantics are much harder to define. In fact, "language semantics" is a misnomer. "Semantics" deals with what things mean, and the semantics of a programming language is the description of what a language's statements mean in the context of a specific underlying machine.
Moving a screen cursor can be as syntactically simple as the statement GotoXY(X,Y). What executing GotoXY(X,Y) accomplishes (in effect, what the statement means) depends on what sort of cursor/video setup a given machine has. On a text screen, X,Y specifies a character position; exactly one character occupies that position, and it overlaps no other character. On a graphics screen, however, X,Y specifies a pixel position, which may fall within two or more overlapping graphics characters. On graphics systems you can't speak of "the character at X,Y," because there may be no single character at X,Y.
These differences are differences in semantics, and because they depend on the specifics of the underlying platform, I call them platform semantics.
Other examples of differences in platform semantics: The support for multiple mouse buttons in some platforms, compared to Apple's militantly defended insistence on a single mouse button for the Mac. (Users are too dumb to handle more than one mouse button, dontcha see?) Hard-code handling for the right (or middle) mouse button into your app, and you have a difficult question when moving the app to the Mac. What becomes of that right mouse button event?
Here's one of my favorites: The use of color in one platform versus a monochrome platform with no gray scales. Or: Porting a multitasking app to a single-task platform.
And of course, there are a multitude of little piddly differences between platforms that individually may not seem very serious, but when taken together with all their interactions, will make you tear your hair out.
Least Common Denominator Porting
The traditional way of dealing with platform differences is to find what the platforms have in common and use only that, ignoring the additional features of the more advanced platform. That this is wasteful in the extreme should be obvious. Consider how Turbo Pascal for the Macintosh allowed character-mode PC programs to run on the Mac: by turning the Mac into a character-mode machine. This did not enthrall Mac owners.
The P System handled platform semantics by being the platform on every machine. The P System was a disk operating system and a set of screen management conventions. Your P System applications could only use those disk I/O and screen management features supported by the P System. Anything else had to be done by circumventing the P System (through escape sequences or direct ROM calls or somesuch) which, of course, rendered an application nonportable.
Of course, back then this was less of a liability, since machines rarely had much of anything useful in ROM, and the P System often offered lots more than your typical CP/M system could offer the user.
The other factor that killed the P System was its very primitive disk space management. There was no File Allocation Table. Disk files occupied contiguous blocks of disk space, and when you deleted a file you left a "hole" on disk that only a file the same size or smaller could use. Eventually the disk was carved up into multitudes of useless slivers, and you had to perform slow manual "garbage collection" to gather free space back into a contiguous block. Even CP/M did better than that.
The P System was widely used in schools, and it provided a level of portability never equalled to this day. This is why lots of academics yearn to go home to the "good old days" when all software was portable. They seem to have forgotten that all software was portable because the P System made all machines equally clumsy and limited.
The Ghost of the P System
I had despaired of portability for many years because of the problem of platform semantics. What would all of the GUI marvels of the Mac mean when translated (clumsily) to text mode on the PC? Every Unix vendor had a grossly different set of networking and UI assumptions, and nobody took the need for a common binary code format seriously. This is the sole reason Unix blew its one chance to become the platform for desktop computing. DOS is in the saddle now, and Unix will forever be a niche OS.
I decided to write this column, however, because the diverse paths are beginning to converge again. The Mac and Windows are alike enough (thanks to Xerox's seminal research and no thanks at all to lawyer-crazed Apple) to make portability between the two platforms at least possible. There are plenty of semantical hangups to be overcome, but not so many as to warrant despair.
Unix is now coming around to agreement on X Window as the underlying windowing architecture, but true to form, those ever-so-righteous dudes can't decide on a UI. By the time they choose, Unix probably won't matter anymore -- but if portability to Unix is important to you, the path to either Open Look or Motif is plain. (I recommend Motif.)
The Mac, MS Windows, and X Window platforms have now grown close enough semantically and the underlying machines powerful enough to support another stab at the P System concept. Sure enough, somebody is trying it, and this time it might just work. ParcPlace Systems is doing it with their Objectworks/Smalltalk Release 4 product, a Smalltalk development environment tailored specifically to overcome differences in platform semantics while providing a very clever common binary code format.
Objectworks/Smalltalk confronts differences in platform semantics in a number of ways. In general, Objectworks makes use of platform facilities when it can, and fills in the gaps itself on lesser platforms, to support the Objectworks UI and tools. This "greatest common denominator" solution requires lots of memory and compute power, but since Objectworks starts at the 386-class machines and goes up from there, the power it needs should be available.
The problems of color and aspect ratio are handled by something called the Smalltalk Portable Imaging Model (SPIM) to create graphics images that look identical on any supported platform. SPIM supports device-independent "true color" so that color representation will be consistent on each platform without adjustment. (I confess skepticism on this one. We'll see.) Country differences in character sets and alphabets are handled by using 16 bits to represent each character.
Native Code Binary Portability
The kicker from a portability perspective is that Objectworks goes the P System one better: What runs on each platform is not slow interpreted P-code, but true native code, regardless of where the application was originally compiled. In other words, if you compile your Smalltalk app on the Mac and run it as 68000 native code you can take the very same compiled file to a 386-based PC system, load it, and run it as 386 native code. Or run it on a SPARCStation as SPARC native code, and so on.
This is a good trick. What happens is that Objectworks first compiles your application to machine-independent intermediate code, called byte code, which is the analog of the P System's P-code. You can interpret the byte code from the Objectworks environment, which provides numerous debugging tools that work specifically on byte code. The byte code file is truly platform-independent, and you can haul it around your company dropping copies on any platform where Objectworks has been installed.
However, when you finally go to run the application as native code, Objectworks (quickly) compiles the intermediate code to native code, and caches the native code in memory. As long as the compiled native code image remains in memory, the final compilation step need only be done once. What runs is native code, really and truly. And this is how native code binary portability happens using Smalltalk.
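The scheme amounts to memoizing the back end of the compiler: pay for translation on first use, then run from the cache. ParcPlace hasn't published its internals here, so treat the following as a cartoon of the idea rather than the product -- every name and number is invented:

```pascal
PROGRAM CompileCache;
{ Toy model of compile-on-first-use caching.  "Compiling" is a
  stand-in arithmetic transform; the point is that the expensive
  translation step runs once per method, not once per call. }
CONST
  MaxMethods = 8;
VAR
  Cached   : ARRAY[1..MaxMethods] OF Boolean;
  Native   : ARRAY[1..MaxMethods] OF Integer;
  Compiles : Integer;
  I : Integer;

FUNCTION CompileToNative(ByteCode : Integer) : Integer;
BEGIN
  Compiles := Compiles + 1;           { the expensive step }
  CompileToNative := ByteCode * 100;  { stand-in "translation" }
END;

FUNCTION Run(Method : Integer) : Integer;
BEGIN
  IF NOT Cached[Method] THEN
    BEGIN                             { first use: compile, cache }
      Native[Method] := CompileToNative(Method);
      Cached[Method] := True;
    END;
  Run := Native[Method];              { every use: run "native" }
END;

BEGIN
  Compiles := 0;
  FOR I := 1 TO MaxMethods DO Cached[I] := False;
  WriteLn(Run(3), ' ', Run(3), ' ', Run(5));
  WriteLn('Compilations: ', Compiles);  { two, not three }
END.
```

Objectworks adds the twist that the "byte code" side of this picture is identical on every platform, while the cached "native" side is whatever silicon you happen to be standing on.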
Yes, this sounds mighty good, but I had better add that I haven't tried it yet. The Windows 3.0 implementation of Objectworks is still in beta test and should appear this spring. Still, the ParcPlace people are very good at what they do and I expect that they will pull it off. They are, after all, the original Xerox Smalltalk team, spun off at last to a technology company that can get Smalltalk out there into the hands of the people who need it. Smalltalk now has the blessing of IBM and is being used in some extremely conservative DP shops, often by people whose only prior programming experience is in Cobol.
There are lots of questions a system like this raises: How good is the native code produced in that final, runtime compilation? How long does the final compilation step take? How is color handled portably among systems that don't support zillions of VGA colors? For that matter, how is color translated to monochrome Mac systems? I have way too much respect for the platform semantics problem to assume that there aren't still some rough edges here.
Just as surely, I have way too much respect for ParcPlace to think it's a sham. The cost is a little scary to us basement hackers, but if you're the vendor of a $5,000 vertical market package or a corporate MIS strategist, portability like this could be cheap at twice the price. And once ParcPlace proves that the technology is workable, other firms may try implementing such systems for other languages -- including (in my dreams, sigh) Pascal. I've seen P-code implementations of Modula-2 (well, M-code, they call it) so Modula could work there as well. Please keep me apprised of any such efforts if you hear of them.
I'll report further on Objectworks when I have a chance to play with it. From where I sit, it looks to me like your very best chance to incorporate true, absolute drop-in portability into your application design across the (no longer) impassable chasms dividing the PC, Mac, and Unix workstations.
My Portability Prescription
With that in mind, Jeff's Prescription for Portable Design cooks down to this: If you need portability badly enough, go whole-hog with a system that intelligently manages differences in platform semantics -- in essence, doing all the portability work for you. I suspect Objectworks is only the first such system. On the other hand, if you don't need portability that badly, don't bother with it at all. Make as much as you can of the platform you're most familiar with. Your customers will not like being treated as least common denominators. Trust me.
Perhaps you can go home again ... but going home now means going bigtime. The lesson of the P System remains valid: Let the language handle portability. And how many languages are truly big enough to do it? Only Smalltalk.
I have to smile.
Products Mentioned

Objectworks/Smalltalk Release 4
ParcPlace Systems
1550 Plymouth St.
Mountain View, CA 94043
415-691-6700
$3,500
Copyright © 1991, Dr. Dobb's Journal