PROGRAMMER'S BOOKSHELF

Subatomic Programming

Andrew Schulman

Most programs are written in a high-level language, not assembly language, but the authors of these programs are generally at least dimly aware that, below the surface, their high-level-language statements such as p = x "turn into" assembly language statements such as MOV AX, [BX]. Even introductory books on computing always seem to include a picture of a funnel, with LETs and GOTOs flowing in the top, and MOVs and JMPs dropping out the bottom.

It is probably a sign of progress in computing that most of us view these MOVs and JMPs as atomic operations. That is, they don't "turn into" anything, except perhaps the "0s and 1s" to which introductory computer books like to vaguely refer. For the majority of programmers, the actually enormous complexity underneath the surface of an "atomic" assembly language statement like MOV AX, [BX] can remain a total mystery.

Nonetheless, it is worth having an appreciation for what makes up these supposedly simple operations. In addition to the pure enjoyment of knowing a little more about the machine, an appreciation for its subatomic particles -- things like bus cycles, memory access time, instruction prefetches, pipelining, DRAM refresh, timing issues, cache management, wait states, DMA, and bus arbitration -- may become more important as microprocessors become faster and more compact. As Intel's new 386SL chipset shows, even a seemingly lowly issue like power management can take on great importance when computers get small enough.

This month, we will examine three books that take us beneath the valley of assembly language.

Zen of Assembly Language

Michael Abrash's oddly titled Zen of Assembly Language is a good place to start. Chapters 3, 4, and 5 in particular deal with what he calls "the raw stuff of performance, which lies beneath the programming interface, in the dimly seen realm populated by instruction prefetching, dynamic RAM refresh, and wait states, where software meets hardware" (p. 75).

Interestingly, Abrash's goal is actually to show that we can't totally understand this level. "The exact performance of assembler code over time is such a complex problem that it might as well be unsolvable" (p. 114). He shows that all instruction timings are relative. In one example code sequence, the SHR instruction takes eight-plus cycles to execute, and in another it takes only two. Thus, "the only true execution time for an instruction is a time measured in a certain context, and that time is meaningful only in that context" (p. 91).

In other words, "there's no way to be sure what code is the fastest for a particular purpose"; one must "write code by feel as much as by prescription." Apparently such thoughts are what inspired the "Zen" book title. "How can it not be possible to come up with a purely rational solution to a problem that involves that most rational of man's creations, the computer?" he asks (p. 113), yet the answer is, at this subatomic level, that the order and duration of events is unknown.
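To get a feel for why the same SHR can cost two cycles in one place and eight or more in another, here is a toy model in C. It is only a sketch and not Abrash's measurement method: it assumes that an 8088 bus access fetches one instruction byte in four cycles, that SHR reg,1 is two bytes long and needs two cycles in the execution unit, and that nothing else (DRAM refresh, wait states, overlap of fetch and execution) gets in the way. The function name and its parameters are made up for illustration.

```c
/* Toy model only: not a cycle-accurate 8088 simulator. */
enum { CYCLES_PER_BYTE_FETCH = 4 };  /* assumed: one byte per 4-cycle bus access */

/*
 * Estimate the cost of one instruction given how many of its bytes are
 * already sitting in the prefetch queue when it starts.
 * eu_cycles      - execution-unit time for the instruction
 * length_bytes   - encoded length of the instruction
 * bytes_in_queue - instruction bytes already prefetched (hypothetical input)
 */
static unsigned cycles_in_context(unsigned eu_cycles,
                                  unsigned length_bytes,
                                  unsigned bytes_in_queue)
{
    unsigned missing = length_bytes > bytes_in_queue
                           ? length_bytes - bytes_in_queue
                           : 0;
    unsigned fetch_cycles = missing * CYCLES_PER_BYTE_FETCH;

    /* The instruction cannot finish before its bytes have arrived. */
    return fetch_cycles > eu_cycles ? fetch_cycles : eu_cycles;
}
```

With these assumptions, cycles_in_context(2, 2, 0) models an SHR that finds the queue empty, as in a long run of short instructions, while cycles_in_context(2, 2, 4) models an SHR that follows a slow instruction that gave the bus time to fill the queue. The "plus" in Abrash's "eight-plus" is exactly what a model this crude leaves out.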

In a particularly nice demonstration, Abrash hooks a logic analyzer up to the 8088 and PC bus, and examines the following simple instruction sequence:

  
    i   db  1
    j   db  0
        mov ah, ds:[i]
        mov ds:[j], ah

The result is a timeline of "170 Cycles in the Life of a PC" (pp. 119-121), in which we see the 8088's execution unit load up opcodes from the instruction prefetch queue, the bus-interface unit reload the instruction queue from memory, the occurrence of DRAM refresh reads, wait states, and so on. And we see even these simple instructions behave differently (execute at different speeds) at different times.

Abrash's own conclusion is "code execution isn't all that exciting ... it's awfully tedious, even by assembler standards. During the entire course of the figure only seven instructions are executed -- not much to show for all the events listed." Abrash's point is that such a "microanalysis ... is not only expensive and time consuming, but also pointless."

Yet, for most readers this is the most fascinating part of the book! Abrash's book can be used, not only as a guide to assembly language performance issues, but also as a fine explanation of what really happens "inside" a MOV instruction.

Of course, the title is slightly misleading, in that he is talking about Intel assembly language, not assembly language in general. Furthermore, the focus is far more on the 8088 than seems appropriate now that the baseline PC machine is 80286-based. Abrash takes a perverse pleasure in the poor quality of the 8088, because clearly the worse the chip, the more one needs assembly language optimizations! However, he does devote an entire chapter to "Other Processors" (the 80286 and 80386); this chapter alone is worth the price of the book.

And perhaps the book's 8088 focus may not be so off base, after all. Abrash points out, "If you're going to go to the trouble of using 80386-specific features, thereby eliminating any chance of running on PCs and ATs, you might as well go all the way and write 80386 protected-mode code" (p. 716). In a book on real-mode programming, then, perhaps there isn't much to say on the 80286 and 80386. "The protected-mode 80386 is a wonderful processor to program, and a good topic -- a terrific topic -- for some book to cover in detail, but this is not that book" (p. 717).

Even Abrash's entire chapter on the 8080 (!) is not so out of place for the 1990s. "You no doubt think you've seen the last of the venerable but not particularly powerful 8080. Not a chance. The 8080 lingers on in the instruction set and architecture ... Although it may seem strange that the design of an advanced processor would be influenced by the architecture of a less capable one, that practice is actually quite common" (p. 266). As a result, even the spiffiest 486 has many features in common with the 8080, a glorified calculator chip. This chapter of the book ("Strange Fruit of the 8080") makes particularly enjoyable reading, because it shows how minor engineering decisions live on for many years. A frightening thought.

Structured Computer Organization

Our next book, Tanenbaum's Structured Computer Organization, may not at first seem relevant. What does this venerable (now in its third edition) computer architecture textbook have to do with the bizarre world Abrash describes?

Tanenbaum takes us to some of the levels below the already odd level of bus cycles and instruction prefetches. Chapter 3, "The Digital Logic Level," is a superb examination of everything from NAND gates to the construction of latches, flip-flops, and registers, up to memory and buses. Furthermore, this is no abstract discussion of a hypothetical machine. Throughout the book, Tanenbaum uses the Intel 80x86 and Motorola 680x0 families as his running examples. For example, this chapter contains a discussion of the IBM PC and AT buses. The internal workings of "a typical IBM PC clone" are described at the chip level, and a circuit diagram is given and discussed at length.
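The climb from gates to storage can be sketched in a few lines of C. This is not taken from Tanenbaum's text; it is a minimal illustration of the standard constructions, in which NOT, AND, OR, and XOR are built from a single NAND primitive and an SR latch is built from two cross-coupled NANDs with active-low inputs. All names are invented for the example, and the feedback loop is simply iterated a few times until it settles, which a real circuit does on its own.

```c
#include <stdbool.h>

/* The one primitive: a two-input NAND gate. */
static bool nand(bool a, bool b) { return !(a && b); }

/* Other gates expressed purely in terms of NAND. */
static bool not_gate(bool a)         { return nand(a, a); }
static bool and_gate(bool a, bool b) { return not_gate(nand(a, b)); }
static bool or_gate(bool a, bool b)  { return nand(not_gate(a), not_gate(b)); }
static bool xor_gate(bool a, bool b)
{
    bool n = nand(a, b);
    return nand(nand(a, n), nand(b, n));
}

/* SR latch: two cross-coupled NAND gates, inputs are active-low. */
typedef struct {
    bool q;
    bool q_bar;
} sr_latch;

static void sr_latch_step(sr_latch *l, bool set_n, bool reset_n)
{
    /* Re-evaluate the feedback loop a few times so the outputs settle. */
    for (int i = 0; i < 4; i++) {
        bool q = nand(set_n, l->q_bar);
        bool q_bar = nand(reset_n, q);
        l->q = q;
        l->q_bar = q_bar;
    }
}
```

Starting from sr_latch l = { false, true }, pulling set_n low sets q, and returning both inputs to high leaves q unchanged. That one remembered bit is the starting point for the flip-flops, registers, and memories the chapter goes on to build.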

Tanenbaum's book is based on "the idea that a computer can be regarded as a hierarchy of levels" (p. xv). Furthermore, each level, even the lowest "device level," corresponds to a language. "A central theme of this book that will occur over and over again is: Hardware and software are logically equivalent" (p. 11).

Chapter 4, "The Microprogramming Level," includes brief but useful studies of the microarchitecture of the Intel and Motorola chips. Many PC programmers will want to at least read the discussion (pp. 215-220) of the Intel 8088 microcode. I've never seen this discussed anywhere else.

Tanenbaum also has brief but useful coverage of instruction pipelining, memory interface, and cache memory. I found myself wanting more on these increasingly important topics. One good book on them is High-Performance Computer Architecture, Second Edition, by Harold S. Stone (Addison-Wesley, 1990).

One aspect of Tanenbaum's text that seems odd, at least with the benefit of hindsight, is his choice of OS/2, rather than MS-DOS, as the archetypal Intel operating system. True, "OS/2 has a surprisingly large number of features that are not present in UNIX and are well worth examining." But it makes no sense to write off MS-DOS with the comment that it is "an obsolete, primitive, and not very interesting system, despite its widespread use" (p. 372). Its widespread use is precisely what makes DOS intrinsically interesting. To say that something is "of great commercial importance" but "of little interest to us" (p. 373) seems like a bad way to educate engineers! Since "the OS/2 designers were not permitted to simply treat MS-DOS as a bad dream and start all over," it's not clear why anyone else should pretend they have such a luxury. Oh, well.

But, like Tanenbaum's other books, Computer Networks and Operating Systems, this one is nearly perfect.

80x86 Architecture and Programming

Finally, we come to Volume II of Rakesh Agarwal's 80x86 Architecture and Programming. In an odd reversal of the natural order, Volume I apparently won't be available for almost a year. Nonetheless, Volume II stands on its own as an indispensable guide to the 80286, 80386, and 486 microprocessors. In particular, Agarwal presents such a clear picture of the processors' operation in protected mode that one could probably use his extensive C code and diagrams to clone an Intel chip.

Agarwal presents extremely detailed C (and pseudo-C) code for each Intel instruction. These in turn use a library of functions such as LA_rdChk() (linear-address read), LA_wrChk() (linear-address write), priv_lev_switch_CALL() (privilege-level switch), enter_new_task(), and the sickeningly complex read_descr() (read-descriptor).
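To give a flavor of what describing a processor in C looks like, here is a small sketch of the kind of check a protected-mode memory access has to pass. It is emphatically not Agarwal's code, and it does not reproduce any of his functions: the type and function names are hypothetical, and the descriptor is stripped down to a base, a limit, and a writable bit. It assumes an expand-up data segment and ignores privilege levels, limit granularity, paging, and the distinction between the faults a real chip would raise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified segment descriptor. */
typedef struct {
    uint32_t base;      /* linear address of offset 0 */
    uint32_t limit;     /* highest valid offset (expand-up segment assumed) */
    bool     writable;  /* false for a read-only data segment */
} seg_descr;

typedef enum {
    ACCESS_OK,
    FAULT_LIMIT,       /* access runs past the segment limit */
    FAULT_READ_ONLY    /* write to a segment that is not writable */
} access_result;

/*
 * Check an access of `size` bytes at `offset` within the segment and,
 * if it is legal, store the resulting linear address in *linear.
 */
static access_result seg_access_check(const seg_descr *d,
                                      uint32_t offset, uint32_t size,
                                      bool is_write, uint32_t *linear)
{
    /* The last byte touched must lie within the limit; written this
       way so that the comparison cannot overflow. */
    if (size == 0 || offset > d->limit || size - 1 > d->limit - offset)
        return FAULT_LIMIT;

    if (is_write && !d->writable)
        return FAULT_READ_ONLY;

    *linear = d->base + offset;
    return ACCESS_OK;
}
```

Even this toy version shows why such code piles up quickly: every memory reference implies a handful of checks, and the real processor adds many more cases than the two handled here.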

The book also contains up-to-the-minute information on the 486 cache, hard-to-find details on the floating-point exception/NMI interface (and how it had to be faked on the 486!), a complete discussion of the undocumented LOADALL instruction, and similar goodies. Unfortunately, the book did come out too soon for inclusion of the deranged eight (count 'em) new address spaces added on the 386SL Super-Set chips.

If you've ever asked what really happens when you MOV ES, AX in protected mode, or how Windows 3.0 enhanced mode traps IN and OUT instructions using Virtual 8086 mode, this is the book to get. When you're finished, you may be sorry you asked, but that's a different story.