PROGRAMMER'S BOOKSHELF

You Could Look it Up

Andrew Schulman

It has recently come to my attention that "no one reads computer books." In fact, my coworkers delight in telling me this, especially when I've just happened to mention that I was up the night before, writing one.

Okay, so maybe curling up with 80x86 Architecture and Programming, Programming Windows, Using C++, Computer Architecture: A Quantitative Approach, or any of the other books we've reviewed in these pages isn't your idea of a good time. Some of you may even be distrustful of all forms of prose, and just want to be given some sample source code and -- as one reader put it -- "the facts."

No doubt about it, you can get by without prose, that is, without explanations. But you can't do your job without hard-core reference material. It's time for "Programmer's Bookshelf" to take on the sort of books that programmers might actually keep on their desks and use every day: reference manuals.

The Oxford Dictionary

I'd like to start off with a somewhat unconventional manual, though: a dictionary of computing. Did you know that such a thing even exists? Even many of the writers I know don't seem to use one. But a good computer dictionary can be a remarkably useful tool.

The Oxford Dictionary of Computing, now in its third edition, has recently come out in paperback. Any reader of Dr. Dobb's will benefit from owning a copy of this handy, inexpensive volume. The scope of the book can be seen from the entries on one randomly selected pair of pages; see Example 1.

Example 1: Entries from Oxford's Dictionary of Computing

  Trojan horse
  Tron (real-time operating system nucleus)
  trouble shooting
  true complement
  truncation (see roundoff error)
  trunk
  trunk circuit
  trusted
  truth table
  TSR (see also hot-key)
  TTL
  T-type flip-flop
  Turbo languages
  Turing computability
  Turing machine (TM)

This also provides a glimpse at the wide range of the field of computing itself: everything from mathematical logic and combinatorics, security issues, electronics, real-time operating systems, and switching theory, to hacks and mass-market commodity compilers.

I use this book whenever I come across or need to use a term that, when I'm being honest with myself, I only half understand. For example, in a manual I recently wrote for Phar Lap, I needed to explain the difference between "interrupt" and "exception." What a fool, you say: Everybody knows that! But try explaining it now, out loud. You probably have a fuzzy "sense" of the distinction between these two words, but not a precise definition. Well, that's what dictionaries are for.

The cross-references in a dictionary such as this are useful when you know only a little bit about a subject, and would like to learn a few of the key issues and maybe pick up a few of the key terms (perhaps so you can impress your coworkers). For instance, let's say that I am interested in learning more about data compression, but don't know (or can't remember) anything specific about it. The Oxford dictionary entry on "data compression" doesn't say a whole heck of a lot, but it does refer to the entries for "information theory," "redundancy," and "source coding." Turning to the entry on source coding, I can read about variable-length codes, Huffman coding, Shannon-Fano coding, and Kraft's inequality. Most important, I see that source coding is contrasted with "channel coding." That, it turns out, is another term for error detecting and error correcting codes. Turning now to the description of error correcting codes, I find out a little bit about Hamming codes, Reed-Solomon, and simplex codes. Thus, simply by flipping through a few pages, a former ignoramus has learned something about how data compression (source coding) on the one hand and error detection/correction (channel coding) on the other fit into the grand scheme of information theory.
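To make the source/channel distinction concrete, here is a small C sketch of my own, not taken from the Oxford dictionary. kraft_ok() applies Kraft's inequality to a list of codeword lengths, the test any variable-length (source) code must pass; hamming74_encode() and hamming74_decode() implement the textbook Hamming(7,4) (channel) code, which adds three parity bits to four data bits and can correct any single flipped bit. All function names are hypothetical.

```c
/* Toy illustrations of source coding and channel coding.
 * All names here are hypothetical; this is a sketch, not a library. */
#include <assert.h>
#include <stddef.h>

/* Source coding: a binary prefix code with codeword lengths
 * len[0..n-1] can exist only if the sum of 2^-len[i] is <= 1
 * (Kraft's inequality). Scaling every term by 2^maxlen keeps the
 * test in integer arithmetic. */
int kraft_ok(const unsigned *len, size_t n)
{
    unsigned maxlen = 0;
    for (size_t i = 0; i < n; i++)
        if (len[i] > maxlen)
            maxlen = len[i];
    assert(maxlen < 32);                     /* keep the shifts small */
    unsigned long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += 1ULL << (maxlen - len[i]);    /* 2^-len[i], scaled */
    return sum <= (1ULL << maxlen);
}

static unsigned bit(unsigned x, unsigned i) { return (x >> i) & 1u; }

/* Channel coding: Hamming(7,4). Codeword bit k-1 holds position k,
 * laid out as p1 p2 d1 p3 d2 d3 d4 (parity bits at positions 1, 2, 4). */
unsigned hamming74_encode(unsigned data)     /* data in the low 4 bits */
{
    unsigned d1 = bit(data, 0), d2 = bit(data, 1);
    unsigned d3 = bit(data, 2), d4 = bit(data, 3);
    unsigned p1 = d1 ^ d2 ^ d4;              /* covers positions 1,3,5,7 */
    unsigned p2 = d1 ^ d3 ^ d4;              /* covers positions 2,3,6,7 */
    unsigned p3 = d2 ^ d3 ^ d4;              /* covers positions 4,5,6,7 */
    return p1 | p2 << 1 | d1 << 2 | p3 << 3 | d2 << 4 | d3 << 5 | d4 << 6;
}

/* Corrects any single-bit error, then returns the 4 data bits. */
unsigned hamming74_decode(unsigned cw)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (bit(cw, pos - 1))
            syndrome ^= pos;                 /* XOR of positions holding 1s */
    if (syndrome != 0)
        cw ^= 1u << (syndrome - 1);          /* flip the bit it points at */
    return bit(cw, 2) | bit(cw, 4) << 1 | bit(cw, 5) << 2 | bit(cw, 6) << 3;
}
```

The lengths {1, 2, 3, 3} (say, the codewords 0, 10, 110, 111) pass the Kraft test; {1, 1, 2} fails, because two one-bit codewords already use up the entire code space. Note the opposite directions: source coding squeezes redundancy out, while channel coding deliberately puts some back in.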

Naturally, the Oxford dictionary has a somewhat academic bent. While surprisingly topical in some places (such as "TSR" and "Turbo languages"), in others you may glance at the definition for a term with which you really are familiar, and find that you have absolutely no idea what they're talking about! For example, while it is certainly reasonable for the definition of "regular expression" to make reference to formal language theory, the example given of a regular expression might have been more useful if it looked a little more like grep, and a little less like something out of a linguist's or logician's nightmare.
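For comparison, a grep-flavored regular expression in C might look like the following sketch. It assumes a POSIX system that provides <regex.h> (regcomp, regexec, regfree), which is not part of ISO C itself; grep_match() is a hypothetical helper, not a standard function.

```c
/* grep-style matching via the POSIX <regex.h> interface.
 * grep_match() is a hypothetical helper built on regcomp/regexec. */
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if some part of text matches the extended regular
 * expression pattern (as with grep -E), 0 if not, and -1 if the
 * pattern fails to compile. */
int grep_match(const char *pattern, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match */
    regfree(&re);
    return rc == 0;
}
```

For instance, the pattern "^int [a-z_]+ ?\\(" picks out lines that begin the definition of an int-returning function, such as the first line of Example 3's cardinality().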

As in any book, there are errors. The definition for "threading," for instance, completely confuses "threading," a technique used in interpreted programming languages such as Forth, with "threads," meaning lightweight processes.
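To see why the two senses should not be mixed up, here is a minimal C sketch of threaded code in the Forth sense. A compiled "word" is nothing but a list of pointers to primitives, and a tiny inner interpreter walks the list; no processes or threads of control are involved. This is call threading, chosen because it is portable C; real Forth systems usually use direct or indirect threading written in assembly language. All names are hypothetical.

```c
/* Call-threaded code: a thread is an array of primitive pointers.
 * All names are hypothetical; there is no stack-overflow checking. */
#include <assert.h>
#include <stddef.h>

typedef struct { long stack[16]; int sp; } VM;   /* sp = next free slot */
typedef void (*Prim)(VM *);                      /* one primitive word */

static void prim_lit2(VM *vm) { vm->stack[vm->sp++] = 2; }
static void prim_lit3(VM *vm) { vm->stack[vm->sp++] = 3; }
static void prim_plus(VM *vm) { vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; }
static void prim_star(VM *vm) { vm->sp--; vm->stack[vm->sp - 1] *= vm->stack[vm->sp]; }
static void prim_dup(VM *vm)  { vm->stack[vm->sp] = vm->stack[vm->sp - 1]; vm->sp++; }

/* Inner interpreter: call each cell of the thread in turn until the
 * NULL terminator, then return the top of the data stack. */
long inner_interpreter(const Prim *thread)
{
    VM vm = { {0}, 0 };
    for (const Prim *ip = thread; *ip != NULL; ip++)
        (*ip)(&vm);
    assert(vm.sp > 0);
    return vm.stack[vm.sp - 1];
}

/* Roughly the Forth definition ": SQUARE-OF-SUM 2 3 + DUP * ;" */
static const Prim square_of_sum[] = {
    prim_lit2, prim_lit3, prim_plus, prim_dup, prim_star, NULL
};
```

Running inner_interpreter(square_of_sum) computes (2 + 3) squared, or 25.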

In any dictionary like this, there is a danger of collecting a lot of formal sounding academic terms (such as "semi-Thue system"), and missing some of the more colorful jargon of computing. I found, however, that despite some unfortunate omissions ("lvalue" and "thunk" -- how could they leave those out?!), many of these phrases did make it into the Oxford dictionary: shell, pipeline, stub, execute, latch, lazy evaluation, Look and Feel, bit-slice, rollback, thrashing, garbage collection, remote procedure call, hash, clone, cache, and carry. Just skimming through a book like this will give you an appreciation for the vast number of key concepts -- contributions to human thought, in a way -- produced by the field of computing.
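Since "thunk" is one of the missing entries, a small C sketch may help. The word has several meanings; in DOS and Windows programming it also names glue code that bridges calling conventions, which this sketch does not cover. Shown here is the older sense from ALGOL 60 call-by-name and lazy evaluation: a suspended computation that runs only when its value is first demanded, and then caches the result. The Thunk type and the force() function are hypothetical.

```c
/* A thunk for lazy evaluation: a deferred computation that runs at
 * most once. Thunk, force(), and costly_square() are hypothetical. */
#include <assert.h>
#include <stddef.h>

typedef struct {
    long (*compute)(long);  /* the suspended computation */
    long arg;               /* its argument */
    int  evaluated;         /* nonzero once force() has run it */
    long value;             /* cached result */
} Thunk;

static int compute_count = 0;   /* how many times the costly code ran */

static long costly_square(long n)
{
    compute_count++;
    return n * n;
}

/* Evaluate on first demand; later calls reuse the cached value. */
long force(Thunk *t)
{
    assert(t != NULL && t->compute != NULL);
    if (!t->evaluated) {
        t->value = t->compute(t->arg);
        t->evaluated = 1;
    }
    return t->value;
}
```

Creating a Thunk costs nothing; costly_square() runs only on the first force(), and every later force() returns the cached 144 for an argument of 12.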

I can't think of a better way to spend $10.95.

Backup Dictionary

One thing is missing from the Oxford dictionary, however. While it has excellent coverage of the timeless truths of computing (such as Chomsky hierarchy, partial recursive function, and TSR), it's less helpful with some of the uglier manifestations of computing in the here and now. For instance, even after all these years, your average marketing weenie, sales manager, or software engineer still can't remember the difference between extended and expanded memory. It sure would be nice to be able to look these up somewhere, but the Oxford dictionary doesn't have them, nor, I think, should it. Such topics are simply too ephemeral (despite their surprising persistence year after year) to justify inclusion in a book of this sort. Likewise for terms such as "protected mode," "real mode," "Dynamic Data Exchange," and "resource fork."
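Much of the extended/expanded confusion starts with 8086 real-mode addressing, which is small enough to sketch in C. The sketch assumes the classic 8086 rules: a segment:offset pair names the linear address segment * 16 + offset, and with the A20 address line disabled the result wraps around at 1MB. Extended memory is whatever lies at or above 1MB (100000h). Expanded memory never shows up in this arithmetic at all, because an EMS driver bank-switches its pages into a page frame below 1MB. The function names are hypothetical.

```c
/* 8086 real-mode address arithmetic (hypothetical helper names). */
#include <assert.h>
#include <stdint.h>

/* Linear address for seg:off. With A20 disabled, an 8086-style
 * machine wraps addresses at 1MB. */
uint32_t linear_address(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t addr = ((uint32_t)seg << 4) + off;   /* seg * 16 + off */
    if (!a20_enabled)
        addr &= 0xFFFFFu;                         /* keep 20 bits */
    return addr;
}

/* Extended memory: linear addresses at or above 1MB. Real mode can
 * reach only the first 64KB - 16 bytes of it (the HMA), and only
 * with A20 enabled; the rest needs protected mode. */
int is_extended(uint32_t linear)
{
    return linear >= 0x100000u;
}
```

With A20 enabled, FFFF:FFFF reaches 10FFEFh, just inside extended memory; with A20 disabled, the same pointer wraps to 0FFEFh. B800:0000 (the color text buffer) is plain conventional memory at B8000h.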

So where do you turn for decent explanations of terms such as these? Your coworkers? No, they don't know what they're talking about! What you need is an additional, backup dictionary (remember, we've only spent $10.95 so far) like the Microsoft Press Computer Dictionary.

The Microsoft dictionary has short, breezy, but genuinely useful definitions for many of the terms you come across every day. Again, even if you have some notion of what these terms mean, you will sharpen your understanding of them by keeping this book on your desk and using it a few times a week.

My one complaint about the Microsoft dictionary is that it often misses the richness of the concepts it defines. The definition of "virtual machine" is a good case in point. Whereas the Oxford dictionary defines a VM as a "collection of resources that emulates the behavior of an actual machine," going on to explain what this means by discussing processes, workspaces, and isolation, the Microsoft dictionary merely says that a VM is "software that mimics the performance of a hardware device," giving the not-quite-right example of running Intel-based software on a Motorola chip. The Microsoft definition seems to imply that any form of emulation constitutes a virtual machine; the Oxford definition properly narrows the concept.

However, the scope of the topics covered seems just about right, as indicated by the entries on another randomly chosen set of pages; see Example 2.

Example 2: Entries from Microsoft Press Computer Dictionary

  emulsion laser storage
  enable
  Encapsulated PostScript (EPS)
  encipher
  encode
  encryption
  end-around carry
  end-around shift
  en dash
  End key
  endless loop (see infinite loop)
  end mark
  end-of-file
  end-of-text
  end-of-transmission
  endpoint
  end user
  engine
  Enhanced Expanded Memory Specification
  Enhanced Graphics Adapter
  enhanced keyboard (101/102-key)

While I can't see myself looking up the definition of "endless loop" (I see enough of the real thing in my own code), certainly a brief explanation of EPS or the enhanced keyboard, or even a well-written paragraph explaining what the overused word "engine" is supposed to mean, is useful to have nearby.

H&S

If you're using C, the next reference book you must get is C: A Reference Manual, Third Edition, by Samuel P. Harbison and Guy L. Steele, Jr.

It has never been clear to me why, once the first edition of Harbison and Steele's book was available, K&R remained popular. Sure, every C programmer owes Kernighan and Ritchie an enormous debt for describing and creating what to many of us remains the world's most useful programming language. But, once you know C, the K&R book just isn't all that useful.

H&S is a book that every C programmer will use again and again. Now in its third edition, the book covers both ANSI C and "traditional C." It also does a good job of mentioning oddball but important variants in the language, such as the far and huge keywords in Microsoft C and other Intel-based compilers.

One section that I have found particularly useful over the years is the lengthy discussion of the C preprocessor. Like everything else in H&S, the preprocessor is defined much more rigorously than in other C books. Perhaps this is because Steele is an outsider to the C community (he is co-designer of the beautiful language Scheme and of the big language Common LISP), and therefore takes much less for granted than might someone from AT&T Bell Labs.
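One example of the kind of preprocessor rule that rewards a careful reading (my example, not one taken from H&S): the ANSI # operator stringizes a macro argument before that argument is macro-expanded, so stringizing the value of a macro takes a second level of macro. The macro names below are hypothetical, but the behavior is what the ANSI standard specifies.

```c
/* Stringizing a macro's name versus its value. STR, XSTR, and
 * BUFSIZE are illustrative names, not from any particular header. */
#include <assert.h>
#include <string.h>

#define BUFSIZE 512
#define STR(x)  #x          /* stringizes the argument as written */
#define XSTR(x) STR(x)      /* expands x first, then stringizes it */

static const char *as_written = STR(BUFSIZE);    /* "BUFSIZE" */
static const char *expanded   = XSTR(BUFSIZE);   /* "512" */
```

Because x is not an operand of # inside XSTR, the argument BUFSIZE is fully expanded to 512 before STR ever sees it.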

Considering that many (too many!) full-length books on the C runtime library are available, H&S's 100-page section on the C runtime library is surprisingly useful. The small blocks of sample source code are always illuminating. The explanation of the time and date facilities and of setjmp/longjmp is the best I've seen.
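For readers who haven't used setjmp/longjmp, here is a minimal sketch of the idea (my example, not H&S's): an error detected deep inside a call chain jumps straight back to the point where setjmp recorded the context. parse_number() and parse_digit() are hypothetical.

```c
/* setjmp/longjmp error recovery. Function names are hypothetical. */
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover;

static int parse_digit(char c)
{
    if (c < '0' || c > '9')
        longjmp(recover, 1);     /* abandon the whole parse */
    return c - '0';
}

/* Returns the value of a string of decimal digits, or -1 on error. */
long parse_number(const char *s)
{
    long value = 0;
    if (setjmp(recover) != 0)
        return -1;   /* arrived via longjmp; value and s are not read,
                        since non-volatile locals changed after setjmp
                        are indeterminate here */
    for (; *s != '\0'; s++)
        value = value * 10 + parse_digit(*s);
    return value;
}
```

parse_number("1991") returns 1991, while parse_number("19x1") unwinds out of parse_digit() at the 'x' and returns -1.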

My one disappointment in the third edition was that H&S dropped an extremely nice package of functions and macros for set manipulation that had appeared in the second edition. Every now and then I used to take out the book and ponder the function in Example 3 for quickly computing the size of a set.

Example 3: H&S function for computing the size of a set

  typedef unsigned SET;
  #define emptyset    ((SET) 0)
  int cardinality (SET x) {
      int count = 0;
      while (x != emptyset) {
          x ^= (x & -x);
          ++ count;
          }
      return count;
  }

I still don't get how that works, but it sure is beautiful. I guess I'll keep my copy of the second edition too.
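Here is one way to see why it works. For an unsigned x, -x is computed modulo 2^N, which equals ~x + 1. Inverting x turns its lowest 1 bit into a 0 with only 1s below it, and adding 1 carries right back up to that position, so x & -x keeps exactly the lowest 1 bit, the smallest element of the set. XORing that bit back into x removes it, so the loop runs once per element. Take x = 44 (binary 101100): x & -x is 4, leaving 40 (101000); then 8, leaving 32 (100000); then 32, leaving 0. That is three passes for three elements. The sketch below repeats Example 3 with comments and adds the common equivalent form x &= x - 1 for comparison; lowest_element() and cardinality_alt() are hypothetical names.

```c
/* Example 3, commented, plus an equivalent formulation. */
#include <assert.h>

typedef unsigned SET;
#define emptyset ((SET) 0)

/* Smallest element of x as a one-bit mask: 44 (101100b) -> 4 (100b). */
SET lowest_element(SET x) { return x & -x; }

int cardinality(SET x)
{
    int count = 0;
    while (x != emptyset) {
        x ^= (x & -x);    /* remove the lowest 1 bit (smallest member) */
        ++count;          /* one pass per member */
    }
    return count;
}

/* x - 1 flips the lowest 1 bit and every 0 below it, so
 * x & (x - 1) also clears exactly the lowest 1 bit. */
int cardinality_alt(SET x)
{
    int count = 0;
    for (; x != emptyset; x &= x - 1)
        ++count;
    return count;
}
```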

Finally!

For years, Microsoft Corporation has wished that its most successful product, MS-DOS, would just go away and die. This is a very strange thing for a corporation to want its cash cow to do.

But no matter how many times they try to abandon this cow, it keeps coming back. So Microsoft has decided to take care of it -- for a while, anyhow. As part of its recent release of MS-DOS 5.0, Microsoft has also brought out an "official" programmer's reference manual for DOS -- MS-DOS Programmer's Reference: Version 5.0.

You might think that having a widely available official reference for one's operating system is an obvious thing to do, but it is nonetheless a surprising and welcome move by Microsoft. It's all part of the company's coming to terms with the continued outrageous success of DOS. They seem to no longer find it technically interesting or challenging, but it just won't go away.

Of course, Microsoft Press also sells the standard references on MS-DOS: Ray Duncan's Advanced MS-DOS Programming, The MS-DOS Encyclopedia, and the incredibly useful MS-DOS Extensions. This new book, however, has no author (it was "written, edited, and produced by Microsoft Corporation"), has the word "official" on it, has a blurb on the back that says "Accept no substitutes," and makes no attempt to explain or teach -- it presents nothing but "the facts."

Thus, we now have Microsoft's official statement of what MS-DOS is. It's not as good as Ray's books, but if you do any kind of DOS programming, you're going to have to get this book too.

Naturally, MS-DOS Programmer's Reference includes the new memory management (INT 21h AH = 58h) and task switching (INT 2Fh AH = 4Bh) functions added in DOS 5.0. Furthermore, there is good coverage of various INT 2Fh subsystems, so at least now one knows what Microsoft considers to be part of MS-DOS. There is an assembly language STRUC for each DOS data structure, with a paragraph of explanation for each field.

The biggest surprise is that Microsoft has finally officially documented some of the most commonly used undocumented DOS functions. Of course, previous Microsoft documentation (such as the chapter on TSRs in The MS-DOS Encyclopedia) has mentioned these functions, but always with the proviso that "Microsoft cannot guarantee that the information in this article will be valid for future versions of MS-DOS," and always without including the functions in the standard INT 21h references. In other words, everyone knew about the functions and used them, but they weren't "supported." The previously undocumented INT 21h functions shown in Example 4 are now supported. Unfortunately, many crucial functions are still undocumented, but it's a start.

Example 4: Previously undocumented INT 21h functions

  Get Default DPB (1Fh)
  Get DPB (32h)
  Get InDOS Flag Address (34h)
  Load Program (4B01h)
  Set PSP Address (50h)
  Get PSP Address (51h)
  Set Extended Error (5D0Ah)

It's also unfortunate that version information is placed in a separate section of the book, away from the functions themselves, and that in some cases the version information is wrong or misleading. For instance, the important Get Startup Drive function (INT 21h AX = 3305h) is listed as available in DOS 2.0 and higher; in fact, the function is only available in DOS 4.0 and higher. (Have you ever tried to find from which disk a DOS 3.3 machine was booted?) Similarly, some of the previously undocumented functions have been documented by pretending that they are only available in DOS 5.0 and higher.

Errors really are unavoidable, but in any "official" reference, they can be exceedingly costly. For example, there is an extremely serious error in the documentation for the previously undocumented LOAD structure used with the Load Program function (INT 21h AX = 4B01h): The last two fields, ldCSIP and ldSSSP, are reversed. This appears in two separate locations in the book, and is the sort of thing that could cost someone days of lost time.
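Because field order is exactly the kind of detail that bites, here is a C sketch of the parameter block, assuming the layout other DOS references give for AX = 4B01h: the child's initial SS:SP at offset 0Eh, followed by its CS:IP entry point at 12h. Only ldSSSP and ldCSIP come from the Microsoft book; the other field names and the FarPtr type are hypothetical. Far pointers are modeled as offset/segment word pairs so that a modern compiler lays out the struct the way DOS does, and the _Static_assert checks require a C11 compiler.

```c
/* Parameter block for INT 21h AX = 4B01h (Load Program), sketched as a
 * C struct. Field names other than ldSSSP/ldCSIP are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t off, seg; } FarPtr;   /* offset word, then segment */

typedef struct {
    uint16_t ldEnvironment;   /* 00h: environment segment (0 = copy parent's) */
    FarPtr   ldCommandTail;   /* 02h: command tail for the child's PSP */
    FarPtr   ldFCB1;          /* 06h: first FCB for the child's PSP */
    FarPtr   ldFCB2;          /* 0Ah: second FCB for the child's PSP */
    FarPtr   ldSSSP;          /* 0Eh: returned by DOS: child's initial SS:SP */
    FarPtr   ldCSIP;          /* 12h: returned by DOS: child's entry CS:IP */
} LOAD;

/* Catch layout mistakes at compile time. */
_Static_assert(offsetof(LOAD, ldSSSP) == 0x0E, "SS:SP belongs at 0Eh");
_Static_assert(offsetof(LOAD, ldCSIP) == 0x12, "CS:IP belongs at 12h");
_Static_assert(sizeof(LOAD) == 0x16, "block is 22 bytes");
```

A program that swapped the last two fields, as the book's listing does, would read the child's stack pointer as its entry point, which is just the sort of bug that eats days.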

All in all, this is an absolutely essential reference for all DOS programmers, but the "Accept no substitutes" slogan doesn't quite work. Because the book is a pure reference, without any explanations or sample source code, some of the material here is simply not usable by itself. Using the task-switcher API, for instance, would require much more material than this book provides. Thus, MS-DOS Programmer's Reference should be viewed as the starting point for a whole new set of DOS programming books.

For Microsoft, too, the MS-DOS Programmer's Reference should be viewed simply as a starting point. Three nice spin-offs would be a disk with C and ASM header files, a QuickHelp version of the book, and a DOS test suite that exercised each function and demonstrated its proper use. DOS really is a cash cow, and it would be nice to see Microsoft do more to milk it.


Copyright © 1991, Dr. Dobb's Journal