For years, microcomputer programming has been dominated by the scarcity of memory. On the PC, for example, much programming effort went into working within or around the 640K barrier. Now, in more and more places, these barriers are lifting. With the widespread use of Windows Enhanced mode, the standard microcomputer has megabytes of readily accessible memory, a larger pool of virtual memory, and even a flat-memory model.
Yet, in the same way that a citizen of the former Soviet Union might still hang on to a Lenin pin, programmers still cling to the old ways. In Windows programming circles, for example, the little discussion of performance one finds seems to be dominated by memory-management considerations that don't make much sense anymore. Windows Enhanced mode, OS/2 2.0, and Win32/NT are all demand-paged virtual-memory systems. Yet the majority of Windows programming books are still filled with dire predictions of what will happen if you don't keep your segments discardable, movable, and small.
The PC world isn't 640K anymore, and in more and more places it's not chopped into 64K pieces anymore, either. It's really time to wake up, smell the coffee, and throw out all the old baggage.
Oddly enough, though, we don't really need any "new ideas." In fact, with the increasing popularity of flat-memory models and demand-paged virtual memory, it's time to dust off your old college textbooks. Why? Because PC systems are finally starting to resemble the way we were taught computers are supposed to work!
Except that, rather than dusting off your old textbooks, I would suggest picking up a new one.
About halfway through writing an article on demand-paged virtual memory in Windows Enhanced mode for Microsoft Systems Journal, I realized that if the article was going to have any substance at all, it would have to discuss (or at least be based on some awareness of) virtual memory in general, not just the way it happens to be implemented in one mode of one version of one Microsoft product.
So I started going through my book collection, looking for background reading on demand-paged virtual memory. Many of the books Ray Duncan and I have reviewed in "Programmer's Bookshelf"--Dewar and Smosna's Microprocessors: A Programmer's View (reviewed in DDJ, September 1990), Hennessy and Patterson's Computer Architecture: A Quantitative Approach (October 1990), and Tanenbaum's Modern Operating Systems (May and June 1992)--discuss virtual memory. There's a ton of literature available on this subject.
The nicest discussion of the subject, though, and the most useful to a programmer rather than a chip designer, was the 25-page section on virtual memory I found in a book from 1990 that we somehow haven't reviewed here yet: Harold Stone's High-Performance Computer Architecture. It may be a little strange to examine this book now, particularly since Hennessy and Patterson's book seems to have blown away everything else in this field. But I was surprised to find that Stone's discussion of virtual memory and other topics--including cache memory, pipelining, and multiprocessors--gave me more of what I, as a programmer, actually needed to know than Hennessy and Patterson's wonderful, definitive work.
Why should a programmer care about this stuff in the first place? After all, demand-paged virtual memory is supposed to be transparent! You can dereference a pointer to access a page of memory, even if that page is currently on disk; a page-fault handler within the operating system will take care of loading the page without you being aware of it. Obviously, the presence of virtual memory is no more relevant to your average programmer than is, say, the presence of an instruction prefetch queue on the processor.
Unfortunately, disk access is several orders of magnitude slower than access to main memory. Consequently, there is one overwhelming reason why programmers must understand the workings of "transparent" virtual memory: performance.
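A back-of-the-envelope model makes the point concrete. The numbers below are illustrative assumptions (roughly 100 ns for a main-memory access, 10 ms to service a page fault from disk), not measurements of any particular system:

```c
#include <assert.h>

/* Effective access time in nanoseconds for a given page-fault rate.
   Both timing constants are illustrative assumptions, not measurements. */
#define MEM_NS        100.0     /* one main-memory access            */
#define FAULT_NS 10000000.0     /* one page fault serviced from disk */

double effective_ns(double fault_rate)
{
    return (1.0 - fault_rate) * MEM_NS + fault_rate * FAULT_NS;
}
```

With a fault rate of just one access in 10,000, effective access time climbs from 100 ns to about 1,100 ns: an elevenfold slowdown caused by faults on a mere 0.01 percent of accesses. The faulting term swamps everything else, which is exactly why "transparent" paging is anything but transparent to performance.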
The reason for most programmers to study virtual memory, then, is so they can understand its performance implications for their software. Stone's High-Performance Computer Architecture does a great job of drawing out just these implications. Rather than merely describing how virtual memory works, he presents a detailed performance model in terms understandable to applications programmers as well as systems designers. The very simple "working set" concept is key here, and is of course discussed in every other book on the subject, but somehow Stone manages to convey this concept in a way that is genuinely helpful rather than merely informative.
For a few years, programmers working with systems such as Windows and OS/2 were worrying themselves silly with rules about segment sizes and segment attributes. One venerable Microsoft University lecturer pounded into the minds of an entire generation of Windows programmers the need to, "Keep your segments as small as possible, as discardable as possible, and as unlocked as possible."
Well, in Windows Enhanced mode, in OS/2 2.0, and in Win32/NT, all this pretty much goes out the window. What replaces these old, outmoded ideas of how to get good performance? The even older notion of "working set." We had a period of a few years in which we had to do memory management and the like in very odd ways, and that period is now thankfully coming to a close.
"Working set" is really just the 90/10 rule expressed in a different way. The 90/10 rule states that you will spend 90 percent of your time working over 10 percent of your code. But it also states that 90 percent of the software's running time occurs in only 10 percent of the code. This is the whole basis for virtual memory: Potentially, a program can run at full speed with only 10 percent of itself--or whatever the working set is--loaded into memory at any given time. Unlike that nasty segment stuff, the programmer does not specify any of this in advance. The operating system "discovers" a program's working set on-the-fly, through page faults.
As Stone shows, paged virtual memory depends on the fact that all programs have reasonably sized working sets or "footprints"; that is, that all programs can run for a while with only discrete, page-sized bits of themselves in memory at any given time. All programs?! Well, that's the problem: A virtual-memory operating system can't know in advance how all programs, or even how one program, will behave. All it knows is the probable behavior of the average program. The average program will behave well under virtual memory.
Your program may not, however, if the way it accesses memory doesn't correspond to the model on which virtual memory is based. The model is simple: If your program accesses x[i] at time t, it is very likely to access x[i+1] or x[i-1] at time t+1.
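What conforming to, or violating, that model costs can be sketched with addresses alone. The toy counter below (page size and array dimensions are assumptions, and no memory is actually allocated; only the address pattern is modeled) counts how often consecutive references to a row-major 1024x1024 array of ints cross a 4K page boundary:

```c
#include <assert.h>
#include <stddef.h>

/* Model the addresses generated by traversing a row-major
   ROWS x COLS int array, and count crossings of (assumed) 4K pages. */
#define ROWS 1024
#define COLS 1024
#define PAGE 4096

static long page_of(size_t i, size_t j)
{
    return (long)(((i * COLS + j) * sizeof(int)) / PAGE);
}

long page_switches(int row_order)
{
    long switches = 0, prev = -1;
    for (size_t a = 0; a < ROWS; a++)
        for (size_t b = 0; b < COLS; b++) {
            long p = row_order ? page_of(a, b)   /* x[a][b]: sequential */
                               : page_of(b, a);  /* x[b][a]: strided    */
            if (p != prev) {
                switches++;
                prev = p;
            }
        }
    return switches;
}
```

Walking the array in row order lands on a new page only 1,024 times; walking it in column order lands on a different page on every single one of its 1,048,576 references. Under memory pressure, that second loop can fault three orders of magnitude more often while doing exactly the same work.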
What began as a simple statistical description of the behavior of programs has now been turned into a prescription: Your software had better behave this way. Hence the great relevance of Stone's section on virtual memory, particularly his discussion of "Improving Program Locality," to programmers interested in getting decent performance in Windows, OS/2, or any other demand-paged system.
I've focused here on virtual memory, but this is just one section of High-Performance Computer Architecture. The 100-page chapter on memory-system design is actually largely devoted to a detailed analysis of cache memory: cache analysis, cache writes, replacement policies, performance metrics, and so on. A lot of the cache-memory discussion sounds just like the virtual memory discussion, except for one small thing: Virtual memory involves hitting the disk, and disks are very slow. This one point makes virtual memory and cache memory fundamentally different. Other chapters in the book discuss pipelining, numerics, vector computers, multiprocessing, and multiprocessor algorithms. The chapters on multiprocessing are noteworthy for their sensible position that, until the communication and synchronization overhead of multiprocessing is reduced, multiprocessor systems are likely to involve just a handful of processors, not the 1000-processor behemoths one might imagine.
I read the second edition of this book when it first came out in 1990 and, frankly, I didn't get much out of it at the time. Yet, as I've tried to indicate here, when I picked it up again in late 1992, much of it seemed amazingly relevant to my daily work. Material that once would have seemed unfortunately irrelevant to daily PC programming practice is becoming more important every day. Why? Because the 32-bit Intel architecture and the operating systems sold on top of it are becoming more and more like other 32-bit systems every day.
In fact, Intel might even come to regret the day it started pushing 32 bits, because 32-bit code is portable to other architectures in a way that segmented 16-bit code never was. Well, that is pretty unlikely, but certainly PC programmers will increasingly be able to benefit from the lessons learned on other processors and other operating systems, and from textbooks such as High-Performance Computer Architecture.
Copyright © 1992, Dr. Dobb's Journal