Dr. Dobb's Journal April 1998
Once in a great while, a landmark computer-science book is published. Computer Architecture: A Quantitative Approach, Second Edition, is such a book. In an era of fluff computer books that are, quite properly, remaindered within weeks of publication, this book will stand the test of time, becoming lovingly dog-eared in the hands of anyone who designs computers or has concerns about the performance of computer programs.
Hennessy and Patterson are well-known researchers in the field of computation. Hennessy started the MIPS project at Stanford and is one of the cofounders of MIPS Computer Systems. Patterson led the design of the RISC-1 computer project and the RAID storage project at Berkeley.
Computer Architecture offers two significant benefits to readers of DDJ. First, it provides answers to those nagging questions such as: "Why don't those idiot chip designers just add more {registers}{cache}?" Second, once you have read it, you are in a fine position to win bets with other armchair computer designers on almost any hardware-related topic.
Seriously, the book will give you the ability to understand the fundamental issues that influence the design of modern computer systems. Moreover, it will provide you with the analytical methods you need to quantify real-world design choices.
It is difficult to summarize such a tome in a brief review, but a trip through the chapters will give a taste of the topics covered.
Computer Architecture opens with "Fundamentals of Computer Design," discussing the design task and issues of cost, performance, application domain, reliability, chip yield, technology trends, and time to market. The authors define terms such as "speedup" because "a few innocent terms have been shanghaied." They point out the peril of the arithmetic mean, the subtleties of benchmarking, and the "which CPU is faster" paradox. As the chapter moves into the realm of "Quantitative Principles of Computer Design," Amdahl's Law is introduced as a formalism for "make the common case fast." Then they examine CPU performance equations involving cycle time, clocks per instruction, instruction counts, and pipeline behavior. Discussion of locality of reference, the memory hierarchy, and the basics of cache behavior follow. The "Fallacies and Pitfalls" section debunks various performance measures including MIPS and MFLOPS rates, synthetic benchmarks, and peak performance. This section should be read by anyone who has uttered the words "performance" and "computer" in the same breath.
Chapter 2 ("Instruction Set Principles and Examples") concentrates on instruction set architectures. It covers register- versus memory-oriented designs, memory addressing modes, operand type and width, and the increasing interplay between optimizing compilers and computer design. The synthetic DLX architecture is introduced as a simple model of a modern computer design. We begin to see the virtues of RISC designs here.
Chapter 3 ("Pipelining") presents "the key implementation technique used to make fast CPUs," by showing a design for a simple DLX processor, then extending that design to use pipelining. This chapter is essential for understanding the issues involved in designing high-performance processors. Problems arising out of pipelining, such as clock skew, pipeline stalls, hazards, code scheduling, branch penalties, and exceptions are presented in a clear manner that belies their complexity. The complexity of these issues is underscored in the "Fallacies and Pitfalls" section, which shows how a more sophisticated pipeline design can actually reduce performance by limiting the maximum clock rate compared to a simpler design. Issues of this nature underlie the Intel Pentium versus DEC Alpha performance war.
Chapter 4 ("Advanced Pipelining and Instruction-Level Parallelism") extends the concepts presented earlier. Beginning with the observation that the place to seek parallelism is across loop iterations, the authors introduce pipeline scheduling, loop unrolling, dependences, scoreboarding, branch prediction, multiple issue, compiler support, software pipelining, predicated instructions, speculative execution, and more. The performance of seven SPEC92 benchmarks on the PowerPC 620 is analyzed in detail as a concrete example of the concepts presented earlier in the chapter. An example problem, typical of the sort of detailed analysis favored by the authors, compares the performance of a simple pipeline design, a deeply pipelined design, and speculative superscalar design. The latter two designs, surprisingly, turn out to have inferior performance compared to the simple design, due to the smaller caches and slower clocks required by their more complex circuitry. Those who think that throwing more circuitry at a processor design is a sure way to get higher performance should read this eye-opening chapter.
Chapter 5 ("Memory-Hierarchy Design") introduces cache memory, opening the door to this incredibly complex area of system design (the authors note that 1600 cache-related papers were written between 1989 and 1995). The book presents the fundamental equation for cache performance analysis, discusses methods for reducing the cache miss rate, cache miss penalty, and for reducing hit time. It then turns to main memory, emphasizing the growing performance gap between memory and processors. This leads to a discussion of methods for improving main memory performance, including interleaving, wider data paths, and memory banks. Virtual memory support is covered next, with a detailed analysis of the DEC Alpha 21064 and the considerably more complicated (for both designer and programmer) Intel Pentium.
In Chapter 6 ("Storage Systems") the stage is set with the sentence: "Input/output has been the orphan of computer architecture," after which the authors discuss disk, tape, buses, I/O performance measures, some simple queueing theory that makes it easier to construct performance models, the TP transaction processing and SPEC SFS benchmarks, reliability, availability, and RAID storage. As an example of the crosscutting issues mentioned later, this chapter notes the stale data problem of I/O versus cache memory. A section on I/O system design walks through several examples that will be of benefit to those application designers charged to create production systems that must meet stringent performance requirements. One such example determines the "cost per I/O per second of using small or large disk drives" with a set of specifications for the CPU, SCSI bus, controllers, various disks, and storage capacity requirements.
Chapter 7 ("Interconnection Networks") is about the most readable coverage of networks I have read. Starting with a simple model of performance based on overhead, message size, and bandwidth, the authors evaluate transmission media such as twisted pair, coaxial cable, fiber, and a car filled with tapes. Switch topologies, connection-oriented versus connectionless message switching, routing, congestion control, standardization, and fault tolerance are all presented in a clear, concise manner.
Chapter 8 ("Multiprocessors") offers a good introduction to the challenges of parallel computing, including discussion of communication bandwidth, latency, variations in parallelism in different application domains, synchronization, memory coherence models, and memory architectures and their implementation. A detailed example of the design and performance of a real system is given, using the SGI Challenge Multiprocessor as the underlying hardware. People who believe that building effective multiprocessor systems is merely the bolting together of several chips, and those who believe that writing high-performance parallel programs is easy, would do well to read this chapter.
The appendixes are almost books in themselves. The first one, "Computer Arithmetic," gives an excellent discussion of the IEEE 754 floating-point standard used in all new computer designs. There is a good discussion of basic techniques of integer arithmetic as well as a detailed section on issues in floating-point arithmetic, including denormalized numbers, precision, exceptions, and underflow. The "Vector Processors" appendix is a good overview of the state of the art in the high-performance arena, covering vector length, stride, compiler vectorization, chaining, conditional execution, and sparse matrices. The "Survey of RISC Architectures" appendix gives a concise comparison of the features of the DLX, MIPS IV, PA-RISC 1.1, PowerPC, and SPARC V9 architectures. The "Intel 80x8" appendix describes that processor family whose "checkered ancestry has led to an architecture that is difficult to explain and impossible to love." One of the more interesting bits of quantitative information contained here is a comparison of the data traffic for the 80x86 versus DLX, as a CISC versus RISC case study. The Intel processor, because of its dearth of registers and stack-oriented floating-point design, does about one quarter more data accesses than DLX for integer programs, and two to four times as much for floating-point programs. The "Historical Perspective" section sheds light on this imbalance, offering quotes from Kahan, Palmer, and Morse on the history of 8087 design and the price paid for segregating software and hardware design.
Several themes recur throughout Computer Architectures. Amdahl's Law "defines the speedup that can be gained by using a particular feature." The book frequently appeals to the law in order to quantify the potential performance benefit of a particular design choice, or to show that a design approach need not be examined in detail because it could not materially affect performance. The "Crosscutting Issues" sections discuss the interactions between topics in the current chapter and topics in other chapters, thereby highlighting the many compromises that must be made in the design of a real computer. The well-written "Fallacies and Pitfalls" in each chapter present the reader with the lay of the land just covered and point out the minefields that others have found the hard way. Detailed, yet clearly presented, analyses of the impact of various design choices are given throughout the book in both examples and as exercises. These analyses are vital tools in giving the reader a concrete understanding of the impact of design choices. They are one of the book's most significant contributions, particularly when compared to other books that either ignore or hand-wave such analysis. The "Historical Perspective and References Section" in each chapter are interesting reading and invaluable starting points for further research.
Computer Architecture: A Quantitative Approach is insightful, thought-provoking, and meticulously edited. It may appear to be expensive, but the per-page cost is low and the per-insight cost is even lower. The exercises at the end of each chapter of the text are relevant and will further increase the reader's understanding of computer architecture. I highly recommend this book. A well-edited gold mine of ideas, it will serve its readers well. I consider it required reading for all professional programmers and for those who are seriously involved with the design and performance of computer systems old and new.
DDJ