COMPUTER SCIENCE AND THE MICROPROCESSOR

The battle for the desktop

Nick Tredennick

Nick did the logic design and microcode for the Motorola 68000 and IBM Micro/370 microprocessors and is the author of Microprocessor Logic Design (Digital Press, 1987). He can be contacted at 1625 Sunset Ridge Road, Los Gatos, CA 95030.


The invention of the integrated circuit in 1959 began a beneficial technology spiral in which, following Moore's Law, the possible number of transistors on an integrated circuit has been doubling every year. Three effects combine to sustain this trend: Chips grow, features shrink, and design techniques improve. The number of transistors on a chip goes up directly with increases in chip area--twice the area permits twice as many transistors. Circuits are formed on a chip by a complex process involving coating, etching, doping, and baking. The size of transistors and wires (features) is determined by the sophistication of the process--the better the process, the smaller the transistors and wires. If the width of a feature (wire or transistor) decreases by a factor of two, the number of transistors in a fixed area increases by a factor of four. Evolving implementation techniques improve the match between circuit requirements and the constraints of the semiconductor technology.
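The arithmetic of these two effects is easy to check; here is a minimal numeric sketch (the starting transistor count is a hypothetical round number, not a figure from any real chip):

```python
# Transistor-count scaling: a toy illustration of the two effects
# described above (chip-area growth and feature shrink).
# The base count of 10,000 is a hypothetical round number.

def transistors(base_count, area_factor, feature_shrink):
    """Transistor count scales linearly with chip area and with the
    square of the feature-size reduction."""
    return base_count * area_factor * feature_shrink ** 2

base = 10_000  # hypothetical starting design

# Twice the area permits twice as many transistors.
assert transistors(base, area_factor=2, feature_shrink=1) == 20_000

# Halving the feature width quadruples the count in a fixed area.
assert transistors(base, area_factor=1, feature_shrink=2) == 40_000

# Both together: an eightfold increase.
assert transistors(base, area_factor=2, feature_shrink=2) == 80_000
```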

Integrated circuits improved the design of electronic systems. Before the integrated circuit, electronic systems were built of resistors, capacitors, inductors, diodes, and transistors (or vacuum tubes). Integrated circuits replaced collections of transistors, diodes, resistors, and capacitors. Texas Instruments and others introduced logic families, like the 74xx series TTL devices. Designers partitioned electronic systems into available logic modules. Systems became smaller, cheaper, and more reliable since each logic module replaced many discrete components. Design with logic modules was popular, and soon module catalogs such as TI's The TTL Data Book for Design Engineers grew to include hundreds of different logic modules.

Custom-chip design was one alternative to designing with standard logic modules. You could partition your system into unique chips and get someone like Intel to build them. The design would be fewer chips than a design using standard modules, so the implementation might be cheaper and more reliable to manufacture. But custom chip designs were expensive, so manufacturing volumes would have to be high to amortize the development cost enough to make custom chips the right choice. In the late '60s, it looked as if desktop calculators might have a high enough volume to justify custom-chip design.

In September 1969 the Japanese company Busicom approached Intel with a proposal for a calculator design using seven custom chips. In October, Intel's Ted Hoff countered with a three-chip design based on the idea of building computer-like chips and programming the logic to perform the desired function. One of these chips, the 4004, became the first commercial microprocessor. The 4-bit 4004 CPU, which processed data a nibble at a time, contained an execution unit (registers, arithmetic unit, and connecting logic) and a control (which interprets instructions and directs actions of the execution unit). The microprocessor is connected to memory (to hold instructions and data) and input/output logic (to communicate with the outside world) to make a working system.

Intel introduced the 4004 commercially in 1971. Since then, the number of transistors in a microprocessor implementation has doubled every two years--so microprocessor implementations aren't keeping pace with Moore's Law. Figure 1 charts the introduction of Intel microprocessors since 1971. Each microprocessor part is plotted by year of introduction and number of transistors per processor, from the introduction of the 2,300-transistor 4004 in 1971 to the introduction of the three-million-transistor Pentium processor in 1993. The solid line plots transistors doubling every two years, starting in 1971.
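The solid line in Figure 1 is just this doubling rule applied from 1971; a quick sketch of the computation (note that by 1993 the trend line runs somewhat above the Pentium's actual three million transistors):

```python
# Transistors-per-chip trend line: doubling every two years,
# starting from the 2,300-transistor 4004 in 1971.

def trend(year, base_year=1971, base_count=2300, doubling_years=2):
    return base_count * 2 ** ((year - base_year) / doubling_years)

print(round(trend(1971)))  # 2300
print(round(trend(1993)))  # 4710400 -- about 4.7 million, a bit above
                           # the Pentium's actual three million
```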

Embedded Control

The first commercially available microprocessor wasn't invented as a natural consequence of evolution in computer design. Instead, it replaced custom logic in what became known as embedded-control applications: any use of a microprocessor other than as the central processing unit (CPU) of a computer system. Microprocessor-based logic was fair competition for logic based on custom-chip designs, but the majority of system designs employed standard TTL modules. If the microprocessor was to be a commercial success, it would have to compete with TTL modules in system designs. Market emphasis in embedded-control applications led to microprocessors designed to meet the requirements of low cost, adequate performance, simple, flexible bus protocols, and few pins.

The microprocessor had to be cheap to compete with the high-volume TTL modules it replaced. Performance of the programmed logic employing the microprocessor had to match the performance of the logic it replaced--not a big challenge. Simple, flexible bus protocols allowed the microprocessor to work with a variety of memory and peripheral chips on a common bus. Few pins meant cheaper packages. Package size was also driven by the desire to match the pin and row spacing of the TTL modules. The first microprocessors were expensive, so their first design wins were probably as alternatives to custom-chip designs rather than replacing standard logic module-based designs. As volumes grew and technology improved, microprocessors got better and cheaper, which opened up more application opportunities. Microprocessor design diversified into microprocessors, which contain just the CPU, and microcontrollers, which include memory or I/O logic (or both) on the same chip with the microprocessor. The embedded-control market grew from essentially zero in 1971 to an expected volume of almost two billion units (worldwide) in 1993.

Embedded-control applications fall into four market segments: zero cost, zero power, zero delay, and zero volume.

The zero-cost segment, to a first approximation, is 100 percent of the embedded-control market. Virtually all embedded-control applications are in high-volume, highly competitive, cost-sensitive consumer appliances: TVs, VCRs, toasters, blenders, washers, dryers, and microwave ovens. Component cost is usually the first and most important consideration in an embedded-control application. In a microwave oven, for example, minimizing component count and component cost is vastly more important than minimizing power dissipation or maximizing performance. What difference does it make whether the microprocessor dissipates 0.1 watt or 10 watts in a 1500-watt microwave oven? The wall outlet looks like an infinite power source to the microprocessor, and the microprocessor's power dissipation is inconsequential compared to the power dissipated by the oven. Furthermore, performance of even a bit-serial processor would be lightning fast compared to the glacial pace of human command inputs to the microwave.

The zero-cost segment, which accounts for almost all unit volume in embedded control, employs 4- and 8-bit microcontrollers. The first commercial 4-bit microprocessor began shipping in 1971. In 1993, shipping volume for 4-bit microcontrollers should exceed 800 million units with an average selling price of just under $1. The first commercial 8-bit microprocessor, the 8008 (also from Intel), began shipping in 1972, just one year after the introduction of the 4004. In 1993, shipping volume for 8-bit microcontrollers is expected to be over 1 billion units, with an average selling price below $4. Even though the 8-bit microprocessor followed the 4-bit microprocessor's introduction by less than a year, it wasn't until 1990 that shipping volumes for 8-bit microprocessors passed those of 4-bit microprocessors. This indicates the importance of low cost and the unimportance of absolute performance in most embedded-control applications. Microprocessor manufacturers competing for shares of the zero-cost segment must have high-volume, low-cost production.

Zero power, the next-largest segment of the embedded-control market, is mostly a special subset of the zero-cost segment: It includes applications for which dissipating zero power is more important than achieving zero cost. Zero power, to a first approximation, represents zero percent of the embedded-control market. Zero-power applications include items such as smoke detectors, remote controllers, and pocket calculators. We'd like to have these devices run entirely on weak ambient light or run for a few years on a single watch battery. Zero-power applications use the smallest, cheapest, slowest microprocessor consistent with the requirements of the application. Since most applications are consumer appliances, cost is still important. For most applications, 4- and 8-bit microprocessors are sufficient, but the emerging personal digital assistants (PDAs) probably require 16- and 32-bit microprocessors. Microprocessor manufacturers competing for shares of the zero-power segment must have efficient designs and good technology as well as high-volume, low-cost production.

Zero delay is the third-largest segment. It includes applications such as scanners, laser printers, and fax machines, for which performance is the most important consideration. For these applications, zero processing delay is more important than achieving zero cost. The market is competitive, so cost is still important. Zero delay is the primary segment for the 16- and 32-bit microprocessors. These are the high-end, embedded-control applications, as reflected by the expected 1993 average selling prices for 16- and 32-bit microcontrollers of just under $10.00 and just under $60.00, respectively. Unit volumes in the zero-cost segment are 20 times the unit volumes in the zero-delay segment, but the substantially higher average selling price of the 16- and 32-bit microcontrollers brings the dollar value of the zero-delay segment to about one-third the value of the zero-cost segment.

I thought I'd covered all the market segments with these three--until I talked to John Wharton. I explained the zero-cost, zero-power, and zero-delay segments to him and asked, "So what do you think?" He immediately replied, "You forgot the zero-volume segment." Indeed I had.

Zero volume is the market segment for applications with (essentially) zero volume, but which have some attraction for the manufacturer other than sales volume and profit. Intel built and delivered the 960MX microprocessor--at the time, Intel's fastest and most complex microprocessor--solely for the YF-22 Advanced Tactical Fighter. Since the crash of the single flying prototype of the YF-22, the volume looks as if it will actually be zero, but Intel could hardly have expected to sell more than a few thousand microprocessors for the YF-22 even in the best of circumstances. The visibility conferred by such a high-profile application made the design win desirable. The zero-volume segment is not sensitive to cost. All microprocessor manufacturers can compete for applications in the zero-volume segment. High-volume, low-cost production is not required.

The technology spiral fed the expansion of the microprocessor market. By any standard, the growth from introduction in 1971 to an expected market of close to two billion microprocessors in 1993 is phenomenal.

Enter the Personal Computer and Computer Architect

By 1974, the microprocessor had gotten cheap and common enough for the invention of microprocessor-based computer systems and the sale of these "personal computers" to individuals. Invention of the PC served to split the microprocessor market into two segments: embedded control and CPU. The two market segments have different requirements: Embedded control wants low cost, while CPUs want high performance. Most microprocessors go into embedded-control applications, but CPU applications have grown from essentially 0 percent of unit volumes in 1974 to an expected value of almost 2 percent in 1993. About 30 million microprocessors should ship as the computer system CPU in 1993. Since embedded-control applications have always represented 98 to 100 percent of unit volumes, manufacturers have traditionally ignored the CPU market segment. Microprocessor designs supported embedded-control requirements for low cost and adequate performance: If they also got used as CPUs, so much the better.

As computers advanced, so did the field of computer science. In the academic world, it progressed from a side interest within mathematics or electrical engineering departments to a separate field in its own right, bringing with it professionals in industry and academia building careers on computer-related topics.

The first computers were built of vacuum tubes and were huge, expensive electromechanical engines. Only a few large companies (like IBM) capable of making large business machines could build these "mainframe" computers. Designers of mainframe instruction sets and microarchitectures were rare and probably thought of themselves as engineers and programmers. After the invention of the transistor and the integrated circuit, computers got smaller and cheaper. More companies could build these smaller, cheaper "minicomputers." Designers of minicomputers were still fairly rare and also probably saw themselves as engineers and programmers. After the invention of the microprocessor, any company capable of building integrated circuits could design a computer instruction set. The number of instruction-set and microarchitecture designers reached critical mass: The designers began to think of themselves as "computer architects," and computer architecture became its own profession.

Invention of the computer architect brought with it an avalanche of experiments and publications as career computer researchers competed for the best positions in universities and industrial research organizations. But the study of computers is a weak science. When quantitative results had to be produced with pencil and paper, researchers spent considerable effort deciding which results were worth computing. The computer itself is the enemy of experiments in computer science: It produces quantitative results so readily that little thought need go into which results are worth producing. Also, the field of computer science is developing under intense commercial pressure, which further weakens experimental procedure. Researchers may have a financial interest in a point of view. There are few independent investigators.

RISC

In the late '70s and early '80s, investigators at universities and industrial research organizations noticed the mismatch between the implementation of microprocessors and the requirements of a CPU. Consequently, they invented RISC (reduced instruction set computers). Manufacturers were busy building microprocessors to compete with standard logic modules for embedded-control applications, since to a first approximation, embedded-control applications represented 100 percent of the market for microprocessors. (CPU applications were essentially 0 percent.) Microprocessors were designed for low-cost, adequate performance (relative to standard logic-module solutions), few pins, and leisurely bus protocols. Low cost was the most important feature.

But low cost isn't a major objective for the microprocessor in a computer system. The cost of the power supply, display, hard disk, printer, keyboard, chassis, and other components swamps the cost of the CPU. Performance is the major objective for a microprocessor used as a CPU. Designing for best absolute performance is so at odds with designing for lowest cost that researchers investigating the design of microprocessors for CPU applications found room for improvement over microprocessors designed for embedded control. Early papers proposing RISC cited no fewer than 16 factors contributing to enormous gains in reported performance, among them: simplified instruction set, overlapped register windows, large register set, simplified addressing, high-level language user interface, advanced compiler technology, delayed branch, advanced procedure calls, single-cycle execution, simplified implementation, quicker time to market, better design procedures, better design tools, on-chip cache, wider external buses, and load/store architecture. Wider, faster external buses, which increased bandwidth to memory by a factor of six to ten, probably made the biggest contribution to reported performance improvements.

Twelve years of subsequent investigation have not clarified or isolated the contribution of any of these changes to increases in reported performance. Instead, a pseudotechnical debate of epic proportions ensued, pitting RISC (all that is good) against CISC (complex instruction set computers, or all that is bad). The real issue had nothing to do with RISC or CISC. The real issue has always been microprocessors with different design objectives. Manufacturers supported designs for volume shipment--embedded-control applications. RISC advocates supported designs for CPU applications. Microprocessors for embedded control emphasized low cost. Microprocessors for CPU applications emphasized performance.

The Battle for the Desktop

In a coincidence with unfortunate consequences, IBM introduced its personal computer in 1981--just as researchers were inventing RISC. Sales of the IBM PC took off, forever locking RISC out of the volume market in personal computers. The invention of RISC merely split the CPU market segment into PCs and workstations, in the same way the invention of the PC had split the microprocessor market into CPUs and embedded controllers. In 1993, unit volumes will be approximately 2 billion embedded controllers, 30 million personal computer CPUs, and half a million workstation CPUs.

Even though CPU applications represent less than 2 percent of microprocessor shipments, CPU designs are the glamour topic in microprocessor design. High-end microprocessor designs are the focus of conferences, trade press, technical publications, popular interest, and research. Ever since IBM selected the lowly 8088 as the CPU in its PC, it has been an intolerable affront to computer architects that the 8088 and its descendants can't be displaced on the desktop by any of the plethora of clearly superior RISC architectures. Since the advent of RISC, computer architects have produced many microprocessors more suited to CPU applications than the 80x86 architecture. Every microprocessor architecture announced since the invention of the acronym has been labeled RISC. Applications for RISC CPUs have grown from zero in 1981 to domination of the half-million-unit workstation market in 1993. In the meantime, the PC market has grown to about 30 million units a year. Although there are other personal computers, IBM-compatible PCs have about 90 percent of the market and Apple about 10 percent with the 680x0-based Macintosh. The Motorola 680x0 family is another old CISC architecture, so the PC market belongs exclusively to the old, ugly CISC architectures.

Won't the superior performance of RISC-based workstations help them displace CISC-based PCs on the desktop? RISC advocates and the trade press have been predicting for years that sales of RISC-based computers would take off very soon and begin eating into IBM-compatible PC volumes. Folklore has emerged to explain why workstations will soon begin to displace PCs. The biggest advantages for workstations are in performance, price/performance, hardware, and new developments. The biggest advantages for PCs are availability, applications, and the installed base. Folklore suggests that cost and price are about the same for workstations and PCs.

Workstation Advantages

The killer advantage workstations are thought to have is in absolute performance or in price/performance. "Once users get their hands on $&your.favorite.workstation and see its blazing speed, sales of $&your.favorite.workstation will surge as users switch from the PC." That's the theory, anyway. I think it's wrong.

Leaving aside the question of whether there's a significant difference in performance, price/performance and absolute price have more influence on the choice of a PC or workstation than absolute performance. Comparing the workstation with the best price/performance to a fully configured, top-of-the-line, list-price IBM or Compaq system is a mistake. It may be that workstations have a giant advantage in price/performance at that workstation price (and, perhaps, almost every other workstation price). I don't know, and I don't think it matters. The only price point that matters is the lowest workstation price, because it's the only price at which workstations and PCs can compete for the same customer. The relevant comparison is price/performance of the cheapest workstation compared to a similarly priced IBM-compatible PC clone. At the lowest workstation price, PCs have better price/performance.

Workstations are supposed to have an enormous advantage in hardware: architecture, implementation, technology, and time to market. The story goes something like this:

All workstations use RISC microprocessors. RISC has inherent architecture advantages over CISC. RISC implementations are cheaper and faster, and they get to market quicker. Since product cycles are shorter, new ideas can be implemented sooner and more product generations can be introduced in a fixed time. Also, shorter design times mean RISC uses better technology (or gets equivalent technology to the field sooner).

Advantages in architecture are unproven and probably swamped by the effects of operating systems, compilers, assemblers, languages, and system design. The latest high-end microprocessors contain up to 3 million transistors. They were all--RISC and CISC--complicated and difficult to design. Transistor budgets for next-generation microprocessors will be 3 to 10 million; they will use similar technology and have similar implementations. They'll all be complicated and difficult to design. In a ten-million-transistor design, instruction-set architecture offers no significant shortcut to implementation.

PC Advantages

The killer advantages for the PC are software applications and the 100-million-unit installed base. Applications for the PC are plentiful and cheap. The price/performance advantage of a workstation would have to be gargantuan to overcome the inertia of the installed base. PC owners can count on finding cheap applications to suit their needs, and they can count on cheap, regular hardware, software, and operating-system upgrades. PCs are also readily available. You can get whatever PC configuration you want today at your local computer store for a competitive price. If you're willing to wait a day or so, you can get the same PC for even less through mail-order. Availability, applications, cost, and the installed base--that's a lot to overcome.

Cost and Price

Folklore says cost and price for workstations can be about the same as for PCs. Cost is how much the manufacturer pays to make a workstation or PC. Price is how much you and I have to pay to get one. In an ideal manufacturer's market, price might be five or six times cost. In an ideal consumer market, price might be only slightly above cost. The PC market is a consumer market. Workstations will have to have consumer pricing to compete for PC customers. Let's assume PCs and the low-end workstations being designed to compete with them have similar features and use common components (power supplies, glue logic, hard disks, floppy drives, displays, keyboards, and the like). Assume differences in volume discounts for PC and workstation manufacturers are small (so workstation and PC manufacturers are paying about the same for their components). But there's a difference in CPUs between PCs and workstations. PCs are based (mostly) on 80x86 microprocessors, and workstations are based on RISC microprocessors. Is there a difference in cost or price for the CPU?

It costs about $65.00 to build a current high-end microprocessor. It costs about $600.00 to process a six-inch wafer that will yield 12 to 14 good chips; that's about $45.00 per working microprocessor. Add $10.00 to package the chip and $10.00 more to test it, and you get $65.00 per CPU. All the high-end microprocessors are about the same size, so there shouldn't be a significant difference in cost to build. But cost to build isn't the whole story--figuring out what to build and drawing up the plans (designing the microprocessor) can be significant. Development cost for a high-end microprocessor runs between $30 and $100 million. If you spend $50 million designing a microprocessor and sell only 50, you'll have to charge more than $1 million for each just to recover your costs. If you can sell 50 million parts, you only have to charge $66.00 to recover your costs. Figure 2 plots cost per part against parts shipped for design costs of $30 to $100 million (assuming $65.00 per CPU in fixed manufacturing cost).
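The amortization arithmetic behind Figure 2 reduces to a one-line formula; a sketch using the numbers above:

```python
# Cost per part = fixed manufacturing cost + amortized development cost,
# the relationship plotted in Figure 2.

def cost_per_part(units_shipped, development_cost, manufacturing_cost=65.00):
    return manufacturing_cost + development_cost / units_shipped

# $50 million in development spread over 50 parts: over $1 million each.
print(cost_per_part(50, 50e6))    # 1000065.0

# The same $50 million spread over 50 million parts: $66.00 each.
print(cost_per_part(50e6, 50e6))  # 66.0
```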

The workstation market is fragmented. SPARC from Sun, MIPS from Silicon Graphics, POWER from IBM, PA (Precision Architecture) from Hewlett Packard, 881x0 from Motorola, Alpha from DEC, and Clipper from Intergraph all compete for shares of the half-million-unit workstation market. There are about 20 manufacturers making high-end RISC microprocessors for workstations. Currently, only Intel makes high-end microprocessors for IBM-compatible PCs. Costs are clearly not equal. If you have to do a new design every two years to keep up with everyone else, you'll be getting a share of either a million workstations (the number shipped in two years) or 60 million PCs. If you're one of the 20 manufacturers making a microprocessor for workstations, your share of the market is likely to put you well over on the left side of the curve in Figure 2. Amortizing the development cost over your share of the workstation market will drive the price of the chip--manufacturing cost won't be a big factor. If you're Intel, you'll be operating well off the right side of the chart--amortized development cost won't be a big factor in setting the price of the chip. Intel is currently shipping four to five million high-end microprocessors per quarter. RISC microprocessors cost a lot because high development cost must be amortized over low workstation shipping volumes.

Costs for workstations and PCs are not equal. Workstations cost more because development cost for the CPU must be amortized over significantly lower volumes. Lower manufacturing volumes also lead to smaller discounts on other component purchases. The more sophisticated workstation systems (cache, fast memory, special I/O) inherently cost more than the relatively unsophisticated PC. Workstations have traditionally been sold through more expensive distribution channels than the PC, which also contributes to higher cost.

Without even considering software, which is probably the most important determinant, I think the battle for the desktop is over. The PC is the winner--it is grabbing applications from the workstation market. The workstation market has been maintaining volume by pressing for ever higher performance and capturing new applications. We've reached the steady state. Workstations will continue to push for higher performance and specialized markets where they can command the higher prices they require. The PC will continue to chase the workstations out of their old market segments.

More Details.

Software

PC software is cheaper than workstation software. PC software is the stuff everyone needs: word processors, editors, communications, spreadsheets. Workstation software is specialized software designed for a particular market: chip design, visualization, timing analysis. Software is in the same situation as the CPU. For PC software, high volumes mean low amortized development cost, so manuals and distribution probably dominate the cost. For workstation software, low volumes and complex applications mean high amortized development cost.

The major applications for the PC have already been written. If you're a PC software developer, do you have a chance of developing a new word processor and capturing the market? Not likely. So what's Microsoft doing with all their programmers? Over the past ten years, they've been working frantically on major high-volume applications for the PC. Now they're done. About all they need is two programmers and 30 documentation people per application to crank out the annual updates to Word, Excel, PowerPoint, and so on. So what are the other 6000 employees doing? When the applications that sell ten million copies are done, they'll work on applications that sell one million copies. When the applications that sell one million copies are done, they'll work on applications that sell 100,000 copies. When the applications that sell 100,000 copies are done, Microsoft will start laying off programmers. It's my guess that programmers at Microsoft are now working on applications which will sell between 100,000 and one million copies. Engineering-design applications, the traditional market for workstations, will be converted to the PC by Microsoft programmers just before the big layoffs begin.

Conclusion

The battle for the desktop has reached a steady state: The PC is eating into traditional workstation applications at about the same rate that workstations, with ever-higher performance CPUs and more complex systems, are finding new applications. This, however, leads to problems for the RISC CPU manufacturers because workstation volumes are too low for the chip manufacturers to recover their development costs. The technology spiral is driving them to more complex CPUs with correspondingly higher development cost while their market is staying about the same size. This has caused the RISC CPU manufacturers to rediscover the embedded-control market. RISC CPU manufacturers have begun a major marketing campaign to capture embedded-control applications. After competing for a few years for shares of a half-million-unit market, it must look as if there's room for everyone in a market of two billion. There isn't.

Most of the market volume for embedded control belongs to the 4- and 8-bit microprocessors. That's the zero-cost segment. The zero-power and zero-delay segments are also cost sensitive, and belong to companies with high-volume, low-cost manufacturing. The zero-delay, embedded-control segment uses some high-end microprocessors, but the average selling price of a 32-bit CPU for embedded control is $65.00--about the same as the manufacturing cost for a RISC CPU. There's no way to recover development cost if the base price is the same as the manufacturing cost. That leaves the zero-volume segment. RISC CPUs are capturing high-profile applications in the zero-volume segment. The problem with the zero-volume segment is, as its name implies, that there's not enough volume to recover development cost.

PC sales are stalled at 30 million a year, workstation sales are stalled at half a million a year, and the ancient CISC CPUs own the embedded-control market. The news is all bad for the makers of RISC CPUs. That's too bad, because it's fate and has nothing to do with the intrinsic value of the product (not that the intrinsic value is well known, given the state of computer science--but that's another story). It's all tied up in the technology spiral, the invention of the microprocessor, and the timing of the invention of the personal computer, the computer architect, and RISC. If you're a RISC advocate and this news has depressed you, here's something to make you feel better: Perhaps the technology spiral will come back to bite even the CISC microprocessors. The microprocessor was invented for embedded control: It displaced modules with a programmed logic solution. Perhaps reconfigurable or even self-configuring logic will displace the microprocessor in embedded-control applications. After all, the microprocessor is only an interim solution. Shouldn't those applications have self-configuring logic modules?

Microprocessor Implementations

First-generation microprocessors, typified by the Motorola 6800, didn't use pipelining. They didn't have to be fast for simple embedded-control applications and, since integrated-circuit technology was new, chip area for transistors was expensive. Early microprocessors used simple control and a simple interface to external memory. The microprocessor fetched the instruction, decoded it, and then executed it. When the microprocessor finished the first instruction, it started on the second, and so on--no pipelining, a simple controller. This execution model is shown in Figure 3.

The bottleneck in this simple, nonpipelined design is the controller. The external bus is only used every third cycle for the fetch, unless the execute cycle reads or writes an operand. The instruction decoder is only used every third cycle. And the execution unit is only used every third cycle. The pipeline can't stall since there isn't one.
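
The one-instruction-at-a-time model is easy to make concrete. Here's a toy cycle-by-cycle sketch (my own illustration, not from the article; it assumes each phase takes exactly one cycle) that shows why every resource sits idle two cycles out of three:

```python
# Toy model of a nonpipelined CPU: fetch, decode, and execute run
# strictly in sequence, one phase per cycle, with no overlap.
def run_nonpipelined(n_instructions):
    trace = []  # (cycle, phase, instruction)
    cycle = 0
    for i in range(1, n_instructions + 1):
        for phase in ("fetch", "decode", "execute"):
            cycle += 1
            trace.append((cycle, phase, i))
    return trace

trace = run_nonpipelined(3)
# The bus, decoder, and execution unit are each busy only one cycle
# in three: instruction i is fetched on cycle 3*i - 2.
fetch_cycles = [c for c, p, i in trace if p == "fetch"]
print(fetch_cycles)  # -> [1, 4, 7]
```

Three instructions take nine cycles; the decoder and execution unit are each used on exactly three of them.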

In the late '70s, the next-generation microprocessors, typified by Motorola's 68000, used a simple, three-stage pipeline called instruction overlap. As the first instruction is executed, the second instruction is decoded, and the third is fetched. This execution model is shown in Figure 4.

Instruction overlap makes better use of microprocessor resources than a nonpipelined version. The external bus, the instruction decoder, and the execution unit are all used on every cycle, unless there's a conflict for resources. It's possible for the processor to complete an instruction on every cycle. Fetch takes one cycle, decode takes one, and execute may take one to many cycles. If execute takes more than one cycle, the following instructions are held in the fetch and decode stages until the current instruction finishes execution. Only one instruction at a time is allowed to begin execution, so there are no operand conflicts. The execute stage and the fetch stage may contend for the external bus. In an add-memory-to-register instruction, for example, the execute stage will compute the operand address, read the memory operand, add the register and memory operands, and store the result in the register. If the memory-to-register add is instruction 1 in Figure 4, its execute phase would extend from cycle 3 through cycle 6, instruction 2 would be held in Decode, and instruction 3 would be held in Fetch. Instructions 2, 3, and 4 would begin Execute, Decode, and Fetch, respectively, in cycle 7.
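
The stall behavior of the add-memory-to-register example can be sketched with a few lines of code (a hypothetical illustration of the Figure 4 timing, assuming only the execute stage can take multiple cycles and ignoring bus contention):

```python
# Toy three-stage overlap pipeline (fetch/decode/execute).
# exec_latencies[i-1] = cycles instruction i spends in Execute.
# Only one instruction executes at a time; the fetch and decode
# stages hold their instructions while Execute is busy.
def overlap_pipeline(exec_latencies):
    exec_start = {}
    cycle_ready = 3  # instruction 1 reaches Execute in cycle 3
    for i, lat in enumerate(exec_latencies, start=1):
        exec_start[i] = cycle_ready
        cycle_ready += lat  # the next instruction waits this long
    return exec_start

# Instruction 1 is the memory-to-register add, executing in
# cycles 3-6; instructions 2 and 3 are single-cycle.
starts = overlap_pipeline([4, 1, 1])
print(starts)  # -> {1: 3, 2: 7, 3: 8}
```

With all single-cycle executes, instruction i begins Execute in cycle i + 2, one completion per cycle, just as the text describes.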

The first commercial RISC microprocessors introduced an extended pipeline. The extended pipeline split the execute phase into address calculation, operand access, execute, and write phases. Additional pipeline stages removed pipeline delays caused by resource conflicts such as contention for access to external memory. The extended-pipeline execution model is shown in Figure 5.

The extended pipeline still completes at most one instruction every cycle, but with the additional stages there are fewer delays due to resource contention. But there are costs. The four instructions past the decode stage have potential operand conflicts to resolve. Additional pipeline stages require additional resources to avoid conflicts. You can estimate resources by looking at Cycle 6 in the figure. Since Cycle 6 represents the theoretical steady-state instruction flow through the microprocessor, it should be able to accommodate any combination of six instructions without resource conflicts. The memory system, for example, must have at least two read ports and one write port (for Instruction 1 write, Instruction 3 read, and Instruction 6 fetch) to avoid access conflicts. There must also be more ports to the register file (for address, read, and write) and at least two arithmetic units (one for address calculation, and one for execute).
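
The port-counting argument can be made mechanical. This sketch (my own, assuming the worst case in which every in-flight instruction uses its stage's resource) tallies the demands the six instructions place on memory and the arithmetic units in the steady-state cycle:

```python
# In steady-state cycle 6 of the six-stage pipeline (fetch, decode,
# address, operand, execute, write), each stage holds one instruction.
stages_in_cycle_6 = {
    1: "write",    # instruction 1 writes its result to memory
    2: "execute",
    3: "operand",  # instruction 3 reads a memory operand
    4: "address",
    5: "decode",
    6: "fetch",    # instruction 6's fetch also reads memory
}
mem_reads  = sum(1 for s in stages_in_cycle_6.values() if s in ("operand", "fetch"))
mem_writes = sum(1 for s in stages_in_cycle_6.values() if s == "write")
alus       = sum(1 for s in stages_in_cycle_6.values() if s in ("address", "execute"))
print(mem_reads, mem_writes, alus)  # -> 2 1 2
```

Two memory read ports, one memory write port, and two arithmetic units, matching the counts in the text.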

While the Motorola 68040 uses a six-stage pipeline, there's nothing magical about it. Intel's 80486 and MIPS' R3000 are five-stage pipelines, and the newer MIPS R4000 is an eight-stage pipeline. (MIPS uses the pompous term "superpipeline" to describe their eight-stage pipeline.) The original Fujitsu SPARC gate array and the first custom Cypress SPARC use a four-stage pipeline. Increasing the number of stages in the pipeline reduces resource conflicts and may allow a faster clock. Throughput increases, but these pipelines still only complete one instruction per cycle, since they only issue one instruction per cycle.
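
The reason deeper single-issue pipelines don't raise peak throughput is simple arithmetic: with no stalls, n instructions take k + n - 1 cycles through a k-stage pipeline, so the per-instruction cost approaches one cycle for any depth. A two-line sketch (my own illustration) makes the point:

```python
# n instructions through a k-stage single-issue pipeline, no stalls:
# k cycles to fill the pipe, then one completion per cycle.
def cycles(k, n):
    return k + n - 1

# Depths mentioned in the text: SPARC (4), 80486/R3000 (5),
# 68040 (6), R4000 (8). Per-instruction cost converges on 1.0.
for k in (4, 5, 6, 8):
    print(k, cycles(k, 1000) / 1000)  # -> roughly 1.0 for every k
```

What a deeper pipeline buys is a shorter critical path per stage, and hence a faster clock, not more instructions per clock.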

A superscalar pipeline attempts to issue more than one instruction per clock. Intel's 80960CA, announced in 1989, was the first microprocessor with a superscalar pipeline. Figure 6 shows a six-stage pipeline capable of issuing two instructions per cycle.

Instructions 1 and 2 start in the same cycle, instructions 3 and 4 start in the same cycle, and so on. If we started three instructions per cycle, we could potentially complete three instructions per cycle. But look at the loaded pipeline represented by cycle 6 (as it was in the extended pipeline). The microprocessor is processing 12 instructions in each cycle. There's enormous potential for operand and address conflicts. The register file and memory system need at least four read ports and two write ports each. And there must be at least four arithmetic units (two for address calculation, and two for execute). Hardware resources for a superscalar pipeline are substantial and grow as more instructions can be issued simultaneously. One way to limit required resources is to restrict combinations of instructions permitted simultaneous issue. DEC's new 21064 Alpha microprocessor, for example, uses a seven-stage pipeline and can issue two instructions per cycle with some restrictions on pairs that can issue simultaneously. HP's PA 7100 can issue a floating-point instruction and an integer instruction simultaneously, but cannot issue two integer instructions during the same cycle. TI's SuperSPARC and Motorola's 88110 allow simultaneous issue of two integer instructions. Intel's Pentium and Motorola's 68060 will also sport superscalar pipelines.
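
Pairing restrictions like the PA 7100's are just a greedy check in the issue logic. This sketch models that rule as described above (one integer plus one floating-point instruction may issue together, never two integer ops); the instruction stream itself is made up for illustration:

```python
# Toy dual-issue logic with a PA 7100-style pairing restriction:
# an "int" and an "fp" instruction may issue in the same cycle,
# but two instructions of the same kind must issue one at a time.
def issue_pairs(kinds):
    groups, i = [], 0
    while i < len(kinds):
        if i + 1 < len(kinds) and {kinds[i], kinds[i + 1]} == {"int", "fp"}:
            groups.append((kinds[i], kinds[i + 1]))  # dual issue
            i += 2
        else:
            groups.append((kinds[i],))  # single issue
            i += 1
    return groups

stream = ["int", "fp", "int", "int", "fp", "fp"]
print(issue_pairs(stream))
# -> [('int', 'fp'), ('int',), ('int', 'fp'), ('fp',)]
```

Six instructions issue in four cycles instead of three: every restriction that saves ports and arithmetic units gives back some of the superscalar speedup.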

--N.T.


Copyright © 1993, Dr. Dobb's Journal