Reality of SMP
Hal is a hardware engineer who sometimes programs. He is the former editor of DTACK Grounded, and can be contacted at halwh@ddj.com.
I have had this experience exactly once. I was driving on a four-lane avenue just north of Highway 101, in the middle of Silicon Valley, at the peak of the morning rush hour. I suddenly realized that I was alone. Whether I looked forward, to the right, left, or in the rear-view mirrors, there was no sign of life in sight. No people. Worse than that, there was no traffic. And I was wide awake and completely sober! It was like being on the set of a science fiction movie; I expected a giant Mantis to stride into sight any moment. I actually pulled to the side of the road and stopped. After about a half-minute, one lone car appeared and passed me. Whew!
It was early August, 1989. I'd been working for a year-and-a-half at Vicom, then one of four companies in the business of making general-purpose image-processing computers in the $100K-and-up price range. Vicom had shrunk by 33 percent during the time I worked there, and it had less than a third of the employees it had at its peak (and the industry's peak) in 1984. Every Friday at 5:30 p.m., Philadelphia time, a Mayfield Fund representative would electronically deposit enough money for Vicom's payroll, and the pay checks would be handed out at Vicom in San Jose.
You see, Vicom was all but legally bankrupt. It had numerous bills outstanding for over six months. As is usual, a creditors' committee was watching for an opportunity to force Vicom into Chapter 11 so they could recover a few cents on the dollar. But they couldn't do that until Vicom had some assets, and Vicom had long since sold all its equipment, including the file cabinets, and leased them back. Vicom had no assets, so the creditors didn't file; they wouldn't even have recovered court costs.
What happened to Vicom is what was happening to the entire image-processing industry, or at least the part that made general-purpose image-processing computers. The desktop personal computer was eating away at the base of the marketing pyramid that supported the industry. I knew what was going on as it was happening, in some ways better than my bosses at Vicom. I had personally participated in the general-purpose CAD (Computer Aided Design) industry and watched as the desktop personal computer ate that industry alive, too! (Would you believe a CAD computer used to cost $250K and up?) And I had run my own small company for 14 years, so I understood the financial nuances.
Vicom's boss persuaded the Mayfield Fund that Vicom was too small, so Vicom bought up Gould IGD (Image and Graphics Division) in Fremont, California, and bought the image-processing division of a little company in San Leandro, CA, called Pixar. (The part of Pixar that remained behind later produced Toy Story.) But Gould IGD had 125 employees in Fremont, and Vicom was moving to the Gould plant in Fremont. I decided I didn't want to commute on the long, narrow parking lot locally known as Interstate 880.
So I was on my way to a job interview. Just before 8:00 a.m., I had driven north on Lafayette Street and made the cloverleaf turn onto Tasman Drive heading towards Great America Parkway. Apparently Tasman, between Lafayette and Great America, was not a major traffic artery even during the morning commute.
S3 (short for "Startup #3"), a spinoff of Chips and Technologies, was led by one of C&T's founders. The company was in its initial startup mode and had rented space in a two-story office building at the corner of Tasman and Great America. I didn't know what product they were developing, but a colleague recommended I apply there for a job; he had worked with S3's founder.
Having just moments before had an experience that surely is similar to an LSD rush, I arrived at that building and discovered it looked exactly like a motel. I had lived in a motel in Sunnyvale for almost four months before landing the Vicom job. I've never liked motels. I thought about the situation briefly and left, eventually going to work for another image-processing company. It's a very good thing I never applied for that job because (continuing the SciFi metaphor) I might have disappeared in a nuclear explosion.
You see, S3 was founded to make an enormous amount of money by making a motherboard chipset that would support up to four x86 CPUs. Yep, S3 was going to ride the SMP (symmetrical multiprocessing) market to fame and fortune. I had definite ideas about the SMP market. One was that there was an annual demand for about 17 SMP computers. And I've never been bashful about expressing my opinions.
C&T had discovered what the spinoff was up to and, naturally, started developing its own x86 SMP motherboard chipset called "MPAX." S3 discovered the insanity of its business plan before actually placing its SMP chipset in production and switched to Plan B, video-accelerator chips for PCs. C&T was unlucky enough to actually put MPAX into production, at which time its customers discovered that the desktop x86 SMP market was good for 17 units a year. C&T tried to sell off MPAX but in the end gave the product away. (You can read more about MPAX in Microprocessor Report, February 21, 1990.)
Had that S3 interview transpired, I would have been correct in my assessment of S3's original target market. Being right wouldn't have helped; there would have been little radioactive bits of me spread all over Silicon Valley. At the time, I had a lot of experience designing frame buffers using video controllers, and that's the field S3 entered with great success shortly thereafter. We just never made the connection.
Recently, a small but profitable niche market has developed for SMPs as web servers. As I write this, two PC magazines have reviews of these quad-P6 servers in their current issues. Prices range from $30K to $36K. Corollary Inc. has just announced an 8-CPU x86 SMP motherboard controller chipset called "Profusion." NCR is even getting into the 8-CPU act with a proprietary memory controller called "OctaScale." To kill a really bad idea, you have to drive a wooden stake through its heart and bury it at midnight in a crossroad. Say, the intersection of Tasman and Great America. (S3's main plant is only a block away.)
Like most of you, I like the idea of SMPs. I even toyed with producing one back in the days when I was running DTACK Grounded. I'd love to have a quad-P6 SMP for some artificial neural net back-propagation experiments, but I won't pay $30K to get one. Jean-Louis Gassee's BeBox is widely admired but not purchased.
A colleague of mine in Rhode Island also likes SMPs, but he thinks the future of SMPs is on a single die. Why, he reasons, expend 15 million transistors on a single CPU when you can instead put four CPUs with 3.75 million transistors each on that die? Lots of smart folk share his opinion here in Silicon Valley. In fact, when DEC introduced its Alpha microprocessor in 1992, it announced that over ten years the chip would evolve into a single-chip SMP with up to ten CPUs on board! Alas, four of those ten years have elapsed and Alpha is still a uniprocessor architecture. And a 15-million-transistor CPU has, in fact, been introduced--IBM's P2SC, with 128-KB data and 32-KB instruction caches and (get this) a 256-bit data bus! It's a uniprocessor.
This column is inspired by the special August 5, 1996 Microprocessor Report, which celebrates the 25th anniversary of the microprocessor and contains articles discussing the future of microprocessing. One of these articles, by DEC's Richard Sites, confesses the "failure" of the Alpha SMP plan (the Alpha architecture is doing very well indeed, thank you, as a uniprocessor).
The article by University of Michigan researcher Yale Patt focuses entirely on the issue of uniprocessing versus SMP. He points out that we're coming up on a billion transistors per die--that's 180 P6 equivalents--and poses this question:
If the design point is performance... Is it better to partition the billion transistors into smaller units and implement a multiprocessor on a chip, or to build a very wide VLIW uniprocessor [or implement other SMP configurations]? ...At the level of 100 million transistors, the answer is very clear to me: use the transistors in a uniprocessor... At 100 million transistors, I don't believe we have at all run out of steam in things we can add in support of a single instruction stream. At one billion transistors, the question is less easy to answer cavalierly. Still, I think I would opt for a yet more powerful uniprocessor.
I think it'll be a long, long time before my Rhode Island colleague has a one-die SMP in his desktop computer. Perhaps in his next incarnation, or the one after that. (Twenty-five years elapsed between the 2260-transistor Intel 4004 and the 15-million transistor IBM P2SC. That's a 6637:1 increase in the number of transistors. To get from the P2SC to a billion-transistor CPU, the step is only 67:1--about a hundredth as large. A 100-million-transistor CPU? That step is less than 7:1, a thousandth as large. Professor Patt's scenarios aren't all that bizarre or far out.)
The benefit of SMPs as server engines is easy to explain. Tom, Dick, and Harry want to access a web site at the same time. One of the SMP's quad CPUs is assigned to each, with a CPU in reserve in case Irving logs in.
The attraction of general-purpose SMPs is easy to explain. Four CPUs have four times the number-crunching power (for instance) of one CPU. But...there are fatal software and hardware problems: In general, applications cannot be partitioned easily (think of word processing). And there's a massive bottleneck at the memory interface (think of a drag-racer with a 5000-hp engine and a VW transmission).
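The memory-bottleneck argument can be made concrete with Amdahl's law--my formalization, not something the column invokes by name: if a fraction of the work is effectively serialized (say, by every CPU contending for the same memory bus), n CPUs deliver far less than an n-fold speedup.

```python
# Amdahl's-law sketch (an illustration, not the author's analysis):
# speedup of n CPUs when a fraction `serial` of the work cannot be
# parallelized, e.g. because all CPUs contend for one memory interface.

def speedup(n_cpus, serial):
    """Overall speedup when `serial` fraction of work runs on one CPU."""
    return 1.0 / (serial + (1.0 - serial) / n_cpus)

# Even a modest 25% serialized fraction caps a quad-CPU SMP well below
# the naive 4x; at 50% serialized, four CPUs barely beat one and a half.
print(round(speedup(4, 0.25), 2))   # 2.29
print(round(speedup(4, 0.50), 2))   # 1.6
```

The hypothetical `serial` fractions here are illustrative only; real contention depends on the workload and the memory system.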
I like to take long-standing trends and project them. I'm going to take this opportunity to describe the 100-million and 1-billion transistor CPU. The numbers in Table 1 aren't something I grabbed out of a hat; they're printouts from a simple QuickBasic program that manipulates log-linear graph data. I'm using 25 years and the Intel 4004 and IBM P2SC microprocessors as the base data.
Let me say this again: The numbers in Table 1 are not predictions. They are projections based on 25-year trends as defined by just two microprocessors. However, all of these parameters have, in fact, been steadily progressing at a substantially constant rate over the past 25 years. If the trends hold, the table's numbers are golden.
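The projection method is simple enough to sketch. The following Python stand-in for the author's QuickBasic program (his code isn't printed, so this is my reconstruction of the stated method) fits an exponential through the two base data points, 25 years apart, and extrapolates; it reproduces the transistor-count row of Table 1.

```python
import math

# Log-linear trend projection from two data points 25 years apart:
# Intel 4004 (1971) to IBM P2SC (1996). A sketch of the stated method,
# not the author's actual QuickBasic program.

YEARS = 25.0

def doubling_time(start, end, years=YEARS):
    """Years for the quantity to double, assuming exponential growth."""
    return years * math.log(2) / math.log(end / start)

def years_to_reach(target, end, t_double):
    """Years beyond the trend's endpoint until it reaches `target`."""
    return t_double * math.log2(target / end)

# Transistor count: 2260 (4004) to 15 million (P2SC).
t2 = doubling_time(2260, 15e6)
print(round(t2, 3))                               # 1.969 years
print(round(years_to_reach(100e6, 15e6, t2), 2))  # 5.39 years to CPU_100M
print(round(years_to_reach(1e9, 15e6, t2), 2))    # 11.93 years to CPU_1B
```

The same two functions, fed the die-size, design-rule, clock, bus-width, and pin-count endpoints, yield the other rows of the table (with `doubling_time` giving a halving time when the trend shrinks).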
Everybody's favorite trend-buster is the design rules. At this time, 0.135 microns looks iffy and 0.0535 microns looks extremely iffy. But as recently as five years ago, today's 0.29 microns didn't seem possible without going to X-rays for mask projection.
In my previous column, a very steady 17-year memory price trend was graphed. The plot (and the DRAM market) went bonkers when the U.S. government, in its infinite wisdom, stepped into the DRAM market. Maybe the CPU trends will continue for another 12 years. If not, it might not be a technical problem that proves to be the stopper. Our government is here to help us...
Looking back, I realize I had the wrong movie set in mind. Instead of a giant Mantis, a flowing river of--not army ants, but voracious desktop computers--would appear, devouring everything in sight, including entire industries.
Table 1: 25-year microprocessor trends (Intel 4004 to IBM P2SC), projected to 100-million- and 1-billion-transistor CPUs.

Transistor Count: 2260 to 15 million; doubling time = 1.969 years;
                  5.39 years to CPU_100M, 11.93 years to CPU_1B
Die Size:         0.0165 to 0.525 square inches; doubling time = 5.01 years;
                  CPU_100M = 1.1 sq in, CPU_1B = 2.74 sq in
Design Rules:     10 microns to 0.29 microns; halving time = 4.89 years;
                  CPU_100M = 0.135 micron, CPU_1B = 0.0535 micron
Clock Rate:       750 KHz to 135 MHz; doubling time = 3.34 years;
                  CPU_100M = 414 MHz, CPU_1B = 1.61 GHz
Bus Width:        4 bits to 256 bits; doubling time = 4.17 years;
                  CPU_100M = 627 (512) bits, CPU_1B = 1862 (2048?) bits
Pin Count:        16 pins to 1088 "pins" (BGA); doubling time = 4.11 years;
                  CPU_100M = 2702 "pins", CPU_1B = 8150 "pins"