Letters

Dr. Dobb's Journal December 2000

Zuse Accolades

Dear DDJ,

I'm writing regarding the remarkable article "Simulating Konrad Zuse's Computers" (DDJ, September 2000). First of all, I'd like to thank Dr. Rojas for his article and for his work to make Konrad Zuse's extraordinary achievements known.

I have a comment about the following statement at the end of the article: "From a practical perspective, and in the way the Z3 was really programmed, it was not equivalent to the modern computers." It appears Z3 was farther ahead that we might've realized.

The above statement seems to refer to the way Z3 deals with computations that include branches. As the article well explains, Z3 did not have any branching instructions. An algorithm with conditional jumps has to be transformed for Z3. We have to identify "traces" -- sections of code without branches. We assemble the traces into Z3 code and write them one after another on Z3's tape glued in a loop. Each trace starts with computing a binary number t based on the current state of the algorithm stored in Z3 memory. Operations a=b op c in the algorithm are written in Z3 code as a=a*t+(t-1)*(b op c). All traces were repeatedly executed, but only the one with the predicate t equal to 1 could alter the memory.

Modern processors indeed have a branch instruction. This instruction however causes a lot of pain to designers of superscalar CPUs as taking a branch disrupts the pipeline of already fetched and analyzed instructions. Therefore, superscalar CPUs and the corresponding compiler systems try to eliminate branches as much as possible: either by prediction and speculative execution, or -- in recent years -- by predication.

The following is a quotation from an article "Technology Outlook: Introduction to Predicated Execution" by Wen-mei Hwu (IEEE Computer, January 1998). It is interesting to compare this quote with the aforementioned description of Z3's handling of conditional jumps.

The story of Merced, Intel's first processor based on its next-generation 64-bit architecture, will continue to unfold in 1998. Intel expects this product of its collaboration with Hewlett-Packard to reach volume production in 1999. To date, however, the two companies have released few details about Intel Architecture 64 (IA-64). One significant change they did admit to at the October 1997 Microprocessor Forum was the switch to full predicated execution, a technique that no other commercial general-purpose processor employs.

Predicated execution is a mechanism that supports the conditional execution of individual operations [B. Rau et al., "The Cydra 5 Departmental Supercomputer," IEEE Computer, January, 1989]. Compared to a conventional instruction set, an operation in a predicated-execution architecture has an additional input operand -- a predicate -- that can assume a value of true or false. During runtime, a predicated-execution processor fetches operations regardless of their predicate value. The processor executes operations with true predicates normally; it nullifies operations with false predicates and prevents them from modifying the processor state.

Using predication inherently changes the representation of a program's control flow. A conventional instruction set requires all control flow to be explicitly represented in the form of branches, the only mechanism available to conditionally execute operations. An instruction set with predicated execution, however, can support conditional execution via either conventional branches or predicated operations.

Predication is most commonly utilized in a compiler by employing if-conversion. This technique converts conditional branches in an acyclic region of the control flow into predicate-defining operations. With if-conversion, a straight-line sequence of predicated code can replace complex nets of branching code. As Figure 1 shows, the execution of the code after if-conversion does not involve any branch. This means the compiler can eliminate problematic branches and avoid their associated overhead. It also facilitates increased instruction-level parallelism by allowing the compiler to overlap the execution of separate control flow paths. This allows the processor to simultaneously execute multiple paths from a single thread of control. An important benefit of predication not illustrated in Figure 1 is that it allows overlapping and independent control flow constructs without expanding the code.

In the future, integrating control and data speculation with predicated execution will enable advanced compiler techniques to increase the performance of future processors. With the adoption of advanced full predication support in IA-64 and perhaps many other architectures, predicated execution may become one of the most significant advances in the history of computer architecture and compiler design.

Note the last sentence: Predicate execution (that is, computing without branches) is considered one of the most revolutionary advances in computer architecture. This technique was also implemented by Konrad Zuse back in 1938. I stand in awe before such an intellectual achievement and foresight. The fact that the name of Konrad Zuse is not widely recognized in this country is very disturbing. I wonder what can be done to make amends. Maybe the Germany Technology Museum will consider a tour of the U.S. dedicated to Konrad Zuse. Maybe PBS will make a documentary about him, with sponsorships from ACM, IEEE, and the computing industry. After all, Konrad Zuse is morally entitled to a share of every sale of Itanium, UltraSparc-III and Crusoe chips. I want to thank DDJ for publishing such a remarkable article about Konrad Zuse, and giving credit to a man who invented the future.

Oleg Kiselyov

oleg@pobox.com

Exploiting 64-bit Parallelism

Dear DDJ,

Thanks for Ron Gutman's "Algorithm Alley" (DDJ, September 2000) on exploiting 64-bit parallelism. I would add that the best reason for reversing the order of bits in a word is to unscramble the vector elements either before or after a Fast Fourier Transform. I modified Ron's algorithm to swap the six pairs of bits which index the elements of a 4096-pt FFT, and it's just as fast as the library routine from Texas Instruments, but doesn't require the lookup table of their version.

Is there any collection of such nifty algorithms that anyone knows of? (I have Sedgewick's book on algorithms, and Numerical Recipes.)

Counting the number of 1s in a word is also very useful in creating parity-check codes for communications, though we're really just interested in whether the number is even or odd. And if you want to find the correlation function of a bit stream (such as the ranging code on a CW radar or sync pattern in spread-spectrum), XOR with an offset and count the bits.

Charles Dorcey

c.t.dorcey@ieee.org

Ron responds: Charles, thanks very much for your note. I'm very pleased to learn about your applications for the bit-parallel algorithms in my article. I'll pass this on to my boss who knows much more about FFT than I do. I'm not sure what the best collection of nifty algorithms is, but I just learned from another reader that my bit counting algorithm appears in Jon Bentley's Programming Pearls. So I guess its not my algorithm after all :-). It's a small world, algorithmically speaking.

Triangle Correction

Dear DDJ,

We'd like to point out to DDJ readers a typo in our article "Triangle Intersection Tests" (DDJ, August 2000). In Example 1, page 34, line 16 (approximately) it reads: 16: v=(e0x-e2x*u)e1x. A division symbol was left out, it should read: 16: v=(e 0x-e2x*u)/e1x. And another minor typo is: Line 9 has a "u" in it which should be full size, not a subscript of e2y; that is, v =(e0y-e2y*u)/e1y.

Eric Haines and Tomas Moller

tompa@eecs.berkeley.edu

Computer Science Versus Programming

Dear DDJ,

Thanks for the information on the Carnegie Technology Education program in Jonathan Erickson's October 2000 "Editorial." When I was an academic, I constantly struggled to find a balance between a computer foundation that lasts for a career versus a set of skills needed for the next job. It was an especially acute problem for continuing education students, who expected to gain marketable skills almost immediately. If we educate continuing education students on a computer science foundation, everyone complains about the lack of immediately marketable skills. If we educate toward existing skills, the complaints are much louder from those students whose favored skills are missing (with comments such as "What about Ada?" or "Everyone should know VMS"). It was also good to see the distinction made between computer science and computer programming. That distinction may be the best way to have truth in advertising in computer education.

Peter Varhol

peter.varhol@numega.com

Microsoft Made Its Own Bed

Dear DDJ,

In replying to John Vlissides's question of the notorious unreliability of Microsoft software, Steve McConnell said "there aren't any other companies doing what Microsoft is doing." ("McConnell Complete," by John Vlissides, DDJ, October 2000). Microsoft made a conscious decision to sanction the myriad baseboards and peripheral cards that appeared over the years. Microsoft made a conscious decision to intimately tie its operating systems to physical hardware (Wintel). Microsoft made a conscious decision to intimately tie its applications to the operating system, (private APIs). In a family environment, we call this "incest" and we all know the consequences thereof. Having made conscious choices as to where they wanted to be today, McConnell cannot now bestow poor baby status on Microsoft.

Edward Harned

ed@coopsoft.com

Client/Server Tip

Dear DDJ,

Here's a tip [that] may be useful to other DDJ readers. When I develop client/ server apps, I always have to start and stop many different programs to test them together (I use VNC to control remote computers). I just found a nice utility called "process control" that lets me start/stop many programs very easily from a small list. It works very well together with VNC, and even terminates crashed processes. The utility was featured in PC World magazine and can be downloaded from http://www.pcworld .com/ (search for "process control task manager") Hope you find my tips useful. Thanks for a good magazine.

jojje

m_jojje@hotmail.com

DDJ