SPEED TRIALS: FIVE C'S COMPARED

FIVE Cs COMPARED

Richard Relph

Richard Relph is a technical editor for DDJ and the author of several previous articles on C compilers. Richard is a member of the ANSI X3J11 committee, and the chief technical officer of Embedded Performance Inc. He may be reached at EPI, 3707 Willams Rd., Ste. 2, San Jose, CA 95117.


C is a language renowned for its efficiency. This fact has not been lost on compiler vendors, who in their advertising claims are always using words such as faster and fastest. As the author of "Optimizing Compilers for C" in last year's C issue of DDJ, I am well aware of the difference between advertising claims and the quality of generated code. This year, instead of looking at the theoretical aspects, I decided to take a look at he optimizing C compilers that are actually available.

This article reviews five optimizing C compilers for MS-DOS-Microsoft C, Version 1.5; Datalight's Optimum-C, Version 3.12; and Computer Innovation C86+, version 1.10. They are all from vendors who make vast claims as to the efficiency of their compilers. I decided not to include MetaWare's High C compiler, not because of a lack of technical merit but because mu company now has a close business relationship with MetaWare and I didn't want any charges of bias.

With a previous review for DDJ, there'd been a few accusations form readers that I was supplying an overwhelming number of numbers. This review will not suffer that fate. Instead, I'll concentrate on usability issues, language conformance, and the quality of the generated code.

The Test Software

This year, my company splurged and purchased the C Torture Test from Austin Code Works for $20. (See Sidebar.) Unfortunately, it seemed evident from the test results that most of the C compiler vendors never bothered to invest the $20 to obtain this suite.

I adapted the test suite for ANSI compatibility. The final performance tests consisted of compiling, linking, and running both the modified C Torture Test and the Dhrystone benchmark. (As most DDJ readers are aware, the Dhrystone is a performance benchmark that times execution of an "average" set of instructions and is as much a benchmark of characters and integer performance as Whetstone is of floating-point performance. Around the time of this review, I was also attempting to port a very large source-level debugger (Third Eye's CDB) from Unix to the PC. As a result of these different exercises, I soon had an excellent feel for all the compilers.

I ran all the tests on a Compaq 386 (16-MHz) Model 130 with 3 Mbytes of memory. I used 1 Mbyte of memory as a RAM disk (using RAMDRIVE which comes with Microsoft Windows/386) and the other spare Mbyte as a disk cache (using SMARTDRV which also comes with Microsoft Windows/386). I believe that machines with this speed and memory are no longer exceptional and will soon be the norm for most developers. With one exception (mentioned later), the test results should be representative of the rankings for these compilers, regardless of the particular machine you use. If you're impatient for the results, you can grab a peek at Table 1, below

Table 1: Performance test results

                                           
Compile
Product                  Version       Link/Run       Dhrystone
                                       (sec)          (sec)

Microsoft C              5.1           129            10.4*
Walcom C                 6.0           187            10.6
Borland Turbo C          1.5           43             13.3
Datalight Optimum C      3.1           165            10.9
Computer Innovations     1.1           327            14.0
C86+

Although I used a 386 machine for the tests, please note that I made no effort to have the compilers produce code specifically for the 386. The code was generated to run on any member of the Intel 8086/80286/80386 line. As the Dhrystone was used as the performance test, all the compilers were directed to generate code with the fastest execution time (instead of optimizing for space).

Common Problems

In the C Torture Test, there were a couple of constructs that gave more than one compiler problems. The first is the following sequence of code:

foo(a)
struct x {int y, z;};
struct x a;      {...}

This source code should declare a structure tag x and define a function foo taking an argument a with type struct x. Three of the five compilers reviewed here, however, would not compile this construct.

The second common problem has to do with large constants. The suite attempts to confirm that decimal, octal, and hex representations of the same value all yield, in fact, the same value. The suite tests this at many points in the number range-nearly all of which work fine, of course. The problems arose when the suite attempted to test compile-time constant overflow. in particular, the test suite attempts to create three numbers greater than or equal to 232 in each of the three bases. Several of the compilers refused to compile such constants at all, yielding fatal errors. I believe the appropriate response is to warn about such things but also to continue compilation.

The last area of common problems has to do with the new ANSI escape sequence \x, which is a lead-in to a hex constant. The problem is what should the compiler do if all that it sees is \x? Should it yield a 0, a lowercase x, or a compile-time error? The ANSI draft is mute. My preference, as both a programmer and as a member of the ANSI committee, is to have \x yield a lower-case x, as this is unambiguous and will break the least amount of existing code.

The highlights (if you can call them that) of the test suite output are shown as Examples 1-4, starting on page 24. Perhaps the most interesting thing to note from the test results is how the compilers differ in allocating registers. Watcom C's output, for example, shows that this compiler reserves four(!) registers for pointer variables.

Microsoft C

The current standard bearer isn't just resting on its laurels. Microsoft C 5.1 improves on the old Version 4.0 C by including the Quick C compiler, an improved Codeview, the ability to run under and/or generate code for OS/2, as well as better code generation. The package now contains 14 diskettes, 2 of which are high density (the OS/2 library disk and the OS/2 Codeview disk). Regrettably, the documentation in the package i got was the 5.0 documentation with 5.1 update pages. The new update pages do not fit in the 5.0 binders, leading to some real headaches.

Microsoft C was able to deal with structure tag declarations in the area between a function declarator and its opening bracket. On the \x issue, Microsoft's compiler chose to generate a fatal compile-time error. When it came to constants that were larger than unsigned long, the compiler gave error messages. This is acceptable. What is not is that the compiler also complained about the decimal form of the largest unsigned long, claiming the constant was too big. The ANSI draft clearly states that this constant should compile, with a resulting type of unsigned long.

I originally compiled the suite and benchmarks with /Ox (which is equivalent to /Oatil /Gs), which yielded a number of validation suite errors when aliasing was being tested explicitly. So I decided to run the compiler for this review with /Oilt /Gs, which eliminates /Oa from the /Ox option. I was surprised to find that, besides all the aliasing tests now working, some of the floating-point tests that had previously failed (with/Ox) now succeeded.

When compiling with /Ox, Microsoft seems to cheat on floating point a bit. Under these conditions, it seems that 1.084202e-019 is the smallest number that can be added to either a single- or double-precision 1.0 and result in a number that does not compare equal to 1.0. Because float variables have only 24 logical mantissa bits, it's just not possible for there to be 19 digits of significance. This result could be obtained only if the compiler skipped the assignment to the float or double variable in the loop instead of leaving the result in the 80-bit floating-point registers. This same sort of "cheating" also explains the s626,er1 error I had got with /Ox that went away with /Oilt /Gs.

Microsoft C also doesn't seem to support signed bit fields yet, as required by the ANSI standard.

As for performance, it took 2 minutes and 9 seconds to compile all the test suite, run it, and then compile the Dhrystone and run it too. The execution time for the Dhrystone was the second best of all the compilers at 10.4 seconds. When compiled with /Ox, the Dhrystone time was 10.0 seconds, but I'm not certain the program ran correctly, given the problems with /Oa noted earlier.

Watcom C

Watcom is a newcomer to the DOS C world, but its first effort in the arena is very impressive. Watcom C gives no quarter to Microsoft C. The code generation is exceptional, and the package is extensive. Included with the optimizing compiler is a Quick C-like compiler (called Express C, also available separately), a source-level debugger that is better in some ways than CodeView, an editor, and the usual complement of tools. Watcom C comes on eight diskettes with five manuals--four wire-bound and one perfect-bound (like DDJ).

Watcom C failed to accept a structure declaration between a function declarator and its open {. It also complained about \x. The character constant \X is also subjected to the same treatment as \x, which is inexplicable as C has never had case-insensitive escape sequences. This is an outright bug. Watcom C did successfully deal with constants that overflowed 32 bits but did so without warning (at any warning level).

Watcom enforces a concept endorsed by the ANSI committee that may cause some of your existing code to break. The concept is called Miranda prototyping, after the Miranda rule of Supreme Court note. The idea is that if you do not supply a prototype for a function, one will be supplied for you. In particular, if a function foo is called in the absence of a prototype, it is assumed that all calls to foo will pass arguments of the same type, number, and order as the first. If you do not obey this rule, you fall into the ANSI pit of undefined behavior. With Watcom C, such breaking of the rules is a fatal run-time error. The reason why it's a fatal error is very interesting.

Watcom C uses, by default, a calling convention that is different from that used by either Microsoft C or Lattice C. Watcom C passes parameters in registers, which is a good idea but which requires the consistency of function usage mandated by ANSI for all "strictly conforming" programs. It is important to note that Watcom C can switch calling conventions on a function-by-function basis, giving you the maximum flexibility imaginable in working with other compilers (regardless of language).

As a result of Watcom's use of registers, its compiler discovered inconsistencies in arguments to a function in the test suite. I prefer this behavior because it will detect code that certainly isn't ANSI conforming and it probably is wrong. In the specific example, the arguments to the function were pointer to int, pointer to array of int, and pointer to array of array of int-all amounting to pointer to int.

After the compilation errors were cleaned up, the execution showed a small flaw in floating point (s626,er1) that is more probably an invalid assumption on the part of the test than a compiler bug. Watcom does fold duplicate string constants into a single string.

Watcom C took 3 minutes and 20 seconds to compile, link, and run the validation suite and Dhrystone. Dhrystone run time was by far the best of the bunch at 8.8 seconds, or 5681 Dhrystones per second. The difference in compile time is probably mostly because of the lack of use of the available RAM disk for temporary files.

Borland's Turbo C

Borland's Turbo C excels in two areas: speed of compilation and Unix compatibility. Code generation is not so great, and the lack of a source-level debugger is fatal.

Turbo C failed to accept a structure declaration between a function declarator and its opening curly bracket but properly handled excessively large constants and \x. After the problem with structure declarations was fixed, no problems were reported at all--a very impressive feat.

Not so impressive, however, is the lack of a clock function in the library. Borland's was the only compiler that did not have this key (for benchmarks) function.

Borland's compiler really shines in compile and link times. Turbo C took only 43 seconds to compile and link-something that took more than 2 minutes for the nearest competitor. The Dhrystone run time was a respectable, if not overwhelming, 13.3 seconds.

Although it has long been rumored that the next version of Turbo C will include debugging support, Version 2.0 wasn't available from Borland (not even in beta) at the time I conducted my tests. Borland did give me a demonstration of some planned improvements to Turbo C at its offices but declined to let me examine the product in its unfinished state. All I can say at this point is that it appears that the next version of Turbo C will include some tweaks to the run-time library and, more important, the long-awaited source-level debugger. I did not see any evidence, however, of major improvements in code generation--and it will take some real work for Borland to seriously challenge Watcom and Microsoft in this area.

Datalight's Optimum-C

Datalight's Optimum-C features excellent code generation and is pretty quick in compilation. But the package suffers from the lack of a source-level debugger.

Optimum-C was able to deal with structure tag declarations in the area between a function declarator and its opening curly bracket, and with the \x issue but not with the very-large-constant test. It produced the wrong answer for the size of a string constant of odd length by rounding it up. This resulted in a character-by-character memory compare looking at one too many bytes, the last of which, of course, failed.

Datalight's compiler took 2 minutes and 45 seconds to compile, link, and run the validation suite and the Dhrystone. Dhrystone run time alone accounted for 10.9 seconds--the third best of the bunch.

My recommendation to Datalight would be somehow to include a debugger to round out an otherwise complete package.

Computer Innovations' C86+

Computer Innovations was the first company to advertise an optimizing compiler as such. C86+ is the product. Unfortunately, the compiler is slow and the code it produces is not so great. Aggravating the problems of performance was the lack of a source-level debugger.

C86 + failed to accept a structure declaration between a function declarator and its opening curly bracket. After that was fixed, no problems were reported at compile time, but the validation suite ran two tests, then terminated with READ, whatever that means. Even odder was that when I ran the executable version of this test, I got the following output:

Section s22                 returned 0.
Section s241                returned 0.
s243,er2 Section s243       returned 2. READ

In contrast, when I redirected the output to a file, all I got was the last line. Neither result amounts to an acceptable performance on the test suite. C86 + took 5 minutes and 27 seconds to compile, link, and run the validation suite and the Dhrystone. Dhrystone run time accounts for 14.0 seconds. This makes C86 + the slowest compiler producing the slowest code in this group.

Summary

Because the emphasis here was on the quality of the code produced by the compilers, I didn't address other aspects of the packages. All the compilers reviewed here came with adequate documentation as well as an editor and a make facility. None of the editors even vaguely tempted me to switch away from Epsilon. Regarding the make utilities, Microsoft's was the weakest of the bunch but was fairly easy to understand; the other vendors attempted more or less to mimic the standard (Unix) make. Again, I found nothing to make me switch from the tool I already use (in this case, NDMake from D.G. Kneller).

As you read earlier, none of the compilers emerged from the testing unscathed. As a result, my recommendations really depend on the type of work you want to do.

Example 1: Test suite results for Microsoft C, Version 5.1

8 bits in chars.
16 bits in ints.
16 bits in shorts.
32 bits in longs.
16 bits in unsigneds.
32 bits in floats.
64 bits in doubles

1. 192093e-007 is the least number that can be added to 1. (float).

2. 220446e-016 is the least number that can be added to 1. (double).

Register count for char is unreliable. Register count for pointer is unreliable. Register count for int is unreliable.

char alignment: 1
short alignment: 2
int alignment: 2
long alignment: 2
unsigned alignment: 2
float alignment: 2
double alignment: 2

No errors detected.

Example 2: Test suite results for Watcom C, Version 6.0

8 bits in chars. 16 bits in ints. 16 bits in shorts. 32 bits in longs. 16 bits in unsigneds. 32 bits in floats.
64 bits in doubles

1. 192093e-007 is the least number that can be added to 1. (float).

2. 220446e-016 is the least number that can be added to 1. (double).

sign extension in char assignments

0 registers assigned to char variables.
2 registers assigned to pointer variables.
2 registers assigned to int variables.

char alignment: 1
short alignment: 2
int alignment: 2
long alignment: 2
unsigned alignment: 2
float alignment: 2
double alignment: 2

Sign extension in fields
Be especially careful with 1-bit fields

No errors detected.

Example 3: Test suite results for Borland's Turbo C, Version 1.5

 8 bits in chars.
16 bits in ints.
16 bits in shorts.
32 bits in longs.
16 bits in unsigneds.
32 bits in floats.
64 bits in doubles

1. 192093e-007 is the least number that can be added to 1. (float).

2. 220446e-016 is the least number that can be added to 1. (double).

Register count for char is unreliable. 4 registers assigned to pointer variables. Register count for int is unreliable.

char alignment: 1
short alignment: 1
int alignment: 1
long alignment: 1
unsigned alignment: 1
float alignment: 1
double alignment: 1

Signed extension in fields
Be especially careful with 1-bit fields:

Failed.

Example 4: Test suite results for DataLight's Optimum-C, Version 3.12

8 bits in chars.
16 bits in ints.
16 bits in shorts.
32 bits in longs.
16 bits in unsigneds.
32 bits in floats.
64 bits in doubles

1. 192093e-007 is the least number that can be added to 1. (float).

2. 220446e-016 is the least number that can be added to 1. (double).

1 register assigned to char variables. Register count for pointer is unreliable. Register count for int is unreliable.

char alignment: 1
short alignment: 2
int alignment: 2
long alignment: 2
unsigned alignment: 2
float alignment: 2
double alignment: 2

Failed.

If I were programming for OS/2 and DOS, then Microsoft is the only choice, primarily because of its superior debugger, CodeView. But if I were to program for DOS only, I would have great difficulty deciding between Microsoft and Watcom. At EPI we have been using Microsoft but we are going to start using Watcom (at least in parallel). It's a close call, though, as Microsoft still doesn't have a decent make and Quick C is crippled by the lack of support for multiple memory models, as is Express C.

Watcom C bears special watching as an excellent all-round compiler, good for DOS work, OK for porting work, and better than most for embedded systems work. I really want this compiler to succeed because it is so close to being the absolute best in every way. Watcom C is as good as Microsoft C in every significant way and is better in a few, most notably in execution time.

If I were porting existing programs from Unix to DOS, then Borland Turbo C would be my choice, but just barely. If the program is known to be portable, then Turbo C offers the best chance of running it unchanged. But the lack of a debugger makes using Turbo C problematical if the program doesn't "just run" after compiling. With the debugger I saw at Borland, this reservation would go away completely.

As nice as the Datalight Optimum-C compiler is, however, you probably won't be able to find it on the shelves much longer because the marketing rights to Optimum-C have recently been licensed to a British company called Zortech. According to a Zortech spokesperson, that company will not market Optimum-C per se, but instead incorporate the compiler's optimization technology into its own compiler called Zortech C. Datalight customers can expect to get product support from Zortech.

Datalight's Optimum-C is an excellent compiler, and I do like it very much. You get source code for the run-time library and an outstanding compiler for very little money. But the package lacks a debugger and professional documentation. Optimum-C would be my first choice for embedded systems work, but still it is not ideal (missing are the necessary utilities for burning PROMs and "remote" debugging).

Computer Innovations' C86 +, regrettably, has nothing except the reputation of the company to recommend it. You do get source code for the library but this hardly makes up for the lack of a debugger. And there's no escaping the fact that C86 + was the slowest compiler of the' group, producing the slowest code.

Notes

1. Richard Relph, "Optimizing Compilers for C," DDJ (August 1987).

2. Richard Relph, et al., "Benchmarking C Compilers," DDJ (August 1986).

3. Brian W. Kernighan and Dennis M. Ritchie, The C Programming Language (Englewood Cliffs; NJ.: Prentice-Hall, 1978).

Optimizing Techniques

The term "optimization" refers to the techniques a compiler uses to produce code that is appreciably smaller and/or faster than code generated by comparable non-optimized compilers. Over the past year, C compiler manufacturers have been jumping on the optimization bandwagon as a means of both improving their product and of differentiating it from other compilers on dealers' shelves. Unfortunately, they don't go into much detail explaining exactly what they mean by optimization other than saying their product is "faster and better" than other available compilers.

There are several optimization techniques available to compiler developers of which one or more method may be implemented within an individual compiler. These techniques include register allocation, constant propagation, dead assignment elimination, dead code removal copy propagation, and common subexpression elimination. For a detailed description of each of these methods, see "Optimizing Compilers for C" by Richard Relph (DDJ, August, 1987). You may also want to refer to Compiler Principles, Techniques, and Tools (also known as "The Dragon Book") by A. Aho, J. Ullman, and R. Sethi (Addison-Wesley).

Of all these methods, register allocation is perhaps the most advanced and important optimization technique and a key aspect of approach used by developers such as Watcom, Datalight/Zortech, and others. With register allocation, the number of memory references in a program are kept to a minimum by keeping variables and temporary results in registers. This allows shorter instructions to be generated which results in smaller, faster code.

The compilation process always begins by translating source into some internal, intermediate level representation. Once an intermediate representation of a function has been built, algorithms can be used to perform live variable analysis. A variable is live at a point in the flow graph if its value at that point could be used at some other point in the flow graph. Intuitively, a variable is live if its value may be used "later on" in the function.

Live variable information is used to build a data structure called the conflict graph. Every variable in the function has a corresponding node in the conflict graph. A link between two nodes is present whenever two variables are live at the same point in the flow graph. These links indicate that the two variables may not share the same register. The method used to assign registers to variables is called a coloring algorithm since it is similar to the problem of coloring a map so that no two adjacent countries have the same color. Since a machine has a limited number of registers, the object is to color (assign a register to) as many nodes of the conflict graph as possible without assigning the same color (register) to any two adjacent nodes. When a node is left uncolored, the corresponding variable must be assigned a position in memory or on the stack.

Of course, allocating registers intelligently requires information about the use of the variables. A variable that is heavily used in a loop is a better candidate for a register than a variable that is only used twice at the beginning of a function. Analysis may be performed on the flow graph to determine the loop nesting depth of each basic block. The uses of each variable are found and a weight is assigned to each use. Higher weights are assigned to uses within one or more loops. These weights are added up and stored in the conflict graph. The coloring algorithm takes these weights into account when assigning registers.

Although optimization can occur over several ranges statement, block, function, module, and program the compilation process, the most advanced are global optimization techniques that occur over the range of an entire function rather than just a single statement. Global optimizations are applied to intermediate representations of functions that consist of instructions formed by a set of operands and one operator. These instruction sets are organized into sequences called blocks that have one label at the beginning and one jump or return at the end. Blocks are linked together to form the control flow of the function.

In some cases, it is convenient for compilers to let the register allocator introduce some anomalies that are removed later in a post-optimization phase. One post-optimization technique involves keeping track of the value of a register at any point in the block. The data structure used to keep track of this information is sometimes called a register scoreboard. Each instruction is examined and the scoreboard is updated to reflect the effect that the instruction has on the register. ff a register is stored in a memory location, the scoreboard remembers that the memory location and the register contain the same value. When a move instruction is encountered, the scoreboard is consulted to determine if the move is redundant. When a memory reference is found in an instruction, the scoreboard is checked to find out if a register contains the appropriate value, By carefully designing the register scoreboard, a significant gain can be achieved. -- eds.

Putting C Compilers Through the Wringer

All the major players in the C compiler marketplace claim to support the standard language as defined by Kernighan and Ritchie, but do they really? And if not, what are the discrepancies?

An outfit called The Austin Code Works (11100 Leafwood Ln., Austin, TX 78750; 512-258-0785) sells a test suite to find out. It's called the C Compiler Torture Test. For $20, you get a PC-compatible disk containing 23 C programs and a READ.ME file that tells you how to use them.

The Torture Test measures a compiler's compliance with the K&R "white book." In a total of some 6600 lines of code, the suite exercises the entire standard. Particularly stressed are language features likely to be used by advanced C programmers: complex constructs, cascading defines, pointer manipulation such as multiple indirection, and so forth.

Testing with the suite occurs on two levels: compile-time and run-time. In some cases, the compiler might choke on certain constructs. This is a clear indication of noncompliance. A clean compile means that the compiler accepted the source language, but it doesn't mean that the result is correct.

Consequently, after torturing the compiler, you run each test program to see that it does what is expected. For example, dereferencing a chain of pointers should ultimately lead to a known variable that the program can compare with a hard-coded value. When the result is wrong, the program issues an error message. The messages are cryptic, but the test programs are structured so that you can look up the error message in the source text and determine from comments what the problem is.

Since the Torture Test programs contain only vanilla K&R code (regardless of its level of difficulty), a perfectly complying compiler should compile them without a whimper and the resulting programs should run correctly. A failure at-either level of testing pinpoints lack of compliance with the standard.

The C Torture Test is a valuable resource to both compiler writers and C users. Although delivered on a PC diskette, the code can be ported to any platform and used as is. And at only $20, nobody can complain about the price. -- eds.