Aho, Sethi, and Ullman's classic text, Compilers: Principles, Techniques, and Tools, [3] is a great starting point for studying compiler optimizations.Loop optimizations are of key importance. Aho, Sethi, and Ullman point out that most programs spend the bulk of their time in loops. Since loops may iterate several thousand times, optimizing away a few instructions from the loop body can save thousands or millions of clock cycles. Depending on the processor's speed, this could represent a speedup of several seconds for a single loop.
Three important loop optimizations are code motion, induction variable analysis, and loop unrolling.
Code motion (sometimes called loop invariant removal) looks for statements that always produce the same value no matter how many times they are executed (that is, they are "loop invariant"). For example:
for (i = 0; i < 10000; i++) { /* Other stuff here */ n = 12; }No matter how many times the loop iterates (in this case 10,000), n is always 12 when the loop completes. Code motion would take the "n = 12;" statement and move it outside the loop, and save 9999 assignments (one is still performed prior to loop execution).Induction variable analysis is more complicated, and based on mathematics. The idea is that some arithmetic performed iteratively has a simple "one time" formula which produces the same result. For example:
i=0; while (i < 1000) { i++; }Is equivalent to:
i = 1000;In this example, i is called an induction variable. This is probably the simplest form of induction variable analysis, but many other mathematical formulas can be used to remove calculations from a loop. Note that in the example above the loop was eliminated entirely. This is what should be done when all statements are removed from a loop: remove the empty loop shell as well.Loop unrolling is an optimization which performs best on certain RISC processors. A full discussion of what CPU characteristics make this optimization desirable, as well as a complete description of the optimization itself, can be found in Hennessy and Patterson's Computer Architecture: A Quantitative Approach [4]. The idea is straightforward: reduce the ratio of loop control statements to loop body statements. For example:
i = 0; while (i < 1000) { a += f(x); i++; }Becomes:
i = 0; while (i < 1000) { a += f(x); a += f(x); a += f(x); a += f(x); i += 4; }This reduces the number of tests and loop index increments by a factor of 4. In some systems, the loop can be unrolled completely, meaning that the while statement and loop index are removed and replaced by 1000 a += f(x); statements. This however, is unusual in practice because of the large memory consumed by such a great increase in code size. The compiler implementer picks an "unrolling factor" (4 in the example above) that is considered well suited to the processor the code will run on.