| Source | Optimization | % of Cycles | # of Cycles | Performance Improvement |
|--------|--------------|-------------|-------------|-------------------------|
| fir0a | None | 64.34 | 27230.60 | -- |
| fir8a | Separated Inner Loop into Extension Instruction | 25.15 | 5065.00 | >5x |
| fir8b | Shuffled Extension Instruction Schedule | 17.46 | 3189.00 | 8x |
| fir8c | Manually Unrolled Loops | 15.24 | 2711.40 | >10x |
| fir16 | Used 16 Multipliers | 6.96 | 1128 | >24x |
| fir32 | Used 32 Multipliers | 4.36 | 687 | >39x |

Table 1: Performance improvement of the function at each optimization step, relative to straight C code.
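
As a rough check on Table 1, the sketch below recomputes the improvement factors from the cycle counts. It assumes, which the table does not state explicitly, that "Performance Improvement" is the baseline (fir0a) cycle count divided by each variant's cycle count. The names `CYCLES`, `speedup`, and `speedups` are hypothetical and chosen only for this illustration.

```python
# Cycle counts copied from the "# of Cycles" column of Table 1.
CYCLES = {
    "fir0a": 27230.60,  # straight C code (baseline)
    "fir8a": 5065.00,
    "fir8b": 3189.00,
    "fir8c": 2711.40,
    "fir16": 1128.0,
    "fir32": 687.0,
}


def speedup(baseline_cycles, optimized_cycles):
    """Return how many times fewer cycles the optimized version needs.

    Assumption: improvement = baseline cycles / optimized cycles.
    """
    return baseline_cycles / optimized_cycles


def speedups(cycles, baseline="fir0a"):
    """Compute the improvement factor of every entry over the baseline."""
    base = cycles[baseline]
    return {name: speedup(base, count) for name, count in cycles.items()}


if __name__ == "__main__":
    for name, ratio in speedups(CYCLES).items():
        print(f"{name}: {ratio:.2f}x")
```

Under that assumption, each computed ratio falls within the range the table reports for it, so the listed factors appear to be the cycle-count ratios rounded down to whole numbers.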
