From last time, we had:
```c
for (i = 0; i < N; i++) {
    sq[i] = i*i;
}
```

Translated into MIPS assembly, it is:
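```
#       instruction             cycles
        li    $t0, 0            # 1
        b     test              # 1
bod:    mult  $t0, $t0          # 7 + ? (depends on multiplier)
        mflo  $t1               # 1
        sll   $t2, $t0, 2       # 1
        add   $t2, $t2, $a1     # 1
        sw    $t1, 0($t2)       # 1 (if write buffer) + ??
        addi  $t0, $t0, 1       # 1
test:   blt   $t0, $a0, bod     # 1
```

(ignoring pipeline issues for now; $a0 holds N and $a1 holds the base address of sq)

The runtime of this code is 3 + (13 + ?) N. The constant 3 counts li, b, and the final blt that falls out of the loop.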
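Strength reduction of squaring: since (i+1)*(i+1) = i*i + 2*i + 1, each square can be computed from the previous one with a shift and two adds instead of a multiply: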
```c
for (isq = i = 0; i < N; i++) {
    sq[i] = isq;
    isq = isq + 2 * i + 1;
}
```

Translated into MIPS assembly:
```
#       instruction             cycles
        li    $t0, 0            # 1
        li    $t1, 0            # 1
        b     test              # 1
bod:    sll   $t2, $t0, 2       # 1
        add   $t2, $t2, $a1     # 1
        sw    $t1, 0($t2)       # 1 (if write buffer) + ??
        sll   $t3, $t0, 1       # 1
        add   $t1, $t1, $t3     # 1
        addi  $t1, $t1, 1       # 1
        addi  $t0, $t0, 1       # 1
test:   blt   $t0, $a0, bod     # 1
```

(ignoring pipeline issues for now)

The run time is 4 + (8 + ?) N, or about 2/3 the runtime of the original.
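One way to convince yourself that the strength-reduced loop produces the same values is to run it next to the multiply version and compare. The sketch below is a hypothetical test harness: the function names `fill_squares_mul`, `fill_squares_sr`, and `squares_agree`, and the fixed 1000-element buffers, are made up for illustration. It relies only on the identity (i+1)*(i+1) = i*i + 2*i + 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Reference version: one multiply per element. */
void fill_squares_mul(int *sq, int n)
{
    for (int i = 0; i < n; i++) {
        sq[i] = i * i;
    }
}

/* Strength-reduced version: isq equals i*i at the top of each
   iteration and is advanced using (i+1)*(i+1) = i*i + 2*i + 1,
   so no multiply is needed (2 * i is just a shift). */
void fill_squares_sr(int *sq, int n)
{
    int i, isq;
    for (isq = i = 0; i < n; i++) {
        sq[i] = isq;
        isq = isq + 2 * i + 1;
    }
}

/* Returns true if both versions agree on the first n squares.
   Hypothetical helper; limited to n <= 1000 by its fixed buffers. */
bool squares_agree(int n)
{
    int a[1000], b[1000];
    assert(n >= 0 && n <= 1000);
    fill_squares_mul(a, n);
    fill_squares_sr(b, n);
    for (int i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

Calling `squares_agree(1000)` from a small `main` should return true; the largest value stored is 999*999 = 998001, well within the range of an `int`.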
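We can also strength-reduce the array index calculation: instead of recomputing the address of sq[i] (a shift and an add) on every iteration, keep a pointer that advances by 4 bytes, one int, per iteration. This saves another cycle per iteration: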
```c
int *sqp = sq;
for (isq = i = 0; i < N; i++) {
    *sqp++ = isq;
    isq = isq + 2 * i + 1;
}
```

Translated into MIPS assembly:
```
#       instruction             cycles
        li    $t0, 0            # 1
        li    $t1, 0            # 1
        move  $t4, $a1          # 1
        b     test              # 1
bod:    sw    $t1, 0($t4)       # 1 (if write buffer) + ??
        addi  $t4, $t4, 4       # 1
        sll   $t3, $t0, 1       # 1
        add   $t1, $t1, $t3     # 1
        addi  $t1, $t1, 1       # 1
        addi  $t0, $t0, 1       # 1
test:   blt   $t0, $a0, bod     # 1
```

(ignoring pipeline issues for now)

And this run time is 5 + (7 + ??) N: for large values of N, that is about 12.5% fewer cycles per iteration than the previous version (7 instead of 8).
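The "for large values of N" qualifier comes from the setup cost: the pointer version spends one extra cycle before the loop to save one cycle per iteration. The sketch below simply tabulates the two formulas. It assumes the unknown store/stall terms (the ? and ??) are zero, which is an assumption made only for illustration, and `loop_cycles`, `cycles_indexed`, and `cycles_pointer` are hypothetical helper names.

```c
#include <assert.h>

/* Cycle estimate for a loop that costs `setup` cycles once plus
   `per_iter` cycles per iteration. Any stall cycles are assumed to be
   folded into per_iter (here they are taken to be zero). */
long loop_cycles(long setup, long per_iter, long n)
{
    assert(n >= 0);
    return setup + per_iter * n;
}

/* Indexed-store version: 4 + 8N, stalls assumed zero. */
long cycles_indexed(long n) { return loop_cycles(4, 8, n); }

/* Pointer-increment version: 5 + 7N, stalls assumed zero. */
long cycles_pointer(long n) { return loop_cycles(5, 7, n); }
```

For N = 0 the pointer version is one cycle slower (5 vs 4), at N = 1 the two tie at 12 cycles, and for N = 1000 the counts are 8004 vs 7005, a ratio of about 0.875, matching the 12.5% figure.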
I talked about data hazards and control hazards, and how they stall the pipeline or introduce "bubbles" into it. I also talked about how bypass (forwarding) circuitry eliminates some of the bubbles caused by data hazards, and how the branch delay slot in the MIPS architecture eliminates bubbles caused by control hazards. The trend in processor architecture design is away from branch delay slots, since they have inherent problems (what are they?). Instead, branch prediction is used to guess which direction a conditional branch will take, and the processor speculatively executes along the predicted path. If the guess is right, nothing is lost; if the guess is wrong, the partially executed instructions from the predicted path are flushed from the pipeline. (Their results are held as "pending" and committed to the register file only once the branch outcome is known to match the prediction.)
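The notes do not say how the guess is made. One common textbook scheme, swapped in here purely as an illustration, is a table of 2-bit saturating counters indexed by low bits of the branch's address, so that a branch has to go against the prediction twice in a row before the prediction flips. The table size and the names `PRED_ENTRIES`, `pred_index`, `predict`, and `update` are assumptions for this sketch, not part of any particular processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRED_ENTRIES 1024   /* hypothetical table size (power of two) */

/* Counter states: 0 and 1 predict not-taken, 2 and 3 predict taken.
   Static storage starts at 0 (strongly not-taken). */
static uint8_t counters[PRED_ENTRIES];

/* MIPS instructions are word-aligned, so drop the low 2 bits of the PC
   before indexing the table. */
static unsigned pred_index(uint32_t pc)
{
    return (pc >> 2) & (PRED_ENTRIES - 1);
}

/* Guess the branch direction before the outcome is known. */
bool predict(uint32_t pc)
{
    return counters[pred_index(pc)] >= 2;
}

/* Called once the branch outcome is known: move the counter one step
   toward that outcome, saturating at 0 and 3. */
void update(uint32_t pc, bool taken)
{
    uint8_t *c = &counters[pred_index(pc)];
    if (taken && *c < 3) {
        (*c)++;
    } else if (!taken && *c > 0) {
        (*c)--;
    }
}
```

For the loop-closing `blt` in the loops above, which is taken N times and falls through once, such a predictor settles on "taken" after two iterations and then mispredicts only the loop exit; because that single miss only drops the counter from 3 to 2, the next execution of the loop is still predicted correctly.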
bsy+www@cs.ucsd.edu, last updated