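    for (i = 0; i < N; i++) {
        dst[i] = src[i];
    }

is translated into MIPS as

            li      $t0, 0
            b       test
    bod:    sll     $t1,$t0,2
            add     $t2,$t1,$a0
            add     $t3,$t1,$a1
            lw      $t4,0($t3)
            sw      $t4,0($t2)
            add     $t0,$t0,1
    test:   blt     $t0,$a2,bod

with the obvious register assignments (dst in $a0, src in $a1, N in $a2). Counting one cycle per instruction, the runtime of this code is 3 + 7 N cycles: two setup instructions and one extra execution of the test, plus 7 instructions per iteration.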
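To make the register assignments concrete, here is a rough C sketch that mirrors the MIPS listing one statement per instruction. It assumes 32-bit words and that N lives in $a2, as in the later listings; the function name copy_words_goto is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: copies n 32-bit words from src to dst,
 * written so that each C statement corresponds to one MIPS instruction. */
void copy_words_goto(int32_t *dst, const int32_t *src, int n)
{
    int i;              /* $t0: loop index                 */
    int off;            /* $t1: byte offset, i * 4         */
    char *d;            /* $t2: address of dst[i]          */
    const char *s;      /* $t3: address of src[i]          */
    int32_t w;          /* $t4: the word being copied      */

    i = 0;                              /* li   $t0, 0       */
    goto test;                          /* b    test         */
bod:
    off = i << 2;                       /* sll  $t1,$t0,2    */
    d = (char *)dst + off;              /* add  $t2,$t1,$a0  */
    s = (const char *)src + off;        /* add  $t3,$t1,$a1  */
    w = *(const int32_t *)s;            /* lw   $t4,0($t3)   */
    *(int32_t *)d = w;                  /* sw   $t4,0($t2)   */
    i = i + 1;                          /* add  $t0,$t0,1    */
test:
    if (i < n) goto bod;                /* blt  $t0,$a2,bod  */
}
```

Three of the seven instructions in the loop body (the sll and the two adds) only recompute addresses; that address arithmetic is overhead on top of the load, the store, the increment and the test.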
To unroll this loop, first we assume divisibility of N by 4:
    for (i = 0, sp = src, dp = dst; i < N; i += 4) {
        dp[0] = sp[0];
        dp[1] = sp[1];
        dp[2] = sp[2];
        dp[3] = sp[3];
        dp += 4;
        sp += 4;
    }

which would be translated into MIPS code as

            li      $t0, 0
            move    $t8,$a0
            move    $t9,$a1
            b       test
    bod:    lw      $t1,0($t9)
            lw      $t2,4($t9)
            lw      $t3,8($t9)
            lw      $t4,12($t9)
            sw      $t1,0($t8)
            sw      $t2,4($t8)
            sw      $t3,8($t8)
            sw      $t4,12($t8)
            add     $t0,$t0,4
            add     $t9,$t9,16
            add     $t8,$t8,16
    test:   blt     $t0,$a2,bod

which has a run time of 5 + 12 (N/4) = 5 + 3 N cycles. This could actually be improved a little still, without unrolling any more:
            move    $t8,$a0
            move    $t9,$a1
            sll     $t1,$a2,2
            add     $t0,$t9,$t1
            b       test
    bod:    lw      $t1,0($t9)
            lw      $t2,4($t9)
            lw      $t3,8($t9)
            lw      $t4,12($t9)
            sw      $t1,0($t8)
            sw      $t2,4($t8)
            sw      $t3,8($t8)
            sw      $t4,12($t8)
            add     $t9,$t9,16
            add     $t8,$t8,16
    test:   blt     $t9,$t0,bod

What did I do there?
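One way to compare the three versions is to put their cycle counts side by side. The sketch below assumes, as the counts above do, that every instruction (pseudo-instructions included) costs one cycle. The first two formulas come from the text; the third, 6 + 11 (N/4), is my own count for the last listing (five setup instructions, one extra test, and 11 instructions per trip through the body). The function names are made up for illustration.

```c
#include <assert.h>

/* Simple loop: li, b, one extra blt, then 7 instructions per word. */
long cycles_rolled(long n)
{
    return 3 + 7 * n;
}

/* Unrolled by 4 with a separate loop counter: 12 instructions per 4 words. */
long cycles_unrolled4(long n)
{
    assert(n % 4 == 0);     /* the unrolled loops assume N is a multiple of 4 */
    return 5 + 12 * (n / 4);
}

/* Unrolled by 4, comparing the source pointer against a precomputed end
 * address instead of keeping a counter: 11 instructions per 4 words. */
long cycles_endptr4(long n)
{
    assert(n % 4 == 0);
    return 6 + 11 * (n / 4);
}
```

For N = 100 this model gives 703, 305 and 281 cycles respectively. The last version pays one extra setup instruction to compute the end address and saves one add on every trip through the loop body.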
To handle the cases when N is not a multiple of 4, we do
            move    $t8,$a0
            move    $t9,$a1
    #
            and     $t1,$a2,3       # N % 4 leftover words
            sll     $t1,$t1,2       # scale to a byte offset into jtbl (was missing)
            lw      $t1,jtbl($t1)
            jr      $t1
    L3:     lw      $t1,0($t9)
            sw      $t1,0($t8)
            add     $t9,$t9,4
            add     $t8,$t8,4
    L2:     lw      $t1,0($t9)
            sw      $t1,0($t8)
            add     $t9,$t9,4
            add     $t8,$t8,4
    L1:     lw      $t1,0($t9)
            sw      $t1,0($t8)
            add     $t9,$t9,4
            add     $t8,$t8,4
    L0:     and     $t1,$a2,~3      # 0xfffffffc
            sll     $t1,$t1,2
            add     $t0,$t9,$t1     # end address for the unrolled loop
            b       test            # test first: there may be no full group of 4
            .data
    jtbl:   .word   L0              # N % 4 == 0 must still set up $t0
            .word   L1, L2, L3
            .text
    #
    bod:    lw      $t1,0($t9)
            lw      $t2,4($t9)
            lw      $t3,8($t9)
            lw      $t4,12($t9)
            sw      $t1,0($t8)
            sw      $t2,4($t8)
            sw      $t3,8($t8)
            sw      $t4,12($t8)
            add     $t9,$t9,16
            add     $t8,$t8,16
    test:   blt     $t9,$t0,bod

This is roughly the C code:
    sp = src;
    dp = dst;
    switch (N % 4) {
    case 3: *dp++ = *sp++;      /* fall through */
    case 2: *dp++ = *sp++;      /* fall through */
    case 1: *dp++ = *sp++;
    }
    N = N & ~3;
    for (endptr = sp + N; sp < endptr; ) {
        dp[0] = sp[0];
        dp[1] = sp[1];
        dp[2] = sp[2];
        dp[3] = sp[3];
        dp += 4;
        sp += 4;
    }
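For anyone who wants to try this out, here is a self-contained version of the same idea as a compilable function. It is only a sketch: the name copy_words_unrolled, the int32_t element type and the size_t count are my assumptions, not part of the code above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words from src to dst.
 * The switch copies the n % 4 leftover words first (each case falls
 * through to the next), then the loop copies the rest 4 words at a time. */
void copy_words_unrolled(int32_t *dst, const int32_t *src, size_t n)
{
    const int32_t *sp = src;
    int32_t *dp = dst;
    const int32_t *endptr;

    switch (n % 4) {
    case 3: *dp++ = *sp++;      /* fall through */
    case 2: *dp++ = *sp++;      /* fall through */
    case 1: *dp++ = *sp++;
    }

    n &= ~(size_t)3;            /* round down to a multiple of 4 */
    for (endptr = sp + n; sp < endptr; dp += 4, sp += 4) {
        dp[0] = sp[0];
        dp[1] = sp[1];
        dp[2] = sp[2];
        dp[3] = sp[3];
    }
}
```

Because the loop condition is checked before the first pass, counts below 4 are handled entirely by the switch and the unrolled body never runs.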
Multitasking is the ability to run several (usually unrelated) programs at once; the programs typically have separate address spaces. Multithreading is having several virtual CPUs, typically sharing the same address space.
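The address-space distinction can be seen in a small experiment. The sketch below assumes a POSIX system (fork and pthreads); the function names are made up for illustration. A thread increments a global and the creator sees the change, while a child process increments its own copy and the parent sees nothing.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter;     /* one copy per address space */

static void *bump(void *arg)
{
    (void)arg;
    counter++;          /* runs in the creator's address space */
    return NULL;
}

/* Multithreading: the new thread shares our memory, so we see its write. */
int counter_after_thread(void)
{
    pthread_t t;

    counter = 0;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return counter;     /* the thread's increment is visible here */
}

/* Multitasking: the child process gets its own copy of counter. */
int counter_after_process(void)
{
    pid_t pid;

    counter = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {     /* child: modify the private copy and exit */
        counter++;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;     /* the parent's copy is unchanged */
}
```

On such a system counter_after_thread() should return 1 and counter_after_process() should return 0. Some toolchains need the -pthread flag when compiling.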
bsy+www@cs.ucsd.edu, last updated