Little-endian byte order: the reverse. The least significant byte is stored in the lowest address.
The MIPS architecture is bi-endian -- the processor chip can be configured to be big-endian (as in the SGI machines) or little-endian (as in the old DEC machines).
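The two byte orders can be made concrete in C. The following is a minimal sketch, not taken from the lecture: the helper names store_be32, store_le32 and host_is_little_endian are hypothetical, chosen here only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: store x most significant byte first (big-endian). */
void store_be32(uint8_t *p, uint32_t x)
{
    p[0] = (uint8_t)(x >> 24);   /* most significant byte at lowest address */
    p[1] = (uint8_t)(x >> 16);
    p[2] = (uint8_t)(x >> 8);
    p[3] = (uint8_t)x;
}

/* Hypothetical helper: store x least significant byte first (little-endian). */
void store_le32(uint8_t *p, uint32_t x)
{
    p[0] = (uint8_t)x;           /* least significant byte at lowest address */
    p[1] = (uint8_t)(x >> 8);
    p[2] = (uint8_t)(x >> 16);
    p[3] = (uint8_t)(x >> 24);
}

/* Returns 1 if the host keeps the least significant byte of a word
   at the lowest address, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t word = 1;
    return *(const uint8_t *)&word == 1;
}
```

On a bi-endian processor such as the MIPS, the result of host_is_little_endian() would depend on how the chip is configured, while the two store functions produce the same byte layout on any host.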
The reason for doing this is performance/simplicity: the data bus is typically the width of a cache line (or a small integer fraction of a cache line for lower cost implementations), and aligned memory references mean that in all cases a single bus transaction will suffice to obtain the word on a cache miss. If words did not have to be aligned, then two bus transfers would be needed whenever a word spanned two cache lines. Requiring the hardware to detect when this is needed and handle such transfers makes the processor implementation more complex and thus slower.
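The cache-line argument can be checked with a little address arithmetic. This is only a sketch under stated assumptions: the helper names are hypothetical, and the 32-byte line size used in the usage note is an example value, not a figure from the lecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of size. */
int is_aligned(uintptr_t addr, size_t size)
{
    return addr % size == 0;
}

/* Hypothetical helper: nonzero if the bytes [addr, addr + size) fall in
   two different cache lines. Assumes size >= 1 and line_size >= size. */
int spans_two_lines(uintptr_t addr, size_t size, size_t line_size)
{
    uintptr_t first_line = addr / line_size;               /* line of first byte */
    uintptr_t last_line  = (addr + size - 1) / line_size;  /* line of last byte  */
    return first_line != last_line;
}
```

With an assumed 32-byte line, a 4-byte word at the aligned address 0x101c stays within one line, while the same word at the unaligned address 0x101e straddles two lines and would need two transfers.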
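    fact:
            sub   $sp,$sp,12        # allocate a 12-byte frame
            sw    $fp,4($sp)        # save the caller's frame pointer
            add   $fp,$sp,12        # $fp = value of $sp on entry
            sw    $ra,-4($fp)       # save the return address
            bgt   $a0,1,rec_fact
            li    $v0,1             # base case: return 1
            b     rec_done
    rec_fact:
            sw    $a0,0($fp)        # save n across the call
            sub   $a0,$a0,1
            jal   fact              # $v0 = fact(n-1)
            lw    $a0,0($fp)        # restore n
            mult  $v0,$v0,$a0       # $v0 = fact(n-1) * n
    rec_done:
            move  $t0,$fp
            lw    $ra,-4($fp)       # restore $ra before $fp is overwritten
            lw    $fp,-8($fp)       # restore the caller's frame pointer
            move  $sp,$t0           # pop the frame
            jr    $ra

This is the same as: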
    int fact(int n)
    {
        if (n <= 1)
            return 1;
        else
            return n * fact(n-1);
    }
    int fact2(int n)
    {
        int v = 1, i;
        for (i = n; i > 1; i--)
            v = v * i;
        return v;
    }

The loop invariant -- at the test -- is n! = v * i!. We prove this using induction: the base case is when i = n and v = 1, which obviously satisfies the invariant expression. We do induction counting down: assume that the invariant holds at the test, so after some k iterations we have i = i_k and v = v_k satisfying n! = v_k * i_k!. We run the loop body once and see that the variable v is updated to contain v_k * i_k, and the variable i is updated to contain i_k - 1. We check whether the new values in the variables satisfy the invariant:

    v * i! = (v_k * i_k) * i! = v_k * i_k * (i_k - 1)! = v_k * i_k! = n!

When the loop exits, i <= 1, so i! = 1 and the invariant gives v = n! (for n >= 0).
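The invariant can also be checked mechanically by asserting it at every loop test. The sketch below is illustrative only: fact2_checked and ref_fact are hypothetical names, long is used instead of int merely to delay overflow, and the check is only meaningful for small n (up to about 12).

```c
#include <assert.h>

/* Reference factorial, used only to state the invariant.
   Returns 1 for any n <= 1. */
static long ref_fact(int n)
{
    long r = 1;
    for (int k = 2; k <= n; k++)
        r *= k;
    return r;
}

/* fact2 with the invariant n! == v * i! asserted at every loop test. */
long fact2_checked(int n)
{
    long v = 1;
    int i = n;
    for (;;) {
        assert(ref_fact(n) == v * ref_fact(i));  /* invariant, at the test */
        if (!(i > 1))
            break;
        v = v * i;
        i--;
    }
    return v;  /* here i <= 1, so i! == 1 and v == n! */
}
```

An assertion of this kind does not replace the induction proof; it only confirms the invariant for the particular inputs that are run.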
    i = n;
    while (i > 1) {
        v = v * i;
        i--;
    }

The naive way to compile this loop into assembly is to write it as:
            move  $t0,$a0           # $t0 is i, $a0 is n, $t1 is v
    top:
            ble   $t0,1,loop_done
            mult  $t1,$t0           # mult $t1,$t1,$t0 expands into this
            mflo  $t1               # two-instruction sequence
            sub   $t0,$t0,1
            b     top
    loop_done:

but it is more efficiently translated as:
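            move  $t0,$a0
            b     test
    top:
            mult  $t1,$t0
            mflo  $t1
            sub   $t0,$t0,1
    test:
            bgt   $t0,1,top

This is more efficient, since the loop body is shorter: each iteration executes one branch (the bgt at the bottom) instead of two (the ble at the top plus the b back to it).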
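The same transformation can be written in C with explicit gotos, which makes the control flow of the two assembly versions easier to compare. This is a sketch for illustration: the function names are hypothetical, and v is initialized to 1 here so that each function is complete on its own.

```c
/* Naive shape: test at the top, unconditional jump at the bottom.
   Two branches are executed per iteration. */
int fact_top_test(int n)
{
    int v = 1, i = n;
top:
    if (i <= 1) goto loop_done;   /* ble $t0,1,loop_done */
    v = v * i;
    i = i - 1;
    goto top;                     /* b top */
loop_done:
    return v;
}

/* Rotated shape: jump to the test once, then a single conditional
   branch per iteration at the bottom of the loop. */
int fact_bottom_test(int n)
{
    int v = 1, i = n;
    goto test;                    /* b test */
top:
    v = v * i;
    i = i - 1;
test:
    if (i > 1) goto top;          /* bgt $t0,1,top */
    return v;
}
```

Both functions compute the same result; the rotated one pays for the extra jump only once, on entry to the loop.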
    int i;
    for (i = 0; i < N; i++) {
        tbl[i] = i * i;
    }

We take advantage of the algebraic identity (i + 1)^2 = i^2 + 2i + 1:
    int i, isq;
    for (i = isq = 0; i < N; ) {
        tbl[i] = isq;
        isq = isq + 2 * i + 1;
        i++;
    }

We have gotten rid of the general multiplication and replaced it with a multiplication by a power of 2 and two adds. The multiplication by 2 is implemented as a simple left shift by 1 bit, so all three operations are single cycle operations. The run time for mult, in cycles, was given in the following table:
Implementation | mult | multu | div | divu |
R2000          |  12  |  12   | 35  |  35  |
R3000          |  12  |  12   | 35  |  35  |
R4000          |  10  |  10   | 69  |  69  |
R6000          |  17  |  18   | 38  |  37  |
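The strength-reduced loop above can be checked against the straightforward one. The sketch below wraps both loops in functions so they can be compared; the names squares_mult and squares_reduced, and passing the table and its length as parameters, are assumptions made for this illustration.

```c
/* Straightforward version: one general multiplication per element. */
void squares_mult(int *tbl, int n)
{
    int i;
    for (i = 0; i < n; i++)
        tbl[i] = i * i;
}

/* Strength-reduced version, using (i + 1)^2 = i^2 + 2i + 1.
   isq always holds i * i at the top of the loop body. */
void squares_reduced(int *tbl, int n)
{
    int i, isq;
    for (i = isq = 0; i < n; ) {
        tbl[i] = isq;
        isq = isq + (i << 1) + 1;   /* 2 * i written as a left shift by 1 bit */
        i++;
    }
}
```

Filling two tables of the same length with these functions should give identical contents, for example 81 at index 9 in both.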
At the end of the class, I asked what you would do for
    int i;
    for (i = 0; i < N; i++) {
        tbl[i] = i * i * i;
    }

to eliminate the multiplications. You should think this through.
bsy@cse.ucsd.edu, last updated