CSE 30 -- Lecture 12 -- Nov 10

Assignment 4: due to disk quota problems earlier today, the deadline is extended to Nov 11 4:40pm. You will also have 4 late days instead of 2, still at 10% penalty per late day.

Assignment 5: the Model 200 Calculator

You must implement the model 200 version of the original calculator from assignment 4. It must have the following new-and-improved features:

32-deep stack in the new extended mode; the 4-deep compatibility mode is retained. When the calculator is first turned on, it runs in compatibility mode, where the stack is 4 deep and behaves as in the previous model.

new ``m'' command to switch calculator modes. The command checks that the value on top of the stack is either 0 or 1. If it is not, an error message is generated and the stack is not affected. If the value is 0, it is popped off in the current mode (the current mode determines which element is replicated), and the calculator then switches into compatibility mode. If the value is 1, it is likewise popped off in the current mode, and the calculator then switches into extended mode, where the stack is 32 elements deep.

new ``t'' command to pop the top element off the stack (if it is greater than or equal to zero) and push the number of moves required to solve the Towers of Hanoi problem with that many disks. The move-count function is:

int	tower_moves(int	n)
{
	if (n == 0) return 0;
	else return 1 + 2 * tower_moves(n-1);
}

If the top element of the stack was less than zero, an error message should be generated and the stack should be unaffected.
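As a sanity check on the recurrence, the move count has the closed form 2^n - 1. The sketch below repeats the recursive function from above next to a closed-form version; the name tower_moves_closed and the 0..31 range limit (which assumes a 32-bit int) are illustrative assumptions, not part of the assignment.

```c
#include <assert.h>

/* Recursive definition, as given in the assignment text. */
int tower_moves(int n)
{
	if (n == 0) return 0;
	else return 1 + 2 * tower_moves(n - 1);
}

/* Closed form 2^n - 1 (hypothetical helper).
   Assumes a 32-bit int, so the result fits only for 0 <= n <= 31. */
int tower_moves_closed(int n)
{
	assert(n >= 0 && n <= 31);
	return (int)(((1u << n) - 1u) & 0x7FFFFFFFu);
}
```

For example, 3 disks take 7 moves and 10 disks take 1023. These notes do not say what the calculator should do when the result overflows, so compare against the provided binary for large inputs.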

new ``c'' change-sign function to change the sign of the topmost element of the stack.

If you are uncertain about how this works, use the provided ~/../public/calc200 binary to see how it should work.

Loop unrolling

We consider the following table initialization code example.

	int	i;

	for (i = 0; i < N; i++)
		tbl[i] = i;

It assembles into

	li $t0, 0
	b test
loop:	sll $t1,$t0,2
	sw $t0,tbl($t1)
	addi $t0,$t0,1		# i++
test:	blt $t0,$a1, loop	# assume a1 has N
				# slt $at,$t0,$a1
				# bne $at,$zero,loop

which is really

	li $t0, 0
	b test
loop:	sll $t1,$t0,2
	lui $at, UPPER(tbl)
	addu $at, $at, $t1
	sw $t0,LOWER(tbl)($at)
	addi $t0,$t0,1
test:	slt $at,$t0,$a1
	bne $at,$zero,loop

This loop uses about 7N + 2 instructions to initialize a table of N entries: 7 per iteration, plus the initial li and b (the final failing test adds 2 more).
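A note on the UPPER and LOWER halves in the expansion above. The sketch below assumes UPPER(tbl) and LOWER(tbl) stand for the high and low 16 bits of tbl's 32-bit address; all helper names are hypothetical. The 16-bit offset of sw is sign-extended, while ori zero-extends its immediate, so the upper half paired with sw must be rounded up by one whenever bit 15 of the address is set.

```c
#include <assert.h>
#include <stdint.h>

/* Halves for an lui/ori pair: ori zero-extends its 16-bit immediate. */
uint32_t upper_for_ori(uint32_t addr) { return addr >> 16; }
uint32_t lower_for_ori(uint32_t addr) { return addr & 0xFFFFu; }

/* Halves for lui followed by a load/store: the 16-bit offset is
   sign-extended, so round the upper half up when bit 15 is set. */
uint32_t upper_for_mem(uint32_t addr)
{
	return ((addr + 0x8000u) >> 16) & 0xFFFFu;
}

int32_t lower_for_mem(uint32_t addr)
{
	int32_t lo = (int32_t)(addr & 0xFFFFu);
	return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* What lui + ori leaves in the register. */
uint32_t rebuild_ori(uint32_t addr)
{
	return (upper_for_ori(addr) << 16) | lower_for_ori(addr);
}

/* The effective address formed by lui + offset($at). */
uint32_t rebuild_mem(uint32_t addr)
{
	return (upper_for_mem(addr) << 16) + (uint32_t)lower_for_mem(addr);
}

/* Both splits must reproduce the original address. */
void check_split(uint32_t addr)
{
	assert(rebuild_ori(addr) == addr);
	assert(rebuild_mem(addr) == addr);
}
```

For an address such as 0x1001A000, the lui/ori split is 0x1001 and 0xA000, whereas the lui/sw split is 0x1002 and -0x6000.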

First, assume N is a multiple of 4. We write the code as

	int	i, *tblp;

	for (i = 0, tblp = tbl; i < N; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}

which assembles into

	li $t0, 0
	la $t1, tbl		# lui $t1,UPPER(tbl); ori $t1,$t1,LOWER(tbl)
	b test
loop:	sw $t0,0($t1)
	addi $t0,$t0,1		# C compilers would use addiu since C has
	sw $t0,4($t1)		#  no exceptions for overflows
	addi $t0,$t0,1
	sw $t0,8($t1)
	addi $t0,$t0,1
	sw $t0,12($t1)
	addi $t0,$t0,1
	addiu $t1,$t1,16
test:	slt $at,$t0,$a1
	bne $at,$zero,loop

The second loop runs N/4 times, each iteration costing 11 instructions. Thus the run time is 11(N/4) + 4, or 2.75N + 4, instructions. For sufficiently large N, this is more than twice as fast as the original loop.
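To check that the unrolled loop stores the same values as the simple one, both can be wrapped as functions and compared. This is a minimal sketch; the function names and the explicit n parameter are assumptions made for illustration, and n must be a multiple of 4 as stated above.

```c
#include <assert.h>

/* The original one-entry-per-iteration loop. */
void init_simple(int *tbl, int n)
{
	int i;

	for (i = 0; i < n; i++)
		tbl[i] = i;
}

/* The loop unrolled 4 times; requires n to be a multiple of 4. */
void init_unrolled4(int *tbl, int n)
{
	int i, *tblp;

	assert(n >= 0 && n % 4 == 0);
	for (i = 0, tblp = tbl; i < n; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}
}
```

Calling both on two 16-entry arrays should leave identical contents, with entry k holding k.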

To handle an N that is not known in advance to be a multiple of 4, we first store the N rem 4 leftover entries with a switch and then run the unrolled loop on the rest:

	int	i, *tblp, N0;

	i = 0; tblp = tbl;
	N0 = N >> 2;	/* N div 4 */
	switch (N&3) {	/* N rem 4 */
	case 3:	*tblp++ = i++;
	case 2:	*tblp++ = i++;
	case 1:	*tblp++ = i++;
	}
	for (; i < N; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}

which assembles into

	li $t0, 0
	la $t1, tbl	# lui $t1,UPPER(tbl); ori $t1,$t1,LOWER(tbl)
	sra $t2,$a1,2	# assume a1 has N
	andi $t3,$a1,3
	sll $t3,$t3,2	# scale the case index to a word offset
	lw $t3,case_tbl($t3)
	jr $t3
c3:	sw $t0,0($t1)
	addi $t0,$t0,1
	addiu $t1,$t1,4
c2:	sw $t0,0($t1)
	addi $t0,$t0,1
	addiu $t1,$t1,4
c1:	sw $t0,0($t1)
	addi $t0,$t0,1
	addiu $t1,$t1,4
	.data
case_tbl:
	.word test,c1,c2,c3
	.text
	b test
loop:	sw $t0,0($t1)
	addi $t0,$t0,1
	sw $t0,4($t1)
	addi $t0,$t0,1
	sw $t0,8($t1)
	addi $t0,$t0,1
	sw $t0,12($t1)
	addi $t0,$t0,1
	addiu $t1,$t1,16
test:	blt $t0,$a1, loop

There is a little more constant overhead to deal with the portion of the array not covered by the multiple-of-4 loop body, but for large N this is negligible.
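The remainder-handling version can be packaged as a function and tested for every small N. This is a sketch under assumptions: the name init_table and passing the table and n as parameters are not from the notes, and the unused N0 is dropped because the loop tests i < n directly.

```c
#include <assert.h>

/* Store 0..n-1 into tbl: the switch handles the n rem 4 leftover
   entries first, then the 4-way unrolled loop handles the rest. */
void init_table(int *tbl, int n)
{
	int i = 0, *tblp = tbl;

	assert(n >= 0);
	switch (n & 3) {	/* n rem 4 */
	case 3:	*tblp++ = i++;	/* fall through */
	case 2:	*tblp++ = i++;	/* fall through */
	case 1:	*tblp++ = i++;
	}
	/* n - i is now a multiple of 4 */
	for (; i < n; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}
}
```

After the switch, i equals n rem 4, so the remaining n - i entries divide evenly into groups of 4 and the unrolled loop never writes past the end of the table.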

bsy@cse.ucsd.edu, last updated Tue Nov 11 03:14:00 PST 1997.
