CSE 30 -- Lecture 12 -- Nov 10


Assignment 4: due to disk quota problems earlier today, the deadline is extended to Nov 11 4:40pm. You will also have 4 late days instead of 2, still at 10% penalty per late day.

Assignment 5: the Model 200 Calculator

You must implement the Model 200 version of the original calculator from assignment 4. It must have the following new-and-improved features:

  • 32 deep stack in new mode; retain compatibility mode of 4 deep stack. When the calculator is first turned on, it runs in compatibility mode, where the stack is 4 deep and behaves as in the previous model.
  • new ``m'' command to switch calculator modes. This mode command checks the top of the stack to make sure that the value is either a 0 or 1. If not, an error message is generated and the stack is not affected. If the value is 0, it is popped off in the current mode (this determines which element is replicated), and then the calculator is switched into compatibility mode; if the value is 1, it is also popped off in the current mode, and then the calculator is switched into extended mode, where the stack is 32 elements deep.
  • new ``t'' command to pop the top element of the stack off (if greater than or equal to zero) and push the number of moves required to solve the tower-of-hanoi problem with the number of disks equal to the popped off number. The tower-of-hanoi move function is:
    int	tower_moves(int	n)
    {
    	if (n == 0) return 0;
    	else return 1 + 2 * tower_moves(n-1);
    }
    
    If the top element of the stack was less than zero, an error message should be generated and the stack should be unaffected.
  • new ``c'' change-sign function to change the sign of the topmost element of the stack.
  • If you are uncertain about how this works, use the provided ~/../public/calc200 binary to see how it should work.

    Loop unrolling

    We consider the following table initialization code example.
    	int	i;
    
    	for (i = 0; i < N; i++)
    		tbl[i] = i;
    
    It assembles into
    	li $t0, 0
    	b test
    loop:	sll $t1,$t0,2
    	sw $t0,tbl($t1)
    	addi $t0,$t0,1		# i++
    test:	blt $t0,$a1, loop	# assume a1 has N
    				# slt $at,$t0,$a1
    				# bne $at,$zero,loop
    
    which is really
    	li $t0, 0
    	b test
    loop:	sll $t1,$t0,2
    	lui $at, UPPER(tbl)
    	addu $at, $at, $t1
    	sw $t0,LOWER(tbl)($at)
    	addi $t0,$t0,1
    test:	slt $at,$t0,$a1
    	bne $at,$zero,loop
    
    This loop uses 7N + 4 instructions to initialize a table of N entries: 2 for setup, 5 for each of the N passes through the loop body, and 2 for each of the N + 1 executions of the test.

    First, assume N is a multiple of 4. We write the code as

    	int	i, *tblp;
    
    	for (i = 0, tblp = tbl; i < N; tblp += 4) {
    		tblp[0] = i++;
    		tblp[1] = i++;
    		tblp[2] = i++;
    		tblp[3] = i++;
    	}
    
    which assembles into
    	li $t0, 0
    	la $t1, tbl		# lui $t1,UPPER(tbl); ori $t1,$t1,LOWER(tbl)
    	b test
    loop:	sw $t0,0($t1)
    	addi $t0,$t0,1		# C compilers would use addiu since C has
    	sw $t0,4($t1)		#  no exceptions for overflows
    	addi $t0,$t0,1
    	sw $t0,8($t1)
    	addi $t0,$t0,1
    	sw $t0,12($t1)
    	addi $t0,$t0,1
    	addiu $t1,$t1,16
    test:	slt $at,$t0,$a1
    	bne $at,$zero,loop
    

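    The unrolled loop body runs N/4 times, and each pass costs 11 instructions (9 in the body plus the 2-instruction test). Adding 4 setup instructions (la expands to two) and the one final failing test, the run time is 11 N / 4 + 6 or 2.75 N + 6. For sufficiently large N, this is more than twice as fast as the original loop.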

    To handle an N that is not known in advance to be a multiple of 4, we first store the N rem 4 leftover elements one at a time and then run the unrolled loop on the rest:

    	int	i, *tblp, N0;
    
    	i = 0; tblp = tbl;
    	N0 = N >> 2;	/* N div 4 */
    	switch (N&3) {	/* N rem 4 */
    	case 3:	*tblp++ = i++;
    	case 2:	*tblp++ = i++;
    	case 1:	*tblp++ = i++;
    	}
    	for (; i < N; tblp += 4) {
    		tblp[0] = i++;
    		tblp[1] = i++;
    		tblp[2] = i++;
    		tblp[3] = i++;
    	}
    
    which assembles into
    	li $t0, 0
    	la $t1, tbl	# lui $t1,UPPER(tbl); ori $t1,$t1,LOWER(tbl)
    	sra $t2,$a1,2	# assume a1 has N
    	andi $t3,$a1,3	# N rem 4
    	sll $t3,$t3,2	# scale to a byte offset: table entries are 4 bytes
    	lw $t3,case_tbl($t3)
    	jr $t3
    c3:	sw $t0,0($t1)
    	addi $t0,$t0,1
    	addiu $t1,$t1,4
    c2:	sw $t0,0($t1)
    	addi $t0,$t0,1
    	addiu $t1,$t1,4
    c1:	sw $t0,0($t1)
    	addi $t0,$t0,1
    	addiu $t1,$t1,4
    	b test
    	.data
    case_tbl:
    	.word test,c1,c2,c3
    	.text
    loop:	sw $t0,0($t1)
    	addi $t0,$t0,1
    	sw $t0,4($t1)
    	addi $t0,$t0,1
    	sw $t0,8($t1)
    	addi $t0,$t0,1
    	sw $t0,12($t1)
    	addi $t0,$t0,1
    	addiu $t1,$t1,16
    test:	blt $t0,$a1, loop
    
    There is a little more constant overhead to deal with the portion of the array not covered by the multiple-of-4 loop body, but for large N this is negligible.

    bsy@cse.ucsd.edu, last updated Tue Nov 11 03:14:00 PST 1997.
