CSE 30 -- Lecture 15 -- Nov 19


Bit Hacks -- Popcount

Popcount -- count the number of 1 bits in a word. What are the runtimes of these various versions?

Simple algorithm

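popcount:	move	$v0, $zero		# count = 0
		move	$t0, $zero		# i = 0
		b	test
bod:		andi	$t1, $a0, 1		# isolate the low bit
		beqz	$t1, nope		# skip the increment if it is 0
		addiu	$v0, $v0, 1
nope:		srl	$a0, $a0, 1		# move the next bit into position
		addiu	$t0, $t0, 1
test:		blt	$t0, 32, bod
		jr	$ra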

Branch removal

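popcount:	move	$v0, $zero		# count = 0
		move	$t0, $zero		# i = 0
		b	test
bod:		andi	$t1, $a0, 1		# isolate the low bit
		addu	$v0, $v0, $t1		# add the bit itself (0 or 1)
		srl	$a0, $a0, 1		# move the next bit into position
		addiu	$t0, $t0, 1
test:		blt	$t0, 32, bod
		jr	$ra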
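
For comparison, here is a rough C sketch of the two loop versions above. The function names are made up for illustration, and the sketch assumes a 32-bit word as in the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Simple version: test each bit and branch around the increment. */
unsigned popcount_branchy(uint32_t x)
{
    unsigned count = 0;
    for (int i = 0; i < 32; i++) {
        if (x & 1u)
            count++;
        x >>= 1;                /* move the next bit into position */
    }
    return count;
}

/* Branch-removed version: add the low bit (0 or 1) unconditionally. */
unsigned popcount_branchless(uint32_t x)
{
    unsigned count = 0;
    for (int i = 0; i < 32; i++) {
        count += x & 1u;
        x >>= 1;
    }
    return count;
}

/* Hypothetical helper: the two versions should always agree. */
void check_loops(uint32_t x)
{
    assert(popcount_branchy(x) == popcount_branchless(x));
}
```

Both loops run 32 iterations regardless of the input; the second simply trades the conditional branch for an add.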

Unrolled

popcount:	move	$v0, $zero

		and	$t1, $a0, 1	# block repeated 32 times
		srl	$a0, $a0, 1		# this omitted in last rep
		add	$v0, $v0, $t1

		jr	$ra

Bit Parallel Algorithm

		.data
tbl:		.word	0x55555555
		.word	0x33333333
		.word	0x0f0f0f0f
		.word	0x00ff00ff
		.word	0x0000ffff

		.text
popcount:	la	$t0, tbl
		lw	$t1, 0($t0)
		srl	$v0, $a0, 1
		and	$v0, $v0, $t1
		and	$a0, $a0, $t1
		add	$v0, $v0, $a0

		lw	$t1, 4($t0)
		srl	$v1, $v0, 2
		and	$v1, $v1, $t1
		and	$v0, $v0, $t1
		add	$v0, $v0, $v1

		lw	$t1, 8($t0)
		srl	$v1, $v0, 4
		and	$v1, $v1, $t1
		and	$v0, $v0, $t1
		add	$v0, $v0, $v1

		lw	$t1, 12($t0)
		srl	$v1, $v0, 8
		and	$v1, $v1, $t1
		and	$v0, $v0, $t1
		add	$v0, $v0, $v1

		lw	$t1, 16($t0)
		srl	$v1, $v0, 16
		and	$v1, $v1, $t1
		and	$v0, $v0, $t1
		add	$v0, $v0, $v1
		jr	$ra
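
The same five steps can be sketched in C, which may make the pattern easier to see: at each step, adjacent fields of width 1, 2, 4, 8, and 16 bits are added in parallel. This is only an illustrative sketch; the function names are assumptions, and the masks are the ones from tbl above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-parallel popcount: sum adjacent fields of width 1, 2, 4, 8, 16. */
uint32_t popcount_parallel(uint32_t x)
{
    /* Same masks as the tbl entries in the assembly version. */
    static const uint32_t mask[5] = {
        0x55555555u, 0x33333333u, 0x0f0f0f0fu, 0x00ff00ffu, 0x0000ffffu
    };

    for (int i = 0; i < 5; i++) {
        unsigned shift = 1u << i;       /* 1, 2, 4, 8, 16 */
        /* Low halves of each field pair plus high halves, shifted down. */
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    }
    return x;
}

/* Hypothetical helper: compare against a one-bit-at-a-time count. */
void check_parallel(uint32_t x)
{
    uint32_t expected = 0;
    for (uint32_t t = x; t != 0; t >>= 1)
        expected += t & 1u;
    assert(popcount_parallel(x) == expected);
}
```

The work is five mask-shift-add steps for a 32-bit word, independent of how many bits are set.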

Bit hacks -- FFS

Finding the index of the least significant set bit in a word.

Hash-based

	/* does not distinguish between x = 2^31 and x = 0 */
	return tbl[(x ^ (x-1)) % 37];	/* mod 37 is a perfect hash */
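
The notes do not show how tbl is filled in. One way it could be built is sketched below, assuming x is an unsigned 32-bit word and bit indices start at 0; ffs_init and ffs_hash are hypothetical names. If the lowest set bit of x is bit k, then x ^ (x-1) has exactly bits 0..k set, so only 32 distinct values ever need to be hashed.

```c
#include <assert.h>
#include <stdint.h>

static int8_t tbl[37];      /* table name taken from the fragment above */

/* Fill tbl so that tbl[(x ^ (x-1)) % 37] is the index of the lowest set bit. */
void ffs_init(void)
{
    for (int i = 0; i < 37; i++)
        tbl[i] = -1;                            /* mark every slot unused */

    for (int k = 0; k < 32; k++) {
        /* Value of x ^ (x-1) when the lowest set bit is k: bits 0..k set. */
        uint32_t mask = (uint32_t)(((uint64_t)2 << k) - 1);
        assert(tbl[mask % 37] == -1);           /* perfect hash: no collisions */
        tbl[mask % 37] = (int8_t)k;
    }
}

/* Index of the least significant set bit; x == 0 also yields 31. */
int ffs_hash(uint32_t x)
{
    return tbl[(x ^ (x - 1)) % 37];
}
```

Call ffs_init() once before the first lookup. Five of the 37 slots stay unused, and x = 0 lands in the same slot as x = 2^31 because both produce an all-ones mask.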

More Math Required: Popcount Revisited

			/*   12345678901                 12345678901 */
	n = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
	/* (4a+2b+c) - (2a+b) - a = a+b+c, groups of 3 bits */
	return ((n + (n >> 3)) & 030707070707) % 63;

This mathematical trick makes use of the fact that 64 mod 63 is 1. Can you see what's going on?

Like many mathematical hacks that use integer division or remaindering, this expression is rather expensive to evaluate on most early MIPS processors, because the final % 63 needs an integer divide. On processors with faster hardware integer units, it might be competitive with the versions shown above.
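
A commented sketch of the remainder-based popcount, split into its three steps, is given below. The function names are assumptions for illustration, and n is taken to be an unsigned 32-bit word. The comments trace why the final remainder works: the masked value is a base-64 number whose digits are partial bit counts, and since 64 mod 63 is 1, its remainder mod 63 is the sum of those digits.

```c
#include <assert.h>
#include <stdint.h>

uint32_t popcount_mod63(uint32_t n)
{
    /* Each 3-bit group 4a+2b+c becomes a+b+c, a count in 0..3. */
    n = n - ((n >> 1) & 033333333333u) - ((n >> 2) & 011111111111u);

    /* Add neighbouring 3-bit groups, keeping one count per 6-bit field
       (0..6 each; the top field covers only bits 30-31, so 0..2). */
    n = (n + (n >> 3)) & 030707070707u;

    /* n is now a base-64 number whose digits are the field counts.
       Since 64 == 1 (mod 63), n % 63 is the digit sum, which is at
       most 32 and therefore never wraps. */
    return n % 63;
}

/* Hypothetical helper: compare against a one-bit-at-a-time count. */
void check_mod63(uint32_t x)
{
    uint32_t expected = 0;
    for (uint32_t t = x; t != 0; t >>= 1)
        expected += t & 1u;
    assert(popcount_mod63(x) == expected);
}
```

For n = 0xFFFFFFFF, the field counts are 2, 6, 6, 6, 6, 6, which sum to 32.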



bsy+www@cs.ucsd.edu, last updated Mon Nov 30 21:53:19 PST 1998.
