$ timing circuit5.asm xterm-size circuit5.asm 2.77071 palmpilot circuit5.asm 2.25902 WinCE circuit5.asm 1.96242The C code from which the fast assembly code was derived is also now available. The technique used is sometimes called ``bit-slicing'', and essentially what is going on is that 32-pixels are worked on at once in parallel.
Stephane's solution is also available. He was asked to restrain himself and use only those tricks that we expect a good student in the class would know to use, and it achieves the following performance:
$ timing cs30f1/xform.asm xterm-size cs30f1/xform.asm 20.06285 palmpilot cs30f1/xform.asm 17.42992 WinCE cs30f1/xform.asm 15.45578Stephane used table-lookup not for counting the neighbors, but instead used a 9-bit index, including the current state of the center pixel, to compute the future state of the pixel (its ``fate'') in one lookup. For some of his code, he found that there is a simple repetitive pattern, and he used an auxilliary C program to ``automatically'' generate it the MIPS assembly language code.
bsy@cse.ucsd.edu, last updated