CSE 127: Lecture 13

The topics covered in this lecture are building a specification for code, proving code correct, and assignment 2.

Developing code specifications

Bugs can occur at several levels: two of the most significant are in the specification and in the implementation. Clearly if the specification has a bug, the implementation will also suffer from that bug. First let's define a specification for the sorting algorithm we looked at earlier:

Specification for void sort(int *arr, int nelt):
Input constraints:

arr is an array of integers
nelt is the length of the array arr

Output constraints for array, after the code runs (denote by arr the original array contents and by arr' what the array contains after the code runs):

For all i, 0 <= i < nelt - 1: arr'[i] <= arr'[i+1]

This is not enough, because any nondecreasing sequence of integers would fulfill this output constraint. For example, {0,1,2,3} would always satisfy this constraint when nelt = 4, regardless of the input. We need a constraint guaranteeing that every input element also exists in the output, and vice versa. That is, we need a one-to-one and onto function between the positions of arr and the positions of arr'. This type of function is a permutation.

Revised output constraints for output array arr':

For all i, 0 <= i < nelt - 1: arr'[i] <= arr'[i+1]
There exists phi: phi is a permutation on {0, 1, 2, ..., nelt - 1} and for all i: 0 <= i < nelt, arr'[phi(i)] = arr[i]
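
The two revised constraints can be turned into an executable check. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and the permutation test simply tries to construct a suitable phi by pairing each input element with an unused output slot holding the same value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Ordering constraint: for all i, 0 <= i < nelt - 1: out[i] <= out[i+1]. */
bool is_nondecreasing(const int *out, int nelt) {
    for (int i = 0; i + 1 < nelt; i++)
        if (out[i] > out[i + 1])
            return false;
    return true;
}

/* Permutation constraint: try to build phi with out[phi(i)] == in[i],
 * using each output slot at most once. Quadratic time, which is
 * acceptable for a test oracle (an assumption of this sketch). */
bool is_permutation_of(const int *in, const int *out, int nelt) {
    if (nelt <= 0)
        return true;
    bool *used = calloc((size_t)nelt, sizeof *used);
    if (used == NULL)
        return false;           /* treat allocation failure as a failed check */
    bool ok = true;
    for (int i = 0; i < nelt && ok; i++) {
        ok = false;
        for (int j = 0; j < nelt; j++) {
            if (!used[j] && out[j] == in[i]) {
                used[j] = true; /* this choice plays the role of phi(i) = j */
                ok = true;
                break;
            }
        }
    }
    free(used);
    return ok;
}

/* Both output constraints together. in is arr, out is arr'. */
bool meets_sort_spec(const int *in, const int *out, int nelt) {
    assert(nelt >= 0);          /* input constraint: nelt is a length */
    return is_nondecreasing(out, nelt) && is_permutation_of(in, out, nelt);
}
```

With input {3,1,2,1}, the output {1,1,2,3} passes both checks, while {0,1,2,3} passes only the ordering check and is rejected by the permutation check.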

Permutations form a nice class of functions. Here we want to work with permutations of nelt objects, labeled 0, 1, 2, ... nelt-1. We define our domain and range set as S={0,1,2,...,nelt-1}, and require that phi is a bijection on S. A fact that we will use is that the composition of two permutations is itself a permutation.

To prove the sort routine correct, we start with the identity permutation phi_0 and show that at each step s the function maintains a permutation phi_s relating the current state of the array to its initial state.
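
One way to make this concrete is to carry phi along explicitly while the array is modified. The sketch below is an illustration under assumptions, not the annotated code from the course: tracked_swap and phi_relates are hypothetical helpers, and phi is stored as an int array with phi[i] giving the current position of the element that started at index i. Each swap composes phi with a transposition, so phi stays a permutation.

```c
#include <assert.h>

/* Swap cur[a] and cur[b], and update phi so that
 * cur[phi[i]] == orig[i] still holds afterwards. */
void tracked_swap(int *cur, int *phi, int nelt, int a, int b) {
    assert(0 <= a && a < nelt && 0 <= b && b < nelt);
    int t = cur[a];
    cur[a] = cur[b];
    cur[b] = t;
    for (int i = 0; i < nelt; i++) {
        if (phi[i] == a)
            phi[i] = b;         /* element that was at a now lives at b */
        else if (phi[i] == b)
            phi[i] = a;         /* and vice versa */
    }
}

/* Returns 1 if phi is a bijection on {0, ..., nelt-1} and
 * cur[phi[i]] == orig[i] for all i; returns 0 otherwise. */
int phi_relates(const int *orig, const int *cur, const int *phi, int nelt) {
    for (int i = 0; i < nelt; i++) {
        if (phi[i] < 0 || phi[i] >= nelt)
            return 0;           /* out of range */
        for (int j = 0; j < i; j++)
            if (phi[j] == phi[i])
                return 0;       /* not one-to-one, hence not onto either */
        if (cur[phi[i]] != orig[i])
            return 0;
    }
    return 1;
}
```

Starting from the identity phi = {0, 1, ..., nelt-1} and replacing every swap in the sort with tracked_swap, phi_relates should return 1 after every step.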

Now here is the proposed code that will implement this specification:

1.  void sort(int *arr, int nelt) {
2.      int mid, smallix, bigix, t;
3.      if (nelt <= 1) return;
4.  
5.      mid = arr[0];
6.      smallix = 0;
7.      bigix = nelt;
8.  
9.      while (smallix < bigix) {
10.         while (smallix < nelt && arr[smallix] <= mid)
11.             ++smallix;
12.         while (bigix > 0 && arr[bigix-1] > mid)
13.             bigix--;
14.         if (!(smallix==nelt || 0 == bigix || smallix == bigix)) {
15.             t = arr[smallix];
16.             arr[smallix] = arr[bigix - 1];
17.             arr[bigix - 1] = t;
18.         }
19.     }
20.     t = arr[0]; arr[0] = arr[smallix - 1]; arr[smallix - 1] = t;
21.     sort(arr, smallix - 1);
22.     sort(arr + smallix, nelt - smallix);
23. }

Proving code correct

There are two key strategies to proving code correctness:

Loop invariants -- an expression that is true upon entry to the loop and remains true after every execution of the body
Induction -- used to prove recursive algorithms correct and to prove that loop invariants hold

Our sort routine has both loops and recursion, so we will need to use both techniques.

Notice that the main while loop divides the array into two parts around a pivot value (held in the variable mid). Now we begin with our inductive proof:

Proof by induction

Base case: arrays of length 0 or 1 are sorted by the algorithm. We know that, by definition, arrays of length 0 or 1 are already sorted. So line 3 handles that case.
Assume that the algorithm works for all nelt < k.
Prove that the algorithm works for nelt = k.

We begin the proof of the third step by noticing a loop invariant that can help us: smallix <= bigix holds throughout the main loop. Why? Because it begins that way (before the loop, smallix = 0 and bigix = nelt), and the two small while loops ensure that smallix is only incremented when arr[smallix] <= mid, and that bigix is only decremented when arr[bigix-1] > mid.

There are three regions of the array: [0,smallix), [smallix,bigix), and [bigix,nelt). The region [smallix,bigix) gets smaller due to the inner while loops, until it eventually reaches size 0 when smallix = bigix.

For each of these regions, we know the following loop invariants are true:

for all i, 0 <= i < smallix: arr[i] <= mid
for all i, bigix <= i < nelt: mid < arr[i]

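We can prove these invariants as follows. When the main loop starts, smallix = 0 and bigix = nelt, so both regions are empty and the invariants hold trivially. If the first invariant is true before the while loop that increments smallix, then it is true after that loop, since smallix only advances past elements that are <= mid. Similarly for the second invariant and bigix. The swap on lines 15-17 does not change the values of smallix and bigix, and it only touches arr[smallix] and arr[bigix - 1], both of which lie in the middle region [smallix,bigix), so it cannot violate either invariant.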
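
The invariants can also be checked at run time. The following sketch copies lines 5-19 of the listing into a hypothetical function partition_checked and adds assert calls; it is an illustration of the technique under that assumption, not the annotated code from the course. The function requires nelt >= 1 and returns the final value of smallix.

```c
#include <assert.h>

/* Checks smallix <= bigix and both region invariants. */
static void check_regions(const int *arr, int nelt, int mid,
                          int smallix, int bigix) {
    assert(0 <= smallix && smallix <= bigix && bigix <= nelt);
    for (int i = 0; i < smallix; i++)
        assert(arr[i] <= mid);      /* region [0,smallix) */
    for (int i = bigix; i < nelt; i++)
        assert(mid < arr[i]);       /* region [bigix,nelt) */
}

int partition_checked(int *arr, int nelt) {
    assert(nelt >= 1);
    int mid = arr[0], smallix = 0, bigix = nelt, t;

    check_regions(arr, nelt, mid, smallix, bigix);  /* both regions empty */
    while (smallix < bigix) {
        while (smallix < nelt && arr[smallix] <= mid)
            ++smallix;
        check_regions(arr, nelt, mid, smallix, bigix);
        while (bigix > 0 && arr[bigix - 1] > mid)
            bigix--;
        check_regions(arr, nelt, mid, smallix, bigix);
        if (!(smallix == nelt || 0 == bigix || smallix == bigix)) {
            /* touches only indices inside [smallix,bigix) */
            t = arr[smallix];
            arr[smallix] = arr[bigix - 1];
            arr[bigix - 1] = t;
        }
        check_regions(arr, nelt, mid, smallix, bigix);
    }
    assert(smallix == bigix);       /* middle region is now empty */
    return smallix;
}
```

If any assert fires on some input, either the invariant as stated is wrong or the code does not maintain it; both outcomes are useful before attempting the proof on paper.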

We will continue this proof next time in class.

You can download this annotated code and play with it yourself:

Assignment 2

After consulting with marketing, your pointy-haired manager decided that the worst-case performance of the original sort algorithm is unacceptable, and changed the code to choose a random pivot. (Your company's competitor has been able to generate input that results in worst-case behavior at a head-to-head competition.) He checked his version into the project, but neglected to test it.

Your task is to fix his code. In order to keep your job -- your pointy-haired boss will undoubtedly read your changes -- you must keep the random pivot selection idea. Use what you know about testing and proofs of correctness to make this code work again.

You should hand in: the fixed (and properly annotated) code, the testing scaffolding that you used when debugging your implementation, the test inputs (especially those that showed bugs in the pointy-haired boss's implementation or your own initial fixes) that will serve as test cases for regression testing subsequent versions, and a README.txt file containing a description of what you did, how you decided to do what you did, and why you believe it to be the correct fix(es). You should also include in your writeup a discussion of whether the worst-case performance of the original code might actually be a problem in practice.

Clarification: by test scaffolding I mean the code you write to test your fixes to the sort function. I expect you to have some sort of testing driver which allows you to at least semi-automate the process of feeding in the test cases from a test suite, and such a driver program will be part of the testing tools for the project, to be handed over to sustaining engineering along with the regression testing test suite. You may wish to generate your test cases via a program, or just have data files -- your design decision should be part of the writeup.
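
As a rough illustration of what such a driver might look like, here is a sketch in which the function under test is passed in as a pointer and its output is compared against the C library's qsort. All names are hypothetical and this is one possible design, not a required one. Comparing against a reference is equivalent to checking the specification here, because the sorted arrangement of an integer array is unique. The random cases use a fixed seed so that a failing run can be reproduced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*sort_fn)(int *arr, int nelt);

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Reference oracle built on the library qsort. */
void reference_sort(int *arr, int nelt) {
    if (nelt > 1)
        qsort(arr, (size_t)nelt, sizeof arr[0], cmp_int);
}

/* Deliberately broken stand-in: leaves the array untouched. */
void noop_sort(int *arr, int nelt) { (void)arr; (void)nelt; }

/* Runs f on a copy of input; returns 1 if the result matches the oracle. */
int run_case(sort_fn f, const int *input, int nelt) {
    assert(nelt >= 0);
    size_t n = (size_t)nelt;
    int *got = malloc((n + 1) * sizeof *got);   /* +1 avoids a zero-size request */
    int *want = malloc((n + 1) * sizeof *want);
    int pass = 0;
    if (got != NULL && want != NULL) {
        memcpy(got, input, n * sizeof *got);
        memcpy(want, input, n * sizeof *want);
        f(got, nelt);
        reference_sort(want, nelt);
        pass = (memcmp(got, want, n * sizeof *got) == 0);
    }
    free(got);
    free(want);
    return pass;
}

/* Feeds ncases pseudo-random arrays of length 0..max_len to f.
 * Values are kept in 0..9 so that duplicates are common.
 * Returns the number of failing cases, or -1 if allocation fails. */
int run_random_cases(sort_fn f, unsigned seed, int ncases, int max_len) {
    assert(max_len >= 0);
    int failures = 0;
    int *input = malloc(((size_t)max_len + 1) * sizeof *input);
    if (input == NULL)
        return -1;
    srand(seed);
    for (int c = 0; c < ncases; c++) {
        int nelt = rand() % (max_len + 1);
        for (int i = 0; i < nelt; i++)
            input[i] = rand() % 10;
        if (!run_case(f, input, nelt))
            failures++;
    }
    free(input);
    return failures;
}
```

A driver's main would call run_case on each stored regression input and run_random_cases on the sort being fixed, then report the failure count. Inputs that expose a bug are worth saving as fixed cases rather than relying on the generator to find them again.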

This assignment is due at 2359 on February 15, 2002.

bsy+cse127w02@cs.ucsd.edu, last updated Mon Mar 25 15:22:09 PST 2002. Copyright 2002 Bennet Yee.