The topics covered in this lecture are the risks and costs of attacks and the potential for damage, a discussion of probabilities and expectation values, where effort to improve security is best spent, some thoughts on how to choose virus protection software, and an introduction to buffer overflow attacks.

To protect a system, we need to place safeguards which are appropriate for the importance of the data they are protecting. To do this, we have to make estimates of two things:

- The **likelihood of intrusion**, which is the probability that some attack will occur and cause damage to the system.
- The **cost of potential damage** caused by attacks. In other words, if a virus attacks the company payroll computers and information about salaries is lost, how much will that cost the company? Costs can be measured in terms of many things, but the most relevant are money and time.

Many times we cannot directly know the probability or the cost of intrusion, but we can make estimates. This is what insurance companies do, based on past data. Once we know what our probabilities and costs are, we can use that information to decide where we should concentrate our efforts in security. Sometimes, there are more efficient ways of protecting our data than adding security, such as making backups.

The risk or ``exposure'' is how much we stand to lose due to a security problem. To understand risk exposure, we need to first understand some basic notions from probability theory.

Suppose a game has `n` possible outcomes. For outcome `i`, we represent the probability as `p(i)` and the value (or cost) as `v(i)`. Then the formula for the expectation value of the game is:

```
E(game) = sum(i=1..n, p(i) * v(i))
```

Let's look at some examples of this in games of chance. We will look at three
games, each of which costs a dollar to play and has some payback distribution.
We will decide whether or not it is wise to play these games based on the
expectation value of the payback.

The first game is a flip of a fair coin, with payback:

| outcome | you win |
| --- | --- |
| heads | $2.00 |
| tails | $0.10 |

Should you play this game? The answer is yes. Let's find out why. The expected payback of the game comes from the equation above:

```
E(coin game) = p(heads) * v(heads) + p(tails) * v(tails)
             = 0.5 * $2.00 + 0.5 * $0.10
             = $1.05
```

This means that on average you expect to win $1.05 per game, even though no one game will ever give you $1.05! Since the cost to play is $1.00 per game, you will make a net profit of $0.05 per game if you play, so you should play this game.

The second game is a roll of a single die, with payback:

| outcome | you win |
| --- | --- |
| 1 | $4 |
| 2 | $0 |
| 3 | $0 |
| 4 | $0 |
| 5 | $1 |
| 6 | $2 |

Should you play this game? Let's do the math to find the expected payback:

```
E(1 die game) = p(1)*v(1) + p(2)*v(2) + p(3)*v(3) + p(4)*v(4) + p(5)*v(5) + p(6)*v(6)
              = 1/6 * $4 + 1/6 * $0 + 1/6 * $0 + 1/6 * $0 + 1/6 * $1 + 1/6 * $2
              = 1/6 * ($4 + $0 + $0 + $0 + $1 + $2)
              = 1/6 * $7
              = $1.16 (approximately)
```

This means that on average, you will win $1.16 per game (approximately), netting a profit of $0.16 per game. That's even better than the coin game. So you should play this game.

The third game is a roll of two dice. The outcome here is the sum of the two dice. Payback:

| outcome | you win |
| --- | --- |
| 2 | $3 |
| 3 | $0 |
| 4 | $2 |
| 5 | $0 |
| ... | $0 |
| 11 | $0 |
| 12 | $3 |

Should you play this game? We can figure it out by expectation values. We know that there are 6*6 = 36 outcomes, and we can figure out the probability of each from this sum table:

| + | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |

```
E(2 dice game) = p(2)*v(2) + p(3)*v(3) + p(4)*v(4) + p(5)*v(5) + ... + p(11)*v(11) + p(12)*v(12)
               = 1/36*$3 + 2/36*$0 + 3/36*$2 + 4/36*$0 + ... + 2/36*$0 + 1/36*$3
               = 1/36 * (1*$3 + 3*$2 + 1*$3)
               = 1/36 * $12
               = $0.33 (approximately)
```

In this game, you expect to get back only $0.33 per game, and you have to pay $1.00 to play. So you expect to lose money on average, and you should not play this game.

Note: sometimes we combine the cost of the game with the payback to compute a ``net'' game value by adding in the cost of the game in a net expected value calculation: the cost is simply a negative payback that occurs with probability 1.

Attackers will aim for the weakest link in the security, the breaking
of which will let them get at what they're after: the assets that
you're protecting. How much effort -- in terms of money spent to buy
security hardware or software, or person-hours spent improving
security -- should be applied to which areas? The goal should be to
apply the security improvement resources so as to minimize the risk
exposure for the system. Doing this requires estimating the exposure
-- the probability that various attacks will occur (*not* the same as
whether they'll succeed!) combined with the cost if the attacks
succeed gives the expected losses for the various attacks -- as well
as the effectiveness and cost of the various security measures that
might be taken.

None of this is easy. Let us consider just the problem of choosing the ``right'' anti-virus software.

- You can rely on third-party (either expert or anecdotal) reviews of the software in question.
- You can check it yourself against known viruses.
- You can look at the company's track record for releasing software updates when new viruses are discovered.

Can we rely on reports like `Virus protection software A can identify X number of viruses, while package B can identify only Y number of viruses' (where X > Y)? Well, we might be able to, but it is important to know how many of the viruses actually `in the wild' each package can detect. If package A detects a large number of viruses that don't occur anywhere except in research labs, then those numbers can be meaningless.

How quickly new in-the-wild viruses are identified and new virus definition databases are released is extremely critical to the effectiveness of virus detection software. Most software of this type is not able to recognize new viruses, and requires periodic updates in order to protect you against new threats. Between the time that a new virus starts spreading in the wild and the time that update occurs, the probability of catching the virus could be quite high, making the expected damage or risk exposure high.

We might want to try to differentiate among viruses (especially new ones) according to how destructive their ``payload'' is. This would certainly give a better estimate of the damage that would occur if the virus ran unchecked. Note, however, that computer viruses, unlike naturally occurring ones, undergo human-directed ``evolution'': a virus that is very successful at propagating itself but has a relatively benign payload might easily be modified to carry a much more destructive payload. This is certainly easier for a virus author to do than to design a new virus from scratch. Thus, the existence of the first virus is often statistically correlated with the later introduction of the more destructive variant. Furthermore, the new variant might be detected by the virus detection software in the same way as the ancestral virus is detected, i.e., the recognition is done on parts of the virus that are unchanged by the mutation. It sometimes makes sense to conservatively evaluate the destructive power of viruses by over-estimating the damage to try to factor in this uncertainty. (Properly this should be modelled in the risk evaluation as a hypothetical new virus that has a good probability of propagating at least as well as the original, and greater -- but yet unknown -- destructive capabilities, coupled with a higher than average probability of detection, which partially mitigates the potential for damage.)

Another thing to consider is how good the software is at avoiding false positives. A false positive occurs when the protection software thinks there is a virus, but it is wrong. This is another cost of using the software: when the detector goes off, the users of the computers will have to spend time determining that it was a false alarm, leading to a loss of productivity.

Consider the following C function:

```
int f(int j)
{
    int i;
    char buf[128];
    ...
    gets(buf);
    ...
}
```

See the problem here? The function `gets()` does not know how much
storage the variable `buf` has. This is because `buf` is actually
passed as a pointer and not an array. So what can happen as a result
of this? To answer this, we have to know how the C/C++ stack works.
It varies a bit by architecture and compiler, but in general we have a
stack that grows down as more memory is pushed onto it. So we may
have a stack that looks like this once we have gotten to `gets()`:

**Stack frame:**

| address | size | item |
| --- | --- | --- |
| 1000 | 4 | argument j |
| 996 | 4 | return address of calling function |
| 992 | 4 | local variable i |
| 988 | 128 | local variable buf |
| 860 | (none) | (stack pointer) |

Next time we will investigate how this setup can cause problems.


bsy+cse127w02@cs.ucsd.edu, last updated
