# CSE 127: Lecture 5

This lecture covers the risks and costs of attacks and their potential for damage, probabilities and expectation values, where effort to improve security is best spent, some thoughts on how to choose anti-virus software, and an introduction to buffer overflow attacks.

## Risks and Costs

So far we have discussed how information on a computer may be compromised, but we have not had a good idea of how much effort we should put into protecting systems. Some data is more important than other data. For example, the integrity of company payroll information is probably more important than the integrity of corporate email. However, even within email, data secrecy may be more important for the CEO than for the secretary.

To protect a system, we need to place safeguards which are appropriate for the importance of the data they are protecting. To do this, we have to make estimates of two things:

• The likelihood of intrusion, which is the probability that some attack will occur and cause damage to the system.
• The cost of potential damage caused by attacks. In other words, if a virus attacks the company payroll computers and information about salaries is lost, how much will that cost the company? Costs can be in terms of many things, but the most relevant are money and time.

Many times we cannot directly know the probability or the cost of intrusion, but we can make estimates. This is what insurance companies do, based on past data. Once we know what our probabilities and costs are, we can use that information to decide where we should concentrate our efforts in security. Sometimes, there are more efficient ways of protecting our data than adding security, such as making backups.

The risk, or "exposure," is how much we stand to lose due to a security problem. To understand risk exposure, we first need some basic notions from probability theory.

## Probability and Expectation Values

If you are playing a game of chance with different possible outcomes that are chosen according to a probability distribution, you might ask what outcome you expect to receive on average if you play the game many times. This is what an expectation value is: the weighted average outcome of a system (or in our case, a game). The weights of the weighted average are the probabilities that each outcome will occur. Suppose we have a game with `n` possible outcomes. For outcome `i`, we represent the probability as `p(i)`, and the value (or cost) as `v(i)`. Then the formula for the expectation value of the game is:
`E(game) = sum(i=1..n, p(i) * v(i))`
Let's look at some examples of this in games of chance. We will look at three games, each of which costs a dollar to play and has some payback distribution. We will decide whether or not it is wise to play these games based on the expectation value of the payback.

### Example 1: flipping a fair coin

cost to play: \$1.00 per coin flip
payback:

| outcome | you win |
| ------- | ------- |
| heads   | \$2.00  |
| tails   | \$0.10  |

Should you play this game? The answer is yes. Let's find out why. The expected payback of the game comes from the equation above:

```
E(coin game) = p(heads) * v(heads) + p(tails) * v(tails)
E(coin game) = 0.5      * $2.00    + 0.5      * $0.10
E(coin game) = $1.05
```
This means that on average you expect to win \$1.05 per game, even though no one game will ever give you \$1.05! Since the cost to play is \$1.00 per game, you will make a net profit of \$0.05 per game if you play, so you should play this game.

### Example 2: rolling a fair die

cost to play: \$1.00 per die roll
payback:

| outcome | you win |
| ------- | ------- |
| 1       | \$4     |
| 2       | \$0     |
| 3       | \$0     |
| 4       | \$0     |
| 5       | \$1     |
| 6       | \$2     |

Should you play this game? Let's do the math to find the expected payback:

```
E(1 die game) = p(1)*v(1) + p(2)*v(2) + p(3)*v(3) + p(4)*v(4) + p(5)*v(5) + p(6)*v(6)
E(1 die game) = 1/6 *$4   + 1/6 *$0   + 1/6 *$0   + 1/6 *$0   + 1/6 *$1   + 1/6 *$2
E(1 die game) = 1/6 * ($4 + $0 + $0 + $0 + $1 + $2)
E(1 die game) = 1/6 * $7
E(1 die game) = $1.16 (approximately)
```
This means that on average, you will win \$1.16 per game (approximately), netting a profit of \$0.16 per game. That's even better than the coin game. So you should play this game.

### Example 3: rolling two fair dice

Cost to play: \$1.00 per roll of two dice
The outcome here is the sum of the two dice.
Payback:

| outcome | you win |
| ------- | ------- |
| 2       | \$3     |
| 3       | \$0     |
| 4       | \$2     |
| 5-11    | \$0     |
| 12      | \$3     |

Should you play this game? We can figure it out by expectation values. We know that there are 6*6=36 outcomes, and we can figure out the probability of each from this sum table:

| +   | 1 | 2 | 3 | 4 | 5 | 6  |
| --- | - | - | - | - | - | -- |
| 1   | 2 | 3 | 4 | 5 | 6 | 7  |
| 2   | 3 | 4 | 5 | 6 | 7 | 8  |
| 3   | 4 | 5 | 6 | 7 | 8 | 9  |
| 4   | 5 | 6 | 7 | 8 | 9 | 10 |
| 5   | 6 | 7 | 8 | 9 | 10 | 11 |
| 6   | 7 | 8 | 9 | 10 | 11 | 12 |
Now that we can compute the probabilities for each outcome, we can find the expected payback:

```
E(2 dice game) = p(2)*v(2) + p(3)*v(3) + p(4)*v(4) + p(5)*v(5) + ... + p(11)*v(11) + p(12)*v(12)
E(2 dice game) = 1/36*$3   + 2/36*$0   + 3/36*$2   + 4/36 *$0  + ... + 2/36 *$0    + 1/36 *$3
E(2 dice game) = 1/36 * (1*$3 + 3*$2 + 1*$3)
E(2 dice game) = 1/36 * $12
E(2 dice game) = $0.33 (approximately)
```
In this game, you expect to get back only \$0.33 per game, and you have to pay \$1.00 to play. So you expect to lose \$0.67 per game. This is not a game you should play.

Note: sometimes we combine the cost of playing with the payback to compute a "net" expected value for the game: the cost is simply a negative payback that occurs with probability 1.

## Security Efforts

Understanding the notion of the weakest link (Lecture 3) and expectation values leads us back to how to do a cost-benefit analysis.

Attackers will aim for the weakest link in the security: the point whose breach lets them get at what they are after, namely the assets you are protecting. How much effort -- money spent on security hardware or software, or person-hours spent improving security -- should be applied to which areas? The goal should be to apply the security-improvement resources so as to minimize the risk exposure of the system. Doing this requires estimating the exposure -- for each attack, the probability that it will occur (not the same as whether it will succeed!), the probability that it succeeds, and the cost if it does; the product gives the expected loss -- as well as the effectiveness and cost of the various security measures that might be taken.

None of this is easy. Let us consider just the problem of choosing the "right" anti-virus software.

## Choosing virus protection software

If you need to buy a software package to protect you from viruses, which package should you buy? What kind of criteria should you use? How can you know what is good if you are not an expert?
• You can rely on third-party (either expert or anecdotal) reviews of the software in question.
• You can check it yourself against known viruses.
• You can look at the company's track record for releasing software updates when new viruses are discovered.

Can we rely on reports like "virus protection software A can identify X viruses, while package B can identify only Y" (where X > Y)? We might be able to, but it is important to know how many of the viruses actually "in the wild" each package can detect. If package A detects a large number of viruses that occur only in research labs, then those numbers can be meaningless.

How quickly new in-the-wild viruses are identified and new virus-definition databases are released is critical to the effectiveness of virus-detection software. Most software of this type is not able to recognize new viruses and requires periodic updates to protect you against new threats. Between the time a new virus starts spreading in the wild and the time that update arrives, the probability of catching the virus can be quite high, making the expected damage, or risk exposure, high.

We might want to differentiate among viruses (especially new ones) according to how destructive their "payload" is. This would certainly give a better estimate of the damage that would occur if a virus ran unchecked. Note, however, that computer viruses, unlike naturally occurring ones, undergo human-directed "evolution": a virus that is very successful at propagating itself but has a relatively benign payload might easily be modified to carry a much more destructive payload. This is certainly easier for a virus author than designing a new virus from scratch. Thus, the existence of the first virus is often statistically correlated with the later introduction of a more destructive variant. Furthermore, the new variant might be detected by the virus-detection software in the same way as the ancestral virus, i.e., the recognition keys on parts of the virus that are unchanged by the mutation. It sometimes makes sense to evaluate the destructive power of viruses conservatively, over-estimating the damage, to try to factor in this uncertainty. (Properly, this should be modelled in the risk evaluation as a hypothetical new virus that has a good probability of propagating at least as well as the original and greater -- but as yet unknown -- destructive capability, coupled with a higher-than-average probability of detection, which partially mitigates the potential for damage.)

Another thing to consider is how good the software is at avoiding false positives. A false positive occurs when the protection software reports a virus where there is none. This is another cost of using the software: whenever the detector goes off, the users of the computers must spend time determining that it was a false alarm, leading to a loss of productivity.

## Buffer overflow attacks

Buffer overflow attacks are among the most common types of security attacks. The Internet Worm of 1988 infected systems in several ways, one of which was a buffer overflow attack. Buffer overflows can happen when using a language that is not type-safe, such as C or C++. For example, look at the following function definition in C:
```
int f(int j) {
    int i;
    char buf[128];
    ...
    gets(buf);
    ...
}
```
See the problem here? The function `gets()` has no way to know how much storage `buf` has: `buf` is passed to it as a bare pointer, not as an array with a length. So what can happen as a result of this?

To answer this, we have to know how the C/C++ stack works. It varies a bit by architecture and compiler, but in general the stack grows downward as more data is pushed onto it. So we may have a stack that looks like this once we have gotten to `gets()`:

Stack frame:

| address | size   | item |
| ------- | ------ | ---- |
| 1000    | 4      | argument `j` |
| 996     | 4      | return address of calling function |
| 992     | 4      | local variable `i` |
| 988     | 128    | local variable `buf` |
| 860     | (none) | (stack pointer) |

Next time we will investigate how this setup can cause problems.


bsy+cse127w02@cs.ucsd.edu, last updated Mon Mar 25 15:22:10 PST 2002. Copyright 2002 Bennet Yee.