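Continuous Distributions

| Distribution | Probability Function | Mean | Variance | Moment-Generating Function |
|---|---|---|---|---|
| Uniform | $f(y) = \frac{1}{\theta_2 - \theta_1}$; $\theta_1 \le y \le \theta_2$ | $\frac{\theta_1 + \theta_2}{2}$ | $\frac{(\theta_2 - \theta_1)^2}{12}$ | $\frac{e^{t\theta_2} - e^{t\theta_1}}{t(\theta_2 - \theta_1)}$ |
| Normal | $f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(y - \mu)^2\right]$; $-\infty < y < +\infty$ | $\mu$ | $\sigma^2$ | $\exp\left(\mu t + \frac{t^2\sigma^2}{2}\right)$ |
| Exponential | $f(y) = \frac{1}{\beta} e^{-y/\beta}$; $0 < y < \infty$ | $\beta$ | $\beta^2$ | $(1 - \beta t)^{-1}$ |
| Gamma | $f(y) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} y^{\alpha - 1} e^{-y/\beta}$; $0 < y < \infty$ | $\alpha\beta$ | $\alpha\beta^2$ | $(1 - \beta t)^{-\alpha}$ |
| Chi-square | $f(y) = \frac{y^{(v/2) - 1} e^{-y/2}}{2^{v/2}\,\Gamma(v/2)}$; $y > 0$ | $v$ | $2v$ | $(1 - 2t)^{-v/2}$ |
| Beta | $f(y) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} y^{\alpha - 1}(1 - y)^{\beta - 1}$; $0 < y < 1$ | $\frac{\alpha}{\alpha + \beta}$ | $\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$ | does not exist in closed form |

2.79

If P(A) > 0, P(B) > 0, and P(A) < P(A|B), show that P(B) < P(B|A).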

2.80

Suppose that A ⊂ B and that P(A) > 0 and P(B) > 0. Are A and B independent? Prove your answer.

2.81

Suppose that A and B are mutually exclusive events, with P(A) > 0 and P(B) < 1. Are A and B independent? Prove your answer.

2.82

Suppose that A ⊂ B and that P(A) > 0 and P(B) > 0. Show that P(B|A) = 1 and P(A|B) = P(A)/P(B).

2.83

If A and B are mutually exclusive events and P(B) > 0, show that P(A|A ∪ B) = P(A) / [P(A) + P(B)].

2.8 Two Laws of Probability

The following two laws give the probabilities of unions and intersections of events. As such, they play an important role in the event-composition approach to the solution of probability problems.

THEOREM 2.5

The Multiplicative Law of Probability The probability of the intersection of two events A and B is

P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B).

If A and B are independent, then P(A ∩ B) = P(A)P(B).

Proof

The multiplicative law follows directly from Definition 2.9, the definition of conditional probability.

Notice that the multiplicative law can be extended to find the probability of the intersection of any number of events. Thus, twice applying Theorem 2.5, we obtain

P(A ∩ B ∩ C) = P[(A ∩ B) ∩ C] = P(A ∩ B)P(C|A ∩ B) = P(A)P(B|A)P(C|A ∩ B).

The probability of the intersection of any number of, say, k events can be obtained in the same manner:

P(A1 ∩ A2 ∩ A3 ∩ · · · ∩ Ak) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) · · · P(Ak|A1 ∩ A2 ∩ · · · ∩ Ak−1).

The additive law of probability gives the probability of the union of two events.
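The extension of the multiplicative law to k events can be sketched in a few lines of Python. The urn below (3 red and 2 blue balls, three draws without replacement) is a hypothetical illustration that does not appear in the text, and the helper name `chain_rule` is likewise an assumption. The product of conditional probabilities is compared with a brute-force count of equally likely ordered draws.

```python
from fractions import Fraction
from itertools import permutations


def chain_rule(conditionals):
    """Multiply P(A1), P(A2|A1), ..., P(Ak|A1 ∩ ... ∩ Ak-1) together."""
    prob = Fraction(1)
    for p in conditionals:
        prob *= p
    return prob


# Hypothetical urn (not from the text): 3 red and 2 blue balls,
# three draws without replacement, A_i = "the i-th ball drawn is red".
p_chain = chain_rule([Fraction(3, 5), Fraction(2, 4), Fraction(1, 3)])

# Brute-force check: list every equally likely ordered draw of three balls.
balls = ["R1", "R2", "R3", "B1", "B2"]
draws = list(permutations(balls, 3))
all_red = [d for d in draws if all(ball.startswith("R") for ball in d)]
p_enum = Fraction(len(all_red), len(draws))

print(p_chain, p_enum)  # 1/10 1/10
```

Exact fractions are used so that the two results can be compared with `==` rather than with a floating-point tolerance.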


Chapter 2

Probability

THEOREM 2.6

The Additive Law of Probability The probability of the union of two events A and B is P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If A and B are mutually exclusive events, P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).

Proof

The proof of the additive law can be followed by inspecting the Venn diagram in Figure 2.10. Notice that $A \cup B = A \cup (\bar{A} \cap B)$, where $A$ and $(\bar{A} \cap B)$ are mutually exclusive events. Further, $B = (\bar{A} \cap B) \cup (A \cap B)$, where $(\bar{A} \cap B)$ and $(A \cap B)$ are mutually exclusive events. Then, by Axiom 3,

$P(A \cup B) = P(A) + P(\bar{A} \cap B)$ and $P(B) = P(\bar{A} \cap B) + P(A \cap B)$.

The equality given on the right implies that $P(\bar{A} \cap B) = P(B) - P(A \cap B)$. Substituting this expression for $P(\bar{A} \cap B)$ into the expression for $P(A \cup B)$ given in the left-hand equation of the preceding pair, we obtain the desired result:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

FIGURE 2.10 Venn diagram for the union of A and B

The probability of the union of three events can be obtained by making use of Theorem 2.6. Observe that

$P(A \cup B \cup C) = P[A \cup (B \cup C)]$
$= P(A) + P(B \cup C) - P[A \cap (B \cup C)]$
$= P(A) + P(B) + P(C) - P(B \cap C) - P[(A \cap B) \cup (A \cap C)]$
$= P(A) + P(B) + P(C) - P(B \cap C) - P(A \cap B) - P(A \cap C) + P(A \cap B \cap C)$

because $(A \cap B) \cap (A \cap C) = A \cap B \cap C$.

Another useful result expressing the relationship between the probability of an event and its complement is immediately available from the axioms of probability.
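The additive law for two events and its three-event expansion can be checked numerically on a small finite sample space. The sketch below assumes a hypothetical experiment that is not in the text: one integer is drawn at random from 1 to 12, with all outcomes equally likely. The events A, B, and C are chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical sample space (an assumption for illustration): 12 equally likely outcomes.
S = set(range(1, 13))


def prob(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))


A = {n for n in S if n % 2 == 0}   # even numbers
B = {n for n in S if n % 3 == 0}   # multiples of 3
C = {n for n in S if n <= 4}       # numbers 1 through 4

# Two events: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
two_lhs = prob(A | B)
two_rhs = prob(A) + prob(B) - prob(A & B)

# Three events: expansion obtained by applying the additive law twice
three_lhs = prob(A | B | C)
three_rhs = (prob(A) + prob(B) + prob(C)
             - prob(A & B) - prob(A & C) - prob(B & C)
             + prob(A & B & C))

print(two_lhs, two_rhs)      # 2/3 2/3
print(three_lhs, three_rhs)  # 3/4 3/4
```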

THEOREM 2.7

If $A$ is an event, then $P(A) = 1 - P(\bar{A})$.

Proof

Observe that $S = A \cup \bar{A}$. Because $A$ and $\bar{A}$ are mutually exclusive events, it follows that $P(S) = P(A) + P(\bar{A})$. Therefore, $P(A) + P(\bar{A}) = 1$ and the result follows.

As we will see in Section 2.9, it is sometimes easier to calculate $P(\bar{A})$ than to calculate $P(A)$. In such cases, it is easier to find $P(A)$ by the relationship $P(A) = 1 - P(\bar{A})$ than to find $P(A)$ directly.

Exercises

2.84

If A1, A2, and A3 are three events and P(A1 ∩ A2) = P(A1 ∩ A3) ≠ 0 but P(A2 ∩ A3) = 0, show that P(at least one Ai) = P(A1) + P(A2) + P(A3) − 2P(A1 ∩ A2).

2.85

If $A$ and $B$ are independent events, show that $A$ and $\bar{B}$ are also independent. Are $\bar{A}$ and $\bar{B}$ independent?

2.86

Suppose that A and B are two events such that P(A) = .8 and P(B) = .7.
a Is it possible that P(A ∩ B) = .1? Why or why not?
b What is the smallest possible value for P(A ∩ B)?
c Is it possible that P(A ∩ B) = .77? Why or why not?
d What is the largest possible value for P(A ∩ B)?

2.87

Suppose that A and B are two events such that P(A) + P(B) > 1.
a What is the smallest possible value for P(A ∩ B)?
b What is the largest possible value for P(A ∩ B)?

2.88

Suppose that A and B are two events such that P(A) = .6 and P(B) = .3.
a Is it possible that P(A ∩ B) = .1? Why or why not?
b What is the smallest possible value for P(A ∩ B)?
c Is it possible that P(A ∩ B) = .7? Why or why not?
d What is the largest possible value for P(A ∩ B)?

2.89

Suppose that A and B are two events such that P(A) + P(B) < 1.
a What is the smallest possible value for P(A ∩ B)?
b What is the largest possible value for P(A ∩ B)?

2.90

Suppose that there is a 1 in 50 chance of injury on a single skydiving attempt. a If we assume that the outcomes of different jumps are independent, what is the probability that a skydiver is injured if she jumps twice? b A friend claims if there is a 1 in 50 chance of injury on a single jump then there is a 100% chance of injury if a skydiver jumps 50 times. Is your friend correct? Why?


2.91

Can A and B be mutually exclusive if P(A) = .4 and P(B) = .7? If P(A) = .4 and P(B) = .3? Why?

2.92

A policy requiring all hospital employees to take lie detector tests may reduce losses due to theft, but some employees regard such tests as a violation of their rights. Past experience indicates that lie detectors have accuracy rates that vary from 92% to 99%.² To gain some insight into the risks that employees face when taking a lie detector test, suppose that the probability is .05 that a lie detector concludes that a person is lying who, in fact, is telling the truth and suppose that any pair of tests are independent. What is the probability that a machine will conclude that
a each of three employees is lying when all are telling the truth?
b at least one of the three employees is lying when all are telling the truth?

2.93

Two events A and B are such that P(A) = .2, P(B) = .3, and P(A ∪ B) = .4. Find the following:
a $P(A \cap B)$
b $P(\bar{A} \cup \bar{B})$
c $P(\bar{A} \cap \bar{B})$
d $P(\bar{A}|B)$

2.94

A smoke detector system uses two devices, A and B. If smoke is present, the probability that it will be detected by device A is .95; by device B, .90; and by both devices, .88. a If smoke is present, ﬁnd the probability that the smoke will be detected by either device A or B or both devices. b Find the probability that the smoke will be undetected.

2.95

In a game, a participant is given three attempts to hit a ball. On each try, she either scores a hit, H , or a miss, M. The game requires that the player must alternate which hand she uses in successive attempts. That is, if she makes her ﬁrst attempt with her right hand, she must use her left hand for the second attempt and her right hand for the third. Her chance of scoring a hit with her right hand is .7 and with her left hand is .4. Assume that the results of successive attempts are independent and that she wins the game if she scores at least two hits in a row. If she makes her ﬁrst attempt with her right hand, what is the probability that she wins the game?

2.96

If A and B are independent events with P(A) = .5 and P(B) = .2, find the following:
a $P(A \cup B)$
b $P(\bar{A} \cap \bar{B})$
c $P(\bar{A} \cup \bar{B})$

2.97

Consider the following portion of an electric circuit with three relays. Current will flow from point a to point b if there is at least one closed path when the relays are activated. The relays may malfunction and not close when activated. Suppose that the relays act independently of one another and close properly when activated, with a probability of .9.
a What is the probability that current will flow when the relays are activated?
b Given that current flowed when the relays were activated, what is the probability that relay 1 functioned?

[Figure: portion of an electric circuit with relays 1, 2, and 3 between points A and B]

2. Source: Copyright © 1980 Sentinel Communications Co. All rights reserved.

2.98

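With relays operating as in Exercise 2.97, compare the probability of current flowing from a to b in the series system shown

[Figure: relays 1 and 2 in series between points A and B]

with the probability of flow in the parallel system shown.

[Figure: relays 1 and 2 in parallel between points A and B]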

2.99

Suppose that A and B are independent events such that the probability that neither occurs is a and the probability of B is b. Show that P(A) = (1 − b − a)/(1 − b).

*2.100

Show that Theorem 2.6, the additive law of probability, holds for conditional probabilities. That is, if A, B, and C are events such that P(C) > 0, prove that P(A ∪ B|C) = P(A|C) + P(B|C) − P(A ∩ B|C). [Hint: Make use of the distributive law (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).]

2.101

Articles coming through an inspection line are visually inspected by two successive inspectors. When a defective article comes through the inspection line, the probability that it gets by the ﬁrst inspector is .1. The second inspector will “miss” ﬁve out of ten of the defective items that get past the ﬁrst inspector. What is the probability that a defective item gets by both inspectors?

2.102

Diseases I and II are prevalent among people in a certain population. It is assumed that 10% of the population will contract disease I sometime during their lifetime, 15% will contract disease II eventually, and 3% will contract both diseases. a Find the probability that a randomly chosen person from this population will contract at least one disease. b Find the conditional probability that a randomly chosen person from this population will contract both diseases, given that he or she has contracted at least one disease.

2.103

Refer to Exercise 2.50. Hours after the rigging of the Pennsylvania state lottery was announced, Connecticut state lottery officials were stunned to learn that their winning number for the day was 666 (Los Angeles Times, September 21, 1980).
a All evidence indicates that the Connecticut selection of 666 was due to pure chance. What is the probability that a 666 would be drawn in Connecticut, given that a 666 had been selected in the April 24, 1980, Pennsylvania lottery?
b What is the probability of drawing a 666 in the April 24, 1980, Pennsylvania lottery (remember, this drawing was rigged) and a 666 in the September 19, 1980, Connecticut lottery?


2.104

If $A$ and $B$ are two events, prove that $P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$. [Note: This is a simplified version of the Bonferroni inequality.]

2.105

If the probability of injury on each individual parachute jump is .05, use the result in Exercise 2.104 to provide a lower bound for the probability of landing safely on both of two jumps.

2.106

If A and B are equally likely events and we require that the probability of their intersection be at least .98, what is P(A)?

2.107

Let A, B, and C be events such that P(A) > P(B) and P(C) > 0. Construct an example to demonstrate that it is possible that P(A|C) < P(B|C).

2.108

If $A$, $B$, and $C$ are three events, use two applications of the result in Exercise 2.104 to prove that $P(A \cap B \cap C) \ge 1 - P(\bar{A}) - P(\bar{B}) - P(\bar{C})$.

2.109

If A, B, and C are three equally likely events, what is the smallest value for P(A) such that P(A ∩ B ∩ C) always exceeds 0.95?

2.9 Calculating the Probability of an Event: The Event-Composition Method

We learned in Section 2.4 that sets (events) can often be expressed as unions, intersections, or complements of other sets. The event-composition method for calculating the probability of an event, A, expresses A as a composition involving unions and/or intersections of other events. The laws of probability are then applied to find P(A). We will illustrate this method with an example.

EXAMPLE 2.17

Of the voters in a city, 40% are Republicans and 60% are Democrats. Among the Republicans 70% are in favor of a bond issue, whereas 80% of the Democrats favor the issue. If a voter is selected at random in the city, what is the probability that he or she will favor the bond issue?

Solution

Let F denote the event “favor the bond issue,” R the event “a Republican is selected,” and D the event “a Democrat is selected.” Then P(R) = .4, P(D) = .6, P(F|R) = .7, and P(F|D) = .8. Notice that

P(F) = P[(F ∩ R) ∪ (F ∩ D)] = P(F ∩ R) + P(F ∩ D)

because (F ∩ R) and (F ∩ D) are mutually exclusive events. Figure 2.11 will help you visualize the result that F = (F ∩ R) ∪ (F ∩ D). Now

P(F ∩ R) = P(F|R)P(R) = (.7)(.4) = .28,
P(F ∩ D) = P(F|D)P(D) = (.8)(.6) = .48.

It follows that

P(F) = .28 + .48 = .76.
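The arithmetic of Example 2.17 can be reproduced with a short Python sketch. The dictionary names are assumptions made for illustration; the probabilities are the ones given in the example, stored as exact fractions.

```python
from fractions import Fraction

# Probabilities stated in Example 2.17.
p_party = {"R": Fraction(4, 10), "D": Fraction(6, 10)}          # P(R), P(D)
p_favor_given = {"R": Fraction(7, 10), "D": Fraction(8, 10)}    # P(F|R), P(F|D)

# Multiplicative law: P(F ∩ party) = P(F|party) P(party)
joint = {party: p_favor_given[party] * p_party[party] for party in p_party}

# Additive law for the mutually exclusive events (F ∩ R) and (F ∩ D)
p_favor = sum(joint.values())

print(float(joint["R"]), float(joint["D"]), float(p_favor))  # 0.28 0.48 0.76
```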

FIGURE 2.11 Venn diagram for events of Example 2.17 (the sample space S is divided into R and D, and F is split into F ∩ R and F ∩ D)

EXAMPLE 2.18

In Example 2.7 we considered an experiment wherein the birthdays of 20 randomly selected persons were recorded. Under certain conditions we found that P(A) = .5886, where A denotes the event that each person has a different birthday. Let B denote the event that at least one pair of individuals share a birthday. Find P(B).

Solution

The event $B$ is the set of all sample points in $S$ that are not in $A$, that is, $B = \bar{A}$. Therefore,

$P(B) = 1 - P(A) = 1 - .5886 = .4114$.

(Most would agree that this probability is surprisingly high!)
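The value P(A) = .5886 quoted from Example 2.7 and the complement P(B) can be recomputed as follows. This sketch assumes 365 equally likely birthdays and independent persons, which is one reading of the "certain conditions" mentioned above; the function name `prob_all_distinct` is hypothetical.

```python
from fractions import Fraction


def prob_all_distinct(n, days=365):
    """P(all n birthdays differ), assuming `days` equally likely birthdays."""
    p = Fraction(1)
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p *= Fraction(days - i, days)
    return p


p_a = prob_all_distinct(20)   # event A: all 20 birthdays are different
p_b = 1 - p_a                 # event B: at least one shared birthday

print(round(float(p_a), 4), round(float(p_b), 4))  # 0.5886 0.4114
```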

Let us refer to Example 2.4, which involves the two tennis players, and let D1 and D2 denote the events that player A wins the first and second games, respectively. The information given in the example implies that P(D1) = P(D2) = 2/3. Further, if we make the assumption that D1 and D2 are independent, it follows that P(D1 ∩ D2) = 2/3 × 2/3 = 4/9. In that example we identified the simple event E1, which we denoted AA, as meaning that player A won both games. With the present notation, E1 = D1 ∩ D2, and thus P(E1) = 4/9. The probabilities assigned to the other simple events in Example 2.4 can be verified in a similar manner.

The event-composition approach will not be successful unless the probabilities of the events that appear in P(A) (after the additive and multiplicative laws have been applied) are known. If one or more of these probabilities is unknown, the method fails. Often it is desirable to form compositions of mutually exclusive or independent events. Mutually exclusive events simplify the use of the additive law, and the multiplicative law of probability is easier to apply to independent events.


A summary of the steps used in the event-composition method follows:

1. Define the experiment.
2. Visualize the nature of the sample points. Identify a few to clarify your thinking.
3. Write an equation expressing the event of interest (say, A) as a composition of two or more events, using unions, intersections, and/or complements. (Notice that this equates point sets.) Make certain that event A and the event implied by the composition represent the same set of sample points.
4. Apply the additive and multiplicative laws of probability to the compositions obtained in step 3 to find P(A).

Step 3 is the most difficult because we can form many compositions that will be equivalent to event A. The trick is to form a composition in which all the probabilities appearing in step 4 are known.

The event-composition approach does not require listing the sample points in S, but it does require a clear understanding of the nature of a typical sample point. The major error students tend to make in applying the event-composition approach occurs in writing the composition. That is, the point-set equation that expresses A as a union and/or intersection of other events is frequently incorrect. Always test your equality to make certain that the composition implies an event that contains the same set of sample points as those in A.

A comparison of the sample-point and event-composition methods for calculating the probability of an event can be obtained by applying both methods to the same problem. We will apply the event-composition approach to the problem of selecting applicants that was solved by the sample-point method in Examples 2.11 and 2.12.

EXAMPLE 2.19

Two applicants are randomly selected from among five who have applied for a job. Find the probability that exactly one of the two best applicants is selected, event A.

Solution

Define the following two events:

B: Draw the best and one of the three poorest applicants.
C: Draw the second best and one of the three poorest applicants.

Events B and C are mutually exclusive and A = B ∪ C. Also, let D1 = B1 ∩ B2, where

B1 = Draw the best on the first draw,
B2 = Draw one of the three poorest applicants on the second draw,

and D2 = B3 ∩ B4, where

B3 = Draw one of the three poorest applicants on the first draw,
B4 = Draw the best on the second draw.

Note that B = D1 ∪ D2.


Similarly, let G1 = C1 ∩ C2 and G2 = C3 ∩ C4, where C1, C2, C3, and C4 are defined like B1, B2, B3, and B4, with the words second best replacing best. Notice that D1 and D2 and G1 and G2 are pairs of mutually exclusive events and that

A = B ∪ C = (D1 ∪ D2) ∪ (G1 ∪ G2),
A = (B1 ∩ B2) ∪ (B3 ∩ B4) ∪ (C1 ∩ C2) ∪ (C3 ∩ C4).

Applying the additive law of probability to these four mutually exclusive events, we have

P(A) = P(B1 ∩ B2) + P(B3 ∩ B4) + P(C1 ∩ C2) + P(C3 ∩ C4).

Applying the multiplicative law, we have

P(B1 ∩ B2) = P(B1)P(B2|B1).

The probability of drawing the best on the first draw is P(B1) = 1/5. Similarly, the probability of drawing one of the three poorest on the second draw, given that the best was drawn on the first selection, is P(B2|B1) = 3/4. Then

P(B1 ∩ B2) = P(B1)P(B2|B1) = (1/5)(3/4) = 3/20.

The probabilities of all other intersections in P(A), P(B3 ∩ B4), P(C1 ∩ C2), and P(C3 ∩ C4) are obtained in exactly the same manner, and all equal 3/20. Then

P(A) = P(B1 ∩ B2) + P(B3 ∩ B4) + P(C1 ∩ C2) + P(C3 ∩ C4) = (3/20) + (3/20) + (3/20) + (3/20) = 3/5.

This answer is identical to that obtained in Example 2.12, where P(A) was calculated by using the sample-point approach.
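The agreement between the two methods in Example 2.19 can be checked by brute force. In the sketch below the applicants are labeled 1 (best) through 5 (poorest), a labeling assumed only for illustration, and every ordered pair of draws is treated as equally likely.

```python
from fractions import Fraction
from itertools import permutations

applicants = [1, 2, 3, 4, 5]                 # 1 = best, 2 = second best (assumed labels)
draws = list(permutations(applicants, 2))    # 20 equally likely ordered selections


def exactly_one_of_top_two(draw):
    """True when the draw contains exactly one of applicants 1 and 2."""
    return sum(a in (1, 2) for a in draw) == 1


# Sample-point style count over the ordered draws.
p_enum = Fraction(sum(exactly_one_of_top_two(d) for d in draws), len(draws))

# Event-composition result: four mutually exclusive intersections, each (1/5)(3/4).
p_composition = 4 * Fraction(1, 5) * Fraction(3, 4)

print(p_enum, p_composition)  # 3/5 3/5
```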

EXAMPLE 2.20

It is known that a patient with a disease will respond to treatment with probability equal to .9. If three patients with the disease are treated and respond independently, find the probability that at least one will respond.

Solution

Define the following events:

A: At least one of the three patients will respond.
B1: The first patient will not respond.
B2: The second patient will not respond.
B3: The third patient will not respond.


Then observe that $\bar{A} = B_1 \cap B_2 \cap B_3$. Theorem 2.7 implies that

$P(A) = 1 - P(\bar{A}) = 1 - P(B_1 \cap B_2 \cap B_3)$.

Applying the multiplicative law, we have

$P(B_1 \cap B_2 \cap B_3) = P(B_1)P(B_2|B_1)P(B_3|B_1 \cap B_2)$,

where, because the events are independent,

$P(B_2|B_1) = P(B_2) = 0.1$ and $P(B_3|B_1 \cap B_2) = P(B_3) = 0.1$.

Substituting $P(B_i) = .1$, $i = 1, 2, 3$, we obtain

$P(A) = 1 - (.1)^3 = .999$.

Notice that we have demonstrated the utility of complementary events. This result is important because frequently it is easier to find the probability of the complement, $P(\bar{A})$, than to find $P(A)$ directly.
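To see why the complement is the shorter route in Example 2.20, the sketch below computes P(A) both ways: once as 1 minus the probability that no patient responds, and once by summing over the seven outcomes in which at least one patient responds. Variable names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

p_respond = Fraction(9, 10)   # probability a single patient responds
p_fail = 1 - p_respond        # P(B_i) = .1

# Route 1: complement of "no patient responds".
p_complement = 1 - p_fail ** 3

# Route 2: add up every outcome (respond / not respond for each of three
# independent patients) in which at least one patient responds.
p_direct = Fraction(0)
for outcome in product([True, False], repeat=3):
    if any(outcome):
        weight = Fraction(1)
        for responded in outcome:
            weight *= p_respond if responded else p_fail
        p_direct += weight

print(float(p_complement), float(p_direct))  # 0.999 0.999
```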

EXAMPLE 2.21

Observation of a waiting line at a medical clinic indicates the probability that a new arrival will be an emergency case is p = 1/6. Find the probability that the rth patient is the first emergency case. (Assume that conditions of arriving patients represent independent events.)

Solution

The experiment consists of watching patient arrivals until the first emergency case appears. Then the sample points for the experiment are

$E_i$: The $i$th patient is the first emergency case, for $i = 1, 2, \ldots$.

Because only one sample point falls in the event of interest,

$P(r\text{th patient is the first emergency case}) = P(E_r)$.

Now define $A_i$ to denote the event that the $i$th arrival is not an emergency case. Then we can represent $E_r$ as the intersection

$E_r = A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_{r-1} \cap \bar{A}_r$.

Applying the multiplicative law, we have

$P(E_r) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2) \cdots P(\bar{A}_r|A_1 \cap \cdots \cap A_{r-1})$,

and because the events $A_1, A_2, \ldots, A_{r-1}$, and $\bar{A}_r$ are independent, it follows that

$P(E_r) = P(A_1)P(A_2) \cdots P(A_{r-1})P(\bar{A}_r) = (1 - p)^{r-1} p = (5/6)^{r-1}(1/6)$, $r = 1, 2, 3, \ldots$.

Notice that

$P(S) = P(E_1) + P(E_2) + P(E_3) + \cdots + P(E_i) + \cdots$
$= (1/6) + (5/6)(1/6) + (5/6)^2(1/6) + \cdots + (5/6)^{i-1}(1/6) + \cdots$
$= \sum_{i=0}^{\infty} \frac{1}{6}\left(\frac{5}{6}\right)^{i} = \frac{1/6}{1 - (5/6)} = 1$.

This result follows from the formula for the sum of a geometric series given in Appendix A1.11. This formula, which states that if $|r| < 1$, $\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r}$, is useful in many simple probability problems.
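The probabilities P(E_r) from Example 2.21 and the way their partial sums approach 1 can be sketched as follows. The function names are hypothetical; the default p = 1/6 is the value given in the example.

```python
from fractions import Fraction


def prob_first_emergency(r, p=Fraction(1, 6)):
    """P(E_r) = (1 - p)^(r - 1) * p: the r-th patient is the first emergency case."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    return (1 - p) ** (r - 1) * p


def partial_sum(n, p=Fraction(1, 6)):
    """P(E_1) + ... + P(E_n), which equals 1 - (1 - p)^n."""
    return sum(prob_first_emergency(r, p) for r in range(1, n + 1))


print(prob_first_emergency(1))   # 1/6
print(prob_first_emergency(3))   # 25/216
print(float(partial_sum(50)))    # close to 1; the missing mass is (5/6)**50
```

The partial sum after n terms falls short of 1 by exactly (5/6)^n, the probability that none of the first n arrivals is an emergency case, which is why the infinite series sums to 1.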

EXAMPLE 2.22

A monkey is to demonstrate that she recognizes colors by tossing one red, one black, and one white ball into boxes of the same respective colors, one ball to a box. If the monkey has not learned the colors and merely tosses one ball into each box at random, ﬁnd the probabilities of the following results: a There are no color matches. b There is exactly one color match.

Solution

This problem can be solved by listing sample points because only three balls are involved, but a more general method will be illustrated. Define the following events:

A1: A color match occurs in the red box.
A2: A color match occurs in the black box.
A3: A color match occurs in the white box.

There are 3! = 6 equally likely ways of randomly tossing the balls into the boxes with one ball in each box. Also, there are only 2! = 2 ways of tossing the balls into the boxes if one particular box is required to have a color match. Hence,

P(A1) = P(A2) = P(A3) = 2/6 = 1/3.

Similarly, it follows that

P(A1 ∩ A2) = P(A1 ∩ A3) = P(A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) = 1/6.

We can now answer parts (a) and (b) by using the event-composition method.

a Notice that

P(no color matches) = 1 − P(at least one color match)
= 1 − P(A1 ∪ A2 ∪ A3)
= 1 − [P(A1) + P(A2) + P(A3) − P(A1 ∩ A2) − P(A1 ∩ A3) − P(A2 ∩ A3) + P(A1 ∩ A2 ∩ A3)]
= 1 − [3(1/3) − 3(1/6) + (1/6)] = 2/6 = 1/3.


b We leave it to you to show that P(exactly one match) = P(A1 ) + P(A2 ) + P(A3 ) − 2[P(A1 ∩ A2 ) + P(A1 ∩ A3 ) + P(A2 ∩ A3 )] + 3[P(A1 ∩ A2 ∩ A3 )] = (3)(1/3) − (2)(3)(1/6) + (3)(1/6) = 1/2.
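Because only 3! = 6 assignments are possible in Example 2.22, both answers can be confirmed by listing them. The sketch below counts color matches in every assignment of balls to boxes; the tuple `colors` and the other names are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

colors = ("red", "black", "white")   # box colors, in a fixed order

# Each permutation assigns one ball to each box; count how many balls
# land in the box of their own color.
match_counts = Counter(
    sum(ball == box for ball, box in zip(assignment, colors))
    for assignment in permutations(colors)
)

total = sum(match_counts.values())   # 3! = 6 equally likely assignments
distribution = {m: Fraction(c, total) for m, c in match_counts.items()}

print(distribution[0])  # 1/3  (no color matches)
print(distribution[1])  # 1/2  (exactly one color match)
```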

The best way to learn how to solve probability problems is to learn by doing. To assist you in developing your skills, many exercises are provided at the end of this section, at the end of the chapter, and in the references.

Exercises

2.110

Of the items produced daily by a factory, 40% come from line I and 60% from line II. Line I has a defect rate of 8%, whereas line II has a defect rate of 10%. If an item is chosen at random from the day’s production, ﬁnd the probability that it will not be defective.

2.111

An advertising agency notices that approximately 1 in 50 potential buyers of a product sees a given magazine ad, and 1 in 5 sees a corresponding ad on television. One in 100 sees both. One in 3 actually purchases the product after seeing the ad, 1 in 10 without seeing it. What is the probability that a randomly selected potential customer will purchase the product?

2.112

Three radar sets, operating independently, are set to detect any aircraft ﬂying through a certain area. Each set has a probability of .02 of failing to detect a plane in its area. If an aircraft enters the area, what is the probability that it a goes undetected? b is detected by all three radar sets?

2.113

Consider one of the radar sets of Exercise 2.112. What is the probability that it will correctly detect exactly three aircraft before it fails to detect one, if aircraft arrivals are independent single events occurring at different times?

2.114

A lie detector will show a positive reading (indicate a lie) 10% of the time when a person is telling the truth and 95% of the time when the person is lying. Suppose two people are suspects in a one-person crime and (for certain) one is guilty and will lie. Assume further that the lie detector operates independently for the truthful person and the liar. What is the probability that the detector a shows a positive reading for both suspects? b shows a positive reading for the guilty suspect and a negative reading for the innocent suspect? c is completely wrong—that is, that it gives a positive reading for the innocent suspect and a negative reading for the guilty? d gives a positive reading for either or both of the two suspects?

2.115

A state auto-inspection station has two inspection teams. Team 1 is lenient and passes all automobiles of a recent vintage; team 2 rejects all autos on a ﬁrst inspection because their “headlights are not properly adjusted.” Four unsuspecting drivers take their autos to the station for inspection on four different days and randomly select one of the two teams. a If all four cars are new and in excellent condition, what is the probability that three of the four will be rejected? b What is the probability that all four will pass?

2.116

A communications network has a built-in safeguard system against failures. In this system if line I fails, it is bypassed and line II is used. If line II also fails, it is bypassed and line III is used. The probability of failure of any one of these three lines is .01, and the failures of these lines are independent events. What is the probability that this system of three lines does not completely fail?

2.117

A football team has a probability of .75 of winning when playing any of the other four teams in its conference. If the games are independent, what is the probability the team wins all its conference games?

2.118

An accident victim will die unless in the next 10 minutes he receives some type A, Rh-positive blood, which can be supplied by a single donor. The hospital requires 2 minutes to type a prospective donor’s blood and 2 minutes to complete the transfer of blood. Many untyped donors are available, and 40% of them have type A, Rh-positive blood. What is the probability that the accident victim will be saved if only one blood-typing kit is available? Assume that the typing kit is reusable but can process only one donor at a time.

*2.119

Suppose that two balanced dice are tossed repeatedly and the sum of the two uppermost faces is determined on each toss. What is the probability that we obtain a a sum of 3 before we obtain a sum of 7? b a sum of 4 before we obtain a sum of 7?

2.120

Suppose that two defective refrigerators have been included in a shipment of six refrigerators. The buyer begins to test the six refrigerators one at a time. a What is the probability that the last defective refrigerator is found on the fourth test? b What is the probability that no more than four refrigerators need to be tested to locate both of the defective refrigerators? c When given that exactly one of the two defective refrigerators has been located in the ﬁrst two tests, what is the probability that the remaining defective refrigerator is found in the third or fourth test?

2.121

A new secretary has been given n computer passwords, only one of which will permit access to a computer ﬁle. Because the secretary has no idea which password is correct, he chooses one of the passwords at random and tries it. If the password is incorrect, he discards it and randomly selects another password from among those remaining, proceeding in this manner until he ﬁnds the correct password. a What is the probability that he obtains the correct password on the ﬁrst try? b What is the probability that he obtains the correct password on the second try? The third try? c A security system has been set up so that if three incorrect passwords are tried before the correct one, the computer ﬁle is locked and access to it denied. If n = 7, what is the probability that the secretary will gain access to the ﬁle?


2.10 The Law of Total Probability and Bayes’ Rule

The event-composition approach to solving probability problems is sometimes facilitated by viewing the sample space, S, as a union of mutually exclusive subsets and using the following law of total probability. The results of this section are based on the following construction.

DEFINITION 2.11

For some positive integer k, let the sets B1, B2, . . . , Bk be such that

1. S = B1 ∪ B2 ∪ · · · ∪ Bk.
2. Bi ∩ Bj = ∅, for i ≠ j.

Then the collection of sets {B1, B2, . . . , Bk} is said to be a partition of S.

If A is any subset of S and {B1, B2, . . . , Bk} is a partition of S, A can be decomposed as follows:

A = (A ∩ B1) ∪ (A ∩ B2) ∪ · · · ∪ (A ∩ Bk).

Figure 2.12 illustrates this decomposition for k = 3.

THEOREM 2.8

Assume that {B1, B2, . . . , Bk} is a partition of S (see Definition 2.11) such that P(Bi) > 0, for i = 1, 2, . . . , k. Then for any event A

$P(A) = \sum_{i=1}^{k} P(A|B_i)P(B_i)$.

Proof

Any subset A of S can be written as

A = A ∩ S = A ∩ (B1 ∪ B2 ∪ · · · ∪ Bk) = (A ∩ B1) ∪ (A ∩ B2) ∪ · · · ∪ (A ∩ Bk).

Notice that, because {B1, B2, . . . , Bk} is a partition of S, if i ≠ j,

(A ∩ Bi) ∩ (A ∩ Bj) = A ∩ (Bi ∩ Bj) = A ∩ ∅ = ∅

and that (A ∩ Bi) and (A ∩ Bj) are mutually exclusive events. Thus,

$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k)$
$= P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + \cdots + P(A|B_k)P(B_k)$
$= \sum_{i=1}^{k} P(A|B_i)P(B_i)$.

In the examples and exercises that follow, you will see that it is sometimes much easier to calculate the conditional probabilities P(A|Bi) for suitably chosen Bi than it is to compute P(A) directly. In such cases, the law of total probability can be applied to determine P(A). Using the result of Theorem 2.8, it is a simple matter to derive the result known as Bayes’ rule.

FIGURE 2.12 Decomposition of event A (the sample space S is partitioned into B1, B2, and B3, and A is split into A ∩ B1, A ∩ B2, and A ∩ B3)

THEOREM 2.9

Bayes’ Rule Assume that {B1, B2, . . . , Bk} is a partition of S (see Definition 2.11) such that P(Bi) > 0, for i = 1, 2, . . . , k. Then

$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}$.

Proof

The proof follows directly from the definition of conditional probability and the law of total probability. Note that

$P(B_j|A) = \frac{P(A \cap B_j)}{P(A)} = \frac{P(A|B_j)P(B_j)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}$.
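A frequently used special case takes k = 2 with the partition $\{B, \bar{B}\}$. The following display simply specializes Theorems 2.8 and 2.9 to that case, assuming $0 < P(B) < 1$ and $P(A) > 0$ so that every conditional probability is defined:

```latex
\begin{align*}
P(A) &= P(A|B)P(B) + P(A|\bar{B})P(\bar{B}), \\
P(B|A) &= \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|\bar{B})P(\bar{B})}, \\
P(\bar{B}|A) &= 1 - P(B|A).
\end{align*}
```

The last line holds because $(A \cap B)$ and $(A \cap \bar{B})$ are mutually exclusive events whose union is $A$.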

EXAMPLE 2.23

An electronic fuse is produced by ﬁve production lines in a manufacturing operation. The fuses are costly, are quite reliable, and are shipped to suppliers in 100-unit lots. Because testing is destructive, most buyers of the fuses test only a small number of fuses before deciding to accept or reject lots of incoming fuses. All ﬁve production lines produce fuses at the same rate and normally produce only 2% defective fuses, which are dispersed randomly in the output. Unfortunately, production line 1 suffered mechanical difﬁculty and produced 5% defectives during the month of March. This situation became known to the manufacturer after the fuses had been shipped. A customer received a lot produced in March and tested three fuses. One failed. What is the probability that the lot was produced on line 1? What is the probability that the lot came from one of the four other lines?

Solution

Let B denote the event that a fuse was drawn from line 1 and let A denote the event that a fuse was defective. Then it follows directly that

$P(B) = 0.2$ and $P(A|B) = 3(.05)(.95)^2 = .135375$.

FIGURE 2.13 Tree diagram for calculations in Example 2.23. ∼A and ∼B are alternative notations for $\bar{A}$ and $\bar{B}$, respectively. Branch probabilities: P(B) = 0.2000, P(A|B) = 0.1354, P(∼A|B) = 0.8646; P(∼B) = 0.8000, P(A|∼B) = 0.0576, P(∼A|∼B) = 0.9424. Joint probabilities: P(A ∩ B) = 0.0271, P(∼A ∩ B) = 0.1729, P(A ∩ ∼B) = 0.0461, P(∼A ∩ ∼B) = 0.7539. P(B|A) = 0.0271 / (0.0271 + 0.0461) = 0.3700.

Similarly, P(B) = 0.8

and

P(A|B) = 3(.02)(.98)2 = .057624.

Note that these conditional probabilities were very easy to calculate. Using the law of total probability,

$$P(A) = P(A \mid B)P(B) + P(A \mid \bar{B})P(\bar{B}) = (.135375)(.2) + (.057624)(.8) = .0731742.$$

Finally,

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \mid B)P(B)}{P(A)} = \frac{(.135375)(.2)}{.0731742} = .37,$$

and

$$P(\bar{B} \mid A) = 1 - P(B \mid A) = 1 - .37 = .63.$$

Figure 2.13, obtained using the applet Bayes’ Rule as a Tree, illustrates the various steps in the computation of P(B|A).
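The arithmetic of Example 2.23 can be reproduced with a short Python sketch. The helper name `prob_defectives` is an assumption made for this illustration. Like the solution above, it treats the three tested fuses as independent draws, so the chance of exactly one defective is a binomial probability.

```python
from math import comb


def prob_defectives(n_tested, n_defective, p_defective):
    """Binomial probability of n_defective defectives among n_tested fuses.

    Hypothetical helper; assumes the tested fuses fail independently.
    """
    return (comb(n_tested, n_defective)
            * p_defective ** n_defective
            * (1 - p_defective) ** (n_tested - n_defective))


p_line1 = 0.2    # P(B): one of five equally productive lines
p_other = 0.8    # P(B-bar): the four other lines

like_line1 = prob_defectives(3, 1, 0.05)   # P(A | B)     = 3(.05)(.95)^2
like_other = prob_defectives(3, 1, 0.02)   # P(A | B-bar) = 3(.02)(.98)^2

# Law of total probability, then Bayes' rule.
p_a = like_line1 * p_line1 + like_other * p_other
post_line1 = like_line1 * p_line1 / p_a
post_other = 1 - post_line1

print(f"P(A)         = {p_a:.7f}")
print(f"P(B | A)     = {post_line1:.2f}")
print(f"P(B-bar | A) = {post_other:.2f}")
```

Changing the value .05 passed to `prob_defectives` shows how sensitive the posterior probability for line 1 is to that line's defect rate.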

Exercises

2.122

Applet Exercise Use the applet Bayes’ Rule as a Tree to obtain the results given in Figure 2.13.

2.123

Applet Exercise Refer to Exercise 2.122 and Example 2.23. Suppose that lines 2 through 5 remained the same, but line 1 was partially repaired and produced a smaller percentage of defects.
a What impact would this have on P(A|B)?
b Suppose that P(A|B) decreased to .12 and all other probabilities remained unchanged. Use the applet Bayes’ Rule as a Tree to re-evaluate P(B|A).
c How does the answer you obtained in part (b) compare to that obtained in Exercise 2.122? Are you surprised by this result?
d Assume that all probabilities remain the same except P(A|B). Use the applet and trial and error to find the value of P(A|B) for which P(B|A) = .3000.
e If line 1 produces only defective items but all other probabilities remain unchanged, what is P(B|A)?
f A friend expected the answer to part (e) to be 1. Explain why, under the conditions of part (e), P(B|A) ≠ 1.

2.124

A population of voters contains 40% Republicans and 60% Democrats. It is reported that 30% of the Republicans and 70% of the Democrats favor an election issue. A person chosen at random from this population is found to favor the issue in question. Find the conditional probability that this person is a Democrat.

2.125

A diagnostic test for a disease is such that it (correctly) detects the disease in 90% of the individuals who actually have the disease. Also, if a person does not have the disease, the test will report that he or she does not have it with probability .9. Only 1% of the population has the disease in question. If a person is chosen at random from the population and the diagnostic test indicates that she has the disease, what is the conditional probability that she does, in fact, have the disease? Are you surprised by the answer? Would you call this diagnostic test reliable?

2.126

Applet Exercise Refer to Exercise 2.125. The probability that the test detects the disease given that the patient has the disease is called the sensitivity of the test. The speciﬁcity of the test is the probability that the test indicates no disease given that the patient is disease free. The positive predictive value of the test is the probability that the patient has the disease given that the test indicates that the disease is present. In Exercise 2.125, the disease in question was relatively rare, occurring with probability .01, and the test described has sensitivity = speciﬁcity = .90 and positive predictive value = .0833. a In an effort to increase the positive predictive value of the test, the sensitivity was increased to .95 and the speciﬁcity remained at .90, what is the positive predictive value of the “improved” test? b Still not satisﬁed with the positive predictive value of the procedure, the sensitivity of the test is increased to .999. What is the positive predictive value of the (now twice) modiﬁed test if the speciﬁcity stays at .90? c Look carefully at the various numbers that were used to compute the positive predictive value of the tests. Why are all of the positive predictive values so small? [Hint: Compare the size of the numerator and the denominator used in the fraction that yields the value of the positive predictive value. Why is the denominator so (relatively) large?] d The proportion of individuals with the disease is not subject to our control. If the sensitivity of the test is .90, is it possible that the positive predictive value of the test can be increased to a value above .5? How? [Hint: Consider improving the speciﬁcity of the test.] e Based on the results of your calculations in the previous parts, if the disease in question is relatively rare, how can the positive predictive value of a diagnostic test be signiﬁcantly increased?

2.127

Applet Exercise Refer to Exercises 2.125 and 2.126. Suppose now that the disease is not particularly rare and occurs with probability .4.
a If, as in Exercise 2.125, a test has sensitivity = specificity = .90, what is the positive predictive value of the test?
b Why is the value of the positive predictive value of the test so much higher than the value obtained in Exercise 2.125? [Hint: Compare the size of the numerator and the denominator used in the fraction that yields the value of the positive predictive value.]
c If the specificity of the test remains .90, can the sensitivity of the test be adjusted to obtain a positive predictive value above .87?
d If the sensitivity remains at .90, can the specificity be adjusted to obtain a positive predictive value above .95? How?
e The developers of a diagnostic test want the test to have a high positive predictive value. Based on your calculations in previous parts of this problem and in Exercise 2.126, is the value of the specificity more or less critical when developing a test for a rarer disease?

2.128

A plane is missing and is presumed to have equal probability of going down in any of three regions. If a plane is actually down in region i, let $1 - \alpha_i$ denote the probability that the plane will be found upon a search of the ith region, i = 1, 2, 3. What is the conditional probability that the plane is in
a region 1, given that the search of region 1 was unsuccessful?
b region 2, given that the search of region 1 was unsuccessful?
c region 3, given that the search of region 1 was unsuccessful?

2.129

Males and females are observed to react differently to a given set of circumstances. It has been observed that 70% of the females react positively to these circumstances, whereas only 40% of males react positively. A group of 20 people, 15 female and 5 male, was subjected to these circumstances, and the subjects were asked to describe their reactions on a written questionnaire. A response picked at random from the 20 was negative. What is the probability that it was that of a male?

2.130

A study of Georgia residents suggests that those who worked in shipyards during World War II were subjected to a signiﬁcantly higher risk of lung cancer (Wall Street Journal, September 21, 1978).3 It was found that approximately 22% of those persons who had lung cancer worked at some prior time in a shipyard. In contrast, only 14% of those who had no lung cancer worked at some prior time in a shipyard. Suppose that the proportion of all Georgians living during World War II who have or will have contracted lung cancer is .04%. Find the percentage of Georgians living during the same period who will contract (or have contracted) lung cancer, given that they have at some prior time worked in a shipyard.

2.131

The symmetric difference between two events A and B is the set of all sample points that are in exactly one of the sets and is often denoted $A \,\triangle\, B$. Note that $A \,\triangle\, B = (A \cap \bar{B}) \cup (\bar{A} \cap B)$. Prove that $P(A \,\triangle\, B) = P(A) + P(B) - 2P(A \cap B)$.

2.132

Use Theorem 2.8, the law of total probability, to prove the following:
a If $P(A \mid B) = P(A \mid \bar{B})$, then A and B are independent.
b If $P(A \mid C) > P(B \mid C)$ and $P(A \mid \bar{C}) > P(B \mid \bar{C})$, then P(A) > P(B).

2.133

A student answers a multiple-choice examination question that offers four possible answers. Suppose the probability that the student knows the answer to the question is .8 and the probability that the student will guess is .2. Assume that if the student guesses, the probability of selecting the correct answer is .25. If the student correctly answers a question, what is the probability that the student really knew the correct answer?

3. Source: Wall Street Journal, © Dow Jones & Company, Inc. 1981. All rights reserved worldwide.

2.134

Two methods, A and B, are available for teaching a certain industrial skill. The failure rate is 20% for A and 10% for B. However, B is more expensive and hence is used only 30% of the time. (A is used the other 70%.) A worker was taught the skill by one of the methods but failed to learn it correctly. What is the probability that she was taught by method A?

2.135

Of the travelers arriving at a small airport, 60% fly on major airlines, 30% fly on privately owned planes, and the remainder fly on commercially owned planes not belonging to a major airline. Of those traveling on major airlines, 50% are traveling for business reasons, whereas 60% of those arriving on private planes and 90% of those arriving on other commercially owned planes are traveling for business reasons. Suppose that we randomly select one person arriving at this airport. What is the probability that the person
a is traveling on business?
b is traveling for business on a privately owned plane?
c arrived on a privately owned plane, given that the person is traveling for business reasons?
d is traveling on business, given that the person is flying on a commercially owned plane?

2.136

A personnel director has two lists of applicants for jobs. List 1 contains the names of ﬁve women and two men, whereas list 2 contains the names of two women and six men. A name is randomly selected from list 1 and added to list 2. A name is then randomly selected from the augmented list 2. Given that the name selected is that of a man, what is the probability that a woman’s name was originally selected from list 1?

2.137

Five identical bowls are labeled 1, 2, 3, 4, and 5. Bowl i contains i white and 5 − i black balls, with i = 1, 2, . . . , 5. A bowl is randomly selected and two balls are randomly selected (without replacement) from the contents of the bowl. a What is the probability that both balls selected are white? b Given that both balls selected are white, what is the probability that bowl 3 was selected?

*2.138

Following is a description of the game of craps. A player rolls two dice and computes the total of the spots showing. If the player’s ﬁrst toss is a 7 or an 11, the player wins the game. If the ﬁrst toss is a 2, 3, or 12, the player loses the game. If the player rolls anything else (4, 5, 6, 8, 9 or 10) on the ﬁrst toss, that value becomes the player’s point. If the player does not win or lose on the ﬁrst toss, he tosses the dice repeatedly until he obtains either his point or a 7. He wins if he tosses his point before tossing a 7 and loses if he tosses a 7 before his point. What is the probability that the player wins a game of craps? [Hint: Recall Exercise 2.119.]

2.11 Numerical Events and Random Variables

Events of major interest to the scientist, engineer, or businessperson are those identified by numbers, called numerical events. The research physician is interested in the event that ten of ten treated patients survive an illness; the businessperson is interested in the event that sales next year will reach $5 million.

Let Y denote a variable to be measured in an experiment. Because the value of Y will vary depending on the outcome of the experiment, it is called a random variable. To each point in the sample space we will assign a real number denoting the value of the variable Y. The value assigned to Y will vary from one sample point to another, but some points may be assigned the same numerical value. Thus, we have defined a variable that is a function of the sample points in S, and {all sample points where Y = a} is the numerical event assigned the number a. Indeed, the sample space S can be partitioned into subsets so that points within a subset are all assigned the same value of Y. These subsets are mutually exclusive since no point is assigned two different numerical values. The partitioning of S is symbolically indicated in Figure 2.14 for a random variable that can assume values 0, 1, 2, 3, and 4.

FIGURE 2.14 Partitioning S into subsets that define the events Y = 0, 1, 2, 3, and 4

DEFINITION 2.12

A random variable is a real-valued function for which the domain is a sample space.

EXAMPLE 2.24

Deﬁne an experiment as tossing two coins and observing the results. Let Y equal the number of heads obtained. Identify the sample points in S, assign a value of Y to each sample point, and identify the sample points associated with each value of the random variable Y .

Solution

Let H and T represent head and tail, respectively; and let an ordered pair of symbols identify the outcome for the first and second coins. (Thus, HT implies a head on the first coin and a tail on the second.) Then the four sample points in S are E1: HH, E2: HT, E3: TH, and E4: TT. The values of Y assigned to the sample points depend on the number of heads associated with each point. For E1: HH, two heads were observed, and E1 is assigned the value Y = 2. Similarly, we assign the values Y = 1 to E2 and E3 and Y = 0 to E4. Summarizing, the random variable Y can take three values, Y = 0, 1, and 2, which are events defined by specific collections of sample points: {Y = 0} = {E4}, {Y = 1} = {E2, E3}, {Y = 2} = {E1}.

Let y denote an observed value of the random variable Y . Then P(Y = y) is the sum of the probabilities of the sample points that are assigned the value y.

EXAMPLE 2.25

Compute the probabilities for each value of Y in Example 2.24.

Solution

The event {Y = 0} results only from sample point E4. If the coins are balanced, the sample points are equally likely; hence, P(Y = 0) = P(E4) = 1/4. Similarly, P(Y = 1) = P(E2) + P(E3) = 1/2 and P(Y = 2) = P(E1) = 1/4.

A more detailed examination of random variables will be undertaken in the next two chapters.
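The sample-point calculation in Examples 2.24 and 2.25 can be sketched in Python as follows. The names `sample_space`, `num_heads`, and `distribution` are chosen for this illustration only, and the coins are assumed to be balanced so that the four sample points are equally likely.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# The four sample points HH, HT, TH, TT as ordered pairs.
sample_space = list(product("HT", repeat=2))

# Balanced coins: every sample point has probability 1/4.
point_prob = Fraction(1, len(sample_space))


def num_heads(point):
    """The random variable Y: a real-valued function on the sample space."""
    return point.count("H")


# P(Y = y) is the sum of the probabilities of the points assigned the value y.
distribution = defaultdict(Fraction)
for point in sample_space:
    distribution[num_heads(point)] += point_prob

for y in sorted(distribution):
    print(f"P(Y = {y}) = {distribution[y]}")
```

The same loop works for any finite sample space: only `sample_space`, the point probabilities, and the function defining Y need to change.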

Exercises

2.139

Refer to Exercise 2.112. Let the random variable Y represent the number of radar sets that detect a particular aircraft. Compute the probabilities associated with each value of Y .

2.140

Refer to Exercise 2.120. Let the random variable Y represent the number of defective refrigerators found after three refrigerators have been tested. Compute the probabilities for each value of Y .

2.141

Refer again to Exercise 2.120. Let the random variable Y represent the number of the test in which the last defective refrigerator is identiﬁed. Compute the probabilities for each value of Y .

2.142

A spinner can land in any of four positions, A, B, C, and D, with equal probability. The spinner is used twice, and the position is noted each time. Let the random variable Y denote the number of positions on which the spinner did not land. Compute the probabilities for each value of Y .

2.12 Random Sampling

As our final topic in this chapter, we move from theory to application and examine the nature of experiments conducted in statistics. A statistical experiment involves the observation of a sample selected from a larger body of data, existing or conceptual, called a population. The measurements in the sample, viewed as observations of the values of one or more random variables, are then employed to make an inference about the characteristics of the target population.

How are these inferences made? An exact answer to this question is deferred until later, but a general observation follows from our discussion in Section 2.2. There we learned that the probability of the observed sample plays a major role in making an inference and evaluating the credibility of the inference.

Without belaboring the point, it is clear that the method of sampling will affect the probability of a particular sample outcome. For example, suppose that a fictitious population contains only N = 5 elements, from which we plan to take a sample of size n = 2. You could mix the elements thoroughly and select two in such a way that all pairs of elements possess an equal probability of selection. A second sampling procedure might require selecting a single element, replacing it in the population, and then drawing a single element again. The two methods of sample selection are called sampling without and with replacement, respectively. If all the N = 5 population elements are distinctly different, the probability of drawing a specific pair, when sampling without replacement, is 1/10. The probability of drawing the same specific pair, when sampling with replacement, is 2/25. You can easily verify these results.

The point that we make is that the method of sampling, known as the design of an experiment, affects both the quantity of information in a sample and the probability of observing a specific sample result. Hence, every sampling procedure must be clearly described if we wish to make valid inferences from sample to population.

The study of the design of experiments, the various types of designs along with their properties, is a course in itself. Hence, at this early stage of study we introduce only the simplest sampling procedure, simple random sampling. The notion of simple random sampling will be needed in subsequent discussions of the probabilities associated with random variables, and it will inject some realism into our discussion of statistics. This is because simple random sampling is often employed in practice. Now let us define the term random sample.

DEFINITION 2.13

Let N and n represent the numbers of elements in the population and sample, respectively. If the sampling is conducted in such a way that each of the $\binom{N}{n}$ samples has an equal probability of being selected, the sampling is said to be random, and the result is said to be a random sample.

Perfect random sampling is difficult to achieve in practice. If the population is not too large, we might write each of the N numbers on a poker chip, mix all the chips, and select a sample of n chips. The numbers on the poker chips would specify the measurements to appear in the sample. Tables of random numbers have been formed by computer to expedite the selection of random samples. An example of such a table is Table 12, Appendix 3.

A random number table is a set of integers (0, 1, . . . , 9) generated so that, in the long run, the table will contain all ten integers in approximately equal proportions, with no trends in the patterns in which the digits were generated. Thus, if one digit is selected from a random point on the table, it is equally likely to be any of the digits 0 through 9. Choosing numbers from the table is analogous to drawing numbered poker chips from the mixed pile, as mentioned earlier.

Suppose we want a random sample of three persons to be selected from a population of seven persons. We could number the people from 1 to 7, put the numbers on chips, thoroughly mix the chips, and then draw three out. Analogously, we could drop a pencil point on a random starting point in Table 12, Appendix 3. Suppose the point falls on the 15th line of column 9 and we decide to use the rightmost digit of the group of five, which is a 5 in this case. This process is like drawing the chip numbered 5. We may now proceed in any direction to obtain the remaining numbers in the sample. If we decide to proceed down the page, the next number (immediately below the 5) is a 2. So our second sampled person would be number 2. Proceeding, we next come to an 8, but there are only seven elements in the population. Thus, the 8 is ignored, and we continue down the column. Two more 5s then appear, but they must both be ignored because person 5 has already been selected. (The chip numbered 5 has been removed from the pile.) Finally, we come to a 1, and our sample of three is completed with persons numbered 5, 2, and 1.

Any starting point can be used in a random number table, and we may proceed in any direction from the starting point. However, if more than one sample is to be used in any problem, each should have a unique starting point.

In many situations the population is conceptual, as in an observation made during a laboratory experiment. Here the population is envisioned to be the infinitely many measurements that would be obtained if the experiment were to be repeated over and over again. If we wish a sample of n = 10 measurements from this population, we repeat the experiment ten times and hope that the results represent, to a reasonable degree of approximation, a random sample.

Although the primary purpose of this discussion was to clarify the meaning of a random sample, we would like to mention that some sampling techniques are only partially random. For instance, if we wish to determine the voting preference of the nation in a presidential election, we would not likely choose a random sample from the population of voters. By pure chance, all the voters appearing in the sample might be drawn from a single city, say San Francisco, which might not be at all representative of the population of all voters in the United States. We would prefer a random selection of voters from smaller political districts, perhaps states, allotting a specified number to each state. The information from the randomly selected subsamples drawn from the respective states would be combined to form a prediction concerning the entire population of voters in the country. In general, we want to select a sample so as to obtain a specified quantity of information at minimum cost.
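Two claims in this section can be checked with a short Python sketch: the probabilities 1/10 and 2/25 for drawing a specific pair from N = 5 elements, and the random-number-table procedure that produced persons 5, 2, and 1. The element labels, the function name `sample_from_digits`, and the use of Python's `random` module in place of a printed table are all assumptions made for this illustration.

```python
import random
from fractions import Fraction
from itertools import combinations, product

population = ["a", "b", "c", "d", "e"]   # N = 5 hypothetical, distinct elements
target = {"a", "b"}                      # the specific pair of interest

# Without replacement: the 10 unordered pairs are equally likely.
without = list(combinations(population, 2))
p_without = Fraction(sum(set(s) == target for s in without), len(without))

# With replacement: the 25 ordered draws are equally likely.
with_repl = list(product(population, repeat=2))
p_with = Fraction(sum(set(s) == target for s in with_repl), len(with_repl))

print(p_without, p_with)


def sample_from_digits(digits, population_size, n):
    """Read digits in order, skipping out-of-range values and repeats."""
    chosen = []
    for d in digits:
        if 1 <= d <= population_size and d not in chosen:
            chosen.append(d)
        if len(chosen) == n:
            break
    return chosen


# The digits read down the column in the text: 5, 2, 8, 5, 5, 1.
table_sample = sample_from_digits([5, 2, 8, 5, 5, 1], population_size=7, n=3)
print(table_sample)

# A software analogue of mixing and drawing chips (seed fixed only for repeatability).
rng = random.Random(2024)
chip_sample = rng.sample(range(1, 8), k=3)
print(chip_sample)
```

The call to `rng.sample` draws without replacement, so each of the 35 possible three-person samples is intended to be equally likely, in the sense of Definition 2.13.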

2.13 Summary

This chapter has been concerned with providing a model for the repetition of an experiment and, consequently, a model for the population frequency distributions of Chapter 1. The acquisition of a probability distribution is the first step in forming a theory to model reality and to develop the machinery for making inferences.

An experiment was defined as the process of making an observation. The concepts of an event, a simple event, the sample space, and the probability axioms have provided a probabilistic model for calculating the probability of an event. Numerical events and the definition of a random variable were introduced in Section 2.11.

Inherent in the model is the sample-point approach for calculating the probability of an event (Section 2.5). Counting rules useful in applying the sample-point method were discussed in Section 2.6. The concept of conditional probability, the operations of set algebra, and the laws of probability set the stage for the event-composition method for calculating the probability of an event (Section 2.9).

Of what value is the theory of probability? It provides the theory and the tools for calculating the probabilities of numerical events and hence the probability distributions for the random variables that will be discussed in Chapter 3. The numerical events of interest to us appear in a sample, and we will wish to calculate the probability of an observed sample to make an inference about the target population. Probability provides both the foundation and the tools for statistical inference, the objective of statistics.


Supplementary Exercises

2.143

Show that Theorem 2.7 holds for conditional probabilities. That is, if P(B) > 0, then $P(\bar{A} \mid B) = 1 - P(A \mid B)$.

2.144

Let S contain four sample points, E1, E2, E3, and E4.
a List all possible events in S (include the null event).
b In Exercise 2.68(d), you showed that $\sum_{i=0}^{n} \binom{n}{i} = 2^n$. Use this result to give the total number of events in S.
c Let A and B be the events {E1, E2, E3} and {E2, E4}, respectively. Give the sample points in the following events: $A \cup B$, $A \cap B$, $\bar{A} \cap \bar{B}$, and $\bar{A} \cup B$.

2.145

A patient receiving a yearly physical examination must have 18 checks or tests performed. The sequence in which the tests are conducted is important because the time lost between tests will vary depending on the sequence. If an efﬁciency expert were to study the sequences to ﬁnd the one that required the minimum length of time, how many sequences would be included in her study if all possible sequences were admissible?

2.146

Five cards are drawn from a standard 52-card playing deck. What is the probability that all 5 cards will be of the same suit?

2.147

Refer to Exercise 2.146. A gambler has been dealt ﬁve cards: two aces, one king, one ﬁve, and one 9. He discards the 5 and the 9 and is dealt two more cards. What is the probability that he ends up with a full house?


2.148

A bin contains three components from supplier A, four from supplier B, and ﬁve from supplier C. If four of the components are randomly selected for testing, what is the probability that each supplier would have at least one component tested?

2.149

A large group of people is to be checked for two common symptoms of a certain disease. It is thought that 20% of the people possess symptom A alone, 30% possess symptom B alone, 10% possess both symptoms, and the remainder have neither symptom. For one person chosen at random from this group, ﬁnd these probabilities: a The person has neither symptom. b The person has at least one symptom. c The person has both symptoms, given that he has symptom B.

2.150

Refer to Exercise 2.149. Let the random variable Y represent the number of symptoms possessed by a person chosen at random from the group. Compute the probabilities associated with each value of Y .

*2.151

A Model for the World Series Two teams A and B play a series of games until one team wins four games. We assume that the games are played independently and that the probability that A wins any game is p. What is the probability that the series lasts exactly ﬁve games?

2.152

We know the following about a colorimetric method used to test lake water for nitrates. If water specimens contain nitrates, a solution dropped into the water will cause the specimen to turn red 95% of the time. When used on water specimens without nitrates, the solution causes the water to turn red 10% of the time (because chemicals other than nitrates are sometimes present and they also react to the agent). Past experience in a lab indicates that nitrates are contained in 30% of the water specimens that are sent to the lab for testing. If a water specimen is randomly selected
a from among those sent to the lab, what is the probability that it will turn red when tested?
b and turns red when tested, what is the probability that it actually contains nitrates?

2.153

Medical case histories indicate that different illnesses may produce identical symptoms. Suppose that a particular set of symptoms, denoted H, occurs only when any one of three illnesses, I1, I2, or I3, occurs. Assume that the simultaneous occurrence of more than one of these illnesses is impossible and that

P(I1) = .01, P(I2) = .005, P(I3) = .02.

The probabilities of developing the set of symptoms H, given each of these illnesses, are known to be

P(H|I1) = .90, P(H|I2) = .95, P(H|I3) = .75.

Assuming that an ill person exhibits the symptoms, H , what is the probability that the person has illness I1 ?

2.154

a A drawer contains n = 5 different and distinguishable pairs of socks (a total of ten socks). If a person (perhaps in the dark) randomly selects four socks, what is the probability that there is no matching pair in the sample?
*b A drawer contains n different and distinguishable pairs of socks (a total of 2n socks). A person randomly selects 2r of the socks, where 2r < n. In terms of n and r, what is the probability that there is no matching pair in the sample?

2.155

A group of men possesses the three characteristics of being married (A), having a college degree (B), and being a citizen of a specified state (C), according to the fractions given in the accompanying Venn diagram. That is, 5% of the men possess all three characteristics, whereas 20% have a college education but are not married and are not citizens of the specified state. One man is chosen at random from this group.

[Venn diagram of events A, B, and C; the region fractions shown are .15, .10, .05, .10, .20, .10, and .25.]

Find the probability that he
a is married.
b has a college degree and is married.
c is not from the specified state but is married and has a college degree.
d is not married or does not have a college degree, given that he is from the specified state.

2.156

The accompanying table lists accidental deaths by age and certain specific types for the United States in 2002.
a A randomly selected person from the United States was known to have an accidental death in 2002. Find the probability that
  i he was over the age of 15 years.
  ii the cause of death was a motor vehicle accident.
  iii the cause of death was a motor vehicle accident, given that the person was between 15 and 24 years old.
  iv the cause of death was a drowning accident, given that it was not a motor vehicle accident and the person was 34 years old or younger.
b From these figures can you determine the probability that a person selected at random from the U.S. population had a fatal motor vehicle accident in 2002?

                          Type of Accident
Age            All Types   Motor Vehicle    Falls   Drowning
Under 5            2,707             819       44        568
5–14               2,979           1,772       37        375
15–24             14,113          10,560      237        646
25–34             11,769           6,884      303        419
35–44             15,413           6,927      608        480
45–54             12,278           5,361      871        354
55–64              7,505           3,506      949        217
65–74              7,698           3,038    1,660        179
75 and over       23,438           4,487    8,613        244
Total             97,900          43,354   13,322      3,482

Source: Compiled from National Vital Statistics Report 50, no. 15, 2002.


2.157

A study of the residents of a region showed that 20% were smokers. The probability of death due to lung cancer, given that a person smoked, was ten times the probability of death due to lung cancer, given that the person did not smoke. If the probability of death due to lung cancer in the region is .006, what is the probability of death due to lung cancer given that the person is a smoker?

2.158

A bowl contains w white balls and b black balls. One ball is selected at random from the bowl, its color is noted, and it is returned to the bowl along with n additional balls of the same color. Another single ball is randomly selected from the bowl (now containing w + b + n balls) and it is observed that the ball is black. Show that the (conditional) probability that the first ball selected was white is $\frac{w}{w + b + n}$.

2.159

It seems obvious that P(∅) = 0. Show that this result follows from the axioms in Deﬁnition 2.6.

2.160

A machine for producing a new experimental electronic component generates defectives from time to time in a random manner. The supervising engineer for a particular machine has noticed that defectives seem to be grouping (hence appearing in a nonrandom manner), thereby suggesting a malfunction in some part of the machine. One test for nonrandomness is based on the number of runs of defectives and nondefectives (a run is an unbroken sequence of either defectives or nondefectives). The smaller the number of runs, the greater will be the amount of evidence indicating nonrandomness. Of 12 components drawn from the machine, the first 10 were not defective, and the last 2 were defective (N N N N N N N N N N D D). Assume randomness. What is the probability of observing
a this arrangement (resulting in two runs) given that 10 of the 12 components are not defective?
b two runs?

2.161

Refer to Exercise 2.160. What is the probability that the number of runs, R, is less than or equal to 3?

2.162

Assume that there are nine parking spaces next to one another in a parking lot. Nine cars need to be parked by an attendant. Three of the cars are expensive sports cars, three are large domestic cars, and three are imported compacts. Assuming that the attendant parks the cars at random, what is the probability that the three expensive sports cars are parked adjacent to one another?

2.163

Relays used in the construction of electric circuits function properly with probability .9. Assuming that the circuits operate independently, which of the following circuit designs yields the higher probability that current will flow when the relays are activated?

[Figure: circuit designs A and B, each with relays 1, 2, 3, and 4]

2.164

Refer to Exercise 2.163 and consider circuit A. If we know that current is ﬂowing, what is the probability that switches 1 and 4 are functioning properly?

2.165

Refer to Exercise 2.163 and consider circuit B. If we know that current is ﬂowing, what is the probability that switches 1 and 4 are functioning properly?

2.166

Eight tires of different brands are ranked from 1 to 8 (best to worst) according to mileage performance. If four of these tires are chosen at random by a customer, ﬁnd the probability that the best tire among those selected by the customer is actually ranked third among the original eight.


2.167

Refer to Exercise 2.166. Let Y denote the actual quality rank of the best tire selected by the customer. In Exercise 2.166, you computed P(Y = 3). Give the possible values of Y and the probabilities associated with all of these values.

2.168

As in Exercises 2.166 and 2.167, eight tires of different brands are ranked from 1 to 8 (best to worst) according to mileage performance.
a If four of these tires are chosen at random by a customer, what is the probability that the best tire selected is ranked 3 and the worst is ranked 7?
b In part (a) you computed the probability that the best tire selected is ranked 3 and the worst is ranked 7. If that is the case, the range of the ranks is R = largest rank − smallest rank = 7 − 3 = 4. What is P(R = 4)?
c Give all possible values for R and the probabilities associated with all of these values.

*2.169

Three beer drinkers (say I, II, and III) are to rank four different brands of beer (say A, B, C, and D) in a blindfold test. Each drinker ranks the four beers as 1 (for the beer that he or she liked best), 2 (for the next best), 3, or 4.
a Carefully describe a sample space for this experiment (note that we need to specify the ranking of all four beers for all three drinkers). How many sample points are in this sample space?
b Assume that the drinkers cannot discriminate between the beers so that each assignment of ranks to the beers is equally likely. After all the beers are ranked by all three drinkers, the ranks of each brand of beer are summed. What is the probability that some beer will receive a total rank of 4 or less?

2.170

Three names are to be selected from a list of seven names for a public opinion survey. Find the probability that the ﬁrst name on the list is selected for the survey.

2.171

An AP news service story, printed in the Gainesville Sun on May 20, 1979, states the following with regard to debris from Skylab striking someone on the ground: “The odds are 1 in 150 that a piece of Skylab will hit someone. But 4 billion people . . . live in the zone in which pieces could fall. So any one person’s chances of being struck are one in 150 times 4 billion—or one in 600 billion.” Do you see any inaccuracies in this reasoning?

2.172

Let A and B be any two events. Which of the following statements, in general, are false?
a $P(A|B) + P(\bar{A}|\bar{B}) = 1$.
b $P(A|B) + P(A|\bar{B}) = 1$.
c $P(A|B) + P(\bar{A}|B) = 1$.

2.173

As items come to the end of a production line, an inspector chooses which items are to go through a complete inspection. Ten percent of all items produced are defective. Sixty percent of all defective items go through a complete inspection, and 20% of all good items go through a complete inspection. Given that an item is completely inspected, what is the probability it is defective?

2.174

Many public schools are implementing a “no-pass, no-play” rule for athletes. Under this system, a student who fails a course is disqualiﬁed from participating in extracurricular activities during the next grading period. Suppose that the probability is .15 that an athlete who has not previously been disqualiﬁed will be disqualiﬁed next term. For athletes who have been previously disqualiﬁed, the probability of disqualiﬁcation next term is .5. If 30% of the athletes have been disqualiﬁed in previous terms, what is the probability that a randomly selected athlete will be disqualiﬁed during the next grading period?

2.175

Three events, A, B, and C, are said to be mutually independent if P(A ∩ B) = P(A) × P(B), P(A ∩ C) = P(A) × P(C),

P(B ∩ C) = P(B) × P(C), P(A ∩ B ∩ C) = P(A) × P(B) × P(C).

Suppose that a balanced coin is independently tossed two times. Deﬁne the following events: A: Head appears on the ﬁrst toss. B: Head appears on the second toss. C: Both tosses yield the same outcome. Are A, B, and C mutually independent?

2.176

Refer to Exercise 2.175 and suppose that events A, B, and C are mutually independent. a Show that (A ∪ B) and C are independent. b Show that A and (B ∩ C) are independent.

2.177

Refer to Exercise 2.90(b) where a friend claimed that if there is a 1 in 50 chance of injury on a single jump then there is a 100% chance of injury if a skydiver jumps 50 times. Assume that the results of repeated jumps are mutually independent.
a What is the probability that 50 jumps will be completed without an injury?
b What is the probability that at least one injury will occur in 50 jumps?
c What is the maximum number of jumps, n, the skydiver can make if the probability is at least .60 that all n jumps will be completed without injury?

*2.178

Suppose that the probability of exposure to the ﬂu during an epidemic is .6. Experience has shown that a serum is 80% successful in preventing an inoculated person from acquiring the ﬂu, if exposed to it. A person not inoculated faces a probability of .90 of acquiring the ﬂu if exposed to it. Two persons, one inoculated and one not, perform a highly specialized task in a business. Assume that they are not at the same location, are not in contact with the same people, and cannot expose each other to the ﬂu. What is the probability that at least one will get the ﬂu?

*2.179

Two gamblers bet $1 each on the successive tosses of a coin. Each has a bank of $6. What is the probability that a they break even after six tosses of the coin? b one player—say, Jones—wins all the money on the tenth toss of the coin?

*2.180

Suppose that the streets of a city are laid out in a grid with streets running north–south and east–west. Consider the following scheme for patrolling an area of 16 blocks by 16 blocks. An ofﬁcer commences walking at the intersection in the center of the area. At the corner of each block the ofﬁcer randomly elects to go north, south, east, or west. What is the probability that the ofﬁcer will a reach the boundary of the patrol area after walking the ﬁrst 8 blocks? b return to the starting point after walking exactly 4 blocks?

*2.181

Suppose that n indistinguishable balls are to be arranged in N distinguishable boxes so that each distinguishable arrangement is equally likely. If n ≥ N, show that the probability no box will be empty is given by

\[ \frac{\binom{n-1}{N-1}}{\binom{N+n-1}{N-1}}. \]

CHAPTER 3

Discrete Random Variables and Their Probability Distributions

3.1 Basic Definition
3.2 The Probability Distribution for a Discrete Random Variable
3.3 The Expected Value of a Random Variable or a Function of a Random Variable
3.4 The Binomial Probability Distribution
3.5 The Geometric Probability Distribution
3.6 The Negative Binomial Probability Distribution (Optional)
3.7 The Hypergeometric Probability Distribution
3.8 The Poisson Probability Distribution
3.9 Moments and Moment-Generating Functions
3.10 Probability-Generating Functions (Optional)
3.11 Tchebysheff's Theorem
3.12 Summary
References and Further Readings

3.1 Basic Definition

As stated in Section 2.12, a random variable is a real-valued function defined over a sample space. Consequently, a random variable can be used to identify numerical events that are of interest in an experiment. For example, the event of interest in an opinion poll regarding voter preferences is not usually the particular people sampled or the order in which preferences were obtained but Y = the number of voters favoring a certain candidate or issue. The observed value of this random variable must be zero or an integer between 1 and the sample size. Thus, this random variable can take on only a finite number of values with nonzero probability. A random variable of this type is said to be discrete.

DEFINITION 3.1

A random variable Y is said to be discrete if it can assume only a finite or countably infinite¹ number of distinct values.

A less formidable characterization of discrete random variables can be obtained by considering some practical examples. The number of bacteria per unit area in the study of drug control on bacterial growth is a discrete random variable, as is the number of defective television sets in a shipment of 100 sets. Indeed, discrete random variables often represent counts associated with real phenomena.

Let us now consider the relation of the material in Chapter 2 to this chapter. Why study the theory of probability? The answer is that the probability of an observed event is needed to make inferences about a population. The events of interest are often numerical events that correspond to values of discrete random variables. Hence, it is imperative that we know the probabilities of these numerical events. Because certain types of random variables occur so frequently in practice, it is useful to have at hand the probability for each value of a random variable. This collection of probabilities is called the probability distribution of the discrete random variable. We will find that many experiments exhibit similar characteristics and generate random variables with the same type of probability distribution. Consequently, knowledge of the probability distributions for random variables associated with common types of experiments will eliminate the need for solving the same probability problems over and over again.

3.2 The Probability Distribution for a Discrete Random Variable

Notationally, we will use an uppercase letter, such as Y, to denote a random variable and a lowercase letter, such as y, to denote a particular value that a random variable may assume. For example, let Y denote any one of the six possible values that could be observed on the upper face when a die is tossed. After the die is tossed, the number actually observed will be denoted by the symbol y. Note that Y is a random variable, but the specific observed value, y, is not random.

The expression (Y = y) can be read, the set of all points in S assigned the value y by the random variable Y. It is now meaningful to talk about the probability that Y takes on the value y, denoted by P(Y = y). As in Section 2.11, this probability is defined as the sum of the probabilities of appropriate sample points in S.

1. Recall that a set of elements is countably inﬁnite if the elements in the set can be put into one-to-one correspondence with the positive integers.


DEFINITION 3.2

The probability that Y takes on the value y, P(Y = y), is defined as the sum of the probabilities of all sample points in S that are assigned the value y. We will sometimes denote P(Y = y) by p(y).

Because p(y) is a function that assigns probabilities to each value y of the random variable Y, it is sometimes called the probability function for Y.

DEFINITION 3.3

The probability distribution for a discrete variable Y can be represented by a formula, a table, or a graph that provides p(y) = P(Y = y) for all y.

Notice that p(y) ≥ 0 for all y, but the probability distribution for a discrete random variable assigns nonzero probabilities to only a countable number of distinct y values. Any value y not explicitly assigned a positive probability is understood to be such that p(y) = 0. We illustrate these ideas with an example.

EXAMPLE 3.1

A supervisor in a manufacturing plant has three men and three women working for him. He wants to choose two workers for a special job. Not wishing to show any biases in his selection, he decides to select the two workers at random. Let Y denote the number of women in his selection. Find the probability distribution for Y.

Solution

The supervisor can select two workers from six in $\binom{6}{2} = 15$ ways. Hence, S contains 15 sample points, which we assume to be equally likely because random sampling was employed. Thus, $P(E_i) = 1/15$, for i = 1, 2, ..., 15. The values for Y that have nonzero probability are 0, 1, and 2. The number of ways of selecting Y = 0 women is $\binom{3}{0}\binom{3}{2}$ because the supervisor must select zero workers from the three women and two from the three men. Thus, there are $\binom{3}{0}\binom{3}{2} = 1 \cdot 3 = 3$ sample points in the event Y = 0, and

\[ p(0) = P(Y = 0) = \frac{\binom{3}{0}\binom{3}{2}}{15} = \frac{3}{15} = \frac{1}{5}. \]

Similarly,

\[ p(1) = P(Y = 1) = \frac{\binom{3}{1}\binom{3}{1}}{15} = \frac{9}{15} = \frac{3}{5}, \]

\[ p(2) = P(Y = 2) = \frac{\binom{3}{2}\binom{3}{0}}{15} = \frac{3}{15} = \frac{1}{5}. \]

Notice that (Y = 1) is by far the most likely outcome. This should seem reasonable since the number of women equals the number of men in the original group.
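The sample-point counting in Example 3.1 can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original solution; the worker labels (`M1`, `W1`, and so on) are hypothetical names introduced only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical labels for the six workers: three men and three women.
workers = ["M1", "M2", "M3", "W1", "W2", "W3"]

# Each unordered pair of workers is one equally likely sample point.
sample_points = list(combinations(workers, 2))

# Y = number of women in the selected pair, tallied over all sample points.
counts = Counter(sum(w.startswith("W") for w in pair) for pair in sample_points)

# p(y) = (number of sample points with Y = y) / (total number of sample points)
p = {y: Fraction(counts[y], len(sample_points)) for y in sorted(counts)}

for y, prob in p.items():
    print(f"p({y}) = {prob}")
```

Exact fractions are used so that the enumerated values can be compared directly with 1/5, 3/5, and 1/5.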

The probability distribution of the random variable Y considered in Example 3.1 is summarized in Table 3.1. The same distribution is given in graphical form in Figure 3.1. If we regard the width of each bar in Figure 3.1 as one unit, then the area in a bar is equal to the probability that Y takes on the value over which the bar is centered. This concept of areas representing probabilities was introduced in Section 1.2.

Table 3.1 Probability distribution for Example 3.1

y     p(y)
0     1/5
1     3/5
2     1/5

[Figure 3.1 Probability histogram for Table 3.1: p(y) plotted against y = 0, 1, 2]

The most concise method of representing discrete probability distributions is by means of a formula. For Example 3.1 we see that the formula for p(y) can be written as

\[ p(y) = \frac{\binom{3}{y}\binom{3}{2-y}}{\binom{6}{2}}, \qquad y = 0, 1, 2. \]

Notice that the probabilities associated with all distinct values of a discrete random variable must sum to 1. In summary, the following properties must hold for any discrete probability distribution:

THEOREM 3.1

For any discrete probability distribution, the following must be true:
1. 0 ≤ p(y) ≤ 1 for all y.
2. $\sum_y p(y) = 1$, where the summation is over all values of y with nonzero probability.

As mentioned in Section 1.5, the probability distributions we derive are models, not exact representations, for the frequency distributions of populations of real data that occur (or would be generated) in nature. Thus, they are models for real distributions of data similar to the distributions discussed in Chapter 1. For example, if we were to randomly select two workers from among the six described in Example 3.1, we would observe a single y value. In this instance the observed y value would be 0, 1, or 2. If the experiment were repeated many times, many y values would be generated. A relative frequency histogram for the resulting data, constructed in the manner described in Chapter 1, would be very similar to the probability histogram of Figure 3.1.


Such simulation studies are very useful. By repeating some experiments over and over again, we can generate measurements of discrete random variables that possess frequency distributions very similar to the probability distributions derived in this chapter, reinforcing the conviction that our models are quite accurate.
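One way such a simulation study might look for Example 3.1 is sketched below. The function name `simulate_example_3_1`, the number of repetitions, and the fixed seed are all assumptions made for this illustration; the text does not prescribe a particular procedure.

```python
import random
from collections import Counter

def simulate_example_3_1(repetitions, seed=0):
    """Repeat the selection of two workers from six and tally Y, the number of women."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    workers = ["M", "M", "M", "W", "W", "W"]
    tally = Counter()
    for _ in range(repetitions):
        chosen = rng.sample(workers, 2)  # random sampling without replacement
        tally[chosen.count("W")] += 1
    # Relative frequency of each possible y value
    return {y: tally[y] / repetitions for y in (0, 1, 2)}

relative_freq = simulate_example_3_1(100_000)
theoretical = {0: 1 / 5, 1: 3 / 5, 2: 1 / 5}
for y in (0, 1, 2):
    print(f"y = {y}: simulated {relative_freq[y]:.3f}, p(y) = {theoretical[y]:.3f}")
```

With a large number of repetitions the simulated relative frequencies should lie close to p(y), although the exact figures depend on the seed.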

Exercises

3.1

When the health department tested private wells in a county for two impurities commonly found in drinking water, it found that 20% of the wells had neither impurity, 40% had impurity A, and 50% had impurity B. (Obviously, some had both impurities.) If a well is randomly chosen from those in the county, ﬁnd the probability distribution for Y , the number of impurities found in the well.

3.2

You and a friend play a game where you each toss a balanced coin. If the upper faces on the coins are both tails, you win $1; if the faces are both heads, you win $2; if the coins do not match (one shows a head, the other a tail), you lose $1 (win (−$1)). Give the probability distribution for your winnings, Y , on a single play of this game.

3.3

A group of four components is known to contain two defectives. An inspector tests the components one at a time until the two defectives are located. Once she locates the two defectives, she stops testing, but the second defective is tested to ensure accuracy. Let Y denote the number of the test on which the second defective is found. Find the probability distribution for Y .

3.4

Consider a system of water flowing through valves from A to B. (See the accompanying diagram.) Valves 1, 2, and 3 operate independently, and each correctly opens on signal with probability .8. Find the probability distribution for Y, the number of open paths from A to B after the signal is given. (Note that Y can take on the values 0, 1, and 2.)

[Figure: valves 1, 2, and 3 connecting A to B]

3.5

A problem in a test given to small children asks them to match each of three pictures of animals to the word identifying that animal. If a child assigns the three words at random to the three pictures, ﬁnd the probability distribution for Y , the number of correct matches.

3.6

Five balls, numbered 1, 2, 3, 4, and 5, are placed in an urn. Two balls are randomly selected from the ﬁve, and their numbers noted. Find the probability distribution for the following: a The largest of the two sampled numbers b The sum of the two sampled numbers

3.7

Each of three balls is randomly placed into one of three bowls. Find the probability distribution for Y = the number of empty bowls.

3.8

A single cell can either die, with probability .1, or split into two cells, with probability .9, producing a new generation of cells. Each cell in the new generation dies or splits into two cells independently with the same probabilities as the initial cell. Find the probability distribution for the number of cells in the next generation.

3.9

In order to verify the accuracy of their financial accounts, companies use auditors on a regular basis to verify accounting entries. The company's employees make erroneous entries 5% of the time. Suppose that an auditor randomly checks three entries.
a Find the probability distribution for Y, the number of errors detected by the auditor.
b Construct a probability histogram for p(y).
c Find the probability that the auditor will detect more than one error.

3.10

A rental agency, which leases heavy equipment by the day, has found that one expensive piece of equipment is leased, on the average, only one day in ﬁve. If rental on one day is independent of rental on any other day, ﬁnd the probability distribution of Y , the number of days between a pair of rentals.

3.11

Persons entering a blood bank are such that 1 in 3 have type O+ blood and 1 in 15 have type O− blood. Consider three randomly selected donors for the blood bank. Let X denote the number of donors with type O+ blood and Y denote the number with type O− blood. Find the probability distributions for X and Y . Also ﬁnd the probability distribution for X + Y , the number of donors who have type O blood.

3.3 The Expected Value of a Random Variable or a Function of a Random Variable

We have observed that the probability distribution for a random variable is a theoretical model for the empirical distribution of data associated with a real population. If the model is an accurate representation of nature, the theoretical and empirical distributions are equivalent. Consequently, as in Chapter 1, we attempt to find the mean and the variance for a random variable and thereby to acquire numerical descriptive measures, parameters, for the probability distribution p(y) that are consistent with those discussed in Chapter 1.

DEFINITION 3.4

Let Y be a discrete random variable with the probability function p(y). Then the expected value of Y, E(Y), is defined to be²

\[ E(Y) = \sum_{y} y\,p(y). \]

2. To be precise, the expected value of a discrete random variable is said to exist if the sum, as given earlier, is absolutely convergent, that is, if $\sum_{y} |y|\,p(y) < \infty$. This absolute convergence will hold for all examples in this text and will not be mentioned each time an expected value is defined.

If p(y) is an accurate characterization of the population frequency distribution, then E(Y) = µ, the population mean.

Definition 3.4 is completely consistent with the definition of the mean of a set of measurements that was given in Definition 1.1. For example, consider a discrete random variable Y that can assume values 0, 1, and 2 with probability distribution p(y) as shown in Table 3.2 and the probability histogram shown in Figure 3.2. A visual inspection will reveal the mean of the distribution to be located at y = 1.

Table 3.2 Probability distribution for Y

y     p(y)
0     1/4
1     1/2
2     1/4

[Figure 3.2 Probability distribution for Y: p(y) plotted against y = 0, 1, 2]

To show that $E(Y) = \sum_{y} y\,p(y)$ is the mean of the probability distribution p(y), suppose that the experiment were conducted 4 million times, yielding 4 million observed values for Y. Noting p(y) in Figure 3.2, we would expect approximately 1 million of the 4 million repetitions to result in the outcome Y = 0, 2 million in Y = 1, and 1 million in Y = 2. To find the mean value of Y, we average these 4 million measurements and obtain

\[ \mu \approx \frac{\sum_{i=1}^{n} y_i}{n} = \frac{(1{,}000{,}000)(0) + (2{,}000{,}000)(1) + (1{,}000{,}000)(2)}{4{,}000{,}000} = (0)(1/4) + (1)(1/2) + (2)(1/4) = \sum_{y=0}^{2} y\,p(y) = 1. \]
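The arithmetic of this thought experiment can be reproduced in a few lines. This is only an illustrative sketch: the variable names are assumptions, and the "expected counts" are the idealized counts from the text, not simulated data.

```python
from fractions import Fraction

# Probability distribution from Table 3.2
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Definition 3.4: E(Y) is the sum of y * p(y) over all y
expected_value = sum(y * prob for y, prob in p.items())

# The text's thought experiment: 4 million repetitions with the expected counts
n = 4_000_000
expected_counts = {y: prob * n for y, prob in p.items()}
average = sum(y * count for y, count in expected_counts.items()) / n

print(expected_value, average)
```

Both quantities equal 1, which is the point of the argument: averaging the idealized measurements is the same computation as summing y p(y).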

Thus, E(Y) is an average, and Definition 3.4 is consistent with the definition of a mean given in Definition 1.1. Similarly, we frequently are interested in the mean or expected value of a function of a random variable Y. For example, molecules in space move at varying velocities, where Y, the velocity of a given molecule, is a random variable. The energy imparted upon impact by a moving body is proportional to the square of the velocity. Consequently, to find the mean amount of energy transmitted by a molecule upon impact, we must find the mean value of Y². More important, we note in Definition 1.2 that the variance of a set of measurements is the mean of the square of the differences between each value in the set of measurements and their mean, or the mean value of (Y − µ)².

THEOREM 3.2

Let Y be a discrete random variable with probability function p(y) and g(Y) be a real-valued function of Y. Then the expected value of g(Y) is given by

\[ E[g(Y)] = \sum_{\text{all } y} g(y)\,p(y). \]

Proof

We prove the result in the case where the random variable Y takes on the finite number of values $y_1, y_2, \ldots, y_n$. Because the function g(y) may not be one-to-one, suppose that g(Y) takes on values $g_1, g_2, \ldots, g_m$ (where m ≤ n). It follows that g(Y) is a random variable such that for i = 1, 2, ..., m,

\[ P[g(Y) = g_i] = \sum_{\substack{\text{all } y_j \text{ such that} \\ g(y_j) = g_i}} p(y_j) = p^*(g_i). \]

Thus, by Definition 3.4,

\[
\begin{aligned}
E[g(Y)] &= \sum_{i=1}^{m} g_i\,p^*(g_i) \\
&= \sum_{i=1}^{m} g_i \Biggl\{ \sum_{\substack{\text{all } y_j \text{ such that} \\ g(y_j) = g_i}} p(y_j) \Biggr\} \\
&= \sum_{i=1}^{m} \; \sum_{\substack{\text{all } y_j \text{ such that} \\ g(y_j) = g_i}} g_i\,p(y_j) \\
&= \sum_{j=1}^{n} g(y_j)\,p(y_j).
\end{aligned}
\]
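The two routes compared in the proof can be traced on a small case. The sketch below uses a hypothetical four-point distribution (not taken from the text), chosen so that g(y) = y² is not one-to-one; `p_star` plays the role of p*(g_i).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical distribution, chosen so that g(y) = y**2 is not one-to-one.
p = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def g(y):
    return y ** 2

# Route 1: build p*(g_i), the distribution of g(Y), then apply Definition 3.4.
p_star = defaultdict(Fraction)
for y, prob in p.items():
    p_star[g(y)] += prob
route_1 = sum(g_i * prob for g_i, prob in p_star.items())

# Route 2: Theorem 3.2, summing g(y) p(y) directly over the values of Y.
route_2 = sum(g(y) * prob for y, prob in p.items())

print(dict(p_star), route_1, route_2)
```

Here y = −1 and y = 1 both map to the value 1, so their probabilities are pooled in `p_star`; the two routes still give the same expected value, as the theorem asserts.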

Now let us return to our immediate objective, finding numerical descriptive measures (or parameters) to characterize p(y). As previously discussed, E(Y) provides the mean of the population with distribution given by p(y). We next seek the variance and standard deviation of this population. You will recall from Chapter 1 that the variance of a set of measurements is the average of the square of the differences between the values in a set of measurements and their mean. Thus, we wish to find the mean value of the function g(Y) = (Y − µ)².

DEFINITION 3.5

If Y is a random variable with mean E(Y) = µ, the variance of a random variable Y is defined to be the expected value of (Y − µ)². That is,

\[ V(Y) = E[(Y - \mu)^2]. \]

The standard deviation of Y is the positive square root of V(Y).

If p(y) is an accurate characterization of the population frequency distribution (and to simplify notation, we will assume this to be true), then E(Y) = µ, V(Y) = σ², the population variance, and σ is the population standard deviation.


EXAMPLE 3.2

The probability distribution for a random variable Y is given in Table 3.3. Find the mean, variance, and standard deviation of Y.

Table 3.3 Probability distribution for Y

y     p(y)
0     1/8
1     1/4
2     3/8
3     1/4

Solution

By Definitions 3.4 and 3.5,

\[ \mu = E(Y) = \sum_{y=0}^{3} y\,p(y) = (0)(1/8) + (1)(1/4) + (2)(3/8) + (3)(1/4) = 1.75, \]

\[
\begin{aligned}
\sigma^2 = E[(Y - \mu)^2] &= \sum_{y=0}^{3} (y - \mu)^2 p(y) \\
&= (0 - 1.75)^2 (1/8) + (1 - 1.75)^2 (1/4) + (2 - 1.75)^2 (3/8) + (3 - 1.75)^2 (1/4) \\
&= .9375,
\end{aligned}
\]

\[ \sigma = +\sqrt{\sigma^2} = \sqrt{.9375} = .97. \]

The probability histogram is shown in Figure 3.3. Locate µ on the axis of measurement, and observe that it does locate the "center" of the nonsymmetrical probability distribution of Y. Also notice that the interval (µ ± σ) contains the discrete points Y = 1 and Y = 2, which account for 5/8 of the probability. Thus, the empirical rule (Chapter 1) provides a reasonable approximation to the probability of a measurement falling in this interval. (Keep in mind that the probabilities are concentrated at the points Y = 0, 1, 2, and 3 because Y cannot take intermediate values.)

[Figure 3.3 Probability histogram for Example 3.2: p(y) plotted against y = 0, 1, 2, 3]
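The calculations in Example 3.2 can be reproduced with exact fractions. The following is a small sketch; the variable names are illustrative and not from the text.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution from Table 3.3
p = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4)}

mu = sum(y * prob for y, prob in p.items())                     # Definition 3.4
variance = sum((y - mu) ** 2 * prob for y, prob in p.items())   # Definition 3.5
sigma = sqrt(variance)

# Probability carried by the points that fall inside (mu - sigma, mu + sigma)
inside = sum(prob for y, prob in p.items() if abs(y - mu) < sigma)

print(float(mu), float(variance), round(sigma, 2), inside)
```

The last value is the probability carried by the points Y = 1 and Y = 2, the only values within one standard deviation of the mean.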

It will be helpful to acquire a few additional tools and definitions before attempting to find the expected values and variances of more complicated discrete random variables, such as the binomial or Poisson. Hence, we present three useful expectation theorems that follow directly from the theory of summation. (Other useful techniques are presented in Sections 3.4 and 3.9.) For each theorem we assume that Y is a discrete random variable with probability function p(y).

The first theorem states the rather obvious result that the mean or expected value of a nonrandom quantity c is equal to c.

THEOREM 3.3

Let Y be a discrete random variable with probability function p(y) and c be a constant. Then E(c) = c.

Proof

Consider the function g(Y) ≡ c. By Theorem 3.2,

\[ E(c) = \sum_{y} c\,p(y) = c \sum_{y} p(y). \]

But $\sum_{y} p(y) = 1$ (Theorem 3.1) and, hence, E(c) = c(1) = c.

The second theorem states that the expected value of the product of a constant c times a function of a random variable is equal to the constant times the expected value of the function of the variable.

THEOREM 3.4

Let Y be a discrete random variable with probability function p(y), g(Y) be a function of Y, and c be a constant. Then

\[ E[c\,g(Y)] = c\,E[g(Y)]. \]

Proof

By Theorem 3.2,

\[ E[c\,g(Y)] = \sum_{y} c\,g(y)\,p(y) = c \sum_{y} g(y)\,p(y) = c\,E[g(Y)]. \]

The third theorem states that the mean or expected value of a sum of functions of a random variable Y is equal to the sum of their respective expected values.

THEOREM 3.5

Let Y be a discrete random variable with probability function p(y) and $g_1(Y), g_2(Y), \ldots, g_k(Y)$ be k functions of Y. Then

\[ E[g_1(Y) + g_2(Y) + \cdots + g_k(Y)] = E[g_1(Y)] + E[g_2(Y)] + \cdots + E[g_k(Y)]. \]

Proof

We will demonstrate the proof only for the case k = 2, but analogous steps will hold for any finite k. By Theorem 3.2,

\[
\begin{aligned}
E[g_1(Y) + g_2(Y)] &= \sum_{y} [g_1(y) + g_2(y)]\,p(y) \\
&= \sum_{y} g_1(y)\,p(y) + \sum_{y} g_2(y)\,p(y) \\
&= E[g_1(Y)] + E[g_2(Y)].
\end{aligned}
\]

Theorems 3.3, 3.4, and 3.5 can be used immediately to develop a theorem useful in finding the variance of a discrete random variable.
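Before they are put to use, the three expectation theorems can be spot-checked numerically. The sketch below is a check on one distribution, not a proof; the helper `expect`, the constant `c`, and the functions `g1` and `g2` are arbitrary choices made for illustration, and the distribution is borrowed from Table 3.3.

```python
from fractions import Fraction

# Distribution from Table 3.3, reused here only as a convenient test case.
p = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4)}

def expect(g, p):
    """E[g(Y)] computed as the sum of g(y) p(y) (Theorem 3.2)."""
    return sum(g(y) * prob for y, prob in p.items())

c = 5  # an arbitrary constant

def g1(y):  # an arbitrary function of Y
    return y ** 2

def g2(y):  # a second arbitrary function of Y
    return 3 * y - 1

# Theorem 3.3: E(c) = c
constant_rule = expect(lambda y: c, p) == c
# Theorem 3.4: E[c g(Y)] = c E[g(Y)]
scaling_rule = expect(lambda y: c * g1(y), p) == c * expect(g1, p)
# Theorem 3.5: E[g1(Y) + g2(Y)] = E[g1(Y)] + E[g2(Y)]
sum_rule = expect(lambda y: g1(y) + g2(y), p) == expect(g1, p) + expect(g2, p)

print(constant_rule, scaling_rule, sum_rule)
```

Because exact fractions are used, the three comparisons are exact equalities rather than floating-point approximations.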


THEOREM 3.6

Let Y be a discrete random variable with probability function p(y) and mean E(Y) = µ; then

\[ V(Y) = \sigma^2 = E[(Y - \mu)^2] = E(Y^2) - \mu^2. \]

Proof

\[ \sigma^2 = E[(Y - \mu)^2] = E(Y^2 - 2\mu Y + \mu^2) = E(Y^2) - E(2\mu Y) + E(\mu^2) \quad \text{(by Theorem 3.5)}. \]

Noting that µ is a constant and applying Theorems 3.4 and 3.3 to the second and third terms, respectively, we have

\[ \sigma^2 = E(Y^2) - 2\mu E(Y) + \mu^2. \]

But µ = E(Y) and, therefore,

\[ \sigma^2 = E(Y^2) - 2\mu^2 + \mu^2 = E(Y^2) - \mu^2. \]

Theorem 3.6 often greatly reduces the labor in finding the variance of a discrete random variable. We will demonstrate the usefulness of this result by recomputing the variance of the random variable considered in Example 3.2.

EXAMPLE 3.3

Use Theorem 3.6 to find the variance of the random variable Y in Example 3.2.

Solution

The mean µ = 1.75 was found in Example 3.2. Because

\[ E(Y^2) = \sum_{y} y^2 p(y) = (0)^2 (1/8) + (1)^2 (1/4) + (2)^2 (3/8) + (3)^2 (1/4) = 4, \]

Theorem 3.6 yields that

\[ \sigma^2 = E(Y^2) - \mu^2 = 4 - (1.75)^2 = .9375. \]
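As a quick cross-check, the shortcut of Theorem 3.6 and the defining formula of Definition 3.5 can be computed side by side for the distribution in Table 3.3. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction

# Probability distribution from Table 3.3
p = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4)}

mu = sum(y * prob for y, prob in p.items())
second_moment = sum(y ** 2 * prob for y, prob in p.items())  # E(Y^2)

# Theorem 3.6: V(Y) = E(Y^2) - mu^2
variance_shortcut = second_moment - mu ** 2
# Definition 3.5: V(Y) = E[(Y - mu)^2]
variance_definition = sum((y - mu) ** 2 * prob for y, prob in p.items())

print(second_moment, float(variance_shortcut), float(variance_definition))
```

The shortcut needs only the two sums E(Y) and E(Y²), which is where the saving in labor comes from.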

EXAMPLE 3.4

The manager of an industrial plant is planning to buy a new machine of either type A or type B. If t denotes the number of hours of daily operation, the number of daily repairs $Y_1$ required to maintain a machine of type A is a random variable with mean and variance both equal to .10t. The number of daily repairs $Y_2$ for a machine of type B is a random variable with mean and variance both equal to .12t. The daily cost of operating A is $C_A(t) = 10t + 30Y_1^2$; for B it is $C_B(t) = 8t + 30Y_2^2$. Assume that the repairs take negligible time and that each night the machines are tuned so that they operate essentially like new machines at the start of the next day. Which machine minimizes the expected daily cost if a workday consists of (a) 10 hours and (b) 20 hours?

Solution

The expected daily cost for A is

\[
\begin{aligned}
E[C_A(t)] &= E(10t + 30Y_1^2) = 10t + 30E(Y_1^2) \\
&= 10t + 30\{V(Y_1) + [E(Y_1)]^2\} = 10t + 30[.10t + (.10t)^2] \\
&= 13t + .3t^2.
\end{aligned}
\]

In this calculation, we used the known values for $V(Y_1)$ and $E(Y_1)$ and the fact that $V(Y_1) = E(Y_1^2) - [E(Y_1)]^2$ to obtain that $E(Y_1^2) = V(Y_1) + [E(Y_1)]^2 = .10t + (.10t)^2$. Similarly,

\[
\begin{aligned}
E[C_B(t)] &= E(8t + 30Y_2^2) = 8t + 30E(Y_2^2) \\
&= 8t + 30\{V(Y_2) + [E(Y_2)]^2\} = 8t + 30[.12t + (.12t)^2] \\
&= 11.6t + .432t^2.
\end{aligned}
\]

Thus, for scenario (a) where t = 10,

\[ E[C_A(10)] = 160 \quad \text{and} \quad E[C_B(10)] = 159.2, \]

which results in the choice of machine B. For scenario (b), t = 20 and

\[ E[C_A(20)] = 380 \quad \text{and} \quad E[C_B(20)] = 404.8, \]

resulting in the choice of machine A. In conclusion, machines of type B are more economical for short time periods because of their smaller hourly operating cost. For long time periods, however, machines of type A are more economical because they tend to be repaired less frequently.
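The cost comparison in Example 3.4 can be sketched as a short function. The name `expected_cost` and its parameters are assumptions introduced here; the function encodes only the relation E(Y²) = V(Y) + [E(Y)]² together with the cost formulas given in the example.

```python
def expected_cost(hourly_cost, repair_rate, t):
    """Expected daily cost E[hourly_cost*t + 30*Y**2] when E(Y) = V(Y) = repair_rate*t."""
    mean = variance = repair_rate * t
    second_moment = variance + mean ** 2  # E(Y^2) = V(Y) + [E(Y)]^2
    return hourly_cost * t + 30 * second_moment

for t in (10, 20):
    cost_a = expected_cost(10, 0.10, t)  # machine A
    cost_b = expected_cost(8, 0.12, t)   # machine B
    cheaper = "A" if cost_a < cost_b else "B"
    print(f"t = {t}: E[C_A] = {cost_a:.1f}, E[C_B] = {cost_b:.1f}, choose {cheaper}")
```

Setting 13t + .3t² equal to 11.6t + .432t² gives t = 1.4/.132, or roughly 10.6 hours, so under the assumptions of the example machine B has the lower expected cost for shorter workdays and machine A for longer ones.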

The purpose of this section was to introduce the concept of an expected value and to develop some useful theorems for ﬁnding means and variances of random variables or functions of random variables. In the following sections, we present some speciﬁc types of discrete random variables and provide formulas for their probability distributions and their means and variances. As you will see, actually deriving some of these expected values requires skill in the summation of algebraic series and knowledge of a few tricks. We will illustrate some of these tricks in some of the derivations in the upcoming sections.

Exercises

3.12

Let Y be a random variable with p(y) given in the accompanying table. Find E(Y), E(1/Y), E(Y² − 1), and V(Y).

y      1    2    3    4
p(y)   .4   .3   .2   .1


3.13

Refer to the coin-tossing game in Exercise 3.2. Calculate the mean and variance of Y , your winnings on a single play of the game. Note that E(Y ) > 0. How much should you pay to play this game if your net winnings, the difference between the payoff and cost of playing, are to have mean 0?

3.14

The maximum patent life for a new drug is 17 years. Subtracting the length of time required by the FDA for testing and approval of the drug provides the actual patent life for the drug, that is, the length of time that the company has to recover research and development costs and to make a profit. The distribution of the lengths of actual patent lives for new drugs is given below:

Years, y    3    4    5    6    7    8    9   10   11   12   13
p(y)      .03  .05  .07  .10  .14  .20  .18  .12  .07  .03  .01

a Find the mean patent life for a new drug.
b Find the standard deviation of Y = the length of life of a randomly selected new drug.
c What is the probability that the value of Y falls in the interval $\mu \pm 2\sigma$?

3.15

An insurance company issues a one-year $1000 policy insuring against an occurrence A that historically happens to 2 out of every 100 owners of the policy. Administrative fees are $15 per policy and are not part of the company's "profit." How much should the company charge for the policy if it requires that the expected profit per policy be $50? [Hint: If C is the premium for the policy, the company's "profit" is C − 15 if A does not occur and C − 15 − 1000 if A does occur.]

3.16

The secretary in Exercise 2.121 was given n computer passwords and tries the passwords at random. Exactly one password will permit access to a computer ﬁle. Find the mean and the variance of Y , the number of trials required to open the ﬁle, if unsuccessful passwords are eliminated (as in Exercise 2.121).

3.17

Refer to Exercise 3.7. Find the mean and standard deviation for Y = the number of empty bowls. What is the probability that the value of Y falls within 2 standard deviations of the mean?

3.18

Refer to Exercise 3.8. What is the mean number of cells in the second generation?

3.19

Who is the king of late night TV? An Internet survey estimates that, when given a choice between David Letterman and Jay Leno, 52% of the population prefers to watch Jay Leno. Three late night TV watchers are randomly selected and asked which of the two talk show hosts they prefer.
a Find the probability distribution for Y, the number of viewers in the sample who prefer Leno.
b Construct a probability histogram for p(y).
c What is the probability that exactly one of the three viewers prefers Leno?
d What are the mean and standard deviation for Y?
e What is the probability that the number of viewers favoring Leno falls within 2 standard deviations of the mean?

3.20

A manufacturing company ships its product in two different sizes of truck trailers. Each shipment is made in a trailer with dimensions 8 feet × 10 feet × 30 feet or 8 feet × 10 feet × 40 feet. If 30% of its shipments are made by using 30-foot trailers and 70% by using 40-foot trailers, ﬁnd the mean volume shipped per trailer load. (Assume that the trailers are always full.)

3.21

The number N of residential homes that a fire company can serve depends on the distance r (in city blocks) that a fire engine can cover in a specified (fixed) period of time. If we assume that N is proportional to the area of a circle R blocks from the firehouse, then $N = C\pi R^2$, where C is a constant, $\pi = 3.1416\ldots$, and R, a random variable, is the number of blocks that a fire engine can move in the specified time interval. For a particular fire company, C = 8, the probability distribution for R is as shown in the accompanying table, and p(r) = 0 for r ≤ 20 and r ≥ 27.

r       21   22   23   24   25   26
p(r)   .05  .20  .30  .25  .15  .05

Find the expected value of N, the number of homes that the fire department can serve.

3.22

A single fair die is tossed once. Let Y be the number facing up. Find the expected value and variance of Y .

3.23

In a gambling game a person draws a single card from an ordinary 52-card playing deck. A person is paid $15 for drawing a jack or a queen and $5 for drawing a king or an ace. A person who draws any other card pays $4. If a person plays this game, what is the expected gain?

3.24

Approximately 10% of the glass bottles coming off a production line have serious ﬂaws in the glass. If two bottles are randomly selected, ﬁnd the mean and variance of the number of bottles that have serious ﬂaws.

3.25

Two construction contracts are to be randomly assigned to one or more of three ﬁrms: I, II, and III. Any ﬁrm may receive both contracts. If each contract will yield a proﬁt of $90,000 for the ﬁrm, ﬁnd the expected proﬁt for ﬁrm I. If ﬁrms I and II are actually owned by the same individual, what is the owner’s expected total proﬁt?

*3.26

A heavy-equipment salesperson can contact either one or two customers per day with probability 1/3 and 2/3, respectively. Each contact will result in either no sale or a $50,000 sale, with the probabilities .9 and .1, respectively. Give the probability distribution for daily sales. Find the mean and standard deviation of the daily sales.[3]

3.27

A potential customer for an $85,000 ﬁre insurance policy possesses a home in an area that, according to experience, may sustain a total loss in a given year with probability of .001 and a 50% loss with probability .01. Ignoring all other partial losses, what premium should the insurance company charge for a yearly policy in order to break even on all $85,000 policies in this area?

3.28

Refer to Exercise 3.3. If the cost of testing a component is $2 and the cost of repairing a defective is $4, ﬁnd the expected total cost for testing and repairing the lot.

*3.29

If Y is a discrete random variable that assigns positive probabilities to only the positive integers, show that

$$E(Y) = \sum_{k=1}^{\infty} P(Y \ge k).$$

3.30

Suppose that Y is a discrete random variable with mean $\mu$ and variance $\sigma^2$ and let X = Y + 1.
a Do you expect the mean of X to be larger than, smaller than, or equal to $\mu = E(Y)$? Why?
b Use Theorems 3.3 and 3.5 to express $E(X) = E(Y + 1)$ in terms of $\mu = E(Y)$. Does this result agree with your answer to part (a)?
c Recalling that the variance is a measure of spread or dispersion, do you expect the variance of X to be larger than, smaller than, or equal to $\sigma^2 = V(Y)$? Why?
d Use Definition 3.5 and the result in part (b) to show that $V(X) = E\{[X - E(X)]^2\} = E[(Y - \mu)^2] = \sigma^2$; that is, X = Y + 1 and Y have equal variances.

[3] Exercises preceded by an asterisk are optional.

3.31

Suppose that Y is a discrete random variable with mean $\mu$ and variance $\sigma^2$ and let W = 2Y.
a Do you expect the mean of W to be larger than, smaller than, or equal to $\mu = E(Y)$? Why?
b Use Theorem 3.4 to express $E(W) = E(2Y)$ in terms of $\mu = E(Y)$. Does this result agree with your answer to part (a)?
c Recalling that the variance is a measure of spread or dispersion, do you expect the variance of W to be larger than, smaller than, or equal to $\sigma^2 = V(Y)$? Why?
d Use Definition 3.5 and the result in part (b) to show that $V(W) = E\{[W - E(W)]^2\} = E[4(Y - \mu)^2] = 4\sigma^2$; that is, W = 2Y has variance four times that of Y.

3.32

Suppose that Y is a discrete random variable with mean $\mu$ and variance $\sigma^2$ and let U = Y/10.
a Do you expect the mean of U to be larger than, smaller than, or equal to $\mu = E(Y)$? Why?
b Use Theorem 3.4 to express $E(U) = E(Y/10)$ in terms of $\mu = E(Y)$. Does this result agree with your answer to part (a)?
c Recalling that the variance is a measure of spread or dispersion, do you expect the variance of U to be larger than, smaller than, or equal to $\sigma^2 = V(Y)$? Why?
d Use Definition 3.5 and the result in part (b) to show that $V(U) = E\{[U - E(U)]^2\} = E[.01(Y - \mu)^2] = .01\sigma^2$; that is, U = Y/10 has variance .01 times that of Y.

3.33

Let Y be a discrete random variable with mean $\mu$ and variance $\sigma^2$. If a and b are constants, use Theorems 3.3 through 3.6 to prove that
a $E(aY + b) = aE(Y) + b = a\mu + b$.
b $V(aY + b) = a^2 V(Y) = a^2\sigma^2$.

3.34

The manager of a stockroom in a factory has constructed the following probability distribution for the daily demand (number of times used) for a particular tool.

y       0    1    2
p(y)   .1   .5   .4

It costs the factory $10 each time the tool is used. Find the mean and variance of the daily cost for use of the tool.

3.4 The Binomial Probability Distribution

Some experiments consist of the observation of a sequence of identical and independent trials, each of which can result in one of two outcomes. Each item leaving a manufacturing production line is either defective or nondefective. Each shot in a sequence of firings at a target can result in a hit or a miss, and each of n persons questioned prior to a local election either favors candidate Jones or does not. In this section we are concerned with experiments, known as binomial experiments, that exhibit the following characteristics.

DEFINITION 3.6

A binomial experiment possesses the following properties:
1. The experiment consists of a fixed number, n, of identical trials.
2. Each trial results in one of two outcomes: success, S, or failure, F.
3. The probability of success on a single trial is equal to some value p and remains the same from trial to trial. The probability of a failure is equal to q = (1 − p).
4. The trials are independent.
5. The random variable of interest is Y, the number of successes observed during the n trials.

Determining whether a particular experiment is a binomial experiment requires examining the experiment for each of the characteristics just listed. Notice that the random variable of interest is the number of successes observed in the n trials. It is important to realize that a success is not necessarily “good” in the everyday sense of the word. In our discussions, success is merely a name for one of the two possible outcomes on a single trial of an experiment.

EXAMPLE 3.5

An early-warning detection system for aircraft consists of four identical radar units operating independently of one another. Suppose that each has a probability of .95 of detecting an intruding aircraft. When an intruding aircraft enters the scene, the random variable of interest is Y , the number of radar units that do not detect the plane. Is this a binomial experiment?

Solution

To decide whether this is a binomial experiment, we must determine whether each of the five requirements in Definition 3.6 is met. Notice that the random variable of interest is Y, the number of radar units that do not detect an aircraft. The random variable of interest in a binomial experiment is always the number of successes; consequently, the present experiment can be binomial only if we call the event "do not detect" a success. We now examine the experiment for the five characteristics of the binomial experiment.
1. The experiment involves four identical trials. Each trial consists of determining whether (or not) a particular radar unit detects the aircraft.
2. Each trial results in one of two outcomes. Because the random variable of interest is the number of successes, S denotes that the aircraft was not detected, and F denotes that it was detected.
3. Because all the radar units detect aircraft with equal probability, the probability of an S on each trial is the same, and p = P(S) = P(do not detect) = .05.
4. The trials are independent because the units operate independently.
5. The random variable of interest is Y, the number of successes in four trials.

Thus, the experiment is a binomial experiment, with n = 4, p = .05, and q = 1 − .05 = .95.
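A binomial experiment such as the one in Example 3.5 is easy to imitate on a computer, which may help make the five characteristics concrete. The Python sketch below is only an illustration: the function name, the seed, and the number of repetitions are arbitrary choices of ours, and each radar unit is modeled as an independent trial with P(do not detect) = .05.

```python
import random
from collections import Counter


def simulate_missed_detections(rng, n_units=4, p_miss=0.05):
    """One run of the experiment: count the units that fail to detect the plane."""
    return sum(1 for _ in range(n_units) if rng.random() < p_miss)


rng = random.Random(12345)        # fixed seed so that the run can be repeated
runs = [simulate_missed_detections(rng) for _ in range(50_000)]

# Relative frequency of each observed value of Y (number of successes in 4 trials)
counts = Counter(runs)
for y in sorted(counts):
    print(f"Y = {y}: relative frequency {counts[y] / len(runs):.4f}")

# Overall fraction of individual trials that were "successes"; it should be near p = .05
print("fraction of units that missed:", sum(runs) / (4 * len(runs)))
```

Each call to the function is one realization of the experiment; the same five characteristics (fixed n, two outcomes, constant p, independence, and a count of successes) are built directly into the loop.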

EXAMPLE 3.6

Suppose that 40% of a large population of registered voters favor candidate Jones. A random sample of n = 10 voters will be selected, and Y , the number favoring Jones, is to be observed. Does this experiment meet the requirements of a binomial experiment?

Solution

If each of the ten people is selected at random from the population, then we have ten nearly identical trials, with each trial resulting in a person either favoring Jones (S) or not favoring Jones (F). The random variable of interest is then the number of successes in the ten trials. For the ﬁrst person selected, the probability of favoring Jones (S) is .4. But what can be said about the unconditional probability that the second person will favor Jones? In Exercise 3.35 you will show that unconditionally the probability that the second person favors Jones is also .4. Thus, the probability of a success S stays the same from trial to trial. However, the conditional probability of a success on later trials depends on the number of successes in the previous trials. If the population of voters is large, removal of one person will not substantially change the fraction of voters favoring Jones, and the conditional probability that the second person favors Jones will be very close to .4. In general, if the population is large and the sample size is relatively small, the conditional probability of success on a later trial given the number of successes on the previous trials will stay approximately the same regardless of the outcomes on previous trials. Thus, the trials will be approximately independent and so sampling problems of this type are approximately binomial.
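The effect of population size on the conditional probabilities can be seen numerically. The following Python sketch is an illustration under stated assumptions: the population sizes are hypothetical, exactly 40% of each population is taken to favor Jones, and the function name is our own. It computes, for sampling without replacement, the unconditional probability of S on the second trial together with the two conditional probabilities given the outcome of the first trial.

```python
from fractions import Fraction


def second_trial_probabilities(population_size, number_favoring):
    """Sampling two people without replacement.

    Returns (P(S on trial 2),
             P(S on trial 2 | S on trial 1),
             P(S on trial 2 | F on trial 1)) as exact fractions.
    """
    N, k = population_size, number_favoring
    p_first = Fraction(k, N)
    given_success = Fraction(k - 1, N - 1)
    given_failure = Fraction(k, N - 1)
    unconditional = p_first * given_success + (1 - p_first) * given_failure
    return unconditional, given_success, given_failure


for N in (10, 100, 10_000):            # hypothetical population sizes
    k = 4 * N // 10                    # 40% of the population favors Jones
    uncond, given_s, given_f = second_trial_probabilities(N, k)
    print(f"N = {N:6d}: P(S2) = {float(uncond):.4f}, "
          f"P(S2|S1) = {float(given_s):.4f}, P(S2|F1) = {float(given_f):.4f}")
```

The unconditional probability equals .4 for every population size, whereas the conditional probabilities differ noticeably from .4 only when the population is small.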

If the sample size in Example 3.6 was large relative to the population size (say, 10% of the population), the conditional probability of selecting a supporter of Jones on a later selection would be significantly altered by the preferences of persons selected earlier in the experiment, and the experiment would not be binomial. The hypergeometric probability distribution, the topic of Section 3.7, is the appropriate probability model to be used when the sample size is large relative to the population size.

You may wish to refine your ability to identify binomial experiments by reexamining the exercises at the end of Chapter 2. Several of the experiments in those exercises are binomial or approximately binomial experiments.

The binomial probability distribution p(y) can be derived by applying the sample-point approach to find the probability that the experiment yields y successes. Each sample point in the sample space can be characterized by an n-tuple involving the letters S and F, corresponding to success and failure. A typical sample point would thus appear as

$$\underbrace{SSFSFFFSFS \ldots FS}_{n \text{ positions}},$$

where the letter in the ith position (proceeding from left to right) indicates the outcome of the ith trial.

Now let us consider a particular sample point corresponding to y successes and hence contained in the numerical event Y = y. This sample point,

$$\underbrace{SSSSS \ldots SSS}_{y}\ \underbrace{FFF \ldots FF}_{n-y},$$

represents the intersection of n independent events (the outcomes of the n trials), in which there were y successes followed by (n − y) failures. Because the trials were independent and the probability of S, p, stays the same from trial to trial, the probability of this sample point is

$$\underbrace{ppppp \cdots ppp}_{y \text{ terms}}\ \underbrace{qqq \cdots qq}_{n-y \text{ terms}} = p^y q^{n-y}.$$

Every other sample point in the event Y = y can be represented as an n-tuple containing y S's and (n − y) F's in some order. Any such sample point also has probability $p^y q^{n-y}$. Because the number of distinct n-tuples that contain y S's and (n − y) F's is (from Theorem 2.3)

$$\binom{n}{y} = \frac{n!}{y!(n-y)!},$$

it follows that the event (Y = y) is made up of $\binom{n}{y}$ sample points, each with probability $p^y q^{n-y}$, and that

$$p(y) = \binom{n}{y} p^y q^{n-y}, \qquad y = 0, 1, 2, \ldots, n.$$

The result that we have just derived is the formula for the binomial probability distribution.

DEFINITION 3.7

A random variable Y is said to have a binomial distribution based on n trials with success probability p if and only if

$$p(y) = \binom{n}{y} p^y q^{n-y}, \qquad y = 0, 1, 2, \ldots, n \ \text{ and } \ 0 \le p \le 1.$$

Figure 3.4 portrays p(y) graphically as probability histograms, the first for n = 10, p = .1; the second for n = 10, p = .5; and the third for n = 20, p = .5.

[FIGURE 3.4 Binomial probability histograms: (a) n = 10, p = .1; (b) n = 10, p = .5; (c) n = 20, p = .5. Horizontal axis: y; vertical axis: p(y).]

Before we proceed, let us reconsider the representation for the sample points in this experiment. We have seen that a sample point can be represented by a sequence of n letters, each of which is either S or F. If the sample point contains exactly one S, the probability associated with that sample point is $pq^{n-1}$. If another sample point contains 2 S's (and (n − 2) F's), the probability of this sample point is $p^2 q^{n-2}$. Notice that the sample points for a binomial experiment are not equiprobable unless p = .5.

The term binomial experiment derives from the fact that each trial results in one of two possible outcomes and that the probabilities p(y), y = 0, 1, 2, . . . , n, are terms of the binomial expansion

$$(q + p)^n = \binom{n}{0} q^n + \binom{n}{1} p^1 q^{n-1} + \binom{n}{2} p^2 q^{n-2} + \cdots + \binom{n}{n} p^n.$$

You will observe that $\binom{n}{0} q^n = p(0)$, $\binom{n}{1} p^1 q^{n-1} = p(1)$, and, in general, $p(y) = \binom{n}{y} p^y q^{n-y}$. It also follows that p(y) satisfies the necessary properties for a probability function because p(y) is positive for y = 0, 1, . . . , n and [because (q + p) = 1]

$$\sum_{y} p(y) = \sum_{y=0}^{n} \binom{n}{y} p^y q^{n-y} = (q + p)^n = 1^n = 1.$$
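The formula of Definition 3.7 and the fact that the probabilities sum to 1 can be checked directly on a computer. The following Python sketch is an illustration only; the function name `binomial_pmf` is our own, and the three (n, p) pairs are those used for the histograms of Figure 3.4.

```python
from math import comb


def binomial_pmf(y, n, p):
    """p(y) = C(n, y) p^y q^(n - y), with q = 1 - p."""
    q = 1 - p
    return comb(n, y) * p**y * q**(n - y)


for n, p in [(10, 0.1), (10, 0.5), (20, 0.5)]:
    probs = [binomial_pmf(y, n, p) for y in range(n + 1)]
    most_likely = max(range(n + 1), key=probs.__getitem__)
    # The sum is (q + p)^n = 1, up to floating-point rounding.
    print(f"n = {n}, p = {p}: sum of p(y) = {sum(probs):.12f}, "
          f"most likely value y = {most_likely}")
```

The list `probs` holds the terms of the binomial expansion of $(q + p)^n$ in order, so plotting it as a bar chart would give a probability histogram of the same kind as those in Figure 3.4.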


The binomial probability distribution has many applications because the binomial experiment occurs in sampling for defectives in industrial quality control, in the sampling of consumer preference or voting populations, and in many other physical situations. We will illustrate with a few examples. Other practical examples will appear in the exercises at the end of this section and at the end of the chapter.

EXAMPLE 3.7

Suppose that a lot of 5000 electrical fuses contains 5% defectives. If a sample of 5 fuses is tested, ﬁnd the probability of observing at least one defective.

Solution

It is reasonable to assume that Y, the number of defectives observed, has an approximate binomial distribution because the lot is large. Removing a few fuses does not change the composition of those remaining enough to cause us concern. Thus,

$$P(\text{at least one defective}) = 1 - p(0) = 1 - \binom{5}{0} p^0 q^5 = 1 - (.95)^5 = 1 - .774 = .226.$$

Notice that there is a fairly large chance of seeing at least one defective, even though the sample is quite small.

EXAMPLE 3.8

Experience has shown that 30% of all persons afﬂicted by a certain illness recover. A drug company has developed a new medication. Ten people with the illness were selected at random and received the medication; nine recovered shortly thereafter. Suppose that the medication was absolutely worthless. What is the probability that at least nine of ten receiving the medication will recover?

Solution

Let Y denote the number of people who recover. If the medication is worthless, the probability that a single ill person will recover is p = .3. Then the number of trials is n = 10 and the probability of exactly nine recoveries is

$$P(Y = 9) = p(9) = \binom{10}{9} (.3)^9 (.7) = .000138.$$

Similarly, the probability of exactly ten recoveries is

$$P(Y = 10) = p(10) = \binom{10}{10} (.3)^{10} (.7)^0 = .000006,$$

and

$$P(Y \ge 9) = p(9) + p(10) = .000138 + .000006 = .000144.$$

If the medication is ineffective, the probability of observing at least nine recoveries is extremely small. If we administered the medication to ten individuals and observed at least nine recoveries, then either (1) the medication is worthless and we have observed a rare event or (2) the medication is indeed useful in curing the illness. We adhere to conclusion 2.
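The tail probability in Example 3.8 can be reproduced with a few lines of code. The Python sketch below is an illustration; the function name is our own, and the recovery probability .8 used in the last line is a purely hypothetical value, chosen only to show how the same event would look if the medication were effective.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(Y >= k) for a binomial random variable based on n trials, success probability p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k, n + 1))


# Worthless medication: p = .3, as in Example 3.8
print(f"P(Y >= 9) when p = .3: {binomial_upper_tail(9, 10, 0.3):.6f}")

# Hypothetical effective medication: p = .8 is an assumed value, not from the example
print(f"P(Y >= 9) when p = .8: {binomial_upper_tail(9, 10, 0.8):.4f}")
```

Under the assumed value p = .8, nine or more recoveries would no longer be a rare event, which is the reasoning behind adhering to conclusion 2.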


A tabulation of binomial probabilities in the form $\sum_{y=0}^{a} p(y)$, presented in Table 1, Appendix 3, will greatly reduce the computations for some of the exercises. The references at the end of the chapter list several more extensive tabulations of binomial probabilities. Due to practical space limitations, printed tables typically apply for only selected values of n and p. Binomial probabilities can also be found using various computer software packages. If Y has a binomial distribution based on n trials with success probability p, $P(Y = y_0) = p(y_0)$ can be found by using the R (or S-Plus) command dbinom(y0,n,p), whereas $P(Y \le y_0)$ is found by using the R (or S-Plus) command pbinom(y0,n,p). A distinct advantage of using software to compute binomial probabilities is that (practically) any values for n and p can be used. We illustrate the use of Table 1 (and, simultaneously, the use of the output of the R command pbinom(y0,n,p)) in the following example.

EXAMPLE 3.9

The large lot of electrical fuses of Example 3.7 is supposed to contain only 5% defectives. If n = 20 fuses are randomly sampled from this lot, ﬁnd the probability that at least four defectives will be observed.

Solution

Letting Y denote the number of defectives in the sample, we assume the binomial model for Y, with p = .05. Thus, P(Y ≥ 4) = 1 − P(Y ≤ 3), and using Table 1, Appendix 3 [or the R command pbinom(3,20,.05)], we obtain

$$P(Y \le 3) = \sum_{y=0}^{3} p(y) = .984.$$

The value .984 is found in the table labeled n = 20 in Table 1, Appendix 3. Specifically, it appears in the column labeled p = .05 and in the row labeled a = 3. It follows that P(Y ≥ 4) = 1 − .984 = .016. This probability is quite small. If we did indeed observe more than three defectives out of 20 fuses, we might suspect that the reported 5% defect rate is erroneous.
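For readers working without R or the printed table, the same two quantities are simple to compute from the definition. The Python sketch below is an illustration only: the functions `dbinom` and `pbinom` are our own stand-ins, named after the R commands and taking the same argument order, and are not part of any Python library.

```python
from math import comb


def dbinom(y0, n, p):
    """P(Y = y0) for a binomial random variable (stand-in for R's dbinom)."""
    return comb(n, y0) * p**y0 * (1 - p)**(n - y0)


def pbinom(y0, n, p):
    """P(Y <= y0), the cumulative sum tabulated in Table 1 (stand-in for R's pbinom)."""
    return sum(dbinom(y, n, p) for y in range(y0 + 1))


lower = pbinom(3, 20, 0.05)                           # P(Y <= 3), as in Example 3.9
print(f"P(Y <= 3) = {lower:.3f}")                     # P(Y <= 3) = 0.984
print(f"P(Y >= 4) = {1 - lower:.3f}")                 # P(Y >= 4) = 0.016
print(f"P(Y  = 3) = {dbinom(3, 20, 0.05):.3f}")
```

Because the sum is computed directly, any values of n and p can be used, not only those appearing in the printed table.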

The mean and variance associated with a binomial random variable are derived in the following theorem. As you will see in the proof of the theorem, it is necessary to evaluate the sum of some arithmetic series. In the course of the proof, we illustrate some of the techniques that are available for summing such series. In particular, we use the fact that $\sum_y p(y) = 1$ for any discrete random variable.

THEOREM 3.7

Let Y be a binomial random variable based on n trials and success probability p. Then

$$\mu = E(Y) = np \quad \text{and} \quad \sigma^2 = V(Y) = npq.$$

Proof

By Definitions 3.4 and 3.7,

$$E(Y) = \sum_{y} y\,p(y) = \sum_{y=0}^{n} y \binom{n}{y} p^y q^{n-y}.$$

Notice that the first term in the sum is 0 and hence that

$$E(Y) = \sum_{y=1}^{n} y\,\frac{n!}{(n-y)!\,y!}\, p^y q^{n-y} = \sum_{y=1}^{n} \frac{n!}{(n-y)!\,(y-1)!}\, p^y q^{n-y}.$$

The summands in this last expression bear a striking resemblance to binomial probabilities. In fact, if we factor np out of each term in the sum and let z = y − 1,

$$E(Y) = np \sum_{y=1}^{n} \frac{(n-1)!}{(n-y)!\,(y-1)!}\, p^{y-1} q^{n-y} = np \sum_{z=0}^{n-1} \frac{(n-1)!}{(n-1-z)!\,z!}\, p^z q^{n-1-z} = np \sum_{z=0}^{n-1} \binom{n-1}{z} p^z q^{n-1-z}.$$

Notice that $p(z) = \binom{n-1}{z} p^z q^{n-1-z}$ is the binomial probability function based on (n − 1) trials. Thus, $\sum_z p(z) = 1$, and it follows that

$$\mu = E(Y) = np.$$

From Theorem 3.6, we know that $\sigma^2 = V(Y) = E(Y^2) - \mu^2$. Thus, $\sigma^2$ can be calculated if we find $E(Y^2)$. Finding $E(Y^2)$ directly is difficult because

$$E(Y^2) = \sum_{y=0}^{n} y^2 p(y) = \sum_{y=0}^{n} y^2 \binom{n}{y} p^y q^{n-y} = \sum_{y=0}^{n} y^2\, \frac{n!}{y!\,(n-y)!}\, p^y q^{n-y}$$

and the quantity $y^2$ does not appear as a factor of y!. Where do we go from here? Notice that

$$E[Y(Y-1)] = E(Y^2 - Y) = E(Y^2) - E(Y)$$

and, therefore,

$$E(Y^2) = E[Y(Y-1)] + E(Y) = E[Y(Y-1)] + \mu.$$


In this case,

$$E[Y(Y-1)] = \sum_{y=0}^{n} y(y-1)\, \frac{n!}{y!\,(n-y)!}\, p^y q^{n-y}.$$

The first and second terms of this sum equal zero (when y = 0 and y = 1). Then

$$E[Y(Y-1)] = \sum_{y=2}^{n} \frac{n!}{(y-2)!\,(n-y)!}\, p^y q^{n-y}.$$

(Notice the cancellation that led to this last result. The anticipation of this cancellation is what actually motivated the consideration of E[Y(Y − 1)].) Again, the summands in the last expression look very much like binomial probabilities. Factor $n(n-1)p^2$ out of each term in the sum and let z = y − 2 to obtain

$$E[Y(Y-1)] = n(n-1)p^2 \sum_{y=2}^{n} \frac{(n-2)!}{(y-2)!\,(n-y)!}\, p^{y-2} q^{n-y} = n(n-1)p^2 \sum_{z=0}^{n-2} \frac{(n-2)!}{z!\,(n-2-z)!}\, p^z q^{n-2-z} = n(n-1)p^2 \sum_{z=0}^{n-2} \binom{n-2}{z} p^z q^{n-2-z}.$$

Again note that $p(z) = \binom{n-2}{z} p^z q^{n-2-z}$ is the binomial probability function based on (n − 2) trials. Then $\sum_{z=0}^{n-2} p(z) = 1$ (again using the device illustrated in the derivation of the mean) and

$$E[Y(Y-1)] = n(n-1)p^2.$$

Thus,

$$E(Y^2) = E[Y(Y-1)] + \mu = n(n-1)p^2 + np$$

and

$$\sigma^2 = E(Y^2) - \mu^2 = n(n-1)p^2 + np - n^2p^2 = np[(n-1)p + 1 - np] = np(1-p) = npq.$$

In addition to providing formulas for the mean and variance of a binomial random variable, the derivation of Theorem 3.7 illustrates the use of two fairly common tricks, namely, to use the fact that $\sum_y p(y) = 1$ if p(y) is a valid probability function and to find $E(Y^2)$ by finding E[Y(Y − 1)]. These techniques also will be useful in the next sections where we consider other discrete probability distributions and the associated means and variances.

A frequent source of error in applying the binomial probability distribution to practical problems is the failure to define which of the two possible results of a trial is the success. As a consequence, q may be used erroneously in place of p. Carefully define a success and make certain that p equals the probability of a success for each application.

Thus far in this section we have assumed that the number of trials, n, and the probability of success, p, were known, and we used the formula $p(y) = \binom{n}{y} p^y q^{n-y}$ to compute probabilities associated with binomial random variables. In Example 3.8 we obtained a value for P(Y ≥ 9) and used this probability to reach a conclusion about the effectiveness of the medication. The next example exhibits another statistical, rather than probabilistic, use of the binomial distribution.
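Before turning to that example, the formulas of Theorem 3.7 can be confirmed numerically by summing over the probability function. The Python sketch below is an illustration only; the function name is our own, and the values n = 20 and p = 1/20 are an arbitrary choice. Exact fractions are used so that the comparison with np, n(n − 1)p², and npq involves no rounding.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (E[Y], E[Y(Y-1)], V(Y)), each computed by summing over p(y)."""
    p = Fraction(p)
    q = 1 - p
    pmf = [comb(n, y) * p**y * q**(n - y) for y in range(n + 1)]
    mean = sum(y * prob for y, prob in enumerate(pmf))
    factorial_moment = sum(y * (y - 1) * prob for y, prob in enumerate(pmf))
    # V(Y) = E(Y^2) - mu^2, with E(Y^2) = E[Y(Y-1)] + mu
    variance = factorial_moment + mean - mean**2
    return mean, factorial_moment, variance


n, p = 20, Fraction(1, 20)
mean, factorial_moment, variance = binomial_moments(n, p)
print("E(Y)      =", mean, "   np          =", n * p)
print("E[Y(Y-1)] =", factorial_moment, "   n(n-1)p^2   =", n * (n - 1) * p**2)
print("V(Y)      =", variance, "   npq         =", n * p * (1 - p))
```

The variance is obtained exactly as in the proof, from E[Y(Y − 1)] rather than from a direct sum involving y².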

EXAMPLE 3.10

Suppose that we survey 20 individuals working for a large company and ask each whether they favor implementation of a new policy regarding retirement funding. If, in our sample, 6 favored the new policy, ﬁnd an estimate for p, the true but unknown proportion of employees that favor the new policy.

Solution

If Y denotes the number among the 20 who favor the new policy, it is reasonable to conclude that Y has a binomial distribution with n = 20 for some value of p. Whatever the true value for p, we conclude that the probability of observing 6 out of 20 in favor of the policy is

$$P(Y = 6) = \binom{20}{6} p^6 (1-p)^{14}.$$

We will use as our estimate for p the value that maximizes the probability of observing the value that we actually observed (6 in favor in 20 trials). How do we find the value of p that maximizes P(Y = 6)? Because $\binom{20}{6}$ is a constant (relative to p) and ln(w) is an increasing function of w, the value of p that maximizes $P(Y = 6) = \binom{20}{6} p^6 (1-p)^{14}$ is the same as the value of p that maximizes $\ln[p^6 (1-p)^{14}] = [6\ln(p) + 14\ln(1-p)]$.

If we take the derivative of $[6\ln(p) + 14\ln(1-p)]$ with respect to p, we obtain

$$\frac{d[6\ln(p) + 14\ln(1-p)]}{dp} = \frac{6}{p} - \frac{14}{1-p}.$$

The value of p that maximizes (or minimizes) $[6\ln(p) + 14\ln(1-p)]$ [and, more important, P(Y = 6)] is the solution to the equation

$$\frac{6}{p} - \frac{14}{1-p} = 0.$$

Solving, we obtain p = 6/20. Because the second derivative of $[6\ln(p) + 14\ln(1-p)]$ is negative when p = 6/20, it follows that $[6\ln(p) + 14\ln(1-p)]$ [and P(Y = 6)] is maximized when p = 6/20. Our estimate for p, based on 6 "successes" in 20 trials is therefore 6/20.

The ultimate answer that we obtained should look very reasonable to you. Because p is the probability of a "success" on any given trial, a reasonable estimate is, indeed, the proportion of "successes" in our sample, in this case 6/20. In the next section, we will apply this same technique to obtain an estimate that is not initially so intuitive. As we will see in Chapter 9, the estimate that we just obtained is the maximum likelihood estimate for p and the procedure used above is an example of the application of the method of maximum likelihood.
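The maximization in Example 3.10 can also be carried out numerically, which gives a useful check on the calculus. The Python sketch below is an illustration under stated assumptions: it replaces the derivative argument with a simple search over a grid of candidate values for p, and the function name and the grid spacing of .001 are our own choices.

```python
from math import comb, log


def log_likelihood(p, successes=6, trials=20):
    """ln P(Y = successes) for a binomial model with the given number of trials, 0 < p < 1."""
    return (log(comb(trials, successes))
            + successes * log(p)
            + (trials - successes) * log(1 - p))


# Candidate values p = .001, .002, ..., .999 (the endpoints 0 and 1 are excluded
# because ln(0) is undefined).
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print("value of p on the grid that maximizes P(Y = 6):", p_hat)
```

A grid search of this kind only locates the maximum to within the grid spacing; here the grid happens to contain 6/20 = .3, the value obtained by solving 6/p − 14/(1 − p) = 0.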

Exercises

3.35

Consider the population of voters described in Example 3.6. Suppose that there are N = 5000 voters in the population, 40% of whom favor Jones. Identify the event favors Jones as a success S. It is evident that the probability of S on trial 1 is .40. Consider the event B that S occurs on the second trial. Then B can occur two ways: The ﬁrst two trials are both successes or the ﬁrst trial is a failure and the second is a success. Show that P(B) = .4. What is P(B| the ﬁrst trial is S)? Does this conditional probability differ markedly from P(B)?

3.36

The manufacturer of a low-calorie dairy drink wishes to compare the taste appeal of a new formula (formula B) with that of the standard formula (formula A). Each of four judges is given three glasses in random order, two containing formula A and the other containing formula B. Each judge is asked to state which glass he or she most enjoyed. Suppose that the two formulas are equally attractive. Let Y be the number of judges stating a preference for the new formula.
a Find the probability function for Y.
b What is the probability that at least three of the four judges state a preference for the new formula?
c Find the expected value of Y.
d Find the variance of Y.

3.37

In 2003, the average combined SAT score (math and verbal) for college-bound students in the United States was 1026. Suppose that approximately 45% of all high school graduates took this test and that 100 high school graduates are randomly selected from among all high school grads in the United States. Which of the following random variables has a distribution that can be approximated by a binomial distribution? Whenever possible, give the values for n and p.
a The number of students who took the SAT
b The scores of the 100 students in the sample
c The number of students in the sample who scored above average on the SAT
d The amount of time required by each student to complete the SAT
e The number of female high school grads in the sample

3.38

a A meteorologist in Denver recorded Y = the number of days of rain during a 30-day period. Does Y have a binomial distribution? If so, are the values of both n and p given?
b A market research firm has hired operators who conduct telephone surveys. A computer is used to randomly dial a telephone number, and the operator asks the answering person whether she has time to answer some questions. Let Y = the number of calls made until the first person replies that she is willing to answer the questions. Is this a binomial experiment? Explain.

3.39

A complex electronic system is built with a certain number of backup components in its subsystems. One subsystem has four identical components, each with a probability of .2 of failing in less than 1000 hours. The subsystem will operate if any two of the four components are operating. Assume that the components operate independently. Find the probability that
a exactly two of the four components last longer than 1000 hours.
b the subsystem operates longer than 1000 hours.

3.40

The probability that a patient recovers from a stomach disease is .8. Suppose 20 people are known to have contracted this disease. What is the probability that
a exactly 14 recover?
b at least 10 recover?
c at least 14 but not more than 18 recover?
d at most 16 recover?

3.41

A multiple-choice examination has 15 questions, each with ﬁve possible answers, only one of which is correct. Suppose that one of the students who takes the examination answers each of the questions with an independent random guess. What is the probability that he answers at least ten questions correctly?

3.42

Refer to Exercise 3.41. What is the probability that a student answers at least ten questions correctly if
a for each question, the student can correctly eliminate one of the wrong answers and subsequently answers each of the questions with an independent random guess among the remaining answers?
b he can correctly eliminate two wrong answers for each question and randomly chooses from among the remaining answers?

3.43

Many utility companies promote energy conservation by offering discount rates to consumers who keep their energy usage below certain established subsidy standards. A recent EPA report notes that 70% of the island residents of Puerto Rico have reduced their electricity usage sufﬁciently to qualify for discounted rates. If ﬁve residential subscribers are randomly selected from San Juan, Puerto Rico, ﬁnd the probability of each of the following events: a All ﬁve qualify for the favorable rates. b At least four qualify for the favorable rates.

3.44

A new surgical procedure is successful with a probability of p. Assume that the operation is performed ﬁve times and the results are independent of one another. What is the probability that a all ﬁve operations are successful if p = .8? b exactly four are successful if p = .6? c less than two are successful if p = .3?

3.45

A fire-detection device utilizes three temperature-sensitive cells acting independently of each other in such a manner that any one or more may activate the alarm. Each cell possesses a probability of p = .8 of activating the alarm when the temperature reaches 100° Celsius or more. Let Y equal the number of cells activating the alarm when the temperature reaches 100°. a Find the probability distribution for Y. b Find the probability that the alarm will function when the temperature reaches 100°.

3.46

Construct probability histograms for the binomial probability distributions for n = 5, p = .1, .5, and .9. (Table 1, Appendix 3, will reduce the amount of calculation.) Notice the symmetry for p = .5 and the direction of skewness for p = .1 and .9.

3.47

Use Table 1, Appendix 3, to construct a probability histogram for the binomial probability distribution for n = 20 and p = .5. Notice that almost all the probability falls in the interval 5 ≤ y ≤ 15.

3.48

In Exercise 2.151, you considered a model for the World Series. Two teams A and B play a series of games until one team wins four games. We assume that the games are played independently and that the probability that A wins any game is p. Compute the probability that the series lasts exactly ﬁve games. [Hint: Use what you know about the random variable, Y , the number of games that A wins among the ﬁrst four games.]

3.49

Tay-Sachs disease is a genetic disorder that is usually fatal in young children. If both parents are carriers of the disease, the probability that their offspring will develop the disease is approximately .25. Suppose that a husband and wife are both carriers and that they have three children. If the outcomes of the three pregnancies are mutually independent, what are the probabilities of the following events? a All three children develop Tay-Sachs. b Only one child develops Tay-Sachs. c The third child develops Tay-Sachs, given that the ﬁrst two did not.

3.50

A missile protection system consists of n radar sets operating independently, each with a probability of .9 of detecting a missile entering a zone that is covered by all of the units. a If n = 5 and a missile enters the zone, what is the probability that exactly four sets detect the missile? At least one set? b How large must n be if we require that the probability of detecting a missile that enters the zone be .999?

3.51

In the 17th century, the Chevalier de Mere asked Blaise Pascal to compare the probabilities of two events. Below, you will compute the probability of the two events that, prior to contrary gambling experience, were thought by de Mere to be equally likely. a What is the probability of obtaining at least one 6 in four rolls of a fair die? b If a pair of fair dice is tossed 24 times, what is the probability of at least one double six?

3.52

The taste test for PTC (phenylthiocarbamide) is a favorite exercise in beginning human genetics classes. It has been established that a single gene determines whether or not an individual is a “taster.” If 70% of Americans are “tasters” and 20 Americans are randomly selected, what is the probability that a at least 17 are “tasters”? b fewer than 15 are “tasters”?

3.53

A manufacturer of ﬂoor wax has developed two new brands, A and B, which she wishes to subject to homeowners’ evaluation to determine which of the two is superior. Both waxes, A and B, are applied to ﬂoor surfaces in each of 15 homes. Assume that there is actually no difference in the quality of the brands. What is the probability that ten or more homeowners would state a preference for a brand A? b either brand A or brand B?

3.54

Suppose that Y is a binomial random variable based on n trials with success probability p and consider $Y^* = n - Y$.

a Argue that for $y^* = 0, 1, \ldots, n$, $P(Y^* = y^*) = P(n - Y = y^*) = P(Y = n - y^*)$.

b Use the result from part (a) to show that

$$P(Y^* = y^*) = \binom{n}{n - y^*} p^{n - y^*} q^{y^*} = \binom{n}{y^*} q^{y^*} p^{n - y^*}.$$

c The result in part (b) implies that $Y^*$ has a binomial distribution based on n trials and “success” probability $p^* = q = 1 - p$. Why is this result “obvious”?

3.55

Suppose that Y is a binomial random variable with n > 2 trials and success probability p. Use the technique presented in Theorem 3.7 and the fact that $E\{Y(Y-1)(Y-2)\} = E(Y^3) - 3E(Y^2) + 2E(Y)$ to derive $E(Y^3)$.

3.56

An oil exploration ﬁrm is formed with enough capital to ﬁnance ten explorations. The probability of a particular exploration being successful is .1. Assume the explorations are independent. Find the mean and variance of the number of successful explorations.

3.57

Refer to Exercise 3.56. Suppose the ﬁrm has a ﬁxed cost of $20,000 in preparing equipment prior to doing its ﬁrst exploration. If each successful exploration costs $30,000 and each unsuccessful exploration costs $15,000, ﬁnd the expected total cost to the ﬁrm for its ten explorations.

3.58

A particular concentration of a chemical found in polluted water has been found to be lethal to 20% of the fish that are exposed to the concentration for 24 hours. Twenty fish are placed in a tank containing this concentration of chemical in water. a Find the probability that exactly 14 survive. b Find the probability that at least 10 survive. c Find the probability that at most 16 survive. d Find the mean and variance of the number that survive.

3.59

Ten motors are packaged for sale in a certain warehouse. The motors sell for $100 each, but a double-your-money-back guarantee is in effect for any defectives the purchaser may receive. Find the expected net gain for the seller if the probability of any one motor being defective is .08. (Assume that the quality of any one motor is independent of that of the others.)

3.60

A particular sale involves four items randomly selected from a large lot that is known to contain 10% defectives. Let Y denote the number of defectives among the four sold. The purchaser of the items will return the defectives for repair, and the repair cost is given by $C = 3Y^2 + Y + 2$. Find the expected repair cost. [Hint: The result of Theorem 3.6 implies that, for any random variable Y, $E(Y^2) = \sigma^2 + \mu^2$.]

3.61

Of the volunteers donating blood in a clinic, 80% have the Rhesus (Rh) factor present in their blood. a If ﬁve volunteers are randomly selected, what is the probability that at least one does not have the Rh factor? b If ﬁve volunteers are randomly selected, what is the probability that at most four have the Rh factor? c What is the smallest number of volunteers who must be selected if we want to be at least 90% certain that we obtain at least ﬁve donors with the Rh factor?

3.62

Goranson and Hall (1980) explain that the probability of detecting a crack in an airplane wing is the product of p1 , the probability of inspecting a plane with a wing crack; p2 , the probability of inspecting the detail in which the crack is located; and p3 , the probability of detecting the damage. a What assumptions justify the multiplication of these probabilities? b Suppose p1 = .9, p2 = .8, and p3 = .5 for a certain ﬂeet of planes. If three planes are inspected from this ﬂeet, ﬁnd the probability that a wing crack will be detected on at least one of them.

*3.63

Consider the binomial distribution with n trials and P(S) = p.

a Show that $\dfrac{p(y)}{p(y-1)} = \dfrac{(n - y + 1)p}{yq}$ for y = 1, 2, . . . , n. Equivalently, for y = 1, 2, . . . , n, the equation $p(y) = \dfrac{(n - y + 1)p}{yq}\, p(y-1)$ gives a recursive relationship between the probabilities associated with successive values of Y.

b If n = 90 and p = .04, use the above relationship to find P(Y < 3).

c Show that $\dfrac{p(y)}{p(y-1)} = \dfrac{(n - y + 1)p}{yq} > 1$ if y < (n + 1)p, that $\dfrac{p(y)}{p(y-1)} < 1$ if y > (n + 1)p, and that $\dfrac{p(y)}{p(y-1)} = 1$ if (n + 1)p is an integer and y = (n + 1)p. This establishes that p(y) > p(y − 1) if y is small (y < (n + 1)p) and p(y) < p(y − 1) if y is large (y > (n + 1)p). Thus, successive binomial probabilities increase for a while and decrease from then on.

d Show that the value of y assigned the largest probability is equal to the greatest integer less than or equal to (n + 1)p. If (n + 1)p = m for some integer m, then p(m) = p(m − 1).

*3.64

Consider an extension of the situation discussed in Example 3.10. If there are n trials in a binomial experiment and we observe y0 “successes,” show that P(Y = y0 ) is maximized when p = y0 /n. Again, we are determining (in general this time) the value of p that maximizes the probability of the value of Y that we actually observed.

*3.65

Refer to Exercise 3.64. The maximum likelihood estimator for p is Y /n (note that Y is the binomial random variable, not a particular value of it). a Derive E(Y /n). In Chapter 9, we will see that this result implies that Y/n is an unbiased estimator for p. b Derive V (Y /n). What happens to V (Y /n) as n gets large?

3.5 The Geometric Probability Distribution

The random variable with the geometric probability distribution is associated with an experiment that shares some of the characteristics of a binomial experiment. This experiment also involves identical and independent trials, each of which can result in one of two outcomes: success or failure. The probability of success is equal to p and is constant from trial to trial. However, instead of the number of successes that occur in n trials, the geometric random variable Y is the number of the trial on which the first success occurs. Thus, the experiment consists of a series of trials that concludes with the first success. Consequently, the experiment could end with the first trial if a success is observed on the very first trial, or the experiment could go on indefinitely.

The sample space S for the experiment contains the countably infinite set of sample points:

$E_1$: S (success on first trial)
$E_2$: FS (failure on first, success on second)
$E_3$: FFS (first success on the third trial)
$E_4$: FFFS (first success on the fourth trial)
. . .
$E_k$: $\underbrace{FFF \cdots F}_{k-1}\,S$ (first success on the kth trial)
. . .

Because the random variable Y is the number of trials up to and including the first success, the events (Y = 1), (Y = 2), and (Y = 3) contain only the sample points $E_1$, $E_2$, and $E_3$, respectively. More generally, the numerical event (Y = y) contains only $E_y$. Because the trials are independent, for any y = 1, 2, 3, . . . ,

$$p(y) = P(Y = y) = P(E_y) = P(\underbrace{FFF \cdots F}_{y-1}\,S) = \underbrace{qqq \cdots q}_{y-1}\,p = q^{y-1}p.$$

DEFINITION 3.8

A random variable Y is said to have a geometric probability distribution if and only if

$$p(y) = q^{y-1}p, \qquad y = 1, 2, 3, \ldots, \quad 0 \le p \le 1.$$

A probability histogram for p(y), p = .5, is shown in Figure 3.5. Areas over intervals correspond to probabilities, as they did for the frequency distributions of data in Chapter 1, except that Y can assume only discrete values, y = 1, 2, . . . , ∞. That p(y) ≥ 0 is obvious by inspection of the respective values. In Exercise 3.66 you will show that these probabilities add up to 1, as is required for any valid discrete probability distribution.

FIGURE 3.5 The geometric probability distribution, p = .5 [Probability histogram: vertical axis p(y), marked from .1 to .5; horizontal axis y, marked from 1 to 8.]
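The bar heights in a histogram like Figure 3.5 follow directly from Definition 3.8. The following is a minimal Python sketch, not part of the original text; the function name `geometric_pmf` is a hypothetical helper introduced only for illustration.

```python
def geometric_pmf(y, p):
    """Return p(y) = q^(y-1) * p for the geometric distribution of Definition 3.8.

    Y is the number of the trial on which the first success occurs,
    so the support is y = 1, 2, 3, ...
    """
    if y < 1:
        return 0.0
    q = 1 - p
    return q ** (y - 1) * p


# Heights of the histogram bars for p = .5 and y = 1, ..., 8
heights = [geometric_pmf(y, 0.5) for y in range(1, 9)]
for y, height in enumerate(heights, start=1):
    print(y, height)
```

Each bar is half the height of the one before it, which is the ratio q = .5 between successive probabilities.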

The geometric probability distribution is often used to model distributions of lengths of waiting times. For example, suppose that a commercial aircraft engine is serviced periodically so that its various parts are replaced at different points in time and hence are of varying ages. Then the probability of engine malfunction, p, during any randomly observed one-hour interval of operation might be the same as for any other one-hour interval. The length of time prior to engine malfunction is the number of one-hour intervals, Y , until the ﬁrst malfunction. (For this application, engine malfunction in a given one-hour period is deﬁned as a success. Notice that, as in the case of the binomial experiment, either of the two outcomes of a trial can be deﬁned as a success. Again, a “success” is not necessarily what would be considered to be “good” in everyday conversation.)

EXAMPLE 3.11

Suppose that the probability of engine malfunction during any one-hour period is p = .02. Find the probability that a given engine will survive two hours.

Solution

Letting Y denote the number of one-hour intervals until the first malfunction, we have

$$P(\text{survive two hours}) = P(Y \ge 3) = \sum_{y=3}^{\infty} p(y).$$

Because $\sum_{y=1}^{\infty} p(y) = 1$,

$$P(\text{survive two hours}) = 1 - \sum_{y=1}^{2} p(y) = 1 - p - qp = 1 - .02 - (.98)(.02) = .9604.$$
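The arithmetic in Example 3.11 can be checked numerically. The sketch below, an illustration rather than part of the original solution, computes the answer by the complement route used above and compares it with a long partial sum of the tail. The truncation point of 5,000 terms is an arbitrary assumption chosen so that the neglected terms are negligible.

```python
p = 0.02          # probability of malfunction in any one-hour interval
q = 1 - p

# Complement route used in the solution: 1 - p(1) - p(2)
survive_complement = 1 - p - q * p

# Direct route: partial sum of p(y) for y = 3, 4, ..., 5000
# (assumed truncation point; the remaining tail is vanishingly small)
survive_tail_sum = sum(q ** (y - 1) * p for y in range(3, 5001))

print(round(survive_complement, 4))
print(round(survive_tail_sum, 4))
```

Both routes agree to the four decimal places reported in the example.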

If you examine the formula for the geometric distribution given in Deﬁnition 3.8, you will see that larger values of p (and hence smaller values of q) lead to higher probabilities for the smaller values of Y and hence lower probabilities for the larger values of Y . Thus, the mean value of Y appears to be inversely proportional to p. As we show in the next theorem, the mean of a random variable with a geometric distribution is actually equal to 1/ p.

THEOREM 3.8

If Y is a random variable with a geometric distribution,

$$\mu = E(Y) = \frac{1}{p} \quad\text{and}\quad \sigma^2 = V(Y) = \frac{1-p}{p^2}.$$

Proof

$$E(Y) = \sum_{y=1}^{\infty} y q^{y-1} p = p \sum_{y=1}^{\infty} y q^{y-1}.$$

This series might seem to be difficult to sum directly. Actually, it can be summed easily if we take into account that, for y ≥ 1,

$$\frac{d}{dq}(q^y) = y q^{y-1},$$

and, hence,

$$\frac{d}{dq}\left(\sum_{y=1}^{\infty} q^y\right) = \sum_{y=1}^{\infty} y q^{y-1}.$$

(The interchanging of derivative and sum here can be justified.) Substituting, we obtain

$$E(Y) = p \sum_{y=1}^{\infty} y q^{y-1} = p\,\frac{d}{dq}\left(\sum_{y=1}^{\infty} q^y\right).$$

The latter sum is the geometric series, $q + q^2 + q^3 + \cdots$, which is equal to q/(1 − q) (see Appendix A1.11). Therefore,

$$E(Y) = p\,\frac{d}{dq}\left(\frac{q}{1-q}\right) = p\left[\frac{1}{(1-q)^2}\right] = \frac{p}{p^2} = \frac{1}{p}.$$

To summarize, our approach is to express a series that cannot be summed directly as the derivative of a series for which the sum can be readily obtained. Once we evaluate the more easily handled series, we differentiate to complete the process. The derivation of the variance is left as Exercise 3.85.
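As a numerical sanity check on the result E(Y) = 1/p (not a substitute for the proof), the partial sums of the series $\sum y q^{y-1} p$ can be computed directly. In this sketch the helper name `partial_mean` and the numbers of terms are assumptions made for illustration.

```python
def partial_mean(p, terms):
    """Partial sum of y * q^(y-1) * p over y = 1, ..., terms."""
    q = 1 - p
    return sum(y * q ** (y - 1) * p for y in range(1, terms + 1))


# The partial sums approach 1/p as more terms are included.
for p, terms in [(0.5, 200), (0.2, 1000), (0.02, 5000)]:
    print(p, partial_mean(p, terms), 1 / p)
```

Smaller values of p need more terms before the partial sum settles, because the probabilities q^(y-1) p decay more slowly.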

EXAMPLE 3.12

If the probability of engine malfunction during any one-hour period is p = .02 and Y denotes the number of one-hour intervals until the ﬁrst malfunction, ﬁnd the mean and standard deviation of Y .

Solution

As in Example 3.11, it follows that Y has a geometric distribution with p = .02. Thus, E(Y) = 1/p = 1/(.02) = 50, and we expect to wait quite a few hours before encountering a malfunction. Further, V(Y) = .98/.0004 = 2450, and it follows that the standard deviation of Y is $\sigma = \sqrt{2450} = 49.497$.
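The values in Example 3.12 can be reproduced from the formulas of Theorem 3.8 and cross-checked against a truncated series for $E(Y^2) - \mu^2$. This is a rough sketch; the 5,000-term truncation is an assumption chosen so that the omitted tail is negligible for p = .02.

```python
import math

p = 0.02
q = 1 - p

# Closed-form values from Theorem 3.8
mean = 1 / p
variance = q / p ** 2
std_dev = math.sqrt(variance)

# Cross-check: V(Y) = E(Y^2) - mu^2, with E(Y^2) from a truncated series
second_moment = sum(y * y * q ** (y - 1) * p for y in range(1, 5001))
series_variance = second_moment - mean ** 2

print(mean, variance, round(std_dev, 3))
print(series_variance)
```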

Although the computation of probabilities associated with geometric random variables can be accomplished by evaluating a single value or partial sums associated with a geometric series, these probabilities can also be found using various computer software packages. If Y has a geometric distribution with success probability p, $P(Y = y_0) = p(y_0)$ can be found by using the R (or S-Plus) command dgeom(y0 - 1, p), whereas $P(Y \le y_0)$ is found by using the R (or S-Plus) command pgeom(y0 - 1, p). For example, the R (or S-Plus) command pgeom(1, 0.02) yields the value for P(Y ≤ 2) that was implicitly used in Example 3.11. Note that the argument in these commands is the value $y_0 - 1$, not the value $y_0$. This is because some authors prefer to define the geometric distribution to be that of the random variable $Y^*$ = the number of failures before the first success. In our formulation, the geometric random variable Y is interpreted as the number of the trial on which the first success occurs. In Exercise 3.88, you will see that $Y^* = Y - 1$. Due to this relationship between the two versions of geometric random variables, $P(Y = y_0) = P(Y - 1 = y_0 - 1) = P(Y^* = y_0 - 1)$. R computes probabilities associated with $Y^*$, explaining why the arguments for dgeom and pgeom are $y_0 - 1$ instead of $y_0$.

The next example, similar to Example 3.10, illustrates how knowledge of the geometric probability distribution can be used to estimate an unknown value of p, the probability of a success.
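As a brief aside on the software convention described above, the shift by one between the two versions of the geometric random variable can be seen in a few lines of Python. This is an illustrative sketch only: the functions `pmf_trial_number`, `pmf_failures`, and `cdf_failures` are hypothetical helpers written here from the formulas in the text, not calls into R or any statistics library.

```python
def pmf_trial_number(y, p):
    """P(Y = y), where Y is the trial number of the first success (y = 1, 2, ...)."""
    return (1 - p) ** (y - 1) * p if y >= 1 else 0.0


def pmf_failures(k, p):
    """P(Y* = k), where Y* is the number of failures before the first
    success (k = 0, 1, ...); this is the failure-count convention."""
    return (1 - p) ** k * p if k >= 0 else 0.0


def cdf_failures(k, p):
    """P(Y* <= k), obtained by summing the failure-count probabilities."""
    return sum(pmf_failures(j, p) for j in range(0, k + 1))


y0, p = 5, 0.3
# P(Y = y0) equals P(Y* = y0 - 1), hence the shifted argument.
print(pmf_trial_number(y0, p), pmf_failures(y0 - 1, p))

# P(Y <= 2) from Example 3.11 corresponds to P(Y* <= 1) with p = .02.
print(cdf_failures(1, 0.02))
```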

EXAMPLE 3.13

Suppose that we interview successive individuals working for the large company discussed in Example 3.10 and stop interviewing when we ﬁnd the ﬁrst person who likes the policy. If the ﬁfth person interviewed is the ﬁrst one who favors the new policy, ﬁnd an estimate for p, the true but unknown proportion of employees who favor the new policy.

Solution

If Y denotes the number of individuals interviewed until we find the first person who likes the new retirement plan, it is reasonable to conclude that Y has a geometric distribution for some value of p. Whatever the true value for p, we conclude that the probability of observing the first person in favor of the policy on the fifth trial is

$$P(Y = 5) = (1 - p)^4 p.$$

We will use as our estimate for p the value that maximizes the probability of observing the value that we actually observed (the first success on trial 5). To find the value of p that maximizes P(Y = 5), we again observe that the value of p that maximizes $P(Y = 5) = (1 - p)^4 p$ is the same as the value of p that maximizes $\ln[(1 - p)^4 p] = 4\ln(1 - p) + \ln(p)$. If we take the derivative of $4\ln(1 - p) + \ln(p)$ with respect to p, we obtain

$$\frac{d[4\ln(1 - p) + \ln(p)]}{dp} = \frac{-4}{1 - p} + \frac{1}{p}.$$

Setting this derivative equal to 0 and solving, we obtain p = 1/5. Because the second derivative of $4\ln(1 - p) + \ln(p)$ is negative when p = 1/5, it follows that $4\ln(1 - p) + \ln(p)$ [and P(Y = 5)] is maximized when p = 1/5. Our estimate for p, based on observing the first success on the fifth trial, is 1/5. Perhaps this result is a little more surprising than the answer we obtained in Example 3.10, where we estimated p on the basis of observing 6 in favor of the new plan in a sample of size 20. Again, this is an example of the use of the method of maximum likelihood that will be studied in more detail in Chapter 9.
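The calculus in Example 3.13 can be corroborated by a crude grid search over candidate values of p. This sketch swaps the derivative argument for brute-force evaluation; the grid spacing of .001 and the helper name `likelihood` are assumptions made for illustration.

```python
def likelihood(p, y0=5):
    """P(Y = y0) = (1 - p)^(y0 - 1) * p, viewed as a function of p."""
    return (1 - p) ** (y0 - 1) * p


# Candidate values p = .001, .002, ..., .999 (assumed grid)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)

print(p_hat, likelihood(p_hat))
```

The grid maximizer coincides with the value 1/5 found by setting the derivative of the log-likelihood to zero.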

Exercises

3.66

Suppose that Y is a random variable with a geometric distribution. Show that

a $\sum_{y} p(y) = \sum_{y=1}^{\infty} q^{y-1} p = 1$.

b $\dfrac{p(y)}{p(y-1)} = q$, for y = 2, 3, . . . . This ratio is less than 1, implying that the geometric probabilities are monotonically decreasing as a function of y. If Y has a geometric distribution, what value of Y is the most likely (has the highest probability)?

3.67

Suppose that 30% of the applicants for a certain industrial job possess advanced training in computer programming. Applicants are interviewed sequentially and are selected at random from the pool. Find the probability that the ﬁrst applicant with advanced training in programming is found on the ﬁfth interview.

3.68

Refer to Exercise 3.67. What is the expected number of applicants who need to be interviewed in order to ﬁnd the ﬁrst one with advanced training?

3.69

About six months into George W. Bush’s second term as president, a Gallup poll indicated that a near record (low) level of 41% of adults expressed “a great deal” or “quite a lot” of conﬁdence in the U.S. Supreme Court (http://www.gallup.com/poll/content/default.aspx?ci=17011, June 2005). Suppose that you conducted your own telephone survey at that time and randomly called people and asked them to describe their level of conﬁdence in the Supreme Court. Find the probability distribution for Y , the number of calls until the ﬁrst person is found who does not express “a great deal” or “quite a lot” of conﬁdence in the U.S. Supreme Court.

3.70

An oil prospector will drill a succession of holes in a given area to ﬁnd a productive well. The probability that he is successful on a given trial is .2. a What is the probability that the third hole drilled is the ﬁrst to yield a productive well? b If the prospector can afford to drill at most ten wells, what is the probability that he will fail to ﬁnd a productive well?

3.71

Let Y denote a geometric random variable with probability of success p.

a Show that for a positive integer a, $P(Y > a) = q^a$.

b Show that for positive integers a and b, $P(Y > a + b \mid Y > a) = q^b = P(Y > b)$. This result implies that, for example, P(Y > 7|Y > 2) = P(Y > 5). Why do you think this property is called the memoryless property of the geometric distribution?

c In the development of the distribution of the geometric random variable, we assumed that the experiment consisted of conducting identical and independent trials until the first success was observed. In light of these assumptions, why is the result in part (b) “obvious”?

3.72

Given that we have already tossed a balanced coin ten times and obtained zero heads, what is the probability that we must toss it at least two more times to obtain the ﬁrst head?

3.73

A certiﬁed public accountant (CPA) has found that nine of ten company audits contain substantial errors. If the CPA audits a series of company accounts, what is the probability that the ﬁrst account containing substantial errors a is the third one to be audited? b will occur on or after the third audited account?

3.74

Refer to Exercise 3.73. What are the mean and standard deviation of the number of accounts that must be examined to ﬁnd the ﬁrst one with substantial errors?

3.75

The probability of a customer arrival at a grocery service counter in any one second is equal to .1. Assume that customers arrive in a random stream and hence that an arrival in any one second is independent of all others. Find the probability that the ﬁrst arrival a will occur during the third one-second interval. b will not occur until at least the third one-second interval.

3.76

Of a population of consumers, 60% are reputed to prefer a particular brand, A, of toothpaste. If a group of randomly selected consumers is interviewed, what is the probability that exactly ﬁve people have to be interviewed to encounter the ﬁrst consumer who prefers brand A? At least ﬁve people?

3.77

If Y has a geometric distribution with success probability p, show that

$$P(Y = \text{an odd integer}) = \frac{p}{1 - q^2}.$$

3.78

If Y has a geometric distribution with success probability .3, what is the largest value, y0 , such that P(Y > y0 ) ≥ .1?

3.79

How many times would you expect to toss a balanced coin in order to obtain the ﬁrst head?

3.80

Two people took turns tossing a fair die until one of them tossed a 6. Person A tossed ﬁrst, B second, A third, and so on. Given that person B threw the ﬁrst 6, what is the probability that B obtained the ﬁrst 6 on her second toss (that is, on the fourth toss overall)?

3.81

In responding to a survey question on a sensitive topic (such as “Have you ever tried marijuana?”), many people prefer not to respond in the afﬁrmative. Suppose that 80% of the population have not tried marijuana and all of those individuals will truthfully answer no to your question. The remaining 20% of the population have tried marijuana and 70% of those individuals will lie. Derive the probability distribution of Y , the number of people you would need to question in order to obtain a single afﬁrmative response.

3.82

Refer to Exercise 3.70. The prospector drills holes until he ﬁnds a productive well. How many holes would the prospector expect to drill? Interpret your answer intuitively.

3.83

The secretary in Exercises 2.121 and 3.16 was given n computer passwords and tries the passwords at random. Exactly one of the passwords permits access to a computer ﬁle. Suppose now that the secretary selects a password, tries it, and—if it does not work—puts it back in with the other passwords before randomly selecting the next password to try (not a very clever secretary!). What is the probability that the correct password is found on the sixth try?

3.84

Refer to Exercise 3.83. Find the mean and the variance of Y, the number of the trial on which the correct password is first identified.

*3.85

Find E[Y(Y − 1)] for a geometric random variable Y by finding $\dfrac{d^2}{dq^2}\left(\sum_{y=1}^{\infty} q^y\right)$. Use this result to find the variance of Y.

*3.86

Consider an extension of the situation discussed in Example 3.13. If we observe y0 as the value for a geometric random variable Y , show that P(Y = y0 ) is maximized when p = 1/y0 . Again, we are determining (in general this time) the value of p that maximizes the probability of the value of Y that we actually observed.

*3.87

Refer to Exercise 3.86. The maximum likelihood estimator for p is 1/Y (note that Y is the geometric random variable, not a particular value of it). Derive E(1/Y). [Hint: If |r| < 1, $\sum_{i=1}^{\infty} r^i/i = -\ln(1 - r)$.]

*3.88

If Y is a geometric random variable, define $Y^* = Y - 1$. If Y is interpreted as the number of the trial on which the first success occurs, then $Y^*$ can be interpreted as the number of failures before the first success. If $Y^* = Y - 1$, $P(Y^* = y) = P(Y - 1 = y) = P(Y = y + 1)$ for y = 0, 1, 2, . . . . Show that

$$P(Y^* = y) = q^y p, \qquad y = 0, 1, 2, \ldots.$$

The probability distribution of $Y^*$ is sometimes used by actuaries as a model for the distribution of the number of insurance claims made in a specific time period.

*3.89

Refer to Exercise 3.88. Derive the mean and variance of the random variable $Y^*$ a by using the result in Exercise 3.33 and the relationship $Y^* = Y - 1$, where Y is geometric. *b directly, using the probability distribution for $Y^*$ given in Exercise 3.88.

3.6 The Negative Binomial Probability Distribution (Optional)

A random variable with a negative binomial distribution originates from a context much like the one that yields the geometric distribution. Again, we focus on independent and identical trials, each of which results in one of two outcomes: success or failure. The probability p of success stays the same from trial to trial. The geometric distribution handles the case where we are interested in the number of the trial on which the first success occurs. What if we are interested in knowing the number of the trial on which the second, third, or fourth success occurs? The distribution that applies to the random variable Y equal to the number of the trial on which the rth success occurs (r = 2, 3, 4, etc.) is the negative binomial distribution.

The following steps closely resemble those in the previous section. Let us select fixed values for r and y and consider events A and B, where

A = {the first (y − 1) trials contain (r − 1) successes}

and

B = {trial y results in a success}.

Because we assume that the trials are independent, it follows that A and B are independent events, and previous assumptions imply that P(B) = p. Therefore,

$$p(y) = P(Y = y) = P(A \cap B) = P(A) \times P(B).$$

Notice that P(A) is 0 if (y − 1) < (r − 1) or, equivalently, if y < r. If y ≥ r, our previous work with the binomial distribution implies that

$$P(A) = \binom{y-1}{r-1} p^{r-1} q^{y-r}.$$

Finally,

$$p(y) = \binom{y-1}{r-1} p^{r} q^{y-r}, \qquad y = r, r + 1, r + 2, \ldots.$$

DEFINITION 3.9

A random variable Y is said to have a negative binomial probability distribution if and only if

$$p(y) = \binom{y-1}{r-1} p^{r} q^{y-r}, \qquad y = r, r + 1, r + 2, \ldots, \quad 0 \le p \le 1.$$

EXAMPLE 3.14

A geological study indicates that an exploratory oil well drilled in a particular region should strike oil with probability .2. Find the probability that the third oil strike comes on the ﬁfth well drilled.

Solution

Assuming independent drillings and probability .2 of striking oil with any one well, let Y denote the number of the trial on which the third oil strike occurs. Then it is reasonable to assume that Y has a negative binomial distribution with p = .2. Because we are interested in r = 3 and y = 5,

$$P(Y = 5) = p(5) = \binom{4}{2}(.2)^3(.8)^2 = 6(.008)(.64) = .0307.$$
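The calculation in Example 3.14 is a direct evaluation of the formula in Definition 3.9. The following minimal Python sketch reproduces it; the function name `negative_binomial_pmf` is a hypothetical helper, and `math.comb` (Python 3.8 or later) supplies the binomial coefficient.

```python
from math import comb


def negative_binomial_pmf(y, r, p):
    """P(Y = y), where Y is the trial number of the r-th success (Definition 3.9)."""
    if y < r:
        return 0.0  # the r-th success cannot occur before trial r
    q = 1 - p
    return comb(y - 1, r - 1) * p ** r * q ** (y - r)


# Example 3.14: third oil strike on the fifth well, p = .2
prob = negative_binomial_pmf(5, 3, 0.2)
print(round(prob, 4))

# With r = 1 the formula reduces to the geometric probability q^(y-1) * p.
print(negative_binomial_pmf(4, 1, 0.2), 0.8 ** 3 * 0.2)
```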

If r = 2, 3, 4, . . . and Y has a negative binomial distribution with success probability p, $P(Y = y_0) = p(y_0)$ can be found by using the R (or S-Plus) command dnbinom(y0 - r, r, p). If we wanted to use R to obtain p(5) in Example 3.14, we use the command dnbinom(2, 3, .2). Alternatively, $P(Y \le y_0)$ is found by using the R (or S-Plus) command pnbinom(y0 - r, r, p). Note that the first argument in these commands is the value $y_0 - r$, not the value $y_0$. This is because some authors prefer to define the negative binomial distribution to be that of the random variable $Y^*$ = the number of failures before the rth success. In our formulation, the negative binomial random variable, Y, is interpreted as the number of the trial on which the rth success occurs. In Exercise 3.100, you will see that $Y^* = Y - r$. Due to this relationship between the two versions of negative binomial random variables, $P(Y = y_0) = P(Y - r = y_0 - r) = P(Y^* = y_0 - r)$. R computes probabilities associated with $Y^*$, explaining why the arguments for dnbinom and pnbinom are $y_0 - r$ instead of $y_0$.

The mean and variance of a random variable with a negative binomial distribution can be derived directly from Definitions 3.4 and 3.5 by using techniques like those previously illustrated. However, summing the resulting infinite series is somewhat tedious. These derivations will be much easier after we have developed some of the techniques of Chapter 5. For now, we state the following theorem without proof.

THEOREM 3.9

If Y is a random variable with a negative binomial distribution,

$$\mu = E(Y) = \frac{r}{p} \quad\text{and}\quad \sigma^2 = V(Y) = \frac{r(1-p)}{p^2}.$$
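Because Theorem 3.9 is stated without proof, a numerical check may be reassuring. The sketch below sums the negative binomial probabilities of Definition 3.9 up to an assumed cutoff (y = 2000, beyond which the terms are negligible for the parameters used) and compares the resulting mean and variance with r/p and r(1 − p)/p². The helper names are hypothetical.

```python
from math import comb


def negative_binomial_pmf(y, r, p):
    """P(Y = y) for the trial number Y of the r-th success."""
    if y < r:
        return 0.0
    return comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r)


def truncated_moments(r, p, max_y=2000):
    """Return (total probability, mean, variance) from sums over y = r, ..., max_y."""
    total = mean = second = 0.0
    for y in range(r, max_y + 1):
        prob = negative_binomial_pmf(y, r, p)
        total += prob
        mean += y * prob
        second += y * y * prob
    return total, mean, second - mean ** 2


total, mean, variance = truncated_moments(3, 0.2)
print(total, mean, variance)  # compare with 1, r/p = 15, r(1 - p)/p^2 = 60
```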

EXAMPLE 3.15

A large stockpile of used pumps contains 20% that are in need of repair. A maintenance worker is sent to the stockpile with three repair kits. She selects pumps at random and tests them one at a time. If the pump works, she sets it aside for future use. However, if the pump does not work, she uses one of her repair kits on it. Suppose that it takes 10 minutes to test a pump that is in working condition and 30 minutes to test and repair a pump that does not work. Find the mean and variance of the total time it takes the maintenance worker to use her three repair kits.

Solution

Let Y denote the number of the trial on which the third nonfunctioning pump is found. It follows that Y has a negative binomial distribution with p = .2. Thus, E(Y) = 3/(.2) = 15 and $V(Y) = 3(.8)/(.2)^2 = 60$. Because it takes an additional 20 minutes to repair each defective pump, the total time necessary to use the three kits is

T = 10Y + 3(20).

Using the result derived in Exercise 3.33, we see that

E(T) = 10E(Y) + 60 = 10(15) + 60 = 210

and

$V(T) = 10^2 V(Y) = 100(60) = 6000.$

Thus, the total time necessary to use all three kits has mean 210 and standard deviation $\sqrt{6000} = 77.46$.
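The steps of Example 3.15 amount to applying Theorem 3.9 and then the rules for the mean and variance of a linear function T = aY + b. A short Python sketch of that arithmetic follows; the variable names are illustrative assumptions.

```python
import math

r, p = 3, 0.2            # three repair kits; 20% of pumps need repair

# Theorem 3.9 for the trial number Y of the third nonfunctioning pump
mean_y = r / p
var_y = r * (1 - p) / p ** 2

# T = 10Y + 3(20): 10 minutes per pump tested, plus 20 extra minutes per repair
a, b = 10, 3 * 20
mean_t = a * mean_y + b
var_t = a ** 2 * var_y   # the additive constant b does not affect the variance
sd_t = math.sqrt(var_t)

print(mean_t, var_t, round(sd_t, 2))
```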

Exercises

3.90

The employees of a ﬁrm that manufactures insulation are being tested for indications of asbestos in their lungs. The ﬁrm is requested to send three employees who have positive indications of asbestos on to a medical center for further testing. If 40% of the employees have positive indications of asbestos in their lungs, ﬁnd the probability that ten employees must be tested in order to ﬁnd three positives.

3.91

Refer to Exercise 3.90. If each test costs $20, ﬁnd the expected value and variance of the total cost of conducting the tests necessary to locate the three positives.

3.92

Ten percent of the engines manufactured on an assembly line are defective. If engines are randomly selected one at a time and tested, what is the probability that the ﬁrst nondefective engine will be found on the second trial?

3.93

Refer to Exercise 3.92. What is the probability that the third nondefective engine will be found a on the ﬁfth trial? b on or before the ﬁfth trial?

3.94

Refer to Exercise 3.92. Find the mean and variance of the number of the trial on which a the ﬁrst nondefective engine is found. b the third nondefective engine is found.

3.95

Refer to Exercise 3.92. Given that the ﬁrst two engines tested were defective, what is the probability that at least two more engines must be tested before the ﬁrst nondefective is found?

3.96

The telephone lines serving an airline reservation ofﬁce are all busy about 60% of the time. a If you are calling this ofﬁce, what is the probability that you will complete your call on the ﬁrst try? The second try? The third try? b If you and a friend must both complete calls to this ofﬁce, what is the probability that a total of four tries will be necessary for both of you to get through?

3.97

A geological study indicates that an exploratory oil well should strike oil with probability .2.
a What is the probability that the first strike comes on the third well drilled?
b What is the probability that the third strike comes on the seventh well drilled?
c What assumptions did you make to obtain the answers to parts (a) and (b)?
d Find the mean and variance of the number of wells that must be drilled if the company wants to set up three producing wells.

*3.98

Consider the negative binomial distribution given in Definition 3.9.
a Show that if y ≥ r + 1, $\frac{p(y)}{p(y-1)} = \left(\frac{y-1}{y-r}\right) q$. This establishes a recursive relationship between successive negative binomial probabilities, because $p(y) = p(y-1) \times \left(\frac{y-1}{y-r}\right) q$.
b Show that $\frac{p(y)}{p(y-1)} = \left(\frac{y-1}{y-r}\right) q > 1$ if $y < \frac{r-q}{1-q}$. Similarly, $\frac{p(y)}{p(y-1)} < 1$ if $y > \frac{r-q}{1-q}$.
c Apply the result in part (b) for the case r = 7, p = .5 to determine the values of y for which p(y) > p(y − 1).

*3.99

In a sequence of independent identical trials with two possible outcomes on each trial, S and F, and with P(S) = p, what is the probability that exactly y trials will occur before the r th success?

*3.100

If Y is a negative binomial random variable, define Y* = Y − r. If Y is interpreted as the number of the trial on which the rth success occurs, then Y* can be interpreted as the number of failures before the rth success.
a If Y* = Y − r, P(Y* = y) = P(Y − r = y) = P(Y = y + r) for y = 0, 1, 2, . . . , show that $P(Y^* = y) = \binom{y + r - 1}{r - 1} p^r q^y$, y = 0, 1, 2, . . . .
b Derive the mean and variance of the random variable Y* by using the relationship Y* = Y − r, where Y is negative binomial, and the result in Exercise 3.33.

*3.101

a We observe a sequence of independent identical trials with two possible outcomes on each trial, S and F, and with P(S) = p. The number of the trial on which we observe the ﬁfth success, Y , has a negative binomial distribution with parameters r = 5 and p. Suppose that we observe the ﬁfth success on the eleventh trial. Find the value of p that maximizes P(Y = 11). b Generalize the result from part (a) to ﬁnd the value of p that maximizes P(Y = y0 ) when Y has a negative binomial distribution with parameters r (known) and p.

3.7 The Hypergeometric Probability Distribution

In Example 3.6 we considered a population of voters, 40% of whom favored candidate Jones. A sample of voters was selected, and Y (the number favoring Jones) was to be observed. We concluded that if the sample size n was small relative to the population size N, the distribution of Y could be approximated by a binomial distribution. We also determined that if n was large relative to N, the conditional probability of selecting a supporter of Jones on a later draw would be significantly affected by the observed preferences of persons selected on earlier draws. Thus the trials were not independent and the probability distribution for Y could not be approximated adequately by a binomial probability distribution. Consequently, we need to develop the probability distribution for Y when n is large relative to N.

Suppose that a population contains a finite number N of elements that possess one of two characteristics. Thus, r of the elements might be red and b = N − r, black. A sample of n elements is randomly selected from the population, and the random variable of interest is Y, the number of red elements in the sample. This random variable has what is known as the hypergeometric probability distribution. For example, the number of workers who are women, Y, in Example 3.1 has the hypergeometric distribution.

The hypergeometric probability distribution can be derived by using the combinatorial theorems given in Section 2.6 and the sample-point approach. A sample point in the sample space S will correspond to a unique selection of n elements, some red and the remainder black. As in the binomial experiment, each sample point can be characterized by an n-tuple whose elements correspond to a selection of n elements from the total of N. If each element in the population were numbered from 1 to N, the sample point indicating the selection of items 5, 7, 8, 64, 17, . . . , 87 would appear as the n-tuple

(5, 7, 8, 64, 17, . . . , 87),

which has n positions.

The total number of sample points in S, therefore, will equal the number of ways of selecting a subset of n elements from a population of N, or $\binom{N}{n}$. Because random selection implies that all sample points are equiprobable, the probability of a sample point in S is

$$P(E_i) = \frac{1}{\binom{N}{n}}, \quad \text{all } E_i \in S.$$

The total number of sample points in the numerical event Y = y is the number of sample points in S that contain y red and (n − y) black elements. This number can be obtained by applying the mn rule (Section 2.6). The number of ways of selecting y red elements to fill y positions in the n-tuple representing a sample point is the number of ways of selecting y from a total of r, or $\binom{r}{y}$. [We use the convention $\binom{a}{b} = 0$ if b > a.] The total number of ways of selecting (n − y) black elements to fill the remaining (n − y) positions in the n-tuple is the number of ways of selecting (n − y) black elements from a possible (N − r), or $\binom{N-r}{n-y}$. Then the number of sample points in the numerical event Y = y is the number of ways of combining a set of y red and (n − y) black elements. By the mn rule, this is the product $\binom{r}{y} \times \binom{N-r}{n-y}$. Summing the probabilities of the sample points in the numerical event Y = y (multiplying the number of sample points by the common probability per sample point), we obtain the hypergeometric probability function.

DEFINITION 3.10

A random variable Y is said to have a hypergeometric probability distribution if and only if

$$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}},$$

where y is an integer 0, 1, 2, . . . , n, subject to the restrictions y ≤ r and n − y ≤ N − r.

With the convention $\binom{a}{b} = 0$ if b > a, it is clear that p(y) ≥ 0 for the hypergeometric probabilities. The fact that the hypergeometric probabilities sum to 1 follows from the fact that

$$\sum_{i=0}^{n} \binom{r}{i}\binom{N-r}{n-i} = \binom{N}{n}.$$

A sketch of the proof of this result is outlined in Exercise 3.216.

EXAMPLE 3.16

An important problem encountered by personnel directors and others faced with the selection of the best in a ﬁnite set of elements is exempliﬁed by the following scenario. From a group of 20 Ph.D. engineers, 10 are randomly selected for employment. What is the probability that the 10 selected include all the 5 best engineers in the group of 20?

Solution

For this example N = 20, n = 10, and r = 5. That is, there are only 5 in the set of 5 best engineers, and we seek the probability that Y = 5, where Y denotes the number of best engineers among the ten selected. Then

$$p(5) = \frac{\binom{5}{5}\binom{15}{5}}{\binom{20}{10}} = \left(\frac{15!}{5!\,10!}\right)\left(\frac{10!\,10!}{20!}\right) = \frac{21}{1292} = .0162.$$

Suppose that a population of size N consists of r units with the attribute and N − r without. If a sample of size n is taken, without replacement, and Y is the number of items with the attribute in the sample, P(Y = y0) = p(y0) can be found by using the R (or S-Plus) command dhyper(y0,r,N-r,n). The command dhyper(5,5,15,10) yields the value for p(5) in Example 3.16. Alternatively, P(Y ≤ y0) is found by using the R (or S-Plus) command phyper(y0,r,N-r,n).

The mean and variance of a random variable with a hypergeometric distribution can be derived directly from Definitions 3.4 and 3.5. However, deriving closed form expressions for the resulting summations is somewhat tedious. In Chapter 5 we will develop methods that permit a much simpler derivation of the results presented in the following theorem.

THEOREM 3.10

If Y is a random variable with a hypergeometric distribution,

$$\mu = E(Y) = \frac{nr}{N} \quad \text{and} \quad \sigma^2 = V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right).$$

Although the mean and the variance of the hypergeometric random variable seem to be rather complicated, they bear a striking resemblance to the mean and variance of a binomial random variable. Indeed, if we define p = r/N and q = 1 − p = (N − r)/N, we can re-express the mean and variance of the hypergeometric as µ = np and

$$\sigma^2 = npq\left(\frac{N-n}{N-1}\right).$$

You can view the factor (N − n)/(N − 1) in V(Y) as an adjustment that is appropriate when n is large relative to N. For fixed n, as N → ∞, (N − n)/(N − 1) → 1.
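To make Definition 3.10 and Theorem 3.10 concrete, the sketch below computes the hypergeometric probability function with Python's standard library and compares the mean and variance obtained by direct summation with the closed forms in the theorem. The helper name hyper_pmf is hypothetical (it is not an R or S-Plus command), and the parameter values are those of Example 3.16.

```python
from math import comb

def hyper_pmf(y, r, N, n):
    """P(Y = y): y items with the attribute in a sample of n drawn without
    replacement from N items, r of which have the attribute."""
    # math.comb(a, b) returns 0 when b > a, matching the convention in the text.
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

N, r, n = 20, 5, 10                      # Example 3.16

p5 = hyper_pmf(5, r, N, n)               # the text obtains this from dhyper(5,5,15,10)

probs = [hyper_pmf(y, r, N, n) for y in range(n + 1)]
total = sum(probs)
mean_y = sum(y * q for y, q in enumerate(probs))
var_y = sum((y - mean_y) ** 2 * q for y, q in enumerate(probs))

# Closed forms from Theorem 3.10
mean_formula = n * r / N
var_formula = n * (r / N) * ((N - r) / N) * ((N - n) / (N - 1))

print(f"p(5) = {p5:.4f}, sum of probabilities = {total:.6f}")
print(f"mean: {mean_y:.6f} (summation), {mean_formula:.6f} (formula)")
print(f"variance: {var_y:.6f} (summation), {var_formula:.6f} (formula)")
```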

EXAMPLE 3.17

An industrial product is shipped in lots of 20. Testing to determine whether an item is defective is costly, and hence the manufacturer samples his production rather than using a 100% inspection plan. A sampling plan, constructed to minimize the number of defectives shipped to customers, calls for sampling five items from each lot and rejecting the lot if more than one defective is observed. (If the lot is rejected, each item in it is later tested.) If a lot contains four defectives, what is the probability that it will be rejected? What is the expected number of defectives in the sample of size 5? What is the variance of the number of defectives in the sample of size 5?

Solution

Let Y equal the number of defectives in the sample. Then N = 20, r = 4, and n = 5. The lot will be rejected if Y = 2, 3, or 4. Then

$$\begin{aligned}
P(\text{rejecting the lot}) &= P(Y \ge 2) = p(2) + p(3) + p(4) = 1 - p(0) - p(1) \\
&= 1 - \frac{\binom{4}{0}\binom{16}{5}}{\binom{20}{5}} - \frac{\binom{4}{1}\binom{16}{4}}{\binom{20}{5}} \\
&= 1 - .2817 - .4696 = .2487.
\end{aligned}$$

The mean and variance of the number of defectives in the sample of size 5 are

$$\mu = \frac{(5)(4)}{20} = 1 \quad \text{and} \quad \sigma^2 = 5\left(\frac{4}{20}\right)\left(\frac{20-4}{20}\right)\left(\frac{20-5}{20-1}\right) = .632.$$

Example 3.17 involves sampling a lot of N industrial products, of which r are defective. The random variable of interest is Y, the number of defectives in a sample of size n. As noted in the beginning of this section, Y possesses an approximately binomial distribution when N is large and n is relatively small. Consequently, we would expect the probabilities assigned to values of Y by the hypergeometric distribution to approach those assigned by the binomial distribution as N becomes large and r/N, the fraction defective in the population, is held constant and equal to p. You can verify this expectation by using limit theorems encountered in your calculus courses to show that

$$\lim_{N \to \infty} \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}} = \binom{n}{y} p^y (1-p)^{n-y},$$

where r/N = p. (The proof of this result is omitted.) Hence, for a fixed fraction defective p = r/N, the hypergeometric probability function converges to the binomial probability function as N becomes large.
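Although the proof of the limit is omitted, its behavior can be illustrated numerically. The following Python sketch holds the fraction defective at p = .2 and the sample size at n = 5, as in Example 3.17, and lets the lot size N grow. The helper names and the particular sequence of lot sizes are assumptions made here for illustration only.

```python
from math import comb

def hyper_pmf(y, r, N, n):
    """Hypergeometric probability of y defectives in a sample of n."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, y = 5, 1
target = binom_pmf(y, n, 0.2)            # limiting binomial probability

gaps = []
for N in (20, 200, 2000, 20000):
    r = N // 5                           # keeps r/N = .2 for every lot size
    h = hyper_pmf(y, r, N, n)
    gaps.append(abs(h - target))
    print(f"N = {N:6d}: hypergeometric = {h:.5f}, binomial = {target:.5f}")
```

The first row (N = 20, r = 4) reproduces p(1) from Example 3.17, and the gap between the two probability functions shrinks as N increases.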

Exercises

3.102

An urn contains ten marbles, of which ﬁve are green, two are blue, and three are red. Three marbles are to be drawn from the urn, one at a time without replacement. What is the probability that all three marbles drawn will be green?

3.103

A warehouse contains ten printing machines, four of which are defective. A company selects ﬁve of the machines at random, thinking all are in working condition. What is the probability that all ﬁve of the machines are nondefective?

3.104

Twenty identical looking packets of white powder are such that 15 contain cocaine and 5 do not. Four packets were randomly selected, and the contents were tested and found to contain cocaine. Two additional packets were selected from the remainder and sold by undercover police officers to a single buyer. What is the probability that the 6 packets randomly selected are such that the first 4 all contain cocaine and the 2 sold to the buyer do not?

3.105

In southern California, a growing number of individuals pursuing teaching credentials are choosing paid internships over traditional student teaching programs. A group of eight candidates for three local teaching positions consisted of ﬁve who had enrolled in paid internships and three who enrolled in traditional student teaching programs. All eight candidates appear to be equally qualiﬁed, so three are randomly selected to ﬁll the open positions. Let Y be the number of internship trained candidates who are hired. a Does Y have a binomial or hypergeometric distribution? Why? b Find the probability that two or more internship trained candidates are hired. c What are the mean and standard deviation of Y ?

3.106

Refer to Exercise 3.103. The company repairs the defective ones at a cost of $50 each. Find the mean and variance of the total repair cost.

3.107

Seeds are often treated with fungicides to protect them in poor draining, wet environments. A small-scale trial, involving five treated and five untreated seeds, was conducted prior to a large-scale experiment to explore how much fungicide to apply. The seeds were planted in wet soil, and the number of emerging plants was counted. If the solution was not effective and four plants actually sprouted, what is the probability that a all four plants emerged from treated seeds? b three or fewer emerged from treated seeds? c at least one emerged from untreated seeds?

3.108

A shipment of 20 cameras includes 3 that are defective. What is the minimum number of cameras that must be selected if we require that P(at least 1 defective) ≥ .8?

3.109

A group of six software packages available to solve a linear programming problem has been ranked from 1 to 6 (best to worst). An engineering ﬁrm, unaware of the rankings, randomly selected and then purchased two of the packages. Let Y denote the number of packages purchased by the ﬁrm that are ranked 3, 4, 5, or 6. Give the probability distribution for Y.

3.110

A corporation is sampling without replacement for n = 3 firms to determine the one from which to purchase certain supplies. The sample is to be selected from a pool of six firms, of which four are local and two are not local. Let Y denote the number of nonlocal firms among the three selected.
a P(Y = 1).
b P(Y ≥ 1).
c P(Y ≤ 1).

3.111

Specifications call for a thermistor to test out at between 9000 and 10,000 ohms at 25° Celsius. Ten thermistors are available, and three of these are to be selected for use. Let Y denote the number among the three that do not conform to specifications. Find the probability distributions for Y (in tabular form) under the following conditions:
a Two thermistors do not conform to specifications among the ten that are available.
b Four thermistors do not conform to specifications among the ten that are available.

3.112

Used photocopy machines are returned to the supplier, cleaned, and then sent back out on lease agreements. Major repairs are not made, however, and as a result, some customers receive malfunctioning machines. Among eight used photocopiers available today, three are malfunctioning. A customer wants to lease four machines immediately. To meet the customer’s deadline, four of the eight machines are randomly selected and, without further checking, shipped to the customer. What is the probability that the customer receives a no malfunctioning machines? b at least one malfunctioning machine?

3.113

A jury of 6 persons was selected from a group of 20 potential jurors, of whom 8 were African American and 12 were white. The jury was supposedly randomly selected, but it contained only 1 African American member. Do you have any reason to doubt the randomness of the selection?

3.114

Refer to Exercise 3.113. If the selection process were really random, what would be the mean and variance of the number of African American members selected for the jury?

3.115

Suppose that a radio contains six transistors, two of which are defective. Three transistors are selected at random, removed from the radio, and inspected. Let Y equal the number of defectives observed, where Y = 0, 1, or 2. Find the probability distribution for Y . Express your results graphically as a probability histogram.

3.116

Simulate the experiment described in Exercise 3.115 by marking six marbles or coins so that two represent defectives and four represent nondefectives. Place the marbles in a hat, mix, draw three, and record Y , the number of defectives observed. Replace the marbles and repeat the process until n = 100 observations of Y have been recorded. Construct a relative frequency histogram for this sample and compare it with the population probability distribution (Exercise 3.115).

3.117

In an assembly-line production of industrial robots, gearbox assemblies can be installed in one minute each if holes have been properly drilled in the boxes and in ten minutes if the holes must be redrilled. Twenty gearboxes are in stock, 2 with improperly drilled holes. Five gearboxes must be selected from the 20 that are available for installation in the next ﬁve robots. a Find the probability that all 5 gearboxes will ﬁt properly. b Find the mean, variance, and standard deviation of the time it takes to install these 5 gearboxes.

3.118

Five cards are dealt at random and without replacement from a standard deck of 52 cards. What is the probability that the hand contains all 4 aces if it is known that it contains at least 3 aces?

3.119

Cards are dealt at random and without replacement from a standard 52 card deck. What is the probability that the second king is dealt on the ﬁfth card?

*3.120

The sizes of animal populations are often estimated by using a capture–tag–recapture method. In this method k animals are captured, tagged, and then released into the population. Some time later n animals are captured, and Y , the number of tagged animals among the n, is noted. The probabilities associated with Y are a function of N , the number of animals in the population, so the observed value of Y contains information on this unknown N . Suppose that k = 4 animals are tagged and then released. A sample of n = 3 animals is then selected at random from the same population. Find P(Y = 1) as a function of N . What value of N will maximize P(Y = 1)?

3.8 The Poisson Probability Distribution

Suppose that we want to find the probability distribution of the number of automobile accidents at a particular intersection during a time period of one week. At first glance this random variable, the number of accidents, may not seem even remotely related to a binomial random variable, but we will see that an interesting relationship exists.

Think of the time period, one week in this example, as being split up into n subintervals, each of which is so small that at most one accident could occur in it with probability different from zero. Denoting the probability of an accident in any subinterval by p, we have, for all practical purposes,

P(no accidents occur in a subinterval) = 1 − p,
P(one accident occurs in a subinterval) = p,
P(more than one accident occurs in a subinterval) = 0.

Then the total number of accidents in the week is just the total number of subintervals that contain one accident. If the occurrence of accidents can be regarded as independent from interval to interval, the total number of accidents has a binomial distribution.

Although there is no unique way to choose the subintervals, and we therefore know neither n nor p, it seems reasonable that as we divide the week into a greater number n of subintervals, the probability p of one accident in one of these shorter subintervals will decrease. Letting λ = np and taking the limit of the binomial probability $p(y) = \binom{n}{y} p^y (1-p)^{n-y}$ as n → ∞, we have

$$\begin{aligned}
\lim_{n \to \infty} \binom{n}{y} p^y (1-p)^{n-y}
&= \lim_{n \to \infty} \frac{n(n-1)\cdots(n-y+1)}{y!} \left(\frac{\lambda}{n}\right)^{y} \left(1 - \frac{\lambda}{n}\right)^{n-y} \\
&= \lim_{n \to \infty} \frac{\lambda^y}{y!} \left(1 - \frac{\lambda}{n}\right)^{n} \frac{n(n-1)\cdots(n-y+1)}{n^y} \left(1 - \frac{\lambda}{n}\right)^{-y} \\
&= \frac{\lambda^y}{y!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-y} \left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \times \cdots \times \left(1 - \frac{y-1}{n}\right).
\end{aligned}$$

Noting that

$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$$

and all other terms to the right of the limit have a limit of 1, we obtain

$$p(y) = \frac{\lambda^y}{y!} e^{-\lambda}.$$

(Note: e = 2.718. . . .) Random variables possessing this distribution are said to have a Poisson distribution. Hence, Y , the number of accidents per week, has the Poisson distribution just derived.
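The limiting argument above can be illustrated numerically. The Python sketch below fixes λ = np and lets n grow, so that p = λ/n shrinks, and compares the binomial probability with the limiting Poisson value. The helper names and the choices λ = 2 and y = 3 are assumptions made here purely for illustration.

```python
from math import comb, exp, factorial

def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def poisson_pmf(y, lam):
    """Poisson probability lam^y e^(-lam) / y!."""
    return lam**y * exp(-lam) / factorial(y)

lam, y = 2.0, 3
limit = poisson_pmf(y, lam)

gaps = []
for n in (10, 100, 1000, 10000):
    b = binom_pmf(y, n, lam / n)         # p = lambda/n decreases as n increases
    gaps.append(abs(b - limit))
    print(f"n = {n:6d}: binomial = {b:.6f}, Poisson limit = {limit:.6f}")
```

As the number of subintervals n increases with λ held fixed, the binomial probabilities settle toward the Poisson value.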

Because the binomial probability function converges to the Poisson, the Poisson probabilities can be used to approximate their binomial counterparts for large n, small p, and λ = np less than, roughly, 7. Exercise 3.134 requires you to calculate corresponding binomial and Poisson probabilities and will demonstrate the adequacy of the approximation.

The Poisson probability distribution often provides a good model for the probability distribution of the number Y of rare events that occur in space, time, volume, or any other dimension, where λ is the average value of Y. As we have noted, it provides a good model for the probability distribution of the number Y of automobile accidents, industrial accidents, or other types of accidents in a given unit of time. Other examples of random variables with approximate Poisson distributions are the number of telephone calls handled by a switchboard in a time interval, the number of radioactive particles that decay in a particular time period, the number of errors a typist makes in typing a page, and the number of automobiles using a freeway access ramp in a ten-minute interval.

DEFINITION 3.11

A random variable Y is said to have a Poisson probability distribution if and only if

$$p(y) = \frac{\lambda^y}{y!} e^{-\lambda}, \quad y = 0, 1, 2, \ldots, \quad \lambda > 0.$$

As we will see in Theorem 3.11, the parameter λ that appears in the formula for the Poisson distribution is actually the mean of the distribution.

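EXAMPLE 3.18

Show that the probabilities assigned by the Poisson probability distribution satisfy the requirements that 0 ≤ p(y) ≤ 1 for all y and $\sum_y p(y) = 1$.

Solution

Because λ > 0, it is obvious that p(y) > 0 for y = 0, 1, 2, . . . , and that p(y) = 0 otherwise. Further,

$$\sum_{y=0}^{\infty} p(y) = \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} e^{-\lambda} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = e^{-\lambda} e^{\lambda} = 1$$

because the infinite sum $\sum_{y=0}^{\infty} \lambda^y / y!$ is a series expansion of $e^{\lambda}$. Because the probabilities are nonnegative and sum to 1, no single p(y) can exceed 1. Sums of special series are given in Appendix A1.11.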

EXAMPLE 3.19

Suppose that a random system of police patrol is devised so that a patrol ofﬁcer may visit a given beat location Y = 0, 1, 2, 3, . . . times per half-hour period, with each location being visited an average of once per time period. Assume that Y possesses, approximately, a Poisson probability distribution. Calculate the probability that the patrol ofﬁcer will miss a given location during a half-hour period. What is the probability that it will be visited once? Twice? At least once?

Solution

For this example the time period is a half-hour, and the mean number of visits per half-hour interval is λ = 1. Then

$$p(y) = \frac{e^{-1}(1)^y}{y!} = \frac{e^{-1}}{y!}, \quad y = 0, 1, 2, \ldots.$$

The event that a given location is missed in a half-hour period corresponds to (Y = 0), and

$$P(Y = 0) = p(0) = \frac{e^{-1}}{0!} = e^{-1} = .368.$$

Similarly,

$$p(1) = \frac{e^{-1}}{1!} = e^{-1} = .368$$

and

$$p(2) = \frac{e^{-1}}{2!} = \frac{e^{-1}}{2} = .184.$$

The probability that the location is visited at least once is the probability of the event (Y ≥ 1). Then

$$P(Y \ge 1) = \sum_{y=1}^{\infty} p(y) = 1 - p(0) = 1 - e^{-1} = .632.$$

If Y has a Poisson distribution with mean λ, P(Y = y0 ) = p(y0 ) can be found by using the R (or S-Plus) command dpois(y0 , λ). If we wanted to use R to obtain p(2) in Example 3.19, we use the command dpois(2,1). Alternatively, P(Y ≤ y0 ) is found by using the R (or S-Plus) command ppois(y0 , λ).
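For readers working outside R, the same quantities can be computed from the formula in Definition 3.11. The Python sketch below defines two hypothetical helpers, poisson_pmf and poisson_cdf, intended to play the roles of dpois and ppois (they are written here for illustration and are not part of R or S-Plus), and applies them to Example 3.19.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson mean lam; counterpart of dpois(y, lam)."""
    return lam**y * exp(-lam) / factorial(y)

def poisson_cdf(y, lam):
    """P(Y <= y) for a Poisson mean lam; counterpart of ppois(y, lam)."""
    return sum(poisson_pmf(k, lam) for k in range(y + 1))

lam = 1                                  # one visit per half-hour, on average

p0 = poisson_pmf(0, lam)                 # location missed
p1 = poisson_pmf(1, lam)                 # visited once
p2 = poisson_pmf(2, lam)                 # visited twice
at_least_once = 1 - poisson_cdf(0, lam)  # P(Y >= 1) = 1 - P(Y <= 0)

print(f"p(0) = {p0:.3f}, p(1) = {p1:.3f}, p(2) = {p2:.3f}")
print(f"P(Y >= 1) = {at_least_once:.3f}")
```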

EXAMPLE 3.20

A certain type of tree has seedlings randomly dispersed in a large area, with the mean density of seedlings being approximately ﬁve per square yard. If a forester randomly locates ten 1-square-yard sampling regions in the area, ﬁnd the probability that none of the regions will contain seedlings.

Solution

If the seedlings really are randomly dispersed, the number of seedlings per region, Y, can be modeled as a Poisson random variable with λ = 5. (The average density is five per square yard.) Thus,

$$P(Y = 0) = p(0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-5} = .006738.$$

The probability that Y = 0 on ten independently selected regions is $(e^{-5})^{10}$ because the probability of the intersection of independent events is equal to the product of the respective probabilities. The resulting probability is extremely small. Thus, if this event actually occurred, we would seriously question the assumption of randomness, the stated average density of seedlings, or both.
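To see just how small the probability in this example is, the two quantities can be evaluated directly. This short Python sketch uses variable names chosen here for illustration and relies on the independence of the ten regions stated in the solution.

```python
from math import exp

lam = 5            # mean number of seedlings per square yard
regions = 10       # independently located sampling regions

p_zero = exp(-lam)                 # P(Y = 0) for a single region
p_all_zero = p_zero ** regions     # product rule for independent events

print(f"P(Y = 0) in one region = {p_zero:.6f}")
print(f"P(no seedlings in any of {regions} regions) = {p_all_zero:.3e}")
```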

For your convenience, we provide in Table 3, Appendix 3, the partial sums $\sum_{y=0}^{a} p(y)$ for the Poisson probability distribution for many values of λ between .02 and 25. This table is laid out similarly to the table of partial sums for the binomial distribution, Table 1, Appendix 3. The following example illustrates the use of Table 3 and demonstrates that the Poisson probability distribution can approximate the binomial probability distribution.

EXAMPLE 3.21

Suppose that Y possesses a binomial distribution with n = 20 and p = .1. Find the exact value of P(Y ≤ 3) using the table of binomial probabilities, Table 1, Appendix 3. Use Table 3, Appendix 3, to approximate this probability, using a corresponding probability given by the Poisson distribution. Compare the exact and approximate values for P(Y ≤ 3).

Solution

According to Table 1, Appendix 3, the exact (accurate to three decimal places) value of P(Y ≤ 3) = .867. If W is a Poisson-distributed random variable with λ = np = 20(.1) = 2, previous discussions indicate that P(Y ≤ 3) is approximately equal to P(W ≤ 3). Table 3, Appendix 3, [or the R command ppois(3,2)], gives P(W ≤ 3) = .857. Thus, you can see that the Poisson approximation is quite good, yielding a value that differs from the exact value by only .01.
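The comparison in this example can also be reproduced without tables. The following Python sketch sums the exact binomial probabilities and the approximating Poisson probabilities; the helper names binom_cdf and poisson_cdf are hypothetical and are written here only for illustration.

```python
from math import comb, exp, factorial

def binom_cdf(a, n, p):
    """P(Y <= a) for a binomial random variable with n trials."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(a + 1))

def poisson_cdf(a, lam):
    """P(W <= a) for a Poisson random variable with mean lam."""
    return sum(lam**y * exp(-lam) / factorial(y) for y in range(a + 1))

n, p = 20, 0.1
lam = n * p                              # matching Poisson mean, lambda = np

exact = binom_cdf(3, n, p)
approx = poisson_cdf(3, lam)

print(f"exact binomial P(Y <= 3) = {exact:.3f}")
print(f"Poisson approximation    = {approx:.3f}")
print(f"difference               = {exact - approx:.3f}")
```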

In our derivation of the mean and variance of a random variable with the Poisson distribution, we again use the fundamental property that $\sum_y p(y) = 1$ for any discrete probability distribution.

THEOREM 3.11

If Y is a random variable possessing a Poisson distribution with parameter λ, then

$$\mu = E(Y) = \lambda \quad \text{and} \quad \sigma^2 = V(Y) = \lambda.$$

Proof

By definition,

$$E(Y) = \sum_{y} y\,p(y) = \sum_{y=0}^{\infty} y\,\frac{\lambda^y e^{-\lambda}}{y!}.$$

Notice that the first term in this sum is equal to 0 (when y = 0), and, hence,

$$E(Y) = \sum_{y=1}^{\infty} y\,\frac{\lambda^y e^{-\lambda}}{y!} = \sum_{y=1}^{\infty} \frac{\lambda^y e^{-\lambda}}{(y-1)!}.$$

As it stands, this quantity is not equal to the sum of the values of a probability function p(y) over all values of y, but we can change it to the proper form by factoring λ out of the expression and letting z = y − 1. Then the limits of summation become z = 0 (when y = 1) and z = ∞ (when y = ∞), and

$$E(Y) = \lambda \sum_{y=1}^{\infty} \frac{\lambda^{y-1} e^{-\lambda}}{(y-1)!} = \lambda \sum_{z=0}^{\infty} \frac{\lambda^z e^{-\lambda}}{z!}.$$

Notice that $p(z) = \lambda^z e^{-\lambda}/z!$ is the probability function for a Poisson random variable, and $\sum_{z=0}^{\infty} p(z) = 1$. Therefore, E(Y) = λ. Thus, the mean of a Poisson random variable is the single parameter λ that appears in the expression for the Poisson probability function. We leave the derivation of the variance as Exercise 3.138.
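Theorem 3.11 can be checked numerically for any particular λ by truncating the infinite sums. In the Python sketch below, the value λ = 2.5 and the truncation point 100 are arbitrary assumptions chosen for illustration; the check is no substitute for the derivation.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """Poisson probability lam^y e^(-lam) / y!."""
    return lam**y * exp(-lam) / factorial(y)

lam = 2.5
ys = range(0, 101)                       # terms beyond y = 100 are negligible here

total = sum(poisson_pmf(y, lam) for y in ys)
mean_y = sum(y * poisson_pmf(y, lam) for y in ys)
var_y = sum((y - mean_y) ** 2 * poisson_pmf(y, lam) for y in ys)

print(f"sum of p(y) = {total:.6f}")
print(f"E(Y) = {mean_y:.6f}, V(Y) = {var_y:.6f}, lambda = {lam}")
```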

A common way to encounter a random variable with a Poisson distribution is through a model called a Poisson process. A Poisson process is an appropriate model for situations as described at the beginning of this section. If we observe a Poisson process and λ is the mean number of occurrences per unit (length, area, etc.), then Y = the number of occurrences in a units has a Poisson distribution with mean aλ. A key assumption in the development of the theory of the Poisson process is independence of the numbers of occurrences in disjoint intervals (areas, etc.). See Hogg, Craig, and McKean (2005) for a theoretical development of the Poisson process.

EXAMPLE 3.22

Industrial accidents occur according to a Poisson process with an average of three accidents per month. During the last two months, ten accidents occurred. Does this number seem highly improbable if the mean number of accidents per month, µ, is still equal to 3? Does it indicate an increase in the mean number of accidents per month?

Solution

The number of accidents in two months, Y, has a Poisson probability distribution with mean λ = 2(3) = 6. The probability that Y is as large as 10 is

$$P(Y \ge 10) = \sum_{y=10}^{\infty} \frac{6^y e^{-6}}{y!}.$$

The tedious calculation required to find P(Y ≥ 10) can be avoided by using Table 3, Appendix 3; software such as R [ppois(9,6) yields P(Y ≤ 9)]; or the empirical rule. From Theorem 3.11,

$$\mu = \lambda = 6, \quad \sigma^2 = \lambda = 6, \quad \sigma = \sqrt{6} = 2.45.$$

The empirical rule tells us that we should expect Y to take values in the interval µ ± 2σ with a high probability. Notice that µ + 2σ = 6 + (2)(2.45) = 10.90. The observed number of accidents, Y = 10, does not lie more than 2σ from µ, but it is close to the boundary. Thus, the observed result is not highly improbable, but it may be sufficiently improbable to warrant an investigation. See Exercise 3.210 for the exact probability P(|Y − λ| ≤ 2σ).
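As a complement to the empirical-rule argument, the tail probability P(Y ≥ 10) can be computed as 1 − P(Y ≤ 9), which is the quantity ppois(9,6) supplies in R. The Python sketch below uses a hypothetical helper poisson_cdf written here for illustration.

```python
from math import exp, factorial, sqrt

def poisson_cdf(a, lam):
    """P(Y <= a) for a Poisson random variable with mean lam."""
    return sum(lam**y * exp(-lam) / factorial(y) for y in range(a + 1))

lam = 2 * 3                              # two months at three accidents per month

tail = 1 - poisson_cdf(9, lam)           # P(Y >= 10) = 1 - P(Y <= 9)
upper = lam + 2 * sqrt(lam)              # mu + 2 sigma from Theorem 3.11

print(f"P(Y >= 10) = {tail:.4f}")
print(f"mu + 2 sigma = {upper:.2f}")
```

A tail probability of this size is consistent with the conclusion drawn in the solution: ten accidents is unusual but not extremely so.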

Exercises

3.121

Let Y denote a random variable that has a Poisson distribution with mean λ = 2. Find
a P(Y = 4).
b P(Y ≥ 4).
c P(Y < 4).
d P(Y ≥ 4 | Y ≥ 2).

3.122

Customers arrive at a checkout counter in a department store according to a Poisson distribution at an average of seven per hour. During a given hour, what are the probabilities that
a no more than three customers arrive?
b at least two customers arrive?
c exactly five customers arrive?

3.123

The random variable Y has a Poisson distribution and is such that p(0) = p(1). What is p(2)?

3.124

Approximately 4% of silicon wafers produced by a manufacturer have fewer than two large ﬂaws. If Y , the number of ﬂaws per wafer, has a Poisson distribution, what proportion of the wafers have more than ﬁve large ﬂaws? [Hint: Use Table 3, Appendix 3.]

3.125

Refer to Exercise 3.122. If it takes approximately ten minutes to serve each customer, ﬁnd the mean and variance of the total service time for customers arriving during a 1-hour period. (Assume that a sufﬁcient number of servers are available so that no customer must wait for service.) Is it likely that the total service time will exceed 2.5 hours?

3.126

Refer to Exercise 3.122. Assume that arrivals occur according to a Poisson process with an average of seven per hour. What is the probability that exactly two customers arrive in the two-hour period of time between a 2:00 P.M. and 4:00 P.M. (one continuous two-hour period)? b 1:00 P.M. and 2:00 P.M. or between 3:00 P.M. and 4:00 P.M. (two separate one-hour periods that total two hours)?

3.127

The number of typing errors made by a typist has a Poisson distribution with an average of four errors per page. If more than four errors appear on a given page, the typist must retype the whole page. What is the probability that a randomly selected page does not need to be retyped?

3.128

Cars arrive at a toll booth according to a Poisson process with mean 80 cars per hour. If the attendant makes a one-minute phone call, what is the probability that at least 1 car arrives during the call?

*3.129

Refer to Exercise 3.128. How long can the attendant’s phone call last if the probability is at least .4 that no cars arrive during the call?

3.130

A parking lot has two entrances. Cars arrive at entrance I according to a Poisson distribution at an average of three per hour and at entrance II according to a Poisson distribution at an average of four per hour. What is the probability that a total of three cars will arrive at the parking lot in a given hour? (Assume that the numbers of cars arriving at the two entrances are independent.)

3.131

The number of knots in a particular type of wood has a Poisson distribution with an average of 1.5 knots in 10 cubic feet of the wood. Find the probability that a 10-cubic-foot block of the wood has at most 1 knot.

3.132

The mean number of automobiles entering a mountain tunnel per two-minute period is one. An excessive number of cars entering the tunnel during a brief period of time produces a hazardous situation. Find the probability that the number of autos entering the tunnel during a two-minute period exceeds three. Does the Poisson model seem reasonable for this problem?

3.133

Assume that the tunnel in Exercise 3.132 is observed during ten two-minute intervals, thus giving ten independent observations Y1 , Y2 , . . . , Y10 , on the Poisson random variable. Find the probability that Y > 3 during at least one of the ten two-minute intervals.

3.134

Consider a binomial experiment for n = 20, p = .05. Use Table 1, Appendix 3, to calculate the binomial probabilities for Y = 0, 1, 2, 3, and 4. Calculate the same probabilities by using the Poisson approximation with λ = np. Compare.

3.135

A salesperson has found that the probability of a sale on a single contact is approximately .03. If the salesperson contacts 100 prospects, what is the approximate probability of making at least one sale?

3.136

Increased research and discussion have focused on the number of illnesses involving the organism Escherichia coli (O157:H7), which causes a breakdown of red blood cells and intestinal hemorrhages in its victims (http://www.hsus.org/ace/11831, March 24, 2004). Sporadic outbreaks of E. coli have occurred in Colorado at a rate of approximately 2.4 per 100,000 for a period of two years. a If this rate has not changed and if 100,000 cases from Colorado are reviewed for this year, what is the probability that at least 5 cases of E. coli will be observed? b If 100,000 cases from Colorado are reviewed for this year and the number of E. coli cases exceeded 5, would you suspect that the state’s mean E. coli rate has changed? Explain.

3.137

The probability that a mouse inoculated with a serum will contract a certain disease is .2. Using the Poisson approximation, ﬁnd the probability that at most 3 of 30 inoculated mice will contract the disease.

3.138

Let Y have a Poisson distribution with mean λ. Find E[Y (Y − 1)] and then use this to show that V (Y ) = λ.

3.139

In the daily production of a certain kind of rope, the number of defects per foot Y is assumed to have a Poisson distribution with mean λ = 2. The profit per foot when the rope is sold is given by X, where $X = 50 - 2Y - Y^2$. Find the expected profit per foot.

*3.140

A store owner has overstocked a certain item and decides to use the following promotion to decrease the supply. The item has a marked price of $100. For each customer purchasing the item during a particular day, the owner will reduce the price by a factor of one-half. Thus, the first customer will pay $50 for the item, the second will pay $25, and so on. Suppose that the number of customers who purchase the item during the day has a Poisson distribution with mean 2. Find the expected cost of the item at the end of the day. [Hint: The cost at the end of the day is \(100(1/2)^Y\), where Y is the number of customers who have purchased the item.]

3.141

A food manufacturer uses an extruder (a machine that produces bite-size cookies and snack food) that yields revenue for the firm at a rate of $200 per hour when in operation. However, the extruder breaks down an average of two times every day it operates. If Y denotes the number of breakdowns per day, the daily revenue generated by the machine is \(R = 1600 - 50Y^2\). Find the expected daily revenue for the extruder.

*3.142

Let p(y) denote the probability function associated with a Poisson random variable with mean λ. a Show that the ratio of successive probabilities satisfies

$$\frac{p(y)}{p(y-1)} = \frac{\lambda}{y}, \quad \text{for } y = 1, 2, \ldots.$$

b For which values of y is p(y) > p(y − 1)? c Notice that the result in part (a) implies that Poisson probabilities increase for a while as y increases and decrease thereafter. Show that p(y) is maximized when y = the greatest integer less than or equal to λ.

3.143

Refer to Exercise 3.142 (c). If the number of phone calls to the ﬁre department, Y , in a day has a Poisson distribution with mean 5.3, what is the most likely number of phone calls to the ﬁre department on any day?

3.144

Refer to Exercises 3.142 and 3.143. If the number of phone calls to the ﬁre department, Y , in a day has a Poisson distribution with mean 6, show that p(5) = p(6) so that 5 and 6 are the two most likely values for Y .

3.9 Moments and Moment-Generating Functions

The parameters µ and σ are meaningful numerical descriptive measures that locate the center and describe the spread associated with the values of a random variable Y. They do not, however, provide a unique characterization of the distribution of Y. Many different distributions possess the same means and standard deviations. We now consider a set of numerical descriptive measures that (at least under certain conditions) uniquely determine p(y).

DEFINITION 3.12

The kth moment of a random variable Y taken about the origin is defined to be $E(Y^k)$ and is denoted by $\mu'_k$.

Notice in particular that the first moment about the origin is $E(Y) = \mu'_1 = \mu$ and that $\mu'_2 = E(Y^2)$ is employed in Theorem 3.6 for finding $\sigma^2$. Another useful moment of a random variable is one taken about its mean.

DEFINITION 3.13

The kth moment of a random variable Y taken about its mean, or the kth central moment of Y, is defined to be $E[(Y - \mu)^k]$ and is denoted by $\mu_k$.

In particular, $\sigma^2 = \mu_2$. Let us concentrate on moments $\mu'_k$ about the origin where k = 1, 2, 3, . . . . Suppose that two random variables Y and Z possess finite moments with $\mu'_{1Y} = \mu'_{1Z}, \mu'_{2Y} = \mu'_{2Z}, \ldots, \mu'_{jY} = \mu'_{jZ}$, where j can assume any integer value. That is, the two random variables possess identical corresponding moments about the origin. Under some fairly general conditions, it can be shown that Y and Z have identical probability distributions. Thus, a major use of moments is to approximate the probability distribution of a random variable (usually an estimator or a decision maker). Consequently, the moments $\mu'_k$, where k = 1, 2, 3, . . . , are primarily of theoretical value for k > 3.

Yet another interesting expectation is the moment-generating function for a random variable, which, figuratively speaking, packages all the moments for a random variable into one simple expression. We will first define the moment-generating function and then explain how it works.

DEFINITION 3.14

The moment-generating function m(t) for a random variable Y is defined to be $m(t) = E(e^{tY})$. We say that a moment-generating function for Y exists if there exists a positive constant b such that m(t) is finite for |t| ≤ b.

Why is $E(e^{tY})$ called the moment-generating function for Y? From a series expansion for $e^{ty}$, we have

$$e^{ty} = 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \frac{(ty)^4}{4!} + \cdots.$$

Then, assuming that $\mu'_k$ is finite for k = 1, 2, 3, . . . , we have

$$
\begin{aligned}
E(e^{tY}) &= \sum_y e^{ty} p(y) = \sum_y \left[ 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \cdots \right] p(y) \\
&= \sum_y p(y) + t \sum_y y\,p(y) + \frac{t^2}{2!} \sum_y y^2 p(y) + \frac{t^3}{3!} \sum_y y^3 p(y) + \cdots \\
&= 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \cdots.
\end{aligned}
$$

This argument involves an interchange of summations, which is justifiable if m(t) exists. Thus, $E(e^{tY})$ is a function of all the moments $\mu'_k$ about the origin, for k = 1, 2, 3, . . . . In particular, $\mu'_k$ is the coefficient of $t^k/k!$ in the series expansion of m(t).

The moment-generating function possesses two important applications. First, if we can find $E(e^{tY})$, we can find any of the moments for Y.

THEOREM 3.12

If m(t) exists, then for any positive integer k,

$$\left. \frac{d^k m(t)}{dt^k} \right|_{t=0} = m^{(k)}(0) = \mu'_k.$$

In other words, if you find the kth derivative of m(t) with respect to t and then set t = 0, the result will be $\mu'_k$.

Proof

$d^k m(t)/dt^k$, or $m^{(k)}(t)$, is the kth derivative of m(t) with respect to t. Because

$$m(t) = E(e^{tY}) = 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \cdots,$$

it follows that

$$m^{(1)}(t) = \mu'_1 + \frac{2t}{2!}\mu'_2 + \frac{3t^2}{3!}\mu'_3 + \cdots,$$

$$m^{(2)}(t) = \mu'_2 + \frac{2t}{2!}\mu'_3 + \frac{3t^2}{3!}\mu'_4 + \cdots,$$

and, in general,

$$m^{(k)}(t) = \mu'_k + \frac{2t}{2!}\mu'_{k+1} + \frac{3t^2}{3!}\mu'_{k+2} + \cdots.$$

Setting t = 0 in each of the above derivatives, we obtain

$$m^{(1)}(0) = \mu'_1, \qquad m^{(2)}(0) = \mu'_2,$$

and, in general, $m^{(k)}(0) = \mu'_k$. These operations involve interchanging derivatives and infinite sums, which can be justified if m(t) exists.
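As a rough numerical illustration of Theorem 3.12 (not part of the original text), the following sketch approximates the first two derivatives of m(t) at t = 0 by central differences and compares them with the moments computed directly from the probability function. The three-point distribution, the function names, and the step size h are assumptions chosen only for illustration.

```python
import math

# Hypothetical three-point distribution, chosen only for illustration.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def mgf(t):
    """m(t) = E(e^{tY}), computed directly from the probability function."""
    return sum(math.exp(t * y) * p for y, p in pmf.items())

def raw_moment(k):
    """kth moment about the origin, E(Y^k)."""
    return sum(y ** k * p for y, p in pmf.items())

def first_derivative_at_zero(f, h=1e-4):
    """Central-difference approximation to f'(0)."""
    return (f(h) - f(-h)) / (2 * h)

def second_derivative_at_zero(f, h=1e-4):
    """Central-difference approximation to f''(0)."""
    return (f(h) - 2 * f(0.0) + f(-h)) / (h * h)

m1 = first_derivative_at_zero(mgf)
m2 = second_derivative_at_zero(mgf)
print(m1, raw_moment(1))  # the two values should nearly agree
print(m2, raw_moment(2))  # the two values should nearly agree
```

The finite-difference values agree with the direct moments only up to discretization and rounding error, so the comparison is a plausibility check and not a proof.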

EXAMPLE 3.23

Find the moment-generating function m(t) for a Poisson distributed random variable with mean λ.

Solution

$$m(t) = E(e^{tY}) = \sum_{y=0}^{\infty} e^{ty} p(y) = \sum_{y=0}^{\infty} e^{ty} \frac{\lambda^y e^{-\lambda}}{y!} = \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y e^{-\lambda}}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!}.$$

To complete the summation, consult Appendix A1.11 to find the Taylor series expansion

$$\sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!} = e^{\lambda e^t}$$

or employ the method of Theorem 3.11. Thus, multiply and divide by $e^{\lambda e^t}$. Then

$$m(t) = e^{-\lambda} e^{\lambda e^t} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y e^{-\lambda e^t}}{y!}.$$

The quantity to the right of the summation sign is the probability function for a Poisson random variable with mean $\lambda e^t$. Hence,

$$\sum_y p(y) = 1 \quad \text{and} \quad m(t) = e^{-\lambda} e^{\lambda e^t}(1) = e^{\lambda(e^t - 1)}.$$
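The closed form obtained in Example 3.23 can be sanity-checked by summing the defining series numerically. The sketch below is an illustration only: the value λ = 2, the truncation at 150 terms, and the function names are assumptions, not part of the text.

```python
import math

def poisson_pmf(y, lam):
    """p(y) = lam^y e^{-lam} / y! for y = 0, 1, 2, ..."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def mgf_by_summation(t, lam, terms=150):
    """Truncated version of the sum of e^{ty} p(y) over y = 0, 1, 2, ..."""
    return sum(math.exp(t * y) * poisson_pmf(y, lam) for y in range(terms))

def mgf_closed_form(t, lam):
    """Closed form e^{lam (e^t - 1)} obtained in Example 3.23."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam = 2.0  # illustrative mean
for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, mgf_by_summation(t, lam), mgf_closed_form(t, lam))
```

For moderate values of t and λ the truncated sum and the closed form should agree to many decimal places; for large t the series needs more terms before the tail becomes negligible.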

The calculations in Example 3.23 are no more difficult than those in Theorem 3.11, where only the expected value for a Poisson random variable Y was calculated. Direct evaluation of the variance of Y through the use of Theorem 3.6 required that $E(Y^2)$ be found by summing another series [actually, we obtained $E(Y^2)$ from E[Y(Y − 1)] in Exercise 3.138]. Example 3.24 illustrates the use of the moment-generating function of the Poisson random variable to calculate its mean and variance.

EXAMPLE 3.24

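Use the moment-generating function of Example 3.23 and Theorem 3.12 to find the mean, µ, and variance, $\sigma^2$, for the Poisson random variable.

Solution

According to Theorem 3.12, $\mu = \mu'_1 = m^{(1)}(0)$ and $\mu'_2 = m^{(2)}(0)$. Taking the first and second derivatives of m(t), we obtain

$$m^{(1)}(t) = \frac{d}{dt}\left[e^{\lambda(e^t - 1)}\right] = e^{\lambda(e^t - 1)} \cdot \lambda e^t,$$

$$m^{(2)}(t) = \frac{d^2}{dt^2}\left[e^{\lambda(e^t - 1)}\right] = \frac{d}{dt}\left[e^{\lambda(e^t - 1)} \cdot \lambda e^t\right] = e^{\lambda(e^t - 1)} \cdot (\lambda e^t)^2 + e^{\lambda(e^t - 1)} \cdot \lambda e^t.$$

Then, because

$$\mu = m^{(1)}(0) = \left. e^{\lambda(e^t - 1)} \cdot \lambda e^t \right|_{t=0} = \lambda,$$

$$\mu'_2 = m^{(2)}(0) = \left. \left[ e^{\lambda(e^t - 1)} \cdot (\lambda e^t)^2 + e^{\lambda(e^t - 1)} \cdot \lambda e^t \right] \right|_{t=0} = \lambda^2 + \lambda,$$

Theorem 3.6 tells us that $\sigma^2 = E(Y^2) - \mu^2 = \mu'_2 - \mu^2 = \lambda^2 + \lambda - (\lambda)^2 = \lambda$. Notice how easily we obtained $\mu'_2$ from m(t).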
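To see the contrast with direct summation, the sketch below evaluates the two derivative formulas from Example 3.24 at t = 0 and compares the resulting mean and variance with values obtained by summing the series for E(Y) and E(Y²) term by term. The value λ = 3.2, the truncation point, and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(y, lam):
    """p(y) = lam^y e^{-lam} / y! for y = 0, 1, 2, ..."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def m1(t, lam):
    """First derivative of the Poisson moment-generating function."""
    return math.exp(lam * (math.exp(t) - 1.0)) * lam * math.exp(t)

def m2(t, lam):
    """Second derivative of the Poisson moment-generating function."""
    base = math.exp(lam * (math.exp(t) - 1.0))
    return base * (lam * math.exp(t)) ** 2 + base * lam * math.exp(t)

lam = 3.2  # illustrative mean
mean_from_mgf = m1(0.0, lam)
second_moment_from_mgf = m2(0.0, lam)
variance_from_mgf = second_moment_from_mgf - mean_from_mgf ** 2

# Direct (truncated) summation of the series, for comparison.
ys = range(150)
mean_by_sum = sum(y * poisson_pmf(y, lam) for y in ys)
second_moment_by_sum = sum(y * y * poisson_pmf(y, lam) for y in ys)
variance_by_sum = second_moment_by_sum - mean_by_sum ** 2

print(mean_from_mgf, mean_by_sum)
print(variance_from_mgf, variance_by_sum)
```

Both routes should return values close to λ for the mean and for the variance, which is the conclusion of Example 3.24.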

The second (but primary) application of a moment-generating function is to prove that a random variable possesses a particular probability distribution p(y). If m(t) exists for a probability distribution p(y), it is unique. Also, if the moment-generating functions for two random variables Y and Z are equal (for all |t| < b for some b > 0), then Y and Z must have the same probability distribution. It follows that, if we can recognize the moment-generating function of a random variable Y to be one associated with a specific distribution, then Y must have that distribution.

In summary, a moment-generating function is a mathematical expression that sometimes (but not always) provides an easy way to find moments associated with random variables. More important, it can be used to establish the equivalence of two probability distributions.

EXAMPLE 3.25

Suppose that Y is a random variable with moment-generating function $m_Y(t) = e^{3.2(e^t - 1)}$. What is the distribution of Y?

Solution

In Example 3.23, we showed that the moment-generating function of a Poisson distributed random variable with mean λ is $m(t) = e^{\lambda(e^t - 1)}$. Note that the moment-generating function of Y is exactly equal to the moment-generating function of a Poisson distributed random variable with λ = 3.2. Because moment-generating functions are unique, Y must have a Poisson distribution with mean 3.2.

Exercises

3.145

If Y has a binomial distribution with n trials and probability of success p, show that the moment-generating function for Y is $m(t) = (pe^t + q)^n$, where q = 1 − p.

3.146

Differentiate the moment-generating function in Exercise 3.145 to find E(Y) and $E(Y^2)$. Then find V(Y).

3.147

If Y has a geometric distribution with probability of success p, show that the moment-generating function for Y is $m(t) = \frac{pe^t}{1 - qe^t}$, where q = 1 − p.

3.148

Differentiate the moment-generating function in Exercise 3.147 to find E(Y) and $E(Y^2)$. Then find V(Y).

3.149

Refer to Exercise 3.145. Use the uniqueness of moment-generating functions to give the distribution of a random variable with moment-generating function $m(t) = (.6e^t + .4)^3$.

3.150

Refer to Exercise 3.147. Use the uniqueness of moment-generating functions to give the distribution of a random variable with moment-generating function $m(t) = \frac{.3e^t}{1 - .7e^t}$.

3.151

Refer to Exercise 3.145. If Y has moment-generating function $m(t) = (.7e^t + .3)^{10}$, what is P(Y ≤ 5)?

3.152

Refer to Example 3.23. If Y has moment-generating function $m(t) = e^{6(e^t - 1)}$, what is P(|Y − µ| ≤ 2σ)?

3.153

Find the distributions of the random variables that have each of the following moment-generating functions: a $m(t) = [(1/3)e^t + (2/3)]^5$. b $m(t) = \frac{e^t}{2 - e^t}$. c $m(t) = e^{2(e^t - 1)}$.

3.154

Refer to Exercise 3.153. By inspection, give the mean and variance of the random variables associated with the moment-generating functions given in parts (a), (b), and (c).

3.155

Let $m(t) = (1/6)e^t + (2/6)e^{2t} + (3/6)e^{3t}$. Find the following: a E(Y) b V(Y) c The distribution of Y

3.156

Suppose that Y is a random variable with moment-generating function m(t). a What is m(0)? b If W = 3Y, show that the moment-generating function of W is m(3t). c If X = Y − 2, show that the moment-generating function of X is $e^{-2t} m(t)$.

3.157

Refer to Exercise 3.156. a If W = 3Y , use the moment-generating function of W to show that E(W ) = 3E(Y ) and V (W ) = 9V (Y ). b If X = Y − 2, use the moment-generating function of X to show that E(X ) = E(Y ) − 2 and V (X ) = V (Y ).

3.158

If Y is a random variable with moment-generating function m(t) and if W is given by W = aY + b, show that the moment-generating function of W is $e^{tb} m(at)$.

3.159

Use the result in Exercise 3.158 to prove that, if W = aY + b, then E(W) = aE(Y) + b and $V(W) = a^2 V(Y)$.

3.160

Suppose that Y is a binomial random variable based on n trials with success probability p and let $Y^* = n - Y$. a Use the result in Exercise 3.159 to show that $E(Y^*) = nq$ and $V(Y^*) = npq$, where q = 1 − p. b Use the result in Exercise 3.158 to show that the moment-generating function of $Y^*$ is $m^*(t) = (qe^t + p)^n$, where q = 1 − p. c Based on your answer to part (b), what is the distribution of $Y^*$? d If Y is interpreted as the number of successes in a sample of size n, what is the interpretation of $Y^*$? e Based on your answer in part (d), why are the answers to parts (a), (b), and (c) “obvious”?

3.161

Refer to Exercises 3.147 and 3.158. If Y has a geometric distribution with success probability p, consider $Y^* = Y - 1$. Show that the moment-generating function of $Y^*$ is $m^*(t) = \frac{p}{1 - qe^t}$, where q = 1 − p.

*3.162

Let r(t) = ln[m(t)] and $r^{(k)}(0)$ denote the kth derivative of r(t) evaluated for t = 0. Show that $r^{(1)}(0) = \mu'_1 = \mu$ and $r^{(2)}(0) = \mu'_2 - (\mu'_1)^2 = \sigma^2$. [Hint: m(0) = 1.]

*3.163

Use the results of Exercise 3.162 to find the mean and variance of a Poisson random variable with $m(t) = e^{5(e^t - 1)}$. Notice that r(t) is easier to differentiate than m(t) in this case.

3.10 Probability-Generating Functions (Optional)

An important class of discrete random variables is one in which Y represents a count and consequently takes integer values: Y = 0, 1, 2, 3, . . . . The binomial, geometric, hypergeometric, and Poisson random variables all fall in this class.

The following examples give practical situations that result in integer-valued random variables. One, involving the theory of queues (waiting lines), is concerned with the number of persons (or objects) awaiting service at a particular point in time. Knowledge of the behavior of this random variable is important in designing manufacturing plants where production consists of a sequence of operations, each taking a different length of time to complete. An insufficient number of service stations for a particular production operation can result in a bottleneck, the formation of a queue of products waiting to be serviced, and a resulting slowdown in the manufacturing operation. Queuing theory is also important in determining the number of checkout counters needed for a supermarket and in designing hospitals and clinics.

Integer-valued random variables are also important in studies of population growth. For example, epidemiologists are interested in the growth of bacterial populations and the growth of the number of persons afflicted by a particular disease. The numbers of elements in each of these populations are integer-valued random variables.

A mathematical device useful in finding the probability distributions and other properties of integer-valued random variables is the probability-generating function.

DEFINITION 3.15

Let Y be an integer-valued random variable for which $P(Y = i) = p_i$, where i = 0, 1, 2, . . . . The probability-generating function P(t) for Y is defined to be

$$P(t) = E(t^Y) = p_0 + p_1 t + p_2 t^2 + \cdots = \sum_{i=0}^{\infty} p_i t^i$$

for all values of t such that P(t) is finite.

The reason for calling P(t) a probability-generating function is clear when we compare P(t) with the moment-generating function m(t). In particular, the coefficient of $t^i$ in P(t) is the probability $p_i$. Correspondingly, the coefficient of $t^i$ for m(t) is a constant times the ith moment $\mu'_i$. If we know P(t) and can expand it into a series, we can determine p(y) as the coefficient of $t^y$. Repeated differentiation of P(t) yields factorial moments for the random variable Y.

DEFINITION 3.16

The kth factorial moment for a random variable Y is defined to be

$$\mu_{[k]} = E[Y(Y - 1)(Y - 2) \cdots (Y - k + 1)],$$

where k is a positive integer.

Notice that $\mu_{[1]} = E(Y) = \mu$. The second factorial moment, $\mu_{[2]} = E[Y(Y - 1)]$, was useful in finding the variance for binomial, geometric, and Poisson random variables in Theorem 3.7, Exercise 3.85, and Exercise 3.138, respectively.

THEOREM 3.13

If P(t) is the probability-generating function for an integer-valued random variable, Y, then the kth factorial moment of Y is given by

$$\left. \frac{d^k P(t)}{dt^k} \right|_{t=1} = P^{(k)}(1) = \mu_{[k]}.$$

Proof

Because

$$P(t) = p_0 + p_1 t + p_2 t^2 + p_3 t^3 + p_4 t^4 + \cdots,$$

it follows that

$$P^{(1)}(t) = \frac{dP(t)}{dt} = p_1 + 2p_2 t + 3p_3 t^2 + 4p_4 t^3 + \cdots,$$

$$P^{(2)}(t) = \frac{d^2 P(t)}{dt^2} = (2)(1)p_2 + (3)(2)p_3 t + (4)(3)p_4 t^2 + \cdots,$$

and, in general,

$$P^{(k)}(t) = \frac{d^k P(t)}{dt^k} = \sum_{y=k}^{\infty} y(y - 1)(y - 2) \cdots (y - k + 1)\, p(y)\, t^{y-k}.$$

Setting t = 1 in each of these derivatives, we obtain

$$P^{(1)}(1) = p_1 + 2p_2 + 3p_3 + 4p_4 + \cdots = \mu_{[1]} = E(Y),$$

$$P^{(2)}(1) = (2)(1)p_2 + (3)(2)p_3 + (4)(3)p_4 + \cdots = \mu_{[2]} = E[Y(Y - 1)],$$

and, in general,

$$P^{(k)}(1) = \sum_{y=k}^{\infty} y(y - 1)(y - 2) \cdots (y - k + 1)\, p(y) = E[Y(Y - 1)(Y - 2) \cdots (Y - k + 1)] = \mu_{[k]}.$$
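When Y takes only finitely many values, P(t) is a polynomial and the steps of the proof can be carried out exactly on its list of coefficients. The sketch below does this for a binomial random variable using exact rational arithmetic; the parameter values n = 5 and p = 1/3 and all function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pgf_coefficients(n, p):
    """Coefficients [p_0, p_1, ..., p_n] of P(t) = sum of p_i t^i."""
    q = 1 - p
    return [comb(n, i) * p ** i * q ** (n - i) for i in range(n + 1)]

def differentiate(coeffs):
    """Coefficients of the derivative of the polynomial sum of c_i t^i."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def factorial_moment_from_pgf(coeffs, k):
    """P^(k)(1): differentiate k times, then evaluate at t = 1."""
    for _ in range(k):
        coeffs = differentiate(coeffs)
    # Evaluating a polynomial at t = 1 amounts to summing its coefficients.
    return sum(coeffs)

def factorial_moment_direct(coeffs, k):
    """E[Y(Y-1)...(Y-k+1)] computed from the probabilities themselves."""
    total = Fraction(0)
    for y, p_y in enumerate(coeffs):
        falling = 1
        for j in range(k):
            falling *= (y - j)
        total += falling * p_y
    return total

n, p = 5, Fraction(1, 3)  # illustrative binomial parameters
coeffs = binomial_pgf_coefficients(n, p)
for k in (1, 2, 3):
    print(k, factorial_moment_from_pgf(coeffs, k), factorial_moment_direct(coeffs, k))
```

Because the arithmetic is exact, the two columns printed for each k should be identical fractions, and the k = 1 value is the mean np.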

EXAMPLE 3.26

Find the probability-generating function for a geometric random variable.

Solution

Notice that $p_0 = 0$ because Y cannot assume this value. Then

$$P(t) = E(t^Y) = \sum_{y=1}^{\infty} t^y q^{y-1} p = \sum_{y=1}^{\infty} \frac{p}{q}(qt)^y = \frac{p}{q}\left[qt + (qt)^2 + (qt)^3 + \cdots\right].$$

The terms in the series are those of an infinite geometric progression. If qt < 1, then

$$P(t) = \frac{p}{q}\left(\frac{qt}{1 - qt}\right) = \frac{pt}{1 - qt}, \quad \text{if } t < 1/q.$$

(For summation of the series, consult Appendix A1.11.)

EXAMPLE 3.27

Use P(t), Example 3.26, to find the mean of a geometric random variable.

Solution

From Theorem 3.13, $\mu_{[1]} = \mu = P^{(1)}(1)$. Using the result in Example 3.26,

$$P^{(1)}(t) = \frac{d}{dt}\left(\frac{pt}{1 - qt}\right) = \frac{(1 - qt)p - (pt)(-q)}{(1 - qt)^2}.$$

Setting t = 1, we obtain

$$P^{(1)}(1) = \frac{p^2 + pq}{p^2} = \frac{p(p + q)}{p^2} = \frac{1}{p}.$$
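The results of Examples 3.26 and 3.27 can be checked numerically. The sketch below compares a truncated version of the defining series with the closed form pt/(1 − qt) and evaluates the derivative formula at t = 1. The success probability p = 0.25, the truncation at 2000 terms, and the function names are illustrative assumptions.

```python
def geometric_pgf_series(t, p, terms=2000):
    """Truncated sum of t^y q^(y-1) p over y = 1, 2, ..."""
    q = 1.0 - p
    return sum(t ** y * q ** (y - 1) * p for y in range(1, terms + 1))

def geometric_pgf_closed_form(t, p):
    """P(t) = pt / (1 - qt), valid when qt < 1 (Example 3.26)."""
    q = 1.0 - p
    return p * t / (1.0 - q * t)

def pgf_derivative(t, p):
    """P'(t) = [(1 - qt)p + pqt] / (1 - qt)^2 (Example 3.27)."""
    q = 1.0 - p
    return ((1.0 - q * t) * p + p * q * t) / (1.0 - q * t) ** 2

p = 0.25  # illustrative success probability
for t in (0.2, 0.8, 1.0, 1.2):  # every value satisfies t < 1/q = 4/3
    print(t, geometric_pgf_series(t, p), geometric_pgf_closed_form(t, p))
print(pgf_derivative(1.0, p), 1.0 / p)
```

As t approaches 1/q the series converges more slowly, so a fixed truncation point that works here would not be adequate for values of t much closer to that limit.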

Because we already have the moment-generating function to assist in finding the moments of a random variable, of what value is P(t)? The answer is that it may be difficult to find m(t) but much easier to find P(t). Thus, P(t) provides an additional tool for finding the moments of a random variable. It may or may not be useful in a given situation.

Finding the moments of a random variable is not the major use of the probability-generating function. Its primary application is in deriving the probability function (and hence the probability distribution) for other related integer-valued random variables. For these applications, see Feller (1968) and Parzen (1992).

Exercises

*3.164

Let Y denote a binomial random variable with n trials and probability of success p. Find the probability-generating function for Y and use it to find E(Y).

*3.165

Let Y denote a Poisson random variable with mean λ. Find the probability-generating function for Y and use it to find E(Y) and V(Y).

*3.166

Refer to Exercise 3.165. Use the probability-generating function found there to find $E(Y^3)$.

3.11 Tchebysheff’s Theorem

We have seen in Section 1.3 and Example 3.22 that if the probability or population histogram is roughly bell-shaped and the mean and variance are known, the empirical rule is of great help in approximating the probabilities of certain intervals. However, in many instances, the shapes of probability histograms differ markedly from a mound shape, and the empirical rule may not yield useful approximations to the probabilities of interest. The following result, known as Tchebysheff’s theorem, can be used to determine a lower bound for the probability that the random variable Y of interest falls in an interval µ ± kσ.

THEOREM 3.14

Tchebysheff’s Theorem Let Y be a random variable with mean µ and finite variance $\sigma^2$. Then, for any constant k > 0,

$$P(|Y - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{or} \quad P(|Y - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$

Two important aspects of this result should be pointed out. First, the result applies for any probability distribution, whether the probability histogram is bell-shaped or not. Second, the results of the theorem are very conservative in the sense that the actual probability that Y is in the interval µ ± kσ usually exceeds the lower bound for the probability, $1 - 1/k^2$, by a considerable amount. However, as discussed in Exercise 3.169, for any k > 1, it is possible to construct a probability distribution so that, for that k, the bound provided by Tchebysheff’s theorem is actually attained. (You should verify that the results of the empirical rule do not contradict those given by Theorem 3.14.) The proof of this theorem will be deferred to Section 4.10. The usefulness of this theorem is illustrated in the following example.

EXAMPLE 3.28

The number of customers per day at a sales counter, Y , has been observed for a long period of time and found to have mean 20 and standard deviation 2. The probability distribution of Y is not known. What can be said about the probability that, tomorrow, Y will be greater than 16 but less than 24?

Solution

We want to find P(16 < Y < 24). From Theorem 3.14 we know that, for any k > 0, $P(|Y - \mu| < k\sigma) \ge 1 - 1/k^2$, or

$$P[(\mu - k\sigma) < Y < (\mu + k\sigma)] \ge 1 - \frac{1}{k^2}.$$

Because µ = 20 and σ = 2, it follows that µ − kσ = 16 and µ + kσ = 24 if k = 2. Thus,

$$P(16 < Y < 24) = P(\mu - 2\sigma < Y < \mu + 2\sigma) \ge 1 - \frac{1}{(2)^2} = \frac{3}{4}.$$

In other words, tomorrow’s customer total will be between 16 and 24 with a fairly high probability (at least 3/4). Notice that if σ were 1, k would be 4, and

$$P(16 < Y < 24) = P(\mu - 4\sigma < Y < \mu + 4\sigma) \ge 1 - \frac{1}{(4)^2} = \frac{15}{16}.$$

Thus, the value of σ has considerable effect on probabilities associated with intervals.
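The arithmetic in Example 3.28 can be reproduced with a few lines of code. The sketch below also illustrates how conservative the bound can be by comparing it with an exact probability under one hypothetical model: the example states that the distribution of Y is unknown, so the binomial model with n = 25 and p = 0.8 (which happens to have mean 20 and standard deviation 2) is purely an assumption for comparison, as are the function and variable names.

```python
from math import comb, sqrt

def tchebysheff_lower_bound(k):
    """Lower bound 1 - 1/k^2 for P(|Y - mu| < k*sigma), for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - 1.0 / k ** 2

# Example 3.28: mu = 20, sigma = 2, and the interval (16, 24) gives k = 2.
mu, sigma = 20.0, 2.0
k = (24.0 - mu) / sigma
bound_sigma_2 = tchebysheff_lower_bound(k)
# If sigma were 1 instead, the same interval would correspond to k = 4.
bound_sigma_1 = tchebysheff_lower_bound((24.0 - mu) / 1.0)
print(bound_sigma_2, bound_sigma_1)

# Hypothetical comparison only: a binomial model with mean 20 and sd 2.
n, p = 25, 0.8
model_sd = sqrt(n * p * (1 - p))
exact = sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(17, 24))
print(model_sd, exact)
```

Under this assumed model the exact probability of 16 < Y < 24 exceeds the guaranteed lower bound, which is consistent with the remark that the bound is usually conservative.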

Exercises

3.167

Let Y be a random variable with mean 11 and variance 9. Using Tchebysheff’s theorem, find a a lower bound for P(6 < Y < 16). b the value of C such that P(|Y − 11| ≥ C) ≤ .09.

3.168

Would you rather take a multiple-choice test or a full-recall test? If you have absolutely no knowledge of the test material, you will score zero on a full-recall test. However, if you are given 5 choices for each multiple-choice question, you have at least one chance in five of guessing each correct answer! Suppose that a multiple-choice exam contains 100 questions, each with 5 possible answers, and you guess the answer to each of the questions. a What is the expected value of the number Y of questions that will be correctly answered? b Find the standard deviation of Y. c Calculate the intervals µ ± 2σ and µ ± 3σ. d If the results of the exam are curved so that 50 correct answers is a passing score, are you likely to receive a passing score? Explain.

3.169

This exercise demonstrates that, in general, the results provided by Tchebysheff’s theorem cannot be improved upon. Let Y be a random variable such that

$$p(-1) = \frac{1}{18}, \qquad p(0) = \frac{16}{18}, \qquad p(1) = \frac{1}{18}.$$

a Show that E(Y) = 0 and V(Y) = 1/9. b Use the probability distribution of Y to calculate P(|Y − µ| ≥ 3σ). Compare this exact probability with the upper bound provided by Tchebysheff’s theorem to see that the bound provided by Tchebysheff’s theorem is actually attained when k = 3. *c In part (b) we guaranteed E(Y) = 0 by placing all probability mass on the values −1, 0, and 1, with p(−1) = p(1). The variance was controlled by the probabilities assigned to p(−1) and p(1). Using this same basic idea, construct a probability distribution for a random variable X that will yield $P(|X - \mu_X| \ge 2\sigma_X) = 1/4$. *d If any k > 1 is specified, how can a random variable W be constructed so that $P(|W - \mu_W| \ge k\sigma_W) = 1/k^2$?

3.170

The U.S. mint produces dimes with an average diameter of .5 inch and standard deviation .01. Using Tchebysheff’s theorem, ﬁnd a lower bound for the number of coins in a lot of 400 coins that are expected to have a diameter between .48 and .52.

3.171

For a certain type of soil the number of wireworms per cubic foot has a mean of 100. Assuming a Poisson distribution of wireworms, give an interval that will include at least 5/9 of the sample values of wireworm counts obtained from a large number of 1-cubic-foot samples.

3.172

Refer to Exercise 3.115. Using the probability histogram, ﬁnd the fraction of values in the population that fall within 2 standard deviations of the mean. Compare your result with that of Tchebysheff’s theorem.

3.173

A balanced coin is tossed three times. Let Y equal the number of heads observed. a Use the formula for the binomial probability distribution to calculate the probabilities associated with Y = 0, 1, 2, and 3. b Construct a probability distribution similar to the one in Table 3.1. c Find the expected value and standard deviation of Y, using the formulas E(Y) = np and V(Y) = npq. d Using the probability distribution from part (b), find the fraction of the population measurements lying within 1 standard deviation of the mean. Repeat for 2 standard deviations. How do your results compare with the results of Tchebysheff’s theorem and the empirical rule?

3.174

Suppose that a coin was deﬁnitely unbalanced and that the probability of a head was equal to p = .1. Follow instructions (a), (b), (c), and (d) as stated in Exercise 3.173. Notice that the probability distribution loses its symmetry and becomes skewed when p is not equal to 1/2.

3.175

In May 2005, Tony Blair was elected to an historic third term as the British prime minister. A Gallup U.K. poll (http://gallup.com/poll/content/default.aspx?ci=1710, June 28, 2005) conducted after Blair’s election indicated that only 32% of British adults would like to see their son or daughter grow up to become prime minister. If the same proportion of Americans would prefer that their son or daughter grow up to be president and 120 American adults are interviewed, a what is the expected number of Americans who would prefer their child grow up to be president? b what is the standard deviation of the number Y who would prefer that their child grow up to be president? c is it likely that the number of Americans who prefer that their child grow up to be president exceeds 40?

3.176

A national poll of 549 teenagers (aged 13 to 17) by the Gallup poll (http://gallup.com/content/default.aspx?ci=17110, April, 2005) indicated that 85% “think that clothes that display gang symbols” should be banned at school. If teenagers were really evenly split in their opinions regarding banning of clothes that display gang symbols, comment on the probability of observing this survey result (that is, observing 85% or more in a sample of 549 who are in favor of banning clothes that display gang symbols). What assumption must be made about the sampling procedure in order to calculate this probability? [Hint: Recall Tchebysheff’s theorem and the empirical rule.]

3.177

For a certain section of a pine forest, the number of diseased trees per acre, Y , has a Poisson distribution with mean λ = 10. The diseased trees are sprayed with an insecticide at a cost of $3 per tree, plus a ﬁxed overhead cost for equipment rental of $50. Letting C denote the total spraying cost for a randomly selected acre, ﬁnd the expected value and standard deviation for C. Within what interval would you expect C to lie with probability at least .75?

3.178

It is known that 10% of a brand of television tubes will burn out before their guarantee has expired. If 1000 tubes are sold, ﬁnd the expected value and variance of Y , the number of original tubes that must be replaced. Within what limits would Y be expected to fall?

3.179

Refer to Exercise 3.91. In this exercise, we determined that the mean and variance of the costs necessary to ﬁnd three employees with positive indications of asbestos poisoning were 150 and 4500, respectively. Do you think it is highly unlikely that the cost of completing the tests will exceed $350?

3.12 Summary

This chapter has explored discrete random variables, their probability distributions, and their expected values. Calculating the probability distribution for a discrete random variable requires the use of the probabilistic methods of Chapter 2 to evaluate the probabilities of numerical events. Probability functions, p(y) = P(Y = y), were derived for binomial, geometric, negative binomial, hypergeometric, and Poisson random variables. These probability functions are sometimes called probability mass functions because they give the probability (mass) assigned to each of the finite or countably infinite possible values for these discrete random variables.

The expected values of random variables and functions of random variables provided a method for finding the mean and variance of Y and consequently measures of centrality and variation for p(y). Much of the remaining material in the chapter was devoted to the techniques for acquiring expectations, which sometimes involved summing apparently intractable series. The techniques for obtaining closed-form expressions for some of the resulting expected values included (1) use of the fact that \(\sum_{y} p(y) = 1\) for any discrete random variable and (2) \(E(Y^2) = E[Y(Y-1)] + E(Y)\).

The means and variances of several of the more common discrete distributions are summarized in Table 3.4. These results and more are also found in Table A2.1 in Appendix 2 and inside the back cover of this book. Table 3.5 gives the R (and S-Plus) procedures that yield p(y0) = P(Y = y0) and P(Y ≤ y0) for random variables with binomial, geometric, negative binomial, hypergeometric, and Poisson distributions.

We then discussed the moment-generating function associated with a random variable. Although sometimes useful in finding µ and σ, the moment-generating function is of primary value to the theoretical statistician for deriving the probability distribution of a random variable. The moment-generating functions for most of the common random variables are found in Appendix 2 and inside the back cover of this book.
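As a brief illustration of the two summation techniques named in this summary, consider a Poisson random variable with mean λ. This worked example is an added sketch rather than part of the original summary. The factor y(y − 1) cancels against y!, so the factorial moment reduces to a sum of Poisson probabilities:

```latex
\begin{align*}
E[Y(Y-1)] &= \sum_{y=0}^{\infty} y(y-1)\,\frac{\lambda^{y} e^{-\lambda}}{y!}
           = \lambda^{2} \sum_{y=2}^{\infty} \frac{\lambda^{y-2} e^{-\lambda}}{(y-2)!}
           = \lambda^{2}, \\
E(Y^{2})  &= E[Y(Y-1)] + E(Y) = \lambda^{2} + \lambda, \\
V(Y)      &= E(Y^{2}) - [E(Y)]^{2} = \lambda^{2} + \lambda - \lambda^{2} = \lambda .
\end{align*}
```

The last sum in the first line equals 1 because it adds a Poisson probability function over all of its possible values. The result agrees with the Poisson entry in Table 3.4.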


Table 3.4 Means and variances for some common discrete random variables

Distribution         E(Y)       V(Y)
Binomial             np         np(1 − p) = npq
Geometric            1/p        (1 − p)/p² = q/p²
Hypergeometric       n(r/N)     n(r/N)((N − r)/N)((N − n)/(N − 1))
Poisson              λ          λ
Negative binomial    r/p        r(1 − p)/p² = rq/p²

Table 3.5 R (and S-Plus) procedures giving probabilities for some common discrete distributions

Distribution         P(Y = y0) = p(y0)        P(Y ≤ y0)
Binomial             dbinom(y0, n, p)         pbinom(y0, n, p)
Geometric            dgeom(y0-1, p)           pgeom(y0-1, p)
Hypergeometric       dhyper(y0, r, N-r, n)    phyper(y0, r, N-r, n)
Poisson              dpois(y0, λ)             ppois(y0, λ)
Negative binomial    dnbinom(y0-r, r, p)      pnbinom(y0-r, r, p)
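For readers without R at hand, the quantities that Table 3.5 returns can be computed directly from the probability functions of this chapter. The following is a minimal Python sketch; the function names binom_pmf, binom_cdf, and geom_pmf are hypothetical helpers chosen for illustration, not R or library functions. The geometric helper is indexed by the trial number of the first success, as in this chapter, whereas R's dgeom counts the failures before the first success, which is why the table passes y0-1.

```python
from math import comb


def binom_pmf(y0, n, p):
    """p(y0) = C(n, y0) p^y0 (1 - p)^(n - y0), the binomial probability function."""
    return comb(n, y0) * p**y0 * (1 - p) ** (n - y0)


def binom_cdf(y0, n, p):
    """P(Y <= y0), obtained by summing the probability function from 0 to y0."""
    return sum(binom_pmf(y, n, p) for y in range(y0 + 1))


def geom_pmf(y0, p):
    """p(y0) = (1 - p)^(y0 - 1) p, where y0 = 1, 2, ... is the trial of the first success."""
    return (1 - p) ** (y0 - 1) * p


if __name__ == "__main__":
    print(binom_pmf(1, 2, 0.5))   # probability of exactly one success in two trials
    print(binom_cdf(1, 2, 0.5))   # probability of at most one success in two trials
    print(geom_pmf(3, 0.2))       # first success on the third trial
```

Summing the probability function is adequate for the small n used in the exercises; a production routine would typically use a numerically stabler method.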

The probability-generating function is a useful device for deriving moments and probability distributions of integer-valued random variables. Finally, we gave Tchebysheff’s theorem, a very useful result that permits approximating certain probabilities when only the mean and variance are known.

To conclude this summary, we recall the primary objective of statistics: to make an inference about a population based on information contained in a sample. Drawing the sample from the population is the experiment. The sample is often a set of measurements of one or more random variables, and it is the observed event resulting from a single repetition of the experiment. Finally, making the inference about the population requires knowledge of the probability of occurrence of the observed sample, which in turn requires knowledge of the probability distributions of the random variables that generated the sample.
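The use of Tchebysheff’s theorem mentioned in this summary can be sketched numerically. The helper below is a hypothetical illustration, not a library routine. It applies the bound P(|Y − µ| ≥ kσ) ≤ 1/k² to the cost figures quoted earlier (mean 150, variance 4500, threshold $350), using the fact that the event Y > 350 is contained in the event |Y − 150| ≥ 200.

```python
from math import sqrt


def tchebysheff_upper_bound(mean, variance, threshold):
    """Upper bound on P(|Y - mean| >= |threshold - mean|) from Tchebysheff's theorem.

    The theorem states P(|Y - mu| >= k * sigma) <= 1 / k**2 for any k > 0.
    """
    sigma = sqrt(variance)
    k = abs(threshold - mean) / sigma
    if k == 0:
        raise ValueError("threshold must differ from the mean")
    # A probability bound above 1 carries no information, so cap it at 1.
    return min(1.0, 1.0 / k**2)


if __name__ == "__main__":
    bound = tchebysheff_upper_bound(150, 4500, 350)
    print(f"P(Y > 350) is at most about {bound:.4f}")
```

The bound uses only the mean and variance, so it is conservative; the exact probability under a specific distribution may be much smaller.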


Supplementary Exercises

3.180

Four possibly winning numbers for a lottery (AB-4536, NH-7812, SQ-7855, and ZY-3221) arrive in the mail. You will win a prize if one of your numbers matches one of the winning numbers contained on a list held by those conducting the lottery. One first prize of $100,000, two second prizes of $50,000 each, and ten third prizes of $1000 each will be awarded. To be eligible to win, you need to mail the coupon back to the company at a cost of 33¢ for postage. No purchase is required. From the structure of the numbers that you received, it is obvious the numbers sent out consist of two letters followed by four digits. Assuming that the numbers you received were generated at random, what are your expected winnings from the lottery? Is it worth 33¢ to enter this lottery?

3.181

Sampling for defectives from large lots of manufactured product yields a number of defectives, Y , that follows a binomial probability distribution. A sampling plan consists of specifying the number of items n to be included in a sample and an acceptance number a. The lot is accepted if Y ≤ a and rejected if Y > a. Let p denote the proportion of defectives in the lot. For n = 5 and a = 0, calculate the probability of lot acceptance if (a) p = 0, (b) p = .1, (c) p = .3, (d) p = .5, (e) p = 1.0. A graph showing the probability of lot acceptance as a function of lot fraction defective is called the operating characteristic curve for the sample plan. Construct the operating characteristic curve for the plan n = 5, a = 0. Notice that a sampling plan is an example of statistical inference. Accepting or rejecting a lot based on information contained in the sample is equivalent to concluding that the lot is either good or bad. “Good” implies that a low fraction is defective and that the lot is therefore suitable for shipment.
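The acceptance rule described in this exercise can be sketched in code. The following Python fragment is an illustrative sketch; acceptance_probability and oc_curve are hypothetical names. It computes P(Y ≤ a) for a binomial Y and tabulates it over a grid of lot fractions defective, which gives the points of an operating characteristic curve.

```python
from math import comb


def acceptance_probability(n, a, p):
    """P(Y <= a) when Y is binomial with n trials and defective fraction p."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(a + 1))


def oc_curve(n, a, fractions):
    """Pairs (p, P(lot acceptance)) for each lot fraction defective p."""
    return [(p, acceptance_probability(n, a, p)) for p in fractions]


if __name__ == "__main__":
    # Plan n = 5, a = 0: the lot is accepted only if the sample has no defectives.
    for p, prob in oc_curve(5, 0, [0.0, 0.1, 0.3, 0.5, 1.0]):
        print(f"p = {p:.1f}   P(accept) = {prob:.4f}")
```

Plotting the second coordinate against the first traces the curve; for a = 0 the acceptance probability reduces to (1 − p)ⁿ.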

3.182

Refer to Exercise 3.181. Use Table 1, Appendix 3, to construct the operating characteristic curves for the following sampling plans:
(a) n = 10, a = 0.
(b) n = 10, a = 1.
(c) n = 10, a = 2.
For each sampling plan, calculate P(lot acceptance) for p = 0, .05, .1, .3, .5, and 1.0. Our intuition suggests that sampling plan (a) would be much less likely to accept bad lots than plans (b) and (c). A visual comparison of the operating characteristic curves will confirm this intuitive conjecture.


3.183

A quality control engineer wishes to study alternative sampling plans: n = 5, a = 1 and n = 25, a = 5. On a sheet of graph paper, construct the operating characteristic curves for both plans, making use of acceptance probabilities at p = .05, p = .10, p = .20, p = .30, and p = .40 in each case.
a If you were a seller producing lots with fraction defective ranging from p = 0 to p = .10, which of the two sampling plans would you prefer?
b If you were a buyer wishing to be protected against accepting lots with fraction defective exceeding p = .30, which of the two sampling plans would you prefer?

3.184

A city commissioner claims that 80% of the people living in the city favor garbage collection by contract to a private company over collection by city employees. To test the commissioner’s claim, 25 city residents are randomly selected, yielding 22 who prefer contracting to a private company.
a If the commissioner’s claim is correct, what is the probability that the sample would contain at least 22 who prefer contracting to a private company?
b If the commissioner’s claim is correct, what is the probability that exactly 22 would prefer contracting to a private company?
c Based on observing 22 in a sample of size 25 who prefer contracting to a private company, what do you conclude about the commissioner’s claim that 80% of city residents prefer contracting to a private company?

3.185

Twenty students are asked to select an integer between 1 and 10. Eight choose either 4, 5, or 6.
a If the students make their choices independently and each is as likely to pick one integer as any other, what is the probability that 8 or more will select 4, 5, or 6?
b Having observed eight students who selected 4, 5, or 6, what conclusion do you draw based on your answer to part (a)?

3.186

Refer to Exercises 3.67 and 3.68. Let Y denote the number of the trial on which the ﬁrst applicant with computer training was found. If each interview costs $30, ﬁnd the expected value and variance of the total cost incurred interviewing candidates until an applicant with advanced computer training is found. Within what limits would you expect the interview costs to fall?

3.187

Consider the following game: A player throws a fair die repeatedly until he rolls a 2, 3, 4, 5, or 6. In other words, the player continues to throw the die as long as he rolls 1s. When he rolls a “non-1,” he stops.
a What is the probability that the player tosses the die exactly three times?
b What is the expected number of rolls needed to obtain the first non-1?
c If he rolls a non-1 on the first throw, the player is paid $1. Otherwise, the payoff is doubled for each 1 that the player rolls before rolling a non-1. Thus, the player is paid $2 if he rolls a 1 followed by a non-1; $4 if he rolls two 1s followed by a non-1; $8 if he rolls three 1s followed by a non-1; etc. In general, if we let Y be the number of throws needed to obtain the first non-1, then the player rolls (Y − 1) 1s before rolling his first non-1, and he is paid \(2^{Y-1}\) dollars. What is the expected amount paid to the player?

3.188

If Y is a binomial random variable based on n trials and success probability p, show that
\[ P(Y > 1 \mid Y \ge 1) = \frac{1 - (1-p)^{n} - np(1-p)^{n-1}}{1 - (1-p)^{n}}. \]


3.189

A starter motor used in a space vehicle has a high rate of reliability and was reputed to start on any given occasion with probability .99999. What is the probability of at least one failure in the next 10,000 starts?

3.190

Refer to Exercise 3.115. Find µ, the expected value of Y, for the theoretical population by using the probability distribution obtained in Exercise 3.115. Find the sample mean \(\bar{y}\) for the n = 100 measurements generated in Exercise 3.116. Does \(\bar{y}\) provide a good estimate of µ?

3.191

Find the population variance σ² for Exercise 3.115 and the sample variance s² for Exercise 3.116. Compare.

3.192

Toss a balanced die and let Y be the number of dots observed on the upper face. Find the mean and variance of Y . Construct a probability histogram, and locate the interval µ ± 2σ . Verify that Tchebysheff’s theorem holds.

3.193

Two assembly lines I and II have the same rate of defectives in their production of voltage regulators. Five regulators are sampled from each line and tested. Among the total of ten tested regulators, four are defective. Find the probability that exactly two of the defective regulators came from line I.

3.194

One concern of a gambler is that she will go broke before achieving her ﬁrst win. Suppose that she plays a game in which the probability of winning is .1 (and is unknown to her). It costs her $10 to play and she receives $80 for a win. If she commences with $30, what is the probability that she wins exactly once before she loses her initial capital?

3.195

The number of imperfections in the weave of a certain textile has a Poisson distribution with a mean of 4 per square yard. Find the probability that a
a 1-square-yard sample will contain at least one imperfection.
b 3-square-yard sample will contain at least one imperfection.

3.196

Refer to Exercise 3.195. The cost of repairing the imperfections in the weave is $10 per imperfection. Find the mean and standard deviation of the repair cost for an 8-square-yard bolt of the textile.

3.197

The number of bacteria colonies of a certain type in samples of polluted water has a Poisson distribution with a mean of 2 per cubic centimeter (cm³).
a If four 1-cm³ samples are independently selected from this water, find the probability that at least one sample will contain one or more bacteria colonies.
b How many 1-cm³ samples should be selected in order to have a probability of approximately .95 of seeing at least one bacteria colony?

3.198

One model for plant competition assumes that there is a zone of resource depletion around each plant seedling. Depending on the size of the zones and the density of the plants, the zones of resource depletion may overlap with those of other seedlings in the vicinity. When the seeds are randomly dispersed over a wide area, the number of neighbors that any seedling has within an area of size A usually follows a Poisson distribution with mean equal to A × d, where d is the density of seedlings per unit area. Suppose that the density of seedlings is four per square meter. What is the probability that a specified seedling has
a no neighbors within 1 meter?
b at most three neighbors within 2 meters?

3.199

Insulin-dependent diabetes (IDD) is a common chronic disorder in children. The disease occurs most frequently in children of northern European descent, but the incidence ranges from a low of 1–2 cases per 100,000 per year to a high of more than 40 cases per 100,000 in parts of Finland.⁴ Let us assume that a region in Europe has an incidence of 30 cases per 100,000 per year and that we randomly select 1000 children from this region.
a Can the distribution of the number of cases of IDD among those in the sample be approximated by a Poisson distribution? If so, what is the mean of the approximating Poisson distribution?
b What is the probability that we will observe at least two cases of IDD among the 1000 children in the sample?

3.200

Using the fact that
\[ e^{z} = 1 + z + \frac{z^{2}}{2!} + \frac{z^{3}}{3!} + \frac{z^{4}}{4!} + \cdots, \]
expand the moment-generating function for the binomial distribution
\[ m(t) = (q + pe^{t})^{n} \]
into a power series in t. (Acquire only the low-order terms in t.) Identify \(\mu_i'\) as the coefficient of \(t^{i}/i!\) appearing in the series. Specifically, find \(\mu_1'\) and \(\mu_2'\) and compare them with the results of Exercise 3.146.

3.201

Refer to Exercises 3.103 and 3.106. In what interval would you expect the repair costs on these five machines to lie? (Use Tchebysheff’s theorem.)

*3.202

The number of cars driving past a parking area in a one-minute time interval has a Poisson distribution with mean λ. The probability that any individual driver actually wants to park his or her car is p. Assume that individuals decide whether to park independently of one another.
a If one parking place is available and it will take you one minute to reach the parking area, what is the probability that a space will still be available when you reach the lot? (Assume that no one leaves the lot during the one-minute interval.)
b Let W denote the number of drivers who wish to park during a one-minute interval. Derive the probability distribution of W.

3.203

A type of bacteria cell divides at a constant rate λ over time. (That is, the probability that a cell divides in a small interval of time t is approximately λt.) Given that a population starts out at time zero with k cells of this bacteria and that cell divisions are independent of one another, the size of the population at time t, Y(t), has the probability distribution
\[ P[Y(t) = n] = \binom{n-1}{k-1} e^{-\lambda k t}\left(1 - e^{-\lambda t}\right)^{n-k}, \qquad n = k, k+1, \ldots. \]
a Find the expected value and variance of Y(t) in terms of λ and t.
b If, for a type of bacteria cell, λ = .1 per second and the population starts out with two cells at time zero, find the expected value and variance of the population after five seconds.

3.204

The probability that any single driver will turn left at an intersection is .2. The left turn lane at this intersection has room for three vehicles. If the left turn lane is empty when the light turns red and ﬁve vehicles arrive at this intersection while the light is red, ﬁnd the probability that the left turn lane will hold the vehicles of all of the drivers who want to turn left.

3.205

An experiment consists of tossing a fair die until a 6 occurs four times. What is the probability that the process ends after exactly ten tosses with a 6 occurring on the ninth and tenth tosses?

4. M. A. Atkinson, “Diet, Genetics, and Diabetes,” Food Technology 51(3), (1997): 77.


3.206

Accident records collected by an automobile insurance company give the following information. The probability that an insured driver has an automobile accident is .15. If an accident has occurred, the damage to the vehicle amounts to 20% of its market value with a probability of .80, to 60% of its market value with a probability of .12, and to a total loss with a probability of .08. What premium should the company charge on a $12,000 car so that the expected gain by the company is zero?

3.207

The number of people entering the intensive care unit at a hospital on any single day possesses a Poisson distribution with a mean equal to five persons per day.
a What is the probability that the number of people entering the intensive care unit on a particular day is equal to 2? Is less than or equal to 2?
b Is it likely that Y will exceed 10? Explain.

3.208

A recent survey suggests that Americans anticipate a reduction in living standards and that a steadily increasing level of consumption no longer may be as important as it was in the past. Suppose that a poll of 2000 people indicated 1373 in favor of forcing a reduction in the size of American automobiles by legislative means. Would you expect to observe as many as 1373 in favor of this proposition if, in fact, the general public was split 50–50 on the issue? Why?

3.209

A supplier of heavy construction equipment has found that new customers are normally obtained through customer requests for a sales call and that the probability of a sale of a particular piece of equipment is .3. If the supplier has three pieces of the equipment available for sale, what is the probability that it will take fewer than ﬁve customer contacts to clear the inventory?

3.210

Calculate P(|Y − λ| ≤ 2σ ) for the Poisson probability distribution of Example 3.22. Does this agree with the empirical rule?

*3.211

A merchant stocks a certain perishable item. She knows that on any given day she will have a demand for either two, three, or four of these items with probabilities .1, .4, and .5, respectively. She buys the items for $1.00 each and sells them for $1.20 each. If any are left at the end of the day, they represent a total loss. How many items should the merchant stock in order to maximize her expected daily proﬁt?

*3.212

Show that the hypergeometric probability function approaches the binomial in the limit as N → ∞ and p = r/N remains constant. That is, show that
\[ \lim_{N \to \infty} \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}} = \binom{n}{y} p^{y} q^{n-y}, \]
for p = r/N constant.

3.213

A lot of N = 100 industrial products contains 40 defectives. Let Y be the number of defectives in a random sample of size 20. Find p(10) by using (a) the hypergeometric probability distribution and (b) the binomial probability distribution. Is N large enough that the value for p(10) obtained from the binomial distribution is a good approximation to that obtained using the hypergeometric distribution?

*3.214

For simplicity, let us assume that there are two kinds of drivers. The safe drivers, who are 70% of the population, have probability .1 of causing an accident in a year. The rest of the population are accident makers, who have probability .5 of causing an accident in a year. The insurance premium is $400 times one’s probability of causing an accident in the following year. A new subscriber has an accident during the ﬁrst year. What should be his insurance premium for the next year?


*3.215

It is known that 5% of the members of a population have disease A, which can be discovered by a blood test. Suppose that N (a large number) people are to be tested. This can be done in two ways: (1) Each person is tested separately, or (2) the blood samples of k people are pooled together and analyzed. (Assume that N = nk, with n an integer.) If the test is negative, all of them are healthy (that is, just this one test is needed). If the test is positive, each of the k persons must be tested separately (that is, a total of k + 1 tests are needed).
a For fixed k, what is the expected number of tests needed in option 2?
b Find the k that will minimize the expected number of tests in option 2.
c If k is selected as in part (b), on the average how many tests does option 2 save in comparison with option 1?

*3.216

Let Y have a hypergeometric distribution
\[ p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}, \qquad y = 0, 1, 2, \ldots, n. \]
a Show that
\[ P(Y = n) = p(n) = \left(\frac{r}{N}\right)\left(\frac{r-1}{N-1}\right)\left(\frac{r-2}{N-2}\right)\cdots\left(\frac{r-n+1}{N-n+1}\right). \]
b Write p(y) as p(y|r). Show that if r1 < r2, then
\[ \frac{p(y \mid r_1)}{p(y \mid r_2)} > \frac{p(y+1 \mid r_1)}{p(y+1 \mid r_2)}. \]
c Apply the binomial expansion to each factor in the following equation:
\[ (1+a)^{N_1}(1+a)^{N_2} = (1+a)^{N_1+N_2}. \]
Now compare the coefficients of \(a^{n}\) on both sides to prove that
\[ \binom{N_1}{0}\binom{N_2}{n} + \binom{N_1}{1}\binom{N_2}{n-1} + \cdots + \binom{N_1}{n}\binom{N_2}{0} = \binom{N_1+N_2}{n}. \]
d Using the result of part (c), conclude that
\[ \sum_{y=0}^{n} p(y) = 1. \]

*3.217

Use the result derived in Exercise 3.216(c) and Deﬁnition 3.4 to derive directly the mean of a hypergeometric random variable.

*3.218

Use the results of Exercises 3.216(c) and 3.217 to show that, for a hypergeometric random variable,
\[ E[Y(Y-1)] = \frac{r(r-1)n(n-1)}{N(N-1)}. \]

CHAPTER 4

Continuous Variables and Their Probability Distributions

4.1 Introduction
4.2 The Probability Distribution for a Continuous Random Variable
4.3 Expected Values for Continuous Random Variables
4.4 The Uniform Probability Distribution
4.5 The Normal Probability Distribution
4.6 The Gamma Probability Distribution
4.7 The Beta Probability Distribution
4.8 Some General Comments
4.9 Other Expected Values
4.10 Tchebysheff’s Theorem
4.11 Expectations of Discontinuous Functions and Mixed Probability Distributions (Optional)
4.12 Summary
References and Further Readings

4.1 Introduction

A moment of reflection on random variables encountered in the real world should convince you that not all random variables of interest are discrete random variables. The number of days that it rains in a period of n days is a discrete random variable because the number of days must take one of the n + 1 values 0, 1, 2, . . . , or n. Now consider the daily rainfall at a specified geographical point. Theoretically, with measuring equipment of perfect accuracy, the amount of rainfall could take on any value between 0 and 5 inches. As a result, each of the uncountably infinite number of points in the interval (0, 5) represents a distinct possible value of the amount of rainfall in a day. A random variable that can take on any value in an interval is called continuous, and the purpose of this chapter is to study probability distributions for continuous random variables.

The yield of an antibiotic in a fermentation process is a continuous random variable, as is the length of life, in years, of a washing machine. The line segments over which these two random variables are defined are contained in the positive half of the real line. This does not mean that, if we observed enough washing machines, we would eventually observe an outcome corresponding to every value in the interval (3, 7); rather it means that no value between 3 and 7 can be ruled out as a possible value for the number of years that a washing machine remains in service.

The probability distribution for a discrete random variable can always be given by assigning a nonnegative probability to each of the possible values the variable may assume. In every case, of course, the sum of all the probabilities that we assign must be equal to 1. Unfortunately, the probability distribution for a continuous random variable cannot be specified in the same way. It is mathematically impossible to assign nonzero probabilities to all the points on a line interval while satisfying the requirement that the probabilities of the distinct possible values sum to 1. As a result, we must develop a different method to describe the probability distribution for a continuous random variable.

4.2 The Probability Distribution for a Continuous Random Variable

Before we can state a formal definition for a continuous random variable, we must define the distribution function (or cumulative distribution function) associated with a random variable.

DEFINITION 4.1
Let Y denote any random variable. The distribution function of Y, denoted by F(y), is such that F(y) = P(Y ≤ y) for −∞ < y < ∞.

The nature of the distribution function associated with a random variable determines whether the variable is continuous or discrete. Consequently, we will commence our discussion by examining the distribution function for a discrete random variable and noting the characteristics of this function.

EXAMPLE 4.1
Suppose that Y has a binomial distribution with n = 2 and p = 1/2. Find F(y).

Solution
The probability function for Y is given by
\[ p(y) = \binom{2}{y}\left(\frac{1}{2}\right)^{y}\left(\frac{1}{2}\right)^{2-y}, \qquad y = 0, 1, 2, \]
which yields
p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.

What is F(−2) = P(Y ≤ −2)? Because the only values of Y that are assigned positive probabilities are 0, 1, and 2 and none of these values are less than or equal to −2, F(−2) = 0. Using similar logic, F(y) = 0 for all y < 0. What is F(1.5)? The only values of Y that are less than or equal to 1.5 and have nonzero probabilities are the values 0 and 1. Therefore,
F(1.5) = P(Y ≤ 1.5) = P(Y = 0) + P(Y = 1) = (1/4) + (1/2) = 3/4.
In general,
\[ F(y) = P(Y \le y) = \begin{cases} 0, & \text{for } y < 0, \\ 1/4, & \text{for } 0 \le y < 1, \\ 3/4, & \text{for } 1 \le y < 2, \\ 1, & \text{for } y \ge 2. \end{cases} \]
A graph of F(y) is given in Figure 4.1.

FIGURE 4.1 Binomial distribution function, n = 2, p = 1/2 (F(y) plotted against y)
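The step behavior found in Example 4.1 can be reproduced with a short computation. The sketch below is illustrative only; distribution_function is a hypothetical helper name. It evaluates F(y) = P(Y ≤ y) at any real y by summing the binomial probability function over the integers that do not exceed y.

```python
from math import comb, floor


def distribution_function(y, n=2, p=0.5):
    """F(y) = P(Y <= y) for a binomial random variable, defined for every real y."""
    if y < 0:
        return 0.0  # no possible value of Y is less than or equal to y
    top = min(n, floor(y))  # largest possible value of Y not exceeding y
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(top + 1))


if __name__ == "__main__":
    for y in (-2, 0, 0.5, 1, 1.5, 2, 3):
        print(f"F({y}) = {distribution_function(y)}")
```

Evaluating the function on a fine grid and plotting the results would show the flat stretches and the jumps at 0, 1, and 2.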

In Example 4.1 the points between 0 and 1 or between 1 and 2 all had probability 0 and contributed nothing to the cumulative probability depicted by the distribution function. As a result, the cumulative distribution function stayed flat between the possible values of Y and increased in jumps or steps at each of the possible values of Y. Functions that behave in such a manner are called step functions. Distribution functions for discrete random variables are always step functions because the cumulative distribution function increases only at the finite or countable number of points with positive probabilities.

Because the distribution function associated with any random variable is such that F(y) = P(Y ≤ y), from a practical point of view it is clear that \(F(-\infty) = \lim_{y \to -\infty} P(Y \le y)\) must equal zero. If we consider any two values y1 < y2, then P(Y ≤ y1) ≤ P(Y ≤ y2); that is, F(y1) ≤ F(y2). So, a distribution function, F(y), is always a monotonic, nondecreasing function. Further, it is clear that \(F(\infty) = \lim_{y \to \infty} P(Y \le y) = 1\). These three characteristics define the properties of any distribution function and are summarized in the following theorem.


THEOREM 4.1
Properties of a Distribution Function¹ If F(y) is a distribution function, then
1. \(F(-\infty) \equiv \lim_{y \to -\infty} F(y) = 0\).
2. \(F(\infty) \equiv \lim_{y \to \infty} F(y) = 1\).
3. F(y) is a nondecreasing function of y. [If y1 and y2 are any values such that y1 < y2, then F(y1) ≤ F(y2).]

You should check that the distribution function developed in Example 4.1 has each of these properties. Let us now examine the distribution function for a continuous random variable. Suppose that, for all practical purposes, the amount of daily rainfall, Y , must be less than 6 inches. For every 0 ≤ y1 < y2 ≤ 6, the interval (y1 , y2 ) has a positive probability of including Y , no matter how close y1 gets to y2 . It follows that F(y) in this case should be a smooth, increasing function over some interval of real numbers, as graphed in Figure 4.2. We are thus led to the deﬁnition of a continuous random variable.

DEFINITION 4.2
A random variable Y with distribution function F(y) is said to be continuous if F(y) is continuous, for −∞ < y < ∞.²

FIGURE 4.2 Distribution function for a continuous random variable (F(y) plotted against y, with F(y1) and F(y2) marked at the points y1 and y2)

1. To be mathematically rigorous, if F(y) is a valid distribution function, then F(y) also must be right continuous.
2. To be mathematically precise, we also need the first derivative of F(y) to exist and be continuous except for, at most, a finite number of points in any finite interval. The distribution functions for the continuous random variables discussed in this text satisfy this requirement.


If Y is a continuous random variable, then for any real number y, P(Y = y) = 0. If this were not true and P(Y = y0) = p0 > 0, then F(y) would have a discontinuity (jump) of size p0 at the point y0, violating the assumption that Y was continuous. Practically speaking, the fact that continuous random variables have zero probability at discrete points should not bother us. Consider the example of measuring daily rainfall. What is the probability that we will see a daily rainfall measurement of exactly 2.193 inches? It is quite likely that we would never observe that exact value even if we took rainfall measurements for a lifetime, although we might see many days with measurements between 2 and 3 inches.

The derivative of F(y) is another function of prime importance in probability theory and statistics.

DEFINITION 4.3
Let F(y) be the distribution function for a continuous random variable Y. Then f(y), given by
\[ f(y) = \frac{dF(y)}{dy} = F'(y) \]
wherever the derivative exists, is called the probability density function for the random variable Y.

It follows from Definitions 4.2 and 4.3 that F(y) can be written as
\[ F(y) = \int_{-\infty}^{y} f(t)\,dt, \]
where f(·) is the probability density function and t is used as the variable of integration. The relationship between the distribution and density functions is shown graphically in Figure 4.3.

The probability density function is a theoretical model for the frequency distribution (histogram) of a population of measurements. For example, observations of the lengths of life of washers of a particular brand will generate measurements that can be characterized by a relative frequency histogram, as discussed in Chapter 1. Conceptually, the experiment could be repeated ad infinitum, thereby generating a relative frequency distribution (a smooth curve) that would characterize the population of interest to the manufacturer. This theoretical relative frequency distribution corresponds to the probability density function for the length of life of a single machine, Y.

FIGURE 4.3 The distribution function (f(y) plotted against y, with F(y0) indicated at the point y0)


Because the distribution function F(y) for any random variable always has the properties given in Theorem 4.1, density functions must have some corresponding properties. Because F(y) is a nondecreasing function, the derivative f(y) is never negative. Further, we know that F(∞) = 1 and, therefore, that \(\int_{-\infty}^{\infty} f(t)\,dt = 1\). In summary, the properties of a probability density function are as given in the following theorem.

THEOREM 4.2
Properties of a Density Function If f(y) is a density function for a continuous random variable, then
1. f(y) ≥ 0 for all y, −∞ < y < ∞.
2. \(\int_{-\infty}^{\infty} f(y)\,dy = 1\).

The next example gives the distribution function and density function for a continuous random variable.

EXAMPLE 4.2
Suppose that
\[ F(y) = \begin{cases} 0, & \text{for } y < 0, \\ y, & \text{for } 0 \le y \le 1, \\ 1, & \text{for } y > 1. \end{cases} \]
Find the probability density function for Y and graph it.

Solution
Because the density function f(y) is the derivative of the distribution function F(y), when the derivative exists,
\[ f(y) = \frac{dF(y)}{dy} = \begin{cases} \dfrac{d(0)}{dy} = 0, & \text{for } y < 0, \\[1ex] \dfrac{d(y)}{dy} = 1, & \text{for } 0 < y < 1, \\[1ex] \dfrac{d(1)}{dy} = 0, & \text{for } y > 1, \end{cases} \]
and f(y) is undefined at y = 0 and y = 1. A graph of F(y) is shown in Figure 4.4.

FIGURE 4.4 Distribution function F(y) for Example 4.2
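The differentiation in Example 4.2 can be checked numerically with difference quotients. The following is a rough sketch, not a general-purpose differentiator; the helper names and the step size h are illustrative choices. Away from 0 and 1 the central quotient recovers f(y). At y = 0 the left and right quotients disagree, which is the numerical counterpart of f(y) being undefined there.

```python
def F(y):
    """Distribution function of Example 4.2."""
    if y < 0:
        return 0.0
    if y <= 1:
        return float(y)
    return 1.0


def central_quotient(y, h=1e-6):
    """Approximates F'(y) where the derivative exists."""
    return (F(y + h) - F(y - h)) / (2 * h)


def one_sided_quotients(y, h=1e-6):
    """Left and right difference quotients; they differ where F has a corner."""
    left = (F(y) - F(y - h)) / h
    right = (F(y + h) - F(y)) / h
    return left, right


if __name__ == "__main__":
    print(central_quotient(-0.5), central_quotient(0.5), central_quotient(1.5))
    print(one_sided_quotients(0.0))  # left slope near 0, right slope near 1
```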

The graph of f(y) for Example 4.2 is shown in Figure 4.5. Notice that the distribution and density functions given in Example 4.2 have all the properties required of distribution and density functions, respectively. Moreover, F(y) is a continuous function of y, but f(y) is discontinuous at the points y = 0, 1. In general, the distribution function for a continuous random variable must be continuous, but the density function need not be everywhere continuous.

FIGURE 4.5 Density function f(y) for Example 4.2

EXAMPLE 4.3

Let Y be a continuous random variable with probability density function given by
\[ f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
Find F(y). Graph both f(y) and F(y).

Solution

The graph of f(y) appears in Figure 4.6. Because
\[ F(y) = \int_{-\infty}^{y} f(t)\,dt, \]
we have, for this example,
\[ F(y) = \begin{cases} \int_{-\infty}^{y} 0\,dt = 0, & \text{for } y < 0, \\[4pt] \int_{-\infty}^{0} 0\,dt + \int_{0}^{y} 3t^2\,dt = 0 + t^3\Big|_0^y = y^3, & \text{for } 0 \le y \le 1, \\[4pt] \int_{-\infty}^{0} 0\,dt + \int_{0}^{1} 3t^2\,dt + \int_{1}^{y} 0\,dt = 0 + t^3\Big|_0^1 + 0 = 1, & \text{for } 1 < y. \end{cases} \]
Notice that some of the integrals that we evaluated yield a value of 0. These are included for completeness in this initial example. In future calculations, we will not explicitly display any integral that has value 0. The graph of F(y) is given in Figure 4.7.

FIGURE 4.6 Density function for Example 4.3
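The calculation in Example 4.3 can also be checked numerically. The following is a minimal sketch, not part of the original solution: the helper names `density`, `simpson`, and `distribution` are illustrative choices, and composite Simpson's rule is simply one convenient way to approximate the integral of f(t) from 0 to y.

```python
def density(y):
    """Density from Example 4.3: f(y) = 3y^2 on [0, 1], 0 elsewhere."""
    return 3.0 * y * y if 0.0 <= y <= 1.0 else 0.0


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes get weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def distribution(y):
    """F(y): integral of f(t) dt up to y; f vanishes outside [0, 1]."""
    if y <= 0.0:
        return 0.0
    return simpson(density, 0.0, min(y, 1.0))


for y in (-0.5, 0.25, 0.5, 1.0, 2.0):
    closed_form = min(max(y, 0.0), 1.0) ** 3
    print(y, round(distribution(y), 6), round(closed_form, 6))
```

For each y the numerical value should agree with the closed form y^3 on [0, 1] (and with 0 or 1 outside that interval) up to floating-point error.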

F(y0 ) gives the probability that Y ≤ y0 . As you will see in subsequent chapters, it is often of interest to determine the value, y, of a random variable Y that is such that P(Y ≤ y) equals or exceeds some speciﬁed value.


FIGURE 4.7 Distribution function for Example 4.3

DEFINITION 4.4

Let Y denote any random variable. If 0 < p < 1, the pth quantile of Y, denoted by \( \phi_p \), is the smallest value such that \( P(Y \le \phi_p) = F(\phi_p) \ge p \). If Y is continuous, \( \phi_p \) is the smallest value such that \( F(\phi_p) = P(Y \le \phi_p) = p \). Some prefer to call \( \phi_p \) the 100pth percentile of Y.

An important special case is p = 1/2, and \( \phi_{.5} \) is the median of the random variable Y. In Example 4.3, the median of the random variable is such that \( F(\phi_{.5}) = .5 \) and is easily seen to be such that \( (\phi_{.5})^3 = .5 \), or equivalently, that the median of Y is \( \phi_{.5} = (.5)^{1/3} = .7937 \).

The next step is to find the probability that Y falls in a specific interval; that is, P(a ≤ Y ≤ b). From Chapter 1 we know that this probability corresponds to the area under the frequency distribution over the interval a ≤ y ≤ b. Because f(y) is the theoretical counterpart of the frequency distribution, we would expect P(a ≤ Y ≤ b) to equal a corresponding area under the density function f(y). This indeed is true because, if a < b,
\[ P(a < Y \le b) = P(Y \le b) - P(Y \le a) = F(b) - F(a) = \int_a^b f(y)\,dy. \]
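Returning to Definition 4.4, a quantile of a continuous random variable can be found numerically when F cannot be inverted by hand. The sketch below is illustrative only: it assumes the distribution function of Example 4.3 and uses bisection (a swapped-in numerical technique, with the hypothetical helper name `quantile`) to locate the smallest value with F(y) ≥ p.

```python
def distribution(y):
    """F(y) from Example 4.3: 0 for y < 0, y^3 on [0, 1], 1 for y > 1."""
    if y < 0.0:
        return 0.0
    if y > 1.0:
        return 1.0
    return y ** 3


def quantile(p, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the p-th quantile; assumes F(lo) < p <= F(hi)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if distribution(mid) >= p:
            hi = mid   # mid already reaches probability p; look left
        else:
            lo = mid   # mid falls short of p; look right
    return hi


median = quantile(0.5)
print(round(median, 4))  # 0.7937, matching (.5)^(1/3)
```

Because F is continuous and nondecreasing, the search interval always brackets the quantile, so the returned value agrees with the closed form (.5)^(1/3) to within the tolerance.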

Because P(Y = a) = 0, we have the following result.

THEOREM 4.3

If the random variable Y has density function f(y) and a < b, then the probability that Y falls in the interval [a, b] is
\[ P(a \le Y \le b) = \int_a^b f(y)\,dy. \]

This probability is the shaded area in Figure 4.8.

FIGURE 4.8 P(a ≤ Y ≤ b)


If Y is a continuous random variable and a and b are constants such that a < b, then P(Y = a) = 0 and P(Y = b) = 0 and Theorem 4.3 implies that
\[ P(a < Y < b) = P(a \le Y < b) = P(a < Y \le b) = P(a \le Y \le b) = \int_a^b f(y)\,dy. \]
The fact that the above string of equalities is not, in general, true for discrete random variables is illustrated in Exercise 4.7.

EXAMPLE 4.4

Given f(y) = cy^2, 0 ≤ y ≤ 2, and f(y) = 0 elsewhere, find the value of c for which f(y) is a valid density function.

Solution

We require a value for c such that
\[ F(\infty) = \int_{-\infty}^{\infty} f(y)\,dy = 1, \qquad\text{where}\qquad \int_{-\infty}^{\infty} f(y)\,dy = \int_0^2 cy^2\,dy = \frac{cy^3}{3}\Big|_0^2 = \frac{8}{3}\,c. \]
Thus, (8/3)c = 1, and we find that c = 3/8.

EXAMPLE 4.5

Find P(1 ≤ Y ≤ 2) for Example 4.4. Also find P(1 < Y < 2).

Solution

\[ P(1 \le Y \le 2) = \int_1^2 f(y)\,dy = \frac{3}{8}\int_1^2 y^2\,dy = \left(\frac{3}{8}\right)\frac{y^3}{3}\Big|_1^2 = \frac{7}{8}. \]
Because Y has a continuous distribution, it follows that P(Y = 1) = P(Y = 2) = 0 and, therefore, that
\[ P(1 < Y < 2) = P(1 \le Y \le 2) = \frac{3}{8}\int_1^2 y^2\,dy = \frac{7}{8}. \]
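The arithmetic in Examples 4.4 and 4.5 can be reproduced exactly with rational numbers. This is a small illustrative sketch; the helper name `integral_of_y_squared` is an assumption introduced here, and it simply encodes the antiderivative y^3/3 used in the solutions.

```python
from fractions import Fraction


def integral_of_y_squared(a, b):
    """Exact integral of y^2 over [a, b], from the antiderivative y^3 / 3."""
    a, b = Fraction(a), Fraction(b)
    return (b ** 3 - a ** 3) / 3


# Example 4.4: c times the integral of y^2 over [0, 2] must equal 1.
c = 1 / integral_of_y_squared(0, 2)

# Example 4.5: P(1 <= Y <= 2) = c times the integral of y^2 over [1, 2].
prob = c * integral_of_y_squared(1, 2)

print(c, prob)  # 3/8 7/8
```

Working with `Fraction` avoids rounding, so the results match the fractions 3/8 and 7/8 obtained by hand.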

Probability statements regarding a continuous random variable Y are meaningful only if, ﬁrst, the integral deﬁning the probability exists and, second, the resulting probabilities agree with the axioms of Chapter 2. These two conditions will always be satisﬁed if we consider only probabilities associated with a ﬁnite or countable collection of intervals. Because we almost always are interested in probabilities that continuous variables fall in intervals, this consideration will cause us no practical difﬁculty. Some density functions that provide good models for population frequency distributions encountered in practical applications are presented in subsequent sections.


Exercises

4.1

Let Y be a random variable with p(y) given in the table below.

y       1    2    3    4
p(y)    .4   .3   .2   .1

a Give the distribution function, F(y). Be sure to specify the value of F(y) for all y, −∞ < y < ∞.
b Sketch the distribution function given in part (a).

4.2

A box contains ﬁve keys, only one of which will open a lock. Keys are randomly selected and tried, one at a time, until the lock is opened (keys that do not work are discarded before another is tried). Let Y be the number of the trial on which the lock is opened. a Find the probability function for Y . b Give the corresponding distribution function. c What is P(Y < 3)? P(Y ≤ 3)? P(Y = 3)? d If Y is a continuous random variable, we argued that, for all −∞ < a < ∞, P(Y = a) = 0. Do any of your answers in part (c) contradict this claim? Why?

4.3

A Bernoulli random variable is one that assumes only two values, 0 and 1 with p(1) = p and p(0) = 1 − p ≡ q. a Sketch the corresponding distribution function. b Show that this distribution function has the properties given in Theorem 4.1.

4.4

Let Y be a binomial random variable with n = 1 and success probability p. a Find the probability and distribution function for Y . b Compare the distribution function from part (a) with that in Exercise 4.3(a). What do you conclude?

4.5

Suppose that Y is a random variable that takes on only integer values 1, 2, . . . and has distribution function F(y). Show that the probability function p(y) = P(Y = y) is given by
\[ p(y) = \begin{cases} F(1), & y = 1, \\ F(y) - F(y-1), & y = 2, 3, \ldots. \end{cases} \]

4.6

Consider a random variable with a geometric distribution (Section 3.5); that is,
\[ p(y) = q^{y-1}p, \qquad y = 1, 2, 3, \ldots, \quad 0 < p < 1. \]
a Show that Y has distribution function F(y) such that \( F(i) = 1 - q^i \), i = 0, 1, 2, . . . and that, in general,
\[ F(y) = \begin{cases} 0, & y < 0, \\ 1 - q^i, & i \le y < i + 1, \text{ for } i = 0, 1, 2, \ldots. \end{cases} \]
b Show that the preceding cumulative distribution function has the properties given in Theorem 4.1.

4.7

Let Y be a binomial random variable with n = 10 and p = .2.
a Use Table 1, Appendix 3, to obtain P(2 < Y < 5) and P(2 ≤ Y < 5). Are the probabilities that Y falls in the intervals (2, 5) and [2, 5) equal? Why or why not?
b Use Table 1, Appendix 3, to obtain P(2 < Y ≤ 5) and P(2 ≤ Y ≤ 5). Are these two probabilities equal? Why or why not?
c Earlier in this section, we argued that if Y is continuous and a < b, then P(a < Y < b) = P(a ≤ Y < b). Does the result in part (a) contradict this claim? Why?

4.8

Suppose that Y has density function
\[ f(y) = \begin{cases} ky(1 - y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find the value of k that makes f(y) a probability density function.
b Find P(.4 ≤ Y ≤ 1).
c Find P(.4 ≤ Y < 1).
d Find P(Y ≤ .4|Y ≤ .8).
e Find P(Y < .4|Y < .8).

4.9

A random variable Y has the following distribution function:
\[ F(y) = P(Y \le y) = \begin{cases} 0, & \text{for } y < 2, \\ 1/8, & \text{for } 2 \le y < 2.5, \\ 3/16, & \text{for } 2.5 \le y < 4, \\ 1/2, & \text{for } 4 \le y < 5.5, \\ 5/8, & \text{for } 5.5 \le y < 6, \\ 11/16, & \text{for } 6 \le y < 7, \\ 1, & \text{for } y \ge 7. \end{cases} \]
a Is Y a continuous or discrete random variable? Why?
b What values of Y are assigned positive probabilities?
c Find the probability function for Y.
d What is the median, \( \phi_{.5} \), of Y?

4.10

Refer to the density function given in Exercise 4.8. a Find the .95-quantile, φ.95 , such that P(Y ≤ φ.95 ) = .95. b Find a value y0 so that P(Y < y0 ) = .95. c Compare the values for φ.95 and y0 that you obtained in parts (a) and (b). Explain the relationship between these two values.

4.11

Suppose that Y possesses the density function
\[ f(y) = \begin{cases} cy, & 0 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find the value of c that makes f(y) a probability density function.
b Find F(y).
c Graph f(y) and F(y).
d Use F(y) to find P(1 ≤ Y ≤ 2).
e Use f(y) and geometry to find P(1 ≤ Y ≤ 2).

4.12

The length of time to failure (in hundreds of hours) for a transistor is a random variable Y with distribution function given by
\[ F(y) = \begin{cases} 0, & y < 0, \\ 1 - e^{-y^2}, & y \ge 0. \end{cases} \]
a Show that F(y) has the properties of a distribution function.
b Find the .30-quantile, \( \phi_{.30} \), of Y.
c Find f(y).
d Find the probability that the transistor operates for at least 200 hours.
e Find P(Y > 100|Y ≤ 200).

4.13

A supplier of kerosene has a 150-gallon tank that is filled at the beginning of each week. His weekly demand shows a relative frequency behavior that increases steadily up to 100 gallons and then levels off between 100 and 150 gallons. If Y denotes weekly demand in hundreds of gallons, the relative frequency of demand can be modeled by
\[ f(y) = \begin{cases} y, & 0 \le y \le 1, \\ 1, & 1 < y \le 1.5, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find F(y).
b Find P(0 ≤ Y ≤ .5).
c Find P(.5 ≤ Y ≤ 1.2).

4.14

A gas station operates two pumps, each of which can pump up to 10,000 gallons of gas in a month. The total amount of gas pumped at the station in a month is a random variable Y (measured in 10,000 gallons) with a probability density function given by
\[ f(y) = \begin{cases} y, & 0 < y < 1, \\ 2 - y, & 1 \le y < 2, \\ 0, & \text{elsewhere.} \end{cases} \]
a Graph f(y).
b Find F(y) and graph it.
c Find the probability that the station will pump between 8000 and 12,000 gallons in a particular month.
d Given that the station pumped more than 10,000 gallons in a particular month, find the probability that the station pumped more than 15,000 gallons during the month.

4.15

As a measure of intelligence, mice are timed when going through a maze to reach a reward of food. The time (in seconds) required for any mouse is a random variable Y with a density function given by
\[ f(y) = \begin{cases} \dfrac{b}{y^2}, & y \ge b, \\ 0, & \text{elsewhere,} \end{cases} \]
where b is the minimum possible time needed to traverse the maze.
a Show that f(y) has the properties of a density function.
b Find F(y).
c Find P(Y > b + c) for a positive constant c.
d If c and d are both positive constants such that d > c, find P(Y > b + d|Y > b + c).

4.16

Let Y possess a density function
\[ f(y) = \begin{cases} c(2 - y), & 0 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find c.
b Find F(y).
c Graph f(y) and F(y).
d Use F(y) in part (b) to find P(1 ≤ Y ≤ 2).
e Use geometry and the graph for f(y) to calculate P(1 ≤ Y ≤ 2).

4.17

The length of time required by students to complete a one-hour exam is a random variable with a density function given by
\[ f(y) = \begin{cases} cy^2 + y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find c.
b Find F(y).
c Graph f(y) and F(y).
d Use F(y) in part (b) to find F(−1), F(0), and F(1).
e Find the probability that a randomly selected student will finish in less than half an hour.
f Given that a particular student needs at least 15 minutes to complete the exam, find the probability that she will require at least 30 minutes to finish.

4.18

Let Y have the density function given by
\[ f(y) = \begin{cases} .2, & -1 < y \le 0, \\ .2 + cy, & 0 < y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find c.
b Find F(y).
c Graph f(y) and F(y).
d Use F(y) in part (b) to find F(−1), F(0), and F(1).
e Find P(0 ≤ Y ≤ .5).
f Find P(Y > .5|Y > .1).

4.19

Let the distribution function of a random variable Y be
\[ F(y) = \begin{cases} 0, & y \le 0, \\ \dfrac{y}{8}, & 0 < y < 2, \\[4pt] \dfrac{y^2}{16}, & 2 \le y < 4, \\ 1, & y \ge 4. \end{cases} \]
a Find the density function of Y.
b Find P(1 ≤ Y ≤ 3).
c Find P(Y ≥ 1.5).
d Find P(Y ≥ 1|Y ≤ 3).


4.3 Expected Values for Continuous Random Variables

The next step in the study of continuous random variables is to find their means, variances, and standard deviations, thereby acquiring numerical descriptive measures associated with their distributions. Many times it is difficult to find the probability distribution for a random variable Y or a function of a random variable, g(Y). Even if the density function for a random variable is known, it can be difficult to evaluate appropriate integrals (we will see this to be the case when a random variable has a gamma distribution, Section 4.6). When we encounter these situations, the approximate behavior of variables of interest can be established by using their moments and the empirical rule or Tchebysheff's theorem (Chapters 1 and 3).

DEFINITION 4.5

The expected value of a continuous random variable Y is
\[ E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy, \]
provided that the integral exists. [3]

If the definition of the expected value for a discrete random variable Y, \( E(Y) = \sum_y y\,p(y) \), is meaningful, then Definition 4.5 also should agree with our intuitive notion of a mean. The quantity f(y) dy corresponds to p(y) for the discrete case, and integration evolves from and is analogous to summation. Hence, E(Y) in Definition 4.5 agrees with our notion of an average, or mean.

As in the discrete case, we are sometimes interested in the expected value of a function of a random variable. A result that permits us to evaluate such an expected value is given in the following theorem.

THEOREM 4.4

Let g(Y) be a function of Y; then the expected value of g(Y) is given by
\[ E[g(Y)] = \int_{-\infty}^{\infty} g(y) f(y)\,dy, \]
provided that the integral exists.

The proof of Theorem 4.4 is similar to that of Theorem 3.2 and is omitted. The expected values of three important functions of a continuous random variable Y evolve as a consequence of well-known theorems of integration. As expected, these results lead to conclusions analogous to those contained in Theorems 3.3, 3.4, and 3.5. As a consequence, the proof of Theorem 4.5 will be left as an exercise.

[3] Technically, E(Y) is said to exist if
\[ \int_{-\infty}^{\infty} |y| f(y)\,dy < \infty. \]
This will be the case in all expectations that we discuss, and we will not mention this additional condition each time that we define an expected value.

THEOREM 4.5

Let c be a constant and let g(Y), g1(Y), g2(Y), . . . , gk(Y) be functions of a continuous random variable Y. Then the following results hold:
1. E(c) = c.
2. E[cg(Y)] = cE[g(Y)].
3. E[g1(Y) + g2(Y) + · · · + gk(Y)] = E[g1(Y)] + E[g2(Y)] + · · · + E[gk(Y)].

As in the case of discrete random variables, we often seek the expected value of the function g(Y) = (Y − µ)^2. As before, the expected value of this function is the variance of the random variable Y. That is, as in Definition 3.5, V(Y) = E[(Y − µ)^2]. It is a simple exercise to show that Theorem 4.5 implies that V(Y) = E(Y^2) − µ^2.

EXAMPLE 4.6

In Example 4.4 we determined that f(y) = (3/8)y^2 for 0 ≤ y ≤ 2, f(y) = 0 elsewhere, is a valid density function. If the random variable Y has this density function, find µ = E(Y) and σ^2 = V(Y).

Solution

According to Definition 4.5,
\[ E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_0^2 y\left(\frac{3}{8}\right)y^2\,dy = \left(\frac{3}{8}\right)\left(\frac{1}{4}\right)y^4\Big|_0^2 = 1.5. \]
The variance of Y can be found once we determine E(Y^2). In this case,
\[ E(Y^2) = \int_{-\infty}^{\infty} y^2 f(y)\,dy = \int_0^2 y^2\left(\frac{3}{8}\right)y^2\,dy = \left(\frac{3}{8}\right)\left(\frac{1}{5}\right)y^5\Big|_0^2 = 2.4. \]
Thus, σ^2 = V(Y) = E(Y^2) − [E(Y)]^2 = 2.4 − (1.5)^2 = 0.15.
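The moments in Example 4.6 follow one pattern, which the sketch below makes explicit. It is an illustration under the assumption that Y has the density (3/8)y^2 on [0, 2]; the function name `moment` is hypothetical, and the closed form inside it comes from integrating y^k (3/8) y^2 over [0, 2].

```python
from fractions import Fraction


def moment(k):
    """E(Y^k) for f(y) = (3/8) y^2 on [0, 2].

    The integral of y^(k+2) over [0, 2] is 2^(k+3) / (k+3).
    """
    return Fraction(3, 8) * Fraction(2 ** (k + 3), k + 3)


total_probability = moment(0)        # should be 1 for a valid density
mean = moment(1)                     # E(Y)
second_moment = moment(2)            # E(Y^2)
variance = second_moment - mean ** 2 # V(Y) = E(Y^2) - [E(Y)]^2

print(float(mean), float(second_moment), float(variance))  # 1.5 2.4 0.15
```

The zeroth moment doubles as a check that the density integrates to 1.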


Exercises

4.20

If, as in Exercise 4.16, Y has density function
\[ f(y) = \begin{cases} (1/2)(2 - y), & 0 \le y \le 2, \\ 0, & \text{elsewhere,} \end{cases} \]
find the mean and variance of Y.

4.21

If, as in Exercise 4.17, Y has density function
\[ f(y) = \begin{cases} (3/2)y^2 + y, & 0 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]
find the mean and variance of Y.

4.22

If, as in Exercise 4.18, Y has density function
\[ f(y) = \begin{cases} .2, & -1 < y \le 0, \\ .2 + (1.2)y, & 0 < y \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]
find the mean and variance of Y.

4.23

Prove Theorem 4.5.

4.24

If Y is a continuous random variable with density function f (y), use Theorem 4.5 to prove that σ 2 = V (Y ) = E(Y 2 ) − [E(Y )]2 .

4.25

If, as in Exercise 4.19, Y has distribution function
\[ F(y) = \begin{cases} 0, & y \le 0, \\ \dfrac{y}{8}, & 0 < y < 2, \\[4pt] \dfrac{y^2}{16}, & 2 \le y < 4, \\ 1, & y \ge 4, \end{cases} \]
find the mean and variance of Y.

4.26

If Y is a continuous random variable with mean µ and variance σ^2 and a and b are constants, use Theorem 4.5 to prove the following:
a E(aY + b) = aE(Y) + b = aµ + b.
b V(aY + b) = a^2 V(Y) = a^2 σ^2.

4.27

For certain ore samples, the proportion Y of impurities per sample is a random variable with density function given in Exercise 4.21. The dollar value of each sample is W = 5 − .5Y . Find the mean and variance of W .

4.28

The proportion of time per day that all checkout counters in a supermarket are busy is a random variable Y with density function
\[ f(y) = \begin{cases} cy^2(1 - y)^4, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find the value of c that makes f(y) a probability density function.
b Find E(Y).

4.29

The temperature Y at which a thermostatically controlled switch turns on has probability density function given by
\[ f(y) = \begin{cases} 1/2, & 59 \le y \le 61, \\ 0, & \text{elsewhere.} \end{cases} \]
Find E(Y) and V(Y).

4.30

The proportion of time Y that an industrial robot is in operation during a 40-hour week is a random variable with probability density function
\[ f(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find E(Y) and V(Y).
b For the robot under study, the profit X for a week is given by X = 200Y − 60. Find E(X) and V(X).
c Find an interval in which the profit should lie for at least 75% of the weeks that the robot is in use.

4.31

The pH of water samples from a specific lake is a random variable Y with probability density function given by
\[ f(y) = \begin{cases} (3/8)(7 - y)^2, & 5 \le y \le 7, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find E(Y) and V(Y).
b Find an interval shorter than (5, 7) in which at least three-fourths of the pH measurements must lie.
c Would you expect to see a pH measurement below 5.5 very often? Why?

4.32

Weekly CPU time used by an accounting firm has probability density function (measured in hours) given by
\[ f(y) = \begin{cases} (3/64)y^2(4 - y), & 0 \le y \le 4, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find the expected value and variance of weekly CPU time.
b The CPU time costs the firm $200 per hour. Find the expected value and variance of the weekly cost for CPU time.
c Would you expect the weekly cost to exceed $600 very often? Why?

4.33

Daily total solar radiation for a specified location in Florida in October has probability density function given by
\[ f(y) = \begin{cases} (3/32)(y - 2)(6 - y), & 2 \le y \le 6, \\ 0, & \text{elsewhere,} \end{cases} \]
with measurements in hundreds of calories. Find the expected daily solar radiation for October.

*4.34

Suppose that Y is a continuous random variable with density f(y) that is positive only if y ≥ 0. If F(y) is the distribution function, show that
\[ E(Y) = \int_0^{\infty} y f(y)\,dy = \int_0^{\infty} [1 - F(y)]\,dy. \]
[Hint: If y > 0, \( y = \int_0^y dt \), and \( E(Y) = \int_0^{\infty} y f(y)\,dy = \int_0^{\infty} \left( \int_0^y dt \right) f(y)\,dy \). Exchange the order of integration to obtain the desired result.] [4]

[4] Exercises preceded by an asterisk are optional.


*4.35

If Y is a continuous random variable such that E[(Y −a)2 ] < ∞ for all a, show that E[(Y −a)2 ] is minimized when a = E(Y ). [Hint: E[(Y − a)2 ] = E({[Y − E(Y )] + [E(Y ) − a]}2 ).]

*4.36

Is the result obtained in Exercise 4.35 also valid for discrete random variables? Why?

*4.37

If Y is a continuous random variable with density function f(y) that is symmetric about 0 (that is, f(y) = f(−y) for all y) and E(Y) exists, show that E(Y) = 0. [Hint: \( E(Y) = \int_{-\infty}^{0} y f(y)\,dy + \int_{0}^{\infty} y f(y)\,dy \). Make the change of variable w = −y in the first integral.]

4.4 The Uniform Probability Distribution

Suppose that a bus always arrives at a particular stop between 8:00 and 8:10 A.M. and that the probability that the bus will arrive in any given subinterval of time is proportional only to the length of the subinterval. That is, the bus is as likely to arrive between 8:00 and 8:02 as it is to arrive between 8:06 and 8:08. Let Y denote the length of time a person must wait for the bus if that person arrived at the bus stop at exactly 8:00. If we carefully measured in minutes how long after 8:00 the bus arrived for several mornings, we could develop a relative frequency histogram for the data.

From the description just given, it should be clear that the relative frequency with which we observed a value of Y between 0 and 2 would be approximately the same as the relative frequency with which we observed a value of Y between 6 and 8. A reasonable model for the density function of Y is given in Figure 4.9. Because areas under curves represent probabilities for continuous random variables and A1 = A2 (by inspection), it follows that P(0 ≤ Y ≤ 2) = P(6 ≤ Y ≤ 8), as desired.

FIGURE 4.9 Density function for Y (horizontal axis y, marked from 1 to 10; shaded areas A1 and A2)

The random variable Y just discussed is an example of a random variable that has a uniform distribution. The general form for the density function of a random variable with a uniform distribution is as follows.

DEFINITION 4.6

If θ1 < θ2, a random variable Y is said to have a continuous uniform probability distribution on the interval (θ1, θ2) if and only if the density function of Y is
\[ f(y) = \begin{cases} \dfrac{1}{\theta_2 - \theta_1}, & \theta_1 \le y \le \theta_2, \\ 0, & \text{elsewhere.} \end{cases} \]


In the bus problem we can take θ1 = 0 and θ2 = 10 because we are interested only in a particular ten-minute interval. The density function discussed in Example 4.2 is a uniform distribution with θ1 = 0 and θ2 = 1. Graphs of the distribution function and density function for the random variable in Example 4.2 are given in Figures 4.4 and 4.5, respectively.

DEFINITION 4.7

The constants that determine the specific form of a density function are called parameters of the density function.

The quantities θ1 and θ2 are parameters of the uniform density function and are clearly meaningful numerical values associated with the theoretical density function. Both the range and the probability that Y will fall in any given interval depend on the values of θ1 and θ2.

Some continuous random variables in the physical, management, and biological sciences have approximately uniform probability distributions. For example, suppose that the number of events, such as calls coming into a switchboard, that occur in the time interval (0, t) has a Poisson distribution. If it is known that exactly one such event has occurred in the interval (0, t), then the actual time of occurrence is distributed uniformly over this interval.

EXAMPLE 4.7

Arrivals of customers at a checkout counter follow a Poisson distribution. It is known that, during a given 30-minute period, one customer arrived at the counter. Find the probability that the customer arrived during the last 5 minutes of the 30-minute period.

Solution

As just mentioned, the actual time of arrival follows a uniform distribution over the interval of (0, 30). If Y denotes the arrival time, then
\[ P(25 \le Y \le 30) = \int_{25}^{30} \frac{1}{30}\,dy = \frac{30 - 25}{30} = \frac{5}{30} = \frac{1}{6}. \]
The probability of the arrival occurring in any other 5-minute interval is also 1/6.

As we will see, the uniform distribution is very important for theoretical reasons. Simulation studies are valuable techniques for validating models in statistics. If we desire a set of observations on a random variable Y with distribution function F(y), we often can obtain the desired results by transforming a set of observations on a uniform random variable. For this reason most computer systems contain a random number generator that generates observed values for a random variable that has a continuous uniform distribution.
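One common way to carry out the transformation just described is the inverse transform method: if U is uniform on (0, 1) and F is continuous and strictly increasing on the range of Y, then applying the inverse of F to U yields a value with distribution function F. The sketch below is an illustration only, assuming the distribution function F(y) = y^3 from Example 4.3 (so the inverse is u^(1/3)); the function name and the seed are arbitrary choices.

```python
import random


def sample_from_example_4_3(rng):
    """Draw Y with F(y) = y^3 on [0, 1] by applying F's inverse, u^(1/3), to a uniform draw."""
    return rng.random() ** (1.0 / 3.0)


rng = random.Random(12345)  # fixed seed so the run is reproducible
draws = [sample_from_example_4_3(rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
fraction_below_half = sum(d <= 0.5 for d in draws) / len(draws)

print(sample_mean)          # should be near E(Y) = 3/4
print(fraction_below_half)  # should be near F(0.5) = 0.125
```

The sample mean and the observed fraction of draws at or below 0.5 should land close to the theoretical values 3/4 and 0.125, with small differences due to sampling variation.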


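THEOREM 4.6

If θ1 < θ2 and Y is a random variable uniformly distributed on the interval (θ1, θ2), then
\[ \mu = E(Y) = \frac{\theta_1 + \theta_2}{2} \qquad\text{and}\qquad \sigma^2 = V(Y) = \frac{(\theta_2 - \theta_1)^2}{12}. \]

Proof

By Definition 4.5,
\[ E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_{\theta_1}^{\theta_2} y\left(\frac{1}{\theta_2 - \theta_1}\right)dy = \left(\frac{1}{\theta_2 - \theta_1}\right)\frac{y^2}{2}\Big|_{\theta_1}^{\theta_2} = \frac{\theta_2^2 - \theta_1^2}{2(\theta_2 - \theta_1)} = \frac{\theta_2 + \theta_1}{2}. \]
Note that the mean of a uniform random variable is simply the value midway between the two parameter values, θ1 and θ2. The derivation of the variance is left as an exercise.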
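The formulas in Theorem 4.6 can be compared against a direct numerical integration. The following sketch is illustrative: it uses the bus example's interval (0, 10) as sample parameters, and the helper names and the midpoint rule are assumptions chosen for simplicity rather than anything prescribed by the text.

```python
def uniform_mean_variance(theta1, theta2):
    """Mean and variance of a uniform distribution, as stated in Theorem 4.6."""
    if not theta1 < theta2:
        raise ValueError("theta1 must be less than theta2")
    return (theta1 + theta2) / 2.0, (theta2 - theta1) ** 2 / 12.0


def numeric_moment(k, theta1, theta2, n=100_000):
    """Midpoint-rule approximation of E(Y^k) for the uniform density."""
    h = (theta2 - theta1) / n
    density = 1.0 / (theta2 - theta1)
    return sum((theta1 + (i + 0.5) * h) ** k for i in range(n)) * h * density


mean, variance = uniform_mean_variance(0.0, 10.0)
m1 = numeric_moment(1, 0.0, 10.0)
m2 = numeric_moment(2, 0.0, 10.0)

print(mean, variance)     # values from the theorem
print(m1, m2 - m1 ** 2)   # numerical counterparts
```

The second printed line should agree with the first to several decimal places, which gives a quick sanity check of the variance formula without deriving it.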

Exercises

4.38

Suppose that Y has a uniform distribution over the interval (0, 1). a Find F(y). b Show that P(a ≤ Y ≤ a + b), for a ≥ 0, b ≥ 0, and a + b ≤ 1 depends only upon the value of b.

4.39

If a parachutist lands at a random point on a line between markers A and B, ﬁnd the probability that she is closer to A than to B. Find the probability that her distance to A is more than three times her distance to B.

4.40

Suppose that three parachutists operate independently as described in Exercise 4.39. What is the probability that exactly one of the three lands past the midpoint between A and B?

4.41

A random variable Y has a uniform distribution over the interval (θ1 , θ2 ). Derive the variance of Y .

4.42

The median of the distribution of a continuous random variable Y is the value φ.5 such that P(Y ≤ φ.5 ) = 0.5. What is the median of the uniform distribution on the interval (θ1 , θ2 )?

4.43

A circle of radius r has area A = πr 2 . If a random circle has a radius that is uniformly distributed on the interval (0, 1), what are the mean and variance of the area of the circle?

4.44

The change in depth of a river from one day to the next, measured (in feet) at a specific location, is a random variable Y with the following density function:
\[ f(y) = \begin{cases} k, & -2 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]
a Determine the value of k.
b Obtain the distribution function for Y.

4.45

Upon studying low bids for shipping contracts, a microcomputer manufacturing company ﬁnds that intrastate contracts have low bids that are uniformly distributed between 20 and 25, in units of thousands of dollars. Find the probability that the low bid on the next intrastate shipping contract a is below $22,000. b is in excess of $24,000.

4.46

Refer to Exercise 4.45. Find the expected value of low bids on contracts of the type described there.

4.47

The failure of a circuit board interrupts work that utilizes a computing system until a new board is delivered. The delivery time, Y, is uniformly distributed on the interval one to five days. The cost of a board failure and interruption includes the fixed cost c0 of a new board and a cost that increases proportionally to Y^2. If C is the cost incurred, C = c0 + c1 Y^2.
a Find the probability that the delivery time exceeds two days.
b In terms of c0 and c1, find the expected cost associated with a single failed circuit board.

4.48

Beginning at 12:00 midnight, a computer center is up for one hour and then down for two hours on a regular cycle. A person who is unaware of this schedule dials the center at a random time between 12:00 midnight and 5:00 A.M. What is the probability that the center is up when the person’s call comes in?

4.49

A telephone call arrived at a switchboard at random within a one-minute interval. The switch board was fully busy for 15 seconds into this one-minute period. What is the probability that the call arrived when the switchboard was not fully busy?

4.50

If a point is randomly located in an interval (a, b) and if Y denotes the location of the point, then Y is assumed to have a uniform distribution over (a, b). A plant efﬁciency expert randomly selects a location along a 500-foot assembly line from which to observe the work habits of the workers on the line. What is the probability that the point she selects is a within 25 feet of the end of the line? b within 25 feet of the beginning of the line? c closer to the beginning of the line than to the end of the line?

4.51

The cycle time for trucks hauling concrete to a highway construction site is uniformly distributed over the interval 50 to 70 minutes. What is the probability that the cycle time exceeds 65 minutes if it is known that the cycle time exceeds 55 minutes?

4.52

Refer to Exercise 4.51. Find the mean and variance of the cycle times for the trucks.

4.53

The number of defective circuit boards coming off a soldering machine follows a Poisson distribution. During a speciﬁc eight-hour day, one defective circuit board was found. a Find the probability that it was produced during the ﬁrst hour of operation during that day. b Find the probability that it was produced during the last hour of operation during that day. c Given that no defective circuit boards were produced during the ﬁrst four hours of operation, ﬁnd the probability that the defective board was manufactured during the ﬁfth hour.

4.54

In using the triangulation method to determine the range of an acoustic source, the test equipment must accurately measure the time at which the spherical wave front arrives at a receiving sensor. According to Perruzzi and Hilliard (1984), measurement errors in these times can be modeled as possessing a uniform distribution from −0.05 to +0.05 µs (microseconds).
a What is the probability that a particular arrival-time measurement will be accurate to within 0.01 µs?
b Find the mean and variance of the measurement errors.

4.55

Refer to Exercise 4.54. Suppose that measurement errors are uniformly distributed between −0.02 to +0.05 µs.
a What is the probability that a particular arrival-time measurement will be accurate to within 0.01 µs?
b Find the mean and variance of the measurement errors.

4.56

Refer to Example 4.7. Find the conditional probability that a customer arrives during the last 5 minutes of the 30-minute period if it is known that no one arrives during the ﬁrst 10 minutes of the period.

4.57

According to Zimmels (1983), the sizes of particles used in sedimentation experiments often have a uniform distribution. In sedimentation involving mixtures of particles of various sizes, the larger particles hinder the movements of the smaller ones. Thus, it is important to study both the mean and the variance of particle sizes. Suppose that spherical particles have diameters that are uniformly distributed between .01 and .05 centimeters. Find the mean and variance of the volumes of these particles. (Recall that the volume of a sphere is (4/3)πr 3 .)

4.5 The Normal Probability Distribution

The most widely used continuous probability distribution is the normal distribution, a distribution with the familiar bell shape that was discussed in connection with the empirical rule. The examples and exercises in this section illustrate some of the many random variables that have distributions that are closely approximated by a normal probability distribution. In Chapter 7 we will present an argument that at least partially explains the common occurrence of normal distributions of data in nature. The normal density function is as follows:

DEFINITION 4.8

A random variable Y is said to have a normal probability distribution if and only if, for σ > 0 and −∞ < µ < ∞, the density function of Y is
\[ f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}, \qquad -\infty < y < \infty. \]

Notice that the normal density function contains two parameters, µ and σ.

THEOREM 4.7

If Y is a normally distributed random variable with parameters µ and σ, then
\[ E(Y) = \mu \qquad\text{and}\qquad V(Y) = \sigma^2. \]

FIGURE 4.10 The normal probability density function

The proof of this theorem will be deferred to Section 4.9, where we derive the moment-generating function of a normally distributed random variable. The results contained in Theorem 4.7 imply that the parameter µ locates the center of the distribution and that σ measures its spread. A graph of a normal density function is shown in Figure 4.10.

Areas under the normal density function corresponding to P(a ≤ Y ≤ b) require evaluation of the integral
\[ \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}\,dy. \]

Unfortunately, a closed-form expression for this integral does not exist; hence, its evaluation requires the use of numerical integration techniques. Probabilities and quantiles for random variables with normal distributions are easily found using R and S-Plus. If Y has a normal distribution with mean µ and standard deviation σ, the R (or S-Plus) command pnorm(y0, µ, σ) generates P(Y ≤ y0) whereas qnorm(p, µ, σ) yields the pth quantile, the value of \( \phi_p \) such that \( P(Y \le \phi_p) = p \).

Although there are infinitely many normal distributions (µ can take on any finite value, whereas σ can assume any positive finite value), we need only one table (Table 4, Appendix 3) to compute areas under normal densities. Probabilities and quantiles associated with normally distributed random variables can also be found using the applet Normal Tail Areas and Quantiles accessible at www.thomsonedu.com/statistics/wackerly. The only real benefit associated with using software to obtain probabilities and quantiles associated with normally distributed random variables is that the software provides answers that are correct to a greater number of decimal places.

The normal density function is symmetric around the value µ, so areas need be tabulated on only one side of the mean. The tabulated areas are to the right of points z, where z is the distance from the mean, measured in standard deviations. This area is shaded in Figure 4.11.
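For readers working in Python rather than R or S-Plus, the standard library class `statistics.NormalDist` offers analogous calculations: its `cdf` method plays the role of pnorm and its `inv_cdf` method the role of qnorm. The sketch below is a minimal illustration; the parameter values µ = 75 and σ = 10 are sample inputs chosen here, not values prescribed at this point in the text.

```python
from statistics import NormalDist

# A normal distribution with mean 75 and standard deviation 10 (sample values).
dist = NormalDist(mu=75, sigma=10)

# P(Y <= 80): the counterpart of the R call pnorm(80, 75, 10).
prob_below_80 = dist.cdf(80)

# The .95-quantile: the counterpart of the R call qnorm(0.95, 75, 10).
quantile_95 = dist.inv_cdf(0.95)

print(round(prob_below_80, 4))
print(round(quantile_95, 4))
```

Feeding the computed quantile back into `cdf` should return 0.95 up to rounding, which is a convenient consistency check.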

EXAMPLE 4.8

Let Z denote a normal random variable with mean 0 and standard deviation 1.
a Find P(Z > 2).
b Find P(−2 ≤ Z ≤ 2).
c Find P(0 ≤ Z ≤ 1.73).

FIGURE 4.11 Tabulated area for the normal density function

Solution

a Since µ = 0 and σ = 1, the value 2 is actually z = 2 standard deviations above the mean. Proceed down the first (z) column in Table 4, Appendix 3, and read the area opposite z = 2.0. This area, denoted by the symbol A(z), is A(2.0) = .0228. Thus, P(Z > 2) = .0228.

b Refer to Figure 4.12, where we have shaded the area of interest. In part (a) we determined that A1 = A(2.0) = .0228. Because the density function is symmetric about the mean µ = 0, it follows that A2 = A1 = .0228 and hence that P(−2 ≤ Z ≤ 2) = 1 − A1 − A2 = 1 − 2(.0228) = .9544.

c Because P(Z > 0) = A(0) = .5, we obtain that P(0 ≤ Z ≤ 1.73) = .5 − A(1.73), where A(1.73) is obtained by proceeding down the z column in Table 4, Appendix 3, to the entry 1.7 and then across the top of the table to the column labeled .03 to read A(1.73) = .0418. Thus, P(0 ≤ Z ≤ 1.73) = .5 − .0418 = .4582.
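The table lookups in Example 4.8 can be cross-checked numerically. The sketch below assumes nothing beyond the Python standard library; the helper name upper_tail is hypothetical and plays the role of the tabulated area A(z), computed here from the complementary error function rather than read from Table 4.

```python
import math


def upper_tail(z):
    """A(z) = P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


part_a = upper_tail(2.0)                      # P(Z > 2)
part_b = 1.0 - 2.0 * upper_tail(2.0)          # P(-2 <= Z <= 2), using symmetry
part_c = upper_tail(0.0) - upper_tail(1.73)   # P(0 <= Z <= 1.73)

print(round(part_a, 4), round(part_b, 4), round(part_c, 4))
```

Carrying more decimal places gives approximately .9545 for part (b); the table-based value .9544 differs in the last digit only because A(2.0) was rounded to .0228 before doubling.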

FIGURE 4.12 Desired area for Example 4.8(b)

EXAMPLE 4.9

The achievement scores for a college entrance examination are normally distributed with mean 75 and standard deviation 10. What fraction of the scores lies between 80 and 90?

Solution

Recall that z is the distance from the mean of a normal distribution expressed in units of standard deviation. Thus,

$$z = \frac{y - \mu}{\sigma}.$$

Then the desired fraction of the population is given by the area between

$$z_1 = \frac{80 - 75}{10} = .5 \quad\text{and}\quad z_2 = \frac{90 - 75}{10} = 1.5.$$

This area is shaded in Figure 4.13. You can see from Figure 4.13 that A = A(.5) − A(1.5) = .3085 − .0668 = .2417.

FIGURE 4.13 Required area for Example 4.9

We can always transform a normal random variable Y to a standard normal random variable Z by using the relationship

$$Z = \frac{Y - \mu}{\sigma}.$$

Table 4, Appendix 3, can then be used to compute probabilities, as shown here. Z locates a point measured from the mean of a normal random variable, with the distance expressed in units of the standard deviation of the original normal random variable. Thus, the mean value of Z must be 0, and its standard deviation must equal 1. The proof that the standard normal random variable, Z, is normally distributed with mean 0 and standard deviation 1 is given in Chapter 6.

The applet Normal Probabilities, accessible at www.thomsonedu.com/statistics/wackerly, illustrates the correspondence between normal probabilities on the original and transformed (z) scales. To answer the question posed in Example 4.9, locate the interval of interest, (80, 90), on the lower horizontal axis labeled Y. The corresponding z-scores are given on the upper horizontal axis, and it is clear that the shaded area gives P(80 < Y < 90) = P(0.5 < Z < 1.5) = 0.2417 (see Figure 4.14). A few of the exercises at the end of this section suggest that you use this applet to reinforce the calculations of probabilities associated with normally distributed random variables.
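The standardization step used in Example 4.9 can be sketched in a few lines. The helper names z_score and prob_between are hypothetical, and statistics.NormalDist (Python 3.8 or later) stands in for Table 4; the point is only that working on the z scale and working on the original Y scale give the same area.

```python
from statistics import NormalDist


def z_score(y, mu, sigma):
    """Distance of y from the mean mu, in units of the standard deviation sigma."""
    return (y - mu) / sigma


def prob_between(a, b, mu, sigma):
    """P(a < Y < b) for Y normal(mu, sigma), computed on the standard normal scale."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(z_score(b, mu, sigma)) - standard.cdf(z_score(a, mu, sigma))


# Example 4.9: mean 75, standard deviation 10, scores between 80 and 90.
fraction = prob_between(80, 90, 75, 10)

# The same area computed directly on the original scale.
direct = NormalDist(75, 10).cdf(90) - NormalDist(75, 10).cdf(80)

print(round(fraction, 4), round(direct, 4))
```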

FIGURE 4.14 Required area for Example 4.9, using both the original and transformed (z) scales: P(80 < Y < 90) = P(0.50 < Z < 1.50) = 0.2417

Exercises

4.58

Use Table 4, Appendix 3, to find the following probabilities for a standard normal random variable Z:
a P(0 ≤ Z ≤ 1.2)
b P(−.9 ≤ Z ≤ 0)
c P(.3 ≤ Z ≤ 1.56)
d P(−.2 ≤ Z ≤ .2)
e P(−1.56 ≤ Z ≤ −.2)
f Applet Exercise Use the applet Normal Probabilities to obtain P(0 ≤ Z ≤ 1.2). Why are the values given on the two horizontal axes identical?

4.59

If Z is a standard normal random variable, find the value z0 such that
a P(Z > z0) = .5.
b P(Z < z0) = .8643.
c P(−z0 < Z < z0) = .90.
d P(−z0 < Z < z0) = .99.

4.60

A normally distributed random variable has density function

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}, \qquad -\infty < y < \infty.$$

Using the fundamental properties associated with any density function, argue that the parameter σ must be such that σ > 0.

4.61

What is the median of a normally distributed random variable with mean µ and standard deviation σ ?

4.62

If Z is a standard normal random variable, what is
a P(Z² < 1)?
b P(Z² < 3.84146)?

4.63

A company that manufactures and bottles apple juice uses a machine that automatically fills 16-ounce bottles. There is some variation, however, in the amounts of liquid dispensed into the bottles that are filled. The amount dispensed has been observed to be approximately normally distributed with mean 16 ounces and standard deviation 1 ounce.
a Use Table 4, Appendix 3, to determine the proportion of bottles that will have more than 17 ounces dispensed into them.
b Applet Exercise Use the applet Normal Probabilities to obtain the answer to part (a).

4.64

The weekly amount of money spent on maintenance and repairs by a company was observed, over a long period of time, to be approximately normally distributed with mean $400 and standard deviation $20. If $450 is budgeted for next week, what is the probability that the actual costs will exceed the budgeted amount? a Answer the question, using Table 4, Appendix 3. b Applet Exercise Use the applet Normal Probabilities to obtain the answer. c Why are the labeled values different on the two horizontal axes?

4.65

In Exercise 4.64, how much should be budgeted for weekly repairs and maintenance to provide that the probability the budgeted amount will be exceeded in a given week is only .1?

4.66

A machining operation produces bearings with diameters that are normally distributed with mean 3.0005 inches and standard deviation .0010 inch. Speciﬁcations require the bearing diameters to lie in the interval 3.000 ± .0020 inches. Those outside the interval are considered scrap and must be remachined. With the existing machine setting, what fraction of total production will be scrap? a Answer the question, using Table 4, Appendix 3. b Applet Exercise Obtain the answer, using the applet Normal Probabilities.

4.67

In Exercise 4.66, what should the mean diameter be in order that the fraction of bearings scrapped be minimized?

4.68

The grade point averages (GPAs) of a large population of college students are approximately normally distributed with mean 2.4 and standard deviation .8. What fraction of the students will possess a GPA in excess of 3.0? a Answer the question, using Table 4, Appendix 3. b Applet Exercise Obtain the answer, using the applet Normal Tail Areas and Quantiles.

4.69

Refer to Exercise 4.68. If students possessing a GPA less than 1.9 are dropped from college, what percentage of the students will be dropped?

4.70

Refer to Exercise 4.68. Suppose that three students are randomly selected from the student body. What is the probability that all three will possess a GPA in excess of 3.0?

4.71

Wires manufactured for use in a computer system are speciﬁed to have resistances between .12 and .14 ohms. The actual measured resistances of the wires produced by company A have a normal probability distribution with mean .13 ohm and standard deviation .005 ohm. a What is the probability that a randomly selected wire from company A’s production will meet the speciﬁcations? b If four of these wires are used in each computer system and all are selected from company A, what is the probability that all four in a randomly selected system will meet the speciﬁcations?

4.72

One method of arriving at economic forecasts is to use a consensus approach. A forecast is obtained from each of a large number of analysts; the average of these individual forecasts is the consensus forecast. Suppose that the individual 1996 January prime interest–rate forecasts of all economic analysts are approximately normally distributed with mean 7% and standard deviation 2.6%. If a single analyst is randomly selected from among this group, what is the probability that the analyst's forecast of the prime interest rate will
a exceed 11%?
b be less than 9%?

4.73

The width of bolts of fabric is normally distributed with mean 950 mm (millimeters) and standard deviation 10 mm. a What is the probability that a randomly chosen bolt has a width of between 947 and 958 mm? b What is the appropriate value for C such that a randomly chosen bolt has a width less than C with probability .8531?

4.74

Scores on an examination are assumed to be normally distributed with mean 78 and variance 36. a What is the probability that a person taking the examination scores higher than 72? b Suppose that students scoring in the top 10% of this distribution are to receive an A grade. What is the minimum score a student must achieve to earn an A grade? c What must be the cutoff point for passing the examination if the examiner wants only the top 28.1% of all scores to be passing? d Approximately what proportion of students have scores 5 or more points above the score that cuts off the lowest 25%? e Applet Exercise Answer parts (a)–(d), using the applet Normal Tail Areas and Quantiles. f If it is known that a student’s score exceeds 72, what is the probability that his or her score exceeds 84?

4.75

A soft-drink machine can be regulated so that it discharges an average of µ ounces per cup. If the ounces of ﬁll are normally distributed with standard deviation 0.3 ounce, give the setting for µ so that 8-ounce cups will overﬂow only 1% of the time.

4.76

The machine described in Exercise 4.75 has standard deviation σ that can be ﬁxed at certain levels by carefully adjusting the machine. What is the largest value of σ that will allow the actual amount dispensed to fall within 1 ounce of the mean with probability at least .95?

4.77

The SAT and ACT college entrance exams are taken by thousands of students each year. The mathematics portions of each of these exams produce scores that are approximately normally distributed. In recent years, SAT mathematics exam scores have averaged 480 with standard deviation 100. The average and standard deviation for ACT mathematics scores are 18 and 6, respectively.
a An engineering school sets 550 as the minimum SAT math score for new students. What percentage of students will score below 550 in a typical year?
b What score should the engineering school set as a comparable standard on the ACT math test?

4.78

Show that the maximum value of the normal density with parameters µ and σ is 1/(σ√(2π)) and occurs when y = µ.

4.79

Show that the normal density with parameters µ and σ has inﬂection points at the values µ − σ and µ + σ . (Recall that an inﬂection point is a point where the curve changes direction from concave up to concave down, or vice versa, and occurs when the second derivative changes sign. Such a change in sign may occur when the second derivative equals zero.)

4.80

Assume that Y is normally distributed with mean µ and standard deviation σ . After observing a value of Y , a mathematician constructs a rectangle with length L = |Y | and width W = 3|Y |. Let A denote the area of the resulting rectangle. What is E(A)?

4.6 The Gamma Probability Distribution

Some random variables are always nonnegative and for various reasons yield distributions of data that are skewed (nonsymmetric) to the right. That is, most of the area under the density function is located near the origin, and the density function drops gradually as y increases. A skewed probability density function is shown in Figure 4.15. The lengths of time between malfunctions for aircraft engines possess a skewed frequency distribution, as do the lengths of time between arrivals at a supermarket checkout queue (that is, the line at the checkout counter). Similarly, the lengths of time to complete a maintenance checkup for an automobile or aircraft engine possess a skewed frequency distribution. The populations associated with these random variables frequently possess density functions that are adequately modeled by a gamma density function.

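DEFINITION 4.9

A random variable Y is said to have a gamma distribution with parameters α > 0 and β > 0 if and only if the density function of Y is

$$f(y) = \begin{cases} \dfrac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, & 0 \le y < \infty, \\[2mm] 0, & \text{elsewhere,} \end{cases}$$

where

$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\, dy.$$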

The quantity Γ(α) is known as the gamma function. Direct integration will verify that Γ(1) = 1. Integration by parts will verify that Γ(α) = (α − 1)Γ(α − 1) for any α > 1 and that Γ(n) = (n − 1)!, provided that n is a positive integer. Graphs of gamma density functions for α = 1, 2, and 4 and β = 1 are given in Figure 4.16. Notice in Figure 4.16 that the shape of the gamma density differs for the different values of α. For this reason, α is sometimes called the shape parameter associated with a gamma distribution. The parameter β is generally called the scale parameter because multiplying a gamma-distributed random variable by a positive constant (and thereby changing the scale on which the measurement is made) produces a random variable that also has a gamma distribution with the same value of α (shape parameter) but with an altered value of β.

FIGURE 4.15 A skewed probability density function

FIGURE 4.16 Gamma density functions, β = 1 (curves shown for α = 1, α = 2, and α = 4)

In the special case when α is an integer, the distribution function of a gamma-distributed random variable can be expressed as a sum of certain Poisson probabilities. You will find this representation in Exercise 4.99. If α is not an integer and 0 < c < d < ∞, it is impossible to give a closed-form expression for

$$\int_c^d \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\, dy.$$

As a result, except when α = 1 (an exponential distribution), it is impossible to obtain areas under the gamma density function by direct integration. Tabulated values for integrals like the above are given in Tables of the Incomplete Gamma Function (Pearson 1965). By far the easiest way to compute probabilities associated with gamma-distributed random variables is to use available statistical software. If Y is a gamma-distributed random variable with parameters α and β, the R (or S-Plus) command pgamma(y0, α, 1/β) generates P(Y ≤ y0), whereas qgamma(p, α, 1/β) yields the pth quantile, the value φ_p such that P(Y ≤ φ_p) = p.

In addition, one of the applets, Gamma Probabilities and Quantiles, accessible at www.thomsonedu.com/statistics/wackerly, can be used to determine probabilities and quantiles associated with gamma-distributed random variables. Another applet at the Thomson website, Comparison of Gamma Density Functions, will permit you to visualize and compare gamma density functions with different values for α and/or β. These applets will be used to answer some of the exercises at the end of this section.

As indicated in the next theorem, the mean and variance of gamma-distributed random variables are easy to compute.
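To make the software route concrete without relying on R, the following Python sketch evaluates gamma probabilities and quantiles with the standard library only. It is an illustration under stated assumptions, not the method used by R or by the applets: the distribution function is computed from a standard power series for the incomplete gamma integral, the quantile is found by bisection, and the function names gamma_cdf, gamma_quantile, and poisson_upper_tail are hypothetical. Note that these functions take the scale parameter β directly, whereas the R commands quoted above take 1/β as their third argument.

```python
import math


def gamma_cdf(y0, alpha, beta, tol=1e-15, max_terms=10_000):
    """P(Y <= y0) for a gamma random variable with shape alpha and scale beta.

    Uses the series  sum_{n>=0} x^n / (alpha (alpha+1) ... (alpha+n))
    multiplied by x^alpha e^{-x} / Gamma(alpha), with x = y0 / beta.
    """
    if y0 <= 0.0:
        return 0.0
    x = y0 / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= x / (alpha + n)
        total += term
        if term < tol * total:
            break
    log_front = alpha * math.log(x) - x - math.lgamma(alpha)
    return min(1.0, total * math.exp(log_front))


def gamma_quantile(p, alpha, beta):
    """Value phi_p with P(Y <= phi_p) = p, for 0 < p < 1, found by bisection."""
    lo, hi = 0.0, beta * max(alpha, 1.0)
    while gamma_cdf(hi, alpha, beta) < p:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_upper_tail(y0, alpha, beta):
    """P(Y > y0) for integer alpha, written as a sum of Poisson probabilities."""
    lam = y0 / beta
    return sum(lam**x * math.exp(-lam) / math.factorial(x) for x in range(alpha))


# Two cross-checks against cases where a closed form is available.
print(gamma_cdf(2.4, 1.0, 2.4), 1.0 - math.exp(-1.0))            # alpha = 1: exponential
print(1.0 - gamma_cdf(4.0, 3, 2.0), poisson_upper_tail(4.0, 3, 2.0))  # integer alpha
```

The two printed pairs should agree to many decimal places, which gives some reassurance that the series is implemented correctly before it is trusted for non-integer α.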

THEOREM 4.8

If Y has a gamma distribution with parameters α and β, then µ = E(Y) = αβ and σ² = V(Y) = αβ².
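As a numerical illustration of Theorem 4.8 (not a substitute for its proof), one can simulate gamma observations and compare the sample mean and variance with αβ and αβ². The sketch below assumes Python's random.gammavariate, whose second argument is the scale β in the same parameterization as Definition 4.9; the values α = 3, β = 2, the sample size, and the seed are arbitrary choices made only for this example.

```python
import math
import random


def simulate_gamma_moments(alpha, beta, n=200_000, seed=12345):
    """Return the sample mean and (population-style) sample variance of n gamma draws."""
    rng = random.Random(seed)
    draws = [rng.gammavariate(alpha, beta) for _ in range(n)]
    mean = math.fsum(draws) / n
    var = math.fsum((d - mean) ** 2 for d in draws) / n
    return mean, var


alpha, beta = 3.0, 2.0
sample_mean, sample_var = simulate_gamma_moments(alpha, beta)

print("theory:    ", alpha * beta, alpha * beta**2)
print("simulation:", round(sample_mean, 3), round(sample_var, 3))
```

The simulated values fluctuate from seed to seed, so agreement is only approximate; with 200,000 draws the sample mean and variance should typically fall close to 6 and 12 for these parameter values.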

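Proof

$$E(Y) = \int_{-\infty}^{\infty} y f(y)\, dy = \int_0^{\infty} y \left[\frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\right] dy.$$

By definition, the gamma density function is such that

$$\int_0^{\infty} \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\, dy = 1.$$

Hence,

$$\int_0^{\infty} y^{\alpha-1} e^{-y/\beta}\, dy = \beta^{\alpha}\,\Gamma(\alpha),$$

and

$$E(Y) = \int_0^{\infty} \frac{y^{\alpha} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\, dy = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{\infty} y^{\alpha} e^{-y/\beta}\, dy = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\left[\beta^{\alpha+1}\,\Gamma(\alpha+1)\right] = \frac{\beta\alpha\,\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha\beta.$$

From Exercise 4.24, V(Y) = E[Y²] − [E(Y)]². Further,

$$E(Y^2) = \int_0^{\infty} y^2 \left[\frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}\right] dy = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_0^{\infty} y^{\alpha+1} e^{-y/\beta}\, dy = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\left[\beta^{\alpha+2}\,\Gamma(\alpha+2)\right] = \frac{\beta^{2}(\alpha+1)\alpha\,\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha(\alpha+1)\beta^{2}.$$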

Then V(Y) = E[Y²] − [E(Y)]² where, from the earlier part of the derivation, E(Y) = αβ. Substituting E[Y²] and E(Y) into the formula for V(Y), we obtain

$$V(Y) = \alpha(\alpha+1)\beta^{2} - (\alpha\beta)^{2} = \alpha^{2}\beta^{2} + \alpha\beta^{2} - \alpha^{2}\beta^{2} = \alpha\beta^{2}.$$

Two special cases of gamma-distributed random variables merit particular consideration.

DEFINITION 4.10

Let ν be a positive integer. A random variable Y is said to have a chi-square distribution with ν degrees of freedom if and only if Y is a gamma-distributed random variable with parameters α = ν/2 and β = 2.

A random variable with a chi-square distribution is called a chi-square (χ²) random variable. Such random variables occur often in statistical theory. The motivation behind calling the parameter ν the degrees of freedom of the χ² distribution rests on one of the major ways for generating a random variable with this distribution and is given in Theorem 6.4. The mean and variance of a χ² random variable follow directly from Theorem 4.8.

THEOREM 4.9

If Y is a chi-square random variable with ν degrees of freedom, then µ = E(Y) = ν and σ² = V(Y) = 2ν.

Proof

Apply Theorem 4.8 with α = ν/2 and β = 2.

Tables that give probabilities associated with χ² distributions are readily available in most statistics texts. Table 6, Appendix 3, gives percentage points associated with χ² distributions for many choices of ν. Tables of the general gamma distribution are not so readily available. However, we will show in Exercise 6.46 that if Y has a gamma distribution with α = n/2 for some integer n, then 2Y/β has a χ² distribution with n degrees of freedom. Hence, for example, if Y has a gamma distribution with α = 1.5 = 3/2 and β = 4, then 2Y/β = 2Y/4 = Y/2 has a χ² distribution with 3 degrees of freedom. Thus, P(Y < 3.5) = P([Y/2] < 1.75) can be found by using readily available tables of the χ² distribution.

The gamma density function in which α = 1 is called the exponential density function.
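The conversion between a gamma probability and a χ² probability described above, P(Y < 3.5) = P(Y/2 < 1.75), can be checked numerically. The sketch below is self-contained and uses only the Python standard library; gamma_cdf is a hypothetical helper based on a standard series for the incomplete gamma integral, and chi_square_cdf simply applies Definition 4.10 (α = ν/2, β = 2). For ν = 3 the result is also compared with a closed form in terms of the error function, which is available for this particular number of degrees of freedom.

```python
import math


def gamma_cdf(y0, alpha, beta, tol=1e-15, max_terms=10_000):
    """P(Y <= y0) for a gamma random variable with shape alpha and scale beta."""
    if y0 <= 0.0:
        return 0.0
    x = y0 / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= x / (alpha + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha)))


def chi_square_cdf(w0, nu):
    """P(W <= w0) for a chi-square variable: gamma with alpha = nu/2 and beta = 2."""
    return gamma_cdf(w0, nu / 2.0, 2.0)


p_gamma = gamma_cdf(3.5, 1.5, 4.0)    # Y gamma with alpha = 1.5, beta = 4
p_chi = chi_square_cdf(1.75, 3)       # W = Y/2, chi-square with 3 degrees of freedom

# Closed form for 3 degrees of freedom, used only as an independent check.
w = 1.75
p_closed = math.erf(math.sqrt(w / 2.0)) - math.sqrt(2.0 * w / math.pi) * math.exp(-w / 2.0)

print(p_gamma, p_chi, p_closed)
```

All three printed values should coincide up to rounding error, illustrating that a χ² table (or function) is enough to handle any gamma distribution with α = n/2.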

DEFINITION 4.11

A random variable Y is said to have an exponential distribution with parameter β > 0 if and only if the density function of Y is

$$f(y) = \begin{cases} \dfrac{1}{\beta}\, e^{-y/\beta}, & 0 \le y < \infty, \\[2mm] 0, & \text{elsewhere.} \end{cases}$$

The exponential density function is often useful for modeling the length of life of electronic components. Suppose that the length of time a component already has operated does not affect its chance of operating for at least b additional time units. That is, the probability that the component will operate for more than a + b time units, given that it has already operated for at least a time units, is the same as the probability that a new component will operate for at least b time units if the new component is put into service at time 0. A fuse is an example of a component for which this assumption often is reasonable. We will see in the next example that the exponential distribution provides a model for the distribution of the lifetime of such a component.

THEOREM 4.10

If Y is an exponential random variable with parameter β, then µ = E(Y) = β and σ² = V(Y) = β².

Proof

The proof follows directly from Theorem 4.8 with α = 1.

EXAMPLE 4.10

Suppose that Y has an exponential probability density function. Show that, if a > 0 and b > 0, P(Y > a + b|Y > a) = P(Y > b).

Solution

From the definition of conditional probability, we have that

$$P(Y > a + b \mid Y > a) = \frac{P(Y > a + b)}{P(Y > a)}$$

because the intersection of the events (Y > a + b) and (Y > a) is the event (Y > a + b). Now

$$P(Y > a + b) = \int_{a+b}^{\infty} \frac{1}{\beta}\, e^{-y/\beta}\, dy = \left. -e^{-y/\beta} \right]_{a+b}^{\infty} = e^{-(a+b)/\beta}.$$

Similarly,

$$P(Y > a) = \int_a^{\infty} \frac{1}{\beta}\, e^{-y/\beta}\, dy = e^{-a/\beta},$$

and

$$P(Y > a + b \mid Y > a) = \frac{e^{-(a+b)/\beta}}{e^{-a/\beta}} = e^{-b/\beta} = P(Y > b).$$

This property of the exponential distribution is often called the memoryless property of the distribution.
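The memoryless property derived in Example 4.10 is easy to confirm numerically. In the sketch below the function names exp_survival and conditional_survival are hypothetical, and the values of a, b, and β are arbitrary; the code simply evaluates both sides of P(Y > a + b | Y > a) = P(Y > b) using the survival function P(Y > y) = e^{−y/β}.

```python
import math


def exp_survival(y, beta):
    """P(Y > y) for an exponential random variable with mean beta."""
    return math.exp(-y / beta) if y > 0.0 else 1.0


def conditional_survival(a, b, beta):
    """P(Y > a + b | Y > a), computed from the definition of conditional probability."""
    return exp_survival(a + b, beta) / exp_survival(a, beta)


beta = 2.4
for a, b in [(0.5, 1.0), (3.0, 1.0), (10.0, 1.0)]:
    # The conditional probability does not depend on the elapsed time a.
    print(a, b, conditional_survival(a, b, beta), exp_survival(b, beta))
```

Each row prints the same pair of numbers regardless of a, which is exactly the statement that the time already survived carries no information about the remaining lifetime.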

You will recall from Chapter 3 that the geometric distribution, a discrete distribution, also had this memoryless property. An interesting relationship between the exponential and geometric distributions is given in Exercise 4.95.

Exercises

4.81

a If α > 0, Γ(α) is defined by $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\, dy$, show that Γ(1) = 1.
*b If α > 1, integrate by parts to prove that Γ(α) = (α − 1)Γ(α − 1).

4.82

Use the results obtained in Exercise 4.81 to prove that if n is a positive integer, then Γ(n) = (n − 1)!. What are the numerical values of Γ(2), Γ(4), and Γ(7)?

4.83

Applet Exercise Use the applet Comparison of Gamma Density Functions to obtain the results given in Figure 4.16.

4.84

Applet Exercise Refer to Exercise 4.83. Use the applet Comparison of Gamma Density Functions to compare gamma density functions with (α = 4, β = 1), (α = 40, β = 1), and (α = 80, β = 1). a What do you observe about the shapes of these three density functions? Which are less skewed and more symmetric? b What differences do you observe about the location of the centers of these density functions? c Give an explanation for what you observed in part (b).

4.85

Applet Exercise Use the applet Comparison of Gamma Density Functions to compare gamma density functions with (α = 1, β = 1), (α = 1, β = 2), and (α = 1, β = 4).
a What is another name for the density functions that you observed?
b Do these densities have the same general shape?
c The parameter β is a “scale” parameter. What do you observe about the “spread” of these three density functions?

4.86

Applet Exercise When we discussed the χ² distribution in this section, we presented (with justification to follow in Chapter 6) the fact that if Y is gamma distributed with α = n/2 for some integer n, then 2Y/β has a χ² distribution. In particular, it was stated that when α = 1.5 and β = 4, W = Y/2 has a χ² distribution with 3 degrees of freedom.
a Use the applet Gamma Probabilities and Quantiles to find P(Y < 3.5).
b Use the applet Gamma Probabilities and Quantiles to find P(W < 1.75). [Hint: Recall that the χ² distribution with ν degrees of freedom is just a gamma distribution with α = ν/2 and β = 2.]
c Compare your answers to parts (a) and (b).

4.87

Applet Exercise Let Y and W have the distributions given in Exercise 4.86.
a Use the applet Gamma Probabilities and Quantiles to find the .05-quantile of the distribution of Y.
b Use the applet Gamma Probabilities and Quantiles to find the .05-quantile of the χ² distribution with 3 degrees of freedom.
c What is the relationship between the .05-quantile of the gamma (α = 1.5, β = 4) distribution and the .05-quantile of the χ² distribution with 3 degrees of freedom? Explain this relationship.

4.88

The magnitude of earthquakes recorded in a region of North America can be modeled as having an exponential distribution with mean 2.4, as measured on the Richter scale. Find the probability that an earthquake striking this region will a exceed 3.0 on the Richter scale. b fall between 2.0 and 3.0 on the Richter scale.

4.89

The operator of a pumping station has observed that demand for water during early afternoon hours has an approximately exponential distribution with mean 100 cfs (cubic feet per second). a Find the probability that the demand will exceed 200 cfs during the early afternoon on a randomly selected day. b What water-pumping capacity should the station maintain during early afternoons so that the probability that demand will exceed capacity on a randomly selected day is only .01?

4.90

Refer to Exercise 4.88. Of the next ten earthquakes to strike this region, what is the probability that at least one will exceed 5.0 on the Richter scale?

4.91

If Y has an exponential distribution and P(Y > 2) = .0821, what is a β = E(Y )? b P(Y ≤ 1.7)?

4.92

The length of time Y necessary to complete a key operation in the construction of houses has an exponential distribution with mean 10 hours. The formula C = 100 + 40Y + 3Y² relates the cost C of completing this operation to the square of the time to completion. Find the mean and variance of C.

4.93

Historical evidence indicates that times between fatal accidents on scheduled American domestic passenger ﬂights have an approximately exponential distribution. Assume that the mean time between accidents is 44 days. a If one of the accidents occurred on July 1 of a randomly selected year in the study period, what is the probability that another accident occurred that same month? b What is the variance of the times between accidents?

4.94

One-hour carbon monoxide concentrations in air samples from a large city have an approximately exponential distribution with mean 3.6 ppm (parts per million). a Find the probability that the carbon monoxide concentration exceeds 9 ppm during a randomly selected one-hour period. b A trafﬁc-control strategy reduced the mean to 2.5 ppm. Now ﬁnd the probability that the concentration exceeds 9 ppm.

4.95

Let Y be an exponentially distributed random variable with mean β. Define a random variable X in the following way: X = k if k − 1 ≤ Y < k for k = 1, 2, . . . .
a Find P(X = k) for each k = 1, 2, . . . .
b Show that your answer to part (a) can be written as

$$P(X = k) = \left(e^{-1/\beta}\right)^{k-1}\left(1 - e^{-1/\beta}\right), \qquad k = 1, 2, \ldots$$

and that X has a geometric distribution with p = 1 − e^{−1/β}.

4.96

Suppose that a random variable Y has a probability density function given by

$$f(y) = \begin{cases} k y^{3} e^{-y/2}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the value of k that makes f(y) a density function.
b Does Y have a χ² distribution? If so, how many degrees of freedom?
c What are the mean and standard deviation of Y?
d Applet Exercise What is the probability that Y lies within 2 standard deviations of its mean?

4.97

A manufacturing plant uses a speciﬁc bulk product. The amount of product used in one day can be modeled by an exponential distribution with β = 4 (measurements in tons). Find the probability that the plant will use more than 4 tons on a given day.

4.98

Consider the plant of Exercise 4.97. How much of the bulk product should be stocked so that the plant’s chance of running out of the product is only .05?

4.99

If λ > 0 and α is a positive integer, the relationship between incomplete gamma integrals and sums of Poisson probabilities is given by

$$\frac{1}{\Gamma(\alpha)} \int_{\lambda}^{\infty} y^{\alpha-1} e^{-y}\, dy = \sum_{x=0}^{\alpha-1} \frac{\lambda^{x} e^{-\lambda}}{x!}.$$

a If Y has a gamma distribution with α = 2 and β = 1, find P(Y > 1) by using the preceding equality and Table 3 of Appendix 3.
b Applet Exercise If Y has a gamma distribution with α = 2 and β = 1, find P(Y > 1) by using the applet Gamma Probabilities.

*4.100

Let Y be a gamma-distributed random variable where α is a positive integer and β = 1. The result given in Exercise 4.99 implies that if y > 0,

$$\sum_{x=0}^{\alpha-1} \frac{y^{x} e^{-y}}{x!} = P(Y > y).$$

Suppose that X1 is Poisson distributed with mean λ1 and X2 is Poisson distributed with mean λ2, where λ2 > λ1.
a Show that P(X1 = 0) > P(X2 = 0).
b Let k be any fixed positive integer. Show that P(X1 ≤ k) = P(Y > λ1) and P(X2 ≤ k) = P(Y > λ2), where Y has a gamma distribution with α = k + 1 and β = 1.
c Let k be any fixed positive integer. Use the result derived in part (b) and the fact that λ2 > λ1 to show that P(X1 ≤ k) > P(X2 ≤ k).
d Because the result in part (c) is valid for any k = 1, 2, 3, . . . and part (a) is also valid, we have established that P(X1 ≤ k) > P(X2 ≤ k) for all k = 0, 1, 2, . . . . Interpret this result.

4.101

Applet Exercise Refer to Exercise 4.88. Suppose that the magnitude of earthquakes striking the region has a gamma distribution with α = .8 and β = 2.4.
a What is the mean magnitude of earthquakes striking the region?
b What is the probability that the magnitude of an earthquake striking the region will exceed 3.0 on the Richter scale?
c Compare your answers to Exercise 4.88(a). Which probability is larger? Explain.
d What is the probability that an earthquake striking the region will fall between 2.0 and 3.0 on the Richter scale?

4.102

Applet Exercise Refer to Exercise 4.97. Suppose that the amount of product used in one day has a gamma distribution with α = 1.5 and β = 3.
a Find the probability that the plant will use more than 4 tons on a given day.
b How much of the bulk product should be stocked so that the plant's chance of running out of the product is only .05?

4.103

Explosive devices used in mining operations produce nearly circular craters when detonated. The radii of these craters are exponentially distributed with mean 10 feet. Find the mean and variance of the areas produced by these explosive devices.

4.104

The lifetime (in hours) Y of an electronic component is a random variable with density function given by

$$f(y) = \begin{cases} \dfrac{1}{100}\, e^{-y/100}, & y > 0, \\[2mm] 0, & \text{elsewhere.} \end{cases}$$

Three of these components operate independently in a piece of equipment. The equipment fails if at least two of the components fail. Find the probability that the equipment will operate for at least 200 hours without failure.

4.105

Four-week summer rainfall totals in a section of the Midwest United States have approximately a gamma distribution with α = 1.6 and β = 2.0.
a Find the mean and variance of the four-week rainfall totals.
b Applet Exercise What is the probability that the four-week rainfall total exceeds 4 inches?

4.106

The response times on an online computer terminal have approximately a gamma distribution with mean four seconds and variance eight seconds².
a Write the probability density function for the response times.
b Applet Exercise What is the probability that the response time on the terminal is less than five seconds?

4.107

Refer to Exercise 4.106.
a Use Tchebysheff's theorem to give an interval that contains at least 75% of the response times.
b Applet Exercise What is the actual probability of observing a response time in the interval you obtained in part (a)?

4.108

Annual incomes for heads of household in a section of a city have approximately a gamma distribution with α = 20 and β = 1000. a Find the mean and variance of these incomes. b Would you expect to ﬁnd many incomes in excess of $30,000 in this section of the city? c Applet Exercise What proportion of heads of households in this section of the city have incomes in excess of $30,000?

4.109

The weekly amount of downtime Y (in hours) for an industrial machine has approximately a gamma distribution with α = 3 and β = 2. The loss L (in dollars) to the industrial operation as a result of this downtime is given by L = 30Y + 2Y². Find the expected value and variance of L.

4.110

If Y has a probability density function given by

$$f(y) = \begin{cases} 4 y^{2} e^{-2y}, & y > 0, \\ 0, & \text{elsewhere,} \end{cases}$$

obtain E(Y) and V(Y) by inspection.

4.111

Suppose that Y has a gamma distribution with parameters α and β.
a If a is any positive or negative value such that α + a > 0, show that

$$E(Y^{a}) = \frac{\beta^{a}\,\Gamma(\alpha + a)}{\Gamma(\alpha)}.$$

b Why did your answer in part (a) require that α + a > 0?
c Show that, with a = 1, the result in part (a) gives E(Y) = αβ.
d Use the result in part (a) to give an expression for E(√Y). What do you need to assume about α?
e Use the result in part (a) to give an expression for E(1/Y), E(1/√Y), and E(1/Y²). What do you need to assume about α in each case?

4.112

Suppose that Y has a χ² distribution with ν degrees of freedom. Use the results in Exercise 4.111 in your answers to the following. These results will be useful when we study the t and F distributions in Chapter 7.
a Give an expression for E(Y^a) if ν > −2a.
b Why did your answer in part (a) require that ν > −2a?
c Use the result in part (a) to give an expression for E(√Y). What do you need to assume about ν?
d Use the result in part (a) to give an expression for E(1/Y), E(1/√Y), and E(1/Y²). What do you need to assume about ν in each case?

4.7 The Beta Probability Distribution

The beta density function is a two-parameter density function defined over the closed interval 0 ≤ y ≤ 1. It is often used as a model for proportions, such as the proportion of impurities in a chemical product or the proportion of time that a machine is under repair.

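DEFINITION 4.12

A random variable Y is said to have a beta probability distribution with parameters α > 0 and β > 0 if and only if the density function of Y is

\[ f(y) = \begin{cases} \dfrac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha,\beta)}, & 0 \le y \le 1, \\ 0, & \text{elsewhere}, \end{cases} \]

where

\[ B(\alpha,\beta) = \int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \]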

The graphs of beta density functions assume widely differing shapes for various values of the two parameters α and β. Some of these are shown in Figure 4.17. Some of the exercises at the end of this section ask you to use the applet Comparison of Beta Density Functions accessible at www.thomsonedu.com/statistics/wackerly to explore and compare the shapes of more beta densities.

Notice that defining y over the interval 0 ≤ y ≤ 1 does not restrict the use of the beta distribution. If c ≤ y ≤ d, then y* = (y − c)/(d − c) defines a new variable such that 0 ≤ y* ≤ 1. Thus, the beta density function can be applied to a random variable defined on the interval c ≤ y ≤ d by translation and a change of scale.

The cumulative distribution function for the beta random variable is commonly called the incomplete beta function and is denoted by

\[ F(y) = \int_0^y \frac{t^{\alpha-1}(1-t)^{\beta-1}}{B(\alpha,\beta)}\,dt = I_y(\alpha,\beta). \]

[FIGURE 4.17 Beta density functions: f(y) plotted against y on 0 ≤ y ≤ 1 for (α = 5, β = 3), (α = 3, β = 3), and (α = 2, β = 2).]

A tabulation of $I_y(\alpha,\beta)$ is given in Tables of the Incomplete Beta Function (Pearson, 1968). When α and β are both positive integers, $I_y(\alpha,\beta)$ is related to the binomial probability function. Integration by parts can be used to show that for 0 < y < 1, and α and β both integers,

\[ F(y) = \int_0^y \frac{t^{\alpha-1}(1-t)^{\beta-1}}{B(\alpha,\beta)}\,dt = \sum_{i=\alpha}^{n} \binom{n}{i} y^i (1-y)^{n-i}, \]

where n = α + β − 1. Notice that the sum on the right-hand side of this expression is just the sum of probabilities associated with a binomial random variable with n = α + β − 1 and p = y. The binomial cumulative distribution function is presented in Table 1, Appendix 3, for n = 5, 10, 15, 20, and 25 and p = .01, .05, .10, .20, .30, .40, .50, .60, .70, .80, .90, .95, and .99. The most efficient way to obtain binomial probabilities is to use statistical software such as R or S-Plus (see Chapter 3).

An even easier way to find probabilities and quantiles associated with beta-distributed random variables is to use appropriate software directly. The Thomson website provides an applet, Beta Probabilities, that gives “upper-tail” probabilities [that is, P(Y > y0)] and quantiles associated with beta-distributed random variables. In addition, if Y is a beta-distributed random variable with parameters α and β, the R (or S-Plus) command pbeta(y0,α,β) generates P(Y ≤ y0), whereas qbeta(p,α,β) yields the pth quantile, the value of $\phi_p$ such that $P(Y \le \phi_p) = p$.
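The relationship between the beta distribution function and binomial sums can be checked numerically. The following is a minimal Python sketch, not part of the text: the function names `beta_cdf_binomial` and `beta_cdf_numeric` are illustrative, the midpoint rule is just one convenient way to integrate, and the example assumes integer α and β.

```python
import math

def beta_cdf_binomial(y, alpha, beta):
    """F(y) for positive integer alpha and beta, using the binomial sum
    with n = alpha + beta - 1 and p = y."""
    n = alpha + beta - 1
    return sum(math.comb(n, i) * y**i * (1 - y)**(n - i)
               for i in range(alpha, n + 1))

def beta_cdf_numeric(y, alpha, beta, steps=20000):
    """F(y) by midpoint-rule integration of the beta density over (0, y)."""
    b_const = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    h = y / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h  # midpoint of the j-th subinterval
        total += t**(alpha - 1) * (1 - t)**(beta - 1)
    return total * h / b_const

if __name__ == "__main__":
    # Both evaluations of F(0.5) for alpha = 2, beta = 3 should agree.
    print(beta_cdf_binomial(0.5, 2, 3))
    print(beta_cdf_numeric(0.5, 2, 3))
```

For α = 2 and β = 3 the binomial sum has n = 4 and reduces to (6 + 4 + 1)/16, so the two functions should agree closely at y = 0.5.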

THEOREM 4.11

If Y is a beta-distributed random variable with parameters α > 0 and β > 0, then

\[ \mu = E(Y) = \frac{\alpha}{\alpha+\beta} \quad\text{and}\quad \sigma^2 = V(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}. \]


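Proof

By definition,

\[
\begin{aligned}
E(Y) &= \int_{-\infty}^{\infty} y f(y)\,dy \\
&= \int_0^1 y\,\frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha,\beta)}\,dy \\
&= \frac{1}{B(\alpha,\beta)} \int_0^1 y^{\alpha}(1-y)^{\beta-1}\,dy \\
&= \frac{B(\alpha+1,\beta)}{B(\alpha,\beta)} \qquad \text{(because } \alpha > 0 \text{ implies that } \alpha + 1 > 0\text{)} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \times \frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \times \frac{\alpha\,\Gamma(\alpha)\Gamma(\beta)}{(\alpha+\beta)\,\Gamma(\alpha+\beta)} = \frac{\alpha}{\alpha+\beta}.
\end{aligned}
\]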

The derivation of the variance is left to the reader (see Exercise 4.130).
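As a numerical sanity check on Theorem 4.11, the mean and variance can be approximated by integrating the density directly. This is only a sketch under stated assumptions: the function names are illustrative, the midpoint rule is one of many possible quadrature choices, and the example assumes α ≥ 1 and β ≥ 1 so that the density stays bounded on (0, 1).

```python
import math

def beta_moments_numeric(alpha, beta, steps=20000):
    """Approximate (mean, variance) of a beta(alpha, beta) variable by
    midpoint-rule integration of y*f(y) and y^2*f(y) over (0, 1)."""
    b_const = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    h = 1.0 / steps
    m1 = 0.0  # running approximation of E(Y)
    m2 = 0.0  # running approximation of E(Y^2)
    for j in range(steps):
        y = (j + 0.5) * h
        weight = y**(alpha - 1) * (1 - y)**(beta - 1) / b_const * h
        m1 += y * weight
        m2 += y * y * weight
    return m1, m2 - m1**2

def beta_moments_formula(alpha, beta):
    """Mean and variance as stated in Theorem 4.11."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s**2 * (s + 1))

if __name__ == "__main__":
    print(beta_moments_numeric(4, 2))
    print(beta_moments_formula(4, 2))
```

The two printed pairs should agree to several decimal places; the numerical version is a check, not a substitute for the derivation requested in Exercise 4.130.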

We will see in the next example that the beta density function can be integrated directly in the case when α and β are both integers.

EXAMPLE 4.11

A gasoline wholesale distributor has bulk storage tanks that hold fixed supplies and are filled every Monday. Of interest to the wholesaler is the proportion of this supply that is sold during the week. Over many weeks of observation, the distributor found that this proportion could be modeled by a beta distribution with α = 4 and β = 2. Find the probability that the wholesaler will sell at least 90% of her stock in a given week.

Solution

If Y denotes the proportion sold during the week, then

\[ f(y) = \begin{cases} \dfrac{\Gamma(4+2)}{\Gamma(4)\Gamma(2)}\,y^3(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere}, \end{cases} \]

and

\[
\begin{aligned}
P(Y > .9) &= \int_{.9}^{\infty} f(y)\,dy = \int_{.9}^{1} 20(y^3 - y^4)\,dy \\
&= 20\left\{ \left.\frac{y^4}{4}\right|_{.9}^{1} - \left.\frac{y^5}{5}\right|_{.9}^{1} \right\} = 20(.004) = .08.
\end{aligned}
\]

It is not very likely that 90% of the stock will be sold in a given week.
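The arithmetic in Example 4.11 can be reproduced in two ways: from the antiderivative of 20(y³ − y⁴), and from the binomial-sum form of the beta distribution function with n = α + β − 1 = 5. The sketch below is illustrative only; the function names are assumptions introduced for this check.

```python
import math

def upper_tail_by_antiderivative(y0):
    """P(Y > y0) for alpha = 4, beta = 2, using the antiderivative
    20*(y^4/4 - y^5/5) of the density 20*(y^3 - y^4)."""
    def antiderivative(y):
        return 20.0 * (y**4 / 4.0 - y**5 / 5.0)
    return antiderivative(1.0) - antiderivative(y0)

def upper_tail_by_binomial(y0, alpha=4, beta=2):
    """P(Y > y0) = 1 - F(y0), with F(y0) written as a binomial sum."""
    n = alpha + beta - 1
    cdf = sum(math.comb(n, i) * y0**i * (1 - y0)**(n - i)
              for i in range(alpha, n + 1))
    return 1.0 - cdf

if __name__ == "__main__":
    print(upper_tail_by_antiderivative(0.9))
    print(upper_tail_by_binomial(0.9))
```

Both routes give a value that rounds to .08, matching the example.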


Exercises

4.113

Applet Exercise Use the applet Comparison of Beta Density Functions to obtain the results given in Figure 4.17.

4.114

Applet Exercise Refer to Exercise 4.113. Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = 1, β = 1), (α = 1, β = 2), and (α = 2, β = 1).
a What have we previously called the beta distribution with (α = 1, β = 1)?
b Which of these beta densities is symmetric?
c Which of these beta densities is skewed right?
d Which of these beta densities is skewed left?
*e In Chapter 6 we will see that if Y is beta distributed with parameters α and β, then Y* = 1 − Y has a beta distribution with parameters α* = β and β* = α. Does this explain the differences in the graphs of the beta densities?

4.115

Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = 2, β = 2), (α = 3, β = 3), and (α = 9, β = 9).
a What are the means associated with random variables with each of these beta distributions?
b What is similar about these densities?
c How do these densities differ? In particular, what do you observe about the “spread” of these three density functions?
d Calculate the standard deviations associated with random variables with each of these beta densities. Do the values of these standard deviations explain what you observed in part (c)? Explain.
e Graph some more beta densities with α = β. What do you conjecture about the shape of beta densities with α = β?

4.116

Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = 1.5, β = 7), (α = 2.5, β = 7), and (α = 3.5, β = 7). a Are these densities symmetric? Skewed left? Skewed right? b What do you observe as the value of α gets closer to 7? c Graph some more beta densities with α > 1, β > 1, and α < β. What do you conjecture about the shape of beta densities when both α > 1, β > 1, and α < β?

4.117

Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = 9, β = 7), (α = 10, β = 7), and (α = 12, β = 7). a Are these densities symmetric? Skewed left? Skewed right? b What do you observe as the value of α gets closer to 12? c Graph some more beta densities with α > 1, β > 1, and α > β. What do you conjecture about the shape of beta densities with α > β and both α > 1 and β > 1?

4.118

Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = .3, β = 4), (α = .3, β = 7), and (α = .3, β = 12). a Are these densities symmetric? Skewed left? Skewed right? b What do you observe as the value of β gets closer to 12?


c Which of these beta distributions gives the highest probability of observing a value larger than 0.2? d Graph some more beta densities with α < 1 and β > 1. What do you conjecture about the shape of beta densities with α < 1 and β > 1?

4.119

Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = 4, β = 0.3), (α = 7, β = 0.3), and (α = 12, β = 0.3). a Are these densities symmetric? Skewed left? Skewed right? b What do you observe as the value of α gets closer to 12? c Which of these beta distributions gives the highest probability of observing a value less than 0.8? d Graph some more beta densities with α > 1 and β < 1. What do you conjecture about the shape of beta densities with α > 1 and β < 1?

*4.120

In Chapter 6 we will see that if Y is beta distributed with parameters α and β, then Y ∗ = 1 − Y has a beta distribution with parameters α ∗ = β and β ∗ = α. Does this explain the differences and similarities in the graphs of the beta densities in Exercises 4.118 and 4.119?

4.121

Applet Exercise Use the applet Comparison of Beta Density Functions to compare beta density functions with (α = 0.5, β = 0.7), (α = 0.7, β = 0.7), and (α = 0.9, β = 0.7).
a What is the general shape of these densities?
b What do you observe as the value of α gets larger?

4.122

Applet Exercise Beta densities with α < 1 and β < 1 are difficult to display because of scaling/resolution problems.
a Use the applet Beta Probabilities and Quantiles to compute P(Y > 0.1) if Y has a beta distribution with (α = 0.1, β = 2).
b Use the applet Beta Probabilities and Quantiles to compute P(Y < 0.1) if Y has a beta distribution with (α = 0.1, β = 2).
c Based on your answer to part (b), which values of Y are assigned high probabilities if Y has a beta distribution with (α = 0.1, β = 2)?
d Use the applet Beta Probabilities and Quantiles to compute P(Y < 0.1) if Y has a beta distribution with (α = 0.1, β = 0.2).
e Use the applet Beta Probabilities and Quantiles to compute P(Y > 0.9) if Y has a beta distribution with (α = 0.1, β = 0.2).
f Use the applet Beta Probabilities and Quantiles to compute P(0.1 < Y < 0.9) if Y has a beta distribution with (α = 0.1, β = 0.2).
g Based on your answers to parts (d), (e), and (f), which values of Y are assigned high probabilities if Y has a beta distribution with (α = 0.1, β = 0.2)?

4.123

The relative humidity Y, when measured at a location, has a probability density function given by

\[ f(y) = \begin{cases} k y^3 (1-y)^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

a Find the value of k that makes f (y) a density function. b Applet Exercise Use the applet Beta Probabilities and Quantiles to ﬁnd a humidity value that is exceeded only 5% of the time.

4.124

The percentage of impurities per batch in a chemical product is a random variable Y with density function

\[ f(y) = \begin{cases} 12y^2(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

A batch with more than 40% impurities cannot be sold.
a Integrate the density directly to determine the probability that a randomly selected batch cannot be sold because of excessive impurities.
b Applet Exercise Use the applet Beta Probabilities and Quantiles to find the answer to part (a).

4.125

Refer to Exercise 4.124. Find the mean and variance of the percentage of impurities in a randomly selected batch of the chemical.

4.126

The weekly repair cost Y for a machine has a probability density function given by

\[ f(y) = \begin{cases} 3(1-y)^2, & 0 < y < 1, \\ 0, & \text{elsewhere}, \end{cases} \]

with measurements in hundreds of dollars. How much money should be budgeted each week for repair costs so that the actual cost will exceed the budgeted amount only 10% of the time?

4.127

Verify that if Y has a beta distribution with α = β = 1, then Y has a uniform distribution over (0, 1). That is, the uniform distribution over the interval (0, 1) is a special case of a beta distribution.

4.128

Suppose that a random variable Y has a probability density function given by

\[ f(y) = \begin{cases} 6y(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

a Find F(y).
b Graph F(y) and f(y).
c Find P(.5 ≤ Y ≤ .8).

4.129

During an eight-hour shift, the proportion of time Y that a sheet-metal stamping machine is down for maintenance or repairs has a beta distribution with α = 1 and β = 2. That is,

\[ f(y) = \begin{cases} 2(1-y), & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

The cost (in hundreds of dollars) of this downtime, due to lost production and cost of maintenance and repair, is given by $C = 10 + 20Y + 4Y^2$. Find the mean and variance of C.

4.130

Prove that the variance of a beta-distributed random variable with parameters α and β is

\[ \sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}. \]

4.131

Errors in measuring the time of arrival of a wave front from an acoustic source sometimes have an approximate beta distribution. Suppose that these errors, measured in microseconds, have approximately a beta distribution with α = 1 and β = 2.
a What is the probability that the measurement error in a randomly selected instance is less than .5 µs?
b Give the mean and standard deviation of the measurement errors.


4.132

Proper blending of ﬁne and coarse powders prior to copper sintering is essential for uniformity in the ﬁnished product. One way to check the homogeneity of the blend is to select many small samples of the blended powders and measure the proportion of the total weight contributed by the ﬁne powders in each. These measurements should be relatively stable if a homogeneous blend has been obtained. a Suppose that the proportion of total weight contributed by the ﬁne powders has a beta distribution with α = β = 3. Find the mean and variance of the proportion of weight contributed by the ﬁne powders. b Repeat part (a) if α = β = 2. c Repeat part (a) if α = β = 1. d Which of the cases—parts (a), (b), or (c)—yields the most homogeneous blending?

4.133

The proportion of time per day that all checkout counters in a supermarket are busy is a random variable Y with a density function given by

\[ f(y) = \begin{cases} c y^2 (1-y)^4, & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases} \]

a Find the value of c that makes f(y) a probability density function.
b Find E(Y). (Use what you have learned about the beta-type distribution. Compare your answers to those obtained in Exercise 4.28.)
c Calculate the standard deviation of Y.
d Applet Exercise Use the applet Beta Probabilities and Quantiles to find P(Y > µ + 2σ).

4.134

In the text of this section, we noted the relationship between the distribution function of a beta-distributed random variable and sums of binomial probabilities. Specifically, if Y has a beta distribution with positive integer values for α and β and 0 < y < 1,

\[ F(y) = \int_0^y \frac{t^{\alpha-1}(1-t)^{\beta-1}}{B(\alpha,\beta)}\,dt = \sum_{i=\alpha}^{n} \binom{n}{i} y^i (1-y)^{n-i}, \]

where n = α + β − 1. a If Y has a beta distribution with α = 4 and β = 7, use the appropriate binomial tables to ﬁnd P(Y ≤ .7) = F(.7). b If Y has a beta distribution with α = 12 and β = 14, use the appropriate binomial tables to ﬁnd P(Y ≤ .6) = F(.6). c Applet Exercise Use the applet Beta Probabilities and Quantiles to ﬁnd the probabilities in parts (a) and (b).

*4.135

Suppose that Y1 and Y2 are binomial random variables with parameters (n, p1) and (n, p2), respectively, where p1 < p2. (Note that the parameter n is the same for the two variables.)
a Use the binomial formula to deduce that P(Y1 = 0) > P(Y2 = 0).
b Use the relationship between the beta distribution function and sums of binomial probabilities given in Exercise 4.134 to deduce that, if k is an integer between 1 and n − 1,

\[ P(Y_1 \le k) = \sum_{i=0}^{k} \binom{n}{i} p_1^i (1-p_1)^{n-i} = \int_{p_1}^{1} \frac{t^k (1-t)^{n-k-1}}{B(k+1,\,n-k)}\,dt. \]

c If k is an integer between 1 and n − 1, the same argument used in part (b) yields that

\[ P(Y_2 \le k) = \sum_{i=0}^{k} \binom{n}{i} p_2^i (1-p_2)^{n-i} = \int_{p_2}^{1} \frac{t^k (1-t)^{n-k-1}}{B(k+1,\,n-k)}\,dt. \]

Show that, if k is any integer between 1 and n − 1, P(Y1 ≤ k) > P(Y2 ≤ k). Interpret this result.

4.8 Some General Comments

Keep in mind that density functions are theoretical models for populations of real data that occur in random phenomena. How do we know which model to use? How much does it matter if we use the wrong density as our model for reality?

To answer the latter question first, we are unlikely ever to select a density function that provides a perfect representation of nature; but goodness of fit is not the criterion for assessing the adequacy of our model. The purpose of a probabilistic model is to provide the mechanism for making inferences about a population based on information contained in a sample. The probability of the observed sample (or a quantity proportional to it) is instrumental in making an inference about the population. It follows that a density function that provides a poor fit to the population frequency distribution could (but does not necessarily) yield incorrect probability statements and lead to erroneous inferences about the population. A good model is one that yields good inferences about the population of interest.

Selecting a reasonable model is sometimes a matter of acting on theoretical considerations. Often, for example, a situation in which the discrete Poisson random variable is appropriate is indicated by the random behavior of events in time. Knowing this, we can show that the length of time between any adjacent pair of events follows an exponential distribution. Similarly, if a and b are integers, a < b, then the length of time between the occurrences of the ath and bth events possesses a gamma distribution with α = b − a. We will later encounter a theorem (called the central limit theorem) that outlines some conditions that imply that a normal distribution would be a suitable approximation for the distribution of data.
A second way to select a model is to form a frequency histogram (Chapter 1) for data drawn from the population and to choose a density function that would visually appear to give a similar frequency curve. For example, if a set of n = 100 sample measurements yielded a bell-shaped frequency distribution, we might conclude that the normal density function would adequately model the population frequency distribution. Not all model selection is completely subjective. Statistical procedures are available to test a hypothesis that a population frequency distribution is of a particular type. We can also calculate a measure of goodness of ﬁt for several distributions and select the best. Studies of many common inferential methods have been made to determine the magnitude of the errors of inference introduced by incorrect population models. It is comforting to know that many statistical methods of inference are insensitive to assumptions about the form of the underlying population frequency distribution.


The uniform, normal, gamma, and beta distributions offer an assortment of density functions that ﬁt many population frequency distributions. Another, the Weibull distribution, appears in the exercises at the end of the chapter.

4.9 Other Expected Values

Moments for continuous random variables have definitions analogous to those given for the discrete case.

DEFINITION 4.13

If Y is a continuous random variable, then the kth moment about the origin is given by

\[ \mu'_k = E(Y^k), \qquad k = 1, 2, \ldots. \]

The kth moment about the mean, or the kth central moment, is given by

\[ \mu_k = E[(Y-\mu)^k], \qquad k = 1, 2, \ldots. \]

Notice that for k = 1, $\mu'_1 = \mu$, and for k = 2, $\mu_2 = V(Y) = \sigma^2$.

EXAMPLE 4.12

Find $\mu'_k$ for the uniform random variable with θ1 = 0 and θ2 = θ.

Solution

By definition,

\[ \mu'_k = E(Y^k) = \int_{-\infty}^{\infty} y^k f(y)\,dy = \int_0^{\theta} y^k\,\frac{1}{\theta}\,dy = \left.\frac{y^{k+1}}{\theta(k+1)}\right|_0^{\theta} = \frac{\theta^k}{k+1}. \]

Thus,

\[ \mu'_1 = \mu = \frac{\theta}{2}, \qquad \mu'_2 = \frac{\theta^2}{3}, \qquad \mu'_3 = \frac{\theta^3}{4}, \]

and so on.
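The closed form in Example 4.12 is easy to confirm numerically. The sketch below, with illustrative function names that are not from the text, approximates the kth moment about the origin of a uniform (0, θ) variable by a midpoint-rule sum and compares it with θ^k/(k + 1).

```python
def uniform_raw_moment_numeric(k, theta, steps=100000):
    """Approximate E(Y^k) for Y uniform on (0, theta) by integrating
    y^k * (1/theta) with the midpoint rule."""
    h = theta / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        total += y**k / theta
    return total * h

def uniform_raw_moment_formula(k, theta):
    """Closed form theta^k / (k + 1) from Example 4.12."""
    return theta**k / (k + 1)

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, uniform_raw_moment_numeric(k, 2.0),
              uniform_raw_moment_formula(k, 2.0))
```

With θ = 2 the formula gives 1, 4/3, and 2 for k = 1, 2, 3, and the numerical values should agree to several decimal places.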

DEFINITION 4.14

If Y is a continuous random variable, then the moment-generating function of Y is given by

\[ m(t) = E(e^{tY}). \]

The moment-generating function is said to exist if there exists a constant b > 0 such that m(t) is finite for |t| ≤ b.

This is simply the continuous analogue of Definition 3.14. That m(t) generates moments is established in exactly the same manner as in Section 3.9. If m(t) exists, then

\[
\begin{aligned}
E(e^{tY}) &= \int_{-\infty}^{\infty} e^{ty} f(y)\,dy = \int_{-\infty}^{\infty} \left(1 + ty + \frac{t^2y^2}{2!} + \frac{t^3y^3}{3!} + \cdots\right) f(y)\,dy \\
&= \int_{-\infty}^{\infty} f(y)\,dy + t\int_{-\infty}^{\infty} y f(y)\,dy + \frac{t^2}{2!}\int_{-\infty}^{\infty} y^2 f(y)\,dy + \cdots \\
&= 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \cdots.
\end{aligned}
\]

Notice that the moment-generating function,

\[ m(t) = 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \cdots, \]

takes the same form for both discrete and continuous random variables. Hence, Theorem 3.12 holds for continuous random variables, and

\[ \left.\frac{d^k m(t)}{dt^k}\right|_{t=0} = \mu'_k. \]

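EXAMPLE 4.13

Find the moment-generating function for a gamma-distributed random variable.

Solution

\[
\begin{aligned}
m(t) = E(e^{tY}) &= \int_0^{\infty} e^{ty}\,\frac{y^{\alpha-1}e^{-y/\beta}}{\beta^{\alpha}\Gamma(\alpha)}\,dy \\
&= \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \int_0^{\infty} y^{\alpha-1} \exp\left[-y\left(\frac{1}{\beta} - t\right)\right] dy \\
&= \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \int_0^{\infty} y^{\alpha-1} \exp\left[\frac{-y}{\beta/(1-\beta t)}\right] dy.
\end{aligned}
\]

[The term exp(·) is simply a more convenient way to write $e^{(\cdot)}$ when the term in the exponent is long or complex.]

To complete the integration, notice that the integral of the variable factor of any density function must equal the reciprocal of the constant factor. That is, if f(y) = cg(y), where c is a constant, then

\[ \int_{-\infty}^{\infty} f(y)\,dy = \int_{-\infty}^{\infty} c\,g(y)\,dy = 1 \quad\text{and so}\quad \int_{-\infty}^{\infty} g(y)\,dy = \frac{1}{c}. \]

Applying this result to the integral in m(t) and noting that if [β/(1 − βt)] > 0 (or, equivalently, if t < 1/β), $g(y) = y^{\alpha-1} \exp\{-y/[\beta/(1-\beta t)]\}$ is the variable factor of a gamma density function with parameters α > 0 and [β/(1 − βt)] > 0, we obtain

\[ m(t) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \left[\left(\frac{\beta}{1-\beta t}\right)^{\alpha} \Gamma(\alpha)\right] = \frac{1}{(1-\beta t)^{\alpha}} \qquad \text{for } t < \frac{1}{\beta}. \]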
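The result of Example 4.13 can be checked by integrating $e^{ty} f(y)$ numerically and comparing with $(1-\beta t)^{-\alpha}$. The following sketch is illustrative only: the function names, the truncation point `upper`, and the midpoint rule are assumptions made for the check, and it assumes α ≥ 1 so that the integrand is bounded near zero.

```python
import math

def gamma_mgf_formula(t, alpha, beta):
    """m(t) = (1 - beta*t)^(-alpha), valid only for t < 1/beta."""
    if t >= 1.0 / beta:
        raise ValueError("m(t) exists only for t < 1/beta")
    return (1.0 - beta * t) ** (-alpha)

def gamma_mgf_numeric(t, alpha, beta, upper=200.0, steps=200000):
    """Midpoint-rule approximation of the integral of e^{ty} f(y) over
    (0, upper); 'upper' truncates the infinite range of integration."""
    const = beta**alpha * math.gamma(alpha)
    h = upper / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        # Combine the two exponentials to avoid needless overflow.
        total += y**(alpha - 1) * math.exp(t * y - y / beta)
    return total * h / const

if __name__ == "__main__":
    print(gamma_mgf_formula(0.3, 2, 1))
    print(gamma_mgf_numeric(0.3, 2, 1))
```

For α = 2, β = 1, and t = 0.3, both values should be close to 1/(0.7)², and the formula raises an error once t reaches 1/β, where the integral no longer converges.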


The moments $\mu'_k$ can be extracted from the moment-generating function by differentiating with respect to t (in accordance with Theorem 3.12) or by expanding the function into a power series in t. We will demonstrate the latter approach.

EXAMPLE 4.14

Expand the moment-generating function of Example 4.13 into a power series in t and thereby obtain $\mu'_k$.

Solution

From Example 4.13, $m(t) = 1/(1-\beta t)^{\alpha} = (1-\beta t)^{-\alpha}$. Using the expansion for a binomial term of the form $(x+y)^{-c}$, we have

\[
\begin{aligned}
m(t) = (1-\beta t)^{-\alpha} &= 1 + (-\alpha)(1)^{-\alpha-1}(-\beta t) + \frac{(-\alpha)(-\alpha-1)(1)^{-\alpha-2}(-\beta t)^2}{2!} + \cdots \\
&= 1 + t(\alpha\beta) + \frac{t^2[\alpha(\alpha+1)\beta^2]}{2!} + \frac{t^3[\alpha(\alpha+1)(\alpha+2)\beta^3]}{3!} + \cdots.
\end{aligned}
\]

Because $\mu'_k$ is the coefficient of $t^k/k!$, we find, by inspection,

\[ \mu'_1 = \mu = \alpha\beta, \qquad \mu'_2 = \alpha(\alpha+1)\beta^2, \qquad \mu'_3 = \alpha(\alpha+1)(\alpha+2)\beta^3, \]

and, in general, $\mu'_k = \alpha(\alpha+1)(\alpha+2)\cdots(\alpha+k-1)\beta^k$. Notice that $\mu'_1$ and $\mu'_2$ agree with the results of Theorem 4.8. Moreover, these results agree with the result of Exercise 4.111(a).
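The general formula for the gamma moments can be compared with the gamma-function form $\beta^k\,\Gamma(\alpha+k)/\Gamma(\alpha)$ stated in Exercise 4.111(a). This is a small sketch with illustrative function names; it simply evaluates both expressions.

```python
import math

def gamma_raw_moment(k, alpha, beta):
    """k-th moment about the origin from Example 4.14:
    alpha*(alpha+1)*...*(alpha+k-1) * beta^k."""
    product = 1.0
    for j in range(k):
        product *= alpha + j
    return product * beta**k

def gamma_raw_moment_via_gamma_function(k, alpha, beta):
    """The same moment written as beta^k * Gamma(alpha+k) / Gamma(alpha)."""
    return beta**k * math.gamma(alpha + k) / math.gamma(alpha)

if __name__ == "__main__":
    alpha, beta = 3, 2
    m1 = gamma_raw_moment(1, alpha, beta)
    m2 = gamma_raw_moment(2, alpha, beta)
    print(m1, m2, m2 - m1**2)  # mean, second moment, variance
```

For α = 3 and β = 2 the first two moments are 6 and 48, so the variance is 48 − 36 = 12, which equals αβ² as Theorem 4.8 requires.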

We have already explained the importance of the expected values of $Y^k$, $(Y-\mu)^k$, and $e^{tY}$, all of which provide important information about the distribution of Y. Sometimes, however, we are interested in the expected value of a function of a random variable as an end in itself. (We also may be interested in the probability distribution of functions of random variables, but we defer discussion of this topic until Chapter 6.)

EXAMPLE 4.15

The kinetic energy k associated with a mass m moving at velocity v is given by the expression

\[ k = \frac{mv^2}{2}. \]

Consider a device that fires a serrated nail into concrete at a mean velocity of 2000 feet per second, where the random velocity V possesses a density function given by

\[ f(v) = \frac{v^3 e^{-v/500}}{(500)^4\,\Gamma(4)}, \qquad v \ge 0. \]

Find the expected kinetic energy associated with a nail of mass m.

Solution

Let K denote the random kinetic energy associated with the nail. Then

\[ E(K) = E\left(\frac{mV^2}{2}\right) = \frac{m}{2}E(V^2), \]

by Theorem 4.5, part 2. The random variable V has a gamma distribution with α = 4 and β = 500. Therefore, $E(V^2) = \mu'_2$ for the random variable V. Referring to Example 4.14, we have

\[ \mu'_2 = \alpha(\alpha+1)\beta^2 = 4(5)(500)^2 = 5{,}000{,}000. \]

Therefore,

\[ E(K) = \frac{m}{2}E(V^2) = \frac{m}{2}(5{,}000{,}000) = 2{,}500{,}000\,m. \]

Finding the moments of a function of a random variable is frequently facilitated by using its moment-generating function.

THEOREM 4.12

Let Y be a random variable with density function f(y) and g(Y) be a function of Y. Then the moment-generating function for g(Y) is

\[ E[e^{tg(Y)}] = \int_{-\infty}^{\infty} e^{tg(y)} f(y)\,dy. \]

This theorem follows directly from Definition 4.14 and Theorem 4.4.

EXAMPLE 4.16

Let g(Y) = Y − µ, where Y is a normally distributed random variable with mean µ and variance σ². Find the moment-generating function for g(Y).

Solution

The moment-generating function of g(Y) is given by

\[ m(t) = E[e^{tg(Y)}] = E[e^{t(Y-\mu)}] = \int_{-\infty}^{\infty} e^{t(y-\mu)}\,\frac{\exp[-(y-\mu)^2/2\sigma^2]}{\sigma\sqrt{2\pi}}\,dy. \]

To integrate, let u = y − µ. Then du = dy and

\[
\begin{aligned}
m(t) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tu} e^{-u^2/(2\sigma^2)}\,du \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma^2}(u^2 - 2\sigma^2 tu)\right] du.
\end{aligned}
\]

Complete the square in the exponent of e by multiplying and dividing by $e^{t^2\sigma^2/2}$. Then

\[
\begin{aligned}
m(t) &= e^{t^2\sigma^2/2} \int_{-\infty}^{\infty} \frac{\exp[-(1/2\sigma^2)(u^2 - 2\sigma^2 tu + \sigma^4 t^2)]}{\sigma\sqrt{2\pi}}\,du \\
&= e^{t^2\sigma^2/2} \int_{-\infty}^{\infty} \frac{\exp[-(u - \sigma^2 t)^2/2\sigma^2]}{\sigma\sqrt{2\pi}}\,du.
\end{aligned}
\]

The function inside the integral is a normal density function with mean σ²t and variance σ². (See the equation for the normal density function in Section 4.5.) Hence, the integral is equal to 1, and

\[ m(t) = e^{(t^2/2)\sigma^2}. \]
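Theorem 3.12 says that derivatives of m(t) at t = 0 give the moments, so the moment-generating function of Y − µ found in Example 4.16 should have first derivative 0 and second derivative σ² at the origin. The sketch below approximates those derivatives by central differences; the function names and the step size h are illustrative assumptions, and finite differences only approximate the true derivatives.

```python
import math

def mgf_centered_normal(t, sigma):
    """m(t) = exp(t^2 * sigma^2 / 2), the MGF of Y - mu from Example 4.16."""
    return math.exp(0.5 * t * t * sigma * sigma)

def first_derivative_at_zero(m, h=1e-3):
    """Central-difference approximation of m'(0)."""
    return (m(h) - m(-h)) / (2.0 * h)

def second_derivative_at_zero(m, h=1e-3):
    """Central-difference approximation of m''(0)."""
    return (m(h) - 2.0 * m(0.0) + m(-h)) / (h * h)

if __name__ == "__main__":
    sigma = 2.0
    m = lambda t: mgf_centered_normal(t, sigma)
    print(first_derivative_at_zero(m))   # approximates E(Y - mu)
    print(second_derivative_at_zero(m))  # approximates E[(Y - mu)^2]
```

With σ = 2 the first value should be essentially 0 and the second close to σ² = 4, consistent with E(Y − µ) = 0 and V(Y) = σ².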


The moments of U = Y − µ can be obtained from m(t) by differentiating m(t) in accordance with Theorem 3.12 or by expanding m(t) into a series.

The purpose of the preceding discussion of moments is twofold. First, moments can be used as numerical descriptive measures to describe the data that we obtain in an experiment. Second, they can be used in a theoretical sense to prove that a random variable possesses a particular probability distribution. It can be shown that if two random variables Y and Z possess identical moment-generating functions, then Y and Z possess identical probability distributions. This latter application of moments was mentioned in the discussion of moment-generating functions for discrete random variables in Section 3.9; it applies to continuous random variables as well. For your convenience, the probability and density functions, means, variances, and moment-generating functions for some common random variables are given in Appendix 2 and inside the back cover of this text.

Exercises

4.136

Suppose that the waiting time for the first customer to enter a retail shop after 9:00 A.M. is a random variable Y with an exponential density function given by

\[ f(y) = \begin{cases} \dfrac{1}{\theta}\,e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere}. \end{cases} \]

a Find the moment-generating function for Y.
b Use the answer from part (a) to find E(Y) and V(Y).

4.137

Show that the result given in Exercise 3.158 also holds for continuous random variables. That is, show that, if Y is a random variable with moment-generating function m(t) and U is given by U = aY + b, the moment-generating function of U is etb m(at). If Y has mean µ and variance σ 2 , use the moment-generating function of U to derive the mean and variance of U .

4.138

Example 4.16 derives the moment-generating function for Y − µ, where Y is normally distributed with mean µ and variance σ².
a Use the results in Example 4.16 and Exercise 4.137 to find the moment-generating function for Y.
b Differentiate the moment-generating function found in part (a) to show that E(Y) = µ and V(Y) = σ².

4.139

The moment-generating function of a normally distributed random variable, Y, with mean µ and variance σ² was shown in Exercise 4.138 to be $m(t) = e^{\mu t + (1/2)t^2\sigma^2}$. Use the result in Exercise 4.137 to derive the moment-generating function of X = −3Y + 4. What is the distribution of X? Why?

4.140

Identify the distributions of the random variables with the following moment-generating functions:
a $m(t) = (1-4t)^{-2}$.
b $m(t) = 1/(1-3.2t)$.
c $m(t) = e^{-5t+6t^2}$.


4.141

If θ1 < θ2 , derive the moment-generating function of a random variable that has a uniform distribution on the interval (θ1 , θ2 ).

4.142

Refer to Exercises 4.141 and 4.137. Suppose that Y is uniformly distributed on the interval (0, 1) and that a > 0 is a constant. a Give the moment-generating function for Y . b Derive the moment-generating function of W = aY . What is the distribution of W ? Why? c Derive the moment-generating function of X = −aY . What is the distribution of X ? Why? d If b is a ﬁxed constant, derive the moment-generating function of V = aY + b. What is the distribution of V ? Why?

4.143

The moment-generating function for the gamma random variable is derived in Example 4.13. Differentiate this moment-generating function to ﬁnd the mean and variance of the gamma distribution.

4.144

Consider a random variable Y with density function given by

\[ f(y) = k e^{-y^2/2}, \qquad -\infty < y < \infty. \]

a Find k.
b Find the moment-generating function of Y.
c Find E(Y) and V(Y).

4.145

A random variable Y has the density function

\[ f(y) = \begin{cases} e^{y}, & y < 0, \\ 0, & \text{elsewhere}. \end{cases} \]

a Find $E(e^{3Y/2})$.
b Find the moment-generating function for Y.
c Find V(Y).

4.10 Tchebysheff’s Theorem

As was the case for discrete random variables, an interpretation of µ and σ for continuous random variables is provided by the empirical rule and Tchebysheff’s theorem. Even if the exact distributions are unknown for random variables of interest, knowledge of the associated means and standard deviations permits us to deduce meaningful bounds for the probabilities of events that are often of interest. We stated and utilized Tchebysheff’s theorem in Section 3.11. We now restate this theorem and give a proof applicable to a continuous random variable.

THEOREM 4.13

Tchebysheff’s Theorem Let Y be a random variable with finite mean µ and variance σ². Then, for any k > 0,

\[ P(|Y-\mu| < k\sigma) \ge 1 - \frac{1}{k^2} \quad\text{or}\quad P(|Y-\mu| \ge k\sigma) \le \frac{1}{k^2}. \]


Proof

We will give the proof for a continuous random variable. The proof for the discrete case proceeds similarly. Let f(y) denote the density function of Y. Then

\[
\begin{aligned}
V(Y) = \sigma^2 &= \int_{-\infty}^{\infty} (y-\mu)^2 f(y)\,dy \\
&= \int_{-\infty}^{\mu-k\sigma} (y-\mu)^2 f(y)\,dy + \int_{\mu-k\sigma}^{\mu+k\sigma} (y-\mu)^2 f(y)\,dy + \int_{\mu+k\sigma}^{\infty} (y-\mu)^2 f(y)\,dy.
\end{aligned}
\]

The second integral is always greater than or equal to zero, and (y − µ)² ≥ k²σ² for all values of y between the limits of integration for the first and third integrals; that is, the regions of integration are in the tails of the density function and cover only values of y for which (y − µ)² ≥ k²σ². Replace the second integral by zero and substitute k²σ² for (y − µ)² in the first and third integrals to obtain the inequality

\[ V(Y) = \sigma^2 \ge \int_{-\infty}^{\mu-k\sigma} k^2\sigma^2 f(y)\,dy + \int_{\mu+k\sigma}^{\infty} k^2\sigma^2 f(y)\,dy. \]

Then

\[ \sigma^2 \ge k^2\sigma^2 \left[ \int_{-\infty}^{\mu-k\sigma} f(y)\,dy + \int_{\mu+k\sigma}^{+\infty} f(y)\,dy \right], \]

or

\[ \sigma^2 \ge k^2\sigma^2 [P(Y \le \mu - k\sigma) + P(Y \ge \mu + k\sigma)] = k^2\sigma^2 P(|Y-\mu| \ge k\sigma). \]

Dividing by k²σ², we obtain

\[ P(|Y-\mu| \ge k\sigma) \le \frac{1}{k^2}, \]

or, equivalently,

\[ P(|Y-\mu| < k\sigma) \ge 1 - \frac{1}{k^2}. \]

One real value of Tchebysheff’s theorem is that it enables us to find bounds for probabilities that ordinarily would have to be obtained by tedious mathematical manipulations (integration or summation). Further, we often can obtain means and variances of random variables (see Example 4.15) without specifying the distribution of the variable. In situations like these, Tchebysheff’s theorem still provides meaningful bounds for probabilities of interest.

EXAMPLE 4.17

Suppose that experience has shown that the length of time Y (in minutes) required to conduct a periodic maintenance check on a dictating machine follows a gamma distribution with α = 3.1 and β = 2. A new maintenance worker takes 22.5 minutes to check the machine. Does this length of time to perform a maintenance check disagree with prior experience?

Solution

The mean and variance for the length of maintenance check times (based on prior experience) are (from Theorem 4.8)

\[ \mu = \alpha\beta = (3.1)(2) = 6.2 \quad\text{and}\quad \sigma^2 = \alpha\beta^2 = (3.1)(2^2) = 12.4. \]

It follows that $\sigma = \sqrt{12.4} = 3.52$. Notice that y = 22.5 minutes exceeds the mean µ = 6.2 minutes by 16.3 minutes, or k = 16.3/3.52 = 4.63 standard deviations. Then from Tchebysheff’s theorem,

\[ P(|Y - 6.2| \ge 16.3) = P(|Y-\mu| \ge 4.63\sigma) \le \frac{1}{(4.63)^2} = .0466. \]

This probability is based on the assumption that the distribution of maintenance times has not changed from prior experience. Then, observing that P(Y ≥ 22.5) is small, we must conclude either that our new maintenance worker has generated by chance a lengthy maintenance time that occurs with low probability or that the new worker is somewhat slower than preceding ones. Considering the low probability for P(Y ≥ 22.5), we favor the latter view.
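The bound used in Example 4.17 can be set beside a numerical approximation of the gamma tail probability. The following is a rough sketch under stated assumptions: the function names are illustrative, the infinite upper limit is truncated at an arbitrary point `upper`, and the midpoint rule stands in for the statistical software or tables that would normally be used.

```python
import math

def tchebysheff_bound(y0, mean, sd):
    """Upper bound 1/k^2 on P(|Y - mean| >= k*sd), with k = |y0 - mean|/sd."""
    k = abs(y0 - mean) / sd
    return 1.0 / k**2

def gamma_upper_tail(y0, alpha, beta, upper=400.0, steps=200000):
    """Approximate P(Y >= y0) for a gamma(alpha, beta) variable by
    midpoint-rule integration of the density over (y0, upper)."""
    const = beta**alpha * math.gamma(alpha)
    h = (upper - y0) / steps
    total = 0.0
    for j in range(steps):
        y = y0 + (j + 0.5) * h
        total += y**(alpha - 1) * math.exp(-y / beta)
    return total * h / const

if __name__ == "__main__":
    alpha, beta = 3.1, 2.0
    mean = alpha * beta
    sd = math.sqrt(alpha * beta**2)
    print(tchebysheff_bound(22.5, mean, sd))
    print(gamma_upper_tail(22.5, alpha, beta))
```

The bound computed without rounding k is about .0467, essentially the .0466 of the example, while the approximate tail probability is considerably smaller. This illustrates that Tchebysheff’s theorem gives a conservative bound on the two-sided event rather than the exact value of P(Y ≥ 22.5).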

The exact probability, P(Y ≥ 22.5), for Example 4.17 would require evaluation of the integral

\[ P(Y \ge 22.5) = \int_{22.5}^{\infty} \frac{y^{2.1} e^{-y/2}}{2^{3.1}\,\Gamma(3.1)}\,dy. \]

Although we could utilize tables given by Pearson (1965) to evaluate this integral, we cannot evaluate it directly. We could, of course, use R or S-Plus or one of the provided applets to numerically evaluate this probability. Unless we use statistical software, similar integrals are difficult to evaluate for the beta density and for many other density functions. Tchebysheff’s theorem often provides quick bounds for probabilities while circumventing laborious integration, utilization of software, or searches for appropriate tables.

Exercises

4.146

A manufacturer of tires wants to advertise a mileage interval that excludes no more than 10% of the mileage on tires he sells. All he knows is that, for a large number of tires tested, the mean mileage was 25,000 miles, and the standard deviation was 4000 miles. What interval would you suggest?

4.147

A machine used to ﬁll cereal boxes dispenses, on the average, µ ounces per box. The manufacturer wants the actual ounces dispensed Y to be within 1 ounce of µ at least 75% of the time. What is the largest value of σ , the standard deviation of Y , that can be tolerated if the manufacturer’s objectives are to be met?

4.148

Find P(|Y − µ| ≤ 2σ ) for Exercise 4.16. Compare with the corresponding probabilistic statements given by Tchebysheff’s theorem and the empirical rule.

4.149

Find P(|Y − µ| ≤ 2σ ) for the uniform random variable. Compare with the corresponding probabilistic statements given by Tchebysheff’s theorem and the empirical rule.

4.150

Find P(|Y − µ| ≤ 2σ ) for the exponential random variable. Compare with the corresponding probabilistic statements given by Tchebysheff’s theorem and the empirical rule.

4.151

Refer to Exercise 4.92. Would you expect C to exceed 2000 very often?

4.152

Refer to Exercise 4.109. Find an interval that will contain L for at least 89% of the weeks that the machine is in use.

4.153

Refer to Exercise 4.129. Find an interval for which the probability that C will lie within it is at least .75.

4.154

Suppose that Y is a χ 2 distributed random variable with ν = 7 degrees of freedom. a What are the mean and variance of Y ? b Is it likely that Y will take on a value of 23 or more? c Applet Exercise Use the applet Gamma Probabilities and Quantiles to ﬁnd P(Y > 23).

4.11 Expectations of Discontinuous Functions and Mixed Probability Distributions (Optional)

Problems in probability and statistics sometimes involve functions that are partly continuous and partly discrete, in one of two ways. First, we may be interested in the properties, perhaps the expectation, of a random variable g(Y) that is a discontinuous function of a discrete or continuous random variable Y. Second, the random variable of interest itself may have a distribution function that is continuous over some intervals and such that some isolated points have positive probabilities. We illustrate these ideas with the following examples.

EXAMPLE 4.18

A retailer for a petroleum product sells a random amount Y each day. Suppose that Y, measured in thousands of gallons, has the probability density function
\[ f(y) = \begin{cases} (3/8)y^2, & 0 \le y \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]
The retailer’s profit turns out to be $100 for each 1000 gallons sold (10¢ per gallon) if Y ≤ 1 and $40 extra per 1000 gallons (an extra 4¢ per gallon) if Y > 1. Find the retailer’s expected profit for any given day.

Solution

Let g(Y) denote the retailer’s daily profit. Then
\[ g(Y) = \begin{cases} 100Y, & 0 \le Y \le 1, \\ 140Y, & 1 < Y \le 2. \end{cases} \]

We want to find expected profit; by Theorem 4.4, the expectation is
\[
\begin{aligned}
E[g(Y)] &= \int_{-\infty}^{\infty} g(y) f(y)\, dy \\
&= \int_0^1 100y \left[\frac{3}{8} y^2\right] dy + \int_1^2 140y \left[\frac{3}{8} y^2\right] dy \\
&= \frac{300}{(8)(4)}\, y^4 \Big|_0^1 + \frac{420}{(8)(4)}\, y^4 \Big|_1^2 \\
&= \frac{300}{32}(1) + \frac{420}{32}(15) = 206.25.
\end{aligned}
\]
Thus, the retailer can expect a profit of $206.25 on the daily sale of this particular product.
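The calculation in Example 4.18 can be checked numerically by integrating g(y) f(y) separately over each interval on which g is continuous. This is a sketch only; the names `density`, `simpson`, and `pieces` are illustrative and do not come from the text.

```python
def density(y):
    """Density of daily sales Y (thousands of gallons) from Example 4.18."""
    return 0.375 * y * y if 0.0 <= y <= 2.0 else 0.0

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Profit per thousand gallons on each interval where g is continuous:
# (lower limit, upper limit, dollars per thousand gallons).
pieces = [(0.0, 1.0, 100.0), (1.0, 2.0, 140.0)]

# Split the integral at the jump of g (y = 1) so each piece has a smooth integrand.
expected_profit = sum(
    simpson(lambda y, rate=rate: rate * y * density(y), lo, hi)
    for lo, hi, rate in pieces
)
print(f"expected daily profit = {expected_profit:.2f}")
```

Splitting at y = 1 matters: each integrand is then a cubic polynomial, which Simpson’s rule integrates without discretization error, so the result agrees with the hand computation up to floating-point rounding.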

Suppose that Y denotes the amount paid out per policy in one year by an insurance company that provides automobile insurance. For many policies, Y = 0 because the insured individuals are not involved in accidents. For insured individuals who do have accidents, the amount paid by the company might be modeled with one of the density functions that we have previously studied. A random variable Y that has some of its probability at discrete points (0 in this example) and the remainder spread over intervals is said to have a mixed distribution. Let F(y) denote a distribution function of a random variable Y that has a mixed distribution. For all practical purposes, any mixed distribution function F(y) can be written uniquely as F(y) = c1 F1 (y) + c2 F2 (y), where F1 (y) is a step distribution function, F2 (y) is a continuous distribution function, c1 is the accumulated probability of all discrete points, and c2 = 1 − c1 is the accumulated probability of all continuous portions. The following example gives an illustration of a mixed distribution.

EXAMPLE 4.19

Let Y denote the length of life (in hundreds of hours) of electronic components. These components frequently fail immediately upon insertion into a system. It has been observed that the probability of immediate failure is 1/4. If a component does not fail immediately, the distribution for its length of life has the exponential density function
\[ f(y) = \begin{cases} e^{-y}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
Find the distribution function for Y and evaluate P(Y > 10).

Solution

There is only one discrete point, y = 0, and this point has probability 1/4. Hence, \(c_1 = 1/4\) and \(c_2 = 3/4\). It follows that Y is a mixture of the distributions of two random variables, \(X_1\) and \(X_2\), where \(X_1\) has probability 1 at point 0 and \(X_2\) has the given exponential density. That is,
\[ F_1(y) = \begin{cases} 0, & y < 0, \\ 1, & y \ge 0, \end{cases} \]
and
\[ F_2(y) = \begin{cases} 0, & y < 0, \\ \int_0^y e^{-x}\, dx = 1 - e^{-y}, & y \ge 0. \end{cases} \]
Now \(F(y) = (1/4)F_1(y) + (3/4)F_2(y)\), and, hence,
\[
\begin{aligned}
P(Y > 10) &= 1 - P(Y \le 10) = 1 - F(10) \\
&= 1 - [(1/4) + (3/4)(1 - e^{-10})] \\
&= (3/4)[1 - (1 - e^{-10})] = (3/4)e^{-10}.
\end{aligned}
\]
A graph of F(y) is given in Figure 4.18.

FIGURE 4.18 Distribution function F(y) for Example 4.19 (F(y) jumps from 0 to 1/4 at y = 0 and then rises continuously toward 1)
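The decomposition F(y) = c1 F1(y) + c2 F2(y) translates directly into a short function. The sketch below is illustrative only; the name `mixed_cdf` and its default argument are assumptions made for this example.

```python
import math

def mixed_cdf(y, c1=0.25):
    """Distribution function of Example 4.19: point mass c1 at 0 plus an exponential part."""
    if y < 0:
        return 0.0
    step_part = 1.0                        # F1(y): all discrete mass sits at y = 0
    continuous_part = 1.0 - math.exp(-y)   # F2(y): exponential with mean 1
    return c1 * step_part + (1.0 - c1) * continuous_part

prob_exceeds_10 = 1.0 - mixed_cdf(10.0)
print(f"F(0) = {mixed_cdf(0.0)}, P(Y > 10) = {prob_exceeds_10:.3e}")
```

Evaluating the function at y = 0 returns the jump height 1/4, and the tail probability agrees with the closed form (3/4)e^{-10} obtained above.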

An easy method for ﬁnding expectations of random variables with mixed distributions is given in Deﬁnition 4.15.

DEFINITION 4.15

Let Y have the mixed distribution function F(y) = c1 F1 (y) + c2 F2 (y) and suppose that X 1 is a discrete random variable with distribution function F1 (y) and that X 2 is a continuous random variable with distribution function F2 (y). Let g(Y ) denote a function of Y . Then E[g(Y )] = c1 E[g(X 1 )] + c2 E[g(X 2 )].

EXAMPLE 4.20

Find the mean and variance of the random variable defined in Example 4.19.

Solution

With all definitions as in Example 4.19, it follows that
\[ E(X_1) = 0 \quad\text{and}\quad E(X_2) = \int_0^{\infty} y e^{-y}\, dy = 1. \]
Therefore,
\[ \mu = E(Y) = (1/4)E(X_1) + (3/4)E(X_2) = 3/4. \]
Also,
\[ E(X_1^2) = 0 \quad\text{and}\quad E(X_2^2) = \int_0^{\infty} y^2 e^{-y}\, dy = 2. \]
Therefore,
\[ E(Y^2) = (1/4)E(X_1^2) + (3/4)E(X_2^2) = (1/4)(0) + (3/4)(2) = 3/2. \]
Then
\[ V(Y) = E(Y^2) - \mu^2 = (3/2) - (3/4)^2 = 15/16. \]
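Definition 4.15 reduces the moments of a mixed distribution to weighted sums of component moments, which is easy to verify with exact rational arithmetic. In this sketch the dictionaries of component moments are entered by hand from Example 4.20, and the names are illustrative.

```python
from fractions import Fraction

# Mixing weights from Example 4.19: discrete part c1, continuous part c2.
c1, c2 = Fraction(1, 4), Fraction(3, 4)

# Component moments E(X^k), taken from Example 4.20:
# X1 is a point mass at 0; X2 is exponential with mean 1.
moments_x1 = {1: Fraction(0), 2: Fraction(0)}
moments_x2 = {1: Fraction(1), 2: Fraction(2)}

def mixed_moment(k):
    """E(Y^k) = c1 * E(X1^k) + c2 * E(X2^k), as in Definition 4.15 with g(y) = y^k."""
    return c1 * moments_x1[k] + c2 * moments_x2[k]

mean = mixed_moment(1)
variance = mixed_moment(2) - mean ** 2
print(f"E(Y) = {mean}, V(Y) = {variance}")
```

Note that the variance is obtained from the mixed second moment and the mixed mean; it is not the weighted sum of the component variances.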

Exercises

*4.155

A builder of houses needs to order some supplies that have a waiting time Y for delivery, with a continuous uniform distribution over the interval from 1 to 4 days. Because she can get by without them for 2 days, the cost of the delay is ﬁxed at $100 for any waiting time up to 2 days. After 2 days, however, the cost of the delay is $100 plus $20 per day (prorated) for each additional day. That is, if the waiting time is 3.5 days, the cost of the delay is $100 + $20(1.5) = $130. Find the expected value of the builder’s cost due to waiting for supplies.

*4.156

The duration Y of long-distance telephone calls (in minutes) monitored by a station is a random variable with the properties that
\[ P(Y = 3) = .2 \quad\text{and}\quad P(Y = 6) = .1. \]
Otherwise, Y has a continuous density function given by
\[ f(y) = \begin{cases} (1/4)\, y e^{-y/2}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
The discrete points at 3 and 6 are due to the fact that the length of the call is announced to the caller in three-minute intervals and the caller must pay for three minutes even if he talks less than three minutes. Find the expected duration of a randomly selected long-distance call.

*4.157

The life length Y of a component used in a complex electronic system is known to have an exponential density with a mean of 100 hours. The component is replaced at failure or at age 200 hours, whichever comes ﬁrst. a Find the distribution function for X , the length of time the component is in use. b Find E(X ).

*4.158

Consider the nail-firing device of Example 4.15. When the device works, the nail is fired with velocity, V, with density
\[ f(v) = \frac{v^3 e^{-v/500}}{(500)^4\, \Gamma(4)}. \]
The device misfires 2% of the time it is used, resulting in a velocity of 0. Find the expected kinetic energy associated with a nail of mass m. Recall that the kinetic energy, k, of a mass m moving at velocity v is \(k = (mv^2)/2\).

*4.159

A random variable Y has distribution function
\[ F(y) = \begin{cases} 0, & \text{if } y < 0, \\ y^2 + 0.1, & \text{if } 0 \le y < 0.5, \\ y, & \text{if } 0.5 \le y < 1, \\ 1, & \text{if } y \ge 1. \end{cases} \]
a Give \(F_1(y)\) and \(F_2(y)\), the discrete and continuous components of F(y).
b Write F(y) as \(c_1 F_1(y) + c_2 F_2(y)\).
c Find the expected value and variance of Y.

4.12 Summary

This chapter presented probabilistic models for continuous random variables. The density function, which provides a model for a population frequency distribution associated with a continuous random variable, subsequently will yield a mechanism for inferring characteristics of the population based on measurements contained in a sample taken from that population. As a consequence, the density function provides a model for a real distribution of data that exist or could be generated by repeated experimentation. Similar distributions for small sets of data (samples from populations) were discussed in Chapter 1.

Four specific types of density functions were presented: uniform, normal, gamma (with the \(\chi^2\) and exponential as special cases), and beta. Together they provide a wide assortment of models for population frequency distributions. For your convenience, Table 4.1 contains a summary of the R (or S-Plus) commands that provide probabilities and quantiles associated with these distributions. Many other density functions could be employed to fit real situations, but the four described suit many situations adequately. A few other density functions are presented in the exercises at the end of the chapter.

The adequacy of a density function in modeling the frequency distribution for a random variable depends upon the inference-making technique to be employed. If modest disagreement between the model and the real population frequency distribution does not affect the goodness of the inferential procedure, the model is adequate.

Table 4.1 R (and S-Plus) procedures giving probabilities and percentiles for some common continuous distributions

Distribution   P(Y ≤ y0)             pth Quantile: φp Such That P(Y ≤ φp) = p
Normal         pnorm(y0, µ, σ)       qnorm(p, µ, σ)
Exponential    pexp(y0, 1/β)         qexp(p, 1/β)
Gamma          pgamma(y0, α, 1/β)    qgamma(p, α, 1/β)
Beta           pbeta(y0, α, β)       qbeta(p, α, β)

The latter part of the chapter concerned expectations, particularly moments and moment-generating functions. It is important to focus attention on the reason for presenting these quantities and to avoid excessive concentration on the mathematical aspects of the material. Moments, particularly the mean and variance, are numerical descriptive measures for random variables. In particular, we will subsequently see that it is sometimes difficult to find the probability distribution for a random variable Y or a function g(Y), and we already have observed that integration over intervals for many density functions (the normal and gamma, for example) is very difficult. When this occurs, we can approximately describe the behavior of the random variable by using its moments along with Tchebysheff’s theorem and the empirical rule (Chapter 1).
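For readers working outside R, the normal and exponential rows of Table 4.1 have simple stand-ins in Python’s standard library. This is a sketch, not a port of the R functions: the function names below are illustrative, and the exponential functions take the mean β directly, whereas the R commands in Table 4.1 take the rate 1/β.

```python
import math
from statistics import NormalDist

def normal_cdf(y0, mu, sigma):
    """P(Y <= y0) for a normal variable; plays the role of pnorm(y0, mu, sigma)."""
    return NormalDist(mu, sigma).cdf(y0)

def normal_quantile(p, mu, sigma):
    """pth quantile of a normal variable; plays the role of qnorm(p, mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p)

def exponential_cdf(y0, beta):
    """P(Y <= y0) for an exponential variable with mean beta."""
    return 1.0 - math.exp(-y0 / beta) if y0 > 0 else 0.0

def exponential_quantile(p, beta):
    """pth quantile of an exponential variable with mean beta, for 0 <= p < 1."""
    return -beta * math.log(1.0 - p)

print(normal_cdf(1.96, 0.0, 1.0))
print(normal_quantile(0.975, 0.0, 1.0))
print(exponential_quantile(0.5, 2.0))
```

The gamma and beta rows have no counterpart in the standard library, so they are omitted here rather than approximated.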

References and Further Readings

Hogg, R. V., A. T. Craig, and J. W. McKean. 2005. Introduction to Mathematical Statistics, 6th ed. Upper Saddle River, N.J.: Pearson Prentice Hall.
Johnson, N. L., S. Kotz, and N. Balakrishnan. 1995. Continuous Univariate Distributions, 2d ed. New York: Wiley.
Parzen, E. 1992. Modern Probability Theory and Its Applications. New York: Wiley-Interscience.
Pearson, K., ed. 1965. Tables of the Incomplete Gamma Function. London: Cambridge University Press.
Pearson, K., ed. 1968. Tables of the Incomplete Beta Function. London: Cambridge University Press.
Perruzzi, J. J., and E. J. Hilliard. 1984. “Modeling Time-Delay Measurement Errors Using a Generalized Beta Density Function,” Journal of the Acoustical Society of America 75(1): 197–201.
Tables of the Binomial Probability Distribution. 1950. Department of Commerce, National Bureau of Standards, Applied Mathematics Series 6.
Zimmels, Y. 1983. “Theory of Hindered Sedimentation of Polydisperse Mixtures,” American Institute of Chemical Engineers Journal 29(4): 669–76.
Zwillinger, D. 2002. CRC Standard Mathematical Tables, 31st ed. Boca Raton, Fla.: CRC Press.

Supplementary Exercises

4.160

Let the density function of a random variable Y be given by
\[ f(y) = \begin{cases} \dfrac{2}{\pi(1 + y^2)}, & -1 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find the distribution function.
b Find E(Y).

4.161

The length of time required to complete a college achievement test is found to be normally distributed with mean 70 minutes and standard deviation 12 minutes. When should the test be terminated if we wish to allow sufﬁcient time for 90% of the students to complete the test?

4.162

A manufacturing plant utilizes 3000 electric light bulbs whose length of life is normally distributed with mean 500 hours and standard deviation 50 hours. To minimize the number of bulbs that burn out during operating hours, all the bulbs are replaced after a given period of operation. How often should the bulbs be replaced if we want not more than 1% of the bulbs to burn out between replacement periods?

4.163

Refer to Exercise 4.66. Suppose that ﬁve bearings are randomly drawn from production. What is the probability that at least one is defective?

4.164

The length of life of oil-drilling bits depends upon the types of rock and soil that the drill encounters, but it is estimated that the mean length of life is 75 hours. An oil exploration company purchases drill bits whose length of life is approximately normally distributed with mean 75 hours and standard deviation 12 hours. What proportion of the company’s drill bits a will fail before 60 hours of use? b will last at least 60 hours? c will have to be replaced after more than 90 hours of use?

4.165

Let Y have density function
\[ f(y) = \begin{cases} c\, y e^{-2y}, & 0 \le y < \infty, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the value of c that makes f (y) a density function. b Give the mean and variance for Y . c Give the moment-generating function for Y .

4.166

Use the fact that
\[ e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + \cdots \]
to expand the moment-generating function of Example 4.16 into a series to find \(\mu_1\), \(\mu_2\), \(\mu_3\), and \(\mu_4\) for the normal random variable.

4.167

Find an expression for µk = E(Y k ), where the random variable Y has a beta distribution.

4.168

The number of arrivals N at a supermarket checkout counter in the time interval from 0 to t follows a Poisson distribution with mean λt. Let T denote the length of time until the ﬁrst arrival. Find the density function for T . [Note: P(T > t0 ) = P(N = 0 at t = t0 ).]

4.169

An argument similar to that of Exercise 4.168 can be used to show that if events are occurring in time according to a Poisson distribution with mean λt, then the interarrival times between events have an exponential distribution with mean 1/λ. If calls come into a police emergency center at the rate of ten per hour, what is the probability that more than 15 minutes will elapse between the next two calls?

*4.170

Refer to Exercise 4.168. a If U is the time until the second arrival, show that U has a gamma density function with α = 2 and β = 1/λ. b Show that the time until the kth arrival has a gamma density with α = k and β = 1/λ.

4.171

Suppose that customers arrive at a checkout counter at a rate of two per minute.
a What are the mean and variance of the waiting times between successive customer arrivals?
b If a clerk takes three minutes to serve the first customer arriving at the counter, what is the probability that at least one more customer will be waiting when the service to the first customer is completed?

4.172

Calls for dial-in connections to a computer center arrive at an average rate of four per minute. The calls follow a Poisson distribution. If a call arrives at the beginning of a one-minute interval, what is the probability that a second call will not arrive in the next 20 seconds?

4.173

Suppose that plants of a particular species are randomly dispersed over an area so that the number of plants in a given area follows a Poisson distribution with a mean density of λ plants per unit area. If a plant is randomly selected in this area, ﬁnd the probability density function of the distance to the nearest neighboring plant. [Hint: If R denotes the distance to the nearest neighbor, then P(R > r ) is the same as the probability of seeing no plants in a circle of radius r .]

4.174

The time (in hours) a manager takes to interview a job applicant has an exponential distribution with β = 1/2. The applicants are scheduled at quarter-hour intervals, beginning at 8:00 A.M., and the applicants arrive exactly on time. When the applicant with an 8:15 A.M. appointment arrives at the manager’s ofﬁce, what is the probability that he will have to wait before seeing the manager?

4.175

The median value y of a continuous random variable is that value such that F(y) = .5. Find the median value of the random variable in Exercise 4.11.

4.176

If Y has an exponential distribution with mean β, ﬁnd (as a function of β) the median of Y .

4.177

Applet Exercise Use the applet Gamma Probabilities and Quantiles to find the medians of gamma distributed random variables with parameters a α = 1, β = 3. Compare your answer with that in Exercise 4.176. b α = 2, β = 2. Is the median larger or smaller than E(Y)? c α = 5, β = 10. Is the median larger or smaller than E(Y)? d In all of these cases, the mean exceeds the median. How is that reflected in the shapes of the corresponding densities?

4.178

Graph the beta probability density function for α = 3 and β = 2. a If Y has this beta density function, ﬁnd P(.1 ≤ Y ≤ .2) by using binomial probabilities to evaluate F(y). (See Section 4.7.) b Applet Exercise If Y has this beta density function, ﬁnd P(.1 ≤ Y ≤ .2), using the applet Beta Probabilities and Quantiles. c Applet Exercise If Y has this beta density function, use the applet Beta Probabilities and Quantiles to ﬁnd the .05 and .95-quantiles for Y . d What is the probability that Y falls between the two quantiles you found in part (c)?

*4.179

A retail grocer has a daily demand Y for a certain food sold by the pound, where Y (measured in hundreds of pounds) has a probability density function given by
\[ f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
(She cannot stock over 100 pounds.) The grocer wants to order 100k pounds of food. She buys the food at 6¢ per pound and sells it at 10¢ per pound. What value of k will maximize her expected daily profit?

4.180

Suppose that Y has a gamma distribution with α = 3 and β = 1. a Use Poisson probabilities to evaluate P(Y ≤ 4). (See Exercise 4.99.) b Applet Exercise Use the applet Gamma Probabilities and Quantiles to ﬁnd P(Y ≤ 4).

4.181

Suppose that Y is a normally distributed random variable with mean µ and variance \(\sigma^2\). Use the results of Example 4.16 to find the moment-generating function, mean, and variance of
\[ Z = \frac{Y - \mu}{\sigma}. \]
What is the distribution of Z? Why?

*4.182

A random variable Y is said to have a log-normal distribution if X = ln(Y) has a normal distribution. (The symbol ln denotes natural logarithm.) In this case Y must be nonnegative. The shape of the log-normal probability density function is similar to that of the gamma distribution, with a long tail to the right. The equation of the log-normal density function is given by
\[ f(y) = \begin{cases} \dfrac{1}{\sigma y \sqrt{2\pi}}\, e^{-(\ln(y) - \mu)^2/(2\sigma^2)}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
Because ln(y) is a monotonic function of y,
\[ P(Y \le y) = P[\ln(Y) \le \ln(y)] = P[X \le \ln(y)], \]
where X has a normal distribution with mean µ and variance \(\sigma^2\). Thus, probabilities for random variables with a log-normal distribution can be found by transforming them into probabilities that can be computed using the ordinary normal distribution. If Y has a log-normal distribution with µ = 4 and \(\sigma^2 = 1\), find
a P(Y ≤ 4).
b P(Y > 8).

4.183

If Y has a log-normal distribution with parameters µ and \(\sigma^2\), it can be shown that
\[ E(Y) = e^{\mu + \sigma^2/2} \quad\text{and}\quad V(Y) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right). \]
The grains composing polycrystalline metals tend to have weights that follow a log-normal distribution. For a type of aluminum, gram weights have a log-normal distribution with µ = 3 and σ = 4 (in units of \(10^{-2}\) g).
a Find the mean and variance of the grain weights.
b Find an interval in which at least 75% of the grain weights should lie. [Hint: Use Tchebysheff’s theorem.]
c Find the probability that a randomly chosen grain weighs less than the mean grain weight.
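The transformation described in Exercise 4.182, P(Y ≤ y) = P[X ≤ ln(y)], together with the moment formulas quoted in Exercise 4.183, can be sketched in a few lines. The function names are illustrative, and the parameter values in the usage lines (µ = 0, σ = 1) are chosen only for demonstration; they are not those of the exercises.

```python
import math
from statistics import NormalDist

def lognormal_cdf(y, mu, sigma):
    """P(Y <= y) for log-normal Y, computed as P(X <= ln y) with X normal(mu, sigma)."""
    if y <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(y))

def lognormal_mean(mu, sigma):
    """E(Y) = exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)

def lognormal_variance(mu, sigma):
    """V(Y) = exp(2 mu + sigma^2) * (exp(sigma^2) - 1)."""
    return math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)

# Demonstration values only (not the parameters of Exercises 4.182 and 4.183).
print(lognormal_cdf(1.0, 0.0, 1.0))      # ln(1) = 0 is the mean of X
print(lognormal_mean(0.0, 1.0), lognormal_variance(0.0, 1.0))
```

Because ln(1) = 0 equals the mean of X in the demonstration, the first printed probability is one half; the same two-step pattern (take the logarithm, then use the normal distribution function) applies for any µ and σ.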

4.184

Let Y denote a random variable with probability density function given by
\[ f(y) = (1/2)e^{-|y|}, \qquad -\infty < y < \infty. \]

Find the moment-generating function of Y and use it to ﬁnd E(Y ).

*4.185

Let \(f_1(y)\) and \(f_2(y)\) be density functions and let a be a constant such that 0 ≤ a ≤ 1. Consider the function \(f(y) = a f_1(y) + (1 - a) f_2(y)\).
a Show that f(y) is a density function. Such a density function is often referred to as a mixture of two density functions.
b Suppose that \(Y_1\) is a random variable with density function \(f_1(y)\) and that \(E(Y_1) = \mu_1\) and \(\mathrm{Var}(Y_1) = \sigma_1^2\); and similarly suppose that \(Y_2\) is a random variable with density function \(f_2(y)\) and that \(E(Y_2) = \mu_2\) and \(\mathrm{Var}(Y_2) = \sigma_2^2\). Assume that Y is a random variable whose density is a mixture of the densities corresponding to \(Y_1\) and \(Y_2\). Show that
i \(E(Y) = a\mu_1 + (1 - a)\mu_2\).
ii \(\mathrm{Var}(Y) = a\sigma_1^2 + (1 - a)\sigma_2^2 + a(1 - a)[\mu_1 - \mu_2]^2\).
[Hint: \(E(Y_i^2) = \mu_i^2 + \sigma_i^2\), i = 1, 2.]

*4.186

The random variable Y, with a density function given by
\[ f(y) = \frac{m y^{m-1}}{\alpha}\, e^{-y^m/\alpha}, \qquad 0 \le y < \infty, \quad \alpha, m > 0, \]
is said to have a Weibull distribution. The Weibull density function provides a good model for the distribution of length of life for many mechanical devices and biological plants and animals. Find the mean and variance for a Weibull distributed random variable with m = 2.

*4.187

Refer to Exercise 4.186. Resistors used in the construction of an aircraft guidance system have life lengths that follow a Weibull distribution with m = 2 and α = 10 (with measurements in thousands of hours). a Find the probability that the life length of a randomly selected resistor of this type exceeds 5000 hours. b If three resistors of this type are operating independently, ﬁnd the probability that exactly one of the three will burn out prior to 5000 hours of use.

*4.188

Refer to Exercise 4.186.
a What is the usual name of the distribution of a random variable that has a Weibull distribution with m = 1?
b Derive, in terms of the parameters α and m, the mean and variance of a Weibull distributed random variable.

*4.189

If n > 2 is an integer, the distribution with density given by
\[ f(y) = \begin{cases} \dfrac{1}{B(1/2, [n-2]/2)}\, (1 - y^2)^{(n-4)/2}, & -1 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]
is called the r distribution. Derive the mean and variance of a random variable with the r distribution.

*4.190

A function sometimes associated with continuous nonnegative random variables is the failure rate (or hazard rate) function, which is defined by
\[ r(t) = \frac{f(t)}{1 - F(t)} \]
for a density function f(t) with corresponding distribution function F(t). If we think of the random variable in question as being the length of life of a component, r(t) is proportional to the probability of failure in a small interval after t, given that the component has survived up to time t. Show that,
a for an exponential density function, r(t) is constant.
b for a Weibull density function with m > 1, r(t) is an increasing function of t. (See Exercise 4.186.)

*4.191

Suppose that Y is a continuous random variable with distribution function given by F(y) and probability density function f(y). We often are interested in conditional probabilities of the form P(Y ≤ y|Y ≥ c) for a constant c.
a Show that, for y ≥ c,
\[ P(Y \le y \mid Y \ge c) = \frac{F(y) - F(c)}{1 - F(c)}. \]
b Show that the function in part (a) has all the properties of a distribution function.
c If the length of life Y for a battery has a Weibull distribution with m = 2 and α = 3 (with measurements in years), find the probability that the battery will last less than four years, given that it is now two years old.

*4.192

The velocities of gas particles can be modeled by the Maxwell distribution, whose probability density function is given by
\[ f(v) = 4\pi \left(\frac{m}{2\pi K T}\right)^{3/2} v^2 e^{-v^2 (m/[2KT])}, \qquad v > 0, \]
where m is the mass of the particle, K is Boltzmann’s constant, and T is the absolute temperature.
a Find the mean velocity of these particles.
b The kinetic energy of a particle is given by \((1/2)mV^2\). Find the mean kinetic energy for a particle.

*4.193

Because
\[ P(Y \le y \mid Y \ge c) = \frac{F(y) - F(c)}{1 - F(c)} \]
has the properties of a distribution function, its derivative will have the properties of a probability density function. This derivative is given by
\[ \frac{f(y)}{1 - F(c)}, \qquad y \ge c. \]
We can thus find the expected value of Y, given that Y is greater than c, by using
\[ E(Y \mid Y \ge c) = \frac{1}{1 - F(c)} \int_c^{\infty} y f(y)\, dy. \]
If Y, the length of life of an electronic component, has an exponential distribution with mean 100 hours, find the expected value of Y, given that this component already has been in use for 50 hours.

*4.194

We can show that the normal density function integrates to unity by showing that, if u > 0,
\[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1/2)uy^2}\, dy = \frac{1}{\sqrt{u}}. \]
This, in turn, can be shown by considering the product of two such integrals:
\[ \frac{1}{2\pi} \left( \int_{-\infty}^{\infty} e^{-(1/2)uy^2}\, dy \right) \left( \int_{-\infty}^{\infty} e^{-(1/2)ux^2}\, dx \right) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(1/2)u(x^2 + y^2)}\, dx\, dy. \]
By transforming to polar coordinates, show that the preceding double integral is equal to 1/u.

*4.195

Let Z be a standard normal random variable and \(W = (Z^2 + 3Z)^2\).
a Use the moments of Z (see Exercise 4.199) to derive the mean of W.
b Use the result given in Exercise 4.198 to find a value of w such that P(W ≤ w) ≥ .90.

*4.196

Show that \(\Gamma(1/2) = \sqrt{\pi}\) by writing
\[ \Gamma(1/2) = \int_0^{\infty} y^{-1/2} e^{-y}\, dy, \]
by making the transformation \(y = (1/2)x^2\), and by employing the result of Exercise 4.194.

*4.197

The function B(α, β) is defined by
\[ B(\alpha, \beta) = \int_0^1 y^{\alpha - 1} (1 - y)^{\beta - 1}\, dy. \]
a Letting \(y = \sin^2\theta\), show that
\[ B(\alpha, \beta) = 2 \int_0^{\pi/2} \sin^{2\alpha - 1}\theta\, \cos^{2\beta - 1}\theta\, d\theta. \]
b Write \(\Gamma(\alpha)\Gamma(\beta)\) as a double integral, transform to polar coordinates, and conclude that
\[ B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \]

*4.198

The Markov Inequality Let g(Y) be a function of the continuous random variable Y, with E(|g(Y)|) < ∞. Show that, for every positive constant k,
\[ P(|g(Y)| \le k) \ge 1 - \frac{E(|g(Y)|)}{k}. \]
[Note: This inequality also holds for discrete random variables, with an obvious adaptation in the proof.]

*4.199

Let Z be a standard normal random variable.
a Show that the expected values of all odd integer powers of Z are 0. That is, if i = 1, 2, . . . , show that \(E(Z^{2i-1}) = 0\). [Hint: A function g(·) is an odd function if, for all y, g(−y) = −g(y). For any odd function g(y), \(\int_{-\infty}^{\infty} g(y)\, dy = 0\), if the integral exists.]
b If i = 1, 2, . . . , show that
\[ E(Z^{2i}) = \frac{2^i\, \Gamma\!\left(i + \frac{1}{2}\right)}{\sqrt{\pi}}. \]
[Hint: A function h(·) is an even function if, for all y, h(−y) = h(y). For any even function h(y), \(\int_{-\infty}^{\infty} h(y)\, dy = 2\int_0^{\infty} h(y)\, dy\), if the integrals exist. Use this fact, make the change of variable \(w = z^2/2\), and use what you know about the gamma function.]
c Use the results in part (b) and in Exercises 4.81(b) and 4.194 to derive \(E(Z^2)\), \(E(Z^4)\), \(E(Z^6)\), and \(E(Z^8)\).
d If i = 1, 2, . . . , show that
\[ E(Z^{2i}) = \prod_{j=1}^{i} (2j - 1). \]
This implies that the ith even moment is the product of the first i odd integers.

4.200

Suppose that Y has a beta distribution with parameters α and β.
a If a is any positive or negative value such that α + a > 0, show that
\[ E(Y^a) = \frac{\Gamma(\alpha + a)\, \Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\alpha + \beta + a)}. \]
b Why did your answer in part (a) require that α + a > 0?
c Show that, with a = 1, the result in part (a) gives E(Y) = α/(α + β).
d Use the result in part (a) to give an expression for \(E(\sqrt{Y})\). What do you need to assume about α?
e Use the result in part (a) to give an expression for \(E(1/Y)\), \(E(1/\sqrt{Y})\), and \(E(1/Y^2)\). What do you need to assume about α in each case?

CHAPTER 5

Multivariate Probability Distributions

5.1 Introduction
5.2 Bivariate and Multivariate Probability Distributions
5.3 Marginal and Conditional Probability Distributions
5.4 Independent Random Variables
5.5 The Expected Value of a Function of Random Variables
5.6 Special Theorems
5.7 The Covariance of Two Random Variables
5.8 The Expected Value and Variance of Linear Functions of Random Variables
5.9 The Multinomial Probability Distribution
5.10 The Bivariate Normal Distribution (Optional)
5.11 Conditional Expectations
5.12 Summary
References and Further Readings

5.1 Introduction

The intersection of two or more events is frequently of interest to an experimenter. For example, a gambler playing blackjack is interested in the event of drawing both an ace and a face card from a 52-card deck. A biologist, observing the number of animals surviving in a litter, is concerned about the intersection of these events:

A: The litter contains n animals.
B: y animals survive.

Similarly, observing both the height and the weight of an individual represents the intersection of a specific pair of events associated with height–weight measurements.

Most important to statisticians are intersections that occur in the course of sampling. Suppose that Y1 , Y2 , . . . , Yn denote the outcomes of n successive trials of an experiment. For example, this sequence could represent the weights of n people or the measurements of n physical characteristics for a single person. A speciﬁc set of outcomes, or sample measurements, may be expressed in terms of the intersection of the n events (Y1 = y1 ), (Y2 = y2 ), . . . , (Yn = yn ), which we will denote as (Y1 = y1 , Y2 = y2 , . . . , Yn = yn ), or, more compactly, as (y1 , y2 , . . . , yn ). Calculation of the probability of this intersection is essential in making inferences about the population from which the sample was drawn and is a major reason for studying multivariate probability distributions.

5.2 Bivariate and Multivariate Probability Distributions

Many random variables can be defined over the same sample space. For example, consider the experiment of tossing a pair of dice. The sample space contains 36 sample points, corresponding to the mn = (6)(6) = 36 ways in which numbers may appear on the faces of the dice. Any one of the following random variables could be defined over the sample space and might be of interest to the experimenter:

\(Y_1\): The number of dots appearing on die 1.
\(Y_2\): The number of dots appearing on die 2.
\(Y_3\): The sum of the number of dots on the dice.
\(Y_4\): The product of the number of dots appearing on the dice.

The 36 sample points associated with the experiment are equiprobable and correspond to the 36 numerical events \((y_1, y_2)\). Thus, throwing a pair of 1s is the simple event (1, 1). Throwing a 2 on die 1 and a 3 on die 2 is the simple event (2, 3). Because all pairs \((y_1, y_2)\) occur with the same relative frequency, we assign probability 1/36 to each sample point. For this simple example, the intersection \((y_1, y_2)\) contains at most one sample point. Hence, the bivariate probability function is
\[ p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = 1/36, \qquad y_1 = 1, 2, \ldots, 6, \quad y_2 = 1, 2, \ldots, 6. \]

A graph of the bivariate probability function for the die-tossing experiment is shown in Figure 5.1. Notice that a nonzero probability is assigned to a point (y1, y2) in the plane if and only if y1 = 1, 2, . . . , 6 and y2 = 1, 2, . . . , 6. Thus, exactly 36 points in the plane are assigned nonzero probabilities. Further, the probabilities are assigned in such a way that the sum of the nonzero probabilities is equal to 1. In Figure 5.1 the points assigned nonzero probabilities are represented in the (y1, y2) plane, whereas the probabilities associated with these points are given by the lengths of the lines above them. Figure 5.1 may be viewed as a theoretical, three-dimensional relative frequency histogram for the pairs of observations (y1, y2). As in the single-variable discrete case, the theoretical histogram provides a model for the sample histogram that would be obtained if the die-tossing experiment were repeated a large number of times.

FIGURE 5.1 Bivariate probability function; y1 = number of dots on die 1, y2 = number of dots on die 2 (vertical axis: p(y1, y2); each of the 36 points has height 1/36)

DEFINITION 5.1

Let Y1 and Y2 be discrete random variables. The joint (or bivariate) probability function for Y1 and Y2 is given by p(y1 , y2 ) = P(Y1 = y1 , Y2 = y2 ),

−∞ < y1 < ∞, −∞ < y2 < ∞.

In the single-variable case discussed in Chapter 3, we saw that the probability function for a discrete random variable Y assigns nonzero probabilities to a finite or countable number of distinct values of Y in such a way that the sum of the probabilities is equal to 1. Similarly, in the bivariate case the joint probability function p(y1, y2) assigns nonzero probabilities to only a finite or countable number of pairs of values (y1, y2). Further, the nonzero probabilities must sum to 1.

THEOREM 5.1

If Y1 and Y2 are discrete random variables with joint probability function p(y1, y2), then

1. p(y1, y2) ≥ 0 for all y1, y2.
2. \(\sum_{y_1, y_2} p(y_1, y_2) = 1\), where the sum is over all values (y1, y2) that are assigned nonzero probabilities.

As in the univariate discrete case, the joint probability function for discrete random variables is sometimes called the joint probability mass function because it specifies the probability (mass) associated with each of the possible pairs of values for the random variables. Once the joint probability function has been determined for discrete random variables Y1 and Y2, calculating joint probabilities involving Y1 and Y2 is straightforward. For the die-tossing experiment, P(2 ≤ Y1 ≤ 3, 1 ≤ Y2 ≤ 2) is

P(2 ≤ Y1 ≤ 3, 1 ≤ Y2 ≤ 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2) = 4/36 = 1/9.
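The die-tossing calculation above can be reproduced by brute-force enumeration. The following is a minimal sketch, not part of the original text; the names `joint_pmf` and `rectangle_probability` are illustrative choices, and exact fractions are used so that the results can be compared with 1 and 1/9 directly.

```python
from fractions import Fraction
from itertools import product

# Joint probability function for two fair dice: each of the 36 pairs has mass 1/36.
joint_pmf = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}


def rectangle_probability(pmf, y1_range, y2_range):
    """Sum p(y1, y2) over support points with y1 and y2 inside the given closed ranges."""
    lo1, hi1 = y1_range
    lo2, hi2 = y2_range
    return sum(
        (p for (y1, y2), p in pmf.items() if lo1 <= y1 <= hi1 and lo2 <= y2 <= hi2),
        Fraction(0),
    )


total = sum(joint_pmf.values(), Fraction(0))             # Theorem 5.1, part 2
prob = rectangle_probability(joint_pmf, (2, 3), (1, 2))  # P(2 <= Y1 <= 3, 1 <= Y2 <= 2)
print(total, prob)
```

Storing the probability function as a dictionary keyed by (y1, y2) keeps only the points with nonzero probability, which mirrors the wording of Theorem 5.1.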

EXAMPLE 5.1

A local supermarket has three checkout counters. Two customers arrive at the counters at different times when the counters are serving no other customers. Each customer chooses a counter at random, independently of the other. Let Y1 denote the number of customers who choose counter 1 and Y2 , the number who select counter 2. Find the joint probability function of Y1 and Y2 .

Solution

We might proceed with the derivation in many ways. The most direct is to consider the sample space associated with the experiment. Let the pair {i, j} denote the simple event that the first customer chose counter i and the second customer chose counter j, where i, j = 1, 2, and 3. Using the mn rule, the sample space consists of 3 × 3 = 9 sample points. Under the assumptions given earlier, each sample point is equally likely and has probability 1/9. The sample space associated with the experiment is

S = [{1, 1}, {1, 2}, {1, 3}, {2, 1}, {2, 2}, {2, 3}, {3, 1}, {3, 2}, {3, 3}].

Notice that sample point {1, 1} is the only sample point corresponding to (Y1 = 2, Y2 = 0) and hence P(Y1 = 2, Y2 = 0) = 1/9. Similarly, P(Y1 = 1, Y2 = 1) = P({1, 2} or {2, 1}) = 2/9. Table 5.1 contains the probabilities associated with each possible pair of values for Y1 and Y2; that is, it gives the joint probability function for Y1 and Y2. As always, the results of Theorem 5.1 hold for this example.

Table 5.1 Probability function for Y1 and Y2, Example 5.1

               y1
y2       0      1      2
0       1/9    2/9    1/9
1       2/9    2/9    0
2       1/9    0      0
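The sample-space argument of Example 5.1 can be sketched in code by listing the nine equally likely sample points and counting how many customers chose counters 1 and 2 in each. This is an illustrative sketch only; the variable names (`sample_points`, `table`) are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

counters = (1, 2, 3)
# Each sample point is (counter chosen by customer 1, counter chosen by customer 2).
sample_points = list(product(counters, repeat=2))
point_prob = Fraction(1, len(sample_points))  # 1/9 for each of the 9 points

table = {}
for i, j in sample_points:
    y1 = (i == 1) + (j == 1)  # number of customers choosing counter 1
    y2 = (i == 2) + (j == 2)  # number of customers choosing counter 2
    table[(y1, y2)] = table.get((y1, y2), Fraction(0)) + point_prob

# Print the table with y2 indexing rows and y1 indexing columns, as in Table 5.1.
for y2 in range(3):
    print([str(table.get((y1, y2), Fraction(0))) for y1 in range(3)])
```

Pairs such as (2, 1) never occur with two customers, so they are simply absent from the dictionary and are reported as 0.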

As in the case of univariate random variables, the distinction between jointly discrete and jointly continuous random variables may be characterized in terms of their ( joint) distribution functions.

DEFINITION 5.2

For any random variables Y1 and Y2 , the joint (bivariate) distribution function F(y1 , y2 ) is F(y1 , y2 ) = P(Y1 ≤ y1 , Y2 ≤ y2 ),

−∞ < y1 < ∞, −∞ < y2 < ∞.


For two discrete variables Y1 and Y2, F(y1, y2) is given by

\[ F(y_1, y_2) = \sum_{t_1 \le y_1} \sum_{t_2 \le y_2} p(t_1, t_2). \]

For the die-tossing experiment,

F(2, 3) = P(Y1 ≤ 2, Y2 ≤ 3) = p(1, 1) + p(1, 2) + p(1, 3) + p(2, 1) + p(2, 2) + p(2, 3).

Because p(y1, y2) = 1/36 for all pairs of values of y1 and y2 under consideration, F(2, 3) = 6/36 = 1/6.

EXAMPLE 5.2

Consider the random variables Y1 and Y2 of Example 5.1. Find F(−1, 2), F(1.5, 2), and F(5, 7).

Solution

Using the results in Table 5.1, we see that

F(−1, 2) = P(Y1 ≤ −1, Y2 ≤ 2) = P(∅) = 0.

Further,

F(1.5, 2) = P(Y1 ≤ 1.5, Y2 ≤ 2) = p(0, 0) + p(0, 1) + p(0, 2) + p(1, 0) + p(1, 1) + p(1, 2) = 8/9.

Similarly, F(5, 7) = P(Y1 ≤ 5, Y2 ≤ 7) = 1. Notice that F(y1, y2) = 1 for all y1, y2 such that min{y1, y2} ≥ 2. Also, F(y1, y2) = 0 if min{y1, y2} < 0.
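The three values found in Example 5.2 can be checked by summing the entries of Table 5.1 directly. The sketch below is illustrative rather than part of the original solution; `joint_cdf` is a hypothetical helper name, and the dictionary simply transcribes the nonzero entries of Table 5.1.

```python
from fractions import Fraction

ninth = Fraction(1, 9)
# Nonzero entries of Table 5.1, keyed as (y1, y2).
pmf = {
    (0, 0): ninth, (1, 0): 2 * ninth, (2, 0): ninth,
    (0, 1): 2 * ninth, (1, 1): 2 * ninth,
    (0, 2): ninth,
}


def joint_cdf(pmf, y1, y2):
    """F(y1, y2): sum of p(t1, t2) over all support points with t1 <= y1 and t2 <= y2."""
    return sum((p for (t1, t2), p in pmf.items() if t1 <= y1 and t2 <= y2), Fraction(0))


for point in [(-1, 2), (1.5, 2), (5, 7)]:
    print(point, joint_cdf(pmf, *point))
```

The arguments of F need not be possible values of Y1 and Y2; the function is defined for every pair of real numbers, which is why F(1.5, 2) is meaningful.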

Two random variables are said to be jointly continuous if their joint distribution function F(y1, y2) is continuous in both arguments.

DEFINITION 5.3

Let Y1 and Y2 be continuous random variables with joint distribution function F(y1, y2). If there exists a nonnegative function f(y1, y2), such that

\[ F(y_1, y_2) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} f(t_1, t_2)\, dt_2\, dt_1 \]

for all −∞ < y1 < ∞, −∞ < y2 < ∞, then Y1 and Y2 are said to be jointly continuous random variables. The function f(y1, y2) is called the joint probability density function.

Bivariate cumulative distribution functions satisfy a set of properties similar to those specified for univariate cumulative distribution functions.


THEOREM 5.2

If Y1 and Y2 are random variables with joint distribution function F(y1 , y2 ), then 1. F(−∞, −∞) = F(−∞, y2 ) = F(y1 , −∞) = 0. 2. F(∞, ∞) = 1. 3. If y1∗ ≥ y1 and y2∗ ≥ y2 , then F(y1∗ , y2∗ ) − F(y1∗ , y2 ) − F(y1 , y2∗ ) + F(y1 , y2 ) ≥ 0. Part 3 follows because F(y1∗ , y2∗ ) − F(y1∗ , y2 ) − F(y1 , y2∗ ) + F(y1 , y2 ) = P(y1 < Y1 ≤ y1∗ , y2 < Y2 ≤ y2∗ ) ≥ 0. Notice that F(∞, ∞) ≡ lim y1 →∞ lim y2 →∞ F(y1 , y2 ) = 1 implies that the joint density function f (y1 , y2 ) must be such that the integral of f (y1 , y2 ) over all values of (y1 , y2 ) is 1.
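Part 3 of Theorem 5.2 can be illustrated with the die-tossing experiment: the four-term combination of F values should equal the probability of the corresponding rectangle. The following sketch assumes the fair-dice probability function of Section 5.2; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

pmf = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}


def F(y1, y2):
    """Joint distribution function of the two dice."""
    return sum((p for (t1, t2), p in pmf.items() if t1 <= y1 and t2 <= y2), Fraction(0))


def rectangle_from_cdf(y1, y1_star, y2, y2_star):
    """F(y1*, y2*) - F(y1*, y2) - F(y1, y2*) + F(y1, y2), as in Theorem 5.2, part 3."""
    return F(y1_star, y2_star) - F(y1_star, y2) - F(y1, y2_star) + F(y1, y2)


# Direct evaluation of P(1 < Y1 <= 3, 2 < Y2 <= 5) for comparison.
direct = sum((p for (t1, t2), p in pmf.items() if 1 < t1 <= 3 and 2 < t2 <= 5), Fraction(0))
via_cdf = rectangle_from_cdf(1, 3, 2, 5)
print(direct, via_cdf)
```

Both routes count the six points with Y1 in {2, 3} and Y2 in {3, 4, 5}, so the two printed values agree.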

THEOREM 5.3

If Y1 and Y2 are jointly continuous random variables with a joint density function given by f(y1, y2), then

1. f(y1, y2) ≥ 0 for all y1, y2.
2. \(\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1\, dy_2 = 1\).

As in the univariate continuous case discussed in Chapter 4, the joint density function may be intuitively interpreted as a model for the joint relative frequency histogram for Y1 and Y2. For the univariate continuous case, areas under the probability density over an interval correspond to probabilities. Similarly, the bivariate probability density function f(y1, y2) traces a probability density surface over the (y1, y2) plane (Figure 5.2).

FIGURE 5.2 A bivariate density function f(y1, y2) (surface over the (y1, y2) plane; the volume above a1 ≤ y1 ≤ a2, b1 ≤ y2 ≤ b2 is shaded)

Volumes under this surface correspond to probabilities. Thus, P(a1 ≤ Y1 ≤ a2, b1 ≤ Y2 ≤ b2) is the shaded volume shown in Figure 5.2 and is equal to

\[ \int_{b_1}^{b_2} \int_{a_1}^{a_2} f(y_1, y_2)\, dy_1\, dy_2. \]

EXAMPLE 5.3

Suppose that a radioactive particle is randomly located in a square with sides of unit length. That is, if two regions within the unit square and of equal area are considered, the particle is equally likely to be in either region. Let Y1 and Y2 denote the coordinates of the particle's location. A reasonable model for the relative frequency histogram for Y1 and Y2 is the bivariate analogue of the univariate uniform density function:

\[ f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Sketch the probability density surface.
b Find F(.2, .4).
c Find P(.1 ≤ Y1 ≤ .3, 0 ≤ Y2 ≤ .5).

Solution

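a The sketch is shown in Figure 5.3.

b
\[ F(.2, .4) = \int_{-\infty}^{.4} \int_{-\infty}^{.2} f(y_1, y_2)\, dy_1\, dy_2 = \int_{0}^{.4} \int_{0}^{.2} (1)\, dy_1\, dy_2 = \int_{0}^{.4} \Big[ y_1 \Big]_{0}^{.2}\, dy_2 = \int_{0}^{.4} .2\, dy_2 = .08. \]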

The probability F(.2, .4) corresponds to the volume under f(y1, y2) = 1, which is shaded in Figure 5.3. As geometric considerations indicate, the desired probability (volume) is equal to .08, which we obtained through integration at the beginning of this part.

FIGURE 5.3 Geometric representation of f(y1, y2), Example 5.3 (the shaded volume F(.2, .4) lies above 0 ≤ y1 ≤ .2, 0 ≤ y2 ≤ .4)

c
\[ P(.1 \le Y_1 \le .3,\ 0 \le Y_2 \le .5) = \int_{0}^{.5} \int_{.1}^{.3} f(y_1, y_2)\, dy_1\, dy_2 = \int_{0}^{.5} \int_{.1}^{.3} 1\, dy_1\, dy_2 = .10. \]

This probability corresponds to the volume under the density function f (y1 , y2 ) = 1 that is above the region .1 ≤ y1 ≤ .3, 0 ≤ y2 ≤ .5. Like the solution in part (b), the current solution can be obtained by using elementary geometric concepts. The density or height of the surface is equal to 1, and hence the desired probability (volume) is P(.1 ≤ Y1 ≤ .3, 0 ≤ Y2 ≤ .5) = (.2)(.5)(1) = .10.
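Both volumes in Example 5.3 can also be approximated numerically. The sketch below uses a simple midpoint rule on a rectangular grid; this is one convenient choice for illustration, not a method prescribed by the text, and the names `uniform_density` and `rectangle_volume` are hypothetical.

```python
def uniform_density(y1, y2):
    """Joint density of Example 5.3: 1 on the unit square, 0 elsewhere."""
    return 1.0 if 0.0 <= y1 <= 1.0 and 0.0 <= y2 <= 1.0 else 0.0


def rectangle_volume(f, a1, a2, b1, b2, n=200):
    """Midpoint-rule approximation of the integral of f over [a1, a2] x [b1, b2]."""
    h1 = (a2 - a1) / n
    h2 = (b2 - b1) / n
    total = 0.0
    for i in range(n):
        y1 = a1 + (i + 0.5) * h1
        for j in range(n):
            y2 = b1 + (j + 0.5) * h2
            total += f(y1, y2)
    return total * h1 * h2


# F(.2, .4): the density vanishes for negative coordinates, so integration starts at 0.
F_02_04 = rectangle_volume(uniform_density, 0.0, 0.2, 0.0, 0.4)
# P(.1 <= Y1 <= .3, 0 <= Y2 <= .5)
prob_c = rectangle_volume(uniform_density, 0.1, 0.3, 0.0, 0.5)
print(round(F_02_04, 6), round(prob_c, 6))
```

Because the density is constant on the unit square, the midpoint rule reproduces the geometric answers (base area times height 1) up to floating-point rounding.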

A slightly more complicated bivariate model is illustrated in the following example.

EXAMPLE 5.4

Gasoline is to be stocked in a bulk tank once at the beginning of each week and then sold to individual customers. Let Y1 denote the proportion of the capacity of the bulk tank that is available after the tank is stocked at the beginning of the week. Because of the limited supplies, Y1 varies from week to week. Let Y2 denote the proportion of the capacity of the bulk tank that is sold during the week. Because Y1 and Y2 are both proportions, both variables take on values between 0 and 1. Further, the amount sold, y2, cannot exceed the amount available, y1. Suppose that the joint density function for Y1 and Y2 is given by

\[ f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

A sketch of this function is given in Figure 5.4. Find the probability that less than one-half of the tank will be stocked and more than one-quarter of the tank will be sold.

Solution

We want to find P(0 ≤ Y1 ≤ .5, Y2 > .25). For any continuous random variable, the probability of observing a value in a region is the volume under the density function above the region of interest. The density function f(y1, y2) is positive only in the large triangular portion of the (y1, y2) plane shown in Figure 5.5. We are interested only in values of y1 and y2 such that 0 ≤ y1 ≤ .5 and y2 > .25. The intersection of this region and the region where the density function is positive is given by the small (shaded) triangle in Figure 5.5. Consequently, the probability we desire is the volume under the density function of Figure 5.4 above the shaded region in the (y1, y2) plane shown in Figure 5.5. Thus, we have

\[ \begin{aligned} P(0 \le Y_1 \le .5,\ .25 \le Y_2) &= \int_{1/4}^{1/2} \int_{1/4}^{y_1} 3y_1\, dy_2\, dy_1 \\ &= \int_{1/4}^{1/2} 3y_1 \Big[ y_2 \Big]_{1/4}^{y_1}\, dy_1 \\ &= \int_{1/4}^{1/2} 3y_1 (y_1 - 1/4)\, dy_1 \\ &= \Big[ y_1^3 - (3/8) y_1^2 \Big]_{1/4}^{1/2} \\ &= [(1/8) - (3/8)(1/4)] - [(1/64) - (3/8)(1/16)] = 5/128. \end{aligned} \]

FIGURE 5.4 The joint density function for Example 5.4

FIGURE 5.5 Region of integration for Example 5.4 (the small shaded triangle is bounded by y2 = 1/4, y1 = 1/2, and y2 = y1)
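The value 5/128 can be cross-checked by evaluating the iterated integral numerically, with the inner limits depending on y1 exactly as in the shaded triangle of Figure 5.5. This is a rough sketch using a midpoint rule; the helper `iterated_integral` and its parameters are assumptions introduced for illustration.

```python
def iterated_integral(f, a, b, lower, upper, n=400):
    """Midpoint-rule approximation of the integral over y1 in [a, b] of the
    integral over y2 in [lower(y1), upper(y1)] of f(y1, y2)."""
    h1 = (b - a) / n
    total = 0.0
    for i in range(n):
        y1 = a + (i + 0.5) * h1
        lo, hi = lower(y1), upper(y1)
        h2 = (hi - lo) / n
        inner = sum(f(y1, lo + (j + 0.5) * h2) for j in range(n)) * h2
        total += inner * h1
    return total


def density(y1, y2):
    """Joint density of Example 5.4; only evaluated here inside 0 <= y2 <= y1 <= 1."""
    return 3.0 * y1


# Outer variable y1 runs over [1/4, 1/2]; inner variable y2 runs over [1/4, y1].
approx = iterated_integral(density, 0.25, 0.5, lambda y1: 0.25, lambda y1: y1)
exact = 5 / 128
print(approx, exact)
```

Passing the limits of the inner integral as functions of y1 makes the region of integration explicit, which is the same role the sketch in Figure 5.5 plays in the hand calculation.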

Calculating the probability specified in Example 5.4 involved integrating the joint density function for Y1 and Y2 over the appropriate region. The specification of the limits of integration was made easier by sketching the region of integration in Figure 5.5. This approach, sketching the appropriate region of integration, often facilitates setting up the appropriate integral.

The methods discussed in this section can be used to calculate the probability of the intersection of two events (Y1 = y1, Y2 = y2). In a like manner, we can define a probability function (or probability density function) for the intersection of n events (Y1 = y1, Y2 = y2, . . . , Yn = yn). The joint probability function corresponding to the discrete case is given by

p(y1, y2, . . . , yn) = P(Y1 = y1, Y2 = y2, . . . , Yn = yn).

The joint density function of Y1, Y2, . . . , Yn is given by f(y1, y2, . . . , yn). As in the bivariate case, these functions provide models for the joint relative frequency distributions of the populations of joint observations (y1, y2, . . . , yn) for the discrete case and the continuous case, respectively. In the continuous case,

\[ P(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n) = F(y_1, \ldots, y_n) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} \cdots \int_{-\infty}^{y_n} f(t_1, t_2, \ldots, t_n)\, dt_n \ldots dt_1 \]

for every set of real numbers (y1, y2, . . . , yn). Multivariate distribution functions defined by this equality satisfy properties similar to those specified for the bivariate case.

Exercises

5.1

Contracts for two construction jobs are randomly assigned to one or more of three ﬁrms, A, B, and C. Let Y1 denote the number of contracts assigned to ﬁrm A and Y2 the number of contracts assigned to ﬁrm B. Recall that each ﬁrm can receive 0, 1, or 2 contracts. a Find the joint probability function for Y1 and Y2 . b Find F(1, 0).

5.2

Three balanced coins are tossed independently. One of the variables of interest is Y1 , the number of heads. Let Y2 denote the amount of money won on a side bet in the following manner. If the ﬁrst head occurs on the ﬁrst toss, you win $1. If the ﬁrst head occurs on toss 2 or on toss 3 you win $2 or $3, respectively. If no heads appear, you lose $1 (that is, win −$1). a Find the joint probability function for Y1 and Y2 . b What is the probability that fewer than three heads will occur and you will win $1 or less? [That is, ﬁnd F(2, 1).]

5.3

Of nine executives in a business ﬁrm, four are married, three have never married, and two are divorced. Three of the executives are to be selected for promotion. Let Y1 denote the number of married executives and Y2 denote the number of never-married executives among the three selected for promotion. Assuming that the three are randomly selected from the nine available, ﬁnd the joint probability function of Y1 and Y2 .

5.4

Given here is the joint probability function associated with data obtained in a study of automobile accidents in which a child (under age 5 years) was in the car and at least one fatality occurred. Specifically, the study focused on whether or not the child survived and what type of seatbelt (if any) he or she used. Define

\[ Y_1 = \begin{cases} 0, & \text{if the child survived,} \\ 1, & \text{if not,} \end{cases} \qquad \text{and} \qquad Y_2 = \begin{cases} 0, & \text{if no belt used,} \\ 1, & \text{if adult belt used,} \\ 2, & \text{if car-seat belt used.} \end{cases} \]

Notice that Y1 is the number of fatalities per child and, since children's car seats usually utilize two belts, Y2 is the number of seatbelts in use at the time of the accident.

               y1
y2        0       1      Total
0        .38     .17     .55
1        .14     .02     .16
2        .24     .05     .29
Total    .76     .24    1.00

a Verify that the preceding probability function satisfies Theorem 5.1.
b Find F(1, 2). What is the interpretation of this value?

5.5

Refer to Example 5.4. The joint density of Y1, the proportion of the capacity of the tank that is stocked at the beginning of the week, and Y2, the proportion of the capacity sold during the week, is given by

\[ f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find F(1/2, 1/3) = P(Y1 ≤ 1/2, Y2 ≤ 1/3).
b Find P(Y2 ≤ Y1/2), the probability that the amount sold is less than half the amount purchased.

5.6

Refer to Example 5.3. If a radioactive particle is randomly located in a square of unit length, a reasonable model for the joint density function for Y1 and Y2 is

\[ f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a What is P(Y1 − Y2 > .5)?
b What is P(Y1 Y2 < .5)?

5.7

Let Y1 and Y2 have joint density function

\[ f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

a What is P(Y1 < 1, Y2 > 5)?
b What is P(Y1 + Y2 < 3)?

5.8

Let Y1 and Y2 have the joint probability density function given by

\[ f(y_1, y_2) = \begin{cases} k y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the value of k that makes this a probability density function.
b Find the joint distribution function for Y1 and Y2.
c Find P(Y1 ≤ 1/2, Y2 ≤ 3/4).

5.9

Let Y1 and Y2 have the joint probability density function given by

\[ f(y_1, y_2) = \begin{cases} k(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the value of k that makes this a probability density function.
b Find P(Y1 ≤ 3/4, Y2 ≥ 1/2).

5.10

An environmental engineer measures the amount (by weight) of particulate pollution in air samples of a certain volume collected over two smokestacks at a coal-operated power plant. One of the stacks is equipped with a cleaning device. Let Y1 denote the amount of pollutant per sample collected above the stack that has no cleaning device and let Y2 denote the amount of pollutant per sample collected above the stack that is equipped with the cleaning device. Suppose that the relative frequency behavior of Y1 and Y2 can be modeled by

\[ f(y_1, y_2) = \begin{cases} k, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2y_2 \le y_1, \\ 0, & \text{elsewhere.} \end{cases} \]

That is, Y1 and Y2 are uniformly distributed over the region inside the triangle bounded by y1 = 2, y2 = 0, and 2y2 = y1.

a Find the value of k that makes this function a probability density function.
b Find P(Y1 ≥ 3Y2). That is, find the probability that the cleaning device reduces the amount of pollutant by one-third or more.

5.11

Suppose that Y1 and Y2 are uniformly distributed over the triangle shaded in the accompanying diagram.

(Diagram: triangle in the (y1, y2) plane with vertices (−1, 0), (1, 0), and (0, 1).)

a Find P(Y1 ≤ 3/4, Y2 ≤ 3/4).
b Find P(Y1 − Y2 ≥ 0).

5.12

Let Y1 and Y2 denote the proportions of two different types of components in a sample from a mixture of chemicals used as an insecticide. Suppose that Y1 and Y2 have the joint density function given by

\[ f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1,\ 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

(Notice that Y1 + Y2 ≤ 1 because the random variables denote proportions within the same sample.) Find

a P(Y1 ≤ 3/4, Y2 ≤ 3/4).
b P(Y1 ≤ 1/2, Y2 ≤ 1/2).

5.13

The joint density function of Y1 and Y2 is given by

\[ f(y_1, y_2) = \begin{cases} 30 y_1 y_2^2, & y_1 - 1 \le y_2 \le 1 - y_1,\ 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find F(1/2, 1/2).
b Find F(1/2, 2).
c Find P(Y1 > Y2).

5.14

Suppose that the random variables Y1 and Y2 have joint probability density function f(y1, y2) given by

\[ f(y_1, y_2) = \begin{cases} 6 y_1^2 y_2, & 0 \le y_1 \le y_2,\ y_1 + y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]

a Verify that this is a valid joint density function.
b What is the probability that Y1 + Y2 is less than 1?

5.15

The management at a fast-food outlet is interested in the joint behavior of the random variables Y1, defined as the total time between a customer's arrival at the store and departure from the service window, and Y2, the time a customer waits in line before reaching the service window. Because Y1 includes the time a customer waits in line, we must have Y1 ≥ Y2. The relative frequency distribution of observed values of Y1 and Y2 can be modeled by the probability density function

\[ f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere,} \end{cases} \]

with time measured in minutes. Find

a P(Y1 < 2, Y2 > 1).
b P(Y1 ≥ 2Y2).
c P(Y1 − Y2 ≥ 1). (Notice that Y1 − Y2 denotes the time spent at the service window.)

5.16

Let Y1 and Y2 denote the proportions of time (out of one workday) during which employees I and II, respectively, perform their assigned tasks. The joint relative frequency behavior of Y1 and Y2 is modeled by the density function

\[ f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find P(Y1 < 1/2, Y2 > 1/4).
b Find P(Y1 + Y2 ≤ 1).

5.17

Let (Y1, Y2) denote the coordinates of a point chosen at random inside a unit circle whose center is at the origin. That is, Y1 and Y2 have a joint density function given by

\[ f(y_1, y_2) = \begin{cases} \dfrac{1}{\pi}, & y_1^2 + y_2^2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Find P(Y1 ≤ Y2).

5.18

An electronic system has one each of two different types of components in joint operation. Let Y1 and Y2 denote the random lengths of life of the components of type I and type II, respectively. The joint density function is given by

\[ f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

(Measurements are in hundreds of hours.) Find P(Y1 > 1, Y2 > 1).

5.3 Marginal and Conditional Probability Distributions

Recall that the distinct values assumed by a discrete random variable represent mutually exclusive events. Similarly, for all distinct pairs of values y1, y2, the bivariate events (Y1 = y1, Y2 = y2), represented by (y1, y2), are mutually exclusive events. It follows that the univariate event (Y1 = y1) is the union of bivariate events of the type (Y1 = y1, Y2 = y2), with the union being taken over all possible values for y2.


For example, reconsider the die-tossing experiment of Section 5.2, where

Y1 = number of dots on the upper face of die 1,
Y2 = number of dots on the upper face of die 2.

Then

P(Y1 = 1) = p(1, 1) + p(1, 2) + p(1, 3) + · · · + p(1, 6) = 1/36 + 1/36 + 1/36 + · · · + 1/36 = 6/36 = 1/6
P(Y1 = 2) = p(2, 1) + p(2, 2) + p(2, 3) + · · · + p(2, 6) = 1/6
. . .
P(Y1 = 6) = p(6, 1) + p(6, 2) + p(6, 3) + · · · + p(6, 6) = 1/6.

Expressed in summation notation, probabilities about the variable Y1 alone are

\[ P(Y_1 = y_1) = p_1(y_1) = \sum_{y_2 = 1}^{6} p(y_1, y_2). \]

Similarly, probabilities corresponding to values of the variable Y2 alone are given by

\[ p_2(y_2) = P(Y_2 = y_2) = \sum_{y_1 = 1}^{6} p(y_1, y_2). \]
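The two summations for the die-tossing experiment translate directly into a single pass over the joint probability function. The sketch below is illustrative; the function name `marginals` is an assumption, and it works for any joint probability function stored as a dictionary keyed by (y1, y2).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint probability function for two fair dice.
joint_pmf = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}


def marginals(pmf):
    """Return (p1, p2): p1[y1] sums p(y1, y2) over y2, and p2[y2] sums it over y1."""
    p1 = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at zero
    p2 = defaultdict(Fraction)
    for (y1, y2), p in pmf.items():
        p1[y1] += p
        p2[y2] += p
    return dict(p1), dict(p2)


p1, p2 = marginals(joint_pmf)
print({y: str(p) for y, p in p1.items()})
```

Each marginal probability comes out as 1/6, in agreement with the sums written out for P(Y1 = 1) through P(Y1 = 6).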

Summation in the discrete case corresponds to integration in the continuous case, which leads us to the following definition.

DEFINITION 5.4

a Let Y1 and Y2 be jointly discrete random variables with probability function p(y1, y2). Then the marginal probability functions of Y1 and Y2, respectively, are given by

\[ p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2) \quad \text{and} \quad p_2(y_2) = \sum_{\text{all } y_1} p(y_1, y_2). \]

b Let Y1 and Y2 be jointly continuous random variables with joint density function f(y1, y2). Then the marginal density functions of Y1 and Y2, respectively, are given by

\[ f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \quad \text{and} \quad f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1. \]

The term marginal, as applied to the univariate probability functions of Y1 and Y2 , has intuitive meaning. To ﬁnd p1 (y1 ), we sum p(y1 , y2 ) over all values of y2 and hence accumulate the probabilities on the y1 axis (or margin). The discrete and continuous cases are illustrated in the following two examples.


EXAMPLE 5.5

From a group of three Republicans, two Democrats, and one independent, a committee of two people is to be randomly selected. Let Y1 denote the number of Republicans and Y2 denote the number of Democrats on the committee. Find the joint probability function of Y1 and Y2 and then find the marginal probability function of Y1.

Solution

The probabilities sought here are similar to the hypergeometric probabilities of Chapter 3. For example,

\[ P(Y_1 = 1, Y_2 = 1) = p(1, 1) = \frac{\binom{3}{1}\binom{2}{1}\binom{1}{0}}{\binom{6}{2}} = \frac{3(2)}{15} = \frac{6}{15} \]

because there are 15 equally likely sample points; for the event in question we must select one Republican from the three, one Democrat from the two, and zero independents. Similar calculations lead to the other probabilities shown in Table 5.2.

To find p1(y1), we must sum over the values of Y2, as Definition 5.4 indicates. Hence, these probabilities are given by the column totals in Table 5.2. That is,

p1(0) = p(0, 0) + p(0, 1) + p(0, 2) = 0 + 2/15 + 1/15 = 3/15.

Similarly, p1(1) = 9/15 and p1(2) = 3/15. Analogously, the marginal probability function of Y2 is given by the row totals.

Table 5.2 Joint probability function for Y1 and Y2, Example 5.5

                   y1
y2        0       1       2      Total
0         0      3/15    3/15    6/15
1        2/15    6/15    0       8/15
2        1/15    0       0       1/15
Total    3/15    9/15    3/15    1

EXAMPLE 5.6

Let

\[ f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Sketch f(y1, y2) and find the marginal density functions for Y1 and Y2.

Solution

Viewed geometrically, f(y1, y2) traces a wedge-shaped surface, as sketched in Figure 5.6. Before applying Definition 5.4 to find f1(y1) and f2(y2), we will use Figure 5.6 to visualize the result. If the probability represented by the wedge were accumulated on the y1 axis (accumulating probability along lines parallel to the y2 axis), the result would be a triangular probability density that would look like the side of the wedge in Figure 5.6. If the probability were accumulated along the y2 axis (accumulating along lines parallel to the y1 axis), the resulting density would be uniform.

FIGURE 5.6 Geometric representation of f(y1, y2), Example 5.6

We will confirm these visual solutions by applying Definition 5.4. Then, if 0 ≤ y1 ≤ 1,

\[ f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{0}^{1} 2y_1\, dy_2 = 2y_1 \Big[ y_2 \Big]_{0}^{1} = 2y_1, \]

and if y1 < 0 or y1 > 1,

\[ f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{0}^{1} 0\, dy_2 = 0. \]

Thus,

\[ f_1(y_1) = \begin{cases} 2y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Similarly, if 0 ≤ y2 ≤ 1,

\[ f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{0}^{1} 2y_1\, dy_1 = \Big[ y_1^2 \Big]_{0}^{1} = 1, \]

and if y2 < 0 or y2 > 1,

\[ f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{0}^{1} 0\, dy_1 = 0. \]

Summarizing,

\[ f_2(y_2) = \begin{cases} 1, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Graphs of f1(y1) and f2(y2) trace triangular and uniform probability densities, respectively, as expected.
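The marginal densities of Example 5.6 can be checked numerically by integrating the joint density over the other variable. The sketch below uses a one-dimensional midpoint rule and integrates only over [0, 1], since the density is zero elsewhere; the helper names `integrate`, `f1`, and `f2` are chosen for illustration.

```python
def joint_density(y1, y2):
    """Joint density of Example 5.6: 2*y1 on the unit square, 0 elsewhere."""
    return 2.0 * y1 if 0.0 <= y1 <= 1.0 and 0.0 <= y2 <= 1.0 else 0.0


def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


def f1(y1):
    """Marginal density of Y1: integrate the joint density over y2."""
    return integrate(lambda y2: joint_density(y1, y2), 0.0, 1.0)


def f2(y2):
    """Marginal density of Y2: integrate the joint density over y1."""
    return integrate(lambda y1: joint_density(y1, y2), 0.0, 1.0)


print(round(f1(0.3), 6), round(f2(0.7), 6), f1(1.5))
```

Evaluating f1 at several points traces the triangular density 2y1, while f2 stays at 1 across the unit interval, matching the visual argument based on Figure 5.6.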

We now turn our attention to conditional distributions, looking first at the discrete case. The multiplicative law (Section 2.8) gives the probability of the intersection A ∩ B as

P(A ∩ B) = P(A)P(B|A),

where P(A) is the unconditional probability of A and P(B|A) is the probability of B given that A has occurred. Now consider the intersection of the two numerical events, (Y1 = y1) and (Y2 = y2), represented by the bivariate event (y1, y2). It follows directly from the multiplicative law of probability that the bivariate probability for the intersection (y1, y2) is

p(y1, y2) = p1(y1) p(y2|y1) = p2(y2) p(y1|y2).

The probabilities p1(y1) and p2(y2) are associated with the univariate probability distributions for Y1 and Y2 individually (recall Chapter 3). Using the interpretation of conditional probability discussed in Chapter 2, p(y1|y2) is the probability that the random variable Y1 equals y1, given that Y2 takes on the value y2.

DEFINITION 5.5

If Y1 and Y2 are jointly discrete random variables with joint probability function p(y1, y2) and marginal probability functions p1(y1) and p2(y2), respectively, then the conditional discrete probability function of Y1 given Y2 is

\[ p(y_1 \mid y_2) = P(Y_1 = y_1 \mid Y_2 = y_2) = \frac{P(Y_1 = y_1, Y_2 = y_2)}{P(Y_2 = y_2)} = \frac{p(y_1, y_2)}{p_2(y_2)}, \]

provided that p2(y2) > 0.

Thus, P(Y1 = 2|Y2 = 3) is the conditional probability that Y1 = 2 given that Y2 = 3. A similar interpretation can be attached to the conditional probability p(y2|y1). Note that p(y1|y2) is undefined if p2(y2) = 0.

EXAMPLE 5.7

Refer to Example 5.5 and ﬁnd the conditional distribution of Y1 given that Y2 = 1. That is, given that one of the two people on the committee is a Democrat, ﬁnd the conditional distribution for the number of Republicans selected for the committee.

Solution

The joint probabilities are given in Table 5.2. To find p(y1|Y2 = 1), we concentrate on the row corresponding to Y2 = 1. Then

\[ P(Y_1 = 0 \mid Y_2 = 1) = \frac{p(0, 1)}{p_2(1)} = \frac{2/15}{8/15} = \frac{1}{4}, \]

\[ P(Y_1 = 1 \mid Y_2 = 1) = \frac{p(1, 1)}{p_2(1)} = \frac{6/15}{8/15} = \frac{3}{4}, \]

and

\[ P(Y_1 = 2 \mid Y_2 = 1) = \frac{p(2, 1)}{p_2(1)} = \frac{0}{8/15} = 0. \]

In the randomly selected committee, if one person is a Democrat (equivalently, if Y2 = 1), there is a high probability that the other will be a Republican (equivalently, Y1 = 1).
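Examples 5.5 and 5.7 can be reproduced by counting committees and then dividing by the marginal probability, as in Definition 5.5. The following is a sketch under the stated committee composition (three Republicans, two Democrats, one independent, committee of two); the function name `joint_pmf` and its keyword parameters are illustrative.

```python
from fractions import Fraction
from math import comb


def joint_pmf(y1, y2, republicans=3, democrats=2, independents=1, size=2):
    """p(y1, y2) for Example 5.5, obtained by counting equally likely committees."""
    y3 = size - y1 - y2  # number of independents on the committee
    if min(y1, y2, y3) < 0:
        return Fraction(0)
    ways = comb(republicans, y1) * comb(democrats, y2) * comb(independents, y3)
    return Fraction(ways, comb(republicans + democrats + independents, size))


# Marginal probability p2(1): sum the row of Table 5.2 corresponding to Y2 = 1.
p2_at_1 = sum((joint_pmf(y1, 1) for y1 in range(3)), Fraction(0))

# Conditional probability function of Y1 given Y2 = 1 (Definition 5.5).
conditional = {y1: joint_pmf(y1, 1) / p2_at_1 for y1 in range(3)}
print(p2_at_1, {y1: str(p) for y1, p in conditional.items()})
```

The conditional probabilities sum to 1 over y1, which is a quick sanity check that the division by p2(1) was done against the correct row total.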


In the continuous case, we can obtain an appropriate analogue of the conditional probability function p(y1|y2), but it is not obtained in such a straightforward manner. If Y1 and Y2 are continuous, P(Y1 = y1|Y2 = y2) cannot be defined as in the discrete case because both (Y1 = y1) and (Y2 = y2) are events with zero probability. The following considerations, however, do lead to a useful and consistent definition for a conditional density function.

Assuming that Y1 and Y2 are jointly continuous with density function f(y1, y2), we might be interested in a probability of the form

P(Y1 ≤ y1|Y2 = y2) = F(y1|y2),

which, as a function of y1 for a fixed y2, is called the conditional distribution function of Y1, given Y2 = y2.

DEFINITION 5.6

If Y1 and Y2 are jointly continuous random variables with joint density function f(y1, y2), then the conditional distribution function of Y1 given Y2 = y2 is

F(y1|y2) = P(Y1 ≤ y1|Y2 = y2).

Notice that F(y1|y2) is a function of y1 for a fixed value of y2. If we could take F(y1|y2), multiply by P(Y2 = y2) for each possible value of Y2, and sum all the resulting probabilities, we would obtain F(y1). This is not possible because the number of values for y2 is uncountable and all probabilities P(Y2 = y2) are zero. But we can do something analogous by multiplying by f2(y2) and then integrating to obtain

\[ F(y_1) = \int_{-\infty}^{\infty} F(y_1 \mid y_2)\, f_2(y_2)\, dy_2. \]

The quantity f2(y2) dy2 can be thought of as the approximate probability that Y2 takes on a value in a small interval about y2, and the integral is a generalized sum. Now from previous considerations, we know that

\[ \begin{aligned} F(y_1) = \int_{-\infty}^{y_1} f_1(t_1)\, dt_1 &= \int_{-\infty}^{y_1} \left[ \int_{-\infty}^{\infty} f(t_1, y_2)\, dy_2 \right] dt_1 \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1\, dy_2. \end{aligned} \]

From these two expressions for F(y1), we must have

\[ F(y_1 \mid y_2)\, f_2(y_2) = \int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1 \]

or

\[ F(y_1 \mid y_2) = \int_{-\infty}^{y_1} \frac{f(t_1, y_2)}{f_2(y_2)}\, dt_1. \]

We will call the integrand of this expression the conditional density function of Y1 given Y2 = y2, and we will denote it by f(y1|y2).

DEFINITION 5.7

Let Y1 and Y2 be jointly continuous random variables with joint density f(y1, y2) and marginal densities f1(y1) and f2(y2), respectively. For any y2 such that f2(y2) > 0, the conditional density of Y1 given Y2 = y2 is given by

\[ f(y_1 \mid y_2) = \frac{f(y_1, y_2)}{f_2(y_2)} \]

and, for any y1 such that f1(y1) > 0, the conditional density of Y2 given Y1 = y1 is given by

\[ f(y_2 \mid y_1) = \frac{f(y_1, y_2)}{f_1(y_1)}. \]

Note that the conditional density f(y1|y2) is undefined for all y2 such that f2(y2) = 0. Similarly, f(y2|y1) is undefined if y1 is such that f1(y1) = 0.

EXAMPLE 5.8

A soft-drink machine has a random amount Y2 in supply at the beginning of a given day and dispenses a random amount Y1 during the day (with measurements in gallons). It is not resupplied during the day, and hence Y1 ≤ Y2. It has been observed that Y1 and Y2 have a joint density given by

\[ f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]

That is, the points (y1, y2) are uniformly distributed over the triangle with the given boundaries. Find the conditional density of Y1 given Y2 = y2. Evaluate the probability that less than 1/2 gallon will be sold, given that the machine contains 1.5 gallons at the start of the day.

Solution

The marginal density of Y2 is given by

\[ f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1. \]

Thus,

\[ f_2(y_2) = \begin{cases} \displaystyle\int_{0}^{y_2} (1/2)\, dy_1 = (1/2)\, y_2, & 0 \le y_2 \le 2, \\[2ex] \displaystyle\int_{-\infty}^{\infty} 0\, dy_1 = 0, & \text{elsewhere.} \end{cases} \]

Note that f2(y2) > 0 if and only if 0 < y2 ≤ 2. Thus, for any 0 < y2 ≤ 2, using Definition 5.7,

\[ f(y_1 \mid y_2) = \frac{f(y_1, y_2)}{f_2(y_2)} = \frac{1/2}{(1/2)(y_2)} = \frac{1}{y_2}, \qquad 0 \le y_1 \le y_2. \]

Also, f(y1|y2) is undefined if y2 ≤ 0 or y2 > 2. The probability of interest is

\[ P(Y_1 \le 1/2 \mid Y_2 = 1.5) = \int_{-\infty}^{1/2} f(y_1 \mid y_2 = 1.5)\, dy_1 = \int_{0}^{1/2} \frac{1}{1.5}\, dy_1 = \frac{1/2}{1.5} = \frac{1}{3}. \]

If the machine contains 2 gallons at the start of the day, then

$$P(Y_1 \le 1/2 \mid Y_2 = 2) = \int_0^{1/2} \frac{1}{2}\, dy_1 = \frac{1}{4}.$$

Thus, the conditional probability that Y1 ≤ 1/2 given Y2 = y2 changes appreciably depending on the particular choice of y2.
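The two conditional probabilities in Example 5.8 can be checked with exact arithmetic. The sketch below is only an illustration: the function name `conditional_prob_y1_at_most` is introduced here and is not part of the text. It relies on the fact derived in the example that, given Y2 = y2, Y1 has density 1/y2 on the interval from 0 to y2.

```python
from fractions import Fraction


def conditional_prob_y1_at_most(a, y2):
    """P(Y1 <= a | Y2 = y2) when f(y1, y2) = 1/2 on 0 <= y1 <= y2 <= 2.

    Given Y2 = y2, the conditional density of Y1 is 1/y2 on [0, y2], so the
    probability is the length of [0, min(a, y2)] divided by y2.
    """
    a, y2 = Fraction(a), Fraction(y2)
    if not (0 < y2 <= 2):
        # The conditional density is undefined where f2(y2) = 0.
        raise ValueError("f(y1 | y2) is undefined unless 0 < y2 <= 2")
    upper = max(Fraction(0), min(a, y2))
    return upper / y2


print(conditional_prob_y1_at_most(Fraction(1, 2), Fraction(3, 2)))  # 1/3
print(conditional_prob_y1_at_most(Fraction(1, 2), 2))               # 1/4
```

Using `Fraction` rather than floats keeps the answers in the same form as the hand calculation.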

Exercises

5.19

In Exercise 5.1, we determined that the joint distribution of Y1, the number of contracts awarded to firm A, and Y2, the number of contracts awarded to firm B, is given by the entries in the following table.

              y1
y2       0      1      2
0       1/9    2/9    1/9
1       2/9    2/9     0
2       1/9     0      0

a Find the marginal probability distribution of Y1.
b According to results in Chapter 4, Y1 has a binomial distribution with n = 2 and p = 1/3. Is there any conflict between this result and the answer you provided in part (a)?

5.20

Refer to Exercise 5.2.
a Derive the marginal probability distribution for your winnings on the side bet.
b What is the probability that you obtained three heads, given that you won $1 on the side bet?

5.21

In Exercise 5.3, we determined that the joint probability distribution of Y1, the number of married executives, and Y2, the number of never-married executives, is given by

$$p(y_1, y_2) = \frac{\binom{4}{y_1}\binom{3}{y_2}\binom{2}{3 - y_1 - y_2}}{\binom{9}{3}},$$

where y1 and y2 are integers, 0 ≤ y1 ≤ 3, 0 ≤ y2 ≤ 3, and 1 ≤ y1 + y2 ≤ 3.
a Find the marginal probability distribution of Y1, the number of married executives among the three selected for promotion.
b Find P(Y1 = 1|Y2 = 2).
c If we let Y3 denote the number of divorced executives among the three selected for promotion, then Y3 = 3 − Y1 − Y2. Find P(Y3 = 1|Y2 = 1).
d Compare the marginal distribution derived in (a) with the hypergeometric distributions with N = 9, n = 3, and r = 4 encountered in Section 3.7.

5.22

In Exercise 5.4, you were given the following joint probability function for

$$Y_1 = \begin{cases} 0, & \text{if child survived,} \\ 1, & \text{if not,} \end{cases} \qquad\text{and}\qquad Y_2 = \begin{cases} 0, & \text{if no belt used,} \\ 1, & \text{if adult belt used,} \\ 2, & \text{if car-seat belt used.} \end{cases}$$

               y1
y2        0      1     Total
0        .38    .17    .55
1        .14    .02    .16
2        .24    .05    .29
Total    .76    .24    1.00

a Give the marginal probability functions for Y1 and Y2.
b Give the conditional probability function for Y2 given Y1 = 0.
c What is the probability that a child survived given that he or she was in a car-seat belt?

5.23

In Example 5.4 and Exercise 5.5, we considered the joint density of Y1, the proportion of the capacity of the tank that is stocked at the beginning of the week, and Y2, the proportion of the capacity sold during the week, given by

$$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the marginal density function for Y2.
b For what values of y2 is the conditional density f(y1|y2) defined?
c What is the probability that more than half a tank is sold given that three-fourths of a tank is stocked?

5.24

In Exercise 5.6, we assumed that if a radioactive particle is randomly located in a square with sides of unit length, a reasonable model for the joint density function for Y1 and Y2 is

$$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the marginal density functions for Y1 and Y2.
b What is P(.3 < Y1 < .5)? P(.3 < Y2 < .5)?
c For what values of y2 is the conditional density f(y1|y2) defined?
d For any y2, 0 ≤ y2 ≤ 1, what is the conditional density function of Y1 given that Y2 = y2?
e Find P(.3 < Y1 < .5|Y2 = .3).
f Find P(.3 < Y1 < .5|Y2 = .5).
g Compare the answers that you obtained in parts (a), (d), and (e). For any y2, 0 ≤ y2 ≤ 1, how does P(.3 < Y1 < .5) compare to P(.3 < Y1 < .5|Y2 = y2)?

5.25

Let Y1 and Y2 have joint density function first encountered in Exercise 5.7:

$$f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the marginal density functions for Y1 and Y2. Identify these densities as one of those studied in Chapter 4.
b What is P(1 < Y1 < 2.5)? P(1 < Y2 < 2.5)?
c For what values of y2 is the conditional density f(y1|y2) defined?
d For any y2 > 0, what is the conditional density function of Y1 given that Y2 = y2?
e For any y1 > 0, what is the conditional density function of Y2 given that Y1 = y1?
f For any y2 > 0, how does the conditional density function f(y1|y2) that you obtained in part (d) compare to the marginal density function f1(y1) found in part (a)?
g What does your answer to part (f) imply about marginal and conditional probabilities that Y1 falls in any interval?

5.26

In Exercise 5.8, we derived the fact that

$$f(y_1, y_2) = \begin{cases} 4y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function. Find
a the marginal density functions for Y1 and Y2.
b P(Y1 ≤ 1/2|Y2 ≥ 3/4).
c the conditional density function of Y1 given Y2 = y2.
d the conditional density function of Y2 given Y1 = y1.
e P(Y1 ≤ 3/4|Y2 = 1/2).

5.27

In Exercise 5.9, we determined that

$$f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function. Find
a the marginal density functions for Y1 and Y2.
b P(Y2 ≤ 1/2|Y1 ≤ 3/4).
c the conditional density function of Y1 given Y2 = y2.
d the conditional density function of Y2 given Y1 = y1.
e P(Y2 ≥ 3/4|Y1 = 1/2).

5.28

In Exercise 5.10, we proved that

$$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2y_2 \le y_1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function for Y1, the amount of pollutant per sample collected above the stack without the cleaning device, and for Y2, the amount collected above the stack with the cleaner.
a If we consider the stack with the cleaner installed, find the probability that the amount of pollutant in a given sample will exceed .5.
b Given that the amount of pollutant in a sample taken above the stack with the cleaner is observed to be 0.5, find the probability that the amount of pollutant exceeds 1.5 above the other stack (without the cleaner).

5.29

Refer to Exercise 5.11. Find
a the marginal density functions for Y1 and Y2.
b P(Y2 > 1/2|Y1 = 1/4).

5.30

In Exercise 5.12, we were given the following joint probability density function for the random variables Y1 and Y2, which were the proportions of two components in a sample from a mixture of insecticide:

$$f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1,\ 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find P(Y1 ≥ 1/2|Y2 ≤ 1/4).
b Find P(Y1 ≥ 1/2|Y2 = 1/4).

5.31

In Exercise 5.13, the joint density function of Y1 and Y2 is given by

$$f(y_1, y_2) = \begin{cases} 30 y_1 y_2^2, & y_1 - 1 \le y_2 \le 1 - y_1,\ 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Show that the marginal density of Y1 is a beta density with α = 2 and β = 4.
b Derive the marginal density of Y2.
c Derive the conditional density of Y2 given Y1 = y1.
d Find P(Y2 > 0|Y1 = .75).

5.32

Suppose that the random variables Y1 and Y2 have joint probability density function, f(y1, y2), given by (see Exercise 5.14)

$$f(y_1, y_2) = \begin{cases} 6 y_1^2 y_2, & 0 \le y_1 \le y_2,\ y_1 + y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}$$

a Show that the marginal density of Y1 is a beta density with α = 3 and β = 2.
b Derive the marginal density of Y2.
c Derive the conditional density of Y2 given Y1 = y1.
d Find P(Y2 < 1.1|Y1 = .60).

5.33

Suppose that Y1 is the total time between a customer's arrival in the store and departure from the service window, Y2 is the time spent in line before reaching the window, and the joint density of these variables (as was given in Exercise 5.15) is

$$f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the marginal density functions for Y1 and Y2.
b What is the conditional density function of Y1 given that Y2 = y2? Be sure to specify the values of y2 for which this conditional density is defined.
c What is the conditional density function of Y2 given that Y1 = y1? Be sure to specify the values of y1 for which this conditional density is defined.
d Is the conditional density function f(y1|y2) that you obtained in part (b) the same as the marginal density function f1(y1) found in part (a)?
e What does your answer to part (d) imply about marginal and conditional probabilities that Y1 falls in any interval?

5.34

If Y1 is uniformly distributed on the interval (0, 1) and, for 0 < y1 < 1,

$$f(y_2 \mid y_1) = \begin{cases} 1/y_1, & 0 \le y_2 \le y_1, \\ 0, & \text{elsewhere,} \end{cases}$$

a what is the "name" of the conditional distribution of Y2 given Y1 = y1?
b find the joint density function of Y1 and Y2.
c find the marginal density function for Y2.


5.35

Refer to Exercise 5.33. If two minutes elapse between a customer’s arrival at the store and his departure from the service window, ﬁnd the probability that he waited in line less than one minute to reach the window.

5.36

In Exercise 5.16, Y1 and Y2 denoted the proportions of time during which employees I and II actually performed their assigned tasks during a workday. The joint density of Y1 and Y2 is given by

$$f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find the marginal density functions for Y1 and Y2.
b Find P(Y1 ≥ 1/2|Y2 ≥ 1/2).
c If employee II spends exactly 50% of the day working on assigned duties, find the probability that employee I spends more than 75% of the day working on similar duties.

5.37

In Exercise 5.18, Y1 and Y2 denoted the lengths of life, in hundreds of hours, for components of types I and II, respectively, in an electronic system. The joint density of Y1 and Y2 is given by

$$f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the probability that a component of type II will have a life length in excess of 200 hours.

5.38

Let Y1 denote the weight (in tons) of a bulk item stocked by a supplier at the beginning of a week and suppose that Y1 has a uniform distribution over the interval 0 ≤ y1 ≤ 1. Let Y2 denote the amount (by weight) of this item sold by the supplier during the week and suppose that Y2 has a uniform distribution over the interval 0 ≤ y2 ≤ y1, where y1 is a specific value of Y1.
a Find the joint density function for Y1 and Y2.
b If the supplier stocks a half-ton of the item, what is the probability that she sells more than a quarter-ton?
c If it is known that the supplier sold a quarter-ton of the item, what is the probability that she had stocked more than a half-ton?

*5.39

Suppose that Y1 and Y2 are independent Poisson distributed random variables with means λ1 and λ2, respectively. Let W = Y1 + Y2. In Chapter 6 you will show that W has a Poisson distribution with mean λ1 + λ2. Use this result to show that the conditional distribution of Y1, given that W = w, is a binomial distribution with n = w and p = λ1/(λ1 + λ2).¹

*5.40

Suppose that Y1 and Y2 are independent binomial distributed random variables based on samples of sizes n 1 and n 2 , respectively. Suppose that p1 = p2 = p. That is, the probability of “success” is the same for the two random variables. Let W = Y1 + Y2 . In Chapter 6 you will prove that W has a binomial distribution with success probability p and sample size n 1 + n 2 . Use this result to show that the conditional distribution of Y1 , given that W = w, is a hypergeometric distribution with N = n 1 + n 2 , n = w, and r = n 1 .

*5.41

A quality control plan calls for randomly selecting three items from the daily production (assumed large) of a certain machine and observing the number of defectives. However, the proportion p of defectives produced by the machine varies from day to day and is assumed to have a uniform distribution on the interval (0, 1). For a randomly chosen day, find the unconditional probability that exactly two defectives are observed in the sample.

1. Exercises preceded by an asterisk are optional.

*5.42

The number of defects per yard Y for a certain fabric is known to have a Poisson distribution with parameter λ. However, λ itself is a random variable with probability density function given by

$$f(\lambda) = \begin{cases} e^{-\lambda}, & \lambda \ge 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the unconditional probability function for Y.

5.4 Independent Random Variables

In Example 5.8 we saw two dependent random variables, for which probabilities associated with Y1 depended on the observed value of Y2. In Exercise 5.24 (and some others), this was not the case: Probabilities associated with Y1 were the same, regardless of the observed value of Y2. We now present a formal definition of independence of random variables.

Two events A and B are independent if P(A ∩ B) = P(A) × P(B). When discussing random variables, if a < b and c < d we are often concerned with events of the type (a < Y1 ≤ b) ∩ (c < Y2 ≤ d). For consistency with the earlier definition of independent events, if Y1 and Y2 are independent, we would like to have

$$P(a < Y_1 \le b,\ c < Y_2 \le d) = P(a < Y_1 \le b) \times P(c < Y_2 \le d)$$

for any choice of real numbers a < b and c < d. That is, if Y1 and Y2 are independent, the joint probability can be written as the product of the marginal probabilities. This property will be satisfied if Y1 and Y2 are independent in the sense detailed in the following definition.

DEFINITION 5.8

Let Y1 have distribution function F1(y1), Y2 have distribution function F2(y2), and Y1 and Y2 have joint distribution function F(y1, y2). Then Y1 and Y2 are said to be independent if and only if

$$F(y_1, y_2) = F_1(y_1) F_2(y_2)$$

for every pair of real numbers (y1, y2). If Y1 and Y2 are not independent, they are said to be dependent.

It usually is convenient to establish independence, or the lack of it, by using the result contained in the following theorem. The proof is omitted; see "References and Further Readings" at the end of the chapter.

THEOREM 5.4

If Y1 and Y2 are discrete random variables with joint probability function p(y1, y2) and marginal probability functions p1(y1) and p2(y2), respectively, then Y1 and Y2 are independent if and only if

$$p(y_1, y_2) = p_1(y_1)\, p_2(y_2)$$

for all pairs of real numbers (y1, y2).

If Y1 and Y2 are continuous random variables with joint density function f(y1, y2) and marginal density functions f1(y1) and f2(y2), respectively, then Y1 and Y2 are independent if and only if

$$f(y_1, y_2) = f_1(y_1)\, f_2(y_2)$$

for all pairs of real numbers (y1, y2).

We now illustrate the concept of independence with some examples.

EXAMPLE 5.9

For the die-tossing problem of Section 5.2, show that Y1 and Y2 are independent.

Solution

In this problem each of the 36 sample points was given probability 1/36. Consider, for example, the point (1, 2). We know that p(1, 2) = 1/36. Also, p1(1) = P(Y1 = 1) = 1/6 and p2(2) = P(Y2 = 2) = 1/6. Hence, p(1, 2) = p1(1) p2(2). The same is true for all other values for y1 and y2, and it follows that Y1 and Y2 are independent.
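The check in Example 5.9 extends from the single point (1, 2) to all 36 points, which is tedious by hand but short in code. The following is a minimal sketch of the criterion in Theorem 5.4 for a finite joint probability function stored as a dictionary. The helper names `marginals` and `is_independent`, and the small two-point table `toy_dependent`, are assumptions made for illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the marginal probability functions p1 and p2 of a joint pmf."""
    p1, p2 = {}, {}
    for (y1, y2), p in joint.items():
        p1[y1] = p1.get(y1, 0) + p
        p2[y2] = p2.get(y2, 0) + p
    return p1, p2


def is_independent(joint):
    """True when p(y1, y2) = p1(y1) * p2(y2) at every pair of support values."""
    p1, p2 = marginals(joint)
    return all(joint.get((y1, y2), 0) == p1[y1] * p2[y2]
               for y1 in p1 for y2 in p2)


# Two fair dice: each of the 36 sample points has probability 1/36.
dice = {pt: Fraction(1, 36) for pt in product(range(1, 7), repeat=2)}

# Hypothetical dependent pair (illustration only): Y2 always equals Y1.
toy_dependent = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(dice))           # True
print(is_independent(toy_dependent))  # False
```

Pairs missing from the dictionary are treated as having probability zero, so a single violated pair is enough for the function to report dependence.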

EXAMPLE 5.10

Refer to Example 5.5. Is the number of Republicans in the sample independent of the number of Democrats? (Is Y1 independent of Y2?)

Solution

Independence of discrete random variables requires that p(y1, y2) = p1(y1) p2(y2) for every choice (y1, y2). Thus, if this equality is violated for any pair of values, (y1, y2), the random variables are dependent. Looking in the upper left-hand corner of Table 5.2, we see p(0, 0) = 0. But p1(0) = 3/15 and p2(0) = 6/15. Hence, p(0, 0) ≠ p1(0) p2(0), so Y1 and Y2 are dependent.

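EXAMPLE 5.11

Let

$$f(y_1, y_2) = \begin{cases} 6 y_1 y_2^2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that Y1 and Y2 are independent.

Solution

We have

$$f_1(y_1) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_0^1 6 y_1 y_2^2\, dy_2 = 6 y_1 \left. \frac{y_2^3}{3} \right]_0^1 = 2y_1, & 0 \le y_1 \le 1, \\[2ex] \displaystyle\int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{-\infty}^{\infty} 0\, dy_2 = 0, & \text{elsewhere.} \end{cases}$$

Similarly,

$$f_2(y_2) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_0^1 6 y_1 y_2^2\, dy_1 = 3y_2^2, & 0 \le y_2 \le 1, \\[2ex] \displaystyle\int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{-\infty}^{\infty} 0\, dy_1 = 0, & \text{elsewhere.} \end{cases}$$

Hence, f(y1, y2) = f1(y1) f2(y2) for all real numbers (y1, y2), and, therefore, Y1 and Y2 are independent.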
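A numerical cross-check of Example 5.11 is sketched below. It approximates both marginal densities with a midpoint rule and compares their product with the joint density at a few points. The helper `midpoint_integral`, the grid size, and the test points are choices made here for illustration; the comparison is approximate, so it supports the algebra above rather than replacing it.

```python
def f(y1, y2):
    """Joint density of Example 5.11."""
    return 6 * y1 * y2**2 if 0 <= y1 <= 1 and 0 <= y2 <= 1 else 0.0


def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def f1(y1):
    """Marginal density of Y1: integrate the joint density over y2."""
    return midpoint_integral(lambda y2: f(y1, y2), 0.0, 1.0)


def f2(y2):
    """Marginal density of Y2: integrate the joint density over y1."""
    return midpoint_integral(lambda y1: f(y1, y2), 0.0, 1.0)


for y1, y2 in [(0.2, 0.9), (0.5, 0.5), (0.8, 0.1)]:
    # The last two columns should agree up to discretization error.
    print(y1, y2, f(y1, y2), f1(y1) * f2(y2))
```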

EXAMPLE 5.12

Let

$$f(y_1, y_2) = \begin{cases} 2, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that Y1 and Y2 are dependent.

Solution

We see that f(y1, y2) = 2 over the shaded region shown in Figure 5.7. Therefore,

$$f_1(y_1) = \begin{cases} \displaystyle\int_0^{y_1} 2\, dy_2 = \left. 2y_2 \right]_0^{y_1} = 2y_1, & 0 \le y_1 \le 1, \\[2ex] 0, & \text{elsewhere.} \end{cases}$$

[Figure 5.7: Region over which f(y1, y2) is positive, Example 5.12]

Similarly,

$$f_2(y_2) = \begin{cases} \displaystyle\int_{y_2}^{1} 2\, dy_1 = \left. 2y_1 \right]_{y_2}^{1} = 2(1 - y_2), & 0 \le y_2 \le 1, \\[2ex] 0, & \text{elsewhere.} \end{cases}$$

Hence,

$$f(y_1, y_2) \ne f_1(y_1)\, f_2(y_2)$$

for some pair of real numbers (y1, y2), and, therefore, Y1 and Y2 are dependent.
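To make "for some pair of real numbers" concrete, the sketch below evaluates the joint density of Example 5.12 and the product of its marginals at one point above the line y2 = y1. The point (0.25, 0.5) is chosen here purely for illustration; any point of the unit square with y1 < y2 < 1 and y1 > 0 shows the same disagreement.

```python
def f(y1, y2):
    """Joint density of Example 5.12: 2 on the triangle 0 <= y2 <= y1 <= 1."""
    return 2.0 if 0 <= y2 <= y1 <= 1 else 0.0


def f1(y1):
    """Marginal density of Y1 found in the example."""
    return 2.0 * y1 if 0 <= y1 <= 1 else 0.0


def f2(y2):
    """Marginal density of Y2 found in the example."""
    return 2.0 * (1.0 - y2) if 0 <= y2 <= 1 else 0.0


y1, y2 = 0.25, 0.5          # a point outside the triangle, since y2 > y1
print(f(y1, y2))            # 0.0
print(f1(y1) * f2(y2))      # 0.5
```

The joint density vanishes there because the point lies outside the triangle, while both marginals are positive, so the product cannot match.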

You will observe a distinct difference in the limits of integration employed in finding the marginal density functions obtained in Examples 5.11 and 5.12. The limits of integration for y2 involved in finding the marginal density of Y1 in Example 5.12 depended on y1. In contrast, the limits of integration were constants when we found the marginal density functions in Example 5.11. If the limits of integration are constants, the following theorem provides an easy way to show independence of two random variables.

THEOREM 5.5

Let Y1 and Y2 have a joint density f(y1, y2) that is positive if and only if a ≤ y1 ≤ b and c ≤ y2 ≤ d, for constants a, b, c, and d; and f(y1, y2) = 0 otherwise. Then Y1 and Y2 are independent random variables if and only if

$$f(y_1, y_2) = g(y_1)\, h(y_2)$$

where g(y1) is a nonnegative function of y1 alone and h(y2) is a nonnegative function of y2 alone.

The proof of this theorem is omitted. (See "References and Further Readings" at the end of the chapter.) The key benefit of the result given in Theorem 5.5 is that we do not actually need to derive the marginal densities. Indeed, the functions g(y1) and h(y2) need not, themselves, be density functions (although they will be constant multiples of the marginal densities, should we go to the bother of determining the marginal densities).
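The remark that g and h are constant multiples of the marginal densities follows from a short calculation, sketched here for the "if" direction only. It assumes the factorization f = g h holds on the rectangle and that both integrals below are finite; the constants K_g and K_h are names introduced for this sketch and do not appear in the text.

```latex
\begin{align*}
K_g &= \int_a^b g(y_1)\,dy_1, \qquad K_h = \int_c^d h(y_2)\,dy_2, \\
f_1(y_1) &= \int_c^d g(y_1)\,h(y_2)\,dy_2 = K_h\, g(y_1), \qquad a \le y_1 \le b, \\
f_2(y_2) &= \int_a^b g(y_1)\,h(y_2)\,dy_1 = K_g\, h(y_2), \qquad c \le y_2 \le d, \\
1 &= \int_c^d\!\!\int_a^b g(y_1)\,h(y_2)\,dy_1\,dy_2 = K_g K_h, \\
f_1(y_1)\,f_2(y_2) &= K_g K_h\, g(y_1)\,h(y_2) = f(y_1, y_2).
\end{align*}
```

For instance, with g(y1) = y1 and h(y2) = 2 on the unit square, K_g = 1/2 and K_h = 2, so f1(y1) = 2y1 and f2(y2) = 1.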

EXAMPLE 5.13

Let Y1 and Y2 have a joint density given by

$$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Are Y1 and Y2 independent variables?

Solution

Notice that f(y1, y2) is positive if and only if 0 ≤ y1 ≤ 1 and 0 ≤ y2 ≤ 1. Further, f(y1, y2) = g(y1)h(y2), where

$$g(y_1) = \begin{cases} y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases} \qquad\text{and}\qquad h(y_2) = \begin{cases} 2, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Therefore, Y1 and Y2 are independent random variables. Notice that g(y1) and h(y2), as defined here, are not density functions, although 2g(y1) and h(y2)/2 are densities.

EXAMPLE 5.14

Refer to Example 5.4. Is Y1 , the amount in stock, independent of Y2 , the amount sold?

Solution

Because the density function is positive if and only if 0 ≤ y2 ≤ y1 ≤ 1, there do not exist constants a, b, c, and d such that the density is positive over the region a ≤ y1 ≤ b, c ≤ y2 ≤ d. Thus, Theorem 5.5 cannot be applied. However, Y1 and Y2 can be shown to be dependent random variables because the joint density is not the product of the marginal densities.

Deﬁnition 5.8 easily can be generalized to n dimensions. Suppose that we have n random variables, Y1 , . . . , Yn , where Yi has distribution function Fi (yi ), for i = 1, 2, . . . , n; and where Y1 , Y2 , . . . , Yn have joint distribution function F(y1 , y2 , . . . , yn ). Then Y1 , Y2 , . . . , Yn are independent if and only if F(y1 , y2 , . . . , yn ) = F1 (y1 ) · · · Fn (yn ) for all real numbers y1 , y2 , . . . , yn , with the obvious equivalent forms for the discrete and continuous cases.

Exercises

5.43

Let Y1 and Y2 have joint density function f (y1 , y2 ) and marginal densities f 1 (y1 ) and f 2 (y2 ), respectively. Show that Y1 and Y2 are independent if and only if f (y1 |y2 ) = f 1 (y1 ) for all values of y1 and for all y2 such that f 2 (y2 ) > 0. A completely analogous argument establishes that Y1 and Y2 are independent if and only if f (y2 |y1 ) = f 2 (y2 ) for all values of y2 and for all y1 such that f 1 (y1 ) > 0.

5.44

Prove that the results in Exercise 5.43 also hold for discrete random variables.

5.45

In Exercise 5.1, we determined that the joint distribution of Y1, the number of contracts awarded to firm A, and Y2, the number of contracts awarded to firm B, is given by the entries in the following table.

              y1
y2       0      1      2
0       1/9    2/9    1/9
1       2/9    2/9     0
2       1/9     0      0

The marginal probability function of Y1 was derived in Exercise 5.19 to be binomial with n = 2 and p = 1/3. Are Y1 and Y2 independent? Why?


5.46

Refer to Exercise 5.2. The number of heads in three coin tosses is binomially distributed with n = 3, p = 1/2. Are the total number of heads and your winnings on the side bet independent? [Examine your answer to Exercise 5.20(b).]

5.47

In Exercise 5.3, we determined that the joint probability distribution of Y1, the number of married executives, and Y2, the number of never-married executives, is given by

$$p(y_1, y_2) = \frac{\binom{4}{y_1}\binom{3}{y_2}\binom{2}{3 - y_1 - y_2}}{\binom{9}{3}},$$

where y1 and y2 are integers, 0 ≤ y1 ≤ 3, 0 ≤ y2 ≤ 3, and 1 ≤ y1 + y2 ≤ 3. Are Y1 and Y2 independent? (Recall your answer to Exercise 5.21.)

5.48

In Exercise 5.4, you were given the following joint probability function for

$$Y_1 = \begin{cases} 0, & \text{if child survived,} \\ 1, & \text{if not,} \end{cases} \qquad\text{and}\qquad Y_2 = \begin{cases} 0, & \text{if no belt used,} \\ 1, & \text{if adult belt used,} \\ 2, & \text{if car-seat belt used.} \end{cases}$$

               y1
y2        0      1     Total
0        .38    .17    .55
1        .14    .02    .16
2        .24    .05    .29
Total    .76    .24    1.00

Are Y1 and Y2 independent? Why or why not?

5.49

In Example 5.4 and Exercise 5.5, we considered the joint density of Y1, the proportion of the capacity of the tank that is stocked at the beginning of the week, and Y2, the proportion of the capacity sold during the week, given by

$$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that Y1 and Y2 are dependent.

5.50

In Exercise 5.6, we assumed that if a radioactive particle is randomly located in a square with sides of unit length, a reasonable model for the joint density function for Y1 and Y2 is

$$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

a Are Y1 and Y2 independent?
b Does the result from part (a) explain the results you obtained in Exercise 5.24 (d)–(f)? Why?

5.51

In Exercise 5.7, we considered Y1 and Y2 with joint density function

$$f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

a Are Y1 and Y2 independent?
b Does the result from part (a) explain the results you obtained in Exercise 5.25 (d)–(f)? Why?

5.52

In Exercise 5.8, we derived the fact that

$$f(y_1, y_2) = \begin{cases} 4y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function. Are Y1 and Y2 independent?

5.53

In Exercise 5.9, we determined that

$$f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function. Are Y1 and Y2 independent?

5.54

In Exercise 5.10, we proved that

$$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2y_2 \le y_1, \\ 0, & \text{elsewhere} \end{cases}$$

is a valid joint probability density function for Y1, the amount of pollutant per sample collected above the stack without the cleaning device, and Y2, the amount collected above the stack with the cleaner. Are the amounts of pollutants per sample collected with and without the cleaning device independent?

5.55

Suppose that, as in Exercise 5.11, Y1 and Y2 are uniformly distributed over the triangle shaded in the accompanying diagram. Are Y1 and Y2 independent?

5.56

In Exercise 5.12, we were given the following joint probability density function for the random variables Y1 and Y2, which were the proportions of two components in a sample from a mixture of insecticide:

$$f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1,\ 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Are Y1 and Y2 independent?

5.57

In Exercises 5.13 and 5.31, the joint density function of Y1 and Y2 was given by

$$f(y_1, y_2) = \begin{cases} 30 y_1 y_2^2, & y_1 - 1 \le y_2 \le 1 - y_1,\ 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Are the random variables Y1 and Y2 independent?

5.58

Suppose that the random variables Y1 and Y2 have joint probability density function, f(y1, y2), given by (see Exercises 5.14 and 5.32)

$$f(y_1, y_2) = \begin{cases} 6 y_1^2 y_2, & 0 \le y_1 \le y_2,\ y_1 + y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that Y1 and Y2 are dependent random variables.

5.59

If Y1 is the total time between a customer's arrival in the store and leaving the service window and if Y2 is the time spent in line before reaching the window, the joint density of these variables, according to Exercise 5.15, is

$$f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

Are Y1 and Y2 independent?


5.60

In Exercise 5.16, Y1 and Y2 denoted the proportions of time that employees I and II actually spent working on their assigned tasks during a workday. The joint density of Y1 and Y2 is given by

$$f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Are Y1 and Y2 independent?

5.61

In Exercise 5.18, Y1 and Y2 denoted the lengths of life, in hundreds of hours, for components of types I and II, respectively, in an electronic system. The joint density of Y1 and Y2 is

$$f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Are Y1 and Y2 independent?

5.62

Suppose that the probability that a head appears when a coin is tossed is p and the probability that a tail occurs is q = 1 − p. Person A tosses the coin until the ﬁrst head appears and stops. Person B does likewise. The results obtained by persons A and B are assumed to be independent. What is the probability that A and B stop on exactly the same number toss?

5.63

Let Y1 and Y2 be independent exponentially distributed random variables, each with mean 1. Find P( Y1 > Y2 | Y1 < 2Y2 ).

5.64

Let Y1 and Y2 be independent random variables that are both uniformly distributed on the interval (0, 1). Find P( Y1 < 2Y2 | Y1 < 3Y2 ).

*5.65

Suppose that, for −1 ≤ α ≤ 1, the probability density function of (Y1, Y2) is given by

$$f(y_1, y_2) = \begin{cases} \left[1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}\right] e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere.} \end{cases}$$

a Show that the marginal distribution of Y1 is exponential with mean 1.
b What is the marginal distribution of Y2?
c Show that Y1 and Y2 are independent if and only if α = 0.

Notice that these results imply that there are infinitely many joint densities such that both marginals are exponential with mean 1.

*5.66

Let F1(y1) and F2(y2) be two distribution functions. For any α, −1 ≤ α ≤ 1, consider Y1 and Y2 with joint distribution function

$$F(y_1, y_2) = F_1(y_1) F_2(y_2)\left[1 - \alpha\{1 - F_1(y_1)\}\{1 - F_2(y_2)\}\right].$$

a What is F(y1, ∞), the marginal distribution function of Y1? [Hint: What is F2(∞)?]
b What is the marginal distribution function of Y2?
c If α = 0 why are Y1 and Y2 independent?
d Are Y1 and Y2 independent if α ≠ 0? Why?

Notice that this construction can be used to produce an infinite number of joint distribution functions that have the same marginal distribution functions.

5.67

In Section 5.2, we argued that if Y1 and Y2 have joint cumulative distribution function F(y1, y2) then for any a < b and c < d

$$P(a < Y_1 \le b,\ c < Y_2 \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c).$$

If Y1 and Y2 are independent, show that

$$P(a < Y_1 \le b,\ c < Y_2 \le d) = P(a < Y_1 \le b) \times P(c < Y_2 \le d).$$

[Hint: Express P(a < Y1 ≤ b) in terms of F1(·).]

5.68

A bus arrives at a bus stop at a uniformly distributed time over the interval 0 to 1 hour. A passenger also arrives at the bus stop at a uniformly distributed time over the interval 0 to 1 hour. Assume that the arrival times of the bus and passenger are independent of one another and that the passenger will wait for up to 1/4 hour for the bus to arrive. What is the probability that the passenger will catch the bus? [Hint: Let Y1 denote the bus arrival time and Y2 the passenger arrival time; determine the joint density of Y1 and Y2 and ﬁnd P(Y2 ≤ Y1 ≤ Y2 + 1/4).]

5.69

The length of life Y for fuses of a certain type is modeled by the exponential distribution, with

$$f(y) = \begin{cases} (1/3)\, e^{-y/3}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

(The measurements are in hundreds of hours.)
a If two such fuses have independent lengths of life Y1 and Y2, find the joint probability density function for Y1 and Y2.
b One fuse in part (a) is in a primary system, and the other is in a backup system that comes into use only if the primary system fails. The total effective length of life of the two fuses is then Y1 + Y2. Find P(Y1 + Y2 ≤ 1).

5.70

A supermarket has two customers waiting to pay for their purchases at counter I and one customer waiting to pay at counter II. Let Y1 and Y2 denote the numbers of customers who spend more than $50 on groceries at the respective counters. Suppose that Y1 and Y2 are independent binomial random variables, with the probability that a customer at counter I will spend more than $50 equal to .2 and the probability that a customer at counter II will spend more than $50 equal to .3. Find the
a joint probability distribution for Y1 and Y2.
b probability that not more than one of the three customers will spend more than $50.

5.71

Two telephone calls come into a switchboard at random times in a fixed one-hour period. Assume that the calls are made independently of one another. What is the probability that the calls are made
a in the first half hour?
b within five minutes of each other?

5.5 The Expected Value of a Function of Random Variables

You need only construct the multivariate analogue to the univariate situation to justify the following definition.

DEFINITION 5.9

Let g(Y1, Y2, ..., Yk) be a function of the discrete random variables, Y1, Y2, ..., Yk, which have probability function p(y1, y2, ..., yk). Then the expected value of g(Y1, Y2, ..., Yk) is

$$E[g(Y_1, Y_2, \ldots, Y_k)] = \sum_{\text{all } y_k} \cdots \sum_{\text{all } y_2} \sum_{\text{all } y_1} g(y_1, y_2, \ldots, y_k)\, p(y_1, y_2, \ldots, y_k).$$

If Y1, Y2, ..., Yk are continuous random variables with joint density function f(y1, y2, ..., yk), then²

$$E[g(Y_1, Y_2, \ldots, Y_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_k)\, f(y_1, y_2, \ldots, y_k)\, dy_1\, dy_2 \ldots dy_k.$$
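For the discrete case, Definition 5.9 amounts to a weighted sum over the points of the joint probability function. The sketch below illustrates this for a finite table stored as a dictionary; the function name `expected_value` and the choice of the two-dice distribution from Example 5.9 as test data are assumptions made here for illustration.

```python
from fractions import Fraction


def expected_value(g, pmf):
    """E[g(Y1, ..., Yk)] for a discrete joint probability function.

    `pmf` maps each point (y1, ..., yk) to its probability; the sum runs
    over every point that carries positive probability.
    """
    return sum(g(*point) * p for point, p in pmf.items())


# Die-tossing experiment: 36 equally likely points (y1, y2).
pmf = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

print(expected_value(lambda y1, y2: y1 * y2, pmf))  # 49/4
print(expected_value(lambda y1, y2: y1 + y2, pmf))  # 7
```

Passing g as a function keeps the sum itself unchanged whichever function of the random variables is of interest.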

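EXAMPLE 5.15

Let Y1 and Y2 have joint density given by

$$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find E(Y1 Y2).

Solution

From Definition 5.9 we obtain

$$\begin{aligned} E(Y_1 Y_2) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 y_2\, f(y_1, y_2)\, dy_1\, dy_2 = \int_0^1 \int_0^1 y_1 y_2 (2y_1)\, dy_1\, dy_2 \\ &= \int_0^1 y_2 \left( \left. \frac{2y_1^3}{3} \right]_0^1 \right) dy_2 = \int_0^1 \frac{2}{3}\, y_2\, dy_2 = \frac{2}{3} \left. \frac{y_2^2}{2} \right]_0^1 = \frac{1}{3}. \end{aligned}$$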
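The double integral in Example 5.15 can be approximated numerically as a sanity check. The sketch below uses a midpoint rule on a grid over the unit square; the function name `expectation` and the grid size `n` are choices made here for illustration, and the result is an approximation rather than the exact value 1/3.

```python
def f(y1, y2):
    """Joint density of Example 5.15."""
    return 2 * y1 if 0 <= y1 <= 1 and 0 <= y2 <= 1 else 0.0


def expectation(g, n=400):
    """Midpoint-rule approximation of E[g(Y1, Y2)] over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y1 = (i + 0.5) * h
        for j in range(n):
            y2 = (j + 0.5) * h
            total += g(y1, y2) * f(y1, y2)
    return total * h * h


print(expectation(lambda y1, y2: y1 * y2))  # close to 1/3
```

The same function applied to g(y1, y2) = 1 should return a value close to 1, which confirms that the density integrates to one.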

We will show that Definition 5.9 is consistent with Definition 4.5, in which we defined the expected value of a univariate random variable. Consider two random variables Y1 and Y2 with density function f(y1, y2). We wish to find the expected value of g(Y1, Y2) = Y1. Then from Definition 5.9 we have

$$E(Y_1) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1 f(y_1, y_2)\, dy_2\, dy_1 = \int_{-\infty}^{\infty} y_1 \left[ \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \right] dy_1.$$

The quantity within the brackets, by definition, is the marginal density function for Y1. Therefore, we obtain

$$E(Y_1) = \int_{-\infty}^{\infty} y_1 f_1(y_1)\, dy_1,$$

which agrees with Definition 4.5.

2. Again, we say that the expectations exist if Σ ··· Σ |g(y1, y2, ..., yk)| p(y1, y2, ..., yk) or if ∫ ··· ∫ |g(y1, y2, ..., yk)| f(y1, y2, ..., yk) dy1 ... dyk is finite.

EXAMPLE 5.16

Let Y1 and Y2 have a joint density given by

$$f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the expected value of Y1.

Solution

$$E(Y_1) = \int_0^1 \int_0^1 y_1 (2y_1)\, dy_1\, dy_2 = \int_0^1 \left( \left. \frac{2y_1^3}{3} \right]_0^1 \right) dy_2 = \int_0^1 \frac{2}{3}\, dy_2 = \left. \frac{2}{3}\, y_2 \right]_0^1 = \frac{2}{3}.$$

Refer to Figure 5.6 and estimate the expected value of Y1. The value E(Y1) = 2/3 appears to be quite reasonable.

EXAMPLE 5.17

In Figure 5.6 the mean value of Y2 appears to be equal to .5. Let us conﬁrm this visual estimate. Find E(Y2 ). "

Solution

1

E(Y2 ) = 0

"

1

"

EXAMPLE 5.18

y2 0

y2 dy2 =

1

y2 (2y1 ) dy1 dy2 =

0 1

=

"

1

y22 2

= 0

2y12 2

1 dy2 0

1 . 2

Let Y1 and Y2 be random variables with density function $ 2y1 , 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, f (y1 , y2 ) = 0, elsewhere. Find V (Y1 ).

Solution

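The marginal density for Y1 obtained in Example 5.6 is
\[
f_1(y_1) = \begin{cases} 2y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Then $V(Y_1) = E(Y_1^2) - [E(Y_1)]^2$, and
\[
E(Y_1^k) = \int_{-\infty}^{\infty} y_1^k f_1(y_1)\, dy_1
= \int_0^1 y_1^k (2y_1)\, dy_1
= \left[\frac{2y_1^{k+2}}{k+2}\right]_0^1 = \frac{2}{k+2}.
\]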


If we let k = 1 and k = 2, it follows that $E(Y_1)$ and $E(Y_1^2)$ are 2/3 and 1/2, respectively. Then
\[
V(Y_1) = E(Y_1^2) - [E(Y_1)]^2 = 1/2 - (2/3)^2 = 1/18.
\]

EXAMPLE 5.19

A process for producing an industrial chemical yields a product containing two types of impurities. For a specified sample from this process, let Y1 denote the proportion of impurities in the sample and let Y2 denote the proportion of type I impurities among all impurities found. Suppose that the joint distribution of Y1 and Y2 can be modeled by the following probability density function:
\[
f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Find the expected value of the proportion of type I impurities in the sample.

Solution

Because Y1 is the proportion of impurities in the sample and Y2 is the proportion of type I impurities among the sample impurities, it follows that Y1 Y2 is the proportion of type I impurities in the entire sample. Thus, we want to find E(Y1 Y2):
\[
\begin{aligned}
E(Y_1 Y_2) &= \int_0^1\int_0^1 2 y_1 y_2 (1 - y_1)\, dy_2\, dy_1
= 2\int_0^1 y_1 (1 - y_1)\left(\frac{1}{2}\right) dy_1 \\
&= \int_0^1 (y_1 - y_1^2)\, dy_1
= \left[\frac{y_1^2}{2} - \frac{y_1^3}{3}\right]_0^1
= \frac{1}{2} - \frac{1}{3} = \frac{1}{6}.
\end{aligned}
\]
Therefore, we would expect 1/6 of the sample to be made up of type I impurities.
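The double integral in Example 5.19 can also be checked numerically. The following is only a sketch: it approximates E[g(Y1, Y2)] over the unit square with a midpoint rule, and the names `expected_value` and `impurity_density` and the grid size `n` are illustrative choices, not part of the text.

```python
def expected_value(g, density, n=200):
    """Approximate E[g(Y1, Y2)] over the unit square with the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y1 = (i + 0.5) * h          # midpoint of the i-th cell in y1
        for j in range(n):
            y2 = (j + 0.5) * h      # midpoint of the j-th cell in y2
            total += g(y1, y2) * density(y1, y2)
    return total * h * h


def impurity_density(y1, y2):
    """Joint density of Example 5.19 on the unit square."""
    return 2.0 * (1.0 - y1)


e_product = expected_value(lambda y1, y2: y1 * y2, impurity_density)
print(e_product)  # close to 1/6
```

The same helper approximates any expectation in this section whose density is supported on the unit square; only `g` and `density` change.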

5.6 Special Theorems

Theorems that facilitate computation of the expected value of a constant, the expected value of a constant times a function of random variables, and the expected value of the sum of functions of random variables are similar to those for the univariate case.

THEOREM 5.6

Let c be a constant. Then E(c) = c.

THEOREM 5.7

Let g(Y1 , Y2 ) be a function of the random variables Y1 and Y2 and let c be a constant. Then E[cg(Y1 , Y2 )] = cE[g(Y1 , Y2 )].

THEOREM 5.8

Let Y1 and Y2 be random variables and g1(Y1, Y2), g2(Y1, Y2), . . . , gk(Y1, Y2) be functions of Y1 and Y2. Then
\[
E[g_1(Y_1, Y_2) + g_2(Y_1, Y_2) + \cdots + g_k(Y_1, Y_2)] = E[g_1(Y_1, Y_2)] + E[g_2(Y_1, Y_2)] + \cdots + E[g_k(Y_1, Y_2)].
\]

The proofs of these three theorems are analogous to the univariate cases discussed in Chapters 3 and 4.

EXAMPLE 5.20

Refer to Example 5.4. The random variable Y1 − Y2 denotes the proportional amount of gasoline remaining at the end of the week. Find E(Y1 − Y2).

Solution

Employing Theorem 5.8 with g1(Y1, Y2) = Y1 and g2(Y1, Y2) = −Y2, we see that E(Y1 − Y2) = E(Y1) + E(−Y2). Theorem 5.7 applies, yielding E(−Y2) = −E(Y2); therefore, E(Y1 − Y2) = E(Y1) − E(Y2). Also,
\[
E(Y_1) = \int_0^1\int_0^{y_1} y_1 (3y_1)\, dy_2\, dy_1
= \int_0^1 3y_1^3\, dy_1
= \left[\frac{3}{4}\, y_1^4\right]_0^1 = \frac{3}{4},
\]
\[
E(Y_2) = \int_0^1\int_0^{y_1} y_2 (3y_1)\, dy_2\, dy_1
= \int_0^1 3y_1 \left[\frac{y_2^2}{2}\right]_0^{y_1} dy_1
= \int_0^1 \frac{3}{2}\, y_1^3\, dy_1
= \left[\frac{3}{8}\, y_1^4\right]_0^1 = \frac{3}{8}.
\]

Thus, E(Y1 − Y2 ) = (3/4) − (3/8) = 3/8, so we would expect 3/8 of the tank to be ﬁlled at the end of the week’s sales.

If the random variables under study are independent, we sometimes can simplify the work involved in finding expectations. The following theorem is quite useful in this regard.

THEOREM 5.9

Let Y1 and Y2 be independent random variables and g(Y1 ) and h(Y2 ) be functions of only Y1 and Y2 , respectively. Then E[g(Y1 )h(Y2 )] = E[g(Y1 )]E[h(Y2 )], provided that the expectations exist.


Proof

We will give the proof of the result for the continuous case. Let f(y1, y2) denote the joint density of Y1 and Y2. The product g(Y1)h(Y2) is a function of Y1 and Y2. Hence, by Definition 5.9 and the assumption that Y1 and Y2 are independent,
\[
\begin{aligned}
E[g(Y_1)h(Y_2)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(y_1)h(y_2) f(y_1, y_2)\, dy_2\, dy_1 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(y_1)h(y_2) f_1(y_1) f_2(y_2)\, dy_2\, dy_1 \\
&= \int_{-\infty}^{\infty} g(y_1) f_1(y_1) \left[\int_{-\infty}^{\infty} h(y_2) f_2(y_2)\, dy_2\right] dy_1 \\
&= \int_{-\infty}^{\infty} g(y_1) f_1(y_1) E[h(Y_2)]\, dy_1 \\
&= E[h(Y_2)] \int_{-\infty}^{\infty} g(y_1) f_1(y_1)\, dy_1 = E[g(Y_1)]\, E[h(Y_2)].
\end{aligned}
\]

The proof for the discrete case follows in an analogous manner.
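For the discrete case, the factorization in Theorem 5.9 can be verified exactly on a small example. The sketch below uses two made-up marginal probability functions `p1` and `p2` and arbitrary functions `g` and `h` (all hypothetical, chosen only for illustration), builds the joint distribution as the product of the marginals, and compares both sides of the identity with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal probability functions of two independent variables.
p1 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
p2 = {1: Fraction(1, 3), 2: Fraction(2, 3)}


def g(y1):
    return y1 ** 2


def h(y2):
    return 3 * y2 + 1


# Under independence the joint probability is p1(y1) * p2(y2).
lhs = sum(g(a) * h(b) * p1[a] * p2[b] for a, b in product(p1, p2))

# Product of the two separate expectations.
e_g = sum(g(a) * p1[a] for a in p1)
e_h = sum(h(b) * p2[b] for b in p2)
rhs = e_g * e_h

print(lhs, rhs)  # the two values agree
```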

EXAMPLE 5.21

Refer to Example 5.19. In that example we found E(Y1 Y2) directly. By investigating the form of the joint density function given there, we can see that Y1 and Y2 are independent. Find E(Y1 Y2) by using the result that E(Y1 Y2) = E(Y1)E(Y2) if Y1 and Y2 are independent.

Solution

The joint density function is given by
\[
f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Hence,
\[
f_1(y_1) = \begin{cases} \displaystyle\int_0^1 2(1 - y_1)\, dy_2 = 2(1 - y_1), & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\]
and
\[
f_2(y_2) = \begin{cases} \displaystyle\int_0^1 2(1 - y_1)\, dy_1 = \left[-(1 - y_1)^2\right]_0^1 = 1, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
We then have
\[
E(Y_1) = \int_0^1 y_1 [2(1 - y_1)]\, dy_1 = 2\left[\frac{y_1^2}{2} - \frac{y_1^3}{3}\right]_0^1 = \frac{1}{3},
\]
and E(Y2) = 1/2 because Y2 is uniformly distributed over (0, 1).


It follows that E(Y1 Y2 ) = E(Y1 )E(Y2 ) = (1/3)(1/2) = 1/6, which agrees with the answer in Example 5.19.

Exercises

5.72

In Exercise 5.1, we determined that the joint distribution of Y1, the number of contracts awarded to firm A, and Y2, the number of contracts awarded to firm B, is given by the entries in the following table.

              y1
y2       0      1      2
0       1/9    2/9    1/9
1       2/9    2/9    0
2       1/9    0      0

The marginal probability function of Y1 was derived in Exercise 5.19 to be binomial with n = 2 and p = 1/3. Find
a E(Y1).
b V(Y1).
c E(Y1 − Y2).

5.73

In Exercise 5.3, we determined that the joint probability distribution of Y1, the number of married executives, and Y2, the number of never-married executives, is given by
\[
p(y_1, y_2) = \frac{\dbinom{4}{y_1}\dbinom{3}{y_2}\dbinom{2}{3 - y_1 - y_2}}{\dbinom{9}{3}},
\]
where y1 and y2 are integers, 0 ≤ y1 ≤ 3, 0 ≤ y2 ≤ 3, and 1 ≤ y1 + y2 ≤ 3. Find the expected number of married executives among the three selected for promotion. (See Exercise 5.21.)

5.74

Refer to Exercises 5.6, 5.24, and 5.50. Suppose that a radioactive particle is randomly located in a square with sides of unit length. A reasonable model for the joint density function for Y1 and Y2 is
\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
a What is E(Y1 − Y2)?
b What is E(Y1 Y2)?
c What is $E(Y_1^2 + Y_2^2)$?
d What is V(Y1 Y2)?

5.75

Refer to Exercises 5.7, 5.25, and 5.51. Let Y1 and Y2 have joint density function
\[
f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]
a What are E(Y1 + Y2) and V(Y1 + Y2)?
b What is P(Y1 − Y2 > 3)?
c What is P(Y1 − Y2 < −3)?
d What are E(Y1 − Y2) and V(Y1 − Y2)?
e What do you notice about V(Y1 + Y2) and V(Y1 − Y2)?

5.76

In Exercise 5.8, we derived the fact that
\[
f(y_1, y_2) = \begin{cases} 4y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
a Find E(Y1).
b Find V(Y1).
c Find E(Y1 − Y2).

5.77

In Exercise 5.9, we determined that
\[
f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}
\]
is a valid joint probability density function. Find
a E(Y1) and E(Y2).
b V(Y1) and V(Y2).
c E(Y1 − 3Y2).

5.78

In Exercise 5.10, we proved that
\[
f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2y_2 \le y_1, \\ 0, & \text{elsewhere} \end{cases}
\]
is a valid joint probability density function for Y1, the amount of pollutant per sample collected above the stack without the cleaning device, and Y2, the amount collected above the stack with the cleaner.
a Find E(Y1) and E(Y2).
b Find V(Y1) and V(Y2).
c The random variable Y1 − Y2 represents the amount by which the weight of pollutant can be reduced by using the cleaning device. Find E(Y1 − Y2).
d Find V(Y1 − Y2). Within what limits would you expect Y1 − Y2 to fall?

5.79

Suppose that, as in Exercise 5.11, Y1 and Y2 are uniformly distributed over the triangle shaded in the accompanying diagram. Find E(Y1 Y2).

[Figure: shaded triangle in the (y1, y2) plane with vertices (−1, 0), (1, 0), and (0, 1).]

5.80

In Exercise 5.16, Y1 and Y2 denoted the proportions of time that employees I and II actually spent working on their assigned tasks during a workday. The joint density of Y1 and Y2 is given by
\[
f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Employee I has a higher productivity rating than employee II and a measure of the total productivity of the pair of employees is 30Y1 + 25Y2. Find the expected value of this measure of productivity.

5.81

In Exercise 5.18, Y1 and Y2 denoted the lengths of life, in hundreds of hours, for components of types I and II, respectively, in an electronic system. The joint density of Y1 and Y2 is
\[
f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]
One way to measure the relative efficiency of the two components is to compute the ratio Y2/Y1. Find E(Y2/Y1). [Hint: In Exercise 5.61, we proved that Y1 and Y2 are independent.]

5.82

In Exercise 5.38, we determined that the joint density function for Y1, the weight in tons of a bulk item stocked by a supplier, and Y2, the weight of the item sold by the supplier, is
\[
f(y_1, y_2) = \begin{cases} 1/y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
In this case, the random variable Y1 − Y2 measures the amount of stock remaining at the end of the week, a quantity of great importance to the supplier. Find E(Y1 − Y2).

5.83

In Exercise 5.42, we determined that the unconditional probability distribution for Y, the number of defects per yard in a certain fabric, is
\[
p(y) = (1/2)^{y+1}, \qquad y = 0, 1, 2, \ldots .
\]

Find the expected number of defects per yard.

5.84

In Exercise 5.62, we considered two individuals who each tossed a coin until the first head appears. Let Y1 and Y2 denote the number of times that persons A and B toss the coin, respectively. If heads occurs with probability p and tails occurs with probability q = 1 − p, it is reasonable to conclude that Y1 and Y2 are independent and that each has a geometric distribution with parameter p. Consider Y1 − Y2, the difference in the number of tosses required by the two individuals.
a Find E(Y1), E(Y2), and E(Y1 − Y2).
b Find $E(Y_1^2)$, $E(Y_2^2)$, and E(Y1 Y2) (recall that Y1 and Y2 are independent).
c Find $E(Y_1 - Y_2)^2$ and V(Y1 − Y2).
d Give an interval that will contain Y1 − Y2 with probability at least 8/9.

5.85

In Exercise 5.65, we considered random variables Y1 and Y2 that, for −1 ≤ α ≤ 1, have joint density function given by
\[
f(y_1, y_2) = \begin{cases} [1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}]\, e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere} \end{cases}
\]
and established that the marginal distributions of Y1 and Y2 are both exponential with mean 1. Find
a E(Y1) and E(Y2).
b V(Y1) and V(Y2).
c E(Y1 − Y2).
d E(Y1 Y2).
e V(Y1 − Y2). Within what limits would you expect Y1 − Y2 to fall?

*5.86

Suppose that Z is a standard normal random variable and that Y1 and Y2 are χ²-distributed random variables with ν1 and ν2 degrees of freedom, respectively. Further, assume that Z, Y1, and Y2 are independent.
a Define $W = Z/\sqrt{Y_1}$. Find E(W) and V(W). What assumptions do you need about the value of ν1? [Hint: $W = Z(1/\sqrt{Y_1}) = g(Z)h(Y_1)$. Use Theorem 5.9. The results of Exercise 4.112(d) will also be useful.]
b Define U = Y1/Y2. Find E(U) and V(U). What assumptions about ν1 and ν2 do you need? Use the hint from part (a).

5.87

Suppose that Y1 and Y2 are independent χ² random variables with ν1 and ν2 degrees of freedom, respectively. Find
a E(Y1 + Y2).
b V(Y1 + Y2). [Hint: Use Theorem 5.9 and the result of Exercise 4.112(a).]

5.88

Suppose that you are told to toss a die until you have observed each of the six faces. What is the expected number of tosses required to complete your assignment? [Hint: If Y is the number of trials to complete the assignment, Y = Y1 + Y2 + Y3 + Y4 + Y5 + Y6, where Y1 is the trial on which the first face is tossed, Y1 = 1, Y2 is the number of additional tosses required to get a face different than the first, Y3 is the number of additional tosses required to get a face different than the first two distinct faces, . . . , Y6 is the number of additional tosses to get the last remaining face after all other faces have been observed. Notice further that for i = 1, 2, . . . , 6, Yi has a geometric distribution with success probability (7 − i)/6.]

5.7 The Covariance of Two Random Variables

Intuitively, we think of the dependence of two random variables Y1 and Y2 as implying that one variable (say, Y1) either increases or decreases as Y2 changes. We will confine our attention to two measures of dependence: the covariance between two random variables and their correlation coefficient. In Figure 5.8(a) and (b), we give plots of the observed values of two variables, Y1 and Y2, for samples of n = 10 experimental units drawn from each of two populations. If all the points fall along a straight line, as indicated in Figure 5.8(a), Y1 and Y2 are obviously dependent. In contrast, Figure 5.8(b) indicates little or no dependence between Y1 and Y2.

Suppose that we knew the values of E(Y1) = µ1 and E(Y2) = µ2 and located this point on the graph in Figure 5.8. Now locate a plotted point, (y1, y2), on Figure 5.8(a) and measure the deviations (y1 − µ1) and (y2 − µ2). Both deviations assume the same algebraic sign for any point, (y1, y2), and their product (y1 − µ1)(y2 − µ2) is positive. Points to the right of µ1 yield pairs of positive deviations; points to the left produce pairs of negative deviations; and the average of the product of the deviations (y1 − µ1)(y2 − µ2) is large and positive. If the linear relation indicated in Figure 5.8(a) had sloped downward to the right, all corresponding pairs of deviations would have been of the opposite sign, and the average value of (y1 − µ1)(y2 − µ2) would have been a large negative number.

FIGURE 5.8 Dependent and independent observations for (y1, y2). Panels (a) and (b) each plot y2 against y1.

The situation just described does not occur for Figure 5.8(b), where little dependence exists between Y1 and Y2. Their corresponding deviations (y1 − µ1) and (y2 − µ2) will assume the same algebraic sign for some points and opposite signs for others. Thus, the product (y1 − µ1)(y2 − µ2) will be positive for some points, negative for others, and will average to some value near zero. Clearly, the average value of (Y1 − µ1)(Y2 − µ2) provides a measure of the linear dependence between Y1 and Y2. This quantity, E[(Y1 − µ1)(Y2 − µ2)], is called the covariance of Y1 and Y2.

DEFINITION 5.10

If Y1 and Y2 are random variables with means µ1 and µ2, respectively, the covariance of Y1 and Y2 is
\[
\operatorname{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)].
\]

The larger the absolute value of the covariance of Y1 and Y2, the greater the linear dependence between Y1 and Y2. Positive values indicate that Y1 increases as Y2 increases; negative values indicate that Y1 decreases as Y2 increases. A zero value of the covariance indicates that the variables are uncorrelated and that there is no linear dependence between Y1 and Y2.

Unfortunately, it is difficult to employ the covariance as an absolute measure of dependence because its value depends upon the scale of measurement. As a result, it is difficult to determine at first glance whether a particular covariance is large or small. This problem can be eliminated by standardizing its value and using the correlation coefficient, ρ, a quantity related to the covariance and defined as
\[
\rho = \frac{\operatorname{Cov}(Y_1, Y_2)}{\sigma_1 \sigma_2},
\]

where σ1 and σ2 are the standard deviations of Y1 and Y2 , respectively. Supplemental discussions of the correlation coefﬁcient may be found in Hogg, Craig, and McKean (2005) and Myers (2000). A proof that the correlation coefﬁcient ρ satisﬁes the inequality −1 ≤ ρ ≤ 1 is outlined in Exercise 5.167.


The sign of the correlation coefficient is the same as the sign of the covariance. Thus, ρ > 0 indicates that Y2 increases as Y1 increases, and ρ = +1 implies perfect correlation, with all points falling on a straight line with positive slope. A value of ρ = 0 implies zero covariance and no correlation. A negative coefficient of correlation implies a decrease in Y2 as Y1 increases, and ρ = −1 implies perfect correlation, with all points falling on a straight line with negative slope. A convenient computational formula for the covariance is contained in the next theorem.

THEOREM 5.10

If Y1 and Y2 are random variables with means µ1 and µ2, respectively, then
\[
\operatorname{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - E(Y_1)E(Y_2).
\]

Proof

\[
\operatorname{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2 - \mu_1 Y_2 - \mu_2 Y_1 + \mu_1 \mu_2).
\]
From Theorem 5.8, the expected value of a sum is equal to the sum of the expected values; and from Theorem 5.7, the expected value of a constant times a function of random variables is the constant times the expected value. Thus,
\[
\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - \mu_1 E(Y_2) - \mu_2 E(Y_1) + \mu_1 \mu_2.
\]
Because E(Y1) = µ1 and E(Y2) = µ2, it follows that
\[
\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2) = E(Y_1 Y_2) - \mu_1 \mu_2.
\]

EXAMPLE 5.22

Refer to Example 5.4. Find the covariance between the amount in stock Y1 and amount of sales Y2.

Solution

Recall that Y1 and Y2 have joint density function given by
\[
f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Thus,
\[
\begin{aligned}
E(Y_1 Y_2) &= \int_0^1\int_0^{y_1} y_1 y_2 (3y_1)\, dy_2\, dy_1
= \int_0^1 3y_1^2 \left[\frac{y_2^2}{2}\right]_0^{y_1} dy_1 \\
&= \int_0^1 \frac{3}{2}\, y_1^4\, dy_1
= \frac{3}{2}\left[\frac{y_1^5}{5}\right]_0^1 = \frac{3}{10}.
\end{aligned}
\]
From Example 5.20, we know that E(Y1) = 3/4 and E(Y2) = 3/8. Thus, using Theorem 5.10, we obtain
\[
\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2) = (3/10) - (3/4)(3/8) \approx .30 - .28 = .02.
\]
In this example, large values of Y2 can occur only with large values of Y1 and the density, f(y1, y2), is larger for larger values of Y1 (see Figure 5.4). Thus, it is intuitive that the covariance between Y1 and Y2 should be positive.
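The rounded value .02 can be checked with exact arithmetic. The sketch below relies on a closed form that is derived here rather than stated in the text: integrating $y_1^a y_2^b (3y_1)$ over 0 ≤ y2 ≤ y1 ≤ 1 gives $E(Y_1^a Y_2^b) = 3/[(b+1)(a+b+3)]$. The helper name `moment` is an illustrative choice.

```python
from fractions import Fraction


def moment(a, b):
    """E(Y1**a * Y2**b) for f(y1, y2) = 3*y1 on 0 <= y2 <= y1 <= 1.

    Inner integral over y2 gives y1**(b+1) / (b+1); the outer integral
    over y1 then gives 3 / ((b+1) * (a+b+3)).
    """
    return Fraction(3, (b + 1) * (a + b + 3))


e1 = moment(1, 0)    # E(Y1)    = 3/4
e2 = moment(0, 1)    # E(Y2)    = 3/8
e12 = moment(1, 1)   # E(Y1 Y2) = 3/10

cov = e12 - e1 * e2  # Theorem 5.10
print(cov, float(cov))
```

The exact covariance is 3/160 = .01875, which rounds to the .02 quoted in the example.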

EXAMPLE 5.23

Let Y1 and Y2 have joint density given by
\[
f(y_1, y_2) = \begin{cases} 2y_1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]
Find the covariance of Y1 and Y2.

Solution

From Example 5.15, E(Y1 Y2 ) = 1/3. Also, from Examples 5.16 and 5.17, µ1 = E(Y1 ) = 2/3 and µ2 = E(Y2 ) = 1/2, so Cov(Y1 , Y2 ) = E(Y1 Y2 ) − µ1 µ2 = (1/3) − (2/3)(1/2) = 0.

Example 5.23 furnishes a specific example of the general result given in Theorem 5.11.

THEOREM 5.11

If Y1 and Y2 are independent random variables, then Cov(Y1 , Y2 ) = 0. Thus, independent random variables must be uncorrelated.

Proof

Theorem 5.10 establishes that Cov(Y1, Y2) = E(Y1 Y2) − µ1 µ2. Because Y1 and Y2 are independent, Theorem 5.9 implies that E(Y1 Y2) = E(Y1)E(Y2) = µ1 µ2, and the desired result follows immediately.

Notice that the random variables Y1 and Y2 of Example 5.23 are independent; hence, by Theorem 5.11, their covariance must be zero. The converse of Theorem 5.11 is not true, as will be illustrated in the following example.

EXAMPLE 5.24

Let Y1 and Y2 be discrete random variables with joint probability distribution as shown in Table 5.3. Show that Y1 and Y2 are dependent but have zero covariance.

Solution

Calculation of marginal probabilities yields p1(−1) = p1(1) = 5/16 = p2(−1) = p2(1), and p1(0) = 6/16 = p2(0). The value p(0, 0) = 0 in the center cell stands out. Obviously, p(0, 0) ≠ p1(0) p2(0), and this is sufficient to show that Y1 and Y2 are dependent.

Table 5.3 Joint probability distribution, Example 5.24

               y1
y2       −1      0      +1
−1      1/16    3/16    1/16
0       3/16    0       3/16
+1      1/16    3/16    1/16

Again looking at the marginal probabilities, we see that E(Y1) = E(Y2) = 0. Also,
\[
\begin{aligned}
E(Y_1 Y_2) &= \sum_{\text{all } y_1}\sum_{\text{all } y_2} y_1 y_2\, p(y_1, y_2) \\
&= (-1)(-1)(1/16) + (-1)(0)(3/16) + (-1)(1)(1/16) \\
&\quad + (0)(-1)(3/16) + (0)(0)(0) + (0)(1)(3/16) \\
&\quad + (1)(-1)(1/16) + (1)(0)(3/16) + (1)(1)(1/16) \\
&= (1/16) - (1/16) - (1/16) + (1/16) = 0.
\end{aligned}
\]
Thus,
\[
\operatorname{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2) = 0 - 0(0) = 0.
\]
This example shows that the converse of Theorem 5.11 is not true. If the covariance of two random variables is zero, the variables need not be independent.
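The two claims of Example 5.24 (zero covariance, yet dependence) can be confirmed mechanically from Table 5.3. The sketch below stores the table as a dictionary keyed by (y1, y2); the variable names are illustrative, and exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

s = Fraction(1, 16)
values = (-1, 0, 1)

# Table 5.3, keyed by (y1, y2).
joint = {
    (-1, -1): 1 * s, (0, -1): 3 * s, (1, -1): 1 * s,
    (-1, 0): 3 * s,  (0, 0): 0 * s,  (1, 0): 3 * s,
    (-1, 1): 1 * s,  (0, 1): 3 * s,  (1, 1): 1 * s,
}

# Marginal probability functions.
p1 = {a: sum(joint[(a, b)] for b in values) for a in values}
p2 = {b: sum(joint[(a, b)] for a in values) for b in values}

e1 = sum(a * p1[a] for a in values)
e2 = sum(b * p2[b] for b in values)
e12 = sum(a * b * p for (a, b), p in joint.items())
cov = e12 - e1 * e2  # Theorem 5.10

# Independence requires p(y1, y2) = p1(y1) * p2(y2) in every cell.
independent = all(joint[(a, b)] == p1[a] * p2[b]
                  for a in values for b in values)

print(cov, independent)
```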

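Exercises

5.89

The joint distribution of Y1 and Y2 (the same table as in Exercise 5.72) is given by the entries in the following table.

              y1
y2       0      1      2
0       1/9    2/9    1/9
1       2/9    2/9    0
2       1/9    0      0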

Find Cov(Y1 , Y2 ). Does it surprise you that Cov(Y1 , Y2 ) is negative? Why?

5.90

In Exercise 5.3, we determined that the joint probability distribution of Y1, the number of married executives, and Y2, the number of never-married executives, is given by
\[
p(y_1, y_2) = \frac{\dbinom{4}{y_1}\dbinom{3}{y_2}\dbinom{2}{3 - y_1 - y_2}}{\dbinom{9}{3}},
\]
where y1 and y2 are integers, 0 ≤ y1 ≤ 3, 0 ≤ y2 ≤ 3, and 1 ≤ y1 + y2 ≤ 3. Find Cov(Y1, Y2).

5.91

In Exercise 5.8, we derived the fact that
\[
f(y_1, y_2) = \begin{cases} 4y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}
\]

Show that Cov(Y1 , Y2 ) = 0. Does it surprise you that Cov(Y1 , Y2 ) is zero? Why?

5.92

In Exercise 5.9, we determined that
\[
f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases}
\]

is a valid joint probability density function. Find Cov(Y1 , Y2 ). Are Y1 and Y2 independent?

5.93

Let the discrete random variables Y1 and Y2 have the joint probability function p(y1 , y2 ) = 1/3,

for (y1 , y2 ) = (−1, 0), (0, 1), (1, 0).

Find Cov(Y1 , Y2 ). Notice that Y1 and Y2 are dependent. (Why?) This is another example of uncorrelated random variables that are not independent.

5.94

Let Y1 and Y2 be uncorrelated random variables and consider U1 = Y1 + Y2 and U2 = Y1 − Y2 . a Find the Cov(U1 , U2 ) in terms of the variances of Y1 and Y2 . b Find an expression for the coefﬁcient of correlation between U1 and U2 . c Is it possible that Cov(U1 , U2 ) = 0? When does this occur?

5.95

Suppose that, as in Exercises 5.11 and 5.79, Y1 and Y2 are uniformly distributed over the triangle shaded in the accompanying diagram.

[Figure: shaded triangle in the (y1, y2) plane with vertices (−1, 0), (1, 0), and (0, 1).]

a Find Cov(Y1, Y2).
b Are Y1 and Y2 independent? (See Exercise 5.55.)
c Find the coefficient of correlation for Y1 and Y2.
d Does your answer to part (b) lead you to doubt your answer to part (a)? Why or why not?

5.96

Suppose that the random variables Y1 and Y2 have means µ1 and µ2 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Use the basic definition of the covariance of two random variables to establish that
a Cov(Y1, Y2) = Cov(Y2, Y1).
b Cov(Y1, Y1) = V(Y1) = $\sigma_1^2$. That is, the covariance of a random variable and itself is just the variance of the random variable.

5.97

The random variables Y1 and Y2 are such that E(Y1) = 4, E(Y2) = −1, V(Y1) = 2 and V(Y2) = 8.
a What is Cov(Y1, Y1)?
b Assuming that the means and variances are correct, as given, is it possible that Cov(Y1, Y2) = 7? [Hint: If Cov(Y1, Y2) = 7, what is the value of ρ, the coefficient of correlation?]
c Assuming that the means and variances are correct, what is the largest possible value for Cov(Y1, Y2)? If Cov(Y1, Y2) achieves this largest value, what does that imply about the relationship between Y1 and Y2?
d Assuming that the means and variances are correct, what is the smallest possible value for Cov(Y1, Y2)? If Cov(Y1, Y2) achieves this smallest value, what does that imply about the relationship between Y1 and Y2?

5.98

How big or small can Cov(Y1, Y2) be? Use the fact that ρ² ≤ 1 to show that
\[
-\sqrt{V(Y_1) \times V(Y_2)} \le \operatorname{Cov}(Y_1, Y_2) \le \sqrt{V(Y_1) \times V(Y_2)}.
\]

5.99

If c is any constant and Y is a random variable such that E(Y) exists, show that Cov(c, Y) = 0.

5.100

Let Z be a standard normal random variable and let Y1 = Z and Y2 = Z².
a What are E(Y1) and E(Y2)?
b What is E(Y1 Y2)? [Hint: E(Y1 Y2) = E(Z³), recall Exercise 4.199.]
c What is Cov(Y1, Y2)?
d Notice that P(Y2 > 1|Y1 > 1) = 1. Are Y1 and Y2 independent?

5.101

In Exercise 5.65, we considered random variables Y1 and Y2 that, for −1 ≤ α ≤ 1, have joint density function given by
\[
f(y_1, y_2) = \begin{cases} [1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}]\, e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere.} \end{cases}
\]

We established that the marginal distributions of Y1 and Y2 are both exponential with mean 1 and showed that Y1 and Y2 are independent if and only if α = 0. In Exercise 5.85, we derived E(Y1 Y2 ). a Derive Cov(Y1 , Y2 ). b Show that Cov(Y1 , Y2 ) = 0 if and only if α = 0. c Argue that Y1 and Y2 are independent if and only if ρ = 0.

5.8 The Expected Value and Variance of Linear Functions of Random Variables

In later chapters in this text, especially Chapters 9 and 11, we will frequently encounter parameter estimators that are linear functions of the measurements in a sample, Y1, Y2, . . . , Yn. If a1, a2, . . . , an are constants, we will need to find the expected value and variance of a linear function of the random variables Y1, Y2, . . . , Yn,
\[
U_1 = a_1 Y_1 + a_2 Y_2 + a_3 Y_3 + \cdots + a_n Y_n = \sum_{i=1}^{n} a_i Y_i .
\]

We also may be interested in the covariance between two such linear combinations. Results that simplify the calculation of these quantities are summarized in the following theorem.

THEOREM 5.12

Let Y1, Y2, . . . , Yn and X1, X2, . . . , Xm be random variables with E(Yi) = µi and E(Xj) = ξj. Define
\[
U_1 = \sum_{i=1}^{n} a_i Y_i \qquad\text{and}\qquad U_2 = \sum_{j=1}^{m} b_j X_j
\]
for constants a1, a2, . . . , an and b1, b2, . . . , bm. Then the following hold:
a $E(U_1) = \sum_{i=1}^{n} a_i \mu_i$.
b $V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \mathop{\sum\sum}_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(Y_i, Y_j)$, where the double sum is over all pairs (i, j) with i < j.
c $\operatorname{Cov}(U_1, U_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \operatorname{Cov}(Y_i, X_j)$.

Before proceeding with the proof of Theorem 5.12, we illustrate the use of the theorem with an example.

EXAMPLE 5.25

Let Y1 , Y2 , and Y3 be random variables, where E(Y1 ) = 1, E(Y2 ) = 2, E(Y3 ) = −1, V (Y1 ) = 1, V (Y2 ) = 3, V (Y3 ) = 5, Cov(Y1 , Y2 ) = −0.4, Cov(Y1 , Y3 ) = 1/2, and Cov(Y2 , Y3 ) = 2. Find the expected value and variance of U = Y1 − 2Y2 + Y3 . If W = 3Y1 + Y2 , ﬁnd Cov(U, W ).

Solution

U = a1 Y1 + a2 Y2 + a3 Y3, where a1 = 1, a2 = −2, and a3 = 1. Then by Theorem 5.12,
\[
E(U) = a_1 E(Y_1) + a_2 E(Y_2) + a_3 E(Y_3) = (1)(1) + (-2)(2) + (1)(-1) = -4.
\]
Similarly,
\[
\begin{aligned}
V(U) &= a_1^2 V(Y_1) + a_2^2 V(Y_2) + a_3^2 V(Y_3) + 2a_1 a_2 \operatorname{Cov}(Y_1, Y_2) + 2a_1 a_3 \operatorname{Cov}(Y_1, Y_3) + 2a_2 a_3 \operatorname{Cov}(Y_2, Y_3) \\
&= (1)^2(1) + (-2)^2(3) + (1)^2(5) + (2)(1)(-2)(-0.4) + (2)(1)(1)(1/2) + (2)(-2)(1)(2) \\
&= 12.6.
\end{aligned}
\]
Notice that W = b1 Y1 + b2 Y2, where b1 = 3 and b2 = 1. Thus,
\[
\begin{aligned}
\operatorname{Cov}(U, W) &= a_1 b_1 \operatorname{Cov}(Y_1, Y_1) + a_1 b_2 \operatorname{Cov}(Y_1, Y_2) + a_2 b_1 \operatorname{Cov}(Y_2, Y_1) \\
&\quad + a_2 b_2 \operatorname{Cov}(Y_2, Y_2) + a_3 b_1 \operatorname{Cov}(Y_3, Y_1) + a_3 b_2 \operatorname{Cov}(Y_3, Y_2).
\end{aligned}
\]


Notice that, as established in Exercise 5.96, Cov(Yi, Yj) = Cov(Yj, Yi) and Cov(Yi, Yi) = V(Yi). Therefore,
\[
\operatorname{Cov}(U, W) = (1)(3)(1) + (1)(1)(-0.4) + (-2)(3)(-0.4) + (-2)(1)(3) + (1)(3)(1/2) + (1)(1)(2) = 2.5.
\]
Because Cov(U, W) ≠ 0, it follows that U and W are dependent.
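The arithmetic of Example 5.25 is easy to mechanize once the variances and covariances are arranged in a symmetric matrix. The sketch below is one possible arrangement, not a prescribed method: `cov_matrix[i][j]` holds Cov(Yi, Yj), with the variances on the diagonal, and the function names are illustrative. Setting b = a in part (c) of Theorem 5.12 reproduces part (b).

```python
from fractions import Fraction as F

means = [F(1), F(2), F(-1)]

# cov_matrix[i][j] = Cov(Y_{i+1}, Y_{j+1}); the diagonal holds the variances.
cov_matrix = [
    [F(1),     F(-2, 5), F(1, 2)],
    [F(-2, 5), F(3),     F(2)],
    [F(1, 2),  F(2),     F(5)],
]


def mean_linear(a):
    """Theorem 5.12(a): E(sum a_i Y_i) = sum a_i mu_i."""
    return sum(ai * mi for ai, mi in zip(a, means))


def cov_linear(a, b):
    """Theorem 5.12(c): Cov(sum a_i Y_i, sum b_j Y_j)."""
    n = len(means)
    return sum(a[i] * b[j] * cov_matrix[i][j]
               for i in range(n) for j in range(n))


a = [1, -2, 1]   # U = Y1 - 2*Y2 + Y3
b = [3, 1, 0]    # W = 3*Y1 + Y2

e_u = mean_linear(a)
v_u = cov_linear(a, a)    # V(U) = Cov(U, U)
c_uw = cov_linear(a, b)

print(e_u, float(v_u), float(c_uw))
```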

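We now proceed with the proof of Theorem 5.12.

Proof

The theorem consists of three parts, of which (a) follows directly from Theorems 5.7 and 5.8. To prove (b), we appeal to the definition of variance and write
\[
\begin{aligned}
V(U_1) &= E[U_1 - E(U_1)]^2 = E\left[\sum_{i=1}^{n} a_i Y_i - \sum_{i=1}^{n} a_i \mu_i\right]^2
= E\left[\sum_{i=1}^{n} a_i (Y_i - \mu_i)\right]^2 \\
&= E\left[\sum_{i=1}^{n} a_i^2 (Y_i - \mu_i)^2 + \mathop{\sum_{i=1}^{n}\sum_{j=1}^{n}}_{i \ne j} a_i a_j (Y_i - \mu_i)(Y_j - \mu_j)\right] \\
&= \sum_{i=1}^{n} a_i^2 E(Y_i - \mu_i)^2 + \mathop{\sum_{i=1}^{n}\sum_{j=1}^{n}}_{i \ne j} a_i a_j E[(Y_i - \mu_i)(Y_j - \mu_j)].
\end{aligned}
\]
By the definitions of variance and covariance, we have
\[
V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + \mathop{\sum_{i=1}^{n}\sum_{j=1}^{n}}_{i \ne j} a_i a_j \operatorname{Cov}(Y_i, Y_j).
\]
Because Cov(Yi, Yj) = Cov(Yj, Yi), we can write
\[
V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \mathop{\sum\sum}_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(Y_i, Y_j).
\]
Similar steps can be used to obtain (c). We have
\[
\begin{aligned}
\operatorname{Cov}(U_1, U_2) &= E\{[U_1 - E(U_1)][U_2 - E(U_2)]\} \\
&= E\left[\left(\sum_{i=1}^{n} a_i Y_i - \sum_{i=1}^{n} a_i \mu_i\right)\left(\sum_{j=1}^{m} b_j X_j - \sum_{j=1}^{m} b_j \xi_j\right)\right] \\
&= E\left[\left(\sum_{i=1}^{n} a_i (Y_i - \mu_i)\right)\left(\sum_{j=1}^{m} b_j (X_j - \xi_j)\right)\right] \\
&= E\left[\sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j (Y_i - \mu_i)(X_j - \xi_j)\right] \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j E[(Y_i - \mu_i)(X_j - \xi_j)] \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j \operatorname{Cov}(Y_i, X_j).
\end{aligned}
\]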

On observing that Cov(Yi , Yi ) = V (Yi ), we can see that (b) is a special case of (c).

EXAMPLE 5.26

Refer to Examples 5.4 and 5.20. In Example 5.20, we were interested in Y1 − Y2, the proportional amount of gasoline remaining at the end of a week. Find the variance of Y1 − Y2.

Solution

Using Theorem 5.12, we have
\[
V(Y_1 - Y_2) = V(Y_1) + V(Y_2) - 2\operatorname{Cov}(Y_1, Y_2).
\]
Because
\[
f_1(y_1) = \begin{cases} 3y_1^2, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\qquad\text{and}\qquad
f_2(y_2) = \begin{cases} (3/2)(1 - y_2^2), & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere,} \end{cases}
\]
it follows that
\[
E(Y_1^2) = \int_0^1 3y_1^4\, dy_1 = \frac{3}{5},
\qquad
E(Y_2^2) = \int_0^1 \frac{3}{2}\, y_2^2 (1 - y_2^2)\, dy_2 = \frac{3}{2}\left[\frac{1}{3} - \frac{1}{5}\right] = \frac{1}{5}.
\]
From Example 5.20, we have E(Y1) = 3/4 and E(Y2) = 3/8. Thus,
\[
V(Y_1) = (3/5) - (3/4)^2 \approx .04 \qquad\text{and}\qquad V(Y_2) = (1/5) - (3/8)^2 \approx .06.
\]
In Example 5.22, we determined that Cov(Y1, Y2) ≈ .02. Therefore,
\[
V(Y_1 - Y_2) = V(Y_1) + V(Y_2) - 2\operatorname{Cov}(Y_1, Y_2) \approx .04 + .06 - 2(.02) = .06.
\]
The standard deviation of Y1 − Y2 is then $\sqrt{.06} = .245$.
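Because the example works with values rounded to two decimals, it can be instructive to redo the calculation exactly and to compare the Theorem 5.12 route with the direct definition of variance. The sketch below assumes the moment formula $E(Y_1^a Y_2^b) = 3/[(b+1)(a+b+3)]$ for the density f(y1, y2) = 3y1 on 0 ≤ y2 ≤ y1 ≤ 1; this closed form is derived here by direct integration and is not stated in the text, and the name `moment` is illustrative.

```python
from fractions import Fraction
from math import sqrt


def moment(a, b):
    """E(Y1**a * Y2**b) for f(y1, y2) = 3*y1 on 0 <= y2 <= y1 <= 1."""
    return Fraction(3, (b + 1) * (a + b + 3))


# Route 1: Theorem 5.12, V(Y1 - Y2) = V(Y1) + V(Y2) - 2 Cov(Y1, Y2).
v1 = moment(2, 0) - moment(1, 0) ** 2
v2 = moment(0, 2) - moment(0, 1) ** 2
c12 = moment(1, 1) - moment(1, 0) * moment(0, 1)
v_theorem = v1 + v2 - 2 * c12

# Route 2: definition, V(D) = E(D**2) - [E(D)]**2 with D = Y1 - Y2.
e_d = moment(1, 0) - moment(0, 1)
e_d2 = moment(2, 0) - 2 * moment(1, 1) + moment(0, 2)
v_direct = e_d2 - e_d ** 2

sd = sqrt(v_theorem)
print(v1, v2, c12, v_theorem, sd)
```

Both routes give 19/320 ≈ .0594, so the exact standard deviation is about .244; the .245 in the example reflects the rounded intermediate values.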


EXAMPLE 5.27

Let Y1, Y2, . . . , Yn be independent random variables with E(Yi) = µ and V(Yi) = σ². (These variables may denote the outcomes of n independent trials of an experiment.) Define
\[
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i
\]
and show that $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$.

Solution

Notice that $\bar{Y}$ is a linear function of Y1, Y2, . . . , Yn with all constants ai equal to 1/n. That is,
\[
\bar{Y} = \frac{1}{n}\, Y_1 + \cdots + \frac{1}{n}\, Y_n .
\]
By Theorem 5.12(a),
\[
E(\bar{Y}) = \sum_{i=1}^{n} a_i \mu_i = \sum_{i=1}^{n} a_i \mu = \mu \sum_{i=1}^{n} a_i = \mu \sum_{i=1}^{n} \frac{1}{n} = \frac{n\mu}{n} = \mu.
\]
By Theorem 5.12(b),
\[
V(\bar{Y}) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \mathop{\sum\sum}_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(Y_i, Y_j).
\]
The covariance terms all are zero because the random variables are independent. Thus,
\[
V(\bar{Y}) = \sum_{i=1}^{n} \left(\frac{1}{n}\right)^2 \sigma_i^2 = \sum_{i=1}^{n} \left(\frac{1}{n}\right)^2 \sigma^2 = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\]
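The result of Example 5.27 can be confirmed exactly on a small case. The sketch below takes, purely as an illustration, n = 3 independent rolls of a fair six-sided die, enumerates all 216 equally likely outcomes, and compares the mean and variance of the sample mean with µ and σ²/n. The die and the choice n = 3 are assumptions made for the example, not part of the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 3

# Mean and variance of a single fair die roll.
mu = Fraction(sum(faces), 6)
sigma2 = sum((Fraction(f) - mu) ** 2 for f in faces) / 6

# Every outcome of n independent rolls is equally likely.
outcomes = list(product(faces, repeat=n))
sample_means = [Fraction(sum(o), n) for o in outcomes]

e_mean = sum(sample_means) / len(sample_means)
v_mean = sum((m - e_mean) ** 2 for m in sample_means) / len(sample_means)

print(e_mean, v_mean, sigma2 / n)
```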

EXAMPLE 5.28

The number of defectives Y in a sample of n = 10 items selected from a manufacturing process follows a binomial probability distribution. An estimator of the fraction defective in the lot is the random variable $\hat{p} = Y/n$. Find the expected value and variance of $\hat{p}$.

Solution

The term $\hat{p}$ is a linear function of a single random variable Y, where $\hat{p} = a_1 Y$ and $a_1 = 1/n$. Then by Theorem 5.12,
\[
E(\hat{p}) = a_1 E(Y) = \frac{1}{n}\, E(Y).
\]
The expected value and variance of a binomial random variable are np and npq, respectively. Substituting for E(Y), we obtain
\[
E(\hat{p}) = \frac{1}{n}(np) = p.
\]
Thus, the expected value of the number of defectives Y, divided by the sample size, is p. Similarly,
\[
V(\hat{p}) = a_1^2 V(Y) = \left(\frac{1}{n}\right)^2 npq = \frac{pq}{n}.
\]

EXAMPLE 5.29

Suppose that an urn contains r red balls and (N − r ) black balls. A random sample of n balls is drawn without replacement and Y , the number of red balls in the sample, is observed. From Chapter 3 we know that Y has a hypergeometric probability distribution. Find the mean and variance of Y .

Solution

We will first observe some characteristics of sampling without replacement. Suppose that the sampling is done sequentially and we observe outcomes for X1, X2, . . . , Xn, where
\[
X_i = \begin{cases} 1, & \text{if the } i\text{th draw results in a red ball,} \\ 0, & \text{otherwise.} \end{cases}
\]
Unquestionably, P(X1 = 1) = r/N. But it is also true that P(X2 = 1) = r/N because
\[
\begin{aligned}
P(X_2 = 1) &= P(X_1 = 1, X_2 = 1) + P(X_1 = 0, X_2 = 1) \\
&= P(X_1 = 1)P(X_2 = 1 \mid X_1 = 1) + P(X_1 = 0)P(X_2 = 1 \mid X_1 = 0) \\
&= \left(\frac{r}{N}\right)\left(\frac{r - 1}{N - 1}\right) + \left(\frac{N - r}{N}\right)\left(\frac{r}{N - 1}\right) = \frac{r(N - 1)}{N(N - 1)} = \frac{r}{N}.
\end{aligned}
\]
The same is true for Xk; that is,
\[
P(X_k = 1) = \frac{r}{N}, \qquad k = 1, 2, \ldots, n.
\]
Thus, the (unconditional) probability of drawing a red ball on any draw is r/N. In a similar way it can be shown that
\[
P(X_j = 1, X_k = 1) = \frac{r(r - 1)}{N(N - 1)}, \qquad j \ne k.
\]
Now, observe that $Y = \sum_{i=1}^{n} X_i$, and, hence,
\[
E(Y) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \frac{r}{N} = n\left(\frac{r}{N}\right).
\]
To find V(Y) we need V(Xi) and Cov(Xi, Xj). Because Xi is 1 with probability r/N and 0 with probability 1 − (r/N), it follows that
\[
V(X_i) = \frac{r}{N}\left(1 - \frac{r}{N}\right).
\]
Also,
\[
\operatorname{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \frac{r(r - 1)}{N(N - 1)} - \left(\frac{r}{N}\right)^2 = -\frac{r}{N}\left(1 - \frac{r}{N}\right)\left(\frac{1}{N - 1}\right)
\]
because Xi Xj = 1 if and only if Xi = 1 and Xj = 1 and Xi Xj = 0 otherwise. From Theorem 5.12, we know that
\[
\begin{aligned}
V(Y) &= \sum_{i=1}^{n} V(X_i) + 2 \mathop{\sum\sum}_{i < j} \operatorname{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \frac{r}{N}\left(1 - \frac{r}{N}\right) + 2 \mathop{\sum\sum}_{i < j} \left[-\frac{r}{N}\left(1 - \frac{r}{N}\right)\left(\frac{1}{N - 1}\right)\right] \\
&= n\left(\frac{r}{N}\right)\left(1 - \frac{r}{N}\right) - n(n - 1)\left(\frac{r}{N}\right)\left(1 - \frac{r}{N}\right)\left(\frac{1}{N - 1}\right)
\end{aligned}
\]
because the double summation contains n(n − 1)/2 equal terms. A little algebra yields
\[
V(Y) = n\left(\frac{r}{N}\right)\left(1 - \frac{r}{N}\right)\left(\frac{N - n}{N - 1}\right).
\]

To appreciate the usefulness of Theorem 5.12, notice that the derivations contained in Example 5.29 are much simpler than those outlined in Exercise 3.216, where the mean and variance were derived by using the probabilities associated with the hypergeometric distribution.
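As a numerical cross-check on Example 5.29, the following minimal sketch compares the indicator-variable formulas with moments computed directly from the hypergeometric probabilities, using exact rational arithmetic. The function names and the values N = 20, r = 7, n = 5 are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(y, N, r, n):
    """P(Y = y): y red balls in n draws without replacement (r red, N - r black)."""
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))


def moments_from_pmf(N, r, n):
    """Mean and variance obtained by summing over the probability function."""
    ys = range(max(0, n - (N - r)), min(n, r) + 1)
    mean = sum(y * hypergeometric_pmf(y, N, r, n) for y in ys)
    second = sum(y * y * hypergeometric_pmf(y, N, r, n) for y in ys)
    return mean, second - mean ** 2


def moments_from_indicators(N, r, n):
    """Mean and variance from the closed forms derived in Example 5.29."""
    p = Fraction(r, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)


# Hypothetical urn: 20 balls, 7 red, sample of 5.
print(moments_from_pmf(20, 7, 5))
print(moments_from_indicators(20, 7, 5))
```

Both calls should print the same pair of fractions; exact arithmetic is used so that agreement is not blurred by rounding.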

Exercises

5.102

A ﬁrm purchases two types of industrial chemicals. Type I chemical costs $3 per gallon, whereas type II costs $5 per gallon. The mean and variance for the number of gallons of type I chemical purchased, Y1 , are 40 and 4, respectively. The amount of type II chemical purchased, Y2 , has E(Y2 ) = 65 gallons and V (Y2 ) = 8. Assume that Y1 and Y2 are independent and ﬁnd the mean and variance of the total amount of money spent per week on the two chemicals.

5.103

Assume that $Y_1$, $Y_2$, and $Y_3$ are random variables, with

E(Y1) = 2,   E(Y2) = −1,   E(Y3) = 4,
V(Y1) = 4,   V(Y2) = 6,   V(Y3) = 8,
Cov(Y1, Y2) = 1,   Cov(Y1, Y3) = −1,   Cov(Y2, Y3) = 0.

Find E(3Y1 + 4Y2 − 6Y3 ) and V (3Y1 + 4Y2 − 6Y3 ).

5.104

In Exercise 5.3, we determined that the joint probability distribution of $Y_1$, the number of married executives, and $Y_2$, the number of never-married executives, is given by

\[ p(y_1, y_2) = \frac{\binom{4}{y_1}\binom{3}{y_2}\binom{2}{3 - y_1 - y_2}}{\binom{9}{3}}, \]

where $y_1$ and $y_2$ are integers, $0 \le y_1 \le 3$, $0 \le y_2 \le 3$, and $1 \le y_1 + y_2 \le 3$.

a Find $E(Y_1 + Y_2)$ and $V(Y_1 + Y_2)$ by first finding the probability distribution of $Y_1 + Y_2$.
b In Exercise 5.90, we determined that $\mathrm{Cov}(Y_1, Y_2) = -1/3$. Find $E(Y_1 + Y_2)$ and $V(Y_1 + Y_2)$ by using Theorem 5.12.

5.105

In Exercise 5.8, we established that

\[ f(y_1, y_2) = \begin{cases} 4y_1 y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases} \]

is a valid joint probability density function. In Exercise 5.52, we established that Y1 and Y2 are independent; in Exercise 5.76, we determined that E(Y1 − Y2 ) = 0 and found the value for V (Y1 ). Find V (Y1 − Y2 ).

5.106

In Exercise 5.9, we determined that

\[ f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases} \]

is a valid joint probability density function. In Exercise 5.76, we derived the fact that E(Y1 −3Y2 ) = −5/4; in Exercise 5.92, we proved that Cov(Y1 , Y2 ) = 1/40. Find V (Y1 −3Y2 ).

5.107

In Exercise 5.12, we were given the following joint probability density function for the random variables $Y_1$ and $Y_2$, which were the proportions of two components in a sample from a mixture of insecticide:

\[ f(y_1, y_2) = \begin{cases} 2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1,\ 0 \le y_1 + y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

For the two chemicals under consideration, an important quantity is the total proportion $Y_1 + Y_2$ found in any sample. Find $E(Y_1 + Y_2)$ and $V(Y_1 + Y_2)$.

5.108

If $Y_1$ is the total time between a customer's arrival in the store and departure from the service window and if $Y_2$ is the time spent in line before reaching the window, the joint density of these variables was given in Exercise 5.15 to be

\[ f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases} \]

The random variable $Y_1 - Y_2$ represents the time spent at the service window. Find $E(Y_1 - Y_2)$ and $V(Y_1 - Y_2)$. Is it highly likely that a randomly selected customer would spend more than 4 minutes at the service window?

5.109

In Exercise 5.16, $Y_1$ and $Y_2$ denoted the proportions of time that employees I and II actually spent working on their assigned tasks during a workday. The joint density of $Y_1$ and $Y_2$ is given by

\[ f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

In Exercise 5.80, we derived the mean of the productivity measure $30Y_1 + 25Y_2$. Find the variance of this measure of productivity. Give an interval in which you think the total productivity measures of the two employees should lie for at least 75% of the days in question.

5.110

Suppose that $Y_1$ and $Y_2$ have correlation coefficient $\rho = .2$. What is the value of the correlation coefficient between

a $1 + 2Y_1$ and $3 + 4Y_2$?
b $1 + 2Y_1$ and $3 - 4Y_2$?
c $1 - 2Y_1$ and $3 - 4Y_2$?

5.111

A retail grocery merchant ﬁgures that her daily gain X from sales is a normally distributed random variable with µ = 50 and σ = 3 (measurements in dollars). X can be negative if she is forced to dispose of enough perishable goods. Also, she ﬁgures daily overhead costs Y to have a gamma distribution with α = 4 and β = 2. If X and Y are independent, ﬁnd the expected value and variance of her net daily gain. Would you expect her net gain for tomorrow to rise above $70?

5.112

In Exercise 5.18, $Y_1$ and $Y_2$ denoted the lengths of life, in hundreds of hours, for components of types I and II, respectively, in an electronic system. The joint density of $Y_1$ and $Y_2$ is

\[ f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

The cost $C$ of replacing the two components depends upon their length of life at failure and is given by $C = 50 + 2Y_1 + 4Y_2$. Find $E(C)$ and $V(C)$.

5.113

Suppose that $Y_1$ and $Y_2$ have correlation coefficient $\rho_{Y_1,Y_2}$ and for constants $a$, $b$, $c$, and $d$ let $W_1 = a + bY_1$ and $W_2 = c + dY_2$.

a Show that the correlation coefficient between $W_1$ and $W_2$, $\rho_{W_1,W_2}$, is such that $|\rho_{Y_1,Y_2}| = |\rho_{W_1,W_2}|$.
b Does this result explain the results that you obtained in Exercise 5.110?

5.114

For the daily output of an industrial operation, let $Y_1$ denote the amount of sales and $Y_2$, the costs, in thousands of dollars. Assume that the density functions for $Y_1$ and $Y_2$ are given by

\[ f_1(y_1) = \begin{cases} (1/6)\, y_1^3 e^{-y_1}, & y_1 > 0, \\ 0, & y_1 \le 0, \end{cases} \qquad\text{and}\qquad f_2(y_2) = \begin{cases} (1/2)\, e^{-y_2/2}, & y_2 > 0, \\ 0, & y_2 \le 0. \end{cases} \]

The daily profit is given by $U = Y_1 - Y_2$.

a Find $E(U)$.
b Assuming that $Y_1$ and $Y_2$ are independent, find $V(U)$.
c Would you expect the daily profit to drop below zero very often? Why?

5.115

Refer to Exercise 5.88. If $Y$ denotes the number of tosses of the die until you observe each of the six faces, $Y = Y_1 + Y_2 + Y_3 + Y_4 + Y_5 + Y_6$, where $Y_1$ is the trial on which the first face is tossed, $Y_1 = 1$, $Y_2$ is the number of additional tosses required to get a face different than the first, $Y_3$ is the number of additional tosses required to get a face different than the first two distinct faces, . . . , $Y_6$ is the number of additional tosses to get the last remaining face after all other faces have been observed.

a Show that $\mathrm{Cov}(Y_i, Y_j) = 0$, $i, j = 1, 2, \ldots, 6$, $i \ne j$.
b Use Theorem 5.12 to find $V(Y)$.
c Give an interval that will contain $Y$ with probability at least 3/4.

5.116

Refer to Exercise 5.75. Use Theorem 5.12 to explain why $V(Y_1 + Y_2) = V(Y_1 - Y_2)$.

*5.117

A population of $N$ alligators is to be sampled in order to obtain an approximate measure of the difference between the proportions of sexually mature males and sexually mature females. Obviously, this parameter has important implications for the future of the population. Assume that $n$ animals are to be sampled without replacement. Let $Y_1$ denote the number of mature females and $Y_2$ the number of mature males in the sample. If the population contains proportions $p_1$ and $p_2$ of mature females and males, respectively (with $p_1 + p_2 < 1$), find expressions for

\[ E\left(\frac{Y_1}{n} - \frac{Y_2}{n}\right) \quad\text{and}\quad V\left(\frac{Y_1}{n} - \frac{Y_2}{n}\right). \]

5.118

The total sustained load on the concrete footing of a planned building is the sum of the dead load plus the occupancy load. Suppose that the dead load $X_1$ has a gamma distribution with $\alpha_1 = 50$ and $\beta_1 = 2$, whereas the occupancy load $X_2$ has a gamma distribution with $\alpha_2 = 20$ and $\beta_2 = 2$. (Units are in kips.) Assume that $X_1$ and $X_2$ are independent.

a Find the mean and variance of the total sustained load on the footing.
b Find a value for the sustained load that will be exceeded with probability less than 1/16.

5.9 The Multinomial Probability Distribution

Recall from Chapter 3 that a binomial random variable results from an experiment consisting of $n$ trials with two possible outcomes per trial. Frequently we encounter similar situations in which the number of possible outcomes per trial is more than two. For example, experiments that involve blood typing typically have at least four possible outcomes per trial. Experiments that involve sampling for defectives may categorize the type of defects observed into more than two classes. A multinomial experiment is a generalization of the binomial experiment.

DEFINITION 5.11

A multinomial experiment possesses the following properties:

1. The experiment consists of $n$ identical trials.
2. The outcome of each trial falls into one of $k$ classes or cells.
3. The probability that the outcome of a single trial falls into cell $i$ is $p_i$, $i = 1, 2, \ldots, k$, and remains the same from trial to trial. Notice that $p_1 + p_2 + p_3 + \cdots + p_k = 1$.
4. The trials are independent.
5. The random variables of interest are $Y_1, Y_2, \ldots, Y_k$, where $Y_i$ equals the number of trials for which the outcome falls into cell $i$. Notice that $Y_1 + Y_2 + Y_3 + \cdots + Y_k = n$.

The joint probability function for $Y_1, Y_2, \ldots, Y_k$ is given by

\[ p(y_1, y_2, \ldots, y_k) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}, \]

where

\[ \sum_{i=1}^{k} p_i = 1 \quad\text{and}\quad \sum_{i=1}^{k} y_i = n. \]

Finding the probability that the n trials in a multinomial experiment result in (Y1 = y1 , Y2 = y2 , . . . , Yk = yk ) is an excellent application of the probabilistic methods of Chapter 2. We leave this problem as an exercise.


DEFINITION 5.12

Assume that $p_1, p_2, \ldots, p_k$ are such that $\sum_{i=1}^{k} p_i = 1$, and $p_i > 0$ for $i = 1, 2, \ldots, k$. The random variables $Y_1, Y_2, \ldots, Y_k$ are said to have a multinomial distribution with parameters $n$ and $p_1, p_2, \ldots, p_k$ if the joint probability function of $Y_1, Y_2, \ldots, Y_k$ is given by

\[ p(y_1, y_2, \ldots, y_k) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}, \]

where, for each $i$, $y_i = 0, 1, 2, \ldots, n$ and $\sum_{i=1}^{k} y_i = n$.

Many experiments involving classification are multinomial experiments. For example, classifying people into five income brackets results in an enumeration or count corresponding to each of five income classes. Or we might be interested in studying the reaction of mice to a particular stimulus in a psychological experiment. If the mice can react in one of three ways when the stimulus is applied, the experiment yields the number of mice falling into each reaction class. Similarly, a traffic study might require a count and classification of the types of motor vehicles using a section of highway. An industrial process might manufacture items that fall into one of three quality classes: acceptable, seconds, and rejects. A student of the arts might classify paintings into one of $k$ categories according to style and period, or we might wish to classify philosophical ideas of authors in a study of literature. The result of an advertising campaign might yield count data indicating a classification of consumer reactions. Many observations in the physical sciences are not amenable to measurement on a continuous scale and hence result in enumerative data that correspond to the numbers of observations falling into various classes.

Notice that the binomial experiment is a special case of the multinomial experiment (when there are $k = 2$ classes).

EXAMPLE 5.30

According to recent census figures, the proportions of adults (persons over 18 years of age) in the United States associated with five age categories are as given in the following table.

Age      Proportion
18–24    .18
25–34    .23
35–44    .16
45–64    .27
65↑      .16

If these figures are accurate and five adults are randomly sampled, find the probability that the sample contains one person between the ages of 18 and 24, two between the ages of 25 and 34, and two between the ages of 45 and 64.

Solution

We will number the five age classes 1, 2, 3, 4, and 5 from top to bottom and will assume that the proportions given are the probabilities associated with each of the classes. Then we wish to find

\[ p(y_1, y_2, y_3, y_4, y_5) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!\, y_5!}\, p_1^{y_1} p_2^{y_2} p_3^{y_3} p_4^{y_4} p_5^{y_5}, \]

for $n = 5$ and $y_1 = 1$, $y_2 = 2$, $y_3 = 0$, $y_4 = 2$, and $y_5 = 0$. Substituting these values into the formula for the joint probability function, we obtain

\[ \begin{aligned}
p(1, 2, 0, 2, 0) &= \frac{5!}{1!\, 2!\, 0!\, 2!\, 0!}\, (.18)^1 (.23)^2 (.16)^0 (.27)^2 (.16)^0 \\
&= 30(.18)(.23)^2(.27)^2 = .0208.
\end{aligned} \]
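The arithmetic in Example 5.30 can be reproduced with a few lines of code. This is a minimal sketch; `multinomial_pmf` is a hypothetical helper name, not a library function.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Joint probability p(y1, ..., yk) of a multinomial with n = sum(counts)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    # Multinomial coefficient n! / (y1! y2! ... yk!), kept as an exact integer.
    coefficient = factorial(sum(counts))
    for y in counts:
        coefficient //= factorial(y)
    return coefficient * prod(p ** y for p, y in zip(probs, counts))


# Age classes 1..5 with the proportions from the table in Example 5.30.
prob = multinomial_pmf([1, 2, 0, 2, 0], [0.18, 0.23, 0.16, 0.27, 0.16])
print(round(prob, 4))  # 0.0208
```

The coefficient is built with integer division so that only the final product involves floating-point numbers.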

THEOREM 5.13

If $Y_1, Y_2, \ldots, Y_k$ have a multinomial distribution with parameters $n$ and $p_1, p_2, \ldots, p_k$, then

1. $E(Y_i) = np_i$, $V(Y_i) = np_i q_i$.
2. $\mathrm{Cov}(Y_s, Y_t) = -np_s p_t$, if $s \ne t$.

Proof

The marginal distribution of $Y_i$ can be used to derive the mean and variance. Recall that $Y_i$ may be interpreted as the number of trials falling into cell $i$. Imagine all of the cells, excluding cell $i$, combined into a single large cell. Then every trial will result in cell $i$ or in a cell other than cell $i$, with probabilities $p_i$ and $1 - p_i$, respectively. Thus, $Y_i$ possesses a binomial marginal probability distribution. Consequently,

\[ E(Y_i) = np_i \quad\text{and}\quad V(Y_i) = np_i q_i, \quad\text{where } q_i = 1 - p_i. \]

The same results can be obtained by setting up the expectations and evaluating. For example,

\[ E(Y_1) = \sum_{y_1}\sum_{y_2}\cdots\sum_{y_k} y_1\, \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}. \]

Because we have already derived the expected value and variance of $Y_i$, we leave the summation of this expectation to the interested reader.

The proof of part 2 uses Theorem 5.12. Think of the multinomial experiment as a sequence of $n$ independent trials and define, for $s \ne t$,

\[ U_i = \begin{cases} 1, & \text{if trial $i$ results in class $s$,} \\ 0, & \text{otherwise,} \end{cases} \qquad\text{and}\qquad W_i = \begin{cases} 1, & \text{if trial $i$ results in class $t$,} \\ 0, & \text{otherwise.} \end{cases} \]

Then

\[ Y_s = \sum_{i=1}^{n} U_i \quad\text{and}\quad Y_t = \sum_{j=1}^{n} W_j. \]


(Because $U_i = 1$ or 0 depending upon whether the $i$th trial resulted in class $s$, $Y_s$ is simply the sum of a series of 0s and 1s. A 1 occurs in the sum every time we observe an item from class $s$, and a 0 occurs every time we observe any other class. Thus, $Y_s$ is simply the number of times class $s$ is observed. A similar interpretation applies to $Y_t$.)

Notice that $U_i$ and $W_i$ cannot both equal 1 (the $i$th item cannot simultaneously be in classes $s$ and $t$). Thus, the product $U_i W_i$ always equals zero, and $E(U_i W_i) = 0$. The following results allow us to evaluate $\mathrm{Cov}(Y_s, Y_t)$:

\[ \begin{aligned}
E(U_i) &= p_s, \\
E(W_j) &= p_t, \\
\mathrm{Cov}(U_i, W_j) &= 0, \quad\text{if } i \ne j \text{, because the trials are independent,} \\
\mathrm{Cov}(U_i, W_i) &= E(U_i W_i) - E(U_i)E(W_i) = 0 - p_s p_t.
\end{aligned} \]

From Theorem 5.12, we then have

\[ \begin{aligned}
\mathrm{Cov}(Y_s, Y_t) &= \sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(U_i, W_j) \\
&= \sum_{i=1}^{n} \mathrm{Cov}(U_i, W_i) + \sum\sum_{i \ne j} \mathrm{Cov}(U_i, W_j) \\
&= \sum_{i=1}^{n} (-p_s p_t) + \sum\sum_{i \ne j} 0 = -np_s p_t.
\end{aligned} \]

The covariance here is negative, which is to be expected because a large number of outcomes in cell s would force the number in cell t to be small. Inferential problems associated with the multinomial experiment will be discussed later.
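Theorem 5.13 can be checked by brute force for a small case: enumerate every outcome $(y_1, \ldots, y_k)$ with total $n$, weight it by the multinomial probability, and compute the moments exactly. The sketch below does this with rational arithmetic; the function names and the choice $n = 4$, $p = (1/2, 1/3, 1/6)$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial probability for one vector of cell counts."""
    coefficient = factorial(sum(counts))
    for y in counts:
        coefficient //= factorial(y)
    value = Fraction(coefficient)
    for p, y in zip(probs, counts):
        value *= p ** y
    return value


def exact_covariance(n, probs, s, t):
    """Cov(Y_s, Y_t) by summing over every outcome with y1 + ... + yk = n.

    With s == t this returns V(Y_s).
    """
    e_s = e_t = e_st = Fraction(0)
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) != n:
            continue
        weight = multinomial_pmf(counts, probs)
        e_s += counts[s] * weight
        e_t += counts[t] * weight
        e_st += counts[s] * counts[t] * weight
    return e_st - e_s * e_t


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4
print(exact_covariance(n, probs, 0, 1))  # direct enumeration
print(-n * probs[0] * probs[1])          # Theorem 5.13, part 2
```

Enumeration grows quickly with $n$ and $k$, so this is only practical as a sanity check on small cases.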

Exercises

5.119

A learning experiment requires a rat to run a maze (a network of pathways) until it locates one of three possible exits. Exit 1 presents a reward of food, but exits 2 and 3 do not. (If the rat eventually selects exit 1 almost every time, learning may have taken place.) Let Yi denote the number of times exit i is chosen in successive runnings. For the following, assume that the rat chooses an exit at random on each run. a Find the probability that n = 6 runs result in Y1 = 3, Y2 = 1, and Y3 = 2. b For general n, ﬁnd E(Y1 ) and V (Y1 ). c Find Cov(Y2 , Y3 ) for general n. d To check for the rat’s preference between exits 2 and 3, we may look at Y2 − Y3 . Find E(Y2 − Y3 ) and V (Y2 − Y3 ) for general n.

5.120

A sample of size $n$ is selected from a large lot of items in which a proportion $p_1$ contains exactly one defect and a proportion $p_2$ contains more than one defect (with $p_1 + p_2 < 1$). The cost of repairing the defective items in the sample is $C = Y_1 + 3Y_2$, where $Y_1$ denotes the number of items with one defect and $Y_2$ denotes the number with two or more defects. Find the expected value and variance of $C$.

5.121

Refer to Exercise 5.117. Suppose that the number $N$ of alligators in the population is very large, with $p_1 = .3$ and $p_2 = .1$.

a Find the probability that, in a sample of five alligators, $Y_1 = 2$ and $Y_2 = 1$.
b If $n = 5$, find $E\left(\dfrac{Y_1}{n} - \dfrac{Y_2}{n}\right)$ and $V\left(\dfrac{Y_1}{n} - \dfrac{Y_2}{n}\right)$.

5.122

The weights of a population of mice fed on a certain diet since birth are assumed to be normally distributed with µ = 100 and σ = 20 (measurement in grams). Suppose that a random sample of n = 4 mice is taken from this population. Find the probability that a exactly two weigh between 80 and 100 grams and exactly one weighs more than 100 grams. b all four mice weigh more than 100 grams.

5.123

The National Fire Incident Reporting Service stated that, among residential ﬁres, 73% are in family homes, 20% are in apartments, and 7% are in other types of dwellings. If four residential ﬁres are independently reported on a single day, what is the probability that two are in family homes, one is in an apartment, and one is in another type of dwelling?

5.124

The typical cost of damages caused by a ﬁre in a family home is $20,000. Comparable costs for an apartment ﬁre and for ﬁre in other dwelling types are $10,000 and $2000, respectively. If four ﬁres are independently reported, use the information in Exercise 5.123 to ﬁnd the a expected total damage cost. b variance of the total damage cost.

5.125

When commercial aircraft are inspected, wing cracks are reported as nonexistent, detectable, or critical. The history of a particular ﬂeet indicates that 70% of the planes inspected have no wing cracks, 25% have detectable wing cracks, and 5% have critical wing cracks. Five planes are randomly selected. Find the probability that a one has a critical crack, two have detectable cracks, and two have no cracks. b at least one plane has critical cracks.

5.126

A large lot of manufactured items contains 10% with exactly one defect, 5% with more than one defect, and the remainder with no defects. Ten items are randomly selected from this lot for sale. If Y1 denotes the number of items with one defect and Y2 , the number with more than one defect, the repair costs are Y1 + 3Y2 . Find the mean and variance of the repair costs.

5.127

Refer to Exercise 5.126. Let Y denote the number of items among the ten that contain at least one defect. Find the probability that Y a equals 2. b is at least 1.

5.10 The Bivariate Normal Distribution (Optional)

No discussion of multivariate probability distributions would be complete without reference to the multivariate normal distribution, which is a keystone of much modern statistical theory. In general, the multivariate normal density function is defined for $k$ continuous random variables, $Y_1, Y_2, \ldots, Y_k$. Because of its complexity, we will present only the bivariate density function ($k = 2$):

\[ f(y_1, y_2) = \frac{e^{-Q/2}}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}, \qquad -\infty < y_1 < \infty,\ -\infty < y_2 < \infty, \]

where

\[ Q = \frac{1}{1 - \rho^2}\left[\frac{(y_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(y_1 - \mu_1)(y_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(y_2 - \mu_2)^2}{\sigma_2^2}\right]. \]

The bivariate normal density function is a function of five parameters: $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. The choice of notation employed for these parameters is not coincidental. In Exercise 5.128, you will show that the marginal distributions of $Y_1$ and $Y_2$ are normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. With a bit of somewhat tedious integration, we can show that $\mathrm{Cov}(Y_1, Y_2) = \rho\sigma_1\sigma_2$.

If $\mathrm{Cov}(Y_1, Y_2) = 0$ (or, equivalently, if $\rho = 0$), then

\[ f(y_1, y_2) = g(y_1)h(y_2), \]

where $g(y_1)$ is a nonnegative function of $y_1$ alone and $h(y_2)$ is a nonnegative function of $y_2$ alone. Therefore, if $\rho = 0$, Theorem 5.5 implies that $Y_1$ and $Y_2$ are independent. Recall that zero covariance for two random variables does not generally imply independence. However, if $Y_1$ and $Y_2$ have a bivariate normal distribution, they are independent if and only if their covariance is zero.

The joint density function for $k > 2$ is most easily expressed by using matrix algebra. A discussion of the general case can be found in the references at the end of this chapter.
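Two claims from this section lend themselves to a quick numerical check: the density factors into a product of univariate normal densities when $\rho = 0$, and $\mathrm{Cov}(Y_1, Y_2) = \rho\sigma_1\sigma_2$. The following is a minimal sketch under stated assumptions: the function names, the parameter values, and the use of a plain midpoint rule on a truncated grid are all illustrative choices, not a method from the text.

```python
from math import exp, pi, sqrt


def normal_pdf(y, mu, sigma):
    """Univariate normal density."""
    z = (y - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def bivariate_normal_pdf(y1, y2, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate normal density written in terms of the quadratic form Q."""
    z1 = (y1 - mu1) / sigma1
    z2 = (y2 - mu2) / sigma2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return exp(-q / 2.0) / (2.0 * pi * sigma1 * sigma2 * sqrt(1.0 - rho * rho))


def numerical_covariance(mu1, mu2, sigma1, sigma2, rho, steps=200, width=8.0):
    """Midpoint-rule estimate of E[(Y1 - mu1)(Y2 - mu2)].

    The grid covers mu +/- width standard deviations in each coordinate;
    the known means are used rather than estimated.
    """
    h1 = 2.0 * width * sigma1 / steps
    h2 = 2.0 * width * sigma2 / steps
    total = 0.0
    for i in range(steps):
        y1 = mu1 - width * sigma1 + (i + 0.5) * h1
        for j in range(steps):
            y2 = mu2 - width * sigma2 + (j + 0.5) * h2
            density = bivariate_normal_pdf(y1, y2, mu1, mu2, sigma1, sigma2, rho)
            total += (y1 - mu1) * (y2 - mu2) * density
    return total * h1 * h2


# Hypothetical parameters: rho * sigma1 * sigma2 = 0.6 * 2.0 * 0.5 = 0.6.
print(numerical_covariance(1.0, -1.0, 2.0, 0.5, 0.6))

# With rho = 0 the joint density equals the product of the two marginals.
joint = bivariate_normal_pdf(0.3, -0.7, 1.0, -1.0, 2.0, 0.5, 0.0)
print(joint, normal_pdf(0.3, 1.0, 2.0) * normal_pdf(-0.7, -1.0, 0.5))
```

The grid estimate is only an approximation, so it should agree with $\rho\sigma_1\sigma_2$ to within numerical error rather than exactly.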

Exercises

*5.128

Let Y1 and Y2 have a bivariate normal distribution. a Show that the marginal distribution of Y1 is normal with mean µ1 and variance σ12 . b What is the marginal distribution of Y2 ?

*5.129

Let $Y_1$ and $Y_2$ have a bivariate normal distribution. Show that the conditional distribution of $Y_1$ given that $Y_2 = y_2$ is a normal distribution with mean $\mu_1 + \rho\,\dfrac{\sigma_1}{\sigma_2}(y_2 - \mu_2)$ and variance $\sigma_1^2(1 - \rho^2)$.

*5.130

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$ for $i = 1, 2, \ldots, n$. Let

\[ U_1 = \sum_{i=1}^{n} a_i Y_i \quad\text{and}\quad U_2 = \sum_{i=1}^{n} b_i Y_i, \]

where $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are constants. $U_1$ and $U_2$ are said to be orthogonal if $\mathrm{Cov}(U_1, U_2) = 0$.

a Show that $U_1$ and $U_2$ are orthogonal if and only if $\sum_{i=1}^{n} a_i b_i = 0$.
b Suppose, in addition, that $Y_1, Y_2, \ldots, Y_n$ have a multivariate normal distribution. Then $U_1$ and $U_2$ have a bivariate normal distribution. Show that $U_1$ and $U_2$ are independent if they are orthogonal.

*5.131

Let $Y_1$ and $Y_2$ be independent normally distributed random variables with means $\mu_1$ and $\mu_2$, respectively, and variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

a Show that $Y_1$ and $Y_2$ have a bivariate normal distribution with $\rho = 0$.
b Consider $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$. Use the result in Exercise 5.130 to show that $U_1$ and $U_2$ have a bivariate normal distribution and that $U_1$ and $U_2$ are independent.

*5.132

Refer to Exercise 5.131. What are the marginal distributions of U1 and U2 ?

5.11 Conditional Expectations

Section 5.3 contains a discussion of conditional probability functions and conditional density functions, which we will now relate to conditional expectations. Conditional expectations are defined in the same manner as univariate expectations except that conditional densities and probability functions are used in place of their marginal counterparts.

DEFINITION 5.13

If $Y_1$ and $Y_2$ are any two random variables, the conditional expectation of $g(Y_1)$, given that $Y_2 = y_2$, is defined to be

\[ E(g(Y_1) \mid Y_2 = y_2) = \int_{-\infty}^{\infty} g(y_1) f(y_1 \mid y_2)\, dy_1 \]

if $Y_1$ and $Y_2$ are jointly continuous and

\[ E(g(Y_1) \mid Y_2 = y_2) = \sum_{\text{all } y_1} g(y_1)\, p(y_1 \mid y_2) \]

if $Y_1$ and $Y_2$ are jointly discrete.

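EXAMPLE 5.31

Refer to the random variables $Y_1$ and $Y_2$ of Example 5.8, where the joint density function is given by

\[ f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the conditional expectation of the amount of sales, $Y_1$, given that $Y_2 = 1.5$.

Solution

In Example 5.8, we found that, if $0 < y_2 \le 2$,

\[ f(y_1 \mid y_2) = \begin{cases} 1/y_2, & 0 < y_1 \le y_2, \\ 0, & \text{elsewhere.} \end{cases} \]

Thus, from Definition 5.13, for any value of $y_2$ such that $0 < y_2 \le 2$,

\[ \begin{aligned}
E(Y_1 \mid Y_2 = y_2) &= \int_{-\infty}^{\infty} y_1 f(y_1 \mid y_2)\, dy_1 \\
&= \int_{0}^{y_2} y_1 \left(\frac{1}{y_2}\right) dy_1 = \frac{1}{y_2}\left(\frac{y_1^2}{2}\right)\Bigg|_{0}^{y_2} = \frac{y_2}{2}.
\end{aligned} \]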


Because we are interested in the value y2 = 1.5, it follows that E(Y1 | Y2 = 1.5) = 1.5/2 = 0.75. That is, if the soft-drink machine contains 1.5 gallons at the start of the day, the expected amount to be sold that day is 0.75 gallon.

In general, the conditional expectation of $Y_1$ given $Y_2 = y_2$ is a function of $y_2$. If we now let $Y_2$ range over all of its possible values, we can think of the conditional expectation $E(Y_1 \mid Y_2)$ as a function of the random variable $Y_2$. In Example 5.31, we obtained $E(Y_1 \mid Y_2 = y_2) = y_2/2$. It follows that $E(Y_1 \mid Y_2) = Y_2/2$. Because $E(Y_1 \mid Y_2)$ is a function of the random variable $Y_2$, it is itself a random variable; and as such, it has a mean and a variance. We consider the mean of this random variable in Theorem 5.14 and the variance in Theorem 5.15.

THEOREM 5.14

Let Y1 and Y2 denote random variables. Then E(Y1 ) = E[E(Y1 | Y2 )], where on the right-hand side the inside expectation is with respect to the conditional distribution of Y1 given Y2 and the outside expectation is with respect to the distribution of Y2 .

Proof

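Suppose that $Y_1$ and $Y_2$ are jointly continuous with joint density function $f(y_1, y_2)$ and marginal densities $f_1(y_1)$ and $f_2(y_2)$, respectively. Then

\[ \begin{aligned}
E(Y_1) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y_1 f(y_1, y_2)\, dy_1\, dy_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y_1 f(y_1 \mid y_2) f_2(y_2)\, dy_1\, dy_2 \\
&= \int_{-\infty}^{\infty}\left\{\int_{-\infty}^{\infty} y_1 f(y_1 \mid y_2)\, dy_1\right\} f_2(y_2)\, dy_2 \\
&= \int_{-\infty}^{\infty} E(Y_1 \mid Y_2 = y_2) f_2(y_2)\, dy_2 = E\left[E(Y_1 \mid Y_2)\right].
\end{aligned} \]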

The proof is similar for the discrete case.
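As a worked check of Theorem 5.14, consider again the density of Example 5.31, for which $E(Y_1 \mid Y_2) = Y_2/2$. The marginal density $f_2(y_2) = y_2/2$ on $0 \le y_2 \le 2$ used below is not quoted from the text; it is obtained here by integrating the joint density $1/2$ over $0 \le y_1 \le y_2$.

```latex
% Outer expectation of the conditional mean, using f_2(y_2) = y_2/2 on [0, 2]:
\[
E\left[E(Y_1 \mid Y_2)\right] = E\left(\frac{Y_2}{2}\right)
  = \int_0^2 \frac{y_2}{2}\cdot\frac{y_2}{2}\, dy_2
  = \frac{1}{4}\cdot\frac{8}{3} = \frac{2}{3}.
\]
% Direct computation from the joint density f(y_1, y_2) = 1/2 on 0 <= y_1 <= y_2 <= 2:
\[
E(Y_1) = \int_0^2\int_0^{y_2} y_1\cdot\frac{1}{2}\, dy_1\, dy_2
  = \int_0^2 \frac{y_2^2}{4}\, dy_2 = \frac{2}{3}.
\]
```

Both routes give $2/3$, as the theorem requires.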

EXAMPLE 5.32

A quality control plan for an assembly line involves sampling n = 10 ﬁnished items per day and counting Y , the number of defectives. If p denotes the probability of observing a defective, then Y has a binomial distribution, assuming that a large number of items are produced by the line. But p varies from day to day and is assumed to have a uniform distribution on the interval from 0 to 1/4. Find the expected value of Y .

Solution

From Theorem 5.14, we know that $E(Y) = E[E(Y \mid p)]$. For a given $p$, $Y$ has a binomial distribution, and hence $E(Y \mid p) = np$. Thus,

\[ E(Y) = E[E(Y \mid p)] = E(np) = nE(p) = n\left(\frac{0 + 1/4}{2}\right) = \frac{n}{8}, \]

and for $n = 10$,

\[ E(Y) = 10/8 = 1.25. \]

In the long run, this inspection policy will average 1.25 defectives per day.

The conditional variance of Y1 given Y2 = y2 is deﬁned by analogy with an ordinary variance, again using the conditional density or probability function of Y1 given Y2 = y2 in place of the ordinary density or probability function of Y1 . That is, V (Y1 | Y2 = y2 ) = E(Y12 | Y2 = y2 ) − [E(Y1 | Y2 = y2 )]2 . As in the case of the conditional mean, the conditional variance is a function of y2 . Letting Y2 range over all of its possible values, we can deﬁne V (Y1 | Y2 ) as a random variable that is a function of Y2 . Speciﬁcally, if g(y2 ) = V (Y1 | Y2 = y2 ) is a particular function of the observed value, y2 , then g(Y2 ) = V (Y1 | Y2 ) is the same function of the random variable, Y2 . The expected value of V (Y1 | Y2 ) is useful in computing the variance of Y1 , as detailed in Theorem 5.15.

THEOREM 5.15

Let $Y_1$ and $Y_2$ denote random variables. Then

\[ V(Y_1) = E\left[V(Y_1 \mid Y_2)\right] + V\left[E(Y_1 \mid Y_2)\right]. \]

Proof

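As previously indicated, $V(Y_1 \mid Y_2)$ is given by

\[ V(Y_1 \mid Y_2) = E(Y_1^2 \mid Y_2) - \left[E(Y_1 \mid Y_2)\right]^2 \]

and

\[ E\left[V(Y_1 \mid Y_2)\right] = E\left[E(Y_1^2 \mid Y_2)\right] - E\left\{\left[E(Y_1 \mid Y_2)\right]^2\right\}. \]

By definition,

\[ V\left[E(Y_1 \mid Y_2)\right] = E\left\{\left[E(Y_1 \mid Y_2)\right]^2\right\} - \left\{E\left[E(Y_1 \mid Y_2)\right]\right\}^2. \]

The variance of $Y_1$ is

\[ \begin{aligned}
V(Y_1) &= E\left[Y_1^2\right] - \left[E(Y_1)\right]^2 \\
&= E\left\{E\left[Y_1^2 \mid Y_2\right]\right\} - \left\{E\left[E(Y_1 \mid Y_2)\right]\right\}^2 \\
&= E\left\{E\left[Y_1^2 \mid Y_2\right]\right\} - E\left\{\left[E(Y_1 \mid Y_2)\right]^2\right\} + E\left\{\left[E(Y_1 \mid Y_2)\right]^2\right\} - \left\{E\left[E(Y_1 \mid Y_2)\right]\right\}^2 \\
&= E\left[V(Y_1 \mid Y_2)\right] + V\left[E(Y_1 \mid Y_2)\right].
\end{aligned} \]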


EXAMPLE 5.33

Refer to Example 5.32. Find the variance of $Y$.

Solution

From Theorem 5.15 we know that

\[ V(Y_1) = E\left[V(Y_1 \mid Y_2)\right] + V\left[E(Y_1 \mid Y_2)\right]. \]

For a given $p$, $Y$ has a binomial distribution, and hence $E(Y \mid p) = np$ and $V(Y \mid p) = npq$. Thus,

\[ V(Y) = E\left[V(Y \mid p)\right] + V\left[E(Y \mid p)\right] = E(npq) + V(np) = nE[p(1 - p)] + n^2 V(p). \]

Because $p$ is uniformly distributed on the interval $(0, 1/4)$ and $E(p^2) = V(p) + [E(p)]^2$, it follows that

\[ E(p) = \frac{1}{8}, \qquad V(p) = \frac{(1/4 - 0)^2}{12} = \frac{1}{192}, \qquad E(p^2) = \frac{1}{192} + \frac{1}{64} = \frac{1}{48}. \]

Thus,

\[ \begin{aligned}
V(Y) &= nE[p(1 - p)] + n^2 V(p) = n\left[E(p) - E(p^2)\right] + n^2 V(p) \\
&= n\left(\frac{1}{8} - \frac{1}{48}\right) + n^2\left(\frac{1}{192}\right) = \frac{5n}{48} + \frac{n^2}{192},
\end{aligned} \]

and for $n = 10$,

\[ V(Y) = 50/48 + 100/192 = 1.5625. \]

Thus, the standard deviation of $Y$ is $\sigma = \sqrt{1.5625} = 1.25$.
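The two pieces of the decomposition in Example 5.33 can be computed separately, which makes it easy to see how much of $V(Y)$ comes from binomial sampling and how much from day-to-day variation in $p$. This is a minimal sketch with hypothetical function names, using exact fractions.

```python
from fractions import Fraction


def uniform_moments(a, b):
    """Mean, variance, and second moment of a uniform distribution on (a, b)."""
    mean = (a + b) / 2
    variance = (b - a) ** 2 / 12
    return mean, variance, variance + mean ** 2


def binomial_mixture_variance(n, a, b):
    """Return (E[V(Y|p)], V[E(Y|p)]) for Y|p binomial(n, p), p uniform on (a, b)."""
    e_p, v_p, e_p2 = uniform_moments(a, b)
    expected_conditional_variance = n * (e_p - e_p2)  # E[n p (1 - p)]
    variance_of_conditional_mean = n ** 2 * v_p       # V(n p)
    return expected_conditional_variance, variance_of_conditional_mean


within, between = binomial_mixture_variance(10, Fraction(0), Fraction(1, 4))
print(within, between, within + between)  # 25/24 25/48 25/16
```

Here 25/16 = 1.5625, in agreement with the example; about two-thirds of the variance is attributable to the binomial sampling term.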

The mean and variance of $Y$ calculated in Examples 5.32 and 5.33 could be checked by finding the unconditional probability function of $Y$ and computing $E(Y)$ and $V(Y)$ directly. In doing so, we would need to find the joint distribution of $Y$ and $p$. From this joint distribution, the marginal probability function of $Y$ can be obtained and $E(Y)$ determined by evaluating $\sum_y y\,p(y)$. The variance can be determined in the usual manner, again using the marginal probability function of $Y$. In Examples 5.32 and 5.33, we avoided working directly with these joint and marginal distributions. Theorems 5.14 and 5.15 permitted a much quicker calculation of the desired mean and variance. As always, the mean and variance of a random variable can be used with Tchebysheff's theorem to provide bounds for probabilities when the distribution of the variable is unknown or difficult to derive.

In Examples 5.32 and 5.33, we encountered a situation where the distribution of a random variable ($Y$ = the number of defectives) was given conditionally for possible values of a quantity $p$ that could vary from day to day. The fact that $p$ varied was accommodated by assigning a probability distribution to this variable. This is an example of a hierarchical model. In such models, the distribution of a variable of interest, say, $Y$, is given, conditional on the value of a “parameter” $\theta$. Uncertainty about the actual value of $\theta$ is modeled by assigning a probability distribution to it. Once we specify the conditional distribution of $Y$ given $\theta$ and the marginal distribution of $\theta$, the joint distribution of $Y$ and $\theta$ is obtained by multiplying the conditional by the marginal. The marginal distribution of $Y$ is then obtained from the joint distribution by integrating or summing over the possible values of $\theta$. The results of this section can be used to find $E(Y)$ and $V(Y)$ without finding this marginal distribution. Other examples of hierarchical models are contained in Exercises 5.136, 5.138, 5.141 and 5.142.
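The direct route described above (integrate out the parameter, then take moments of the marginal distribution) can be carried out exactly for the model of Examples 5.32 and 5.33. The sketch below is one possible way to do it, assuming $Y \mid p$ is binomial and $p$ is uniform on $(0, \text{upper})$; the function name and the expansion of $(1 - p)^{n - y}$ by the binomial theorem are illustrative choices, not the text's method.

```python
from fractions import Fraction
from math import comb


def marginal_pmf(y, n, upper):
    """P(Y = y) when Y|p is binomial(n, p) and p is uniform on (0, upper).

    Expands (1 - p)^(n - y) with the binomial theorem and integrates each
    power of p from 0 to `upper` against the uniform density 1 / upper.
    """
    m = n - y
    integral = sum(
        comb(m, k) * (-1) ** k * upper ** (y + k + 1) / (y + k + 1)
        for k in range(m + 1)
    )
    return comb(n, y) * integral / upper


n, upper = 10, Fraction(1, 4)
pmf = [marginal_pmf(y, n, upper) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(pmf))
variance = sum(y * y * p for y, p in enumerate(pmf)) - mean ** 2
print(sum(pmf), mean, variance)
```

With exact fractions the marginal probabilities sum to 1, and the mean and variance can be compared with the values 1.25 and 1.5625 obtained from Theorems 5.14 and 5.15.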

Exercises

5.133

In Exercise 5.9, we determined that

\[ f(y_1, y_2) = \begin{cases} 6(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere} \end{cases} \]

is a valid joint probability density function. a Find E(Y1 |Y2 = y2 ). b Use the answer derived in part (a) to ﬁnd E(Y1 ). (Compare this with the answer found in Exercise 5.77.)

5.134

In Examples 5.32 and 5.33, we determined that if Y is the number of defectives, E(Y ) = 1.25 and V (Y ) = 1.5625. Is it likely that, on any given day, Y will exceed 6?

5.135

In Exercise 5.41, we considered a quality control plan that calls for randomly selecting three items from the daily production (assumed large) of a certain machine and observing the number of defectives. The proportion $p$ of defectives produced by the machine varies from day to day and has a uniform distribution on the interval $(0, 1)$. Find the

a expected number of defectives observed among the three sampled items.
b variance of the number of defectives among the three sampled.

5.136

In Exercise 5.42, the number of defects per yard in a certain fabric, $Y$, was known to have a Poisson distribution with parameter $\lambda$. The parameter $\lambda$ was assumed to be a random variable with a density function given by

\[ f(\lambda) = \begin{cases} e^{-\lambda}, & \lambda \ge 0, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the expected number of defects per yard by first finding the conditional expectation of $Y$ for given $\lambda$.
b Find the variance of $Y$.
c Is it likely that $Y$ exceeds 9?

5.137

In Exercise 5.38, we assumed that Y1 , the weight of a bulk item stocked by a supplier, had a uniform distribution over the interval (0, 1). The random variable Y2 denoted the weight of the item sold and was assumed to have a uniform distribution over the interval (0, y1 ), where y1 was a speciﬁc value of Y1 . If the supplier stocked 3/4 ton, what amount could be expected to be sold during the week?

5.138

Assume that $Y$ denotes the number of bacteria per cubic centimeter in a particular liquid and that $Y$ has a Poisson distribution with parameter $\lambda$. Further assume that $\lambda$ varies from location to location and has a gamma distribution with parameters $\alpha$ and $\beta$, where $\alpha$ is a positive integer. If we randomly select a location, what is the

a expected number of bacteria per cubic centimeter?
b standard deviation of the number of bacteria per cubic centimeter?

5.139

Suppose that a company has determined that the number of jobs per week, $N$, varies from week to week and has a Poisson distribution with mean $\lambda$. The number of hours to complete each job, $Y_i$, is gamma distributed with parameters $\alpha$ and $\beta$. The total time to complete all jobs in a week is $T = \sum_{i=1}^{N} Y_i$. Note that $T$ is the sum of a random number of random variables. What is

a $E(T \mid N = n)$?
b $E(T)$, the expected total time to complete all jobs?

5.140

Why is E[V (Y1 |Y2 )] ≤ V (Y1 )?

5.141

Let Y1 have an exponential distribution with mean λ and the conditional density of Y2 given Y1 = y1 be
\[ f(y_2 \mid y_1) = \begin{cases} 1/y_1, & 0 \le y_2 \le y_1, \\ 0, & \text{elsewhere.} \end{cases} \]
Find E(Y2) and V(Y2), the unconditional mean and variance of Y2.

5.142

Suppose that Y has a binomial distribution with parameters n and p but that p varies from day to day according to a beta distribution with parameters α and β. Show that
a \( E(Y) = n\alpha/(\alpha + \beta) \).
b \( V(Y) = \dfrac{n\alpha\beta(\alpha + \beta + n)}{(\alpha + \beta)^2(\alpha + \beta + 1)} \).

*5.143

If Y1 and Y2 are independent random variables, each having a normal distribution with mean 0 and variance 1, ﬁnd the moment-generating function of U = Y1 Y2 . Use this moment-generating function to ﬁnd E(U ) and V (U ). Check the result by evaluating E(U ) and V (U ) directly from the density functions for Y1 and Y2 .

5.12 Summary

The multinomial experiment (Section 5.9) and its associated multinomial probability distribution convey the theme of this chapter. Most experiments yield sample measurements, y1, y2, . . . , yk, which may be regarded as observations on k random variables. Inferences about the underlying structure that generates the observations (the probabilities of falling into cells 1, 2, . . . , k) are based on knowledge of the probabilities associated with various samples (y1, y2, . . . , yk). Joint, marginal, and conditional distributions are essential concepts in finding the probabilities of various sample outcomes.

Generally we draw from a population a sample of n observations, which are specific values of Y1, Y2, . . . , Yn. Many times the random variables are independent and have the same probability distribution. As a consequence, the concept of independence is useful in finding the probability of observing the given sample.

The objective of this chapter has been to convey the ideas contained in the two preceding paragraphs. The numerous details contained in the chapter are essential in providing a solid background for a study of inference. At the same time, you should be careful to avoid overemphasis on details; be sure to keep the broader inferential objectives in mind.

References and Further Readings

Hoel, P. G. 1984. Introduction to Mathematical Statistics, 5th ed. New York: Wiley.
Hogg, R. V., A. T. Craig, and J. W. McKean. 2005. Introduction to Mathematical Statistics, 6th ed. Upper Saddle River, N.J.: Pearson Prentice Hall.
Mood, A. M., F. A. Graybill, and D. Boes. 1974. Introduction to the Theory of Statistics, 3d ed. New York: McGraw-Hill.
Myers, R. H. 2000. Classical and Modern Regression with Applications, 2d ed. Pacific Grove, CA: Duxbury Press.
Parzen, E. 1992. Modern Probability Theory and Its Applications. New York: Wiley-Interscience.

Supplementary Exercises

5.144

Prove Theorem 5.9 when Y1 and Y2 are independent discrete random variables.

5.145

A technician starts a job at a time Y1 that is uniformly distributed between 8:00 A.M. and 8:15 A.M. The amount of time to complete the job, Y2 , is an independent random variable that is uniformly distributed between 20 and 30 minutes. What is the probability that the job will be completed before 8:30 A.M.?

5.146

A target for a bomb is in the center of a circle with radius of 1 mile. A bomb falls at a randomly selected point inside that circle. If the bomb destroys everything within 1/2 mile of its landing point, what is the probability that the target is destroyed?

5.147

Two friends are to meet at the library. Each independently and randomly selects an arrival time within the same one-hour period. Each agrees to wait a maximum of ten minutes for the other to arrive. What is the probability that they will meet?

5.148

A committee of three people is to be randomly selected from a group containing four Republicans, three Democrats, and two independents. Let Y1 and Y2 denote numbers of Republicans and Democrats, respectively, on the committee. a What is the joint probability distribution for Y1 and Y2 ? b Find the marginal distributions of Y1 and Y2 . c Find P(Y1 = 1|Y2 ≥ 1).

5.149

Let Y1 and Y2 have a joint density function given by
\[ f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
a Find the marginal density functions of Y1 and Y2.
b Find P(Y1 ≤ 3/4 | Y2 ≤ 1/2).
c Find the conditional density function of Y1 given Y2 = y2.
d Find P(Y1 ≤ 3/4 | Y2 = 1/2).

5.150

Refer to Exercise 5.149. a Find E(Y2 |Y1 = y1 ). b Use Theorem 5.14 to ﬁnd E(Y2 ). c Find E(Y2 ) directly from the marginal density of Y2 .

5.151

The length of life Y for a type of fuse has an exponential distribution with a density function given by
\[ f(y) = \begin{cases} (1/\beta)e^{-y/\beta}, & y \ge 0, \\ 0, & \text{elsewhere.} \end{cases} \]
a If two such fuses have independent life lengths Y1 and Y2, find their joint probability density function.
b One fuse from part (a) is in a primary system, and the other is in a backup system that comes into use only if the primary system fails. The total effective life length of the two fuses, therefore, is Y1 + Y2. Find P(Y1 + Y2 ≤ a), where a > 0.

5.152

In the production of a certain type of copper, two types of copper powder (types A and B) are mixed together and sintered (heated) for a certain length of time. For a fixed volume of sintered copper, the producer measures the proportion Y1 of the volume due to solid copper (some pores will have to be filled with air) and the proportion Y2 of the solid mass due to type A crystals. Assume that appropriate probability densities for Y1 and Y2 are
\[ f_1(y_1) = \begin{cases} 6y_1(1 - y_1), & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere,} \end{cases} \qquad f_2(y_2) = \begin{cases} 3y_2^2, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
The proportion of the sample volume due to type A crystals is then Y1 Y2. Assuming that Y1 and Y2 are independent, find P(Y1 Y2 ≤ .5).

5.153

Suppose that the number of eggs laid by a certain insect has a Poisson distribution with mean λ. The probability that any one egg hatches is p. Assume that the eggs hatch independently of one another. Find the a expected value of Y , the total number of eggs that hatch. b variance of Y .

5.154

In a clinical study of a new drug formulated to reduce the effects of rheumatoid arthritis, researchers found that the proportion p of patients who respond favorably to the drug is a random variable that varies from batch to batch of the drug. Assume that p has a probability density function given by
\[ f(p) = \begin{cases} 12p^2(1 - p), & 0 \le p \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
Suppose that n patients are injected with portions of the drug taken from the same batch. Let Y denote the number showing a favorable response. Find
a the unconditional probability distribution of Y for general n.
b E(Y) for n = 2.

5.155

Suppose that Y1, Y2, and Y3 are independent χ²-distributed random variables with ν1, ν2, and ν3 degrees of freedom, respectively, and that W1 = Y1 + Y2 and W2 = Y1 + Y3.
a In Exercise 5.87, you derived the mean and variance of W1. Find Cov(W1, W2).
b Explain why you expected the answer to part (a) to be positive.

5.156

Refer to Exercise 5.86. Suppose that Z is a standard normal random variable and that Y is an independent χ² random variable with ν degrees of freedom.
a Define \( W = Z/\sqrt{Y} \). Find Cov(Z, W). What assumption do you need about the value of ν?
b With Z, Y, and W as above, find Cov(Y, W).
c One of the covariances from parts (a) and (b) is positive, and the other is zero. Explain why.

5.157

A forester studying diseased pine trees models the number of diseased trees per acre, Y, as a Poisson random variable with mean λ. However, λ changes from area to area, and its random behavior is modeled by a gamma distribution. That is, for some integer α,
\[ f(\lambda) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\,\lambda^{\alpha - 1} e^{-\lambda/\beta}, & \lambda > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
Find the unconditional probability distribution for Y.

5.158

A coin has probability p of coming up heads when tossed. In n independent tosses of the coin, let Xi = 1 if the ith toss results in heads and Xi = 0 if the ith toss results in tails. Then Y, the number of heads in the n tosses, has a binomial distribution and can be represented as \( Y = \sum_{i=1}^{n} X_i \). Find E(Y) and V(Y), using Theorem 5.12.

*5.159

The negative binomial random variable Y was defined in Section 3.6 as the number of the trial on which the rth success occurs, in a sequence of independent trials with constant probability p of success on each trial. Let Xi denote a random variable defined as the number of the trial on which the ith success occurs, for i = 1, 2, . . . , r. Now define
\[ W_i = X_i - X_{i-1}, \qquad i = 1, 2, \ldots, r, \]
where X0 is defined to be zero. Then we can write \( Y = \sum_{i=1}^{r} W_i \). Notice that the random variables W1, W2, . . . , Wr have identical geometric distributions and are mutually independent. Use Theorem 5.12 to show that E(Y) = r/p and V(Y) = r(1 − p)/p².

5.160

A box contains four balls, numbered 1 through 4. One ball is selected at random from this box. Let X 1 = 1 if ball 1 or ball 2 is drawn, X 2 = 1 if ball 1 or ball 3 is drawn, X 3 = 1 if ball 1 or ball 4 is drawn. The X i values are zero otherwise. Show that any two of the random variables X 1 , X 2 , and X 3 are independent but that the three together are not.

5.161

Suppose that we are to observe two independent random samples: Y1, Y2, . . . , Yn denoting a random sample from a normal distribution with mean µ1 and variance \( \sigma_1^2 \); and X1, X2, . . . , Xm denoting a random sample from another normal distribution with mean µ2 and variance \( \sigma_2^2 \). An approximation for µ1 − µ2 is given by \( \bar{Y} - \bar{X} \), the difference between the sample means. Find \( E(\bar{Y} - \bar{X}) \) and \( V(\bar{Y} - \bar{X}) \).

5.162

In Exercise 5.65, you determined that, for −1 ≤ α ≤ 1, the probability density function of (Y1, Y2) is given by
\[ f(y_1, y_2) = \begin{cases} \left[1 - \alpha\{(1 - 2e^{-y_1})(1 - 2e^{-y_2})\}\right] e^{-y_1 - y_2}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere,} \end{cases} \]
and is such that the marginal distributions of Y1 and Y2 are both exponential with mean 1. You also showed that Y1 and Y2 are independent if and only if α = 0. Give two specific and different joint densities that yield marginal densities for Y1 and Y2 that are both exponential with mean 1.

*5.163

Refer to Exercise 5.66. If F1(y1) and F2(y2) are two distribution functions, then for any α, −1 ≤ α ≤ 1,
\[ F(y_1, y_2) = F_1(y_1)F_2(y_2)\left[1 - \alpha\{1 - F_1(y_1)\}\{1 - F_2(y_2)\}\right] \]
is a joint distribution function such that Y1 and Y2 have marginal distribution functions F1(y1) and F2(y2), respectively.
a If F1(y1) and F2(y2) are both distribution functions associated with exponentially distributed random variables with mean 1, show that the joint density function of Y1 and Y2 is the one given in Exercise 5.162.
b If F1(y1) and F2(y2) are both distribution functions associated with uniform (0, 1) random variables, for any α, −1 ≤ α ≤ 1, evaluate F(y1, y2).
c Find the joint density functions associated with the distribution functions that you found in part (b).
d Give two specific and different joint densities such that the marginal distributions of Y1 and Y2 are both uniform on the interval (0, 1).

*5.164

Let X1, X2, and X3 be random variables, either continuous or discrete. The joint moment-generating function of X1, X2, and X3 is defined by
\[ m(t_1, t_2, t_3) = E\left(e^{t_1 X_1 + t_2 X_2 + t_3 X_3}\right). \]
a Show that m(t, t, t) gives the moment-generating function of X1 + X2 + X3.
b Show that m(t, t, 0) gives the moment-generating function of X1 + X2.
c Show that
\[ \left.\frac{\partial^{k_1 + k_2 + k_3} m(t_1, t_2, t_3)}{\partial t_1^{k_1}\,\partial t_2^{k_2}\,\partial t_3^{k_3}}\right|_{t_1 = t_2 = t_3 = 0} = E\left(X_1^{k_1} X_2^{k_2} X_3^{k_3}\right). \]

*5.165

Let X1, X2, and X3 have a multinomial distribution with probability function
\[ p(x_1, x_2, x_3) = \frac{n!}{x_1!\,x_2!\,x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}, \qquad \sum_{i=1}^{3} x_i = n. \]
Use the results of Exercise 5.164 to do the following:
a Find the joint moment-generating function of X1, X2, and X3.
b Use the answer to part (a) to show that the marginal distribution of X1 is binomial with parameter p1.
c Use the joint moment-generating function to find Cov(X1, X2).

*5.166

A box contains N1 white balls, N2 black balls, and N3 red balls (N1 + N2 + N3 = N). A random sample of n balls is selected from the box (without replacement). Let Y1, Y2, and Y3 denote the number of white, black, and red balls, respectively, observed in the sample. Find the correlation coefficient for Y1 and Y2. (Let pi = Ni/N, for i = 1, 2, 3.)

*5.167

Let Y1 and Y2 be jointly distributed random variables with finite variances.
a Show that \( [E(Y_1 Y_2)]^2 \le E(Y_1^2)E(Y_2^2) \). [Hint: Observe that \( E[(tY_1 - Y_2)^2] \ge 0 \) for any real number t or, equivalently,
\[ t^2 E(Y_1^2) - 2t E(Y_1 Y_2) + E(Y_2^2) \ge 0. \]
This is a quadratic expression of the form \( At^2 + Bt + C \); and because it is nonnegative, we must have \( B^2 - 4AC \le 0 \). The preceding inequality follows directly.]
b Let ρ denote the correlation coefficient of Y1 and Y2. Using the inequality in part (a), show that ρ² ≤ 1.

CHAPTER 6

Functions of Random Variables

6.1 Introduction
6.2 Finding the Probability Distribution of a Function of Random Variables
6.3 The Method of Distribution Functions
6.4 The Method of Transformations
6.5 The Method of Moment-Generating Functions
6.6 Multivariable Transformations Using Jacobians (Optional)
6.7 Order Statistics
6.8 Summary
References and Further Readings

6.1 Introduction

As we indicated in Chapter 1, the objective of statistics is to make inferences about a population based on information contained in a sample taken from that population. Any truly useful inference must be accompanied by an associated measure of goodness. Each of the topics discussed in the preceding chapters plays a role in the development of statistical inference. However, none of the topics discussed thus far pertains to the objective of statistics as closely as the study of the distributions of functions of random variables. This is because all quantities used to estimate population parameters or to make decisions about a population are functions of the n random observations that appear in a sample.

To illustrate, consider the problem of estimating a population mean, µ. Intuitively we draw a random sample of n observations, y1, y2, . . . , yn, from the population and employ the sample mean
\[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i \]
as an estimate for µ. How good is this estimate? The answer depends upon the behavior of the random variables Y1, Y2, . . . , Yn and their effect on the distribution of \( \bar{Y} = (1/n)\sum_{i=1}^{n} Y_i \).

A measure of the goodness of an estimate is the error of estimation, the difference between the estimate and the parameter estimated (for our example, the difference between \( \bar{y} \) and µ). Because Y1, Y2, . . . , Yn are random variables, in repeated sampling \( \bar{Y} \) is also a random variable (and a function of the n variables Y1, Y2, . . . , Yn). Therefore, we cannot be certain that the error of estimation will be less than a specific value, say, B. However, if we can determine the probability distribution of the estimator \( \bar{Y} \), this probability distribution can be used to determine the probability that the error of estimation is less than or equal to B.

To determine the probability distribution for a function of n random variables, Y1, Y2, . . . , Yn, we must find the joint probability distribution for the random variables themselves. We generally assume that observations are obtained through random sampling, as defined in Section 2.12. We saw in Section 3.7 that random sampling from a finite population (sampling without replacement) results in dependent trials but that these trials become essentially independent if the population is large when compared to the size of the sample. We will assume throughout the remainder of this text that populations are large in comparison to the sample size and consequently that the random variables obtained through a random sample are in fact independent of one another. Thus, in the discrete case, the joint probability function for Y1, Y2, . . . , Yn, all sampled from the same population, is given by
\[ p(y_1, y_2, \ldots, y_n) = p(y_1)\,p(y_2) \cdots p(y_n). \]
In the continuous case, the joint density function is
\[ f(y_1, y_2, \ldots, y_n) = f(y_1)\,f(y_2) \cdots f(y_n). \]
The statement “Y1, Y2, . . . , Yn is a random sample from a population with density f(y)” will mean that the random variables are independent with common density function f(y).
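The probability that the error of estimation is at most B can also be approximated by simulation before any distribution theory is developed. The following is a minimal sketch and not part of the original text: it assumes, purely for illustration, an exponential population with mean µ = 1 and the bound B = 0.2, and the function name prob_error_within is hypothetical.

```python
import random


def prob_error_within(n, bound, mu=1.0, reps=20000, seed=1):
    """Monte Carlo estimate of P(|Ybar - mu| <= bound) for a random sample
    of size n from an exponential population with mean mu (an assumed
    population, chosen only for illustration)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # expovariate takes the rate 1/mu, so each draw has mean mu
        ybar = sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n
        if abs(ybar - mu) <= bound:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (5, 25, 100):
        print(n, prob_error_within(n, bound=0.2))
```

Under these assumptions the estimated probability should increase with n, which is the behavior that the distribution of the sample mean is meant to quantify exactly.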

6.2 Finding the Probability Distribution of a Function of Random Variables

We will present three methods for finding the probability distribution for a function of random variables and a fourth method for finding the joint distribution of several functions of random variables. Any one of these may be employed to find the distribution of a given function of the variables, but one of the methods usually leads to a simpler derivation than the others. The method that works “best” varies from one application to another. Hence, acquaintance with the first three methods is desirable. The fourth method is presented in (optional) Section 6.6. Although the first three methods will be discussed separately in the next three sections, a brief summary of each of these methods is provided here.

Consider random variables Y1, Y2, . . . , Yn and a function U(Y1, Y2, . . . , Yn), denoted simply as U. Then three of the methods for finding the probability distribution of U are as follows:

1. The method of distribution functions: This method is typically used when the Y's have continuous distributions. First, find the distribution function for U, \( F_U(u) = P(U \le u) \), by using the methods that we discussed in Chapter 5. To do so, we must find the region in the y1, y2, . . . , yn space for which U ≤ u and then find P(U ≤ u) by integrating f(y1, y2, . . . , yn) over this region. The density function for U is then obtained by differentiating the distribution function, \( F_U(u) \). A detailed account of this procedure will be presented in Section 6.3.

2. The method of transformations: If we are given the density function of a random variable Y, the method of transformations results in a general expression for the density of U = h(Y) for an increasing or decreasing function h(y). Then if Y1 and Y2 have a bivariate distribution, we can use the univariate result explained earlier to find the joint density of Y1 and U = h(Y1, Y2). By integrating over y1, we find the marginal probability density function of U, which is our objective. This method will be illustrated in Section 6.4.

3. The method of moment-generating functions: This method is based on a uniqueness theorem, Theorem 6.1, which states that, if two random variables have identical moment-generating functions, the two random variables possess the same probability distributions. To use this method, we must find the moment-generating function for U and compare it with the moment-generating functions for the common discrete and continuous random variables derived in Chapters 3 and 4. If it is identical to one of these moment-generating functions, the probability distribution of U can be identified because of the uniqueness theorem. Applications of the method of moment-generating functions will be presented in Section 6.5.

Probability-generating functions can be employed in a way similar to the method of moment-generating functions. If you are interested in their use, see the references at the end of the chapter.

6.3 The Method of Distribution Functions

We will illustrate the method of distribution functions with a simple univariate example. If Y has probability density function f(y) and if U is some function of Y, then we can find \( F_U(u) = P(U \le u) \) directly by integrating f(y) over the region for which U ≤ u. The probability density function for U is found by differentiating \( F_U(u) \). The following example illustrates the method.

EXAMPLE 6.1

A process for refining sugar yields up to 1 ton of pure sugar per day, but the actual amount produced, Y, is a random variable because of machine breakdowns and other slowdowns. Suppose that Y has density function given by
\[ f(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
The company is paid at the rate of $300 per ton for the refined sugar, but it also has a fixed overhead cost of $100 per day. Thus the daily profit, in hundreds of dollars, is U = 3Y − 1. Find the probability density function for U.

Solution

To employ the distribution function approach, we must find
\[ F_U(u) = P(U \le u) = P(3Y - 1 \le u) = P\left(Y \le \frac{u + 1}{3}\right). \]
If u < −1, then (u + 1)/3 < 0 and, therefore, \( F_U(u) = P(Y \le (u + 1)/3) = 0 \). Also, if u > 2, then (u + 1)/3 > 1 and \( F_U(u) = P(Y \le (u + 1)/3) = 1 \). However, if −1 ≤ u ≤ 2, the probability can be written as an integral of f(y), and
\[ P\left(Y \le \frac{u + 1}{3}\right) = \int_{-\infty}^{(u+1)/3} f(y)\,dy = \int_{0}^{(u+1)/3} 2y\,dy = \left(\frac{u + 1}{3}\right)^2. \]
(Notice that, as Y ranges from 0 to 1, U ranges from −1 to 2.) Thus, the distribution function of the random variable U is given by
\[ F_U(u) = \begin{cases} 0, & u < -1, \\ \left(\dfrac{u + 1}{3}\right)^2, & -1 \le u \le 2, \\ 1, & u > 2, \end{cases} \]
and the density function for U is
\[ f_U(u) = \frac{dF_U(u)}{du} = \begin{cases} (2/9)(u + 1), & -1 \le u < 2, \\ 0, & \text{elsewhere.} \end{cases} \]
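The distribution function found in Example 6.1 can be checked numerically. The sketch below is an illustration rather than part of the original solution: it draws Y by taking the square root of a uniform (0, 1) value (valid because the distribution function of Y is y² on [0, 1]), forms U = 3Y − 1, and compares empirical proportions with the derived formula. The function names are hypothetical.

```python
import random


def simulate_profit(reps=200000, seed=1):
    """Draw daily profits U = 3Y - 1, where Y has density 2y on [0, 1]."""
    rng = random.Random(seed)
    # F_Y(y) = y**2 on [0, 1], so Y = sqrt(V) for V uniform on (0, 1)
    return [3.0 * rng.random() ** 0.5 - 1.0 for _ in range(reps)]


def cdf_u(u):
    """Distribution function of U derived in Example 6.1."""
    if u < -1:
        return 0.0
    if u > 2:
        return 1.0
    return ((u + 1) / 3) ** 2


if __name__ == "__main__":
    draws = simulate_profit()
    for u in (-0.5, 0.5, 1.5):
        empirical = sum(d <= u for d in draws) / len(draws)
        print(u, round(empirical, 3), round(cdf_u(u), 3))
```

The empirical and derived values should agree to about two decimal places for a simulation of this size.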

In the bivariate situation, let Y1 and Y2 be random variables with joint density f (y1 , y2 ) and let U = h(Y1 , Y2 ) be a function of Y1 and Y2 . Then for every point (y1 , y2 ), there corresponds one and only one value of U . If we can ﬁnd the region of values (y1 , y2 ) such that U ≤ u, then the integral of the joint density function f (y1 , y2 ) over this region equals P(U ≤ u) = FU (u). As before, the density function for U can be obtained by differentiation. We will illustrate these ideas with two examples.

EXAMPLE 6.2

In Example 5.4, we considered the random variables Y1 (the proportional amount of gasoline stocked at the beginning of a week) and Y2 (the proportional amount of gasoline sold during the week). The joint density function of Y1 and Y2 is given by
\[ f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
Find the probability density function for U = Y1 − Y2, the proportional amount of gasoline remaining at the end of the week. Use the density function of U to find E(U).

[Figure 6.1: Region over which f(y1, y2) is positive, Example 6.2. Axes y1 and y2, with the line y1 − y2 = u.]

Solution

The region over which f(y1, y2) is not zero is sketched in Figure 6.1. Also shown there is the line y1 − y2 = u, for a value of u between 0 and 1. Notice that any point (y1, y2) such that y1 − y2 ≤ u lies above the line y1 − y2 = u. If u < 0, the line y1 − y2 = u has intercept −u > 0 and \( F_U(u) = P(Y_1 - Y_2 \le u) = 0 \). When u > 1, the line y1 − y2 = u has intercept −u < −1 and \( F_U(u) = 1 \). For 0 ≤ u ≤ 1, \( F_U(u) = P(Y_1 - Y_2 \le u) \) is the integral over the dark shaded region above the line y1 − y2 = u. Because it is easier to integrate over the lower triangular region, we can write, for 0 ≤ u ≤ 1,
\[
\begin{aligned}
F_U(u) = P(U \le u) &= 1 - P(U \ge u) \\
&= 1 - \int_{u}^{1}\int_{0}^{y_1 - u} 3y_1 \,dy_2\,dy_1 \\
&= 1 - \int_{u}^{1} 3y_1(y_1 - u)\,dy_1 \\
&= 1 - 3\left[\frac{y_1^3}{3} - \frac{u y_1^2}{2}\right]_{u}^{1} \\
&= 1 - \left[1 - \frac{3}{2}u + \frac{u^3}{2}\right] \\
&= \frac{1}{2}(3u - u^3).
\end{aligned}
\]
Summarizing,
\[ F_U(u) = \begin{cases} 0, & u < 0, \\ (3u - u^3)/2, & 0 \le u \le 1, \\ 1, & u > 1. \end{cases} \]
A graph of \( F_U(u) \) is given in Figure 6.2(a). It follows that
\[ f_U(u) = \frac{dF_U(u)}{du} = \begin{cases} 3(1 - u^2)/2, & 0 \le u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
The density function \( f_U(u) \) is graphed in Figure 6.2(b).

[Figure 6.2: Distribution and density functions for Example 6.2. (a) Distribution function F_U(u); (b) density function f_U(u).]

We can use this derived density function to find E(U), because
\[ E(U) = \int_{0}^{1} u\,\frac{3}{2}(1 - u^2)\,du = \frac{3}{2}\left[\frac{u^2}{2} - \frac{u^4}{4}\right]_{0}^{1} = \frac{3}{8}, \]
which agrees with the value of E(Y1 − Y2) found in Example 5.20 by using the methods developed in Chapter 5 for finding the expected value of a linear function of random variables.
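As a numerical cross-check of Example 6.2, the sketch below simulates (Y1, Y2) from the joint density 3y1 on 0 ≤ y2 ≤ y1 ≤ 1 and compares the sample mean and an empirical proportion with E(U) = 3/8 and F_U(u) = (3u − u³)/2. It is an illustration only. The sampling scheme relies on two facts that follow from the joint density but are worked out here, not quoted from the example: the marginal density of Y1 is 3y1² (distribution function y1³), and given Y1 = y1, Y2 is uniform on (0, y1). The function names are hypothetical.

```python
import random


def simulate_remaining(reps=200000, seed=1):
    """Draw U = Y1 - Y2 for (Y1, Y2) with joint density 3*y1
    on the triangle 0 <= y2 <= y1 <= 1."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Marginal of Y1 has distribution function y1**3 on [0, 1]
        y1 = rng.random() ** (1.0 / 3.0)
        # Given Y1 = y1, Y2 is uniform on (0, y1)
        y2 = y1 * rng.random()
        draws.append(y1 - y2)
    return draws


def cdf_u(u):
    """Distribution function of U derived in Example 6.2."""
    if u < 0:
        return 0.0
    if u > 1:
        return 1.0
    return (3 * u - u ** 3) / 2


if __name__ == "__main__":
    draws = simulate_remaining()
    print("sample mean:", sum(draws) / len(draws), "derived:", 3 / 8)
    empirical = sum(d <= 0.5 for d in draws) / len(draws)
    print("P(U <= 0.5):", empirical, "derived:", cdf_u(0.5))
```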

EXAMPLE 6.3

Let (Y1, Y2) denote a random sample of size n = 2 from the uniform distribution on the interval (0, 1). Find the probability density function for U = Y1 + Y2.

Solution

The density function for each Yi is
\[ f(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
Therefore, because we have a random sample, Y1 and Y2 are independent, and
\[ f(y_1, y_2) = f(y_1)f(y_2) = \begin{cases} 1, & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
The random variables Y1 and Y2 have nonzero density over the unit square, as shown in Figure 6.3. We wish to find \( F_U(u) = P(U \le u) \). The first step is to find the points (y1, y2) that imply y1 + y2 ≤ u. The easiest way to find this region is to locate the points that divide the regions U ≤ u and U > u. These points lie on the line y1 + y2 = u. Graphing this relationship in Figure 6.3 and arbitrarily selecting y2 as the dependent variable, we find that the line possesses a slope equal to −1 and a y2 intercept equal to u. The points associated with U < u are either above or below the line and can be determined by testing points on either side of the line. Suppose that u = 1.5.

[Figure 6.3: The region of integration for Example 6.3. Axes y1 and y2; the line y1 + y2 = u, with the shaded region y1 + y2 < u (U < u) below it.]

Let y1 = y2 = 1/4; then y1 + y2 = 1/4 + 1/4 = 1/2 and (y1, y2) satisfies the inequality y1 + y2 < u. Therefore, y1 = y2 = 1/4 falls in the shaded region below the line. Similarly, all points such that y1 + y2 < u lie below the line y1 + y2 = u. Thus,
\[ F_U(u) = P(U \le u) = P(Y_1 + Y_2 \le u) = \iint_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2. \]
If u < 0,
\[ F_U(u) = P(U \le u) = \iint_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2 = \iint_{y_1 + y_2 \le u} 0\,dy_1\,dy_2 = 0, \]
and for u > 2,
\[ F_U(u) = P(U \le u) = \iint_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2 = \int_{0}^{1}\int_{0}^{1} (1)\,dy_1\,dy_2 = 1. \]

For 0 ≤ u ≤ 2, the limits of integration depend upon the particular value of u (where u is the y2 intercept of the line y1 + y2 = u). Thus, the mathematical expression for \( F_U(u) \) changes depending on whether 0 ≤ u ≤ 1 or 1 < u ≤ 2. If 0 ≤ u ≤ 1, the region y1 + y2 ≤ u is the shaded area in Figure 6.4. Then for 0 ≤ u ≤ 1, we have
\[
\begin{aligned}
F_U(u) &= \iint_{y_1 + y_2 \le u} f(y_1, y_2)\,dy_1\,dy_2 = \int_{0}^{u}\int_{0}^{u - y_2} (1)\,dy_1\,dy_2 = \int_{0}^{u} (u - y_2)\,dy_2 \\
&= \left[u y_2 - \frac{y_2^2}{2}\right]_{0}^{u} = u^2 - \frac{u^2}{2} = \frac{u^2}{2}.
\end{aligned}
\]
The solution, \( F_U(u) \), 0 ≤ u ≤ 1, could have been acquired directly by using elementary geometry. The bivariate density f(y1, y2) = 1 is uniform over the unit square, 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1. Hence, \( F_U(u) \) is the volume of a solid with height equal to f(y1, y2) = 1 and a triangular cross section, as shown in Figure 6.4. Hence,
\[ F_U(u) = (\text{area of triangle}) \cdot (\text{height}) = \frac{u^2}{2}(1) = \frac{u^2}{2}. \]

[Figure 6.4: The region y1 + y2 ≤ u for 0 ≤ u ≤ 1. Axes y1 and y2, with the line y1 + y2 = u.]

The distribution function can be acquired in a similar manner when u is defined over the interval 1 < u ≤ 2. Although the geometric solution is easier, we will obtain

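\( F_U(u) \) directly by integration. The region y1 + y2 ≤ u, 1 < u ≤ 2, is the shaded area indicated in Figure 6.5. The complement of the event U ≤ u is the event that (Y1, Y2) falls in the region A of Figure 6.5. Then for 1 < u ≤ 2,
\[
\begin{aligned}
F_U(u) &= 1 - \iint_{A} f(y_1, y_2)\,dy_1\,dy_2 \\
&= 1 - \int_{u-1}^{1}\int_{u - y_2}^{1} (1)\,dy_1\,dy_2 = 1 - \int_{u-1}^{1} \Big[y_1\Big]_{u - y_2}^{1}\,dy_2 \\
&= 1 - \int_{u-1}^{1} (1 - u + y_2)\,dy_2 = 1 - \left[(1 - u)y_2 + \frac{y_2^2}{2}\right]_{u-1}^{1} \\
&= (-u^2/2) + 2u - 1.
\end{aligned}
\]
To summarize,
\[ F_U(u) = \begin{cases} 0, & u < 0, \\ u^2/2, & 0 \le u \le 1, \\ (-u^2/2) + 2u - 1, & 1 < u \le 2, \\ 1, & u > 2. \end{cases} \]
The distribution function for U is shown in Figure 6.6(a).

[Figure 6.5: The region y1 + y2 ≤ u, 1 < u ≤ 2. Axes y1 and y2, with the line y1 + y2 = u and the complementary region A above it.]

[Figure 6.6: (a) The distribution function F_U(u) and (b) the density function f_U(u) for Example 6.3.]

The density function \( f_U(u) \) is obtained by differentiating \( F_U(u) \). Thus,
\[ f_U(u) = \frac{dF_U(u)}{du} = \begin{cases} \dfrac{d}{du}(0) = 0, & u < 0, \\ \dfrac{d}{du}(u^2/2) = u, & 0 \le u \le 1, \\ \dfrac{d}{du}\left[(-u^2/2) + 2u - 1\right] = 2 - u, & 1 < u \le 2, \\ \dfrac{d}{du}(1) = 0, & u > 2, \end{cases} \]
or, more simply,
\[ f_U(u) = \begin{cases} u, & 0 \le u \le 1, \\ 2 - u, & 1 < u \le 2, \\ 0, & \text{otherwise.} \end{cases} \]
A graph of \( f_U(u) \) is shown in Figure 6.6(b).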
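The piecewise distribution function of U = Y1 + Y2 from Example 6.3 lends itself to a quick simulation check. The following sketch is illustrative only; the function names cdf_sum and empirical_cdf_sum are hypothetical.

```python
import random


def cdf_sum(u):
    """Distribution function of U = Y1 + Y2 for two independent
    uniform (0, 1) variables, as derived in Example 6.3."""
    if u < 0:
        return 0.0
    if u <= 1:
        return u * u / 2
    if u <= 2:
        return -u * u / 2 + 2 * u - 1
    return 1.0


def empirical_cdf_sum(u, reps=200000, seed=1):
    """Proportion of simulated sums that are at most u."""
    rng = random.Random(seed)
    hits = sum(rng.random() + rng.random() <= u for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for u in (0.5, 1.0, 1.5):
        print(u, round(empirical_cdf_sum(u), 3), round(cdf_sum(u), 3))
```

Both branches of the formula are exercised by choosing one value of u below 1 and one above it.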

Summary of the Distribution Function Method

Let U be a function of the random variables Y1, Y2, . . . , Yn.
1. Find the region U = u in the (y1, y2, . . . , yn) space.
2. Find the region U ≤ u.
3. Find \( F_U(u) = P(U \le u) \) by integrating f(y1, y2, . . . , yn) over the region U ≤ u.
4. Find the density function \( f_U(u) \) by differentiating \( F_U(u) \). Thus, \( f_U(u) = dF_U(u)/du \).

To illustrate, we will consider the case U = h(Y) = Y², where Y is a continuous random variable with distribution function \( F_Y(y) \) and density function \( f_Y(y) \). If u ≤ 0, \( F_U(u) = P(U \le u) = P(Y^2 \le u) = 0 \) and for u > 0 (see Figure 6.7),
\[
\begin{aligned}
F_U(u) = P(U \le u) &= P(Y^2 \le u) \\
&= P(-\sqrt{u} \le Y \le \sqrt{u}) \\
&= \int_{-\sqrt{u}}^{\sqrt{u}} f_Y(y)\,dy = F_Y(\sqrt{u}) - F_Y(-\sqrt{u}).
\end{aligned}
\]

[Figure 6.7: The function h(y) = y². Axes y and h(y), with the level u marked at y = −√u and y = √u.]

In general,
\[ F_U(u) = \begin{cases} F_Y(\sqrt{u}) - F_Y(-\sqrt{u}), & u > 0, \\ 0, & \text{otherwise.} \end{cases} \]
On differentiating with respect to u, we see that
\[ f_U(u) = \begin{cases} f_Y(\sqrt{u})\left(\dfrac{1}{2\sqrt{u}}\right) + f_Y(-\sqrt{u})\left(\dfrac{1}{2\sqrt{u}}\right), & u > 0, \\ 0, & \text{otherwise,} \end{cases} \]
or, more simply,
\[ f_U(u) = \begin{cases} \dfrac{1}{2\sqrt{u}}\left[f_Y(\sqrt{u}) + f_Y(-\sqrt{u})\right], & u > 0, \\ 0, & \text{otherwise.} \end{cases} \]

EXAMPLE 6.4

Let Y have probability density function given by
\[ f_Y(y) = \begin{cases} \dfrac{y + 1}{2}, & -1 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
Find the density function for U = Y².

Solution

We know that
\[ f_U(u) = \begin{cases} \dfrac{1}{2\sqrt{u}}\left[f_Y(\sqrt{u}) + f_Y(-\sqrt{u})\right], & u > 0, \\ 0, & \text{otherwise,} \end{cases} \]
and on substituting into this equation, we obtain
\[ f_U(u) = \begin{cases} \dfrac{1}{2\sqrt{u}}\left(\dfrac{\sqrt{u} + 1}{2} + \dfrac{-\sqrt{u} + 1}{2}\right) = \dfrac{1}{2\sqrt{u}}, & 0 < u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
Because Y has positive density only over the interval −1 ≤ y ≤ 1, it follows that U = Y² has positive density only over the interval 0 < u ≤ 1.
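The density 1/(2√u) found in Example 6.4 integrates to the distribution function √u on 0 < u ≤ 1, which is easy to compare with simulated values of Y². The sketch below is an illustration under one worked-out assumption that the example does not state explicitly: the distribution function of Y is (y + 1)²/4 on [−1, 1], so Y can be drawn as 2√V − 1 for V uniform on (0, 1). The function names are hypothetical.

```python
import random


def simulate_square(reps=200000, seed=1):
    """Draw U = Y**2, where Y has density (y + 1)/2 on [-1, 1]."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # F_Y(y) = (y + 1)**2 / 4, so Y = 2*sqrt(V) - 1 for V uniform on (0, 1)
        y = 2.0 * rng.random() ** 0.5 - 1.0
        draws.append(y * y)
    return draws


def cdf_u(u):
    """Distribution function implied by f_U(u) = 1/(2*sqrt(u)) on (0, 1]."""
    if u <= 0:
        return 0.0
    if u >= 1:
        return 1.0
    return u ** 0.5


if __name__ == "__main__":
    draws = simulate_square()
    for u in (0.04, 0.25, 0.81):
        empirical = sum(d <= u for d in draws) / len(draws)
        print(u, round(empirical, 3), round(cdf_u(u), 3))
```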

In some instances, it is possible to find a transformation that, when applied to a random variable with a uniform distribution on the interval (0, 1), results in a random variable with some other specified distribution function, say, F(y). The next example illustrates a technique for achieving this objective. A brief discussion of one practical use of this transformation follows the example.

EXAMPLE 6.5

Let U be a uniform random variable on the interval (0, 1). Find a transformation G(U) such that G(U) possesses an exponential distribution with mean β.

Solution

If U possesses a uniform distribution on the interval (0, 1), then the distribution function of U (see Exercise 4.38) is given by
\[ F_U(u) = \begin{cases} 0, & u < 0, \\ u, & 0 \le u \le 1, \\ 1, & u > 1. \end{cases} \]
Let Y denote a random variable that has an exponential distribution with mean β. Then (see Section 4.6) Y has distribution function
\[ F_Y(y) = \begin{cases} 0, & y < 0, \\ 1 - e^{-y/\beta}, & y \ge 0. \end{cases} \]
Notice that \( F_Y(y) \) is strictly increasing on the interval [0, ∞). Let 0 < u < 1 and observe that there is a unique value y such that \( F_Y(y) = u \). Thus, \( F_Y^{-1}(u) \), 0 < u < 1, is well defined. In this case, \( F_Y(y) = 1 - e^{-y/\beta} = u \) if and only if \( y = -\beta \ln(1 - u) = F_Y^{-1}(u) \). Consider the random variable \( F_Y^{-1}(U) = -\beta \ln(1 - U) \) and observe that, if y > 0,
\[
\begin{aligned}
P\left[F_Y^{-1}(U) \le y\right] &= P[-\beta \ln(1 - U) \le y] \\
&= P[\ln(1 - U) \ge -y/\beta] \\
&= P(U \le 1 - e^{-y/\beta}) \\
&= 1 - e^{-y/\beta}.
\end{aligned}
\]
Also, \( P\left[F_Y^{-1}(U) \le y\right] = 0 \) if y ≤ 0. Thus, \( F_Y^{-1}(U) = -\beta \ln(1 - U) \) possesses an exponential distribution with mean β, as desired.

Computer simulations are frequently used to evaluate proposed statistical techniques. Typically, these simulations require that we obtain observed values of random variables with a prescribed distribution. As noted in Section 4.4, most computer systems contain a subroutine that provides observed values of a random variable U that has a uniform distribution on the interval (0, 1). How can the result of Example 6.5 be used to generate a set of observations from an exponential distribution with mean β? Simply use the computer's random number generator to produce values u1, u2, . . . , un from a uniform (0, 1) distribution and then calculate \( y_i = -\beta \ln(1 - u_i) \), i = 1, 2, . . . , n, to obtain values of random variables with the required exponential distribution.

As long as a prescribed distribution function F(y) possesses a unique inverse \( F^{-1}(\cdot) \), the preceding technique can be applied. In instances such as that illustrated in Example 6.5, we can readily write down the form of \( F^{-1}(\cdot) \) and proceed as earlier. If the form of a distribution function cannot be written in an easily invertible form (recall that the distribution functions of normally, gamma-, and beta-distributed random variables are given in tables that were obtained by using numerical integration techniques), our task is more difficult. In these instances, other methods are used to generate observations with the desired distribution.

In the following exercise set, you will find problems that can be solved by using the techniques presented in this section. The exercises that involve finding \( F^{-1}(U) \) for some specific distribution F(y) focus on cases where \( F^{-1}(\cdot) \) exists in a closed form.
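The recipe of Example 6.5 takes only a few lines in practice. The sketch below is one possible rendering in Python, not a prescribed implementation: it uses the standard library's random.random() as the uniform (0, 1) generator, and the function name exponential_draws and the choice β = 4 are illustrative assumptions.

```python
import math
import random


def exponential_draws(beta, n, seed=1):
    """Generate n values y_i = -beta * ln(1 - u_i), where the u_i are
    uniform (0, 1) values, following the transformation in Example 6.5."""
    rng = random.Random(seed)
    # random() returns values in [0.0, 1.0), so 1 - u lies in (0, 1]
    # and the logarithm is always defined.
    return [-beta * math.log(1.0 - rng.random()) for _ in range(n)]


if __name__ == "__main__":
    draws = exponential_draws(beta=4.0, n=100000)
    print("sample mean:", sum(draws) / len(draws))
    # Proportion at most 4 should be near 1 - exp(-1)
    print("P(Y <= 4):", sum(d <= 4.0 for d in draws) / len(draws))
```

With β = 4 the sample mean should be close to 4 and the proportion of values at most 4 should be close to 1 − e⁻¹, in line with the distribution function of Example 6.5.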

Exercises

6.1

Let $Y$ be a random variable with probability density function given by

\[ f(y) = \begin{cases} 2(1 - y), & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the density function of $U_1 = 2Y - 1$.
b Find the density function of $U_2 = 1 - 2Y$.
c Find the density function of $U_3 = Y^2$.
d Find $E(U_1)$, $E(U_2)$, and $E(U_3)$ by using the derived density functions for these random variables.
e Find $E(U_1)$, $E(U_2)$, and $E(U_3)$ by the methods of Chapter 4.

6.2

Let $Y$ be a random variable with a density function given by

\[ f(y) = \begin{cases} (3/2)y^2, & -1 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

a Find the density function of $U_1 = 3Y$.
b Find the density function of $U_2 = 3 - Y$.
c Find the density function of $U_3 = Y^2$.

6.3

A supplier of kerosene has a weekly demand $Y$ possessing a probability density function given by

\[ f(y) = \begin{cases} y, & 0 \le y \le 1, \\ 1, & 1 < y \le 1.5, \\ 0, & \text{elsewhere,} \end{cases} \]

with measurements in hundreds of gallons. (This problem was introduced in Exercise 4.13.) The supplier’s profit is given by $U = 10Y - 4$.

a Find the probability density function for $U$.
b Use the answer to part (a) to find $E(U)$.
c Find $E(U)$ by the methods of Chapter 4.


6.4

The amount of ﬂour used per day by a bakery is a random variable Y that has an exponential distribution with mean equal to 4 tons. The cost of the ﬂour is proportional to U = 3Y + 1. a Find the probability density function for U . b Use the answer in part (a) to ﬁnd E(U ).

6.5

The waiting time $Y$ until delivery of a new component for an industrial operation is uniformly distributed over the interval from 1 to 5 days. The cost of this delay is given by $U = 2Y^2 + 3$. Find the probability density function for $U$.

6.6

The joint distribution of amount of pollutant emitted from a smokestack without a cleaning device ($Y_1$) and a similar smokestack with a cleaning device ($Y_2$) was given in Exercise 5.10 to be

\[ f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 2,\ 0 \le y_2 \le 1,\ 2y_2 \le y_1, \\ 0, & \text{elsewhere.} \end{cases} \]

The reduction in amount of pollutant due to the cleaning device is given by $U = Y_1 - Y_2$.

a Find the probability density function for $U$.
b Use the answer in part (a) to find $E(U)$. Compare your results with those of Exercise 5.78(c).

6.7

Suppose that $Z$ has a standard normal distribution.

a Find the density function of $U = Z^2$.
b Does $U$ have a gamma distribution? What are the values of $\alpha$ and $\beta$?
c What is another name for the distribution of $U$?

6.8

Assume that Y has a beta distribution with parameters α and β. a Find the density function of U = 1 − Y . b Identify the density of U as one of the types we studied in Chapter 4. Be sure to identify any parameter values. c How is E(U ) related to E(Y )? d How is V (U ) related to V (Y )?

6.9

Suppose that a unit of mineral ore contains a proportion Y1 of metal A and a proportion Y2 of metal B. Experience has shown that the joint probability density function of Y1 and Y2 is uniform over the region 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, 0 ≤ y1 + y2 ≤ 1. Let U = Y1 + Y2 , the proportion of either metal A or B per unit. Find a the probability density function for U . b E(U ) by using the answer to part (a). c E(U ) by using only the marginal densities of Y1 and Y2 .

6.10

The total time from arrival to completion of service at a fast-food outlet, $Y_1$, and the time spent waiting in line before arriving at the service window, $Y_2$, were given in Exercise 5.15 with joint density function

\[ f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases} \]

Another random variable of interest is $U = Y_1 - Y_2$, the time spent at the service window. Find

a the probability density function for $U$.
b $E(U)$ and $V(U)$. Compare your answers with the results of Exercise 5.108.

6.11

Suppose that two electronic components in the guidance system for a missile operate independently and that each has a length of life governed by the exponential distribution with mean 1 (with measurements in hundreds of hours). Find the a probability density function for the average length of life of the two components. b mean and variance of this average, using the answer in part (a). Check your answer by computing the mean and variance, using Theorem 5.12.

6.12

Suppose that Y has a gamma distribution with parameters α and β and that c > 0 is a constant. a Derive the density function of U = cY . b Identify the density of U as one of the types we studied in Chapter 4. Be sure to identify any parameter values. c The parameters α and β of a gamma-distributed random variable are, respectively, “shape” and “scale” parameters. How do the scale and shape parameters for U compare to those for Y ?

6.13

If Y1 and Y2 are independent exponential random variables, both with mean β, ﬁnd the density function for their sum. (In Exercise 5.7, we considered two independent exponential random variables, both with mean 1 and determined P(Y1 + Y2 ≤ 3).)

6.14

In a process of sintering (heating) two types of copper powder (see Exercise 5.152), the density function for $Y_1$, the volume proportion of solid copper in a sample, was given by

\[ f_1(y_1) = \begin{cases} 6y_1(1 - y_1), & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The density function for $Y_2$, the proportion of type A crystals among the solid copper, was given as

\[ f_2(y_2) = \begin{cases} 3y_2^2, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The variable $U = Y_1 Y_2$ gives the proportion of the sample volume due to type A crystals. If $Y_1$ and $Y_2$ are independent, find the probability density function for $U$.

6.15

Let $Y$ have a distribution function given by

\[ F(y) = \begin{cases} 0, & y < 0, \\ 1 - e^{-y^2}, & y \ge 0. \end{cases} \]

Find a transformation $G(U)$ such that, if $U$ has a uniform distribution on the interval (0, 1), $G(U)$ has the same distribution as $Y$.

6.16

In Exercise 4.15, we determined that

\[ f(y) = \begin{cases} \dfrac{b}{y^2}, & y \ge b, \\ 0, & \text{elsewhere,} \end{cases} \]

is a bona fide probability density function for a random variable, $Y$. Assuming $b$ is a known constant and $U$ has a uniform distribution on the interval (0, 1), transform $U$ to obtain a random variable with the same distribution as $Y$.

6.17

A member of the power family of distributions has a distribution function given by

\[ F(y) = \begin{cases} 0, & y < 0, \\ \left(\dfrac{y}{\theta}\right)^{\alpha}, & 0 \le y \le \theta, \\ 1, & y > \theta, \end{cases} \]

where $\alpha, \theta > 0$.

a Find the density function.
b For fixed values of $\alpha$ and $\theta$, find a transformation $G(U)$ so that $G(U)$ has a distribution function of $F$ when $U$ possesses a uniform (0, 1) distribution.
c Given that a random sample of size 5 from a uniform distribution on the interval (0, 1) yielded the values .2700, .6901, .1413, .1523, and .3609, use the transformation derived in part (b) to give values associated with a random variable with a power family distribution with $\alpha = 2$, $\theta = 4$.

6.18

A member of the Pareto family of distributions (often used in economics to model income distributions) has a distribution function given by

\[ F(y) = \begin{cases} 0, & y < \beta, \\ 1 - \left(\dfrac{\beta}{y}\right)^{\alpha}, & y \ge \beta, \end{cases} \]

where $\alpha, \beta > 0$.

a Find the density function.
b For fixed values of $\beta$ and $\alpha$, find a transformation $G(U)$ so that $G(U)$ has a distribution function of $F$ when $U$ has a uniform distribution on the interval (0, 1).
c Given that a random sample of size 5 from a uniform distribution on the interval (0, 1) yielded the values .0058, .2048, .7692, .2475, and .6078, use the transformation derived in part (b) to give values associated with a random variable with a Pareto distribution with $\alpha = 2$, $\beta = 3$.

6.19

Refer to Exercises 6.17 and 6.18. If $Y$ possesses a Pareto distribution with parameters $\alpha$ and $\beta$, prove that $X = 1/Y$ has a power family distribution with parameters $\alpha$ and $\theta = \beta^{-1}$.

6.20

Let the random variable $Y$ possess a uniform distribution on the interval (0, 1). Derive the

a distribution of the random variable $W = Y^2$.
b distribution of the random variable $W = \sqrt{Y}$.

*6.21

Suppose that Y is a random variable that takes on only integer values 1, 2, . . . . Let F(y) denote the distribution function of this random variable. As discussed in Section 4.2, this distribution function is a step function, and the magnitude of the step at each integer value is the probability that Y takes on that value. Let U be a continuous random variable that is uniformly distributed on the interval (0, 1). Deﬁne a variable X such that X = k if and only if F(k − 1) < U ≤ F(k), k = 1, 2, . . . . Recall that F(0) = 0 because Y takes on only positive integer values. Show that P(X = i) = F(i) − F(i − 1) = P(Y = i), i = 1, 2, . . . . That is, X has the same distribution as Y . [Hint: Recall Exercise 4.5.]1

*6.22

Use the results derived in Exercises 4.6 and 6.21 to describe how to generate values of a geometrically distributed random variable.

1. Exercises preceded by an asterisk are optional.

6.4 The Method of Transformations

The transformation method for finding the probability distribution of a function of random variables is an offshoot of the distribution function method of Section 6.3. Through the distribution function approach, we can arrive at a simple method of writing down the density function of $U = h(Y)$, provided that $h(y)$ is either decreasing or increasing. [By $h(y)$ increasing, we mean that if $y_1 < y_2$, then $h(y_1) < h(y_2)$ for any real numbers $y_1$ and $y_2$.] The graph of an increasing function $h(y)$ appears in Figure 6.8.

FIGURE 6.8 An increasing function [plot of $u = h(y)$ against $y$, marking the point with $y_1 = h^{-1}(u_1)$ and $u_1 = h(y_1)$]

Suppose that $h(y)$ is an increasing function of $y$ and that $U = h(Y)$, where $Y$ has density function $f_Y(y)$. Then $h^{-1}(u)$ is an increasing function of $u$: If $u_1 < u_2$, then $h^{-1}(u_1) = y_1 < y_2 = h^{-1}(u_2)$. We see from Figure 6.8 that the set of points $y$ such that $h(y) \le u_1$ is precisely the same as the set of points $y$ such that $y \le h^{-1}(u_1)$. Therefore (see Figure 6.8),

\[ P(U \le u) = P[h(Y) \le u] = P\{h^{-1}[h(Y)] \le h^{-1}(u)\} = P[Y \le h^{-1}(u)] \]

or

\[ F_U(u) = F_Y[h^{-1}(u)]. \]

Then differentiating with respect to $u$, we have

\[ f_U(u) = \frac{dF_U(u)}{du} = \frac{dF_Y[h^{-1}(u)]}{du} = f_Y(h^{-1}(u)) \, \frac{d[h^{-1}(u)]}{du}. \]

To simplify notation, we will write $dh^{-1}/du$ instead of $d[h^{-1}(u)]/du$ and

\[ f_U(u) = f_Y[h^{-1}(u)] \, \frac{dh^{-1}}{du}. \]

Thus, we have acquired a new way to find $f_U(u)$ that evolved from the general method of distribution functions. To find $f_U(u)$, solve for $y$ in terms of $u$; that is, find $y = h^{-1}(u)$ and substitute this expression into $f_Y(y)$. Then multiply this quantity by $dh^{-1}/du$. We will illustrate the procedure with an example.

EXAMPLE 6.6

In Example 6.1, we worked with a random variable $Y$ (amount of sugar produced) with a density function given by

\[ f_Y(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

We were interested in a new random variable (profit) given by $U = 3Y - 1$. Find the probability density function for $U$ by the transformation method.


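Solution

The function of interest here is $h(y) = 3y - 1$, which is increasing in $y$. If $u = 3y - 1$, then

\[ y = h^{-1}(u) = \frac{u + 1}{3} \quad \text{and} \quad \frac{dh^{-1}}{du} = \frac{d\left(\frac{u+1}{3}\right)}{du} = \frac{1}{3}. \]

Thus,

\[ f_U(u) = f_Y[h^{-1}(u)] \, \frac{dh^{-1}}{du} = \begin{cases} 2[h^{-1}(u)] \, \dfrac{dh^{-1}}{du} = 2\left(\dfrac{u+1}{3}\right)\left(\dfrac{1}{3}\right), & 0 \le \dfrac{u+1}{3} \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]

or, equivalently,

\[ f_U(u) = \begin{cases} 2(u + 1)/9, & -1 \le u \le 2, \\ 0, & \text{elsewhere.} \end{cases} \]

The range over which $f_U(u)$ is positive is simply the interval $0 \le y \le 1$ transformed to the $u$ axis by the function $u = 3y - 1$. This answer agrees with that of Example 6.1.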
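As an informal check of Example 6.6, the following Python sketch simulates $U = 3Y - 1$ and compares empirical probabilities with the distribution function implied by the derived density. Two things here are not in the text and are assumptions of the sketch: $Y$ is generated as $\sqrt{V}$ with $V$ uniform on (0, 1), which works because $F_Y(y) = y^2$ on $[0, 1]$, and $F_U(u) = (u + 1)^2/9$ is obtained by integrating $f_U$ from $-1$ to $u$.

```python
import math
import random

def f_U(u):
    """Density from Example 6.6: 2(u + 1)/9 on [-1, 2], zero elsewhere."""
    return 2.0 * (u + 1.0) / 9.0 if -1.0 <= u <= 2.0 else 0.0

def F_U(u):
    """Distribution function obtained by integrating f_U: (u + 1)^2 / 9 on [-1, 2]."""
    if u < -1.0:
        return 0.0
    if u > 2.0:
        return 1.0
    return (u + 1.0) ** 2 / 9.0

rng = random.Random(2024)   # arbitrary seed for reproducibility
# F_Y(y) = y^2 on [0, 1], so sqrt(V) with V ~ Uniform(0, 1) has density 2y.
us = [3.0 * math.sqrt(rng.random()) - 1.0 for _ in range(100_000)]

gaps = []
for point in (-0.5, 0.5, 1.5):
    empirical = sum(u <= point for u in us) / len(us)
    gaps.append(abs(empirical - F_U(point)))
    print(point, empirical, F_U(point))
```

If the transformation method was applied correctly, each empirical probability should differ from $F_U$ only by simulation noise.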

If $h(y)$ is a decreasing function of $y$, then $h^{-1}(u)$ is a decreasing function of $u$. That is, if $u_1 < u_2$, then $h^{-1}(u_1) = y_1 > y_2 = h^{-1}(u_2)$. Also, as in Figure 6.9, the set of points $y$ such that $h(y) \le u_1$ is the same as the set of points such that $y \ge h^{-1}(u_1)$. It follows that, for $U = h(Y)$, as shown in Figure 6.9,

\[ P(U \le u) = P[Y \ge h^{-1}(u)] \quad \text{or} \quad F_U(u) = 1 - F_Y[h^{-1}(u)]. \]

If we differentiate with respect to $u$, we obtain

\[ f_U(u) = -f_Y[h^{-1}(u)] \, \frac{d[h^{-1}(u)]}{du}. \]

FIGURE 6.9 A decreasing function [plot of $u = h(y)$ against $y$, marking the point with $y_1 = h^{-1}(u_1)$ and $u_1 = h(y_1)$]

If we again use the simplified notation $dh^{-1}/du$ instead of $d[h^{-1}(u)]/du$ and recall that $dh^{-1}/du$ is negative because $h^{-1}(u)$ is a decreasing function of $u$, the density of $U$ is

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right|. \]

Actually, it is not necessary that $h(y)$ be increasing or decreasing (and hence invertible) for all values of $y$. The function $h(\cdot)$ need only be increasing or decreasing for the values of $y$ such that $f_Y(y) > 0$. The set of points $\{y : f_Y(y) > 0\}$ is called the support of the density $f_Y(y)$. If $y = h^{-1}(u)$ is not in the support of the density, then $f_Y[h^{-1}(u)] = 0$. These results are combined in the following statement:

Let $Y$ have probability density function $f_Y(y)$. If $h(y)$ is either increasing or decreasing for all $y$ such that $f_Y(y) > 0$, then $U = h(Y)$ has density function

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right|, \quad \text{where} \quad \frac{dh^{-1}}{du} = \frac{d[h^{-1}(u)]}{du}. \]

EXAMPLE 6.7

Let $Y$ have the probability density function given by

\[ f_Y(y) = \begin{cases} 2y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the density function of $U = -4Y + 3$.

Solution

In this example, the set of values of $y$ such that $f_Y(y) > 0$ is the interval $0 < y \le 1$. The function of interest, $h(y) = -4y + 3$, is decreasing for all $y$, and hence for all $0 < y \le 1$. If $u = -4y + 3$, then

\[ y = h^{-1}(u) = \frac{3 - u}{4} \quad \text{and} \quad \frac{dh^{-1}}{du} = -\frac{1}{4}. \]

Notice that $h^{-1}(u)$ is a decreasing function of $u$ and that $dh^{-1}/du < 0$. Thus,

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right| = \begin{cases} 2\left(\dfrac{3 - u}{4}\right)\left| -\dfrac{1}{4} \right|, & 0 \le \dfrac{3 - u}{4} \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Finally, some simple algebra gives

\[ f_U(u) = \begin{cases} \dfrac{3 - u}{8}, & -1 \le u \le 3, \\ 0, & \text{elsewhere.} \end{cases} \]

Direct application of the method of transformation requires that the function $h(y)$ be either increasing or decreasing for all $y$ such that $f_Y(y) > 0$. If you want to use this method to find the distribution of $U = h(Y)$, you should be very careful to check that the function $h(\cdot)$ is either increasing or decreasing for all $y$ in the support of $f_Y(y)$. If it is not, the method of transformations cannot be used, and you should instead use the method of distribution functions discussed in Section 6.3.

The transformation method can also be used in multivariate situations. The following example illustrates the bivariate case.

EXAMPLE 6.8

Let $Y_1$ and $Y_2$ have a joint density function given by

\[ f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & 0 \le y_1,\ 0 \le y_2, \\ 0, & \text{elsewhere.} \end{cases} \]

Find the density function for $U = Y_1 + Y_2$.

Solution

This problem must be solved in two stages: First, we will find the joint density of $Y_1$ and $U$; second, we will find the marginal density of $U$. The approach is to let $Y_1$ be fixed at a value $y_1 \ge 0$. Then $U = y_1 + Y_2$, and we can consider the one-dimensional transformation problem in which $U = h(Y_2) = y_1 + Y_2$. Letting $g(y_1, u)$ denote the joint density of $Y_1$ and $U$, we have, with $y_2 = u - y_1 = h^{-1}(u)$,

\[ g(y_1, u) = \begin{cases} f[y_1, h^{-1}(u)] \left| \dfrac{dh^{-1}}{du} \right| = e^{-(y_1 + u - y_1)}(1), & 0 \le y_1,\ 0 \le u - y_1, \\ 0, & \text{elsewhere.} \end{cases} \]

Simplifying, we obtain

\[ g(y_1, u) = \begin{cases} e^{-u}, & 0 \le y_1 \le u, \\ 0, & \text{elsewhere.} \end{cases} \]

(Notice that $Y_1 \le U$.) The marginal density of $U$ is then given by

\[ f_U(u) = \int_{-\infty}^{\infty} g(y_1, u) \, dy_1 = \begin{cases} \displaystyle\int_0^u e^{-u} \, dy_1 = u e^{-u}, & 0 \le u, \\ 0, & \text{elsewhere.} \end{cases} \]
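The result of Example 6.8 can be checked informally by simulation. The joint density $e^{-(y_1 + y_2)}$ factors into two exponential densities with mean 1, so the sketch below draws $Y_1$ and $Y_2$ independently and compares the empirical distribution of their sum with $F_U(u) = 1 - (1 + u)e^{-u}$. That closed form is not stated in the text; it is obtained here by integrating $u e^{-u}$ by parts, and the seed and evaluation points are arbitrary.

```python
import math
import random

def F_U(u):
    """Integral of t * e^{-t} from 0 to u, namely 1 - (1 + u) e^{-u}."""
    return 1.0 - (1.0 + u) * math.exp(-u) if u > 0 else 0.0

rng = random.Random(7)   # arbitrary seed for reproducibility
n = 100_000
# expovariate(1.0) draws from the exponential distribution with rate 1 (mean 1).
sums = [rng.expovariate(1.0) + rng.expovariate(1.0) for _ in range(n)]

max_gap = max(
    abs(sum(s <= u for s in sums) / n - F_U(u))
    for u in (0.5, 1.0, 2.0, 4.0)
)
print(max_gap)
```

A small value of `max_gap` is consistent with $U = Y_1 + Y_2$ having density $u e^{-u}$ for $u \ge 0$.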

We will illustrate the use of the bivariate transformation with another example, this one involving the product of two random variables.

EXAMPLE 6.9

In Example 5.19, we considered a random variable $Y_1$, the proportion of impurities in a chemical sample, and $Y_2$, the proportion of type I impurities among all impurities in the sample. The joint density function was given by

\[ f(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

We are interested in $U = Y_1 Y_2$, which is the proportion of type I impurities in the sample. Find the probability density function for $U$ and use it to find $E(U)$.

Solution

Because we are interested in $U = Y_1 Y_2$, let us first fix $Y_1$ at a value $y_1$, $0 < y_1 \le 1$, and think in terms of the univariate transformation $U = h(Y_2) = y_1 Y_2$. We can then determine the joint density function for $Y_1$ and $U$ (with $y_2 = u/y_1 = h^{-1}(u)$) to be

\[ g(y_1, u) = f[y_1, h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right| = \begin{cases} 2(1 - y_1) \left| \dfrac{1}{y_1} \right|, & 0 < y_1 \le 1,\ 0 \le u/y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Equivalently,

\[ g(y_1, u) = \begin{cases} 2(1 - y_1) \left( \dfrac{1}{y_1} \right), & 0 \le u \le y_1 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

($U$ also ranges between 0 and 1, but $Y_1$ always must be greater than or equal to $U$.) Further,

\[ f_U(u) = \int_{-\infty}^{\infty} g(y_1, u) \, dy_1 = \begin{cases} \displaystyle\int_u^1 2(1 - y_1) \left( \frac{1}{y_1} \right) dy_1, & 0 \le u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Because, for $0 \le u \le 1$,

\[ \int_u^1 2(1 - y_1) \left( \frac{1}{y_1} \right) dy_1 = 2 \int_u^1 \left( \frac{1}{y_1} - 1 \right) dy_1 = 2 \left( \ln y_1 \Big|_u^1 - y_1 \Big|_u^1 \right) = 2(-\ln u - 1 + u) = 2(u - \ln u - 1), \]

we obtain

\[ f_U(u) = \begin{cases} 2(u - \ln u - 1), & 0 \le u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

(The symbol ln stands for natural logarithm.) We now find $E(U)$:

\[ E(U) = \int_{-\infty}^{\infty} u f_U(u) \, du = \int_0^1 2u(u - \ln u - 1) \, du = 2 \left\{ \int_0^1 u^2 \, du - \int_0^1 u(\ln u) \, du - \int_0^1 u \, du \right\} = 2 \left\{ \frac{u^3}{3} \Big|_0^1 - \int_0^1 u(\ln u) \, du - \frac{u^2}{2} \Big|_0^1 \right\}. \]

The middle integral is most easily solved by using integration by parts, which yields

\[ \int_0^1 u(\ln u) \, du = \left( \frac{u^2}{2} \right)(\ln u) \Big|_0^1 - \int_0^1 \left( \frac{u^2}{2} \right)\left( \frac{1}{u} \right) du = 0 - \frac{u^2}{4} \Big|_0^1 = -\frac{1}{4}. \]

Thus,

\[ E(U) = 2[(1/3) - (-1/4) - (1/2)] = 2(1/12) = 1/6. \]

This answer agrees with the answer to Example 5.21, where $E(U) = E(Y_1 Y_2)$ was found by a different method.
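The two integrals in Example 6.9 can be verified numerically. The sketch below uses a simple midpoint rule (a choice made here for illustration, not a method from the text) to confirm that $f_U(u) = 2(u - \ln u - 1)$ integrates to 1 over (0, 1) and that $E(U) = 1/6$. The helper name `midpoint_integral` and the number of subintervals are assumptions.

```python
import math

def f_U(u):
    """Density from Example 6.9, evaluated for 0 < u <= 1."""
    return 2.0 * (u - math.log(u) - 1.0)

def midpoint_integral(g, a, b, n=200_000):
    """Midpoint rule; g is never evaluated at the endpoints, so ln(0) is avoided."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total_mass = midpoint_integral(f_U, 0.0, 1.0)
expected_value = midpoint_integral(lambda u: u * f_U(u), 0.0, 1.0)
print(total_mass, expected_value)
```

The midpoint rule is used because $\ln u$ is unbounded at $u = 0$; the singularity is integrable, so the approximation still converges, only somewhat more slowly near the left endpoint.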

Summary of the Transformation Method

Let $U = h(Y)$, where $h(y)$ is either an increasing or decreasing function of $y$ for all $y$ such that $f_Y(y) > 0$.

1. Find the inverse function, $y = h^{-1}(u)$.
2. Evaluate $\dfrac{dh^{-1}}{du} = \dfrac{d[h^{-1}(u)]}{du}$.
3. Find $f_U(u)$ by

\[ f_U(u) = f_Y[h^{-1}(u)] \left| \frac{dh^{-1}}{du} \right|. \]

Exercises

6.23

In Exercise 6.1, we considered a random variable $Y$ with probability density function given by

\[ f(y) = \begin{cases} 2(1 - y), & 0 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases} \]

and used the method of distribution functions to find the density functions of

a $U_1 = 2Y - 1$.
b $U_2 = 1 - 2Y$.
c $U_3 = Y^2$.

Use the method of transformation to find the densities of $U_1$, $U_2$, and $U_3$.

6.24

In Exercise 6.4, we considered a random variable Y that possessed an exponential distribution with mean 4 and used the method of distribution functions to derive the density function for U = 3Y + 1. Use the method of transformations to derive the density function for U .

6.25

In Exercise 6.11, we considered two electronic components that operate independently, each with life length governed by the exponential distribution with mean 1. We proceeded to use the method of distribution functions to obtain the distribution of the average length of life for the two components. Use the method of transformations to obtain the density function for the average life length of the two components.

6.26

The Weibull density function is given by

\[ f(y) = \begin{cases} \dfrac{1}{\alpha} \, m y^{m-1} e^{-y^m/\alpha}, & y > 0, \\ 0, & \text{elsewhere,} \end{cases} \]

where $\alpha$ and $m$ are positive constants. This density function is often used as a model for the lengths of life of physical systems. Suppose $Y$ has the Weibull density just given. Find

a the density function of $U = Y^m$.
b $E(Y^k)$ for any positive integer $k$.

6.27

Let $Y$ have an exponential distribution with mean $\beta$.

a Prove that $W = \sqrt{Y}$ has a Weibull density with $\alpha = \beta$ and $m = 2$.
b Use the result in Exercise 6.26(b) to give $E(Y^{k/2})$ for any positive integer $k$.

6.28

Let Y have a uniform (0, 1) distribution. Show that U = −2 ln(Y ) has an exponential distribution with mean 2.

6.29

The speed of a molecule in a uniform gas at equilibrium is a random variable $V$ whose density function is given by

\[ f(v) = a v^2 e^{-b v^2}, \quad v > 0, \]

where $b = m/2kT$ and $k$, $T$, and $m$ denote Boltzmann’s constant, the absolute temperature, and the mass of the molecule, respectively.

a Derive the distribution of $W = mV^2/2$, the kinetic energy of the molecule.
b Find $E(W)$.

6.30

A fluctuating electric current $I$ may be considered a uniformly distributed random variable over the interval (9, 11). If this current flows through a 2-ohm resistor, find the probability density function of the power $P = 2I^2$.

6.31

The joint distribution for the length of life of two different types of components operating in a system was given in Exercise 5.18 by

\[ f(y_1, y_2) = \begin{cases} (1/8) \, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0,\ y_2 > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

The relative efficiency of the two types of components is measured by $U = Y_2/Y_1$. Find the probability density function for $U$.

6.32

In Exercise 6.5, we considered a random variable $Y$ that has a uniform distribution on the interval [1, 5]. The cost of delay is given by $U = 2Y^2 + 3$. Use the method of transformations to derive the density function of $U$.

6.33

The proportion of impurities in certain ore samples is a random variable $Y$ with a density function given by

\[ f(y) = \begin{cases} (3/2)y^2 + y, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

The dollar value of such samples is $U = 5 - (Y/2)$. Find the probability density function for $U$.


6.34

A density function sometimes used by engineers to model lengths of life of electronic components is the Rayleigh density, given by

\[ f(y) = \begin{cases} \left( \dfrac{2y}{\theta} \right) e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

a If $Y$ has the Rayleigh density, find the probability density function for $U = Y^2$.
b Use the result of part (a) to find $E(Y)$ and $V(Y)$.

6.35

Let Y1 and Y2 be independent random variables, both uniformly distributed on (0, 1). Find the probability density function for U = Y1 Y2 .

6.36

Refer to Exercise 6.34. Let $Y_1$ and $Y_2$ be independent Rayleigh-distributed random variables. Find the probability density function for $U = Y_1^2 + Y_2^2$. [Hint: Recall Example 6.8.]

6.5 The Method of Moment-Generating Functions

The moment-generating function method for finding the probability distribution of a function of random variables $Y_1, Y_2, \ldots, Y_n$ is based on the following uniqueness theorem.

THEOREM 6.1

Let $m_X(t)$ and $m_Y(t)$ denote the moment-generating functions of random variables $X$ and $Y$, respectively. If both moment-generating functions exist and $m_X(t) = m_Y(t)$ for all values of $t$, then $X$ and $Y$ have the same probability distribution.

(The proof of Theorem 6.1 is beyond the scope of this text.)

If $U$ is a function of $n$ random variables, $Y_1, Y_2, \ldots, Y_n$, the first step in using Theorem 6.1 is to find the moment-generating function of $U$:

\[ m_U(t) = E(e^{tU}). \]

Once the moment-generating function for $U$ has been found, it is compared with the moment-generating functions for random variables with well-known distributions. If $m_U(t)$ is identical to one of these, say, the moment-generating function for a random variable $V$, then, by Theorem 6.1, $U$ and $V$ possess identical probability distributions. The density functions, means, variances, and moment-generating functions for some frequently encountered random variables are presented in Appendix 2. We will illustrate the procedure with a few examples.

EXAMPLE 6.10

Suppose that $Y$ is a normally distributed random variable with mean $\mu$ and variance $\sigma^2$. Show that

\[ Z = \frac{Y - \mu}{\sigma} \]

has a standard normal distribution, a normal distribution with mean 0 and variance 1.

Solution

We have seen in Example 4.16 that $Y - \mu$ has moment-generating function $e^{t^2 \sigma^2/2}$. Hence,

\[ m_Z(t) = E(e^{tZ}) = E[e^{(t/\sigma)(Y - \mu)}] = m_{(Y - \mu)}\left( \frac{t}{\sigma} \right) = e^{(t/\sigma)^2 (\sigma^2/2)} = e^{t^2/2}. \]

On comparing $m_Z(t)$ with the moment-generating function of a normal random variable, we see that $Z$ must be normally distributed with $E(Z) = 0$ and $V(Z) = 1$.

EXAMPLE 6.11

Let $Z$ be a normally distributed random variable with mean 0 and variance 1. Use the method of moment-generating functions to find the probability distribution of $Z^2$.

Solution

The moment-generating function for $Z^2$ is

\[ m_{Z^2}(t) = E(e^{tZ^2}) = \int_{-\infty}^{\infty} e^{tz^2} f(z) \, dz = \int_{-\infty}^{\infty} e^{tz^2} \, \frac{e^{-z^2/2}}{\sqrt{2\pi}} \, dz = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, e^{-(z^2/2)(1 - 2t)} \, dz. \]

This integral can be evaluated either by consulting a table of integrals or by noting that, if $1 - 2t > 0$ (equivalently, $t < 1/2$), the integrand

\[ \frac{\exp\left[ -\left( \dfrac{z^2}{2} \right)(1 - 2t) \right]}{\sqrt{2\pi}} = \frac{\exp\left[ -\left( \dfrac{z^2}{2} \right) \Big/ (1 - 2t)^{-1} \right]}{\sqrt{2\pi}} \]

is proportional to the density function of a normally distributed random variable with mean 0 and variance $(1 - 2t)^{-1}$. To make the integrand a normal density function (so that the definite integral is equal to 1), multiply the numerator and denominator by the standard deviation, $(1 - 2t)^{-1/2}$. Then

\[ m_{Z^2}(t) = \frac{1}{(1 - 2t)^{1/2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi} \, (1 - 2t)^{-1/2}} \exp\left[ -\left( \frac{z^2}{2} \right) \Big/ (1 - 2t)^{-1} \right] dz. \]

Because the integral equals 1, if $t < 1/2$,

\[ m_{Z^2}(t) = \frac{1}{(1 - 2t)^{1/2}} = (1 - 2t)^{-1/2}. \]

A comparison of $m_{Z^2}(t)$ with the moment-generating functions in Appendix 2 shows that $m_{Z^2}(t)$ is identical to the moment-generating function for the gamma-distributed random variable with $\alpha = 1/2$ and $\beta = 2$. Thus, using Definition 4.10, $Z^2$ has a $\chi^2$ distribution with $\nu = 1$ degree of freedom. It follows that the density function for $U = Z^2$ is given by

\[ f_U(u) = \begin{cases} \dfrac{u^{-1/2} e^{-u/2}}{\Gamma(1/2) \, 2^{1/2}}, & u \ge 0, \\ 0, & \text{elsewhere.} \end{cases} \]
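The closed form $m_{Z^2}(t) = (1 - 2t)^{-1/2}$ can be compared with a direct numerical evaluation of the defining integral. The following sketch is only a sanity check under stated assumptions: the integral is truncated to $[-12, 12]$ and approximated with the midpoint rule, and the function name `mgf_z_squared` and the chosen values of $t$ are illustrative.

```python
import math

def mgf_z_squared(t, half_width=12.0, n=100_000):
    """Approximate E[exp(t Z^2)] for t < 1/2 by the midpoint rule on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        z = -half_width + (i + 0.5) * h
        # Integrand exp(t z^2) * exp(-z^2 / 2), combined into a single exponent.
        total += math.exp(-(z * z / 2.0) * (1.0 - 2.0 * t))
    return h * total / math.sqrt(2.0 * math.pi)

checks = []
for t in (-1.0, 0.0, 0.25, 0.4):
    numeric = mgf_z_squared(t)
    closed_form = (1.0 - 2.0 * t) ** -0.5
    checks.append(abs(numeric - closed_form))
    print(t, numeric, closed_form)
```

The truncation is harmless for the values of $t$ used here, but it would become inaccurate as $t$ approaches 1/2, where the integrand decays very slowly and the moment-generating function itself diverges.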


The method of moment-generating functions is often very useful for finding the distributions of sums of independent random variables.

THEOREM 6.2

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with moment-generating functions $m_{Y_1}(t), m_{Y_2}(t), \ldots, m_{Y_n}(t)$, respectively. If $U = Y_1 + Y_2 + \cdots + Y_n$, then

\[ m_U(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \cdots \times m_{Y_n}(t). \]

Proof

We know that, because the random variables $Y_1, Y_2, \ldots, Y_n$ are independent (see Theorem 5.9),

\[ m_U(t) = E\left[ e^{t(Y_1 + \cdots + Y_n)} \right] = E\left( e^{tY_1} e^{tY_2} \cdots e^{tY_n} \right) = E\left( e^{tY_1} \right) \times E\left( e^{tY_2} \right) \times \cdots \times E\left( e^{tY_n} \right). \]

Thus, by the definition of moment-generating functions,

\[ m_U(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \cdots \times m_{Y_n}(t). \]

EXAMPLE 6.12

The number of customer arrivals at a checkout counter in a given interval of time possesses approximately a Poisson probability distribution (see Section 3.8). If $Y_1$ denotes the time until the first arrival, $Y_2$ denotes the time between the first and second arrival, ..., and $Y_n$ denotes the time between the $(n - 1)$st and $n$th arrival, then it can be shown that $Y_1, Y_2, \ldots, Y_n$ are independent random variables, with the density function for $Y_i$ given by

\[ f_{Y_i}(y_i) = \begin{cases} \dfrac{1}{\theta} \, e^{-y_i/\theta}, & y_i > 0, \\ 0, & \text{otherwise.} \end{cases} \]

[Because the $Y_i$, for $i = 1, 2, \ldots, n$, are exponentially distributed, it follows that $E(Y_i) = \theta$; that is, $\theta$ is the average time between arrivals.] Find the probability density function for the waiting time from the opening of the counter until the $n$th customer arrives. (If $Y_1, Y_2, \ldots$ denote successive interarrival times, we want the density function of $U = Y_1 + Y_2 + \cdots + Y_n$.)

Solution

To use Theorem 6.2, we must first know $m_{Y_i}(t)$, $i = 1, 2, \ldots, n$. Because each of the $Y_i$’s is exponentially distributed with mean $\theta$, $m_{Y_i}(t) = (1 - \theta t)^{-1}$ and, by Theorem 6.2,

\[ m_U(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \cdots \times m_{Y_n}(t) = (1 - \theta t)^{-1} \times (1 - \theta t)^{-1} \times \cdots \times (1 - \theta t)^{-1} = (1 - \theta t)^{-n}. \]

This is the moment-generating function of a gamma-distributed random variable with $\alpha = n$ and $\beta = \theta$. Theorem 6.1 implies that $U$ actually has this gamma distribution and therefore that

\[ f_U(u) = \begin{cases} \dfrac{1}{\Gamma(n) \theta^n} \left( u^{n-1} e^{-u/\theta} \right), & u > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
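The conclusion of Example 6.12 can be examined by simulation. The sketch below sums $n$ independent exponential interarrival times and compares the result with the gamma distribution having $\alpha = n$ and $\beta = \theta$. The values $n = 5$ and $\theta = 2$ are illustrative assumptions, and the closed-form distribution function used for integer $\alpha$ (a finite sum obtained by repeated integration by parts) is a standard fact that does not appear in the text.

```python
import math
import random

def erlang_cdf(u, n, theta):
    """Gamma(alpha = n, beta = theta) distribution function for a positive integer n."""
    if u <= 0:
        return 0.0
    x = u / theta
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

rng = random.Random(99)                      # arbitrary seed
n_arrivals, theta, reps = 5, 2.0, 50_000     # illustrative values, not from the text
# expovariate takes the rate 1/theta, so each interarrival time has mean theta.
waits = [
    sum(rng.expovariate(1.0 / theta) for _ in range(n_arrivals))
    for _ in range(reps)
]

mean_wait = sum(waits) / reps
empirical = sum(w <= 10.0 for w in waits) / reps
print(mean_wait, n_arrivals * theta)                     # gamma mean is alpha * beta
print(empirical, erlang_cdf(10.0, n_arrivals, theta))
```

Agreement of both the sample mean and the empirical probability with their gamma counterparts is what Theorems 6.1 and 6.2 lead us to expect.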


The method of moment-generating functions can be used to establish some interesting and useful results about the distributions of functions of normally distributed random variables. Because these results will be used throughout Chapters 7–9, we present them in the form of theorems.

THEOREM 6.3

Let $Y_1, Y_2, \ldots, Y_n$ be independent normally distributed random variables with $E(Y_i) = \mu_i$ and $V(Y_i) = \sigma_i^2$, for $i = 1, 2, \ldots, n$, and let $a_1, a_2, \ldots, a_n$ be constants. If

\[ U = \sum_{i=1}^{n} a_i Y_i = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n, \]

then $U$ is a normally distributed random variable with

\[ E(U) = \sum_{i=1}^{n} a_i \mu_i = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n \]

and

\[ V(U) = \sum_{i=1}^{n} a_i^2 \sigma_i^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2. \]

Proof

Because $Y_i$ is normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, $Y_i$ has moment-generating function given by

\[ m_{Y_i}(t) = \exp\left( \mu_i t + \frac{\sigma_i^2 t^2}{2} \right). \]

[Recall that $\exp(\cdot)$ is a more convenient way to write $e^{(\cdot)}$ when the term in the exponent is long or complex.] Therefore, $a_i Y_i$ has moment-generating function given by

\[ m_{a_i Y_i}(t) = E(e^{t a_i Y_i}) = m_{Y_i}(a_i t) = \exp\left( \mu_i a_i t + \frac{a_i^2 \sigma_i^2 t^2}{2} \right). \]

Because the random variables $Y_i$ are independent, the random variables $a_i Y_i$ are independent, for $i = 1, 2, \ldots, n$, and Theorem 6.2 implies that

\[ m_U(t) = m_{a_1 Y_1}(t) \times m_{a_2 Y_2}(t) \times \cdots \times m_{a_n Y_n}(t) = \exp\left( \mu_1 a_1 t + \frac{a_1^2 \sigma_1^2 t^2}{2} \right) \times \cdots \times \exp\left( \mu_n a_n t + \frac{a_n^2 \sigma_n^2 t^2}{2} \right) = \exp\left( t \sum_{i=1}^{n} a_i \mu_i + \frac{t^2}{2} \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right). \]

Thus, $U$ has a normal distribution with mean $\sum_{i=1}^{n} a_i \mu_i$ and variance $\sum_{i=1}^{n} a_i^2 \sigma_i^2$.
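The mean and variance formulas of Theorem 6.3 are easy to compute and to check by simulation. In the sketch below, the coefficient, mean, and standard deviation lists are hypothetical values chosen so that the arithmetic is simple, and the helper name `linear_combination_moments` is an assumption of this illustration.

```python
import random

def linear_combination_moments(a, mu, sigma):
    """Mean and variance of U = sum(a_i * Y_i) for independent normal Y_i (Theorem 6.3)."""
    mean = sum(ai * mi for ai, mi in zip(a, mu))
    var = sum(ai ** 2 * si ** 2 for ai, si in zip(a, sigma))
    return mean, var

# Hypothetical constants, means, and standard deviations.
a, mu, sigma = [2.0, -1.0, 0.5], [1.0, 3.0, -2.0], [1.0, 2.0, 4.0]
mean_u, var_u = linear_combination_moments(a, mu, sigma)   # -2.0 and 12.0

rng = random.Random(1)   # arbitrary seed
draws = [
    sum(ai * rng.gauss(mi, si) for ai, mi, si in zip(a, mu, sigma))
    for _ in range(100_000)
]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean_u, var_u, sample_mean, sample_var)
```

The simulation confirms only the first two moments; that $U$ is itself normally distributed is the stronger statement established by the moment-generating function argument in the proof.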

THEOREM 6.4

Let $Y_1, Y_2, \ldots, Y_n$ be defined as in Theorem 6.3 and define $Z_i$ by

\[ Z_i = \frac{Y_i - \mu_i}{\sigma_i}, \quad i = 1, 2, \ldots, n. \]

Then $\sum_{i=1}^{n} Z_i^2$ has a $\chi^2$ distribution with $n$ degrees of freedom.


Proof

Because $Y_i$ is normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, the result of Example 6.10 implies that $Z_i$ is normally distributed with mean 0 and variance 1. From Example 6.11, we then have that $Z_i^2$ is a $\chi^2$-distributed random variable with 1 degree of freedom. Thus,

\[ m_{Z_i^2}(t) = (1 - 2t)^{-1/2}, \]

and from Theorem 6.2, with $V = \sum_{i=1}^{n} Z_i^2$,

\[ m_V(t) = m_{Z_1^2}(t) \times m_{Z_2^2}(t) \times \cdots \times m_{Z_n^2}(t) = (1 - 2t)^{-1/2} \times (1 - 2t)^{-1/2} \times \cdots \times (1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}. \]

Because moment-generating functions are unique, $V$ has a $\chi^2$ distribution with $n$ degrees of freedom.

Theorem 6.4 provides some clarification of the degrees of freedom associated with a $\chi^2$ distribution. If $n$ independent, standard normal random variables are squared and added together, the resulting sum has a $\chi^2$ distribution with $n$ degrees of freedom.

Summary of the Moment-Generating Function Method

Let $U$ be a function of the random variables $Y_1, Y_2, \ldots, Y_n$.

1. Find the moment-generating function for $U$, $m_U(t)$.
2. Compare $m_U(t)$ with other well-known moment-generating functions. If $m_U(t) = m_V(t)$ for all values of $t$, Theorem 6.1 implies that $U$ and $V$ have identical distributions.
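Theorem 6.4 can be illustrated by simulation: standardize independent normal variables, square them, add, and compare the sample mean and variance of the sum with those of a $\chi^2$ distribution with $n$ degrees of freedom (mean $n$ and variance $2n$). The means and standard deviations below are hypothetical, and matching two moments is only a partial check, not a proof of the distributional claim.

```python
import random

rng = random.Random(314)                   # arbitrary seed
n, reps = 6, 50_000
mu = [1.0, -2.0, 0.0, 5.0, 3.0, 10.0]      # hypothetical means
sigma = [1.0, 0.5, 2.0, 3.0, 1.5, 4.0]     # hypothetical standard deviations

def sum_of_squared_z(rng):
    """Draw Y_i ~ N(mu_i, sigma_i^2), standardize each, and return the sum of Z_i^2."""
    return sum(((rng.gauss(m, s) - m) / s) ** 2 for m, s in zip(mu, sigma))

v = [sum_of_squared_z(rng) for _ in range(reps)]
mean_v = sum(v) / reps
var_v = sum((x - mean_v) ** 2 for x in v) / (reps - 1)
print(mean_v, var_v)   # compare with n and 2n
```

Changing the entries of `mu` and `sigma` should leave the results essentially unchanged, since the standardization removes their effect.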

Exercises

6.37

Let Y1 , Y2 , . . . , Yn be independent and identically distributed random variables such that for 0 < p < 1, P(Yi = 1) = p and P(Yi = 0) = q = 1 − p. (Such random variables are called Bernoulli random variables.) a Find the moment-generating function for the Bernoulli random variable Y1 . b Find the moment-generating function for W = Y1 + Y2 + · · · + Yn . c What is the distribution of W ?

6.38

Let Y1 and Y2 be independent random variables with moment-generating functions m Y1 (t) and m Y2 (t), respectively. If a1 and a2 are constants, and U = a1 Y1 + a2 Y2 show that the moment-generating function for U is m U (t) = m Y1 (a1 t) × m Y2 (a2 t).

6.39

In Exercises 6.11 and 6.25, we considered two electronic components that operate independently, each with a life length governed by the exponential distribution with mean 1. Use the method of moment-generating functions to obtain the density function for the average life length of the two components.

6.40

Suppose that $Y_1$ and $Y_2$ are independent, standard normal random variables. Find the density function of $U = Y_1^2 + Y_2^2$.

6.41

Let $Y_1, Y_2, \ldots, Y_n$ be independent, normal random variables, each with mean $\mu$ and variance $\sigma^2$. Let $a_1, a_2, \ldots, a_n$ denote known constants. Find the density function of the linear combination $U = \sum_{i=1}^{n} a_i Y_i$.

6.42

A type of elevator has a maximum weight capacity Y1 , which is normally distributed with mean 5000 pounds and standard deviation 300 pounds. For a certain building equipped with this type of elevator, the elevator’s load, Y2 , is a normally distributed random variable with mean 4000 pounds and standard deviation 400 pounds. For any given time that the elevator is in use, ﬁnd the probability that it will be overloaded, assuming that Y1 and Y2 are independent.

6.43

Refer to Exercise 6.41. Let $Y_1, Y_2, \ldots, Y_n$ be independent, normal random variables, each with mean $\mu$ and variance $\sigma^2$.

a Find the density function of $\bar{Y} = \dfrac{1}{n} \sum_{i=1}^{n} Y_i$.
b If $\sigma^2 = 16$ and $n = 25$, what is the probability that the sample mean, $\bar{Y}$, takes on a value that is within one unit of the population mean, $\mu$? That is, find $P(|\bar{Y} - \mu| \le 1)$.
c If $\sigma^2 = 16$, find $P(|\bar{Y} - \mu| \le 1)$ if $n = 36$, $n = 64$, and $n = 81$. Interpret the results of your calculations.

*6.44

The weight (in pounds) of “medium-size” watermelons is normally distributed with mean 15 and variance 4. A packing container for several melons has a nominal capacity of 140 pounds. What is the maximum number of melons that should be placed in a single packing container if the nominal weight limit is to be exceeded only 5% of the time? Give reasons for your answer.

6.45

The manager of a construction job needs to ﬁgure prices carefully before submitting a bid. He also needs to account for uncertainty (variability) in the amounts of products he might need. To oversimplify the real situation, suppose that a project manager treats the amount of sand, in yards, needed for a construction project as a random variable Y1 , which is normally distributed with mean 10 yards and standard deviation .5 yard. The amount of cement mix needed, in hundreds of pounds, is a random variable Y2 , which is normally distributed with mean 4 and standard deviation .2. The sand costs $7 per yard, and the cement mix costs $3 per hundred pounds. Adding $100 for other costs, he computes his total cost to be U = 100 + 7Y1 + 3Y2 . If Y1 and Y2 are independent, how much should the manager bid to ensure that the true costs will exceed the amount bid with a probability of only .01? Is the independence assumption reasonable here?

6.46

Suppose that Y has a gamma distribution with α = n/2 for some positive integer n and β equal to some speciﬁed value. Use the method of moment-generating functions to show that W = 2Y /β has a χ 2 distribution with n degrees of freedom.

6.47

A random variable Y has a gamma distribution with α = 3.5 and β = 4.2. Use the result in Exercise 6.46 and the percentage points for the χ 2 distributions given in Table 6, Appendix 3, to ﬁnd P(Y > 33.627).

6.48

In a missile-testing program, one random variable of interest is the distance between the point at which the missile lands and the center of the target at which the missile was aimed. If we think of the center of the target as the origin of a coordinate system, we can let $Y_1$ denote the north–south distance between the landing point and the target center and let $Y_2$ denote the corresponding east–west distance. (Assume that north and east define positive directions.) The distance between the landing point and the target center is then $U = \sqrt{Y_1^2 + Y_2^2}$. If $Y_1$ and $Y_2$ are independent, standard normal random variables, find the probability density function for $U$.

6.49

Let Y1 be a binomial random variable with n 1 trials and probability of success given by p. Let Y2 be another binomial random variable with n 2 trials and probability of success also given by p. If Y1 and Y2 are independent, ﬁnd the probability function of Y1 + Y2 .

6.50

Let Y be a binomial random variable with n trials and probability of success given by p. Show that n − Y is a binomial random variable with n trials and probability of success given by 1 − p.

6.51

Let Y1 be a binomial random variable with n 1 trials and p1 = .2 and Y2 be an independent binomial random variable with n 2 trials and p2 = .8. Find the probability function of Y1 + n 2 − Y2 .

6.52

Let Y1 and Y2 be independent Poisson random variables with means λ1 and λ2 , respectively. Find the a probability function of Y1 + Y2 . b conditional probability function of Y1 , given that Y1 + Y2 = m.

6.53

Let $Y_1, Y_2, \ldots, Y_n$ be independent binomial random variables, where $Y_i$ has $n_i$ trials and probability of success given by $p_i$, $i = 1, 2, \ldots, n$.
a If all of the $n_i$'s are equal and all of the $p$'s are equal, find the distribution of $\sum_{i=1}^{n} Y_i$.
b If all of the $n_i$'s are different and all of the $p$'s are equal, find the distribution of $\sum_{i=1}^{n} Y_i$.
c If all of the $n_i$'s are different and all of the $p$'s are equal, find the conditional distribution of $Y_1$ given $\sum_{i=1}^{n} Y_i = m$.
d If all of the $n_i$'s are different and all of the $p$'s are equal, find the conditional distribution of $Y_1 + Y_2$ given $\sum_{i=1}^{n} Y_i = m$.
e If all of the $p$'s are different, does the method of moment-generating functions work well to find the distribution of $\sum_{i=1}^{n} Y_i$? Why?

6.54

Let $Y_1, Y_2, \ldots, Y_n$ be independent Poisson random variables with means $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. Find the
a probability function of $\sum_{i=1}^{n} Y_i$.
b conditional probability function of $Y_1$, given that $\sum_{i=1}^{n} Y_i = m$.
c conditional probability function of $Y_1 + Y_2$, given that $\sum_{i=1}^{n} Y_i = m$.

6.55

Customers arrive at a department store checkout counter according to a Poisson distribution with a mean of 7 per hour. In a given two-hour period, what is the probability that 20 or more customers will arrive at the counter?

6.56

The length of time necessary to tune up a car is exponentially distributed with a mean of .5 hour. If two cars are waiting for a tune-up and the service times are independent, what is the probability that the total time for the two tune-ups will exceed 1.5 hours? [Hint: Recall the result of Example 6.12.]

6.57

Let Y1 , Y2 , . . . , Yn be independent random variables such that each Yi has a gamma distribution with parameters αi and β. That is, the distributions of the Y ’s might have different α’s, but all have the same value for β. Prove that U = Y1 + Y2 + · · · + Yn has a gamma distribution with parameters α1 + α2 + · · · + αn and β.

6.58

We saw in Exercise 5.159 that the negative binomial random variable $Y$ can be written as $Y = \sum_{i=1}^{r} W_i$, where $W_1, W_2, \ldots, W_r$ are independent geometric random variables with parameter $p$.
a Use this fact to derive the moment-generating function for $Y$.
b Use the moment-generating function to show that $E(Y) = r/p$ and $V(Y) = r(1 - p)/p^2$.
c Find the conditional probability function for $W_1$, given that $Y = W_1 + W_2 + \cdots + W_r = m$.

6.59

Show that if Y1 has a χ 2 distribution with ν1 degrees of freedom and Y2 has a χ 2 distribution with ν2 degrees of freedom, then U = Y1 + Y2 has a χ 2 distribution with ν1 + ν2 degrees of freedom, provided that Y1 and Y2 are independent.

6.60

Suppose that W = Y1 + Y2 where Y1 and Y2 are independent. If W has a χ 2 distribution with ν degrees of freedom and W1 has a χ 2 distribution with ν1 < ν degrees of freedom, show that Y2 has a χ 2 distribution with ν − ν1 degrees of freedom.

6.61

Refer to Exercise 6.52. Suppose that W = Y1 + Y2 where Y1 and Y2 are independent. If W has a Poisson distribution with mean λ and W1 has a Poisson distribution with mean λ1 < λ, show that Y2 has a Poisson distribution with mean λ − λ1 .

*6.62

Let Y1 and Y2 be independent normal random variables, each with mean 0 and variance σ 2 . Deﬁne U1 = Y1 + Y2 and U2 = Y1 − Y2 . Show that U1 and U2 are independent normal random variables, each with mean 0 and variance 2σ 2 . [Hint: If (U1 , U2 ) has a joint moment-generating function m(t1 , t2 ), then U1 and U2 are independent if and only if m(t1 , t2 ) = m U1 (t1 )m U2 (t2 ).]

6.6 Multivariable Transformations Using Jacobians (Optional)

If $Y$ is a random variable with density function $f_Y(y)$, the method of transformations (Section 6.4) can be used to find the density function for $U = h(Y)$, provided that $h(y)$ is either increasing or decreasing for all $y$ such that $f_Y(y) > 0$. If $h(y)$ is increasing or decreasing for all $y$ in the support of $f_Y(y)$, the function $h(\cdot)$ is one-to-one, and there is an inverse function $h^{-1}(\cdot)$ such that $y = h^{-1}(u)$. Further, the density function for $U$ is given by

$$f_U(u) = f_Y\left(h^{-1}(u)\right)\left|\frac{d h^{-1}(u)}{du}\right|.$$

Suppose that $Y_1$ and $Y_2$ are jointly continuous random variables and that $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$. How can we find the joint density function of $U_1$ and $U_2$?

For the rest of this section, we will write the joint density of $Y_1$ and $Y_2$ as $f_{Y_1,Y_2}(y_1, y_2)$. Extending the ideas of Section 6.4, the support of the joint density $f_{Y_1,Y_2}(y_1, y_2)$ is the set of all values of $(y_1, y_2)$ such that $f_{Y_1,Y_2}(y_1, y_2) > 0$.

The Bivariate Transformation Method

Suppose that $Y_1$ and $Y_2$ are continuous random variables with joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and that for all $(y_1, y_2)$ such that $f_{Y_1,Y_2}(y_1, y_2) > 0$,

$$u_1 = h_1(y_1, y_2) \quad \text{and} \quad u_2 = h_2(y_1, y_2)$$

is a one-to-one transformation from $(y_1, y_2)$ to $(u_1, u_2)$ with inverse

$$y_1 = h_1^{-1}(u_1, u_2) \quad \text{and} \quad y_2 = h_2^{-1}(u_1, u_2).$$

If $h_1^{-1}(u_1, u_2)$ and $h_2^{-1}(u_1, u_2)$ have continuous partial derivatives with respect to $u_1$ and $u_2$ and Jacobian

$$J = \det\begin{bmatrix} \dfrac{\partial h_1^{-1}}{\partial u_1} & \dfrac{\partial h_1^{-1}}{\partial u_2} \\[2mm] \dfrac{\partial h_2^{-1}}{\partial u_1} & \dfrac{\partial h_2^{-1}}{\partial u_2} \end{bmatrix} = \frac{\partial h_1^{-1}}{\partial u_1}\frac{\partial h_2^{-1}}{\partial u_2} - \frac{\partial h_2^{-1}}{\partial u_1}\frac{\partial h_1^{-1}}{\partial u_2} \neq 0,$$

then the joint density of $U_1$ and $U_2$ is

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}\left(h_1^{-1}(u_1, u_2),\, h_2^{-1}(u_1, u_2)\right)|J|,$$

where $|J|$ is the absolute value of the Jacobian.

We will not prove this result, but it follows from calculus results used for change of variables in multiple integration. (Recall that sometimes double integrals are more easily calculated if we use polar coordinates instead of Euclidean coordinates; see Exercise 4.194.) The absolute value of the Jacobian, $|J|$, in the multivariate transformation is analogous to the quantity $|dh^{-1}(u)/du|$ that is used when making the one-variable transformation $U = h(Y)$.

A word of caution is in order. Be sure that the bivariate transformation $u_1 = h_1(y_1, y_2)$, $u_2 = h_2(y_1, y_2)$ is a one-to-one transformation for all $(y_1, y_2)$ such that $f_{Y_1,Y_2}(y_1, y_2) > 0$. This step is easily overlooked. If the bivariate transformation is not one-to-one and this method is blindly applied, the resulting "density" function will not have the necessary properties of a valid density function. We illustrate the use of this method in the following examples.

EXAMPLE 6.13

Let $Y_1$ and $Y_2$ be independent standard normal random variables. If $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$, both $U_1$ and $U_2$ are linear combinations of independent normally distributed random variables, and Theorem 6.3 implies that $U_1$ is normally distributed with mean $0 + 0 = 0$ and variance $1 + 1 = 2$. Similarly, $U_2$ has a normal distribution with mean 0 and variance 2. What is the joint density of $U_1$ and $U_2$?

Solution

The density functions for $Y_1$ and $Y_2$ are

$$f_1(y_1) = \frac{e^{-(1/2)y_1^2}}{\sqrt{2\pi}}, \quad -\infty < y_1 < \infty,$$

$$f_2(y_2) = \frac{e^{-(1/2)y_2^2}}{\sqrt{2\pi}}, \quad -\infty < y_2 < \infty,$$

and the independence of $Y_1$ and $Y_2$ implies that their joint density is

$$f_{Y_1,Y_2}(y_1, y_2) = \frac{e^{-(1/2)y_1^2 - (1/2)y_2^2}}{2\pi}, \quad -\infty < y_1 < \infty,\ -\infty < y_2 < \infty.$$

In this case $f_{Y_1,Y_2}(y_1, y_2) > 0$ for all $-\infty < y_1 < \infty$ and $-\infty < y_2 < \infty$, and we are interested in the transformation

$$u_1 = y_1 + y_2 = h_1(y_1, y_2) \quad \text{and} \quad u_2 = y_1 - y_2 = h_2(y_1, y_2)$$

with inverse transformation

$$y_1 = (u_1 + u_2)/2 = h_1^{-1}(u_1, u_2) \quad \text{and} \quad y_2 = (u_1 - u_2)/2 = h_2^{-1}(u_1, u_2).$$

Because $\partial h_1^{-1}/\partial u_1 = 1/2$, $\partial h_1^{-1}/\partial u_2 = 1/2$, $\partial h_2^{-1}/\partial u_1 = 1/2$, and $\partial h_2^{-1}/\partial u_2 = -1/2$, the Jacobian of this transformation is

$$J = \det\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} = (1/2)(-1/2) - (1/2)(1/2) = -1/2$$

and the joint density of $U_1$ and $U_2$ is [with $\exp(\cdot) = e^{(\cdot)}$]

$$f_{U_1,U_2}(u_1, u_2) = \frac{\exp\left[-\frac{1}{2}\left(\frac{u_1 + u_2}{2}\right)^2 - \frac{1}{2}\left(\frac{u_1 - u_2}{2}\right)^2\right]}{2\pi}\left|-\frac{1}{2}\right|, \quad -\infty < (u_1 + u_2)/2 < \infty,\ -\infty < (u_1 - u_2)/2 < \infty.$$

A little algebra yields

$$-\frac{1}{2}\left(\frac{u_1 + u_2}{2}\right)^2 - \frac{1}{2}\left(\frac{u_1 - u_2}{2}\right)^2 = -\frac{1}{4}u_1^2 - \frac{1}{4}u_2^2$$

and

$$\{(u_1, u_2) : -\infty < (u_1 + u_2)/2 < \infty,\ -\infty < (u_1 - u_2)/2 < \infty\} = \{(u_1, u_2) : -\infty < u_1 < \infty,\ -\infty < u_2 < \infty\}.$$

Finally, because $4\pi = \sqrt{2}\sqrt{2\pi}\,\sqrt{2}\sqrt{2\pi}$,

$$f_{U_1,U_2}(u_1, u_2) = \frac{e^{-u_1^2/4}}{\sqrt{2}\sqrt{2\pi}} \cdot \frac{e^{-u_2^2/4}}{\sqrt{2}\sqrt{2\pi}}, \quad -\infty < u_1 < \infty,\ -\infty < u_2 < \infty.$$

Notice that $U_1$ and $U_2$ are independent and normally distributed, both with mean 0 and variance 2. The extra information provided by the joint distribution of $U_1$ and $U_2$ is that the two variables are independent!
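The result of Example 6.13 can be checked numerically. The following is a minimal sketch, not part of the text's derivation: it approximates the Jacobian of the inverse transformation by central differences and compares the transformed joint density with the product of two normal densities with mean 0 and variance 2. All function names are illustrative assumptions.

```python
import math

def joint_density_y(y1, y2):
    """Joint density of two independent standard normal variables."""
    return math.exp(-0.5 * (y1 ** 2 + y2 ** 2)) / (2.0 * math.pi)

def inverse_transform(u1, u2):
    """Inverse of u1 = y1 + y2, u2 = y1 - y2."""
    return (u1 + u2) / 2.0, (u1 - u2) / 2.0

def jacobian(u1, u2, eps=1e-6):
    """Determinant of the 2x2 matrix of partial derivatives of the
    inverse transformation, approximated by central differences."""
    y1_a, y2_a = inverse_transform(u1 + eps, u2)
    y1_b, y2_b = inverse_transform(u1 - eps, u2)
    y1_c, y2_c = inverse_transform(u1, u2 + eps)
    y1_d, y2_d = inverse_transform(u1, u2 - eps)
    dy1_du1 = (y1_a - y1_b) / (2 * eps)
    dy2_du1 = (y2_a - y2_b) / (2 * eps)
    dy1_du2 = (y1_c - y1_d) / (2 * eps)
    dy2_du2 = (y2_c - y2_d) / (2 * eps)
    return dy1_du1 * dy2_du2 - dy2_du1 * dy1_du2

def joint_density_u(u1, u2):
    """Bivariate transformation formula: f_Y at the inverse point times |J|."""
    y1, y2 = inverse_transform(u1, u2)
    return joint_density_y(y1, y2) * abs(jacobian(u1, u2))

def normal_density(x, variance):
    """Density of a normal variable with mean 0 and the given variance."""
    return math.exp(-x ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

u1, u2 = 0.7, -1.3  # an arbitrary test point
lhs = joint_density_u(u1, u2)
rhs = normal_density(u1, 2.0) * normal_density(u2, 2.0)
print(jacobian(u1, u2), lhs, rhs)
```

The two printed density values should agree to within floating-point error, which is the factorization that establishes independence in the example.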

The multivariable transformation method is also useful if we are interested in a single function of Y1 and Y2 —say, U1 = h(Y1 , Y2 ). Because we have only one function of Y1 and Y2 , we can use the method of bivariate transformations to ﬁnd the joint distribution of U1 and another function U2 = h 2 (Y1 , Y2 ) and then ﬁnd the desired marginal density of U1 by integrating the joint density. Because we are really interested in only the distribution of U1 , we would typically choose the other function U2 = h 2 (Y1 , Y2 ) so that the bivariate transformation is easy to invert and the Jacobian is easy to work with. We illustrate this technique in the following example.

EXAMPLE 6.14

Let $Y_1$ and $Y_2$ be independent exponential random variables, both with mean $\beta > 0$. Find the density function of

$$U = \frac{Y_1}{Y_1 + Y_2}.$$

Solution

The density functions for $Y_1$ and $Y_2$ are, again using $\exp(\cdot) = e^{(\cdot)}$,

$$f_1(y_1) = \begin{cases} \dfrac{1}{\beta}\exp(-y_1/\beta), & 0 < y_1, \\ 0, & \text{otherwise}, \end{cases}$$

and

$$f_2(y_2) = \begin{cases} \dfrac{1}{\beta}\exp(-y_2/\beta), & 0 < y_2, \\ 0, & \text{otherwise}. \end{cases}$$

Their joint density is

$$f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} \dfrac{1}{\beta^2}\exp[-(y_1 + y_2)/\beta], & 0 < y_1,\ 0 < y_2, \\ 0, & \text{otherwise}, \end{cases}$$

because $Y_1$ and $Y_2$ are independent.

In this case, $f_{Y_1,Y_2}(y_1, y_2) > 0$ for all $(y_1, y_2)$ such that $0 < y_1$, $0 < y_2$, and we are interested in the function $U_1 = Y_1/(Y_1 + Y_2)$. If we consider the function $u_1 = y_1/(y_1 + y_2)$, there are obviously many values for $(y_1, y_2)$ that will give the same value for $u_1$. Let us define

$$u_1 = \frac{y_1}{y_1 + y_2} = h_1(y_1, y_2) \quad \text{and} \quad u_2 = y_1 + y_2 = h_2(y_1, y_2).$$

This choice of $u_2$ yields a convenient inverse transformation:

$$y_1 = u_1 u_2 = h_1^{-1}(u_1, u_2) \quad \text{and} \quad y_2 = u_2(1 - u_1) = h_2^{-1}(u_1, u_2).$$

The Jacobian of this transformation is

$$J = \det\begin{bmatrix} u_2 & u_1 \\ -u_2 & 1 - u_1 \end{bmatrix} = u_2(1 - u_1) - (-u_2)(u_1) = u_2,$$

and the joint density of $U_1$ and $U_2$ is

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2}\exp\{-[u_1 u_2 + u_2(1 - u_1)]/\beta\}\,|u_2|, & 0 < u_1 u_2,\ 0 < u_2(1 - u_1), \\ 0, & \text{otherwise}. \end{cases}$$

In this case, $f_{U_1,U_2}(u_1, u_2) > 0$ if $u_1$ and $u_2$ are such that $0 < u_1 u_2$, $0 < u_2(1 - u_1)$. Notice that if $0 < u_1 u_2$, then

$$0 < u_2(1 - u_1) = u_2 - u_1 u_2 \iff 0 < u_1 u_2 < u_2 \iff 0 < u_1 < 1.$$

If $0 < u_1 < 1$, then $0 < u_2(1 - u_1)$ implies that $0 < u_2$. Therefore, the region of support for the joint density of $U_1$ and $U_2$ is $\{(u_1, u_2) : 0 < u_1 < 1,\ 0 < u_2\}$, and the joint density of $U_1$ and $U_2$ is given by

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2} u_2 e^{-u_2/\beta}, & 0 < u_1 < 1,\ 0 < u_2, \\ 0, & \text{otherwise}. \end{cases}$$

Using Theorem 5.5 it is easily seen that $U_1$ and $U_2$ are independent. The marginal densities of $U_1$ and $U_2$ can be obtained by integrating the joint density derived earlier.

In Exercise 6.63 you will show that U1 is uniformly distributed over (0, 1) and that U2 has a gamma density with parameters α = 2 and β.
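A small simulation can make the conclusions of Example 6.14 plausible before they are proved. The sketch below is an illustration only: the value $\beta = 3$, the seed, the sample size, and the function names are all assumptions chosen for the demonstration. It draws independent exponential pairs, forms $U_1$ and $U_2$, and reports sample moments that can be compared with those of a uniform (0, 1) variable (mean 1/2, variance 1/12) and a gamma variable with $\alpha = 2$ (mean $2\beta$).

```python
import random
import statistics

def simulate(beta, n_draws, seed=12345):
    """Draw (U1, U2) pairs from independent exponentials with mean beta."""
    rng = random.Random(seed)
    u1_values, u2_values = [], []
    for _ in range(n_draws):
        # random.expovariate takes the rate, which is 1 / mean
        y1 = rng.expovariate(1.0 / beta)
        y2 = rng.expovariate(1.0 / beta)
        u1_values.append(y1 / (y1 + y2))
        u2_values.append(y1 + y2)
    return u1_values, u2_values

def sample_correlation(xs, ys):
    """Pearson correlation computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

beta = 3.0  # assumed value, for illustration only
u1_values, u2_values = simulate(beta, 100_000)
mean_u1 = statistics.fmean(u1_values)
var_u1 = statistics.pvariance(u1_values)
mean_u2 = statistics.fmean(u2_values)
corr = sample_correlation(u1_values, u2_values)
print(mean_u1, var_u1, mean_u2, corr)
```

A sample correlation near zero is consistent with independence but does not prove it; the factorization of the joint density is what establishes independence.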

The technique described in this section can be viewed as a one-step version of the two-step process illustrated in Example 6.9. In Example 6.14, it was more difficult to find the region of support (where the joint density is positive) than it was to find the equation of the joint density function. As you will see in the next example and the exercises, this is often the case.

EXAMPLE 6.15

In Example 6.9, we considered random variables $Y_1$ and $Y_2$ with joint density function

$$f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} 2(1 - y_1), & 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}, \end{cases}$$

and were interested in $U = Y_1 Y_2$. Find the probability density function for $U$ by using the bivariate transformation method.

Solution

In this case $f_{Y_1,Y_2}(y_1, y_2) > 0$ for all $(y_1, y_2)$ such that $0 \le y_1 \le 1$, $0 \le y_2 \le 1$, and we are interested in the function $U_2 = Y_1 Y_2$. If we consider the function $u_2 = y_1 y_2$, this function alone is not a one-to-one function of the variables $(y_1, y_2)$. Consider

$$u_1 = y_1 = h_1(y_1, y_2) \quad \text{and} \quad u_2 = y_1 y_2 = h_2(y_1, y_2).$$

For this choice of $u_1$, and $0 \le y_1 \le 1$, $0 \le y_2 \le 1$, the transformation from $(y_1, y_2)$ to $(u_1, u_2)$ is one-to-one and

$$y_1 = u_1 = h_1^{-1}(u_1, u_2) \quad \text{and} \quad y_2 = u_2/u_1 = h_2^{-1}(u_1, u_2).$$

The Jacobian is

$$J = \det\begin{bmatrix} 1 & 0 \\ -u_2/u_1^2 & 1/u_1 \end{bmatrix} = 1(1/u_1) - (-u_2/u_1^2)(0) = 1/u_1.$$

The original variable of interest is $U_2 = Y_1 Y_2$, and the joint density of $U_1$ and $U_2$ is

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} 2(1 - u_1)\left|\dfrac{1}{u_1}\right|, & 0 \le u_1 \le 1,\ 0 \le u_2/u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

Because

$$\{(u_1, u_2) : 0 \le u_1 \le 1,\ 0 \le u_2/u_1 \le 1\} = \{(u_1, u_2) : 0 \le u_2 \le u_1 \le 1\},$$

the joint density of $U_1$ and $U_2$ is

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} 2(1 - u_1)\dfrac{1}{u_1}, & 0 \le u_2 \le u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

This joint density is exactly the same as the joint density obtained in Example 6.9 if we identify the variables $Y_1$ and $U$ used in Example 6.9 with the variables $U_1$ and $U_2$, respectively, used here. With this identification, the marginal density of $U_2$ is precisely the density of $U$ obtained in Example 6.9; that is,

$$f_2(u_2) = \begin{cases} 2(u_2 - \ln u_2 - 1), & 0 \le u_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
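The marginal density quoted at the end of Example 6.15 comes from integrating the joint density over $u_1$ from $u_2$ to 1. As a rough numerical check (a sketch with illustrative function names, using a simple midpoint rule rather than any method from the text), the integral can be approximated and compared with the closed form $2(u_2 - \ln u_2 - 1)$.

```python
import math

def joint_density(u1, u2):
    """Joint density of (U1, U2) from Example 6.15 on 0 <= u2 <= u1 <= 1."""
    if 0.0 <= u2 <= u1 <= 1.0 and u1 > 0.0:
        return 2.0 * (1.0 - u1) / u1
    return 0.0

def marginal_u2_numeric(u2, steps=20_000):
    """Midpoint-rule approximation of the integral over u1 from u2 to 1."""
    width = (1.0 - u2) / steps
    total = 0.0
    for i in range(steps):
        u1 = u2 + (i + 0.5) * width
        total += joint_density(u1, u2)
    return total * width

def marginal_u2_closed_form(u2):
    """Closed-form marginal density stated in the example, for 0 < u2 <= 1."""
    return 2.0 * (u2 - math.log(u2) - 1.0)

for point in (0.25, 0.5, 0.9):
    print(point, marginal_u2_numeric(point), marginal_u2_closed_form(point))
```

The two columns should agree closely at each test point inside (0, 1).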

If $Y_1, Y_2, \ldots, Y_k$ are jointly continuous random variables and

$$U_1 = h_1(Y_1, Y_2, \ldots, Y_k),\ U_2 = h_2(Y_1, Y_2, \ldots, Y_k),\ \ldots,\ U_k = h_k(Y_1, Y_2, \ldots, Y_k),$$

where the transformation

$$u_1 = h_1(y_1, y_2, \ldots, y_k),\ u_2 = h_2(y_1, y_2, \ldots, y_k),\ \ldots,\ u_k = h_k(y_1, y_2, \ldots, y_k)$$

is a one-to-one transformation from $(y_1, y_2, \ldots, y_k)$ to $(u_1, u_2, \ldots, u_k)$ with inverse

$$y_1 = h_1^{-1}(u_1, u_2, \ldots, u_k),\ y_2 = h_2^{-1}(u_1, u_2, \ldots, u_k),\ \ldots,\ y_k = h_k^{-1}(u_1, u_2, \ldots, u_k),$$

and $h_1^{-1}(u_1, u_2, \ldots, u_k)$, $h_2^{-1}(u_1, u_2, \ldots, u_k)$, ..., $h_k^{-1}(u_1, u_2, \ldots, u_k)$ have continuous partial derivatives with respect to $u_1, u_2, \ldots, u_k$ and Jacobian

$$J = \det\begin{bmatrix} \dfrac{\partial h_1^{-1}}{\partial u_1} & \dfrac{\partial h_1^{-1}}{\partial u_2} & \cdots & \dfrac{\partial h_1^{-1}}{\partial u_k} \\[2mm] \dfrac{\partial h_2^{-1}}{\partial u_1} & \dfrac{\partial h_2^{-1}}{\partial u_2} & \cdots & \dfrac{\partial h_2^{-1}}{\partial u_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial h_k^{-1}}{\partial u_1} & \dfrac{\partial h_k^{-1}}{\partial u_2} & \cdots & \dfrac{\partial h_k^{-1}}{\partial u_k} \end{bmatrix} \neq 0,$$

then a result analogous to the one presented in this section can be used to ﬁnd the joint density of U1 , U2 , . . . , Uk . This requires the user to ﬁnd the determinant of a k × k matrix, a skill that is not required in the rest of this text. For more details, see “References and Further Readings” at the end of the chapter.
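For readers who want to experiment with the $k$-variable case, the determinant can be approximated numerically. The sketch below is an assumption-laden illustration, not a method from the text: it builds the matrix of partial derivatives of a hypothetical inverse transformation by central differences and computes its determinant by Gaussian elimination with partial pivoting. The example transformation ($u_1 = y_1$, $u_2 = y_1 + y_2$, $u_3 = y_1 + y_2 + y_3$) is chosen only because its Jacobian is easy to verify by hand.

```python
def numerical_jacobian(inverse, u, eps=1e-6):
    """Matrix of partial derivatives d(y_i)/d(u_j) by central differences."""
    k = len(u)
    matrix = [[0.0] * k for _ in range(k)]
    for j in range(k):
        upper = list(u)
        lower = list(u)
        upper[j] += eps
        lower[j] -= eps
        y_upper = inverse(upper)
        y_lower = inverse(lower)
        for i in range(k):
            matrix[i][j] = (y_upper[i] - y_lower[i]) / (2.0 * eps)
    return matrix

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-14:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= factor * a[c][cc]
    return det

def inverse_partial_sums(u):
    """Inverse of u1 = y1, u2 = y1 + y2, u3 = y1 + y2 + y3 (hypothetical example)."""
    u1, u2, u3 = u
    return [u1, u2 - u1, u3 - u2]

jac_det = determinant(numerical_jacobian(inverse_partial_sums, [0.3, 1.1, 2.4]))
print(jac_det)
```

Here the matrix of partial derivatives is lower triangular with ones on the diagonal, so the determinant worked out by hand is 1.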

Exercises

*6.63

In Example 6.14, $Y_1$ and $Y_2$ were independent exponentially distributed random variables, both with mean $\beta$. We defined $U_1 = Y_1/(Y_1 + Y_2)$ and $U_2 = Y_1 + Y_2$ and determined the joint density of $(U_1, U_2)$ to be

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2} u_2 e^{-u_2/\beta}, & 0 < u_1 < 1,\ 0 < u_2, \\ 0, & \text{otherwise}. \end{cases}$$

a Show that $U_1$ is uniformly distributed over the interval (0, 1).
b Show that $U_2$ has a gamma density with parameters $\alpha = 2$ and $\beta$.
c Establish that $U_1$ and $U_2$ are independent.

*6.64

Refer to Exercise 6.63 and Example 6.14. Suppose that $Y_1$ has a gamma distribution with parameters $\alpha_1$ and $\beta$, that $Y_2$ is gamma distributed with parameters $\alpha_2$ and $\beta$, and that $Y_1$ and $Y_2$ are independent. Let $U_1 = Y_1/(Y_1 + Y_2)$ and $U_2 = Y_1 + Y_2$.
a Derive the joint density function for $U_1$ and $U_2$.
b Show that the marginal distribution of $U_1$ is a beta distribution with parameters $\alpha_1$ and $\alpha_2$.
c Show that the marginal distribution of $U_2$ is a gamma distribution with parameters $\alpha = \alpha_1 + \alpha_2$ and $\beta$.
d Establish that $U_1$ and $U_2$ are independent.

6.65

Let $Z_1$ and $Z_2$ be independent standard normal random variables and $U_1 = Z_1$ and $U_2 = Z_1 + Z_2$.
a Derive the joint density of $U_1$ and $U_2$.
b Use Theorem 5.12 to give $E(U_1)$, $E(U_2)$, $V(U_1)$, $V(U_2)$, and $\mathrm{Cov}(U_1, U_2)$.
c Are $U_1$ and $U_2$ independent? Why?
d Refer to Section 5.10. Show that $U_1$ and $U_2$ have a bivariate normal distribution. Identify all the parameters of the appropriate bivariate normal distribution.

*6.66

Let $(Y_1, Y_2)$ have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let $U_1 = Y_1 + Y_2$ and $U_2 = Y_2$.
a Show that the joint density of $(U_1, U_2)$ is

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}(u_1 - u_2, u_2).$$

b Show that the marginal density function for $U_1$ is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(u_1 - u_2, u_2)\, du_2.$$

c If $Y_1$ and $Y_2$ are independent, show that the marginal density function for $U_1$ is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}(u_1 - u_2) f_{Y_2}(u_2)\, du_2.$$

That is, the density of $Y_1 + Y_2$ is the convolution of the densities $f_{Y_1}(\cdot)$ and $f_{Y_2}(\cdot)$.

*6.67

Let $(Y_1, Y_2)$ have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let $U_1 = Y_1/Y_2$ and $U_2 = Y_2$.
a Show that the joint density of $(U_1, U_2)$ is

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}(u_1 u_2, u_2)\,|u_2|.$$

b Show that the marginal density function for $U_1$ is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(u_1 u_2, u_2)\,|u_2|\, du_2.$$

c If $Y_1$ and $Y_2$ are independent, show that the marginal density function for $U_1$ is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}(u_1 u_2) f_{Y_2}(u_2)\,|u_2|\, du_2.$$

*6.68

Let $Y_1$ and $Y_2$ have joint density function

$$f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} 8 y_1 y_2, & 0 \le y_1 < y_2 \le 1, \\ 0, & \text{otherwise}, \end{cases}$$

and $U_1 = Y_1/Y_2$ and $U_2 = Y_2$.
a Derive the joint density function for $(U_1, U_2)$.
b Show that $U_1$ and $U_2$ are independent.

*6.69

The random variables $Y_1$ and $Y_2$ are independent, both with density

$$f(y) = \begin{cases} \dfrac{1}{y^2}, & 1 < y, \\ 0, & \text{otherwise}. \end{cases}$$

Let $U_1 = \dfrac{Y_1}{Y_1 + Y_2}$ and $U_2 = Y_1 + Y_2$.
a What is the joint density of $Y_1$ and $Y_2$?
b Show that the joint density of $U_1$ and $U_2$ is given by

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{u_1^2 (1 - u_1)^2 u_2^3}, & 1/u_1 < u_2,\ 0 < u_1 < 1/2 \ \text{ and } \ 1/(1 - u_1) < u_2,\ 1/2 \le u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

c Sketch the region where $f_{U_1,U_2}(u_1, u_2) > 0$.
d Show that the marginal density of $U_1$ is

$$f_{U_1}(u_1) = \begin{cases} \dfrac{1}{2(1 - u_1)^2}, & 0 \le u_1 < 1/2, \\[2mm] \dfrac{1}{2 u_1^2}, & 1/2 \le u_1 \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

e Are $U_1$ and $U_2$ independent? Why or why not?

*6.70

Suppose that $Y_1$ and $Y_2$ are independent and that both are uniformly distributed on the interval (0, 1), and let $U_1 = Y_1 + Y_2$ and $U_2 = Y_1 - Y_2$.
a Show that the joint density of $U_1$ and $U_2$ is given by

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} 1/2, & -u_1 < u_2 < u_1,\ 0 < u_1 < 1 \ \text{ and } \ u_1 - 2 < u_2 < 2 - u_1,\ 1 \le u_1 < 2, \\ 0, & \text{otherwise}. \end{cases}$$

b Sketch the region where $f_{U_1,U_2}(u_1, u_2) > 0$.
c Show that the marginal density of $U_1$ is

$$f_{U_1}(u_1) = \begin{cases} u_1, & 0 < u_1 < 1, \\ 2 - u_1, & 1 \le u_1 < 2, \\ 0, & \text{otherwise}. \end{cases}$$

d Show that the marginal density of $U_2$ is

$$f_{U_2}(u_2) = \begin{cases} 1 + u_2, & -1 < u_2 < 0, \\ 1 - u_2, & 0 \le u_2 < 1, \\ 0, & \text{otherwise}. \end{cases}$$

e Are $U_1$ and $U_2$ independent? Why or why not?

*6.71

Suppose that $Y_1$ and $Y_2$ are independent exponentially distributed random variables, both with mean $\beta$, and define $U_1 = Y_1 + Y_2$ and $U_2 = Y_1/Y_2$.

a Show that the joint density of $(U_1, U_2)$ is

$$f_{U_1,U_2}(u_1, u_2) = \begin{cases} \dfrac{1}{\beta^2} u_1 e^{-u_1/\beta} \dfrac{1}{(1 + u_2)^2}, & 0 < u_1,\ 0 < u_2, \\ 0, & \text{otherwise}. \end{cases}$$

b Are $U_1$ and $U_2$ independent? Why?

6.7 Order Statistics

Many functions of random variables of interest in practice depend on the relative magnitudes of the observed variables. For instance, we may be interested in the fastest time in an automobile race or the heaviest mouse among those fed on a certain diet. Thus, we often order observed random variables according to their magnitudes. The resulting ordered variables are called order statistics.

Formally, let $Y_1, Y_2, \ldots, Y_n$ denote independent continuous random variables with distribution function $F(y)$ and density function $f(y)$. We denote the ordered random variables $Y_i$ by $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$, where $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$. (Because the random variables are continuous, the equality signs can be ignored.) Using this notation,

$$Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$$

is the minimum of the random variables $Y_i$, and

$$Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$$

is the maximum of the random variables $Y_i$.

The probability density functions for $Y_{(1)}$ and $Y_{(n)}$ can be found using the method of distribution functions. We will derive the density function of $Y_{(n)}$ first. Because $Y_{(n)}$ is the maximum of $Y_1, Y_2, \ldots, Y_n$, the event $(Y_{(n)} \le y)$ will occur if and only if the events $(Y_i \le y)$ occur for every $i = 1, 2, \ldots, n$. That is,

$$P(Y_{(n)} \le y) = P(Y_1 \le y, Y_2 \le y, \ldots, Y_n \le y).$$

Because the $Y_i$ are independent and $P(Y_i \le y) = F(y)$ for $i = 1, 2, \ldots, n$, it follows that the distribution function of $Y_{(n)}$ is given by

$$F_{Y_{(n)}}(y) = P(Y_{(n)} \le y) = P(Y_1 \le y)P(Y_2 \le y) \cdots P(Y_n \le y) = [F(y)]^n.$$

Letting $g_{(n)}(y)$ denote the density function of $Y_{(n)}$, we see that, on taking derivatives of both sides,

$$g_{(n)}(y) = n[F(y)]^{n-1} f(y).$$

The density function for $Y_{(1)}$ can be found in a similar manner. The distribution function of $Y_{(1)}$ is

$$F_{Y_{(1)}}(y) = P(Y_{(1)} \le y) = 1 - P(Y_{(1)} > y).$$

Because $Y_{(1)}$ is the minimum of $Y_1, Y_2, \ldots, Y_n$, it follows that the event $(Y_{(1)} > y)$ occurs if and only if the events $(Y_i > y)$ occur for $i = 1, 2, \ldots, n$. Because the $Y_i$ are independent and $P(Y_i > y) = 1 - F(y)$ for $i = 1, 2, \ldots, n$, we see that

$$F_{Y_{(1)}}(y) = P(Y_{(1)} \le y) = 1 - P(Y_{(1)} > y) = 1 - P(Y_1 > y, Y_2 > y, \ldots, Y_n > y) = 1 - [P(Y_1 > y)P(Y_2 > y) \cdots P(Y_n > y)] = 1 - [1 - F(y)]^n.$$

Thus, if $g_{(1)}(y)$ denotes the density function of $Y_{(1)}$, differentiation of both sides of the last expression yields

$$g_{(1)}(y) = n[1 - F(y)]^{n-1} f(y).$$

Let us now consider the case $n = 2$ and find the joint density function for $Y_{(1)}$ and $Y_{(2)}$. The event $(Y_{(1)} \le y_1, Y_{(2)} \le y_2)$ means that either $(Y_1 \le y_1, Y_2 \le y_2)$ or $(Y_2 \le y_1, Y_1 \le y_2)$. [Notice that $Y_{(1)}$ could be either $Y_1$ or $Y_2$, whichever is smaller.] Therefore, for $y_1 \le y_2$, $P(Y_{(1)} \le y_1, Y_{(2)} \le y_2)$ is equal to the probability of the union of the two events $(Y_1 \le y_1, Y_2 \le y_2)$ and $(Y_2 \le y_1, Y_1 \le y_2)$. That is,

$$P(Y_{(1)} \le y_1, Y_{(2)} \le y_2) = P[(Y_1 \le y_1, Y_2 \le y_2) \cup (Y_2 \le y_1, Y_1 \le y_2)].$$

Using the additive law of probability and recalling that $y_1 \le y_2$, we see that

$$P(Y_{(1)} \le y_1, Y_{(2)} \le y_2) = P(Y_1 \le y_1, Y_2 \le y_2) + P(Y_2 \le y_1, Y_1 \le y_2) - P(Y_1 \le y_1, Y_2 \le y_1).$$

Because $Y_1$ and $Y_2$ are independent and $P(Y_i \le w) = F(w)$, for $i = 1, 2$, it follows that, for $y_1 \le y_2$,

$$P(Y_{(1)} \le y_1, Y_{(2)} \le y_2) = F(y_1)F(y_2) + F(y_2)F(y_1) - F(y_1)F(y_1) = 2F(y_1)F(y_2) - [F(y_1)]^2.$$

If $y_1 > y_2$ (recall that $Y_{(1)} \le Y_{(2)}$),

$$P(Y_{(1)} \le y_1, Y_{(2)} \le y_2) = P(Y_{(1)} \le y_2, Y_{(2)} \le y_2) = P(Y_1 \le y_2, Y_2 \le y_2) = [F(y_2)]^2.$$

Summarizing, the joint distribution function of $Y_{(1)}$ and $Y_{(2)}$ is

$$F_{Y_{(1)}Y_{(2)}}(y_1, y_2) = \begin{cases} 2F(y_1)F(y_2) - [F(y_1)]^2, & y_1 \le y_2, \\ [F(y_2)]^2, & y_1 > y_2. \end{cases}$$

Letting $g_{(1)(2)}(y_1, y_2)$ denote the joint density of $Y_{(1)}$ and $Y_{(2)}$, we see that, on differentiating first with respect to $y_2$ and then with respect to $y_1$,

$$g_{(1)(2)}(y_1, y_2) = \begin{cases} 2 f(y_1) f(y_2), & y_1 \le y_2, \\ 0, & \text{elsewhere}. \end{cases}$$

The same method can be used to find the joint density of $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$, which turns out to be

$$g_{(1)(2)\cdots(n)}(y_1, y_2, \ldots, y_n) = \begin{cases} n!\, f(y_1) f(y_2) \cdots f(y_n), & y_1 \le y_2 \le \cdots \le y_n, \\ 0, & \text{elsewhere}. \end{cases}$$

The marginal density function for any of the order statistics can be found from this joint density function, but we will not pursue this matter formally in this text.

EXAMPLE 6.16

Electronic components of a certain type have a length of life $Y$, with probability density given by

$$f(y) = \begin{cases} (1/100)e^{-y/100}, & y > 0, \\ 0, & \text{elsewhere}. \end{cases}$$

(Length of life is measured in hours.) Suppose that two such components operate independently and in series in a certain system (hence, the system fails when either component fails). Find the density function for $X$, the length of life of the system.

Solution

Because the system fails at the first component failure, $X = \min(Y_1, Y_2)$, where $Y_1$ and $Y_2$ are independent random variables with the given density. Then, because $F(y) = 1 - e^{-y/100}$, for $y \ge 0$,

$$f_X(y) = g_{(1)}(y) = n[1 - F(y)]^{n-1} f(y) = \begin{cases} 2e^{-y/100}(1/100)e^{-y/100}, & y > 0, \\ 0, & \text{elsewhere}, \end{cases}$$

and it follows that

$$f_X(y) = \begin{cases} (1/50)e^{-y/50}, & y > 0, \\ 0, & \text{elsewhere}. \end{cases}$$

Thus, the minimum of two exponentially distributed random variables has an exponential distribution. Notice that the mean length of life for each component is 100 hours, whereas the mean length of life for the system is $E(X) = E(Y_{(1)}) = 50 = 100/2$.
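The formula $g_{(1)}(y) = n[1 - F(y)]^{n-1} f(y)$ used in Example 6.16 translates directly into code. The sketch below uses illustrative function names that are not from the text; it evaluates the formula with $n = 2$ and an exponential distribution with mean 100 and compares it with an exponential density with mean 50.

```python
import math

def exp_cdf(y, mean):
    """Distribution function of an exponential variable with the given mean."""
    return 1.0 - math.exp(-y / mean) if y > 0 else 0.0

def exp_pdf(y, mean):
    """Density function of an exponential variable with the given mean."""
    return math.exp(-y / mean) / mean if y > 0 else 0.0

def min_density(y, n, cdf, pdf):
    """Density of the minimum of n i.i.d. variables: n [1 - F(y)]^(n-1) f(y)."""
    return n * (1.0 - cdf(y)) ** (n - 1) * pdf(y)

def series_density(y):
    """Lifetime density of the two-component series system in Example 6.16."""
    return min_density(
        y, 2, lambda t: exp_cdf(t, 100.0), lambda t: exp_pdf(t, 100.0)
    )

for hours in (1.0, 50.0, 200.0):
    print(hours, series_density(hours), exp_pdf(hours, 50.0))
```

At each test point the two printed densities should coincide up to rounding.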

EXAMPLE 6.17

Suppose that the components in Example 6.16 operate in parallel (hence, the system does not fail until both components fail). Find the density function for $X$, the length of life of the system.

Solution

Now $X = \max(Y_1, Y_2)$, and

$$f_X(y) = g_{(2)}(y) = n[F(y)]^{n-1} f(y) = \begin{cases} 2(1 - e^{-y/100})(1/100)e^{-y/100}, & y > 0, \\ 0, & \text{elsewhere}, \end{cases}$$

and, therefore,

$$f_X(y) = \begin{cases} (1/50)(e^{-y/100} - e^{-y/50}), & y > 0, \\ 0, & \text{elsewhere}. \end{cases}$$

We see here that the maximum of two exponential random variables is not an exponential random variable.
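As a sanity check on Example 6.17, the density of the maximum can be integrated numerically. The sketch below uses illustrative names, a simple trapezoid rule, and an integration range truncated at 3000 hours (an assumption that ignores a negligible tail). It confirms that the density integrates to approximately 1 and estimates the mean system life, which by direct integration of the closed form is $(1/50)(100^2 - 50^2) = 150$ hours.

```python
import math

def cdf(y):
    """Exponential distribution function with mean 100 hours."""
    return 1.0 - math.exp(-y / 100.0) if y > 0 else 0.0

def pdf(y):
    """Exponential density with mean 100 hours."""
    return math.exp(-y / 100.0) / 100.0 if y > 0 else 0.0

def max_density(y, n, cdf_func, pdf_func):
    """Density of the maximum of n i.i.d. variables: n [F(y)]^(n-1) f(y)."""
    return n * cdf_func(y) ** (n - 1) * pdf_func(y)

def parallel_density(y):
    """Lifetime density of the two-component parallel system."""
    return max_density(y, 2, cdf, pdf)

def trapezoid(func, lower, upper, steps):
    """Composite trapezoid rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width

total_probability = trapezoid(parallel_density, 0.0, 3000.0, 60_000)
mean_life = trapezoid(lambda y: y * parallel_density(y), 0.0, 3000.0, 60_000)
print(total_probability, mean_life)
```

The parallel system's mean life of 150 hours exceeds the 100-hour mean of a single component, in contrast with the 50-hour mean of the series system.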

Although a rigorous derivation of the density function of the kth-order statistic ($k$ an integer, $1 < k < n$) is somewhat complicated, the resulting density function has an intuitively sensible structure. Once that structure is understood, the density can be written down with little difficulty. Think of the density function of a continuous random variable at a particular point as being proportional to the probability that the variable is "close" to that point. That is, if $Y$ is a continuous random variable with density function $f(y)$, then

$$P(y \le Y \le y + dy) \approx f(y)\, dy.$$

Now consider the kth-order statistic, $Y_{(k)}$. If the kth-smallest value is near $y_k$, then $k - 1$ of the $Y$'s must be less than $y_k$, one of the $Y$'s must be near $y_k$, and the remaining $n - k$ of the $Y$'s must be larger than $y_k$. Recall the multinomial distribution, Section 5.9. In the present case, we have three classes of values of $Y$:

Class 1: $Y$'s that have values less than $y_k$; need $k - 1$.
Class 2: $Y$'s that have values near $y_k$; need 1.
Class 3: $Y$'s that have values larger than $y_k$; need $n - k$.

The probabilities of each of these classes are, respectively, $p_1 = P(Y < y_k) = F(y_k)$, $p_2 = P(y_k \le Y \le y_k + dy_k) \approx f(y_k)\, dy_k$, and $p_3 = P(Y > y_k) = 1 - F(y_k)$. Using the multinomial probabilities discussed earlier, we see that

$$P(y_k \le Y_{(k)} \le y_k + dy_k) \approx P[(k - 1) \text{ from class 1, 1 from class 2, } (n - k) \text{ from class 3}]$$

$$\approx \binom{n}{k-1 \ \ 1 \ \ n-k} p_1^{k-1} p_2^{1} p_3^{n-k} \approx \frac{n!}{(k - 1)!\, 1!\, (n - k)!} [F(y_k)]^{k-1} f(y_k)\, dy_k\, [1 - F(y_k)]^{n-k}$$

and

$$g_{(k)}(y_k)\, dy_k \approx \frac{n!}{(k - 1)!\, 1!\, (n - k)!} F^{k-1}(y_k) f(y_k) [1 - F(y_k)]^{n-k}\, dy_k.$$

The density of the kth-order statistic and the joint density of two order statistics are given in the following theorem.

THEOREM 6.5

Let $Y_1, \ldots, Y_n$ be independent identically distributed continuous random variables with common distribution function $F(y)$ and common density function $f(y)$. If $Y_{(k)}$ denotes the kth-order statistic, then the density function of $Y_{(k)}$ is given by

$$g_{(k)}(y_k) = \frac{n!}{(k - 1)!\, (n - k)!} [F(y_k)]^{k-1} [1 - F(y_k)]^{n-k} f(y_k), \quad -\infty < y_k < \infty.$$

If $j$ and $k$ are two integers such that $1 \le j < k \le n$, the joint density of $Y_{(j)}$ and $Y_{(k)}$ is given by

$$g_{(j)(k)}(y_j, y_k) = \frac{n!}{(j - 1)!\, (k - 1 - j)!\, (n - k)!} [F(y_j)]^{j-1} \times [F(y_k) - F(y_j)]^{k-1-j} \times [1 - F(y_k)]^{n-k} f(y_j) f(y_k), \quad -\infty < y_j < y_k < \infty.$$

The heuristic, intuitive derivation of the joint density given in Theorem 6.5 is similar to that given earlier for the density of a single order statistic. For $y_j < y_k$, the joint density can be interpreted as the probability that the jth-smallest observation is close to $y_j$ and the kth-smallest is close to $y_k$. Define five classes of values of $Y$:

Class 1: $Y$'s that have values less than $y_j$; need $j - 1$.
Class 2: $Y$'s that have values near $y_j$; need 1.
Class 3: $Y$'s that have values between $y_j$ and $y_k$; need $k - 1 - j$.
Class 4: $Y$'s that have values near $y_k$; need 1.
Class 5: $Y$'s that have values larger than $y_k$; need $n - k$.

Again, use the multinomial distribution to complete the heuristic argument.

EXAMPLE 6.18

Suppose that $Y_1, Y_2, \ldots, Y_5$ denotes a random sample from a uniform distribution defined on the interval (0, 1). That is,

$$f(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$

Find the density function for the second-order statistic. Also, give the joint density function for the second- and fourth-order statistics.

Solution

The distribution function associated with each of the $Y$'s is

$$F(y) = \begin{cases} 0, & y < 0, \\ y, & 0 \le y \le 1, \\ 1, & y > 1. \end{cases}$$

The density function of the second-order statistic, $Y_{(2)}$, can be obtained directly from Theorem 6.5 with $n = 5$, $k = 2$. Thus, with $f(y)$ and $F(y)$ as noted,

$$g_{(2)}(y_2) = \frac{5!}{(2 - 1)!\, (5 - 2)!} [F(y_2)]^{2-1} [1 - F(y_2)]^{5-2} f(y_2), \quad -\infty < y_2 < \infty,$$

$$= \begin{cases} 20 y_2 (1 - y_2)^3, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$

The preceding density is a beta density with $\alpha = 2$ and $\beta = 4$. In general, the kth-order statistic based on a sample of size $n$ from a uniform (0, 1) distribution has a beta density with $\alpha = k$ and $\beta = n - k + 1$.
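The first formula of Theorem 6.5 can be coded generically and checked against this example. In the sketch below, all function names are illustrative rather than taken from the text. It evaluates $g_{(k)}$ for a uniform (0, 1) sample of size 5 and compares it with a beta density with $\alpha = k$ and $\beta = n - k + 1$, written using the gamma function.

```python
import math

def order_statistic_density(y, k, n, cdf, pdf):
    """Density of the kth-order statistic from Theorem 6.5."""
    coefficient = math.factorial(n) / (
        math.factorial(k - 1) * math.factorial(n - k)
    )
    return coefficient * cdf(y) ** (k - 1) * (1.0 - cdf(y)) ** (n - k) * pdf(y)

def uniform_cdf(y):
    """Distribution function of a uniform (0, 1) variable."""
    return min(max(y, 0.0), 1.0)

def uniform_pdf(y):
    """Density of a uniform (0, 1) variable."""
    return 1.0 if 0.0 <= y <= 1.0 else 0.0

def beta_density(y, alpha, beta):
    """Beta density on [0, 1] with parameters alpha and beta."""
    if not 0.0 <= y <= 1.0:
        return 0.0
    normalizer = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return normalizer * y ** (alpha - 1) * (1.0 - y) ** (beta - 1)

n = 5
for k in range(1, n + 1):
    for y in (0.1, 0.5, 0.9):
        lhs = order_statistic_density(y, k, n, uniform_cdf, uniform_pdf)
        rhs = beta_density(y, k, n - k + 1)
        print(k, y, lhs, rhs)
```

For $k = 2$ the values reduce to $20y(1 - y)^3$, the density derived in the example.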


The joint density of the second- and fourth-order statistics is readily obtained from the second result in Theorem 6.5. With f(y) and F(y) as before, j = 2, k = 4, and n = 5,

$$\begin{aligned} g_{(2)(4)}(y_2, y_4) &= \frac{5!}{(2-1)!\,(4-1-2)!\,(5-4)!}\,[F(y_2)]^{2-1}\,[F(y_4) - F(y_2)]^{4-1-2}\,[1 - F(y_4)]^{5-4}\,f(y_2)\,f(y_4), \qquad -\infty < y_2 < y_4 < \infty, \\ &= \begin{cases} 5!\,y_2(y_4 - y_2)(1 - y_4), & 0 \le y_2 < y_4 \le 1, \\ 0, & \text{elsewhere.} \end{cases} \end{aligned}$$

Of course, this joint density can be used to evaluate joint probabilities about Y(2) and Y(4) or to evaluate the expected value of functions of these two variables.
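As a rough numerical check on Example 6.18 (not part of the text), the sketch below integrates the two densities with a simple midpoint rule. The function names `g2` and `g24` and the grid size `m` are our own choices; the comparison value 1/3 uses the standard mean α/(α + β) of a beta density with α = 2 and β = 4.

```python
def g2(y):
    """Density of Y(2) for a sample of n = 5 uniform(0, 1) observations."""
    return 20.0 * y * (1.0 - y) ** 3 if 0.0 <= y <= 1.0 else 0.0


def g24(y2, y4):
    """Joint density of Y(2) and Y(4) for n = 5 uniform(0, 1) observations."""
    if 0.0 <= y2 < y4 <= 1.0:
        return 120.0 * y2 * (y4 - y2) * (1.0 - y4)
    return 0.0


m = 400                                   # grid cells per axis (arbitrary)
h = 1.0 / m
mid = [(i + 0.5) * h for i in range(m)]   # midpoints of the grid cells

total_g2 = sum(g2(y) for y in mid) * h           # should be close to 1
mean_g2 = sum(y * g2(y) for y in mid) * h        # should be close to 1/3
total_g24 = sum(g24(a, b) for a in mid for b in mid) * h * h  # close to 1

print(total_g2, mean_g2, total_g24)
```

Both densities should integrate to approximately 1. The same grid could be reused to approximate a joint probability such as P(Y(4) − Y(2) ≤ c) by summing `g24` only over the cells that satisfy the condition.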

Exercises

6.72

Let Y1 and Y2 be independent and uniformly distributed over the interval (0, 1). Find
a the probability density function of U1 = min(Y1, Y2).
b E(U1) and V(U1).

6.73

As in Exercise 6.72, let Y1 and Y2 be independent and uniformly distributed over the interval (0, 1). Find
a the probability density function of U2 = max(Y1, Y2).
b E(U2) and V(U2).

6.74

Let Y1, Y2, ..., Yn be independent, uniformly distributed random variables on the interval [0, θ]. Find the
a probability distribution function of Y(n) = max(Y1, Y2, ..., Yn).
b density function of Y(n).
c mean and variance of Y(n).

6.75

Refer to Exercise 6.74. Suppose that the number of minutes that you need to wait for a bus is uniformly distributed on the interval [0, 15]. If you take the bus ﬁve times, what is the probability that your longest wait is less than 10 minutes?

*6.76

Let Y1, Y2, ..., Yn be independent, uniformly distributed random variables on the interval [0, θ].
a Find the density function of Y(k), the kth-order statistic, where k is an integer between 1 and n.
b Use the result from part (a) to find E(Y(k)).
c Find V(Y(k)).
d Use the result from part (c) to find E(Y(k) − Y(k−1)), the mean difference between two successive order statistics. Interpret this result.

*6.77

Let Y1 , Y2 , . . . , Yn be independent, uniformly distributed random variables on the interval [0, θ ]. a Find the joint density function of Y( j) and Y(k) where j and k are integers 1 ≤ j < k ≤ n. b Use the result from part (a) to ﬁnd Cov(Y( j) , Y(k) ) when j and k are integers 1 ≤ j < k ≤ n.


c Use the result from part (b) and Exercise 6.76 to ﬁnd V (Y(k) − Y( j) ), the variance of the difference between two order statistics.

6.78

Refer to Exercise 6.76. If Y1 , Y2 , . . . , Yn are independent, uniformly distributed random variables on the interval [0, 1], show that Y(k) , the kth-order statistic, has a beta density function with α = k and β = n − k + 1.

6.79

Refer to Exercise 6.77. If Y1 , Y2 , . . . , Yn are independent, uniformly distributed random variables on the interval [0, θ ], show that U = Y(1) /Y(n) and Y(n) are independent.

6.80

Let Y1 , Y2 , . . . , Yn be independent random variables, each with a beta distribution, with α = β = 2. Find a the probability distribution function of Y(n) = max(Y1 , Y2 , . . . , Yn ). b the density function of Y(n) . c E(Y(n) ) when n = 2.

6.81

Let Y1 , Y2 , . . . , Yn be independent, exponentially distributed random variables with mean β. a Show that Y(1) = min(Y1 , Y2 , . . . , Yn ) has an exponential distribution, with mean β/n. b If n = 5 and β = 2, ﬁnd P(Y(1) ≤ 3.6).

6.82

If Y is a continuous random variable and m is the median of the distribution, then m is such that P(Y ≤ m) = P(Y ≥ m) = 1/2. If Y1, Y2, ..., Yn are independent, exponentially distributed random variables with mean β and median m, Example 6.17 implies that Y(n) = max(Y1, Y2, ..., Yn) does not have an exponential distribution. Use the general form of $F_{Y_{(n)}}(y)$ to show that $P(Y_{(n)} > m) = 1 - (.5)^n$.

6.83

Refer to Exercise 6.82. If Y1 , Y2 , . . . , Yn is a random sample from any continuous distribution with mean m, what is P(Y(n) > m)?

6.84

Refer to Exercise 6.26. The Weibull density function is given by

$$f(y) = \begin{cases} \dfrac{1}{\alpha}\,m\,y^{m-1}e^{-y^m/\alpha}, & y > 0, \\ 0, & \text{elsewhere,} \end{cases}$$

where α and m are positive constants. If a random sample of size n is taken from a Weibull distributed population, find the distribution function and density function for Y(1) = min(Y1, Y2, ..., Yn). Does Y(1) have a Weibull distribution?

6.85

Let Y1 and Y2 be independent and uniformly distributed over the interval (0, 1). Find P(2Y(1) < Y(2) ).

*6.86

Let Y1 , Y2 , . . . , Yn be independent, exponentially distributed random variables with mean β. Give the a density function for Y(k) , the kth-order statistic, where k is an integer between 1 and n. b joint density function for Y( j) and Y(k) where j and k are integers 1 ≤ j < k ≤ n.

6.87

The opening prices per share Y1 and Y2 of two similar stocks are independent random variables, each with a density function given by

$$f(y) = \begin{cases} (1/2)\,e^{-(1/2)(y-4)}, & y \ge 4, \\ 0, & \text{elsewhere.} \end{cases}$$

On a given morning, an investor is going to buy shares of whichever stock is less expensive. Find the


a probability density function for the price per share that the investor will pay. b expected cost per share that the investor will pay.

6.88

Suppose that the length of time Y it takes a worker to complete a certain task has the probability density function given by

$$f(y) = \begin{cases} e^{-(y-\theta)}, & y > \theta, \\ 0, & \text{elsewhere,} \end{cases}$$

where θ is a positive constant that represents the minimum time until task completion. Let Y1, Y2, ..., Yn denote a random sample of completion times from this distribution. Find
a the density function for Y(1) = min(Y1, Y2, ..., Yn).
b E(Y(1)).

*6.89

Let Y1 , Y2 , . . . , Yn denote a random sample from the uniform distribution f (y) = 1, 0 ≤ y ≤ 1. Find the probability density function for the range R = Y(n) − Y(1) .

*6.90

Suppose that the number of occurrences of a certain event in time interval (0, t) has a Poisson distribution. If we know that n such events have occurred in (0, t), then the actual times, measured from 0, for the occurrences of the event in question form an ordered set of random variables, which we denote by W(1) ≤ W(2) ≤ · · · ≤ W(n). [W(i) actually is the waiting time from 0 until the occurrence of the ith event.] It can be shown that the joint density function for W(1), W(2), ..., W(n) is given by

$$f(w_1, w_2, \ldots, w_n) = \begin{cases} \dfrac{n!}{t^n}, & w_1 \le w_2 \le \cdots \le w_n, \\ 0, & \text{elsewhere.} \end{cases}$$

[This is the density function for an ordered sample of size n from a uniform distribution on the interval (0, t).] Suppose that telephone calls coming into a switchboard follow a Poisson distribution with a mean of ten calls per minute. A slow period of two minutes’ duration had only four calls. Find the a probability that all four calls came in during the ﬁrst minute; that is, ﬁnd P(W(4) ≤ 1). b expected waiting time from the start of the two-minute period until the fourth call.

*6.91

Suppose that n electronic components, each having an exponentially distributed length of life with mean θ, are put into operation at the same time. The components operate independently and are observed until r have failed (r ≤ n). Let W_j denote the length of time until the jth failure, with W_1 ≤ W_2 ≤ · · · ≤ W_r. Let T_j = W_j − W_{j−1} for j ≥ 2 and T_1 = W_1. Notice that T_j measures the time elapsed between successive failures.
a Show that T_j, for j = 1, 2, ..., r, has an exponential distribution with mean θ/(n − j + 1).
b Show that

$$U_r = \sum_{j=1}^{r} W_j + (n - r)W_r = \sum_{j=1}^{r} (n - j + 1)T_j$$

and hence that E(U_r) = rθ. [U_r is called the total observed life, and we can use U_r/r as an approximation to (or “estimator” of) θ.]


6.8 Summary

This chapter has been concerned with finding probability distributions for functions of random variables. This is an important problem in statistics because estimators of population parameters are functions of random variables. Hence, it is necessary to know something about the probability distributions of these functions (or estimators) in order to evaluate the goodness of our statistical procedures. A discussion of estimation will be presented in Chapters 8 and 9.

The methods for finding probability distributions for functions of random variables are the distribution function method (Section 6.3), the transformation method (Section 6.4), and the moment-generating-function method (Section 6.5). It should be noted that no particular method is best for all situations because the method of solution depends a great deal upon the nature of the function involved. If U1 and U2 are two functions of the continuous random variables Y1 and Y2, the joint density function for U1 and U2 can be found using the Jacobian technique in Section 6.6. Facility for handling these methods can be achieved only through practice. The exercises at the end of each section and at the end of the chapter provide a good starting point.

The density functions of order statistics were presented in Section 6.7. Some special functions of random variables that are particularly useful in statistical inference will be considered in Chapter 7.

References and Further Readings

Casella, G., and R. L. Berger. 2002. Statistical Inference, 2d ed. Pacific Grove, Calif.: Duxbury.
Hoel, P. G. 1984. Introduction to Mathematical Statistics, 5th ed. New York: Wiley.
Hogg, R. V., A. T. Craig, and J. W. McKean. 2005. Introduction to Mathematical Statistics, 6th ed. Upper Saddle River, N.J.: Pearson Prentice Hall.
Mood, A. M., F. A. Graybill, and D. Boes. 1974. Introduction to the Theory of Statistics, 3d ed. New York: McGraw-Hill.
Parzen, E. 1992. Modern Probability Theory and Its Applications. New York: Wiley-Interscience.

Supplementary Exercises

6.92

If Y1 and Y2 are independent and identically distributed normal random variables with mean µ and variance σ 2 , ﬁnd the probability density function for U = (1/2)(Y1 − 3Y2 ).

6.93

When current I flows through resistance R, the power generated is given by $W = I^2 R$. Suppose that I has a uniform distribution over the interval (0, 1) and R has a density function given by

$$f(r) = \begin{cases} 2r, & 0 \le r \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the probability density function for W. (Assume that I is independent of R.)


6.94

Two efficiency experts take independent measurements Y1 and Y2 on the length of time workers take to complete a certain task. Each measurement is assumed to have the density function given by

$$f(y) = \begin{cases} (1/4)\,y\,e^{-y/2}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the density function for the average U = (1/2)(Y1 + Y2). [Hint: Use the method of moment-generating functions.]

6.95

Let Y1 and Y2 be independent and uniformly distributed over the interval (0, 1). Find the probability density function of each of the following: a U1 = Y1 /Y2 . b U2 = −ln (Y1 Y2 ). c U3 = Y1 Y2 .

6.96

Suppose that Y1 is normally distributed with mean 5 and variance 1 and Y2 is normally distributed with mean 4 and variance 3. If Y1 and Y2 are independent, what is P(Y1 > Y2 )?

6.97

Suppose that Y1 is a binomial random variable with four trials and success probability .2 and that Y2 is an independent binomial random variable with three trials and success probability .5. Let W = Y1 + Y2 . According to Exercise 6.53(e), W does not have a binomial distribution. Find the probability mass function for W . [Hint: P(W = 0) = P(Y1 = 0, Y2 = 0); P(W = 1) = P(Y1 = 1, Y2 = 0) + P(Y1 = 0, Y2 = 1); etc.]

*6.98

The length of time that a machine operates without failure is denoted by Y1 and the length of time to repair a failure is denoted by Y2. After a repair is made, the machine is assumed to operate like a new machine. Y1 and Y2 are independent and each has the density function

$$f(y) = \begin{cases} e^{-y}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the probability density function for U = Y1/(Y1 + Y2), the proportion of time that the machine is in operation during any one operation–repair cycle.

*6.99

Refer to Exercise 6.98. Show that U , the proportion of time that the machine is operating during any one operation–repair cycle, is independent of Y1 + Y2 , the length of the cycle.

6.100

The time until failure of an electronic device has an exponential distribution with mean 15 months. If a random sample of ﬁve such devices are tested, what is the probability that the ﬁrst failure among the ﬁve devices occurs a after 9 months? b before 12 months?

*6.101

A parachutist wants to land at a target T , but she ﬁnds that she is equally likely to land at any point on a straight line (A, B), of which T is the midpoint. Find the probability density function of the distance between her landing point and the target. [Hint: Denote A by −1, B by +1, and T by 0. Then the parachutist’s landing point has a coordinate X , which is uniformly distributed between −1 and +1. The distance between X and T is |X |.]

6.102

Two sentries are sent to patrol a road 1 mile long. The sentries are sent to points chosen independently and at random along the road. Find the probability that the sentries will be less than 1/2 mile apart when they reach their assigned posts.

*6.103

Let Y1 and Y2 be independent, standard normal random variables. Find the probability density function of U = Y1 /Y2 .

6.104

Let Y1 and Y2 be independent random variables, each having the same geometric distribution. a Find P(Y1 = Y2 ) = P(Y1 − Y2 = 0). [Hint: Your answer will involve evaluating an inﬁnite geometric series. The results in Appendix A1.11 will be useful.] b Find P(Y1 − Y2 = 1). *c If U = Y1 − Y2 , ﬁnd the (discrete) probability function for U . [Hint: Part (a) gives P(U = 0), and part (b) gives P(U = 1). Consider the positive and negative integer values for U separately.]

6.105

A random variable Y has a beta distribution of the second kind if, for α > 0 and β > 0, its density is

$$f_Y(y) = \begin{cases} \dfrac{y^{\alpha-1}}{B(\alpha, \beta)(1 + y)^{\alpha+\beta}}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Derive the density function of U = 1/(1 + Y).

6.106

If Y is a continuous random variable with distribution function F(y), ﬁnd the probability density function of U = F(Y ).

6.107

Let Y be uniformly distributed over the interval (−1, 3). Find the probability density function of $U = Y^2$.

6.108

If Y denotes the length of life of a component and F(y) is the distribution function of Y , then P(Y > y) = 1− F(y) is called the reliability of the component. Suppose that a system consists of four components with identical reliability functions, 1 − F(y), operating as indicated in Figure 6.10. The system operates correctly if an unbroken chain of components is in operation between A and B. If the four components operate independently, ﬁnd the reliability of the system in terms of F(y).

FIGURE 6.10 Circuit diagram [figure not reproduced: components C1, C2, C3, and C4 connected between terminals A and B]

6.109

The percentage of alcohol in a certain compound is a random variable Y, with the following density function:

$$f(y) = \begin{cases} 20y^3(1 - y), & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}$$

Suppose that the compound's selling price depends on its alcohol content. Specifically, if 1/3 < y < 2/3, the compound sells for C1 dollars per gallon; otherwise, it sells for C2 dollars per gallon. If the production cost is C3 dollars per gallon, find the probability distribution of the profit per gallon.


6.110

An engineer has observed that the gap times between vehicles passing a certain point on a highway have an exponential distribution with mean 10 seconds. Find the a probability that the next gap observed will be no longer than one minute. b probability density function for the sum of the next four gap times to be observed. What assumptions are necessary for this answer to be correct?

*6.111

If a random variable U is normally distributed with mean µ and variance σ² and $Y = e^U$ [equivalently, U = ln(Y)], then Y is said to have a log-normal distribution. The log-normal distribution is often used in the biological and physical sciences to model sizes, by volume or weight, of various quantities, such as crushed coal particles, bacteria colonies, and individual animals. Let U and Y be as stated. Show that
a the density function for Y is

$$f(y) = \begin{cases} \dfrac{1}{y\sigma\sqrt{2\pi}}\,e^{-(\ln y - \mu)^2/(2\sigma^2)}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

b $E(Y) = e^{\mu + (\sigma^2/2)}$ and $V(Y) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$. [Hint: Recall that $E(Y) = E(e^U)$ and $E(Y^2) = E(e^{2U})$, where U is normally distributed with mean µ and variance σ². Recall that the moment-generating function of U is $m_U(t) = E(e^{tU})$.]

*6.112

If a random variable U has a gamma distribution with parameters α > 0 and β > 0, then $Y = e^U$ [equivalently, U = ln(Y)] is said to have a log-gamma distribution. The log-gamma distribution is used by actuaries as part of an important model for the distribution of insurance claims. Let U and Y be as stated.
a Show that the density function for Y is

$$f(y) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\,y^{-(1+\beta)/\beta}(\ln y)^{\alpha-1}, & y > 1, \\ 0, & \text{elsewhere.} \end{cases}$$

b If β < 1, show that $E(Y) = (1 - \beta)^{-\alpha}$. [See the hint for part (c).]
c If β < .5, show that $V(Y) = (1 - 2\beta)^{-\alpha} - (1 - \beta)^{-2\alpha}$. [Hint: Recall that $E(Y) = E(e^U)$ and $E(Y^2) = E(e^{2U})$, where U is gamma distributed with parameters α > 0 and β > 0, and that the moment-generating function of a gamma-distributed random variable only exists if $t < \beta^{-1}$; see Example 4.13.]

*6.113

Let (Y1, Y2) have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let U1 = Y1 Y2 and U2 = Y2.
a Show that the joint density of (U1, U2) is

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}\!\left(\frac{u_1}{u_2}, u_2\right)\frac{1}{|u_2|}.$$

b Show that the marginal density function for U1 is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}\!\left(\frac{u_1}{u_2}, u_2\right)\frac{1}{|u_2|}\,du_2.$$

c If Y1 and Y2 are independent, show that the marginal density function for U1 is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}\!\left(\frac{u_1}{u_2}\right) f_{Y_2}(u_2)\,\frac{1}{|u_2|}\,du_2.$$


*6.114

A machine produces spherical containers whose radii vary according to the probability density function given by

$$f(r) = \begin{cases} 2r, & 0 \le r \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the probability density function for the volume of the containers.

*6.115

Let v denote the volume of a three-dimensional figure. Let Y denote the number of particles observed in volume v and assume that Y has a Poisson distribution with mean λv. The particles might represent pollution particles in air, bacteria in water, or stars in the heavens.
a If a point is chosen at random within the volume v, show that the distance R to the nearest particle has the probability density function given by

$$f(r) = \begin{cases} 4\lambda\pi r^2 e^{-(4/3)\lambda\pi r^3}, & r > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

b If R is as in part (a), show that $U = R^3$ has an exponential distribution.

*6.116

Let (Y1, Y2) have joint density function $f_{Y_1,Y_2}(y_1, y_2)$ and let U1 = Y1 − Y2 and U2 = Y2.
a Show that the joint density of (U1, U2) is

$$f_{U_1,U_2}(u_1, u_2) = f_{Y_1,Y_2}(u_1 + u_2, u_2).$$

b Show that the marginal density function for U1 is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1,Y_2}(u_1 + u_2, u_2)\,du_2.$$

c If Y1 and Y2 are independent, show that the marginal density function for U1 is

$$f_{U_1}(u_1) = \int_{-\infty}^{\infty} f_{Y_1}(u_1 + u_2)\,f_{Y_2}(u_2)\,du_2.$$

CHAPTER 7

Sampling Distributions and the Central Limit Theorem

7.1 Introduction
7.2 Sampling Distributions Related to the Normal Distribution
7.3 The Central Limit Theorem
7.4 A Proof of the Central Limit Theorem (Optional)
7.5 The Normal Approximation to the Binomial Distribution
7.6 Summary
References and Further Readings

7.1 Introduction

In Chapter 6, we presented methods for finding the distributions of functions of random variables. Throughout this chapter, we will be working with functions of the variables Y1, Y2, ..., Yn observed in a random sample selected from a population of interest. As discussed in Chapter 6, the random variables Y1, Y2, ..., Yn are independent and have the same distribution. Certain functions of the random variables observed in a sample are used to estimate or make decisions about unknown population parameters. For example, suppose that we want to estimate a population mean µ. If we obtain a random sample of n observations, y1, y2, ..., yn, it seems reasonable to estimate µ with the sample mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

The goodness of this estimate depends on the behavior of the random variables Y1, Y2, ..., Yn and the effect that this behavior has on $\bar{Y} = (1/n)\sum_{i=1}^{n} Y_i$. Notice that the random variable $\bar{Y}$ is a function of (only) the random variables Y1, Y2, ..., Yn and the (constant) sample size n. The random variable $\bar{Y}$ is therefore an example of a statistic.

DEFINITION 7.1

A statistic is a function of the observable random variables in a sample and known constants.

You have already encountered many statistics: the sample mean $\bar{Y}$, the sample variance $S^2$, Y(n) = max(Y1, Y2, ..., Yn), Y(1) = min(Y1, Y2, ..., Yn), the range R = Y(n) − Y(1), the sample median, and so on. Statistics are used to make inferences (estimates or decisions) about unknown population parameters. Because all statistics are functions of the random variables observed in a sample, all statistics are random variables. Consequently, all statistics have probability distributions, which we will call their sampling distributions. From a practical point of view, the sampling distribution of a statistic provides a theoretical model for the relative frequency histogram of the possible values of the statistic that we would observe through repeated sampling. The next example provides a sampling distribution of the sample mean when sampling from a familiar population, the one associated with tossing a balanced die.

EXAMPLE 7.1

A balanced die is tossed three times. Let Y1, Y2, and Y3 denote the number of spots observed on the upper face for tosses 1, 2, and 3, respectively. Suppose we are interested in $\bar{Y} = (Y_1 + Y_2 + Y_3)/3$, the average number of spots observed in a sample of size 3. What are the mean, $\mu_{\bar{Y}}$, and standard deviation, $\sigma_{\bar{Y}}$, of $\bar{Y}$? How can we find the sampling distribution of $\bar{Y}$?

Solution

In Exercise 3.22, you showed that µ = E(Yi) = 3.5 and σ² = V(Yi) = 2.9167, i = 1, 2, 3. Since Y1, Y2, and Y3 are independent random variables, the result derived in Example 5.27 (using Theorem 5.12) implies that

$$E(\bar{Y}) = \mu = 3.5, \qquad V(\bar{Y}) = \frac{\sigma^2}{3} = \frac{2.9167}{3} = .9722, \qquad \sigma_{\bar{Y}} = \sqrt{.9722} = .9860.$$

How can we derive the distribution of the random variable $\bar{Y}$? The possible values of the random variable W = Y1 + Y2 + Y3 are 3, 4, 5, ..., 18 and $\bar{Y} = W/3$. Because the die is balanced, each of the $6^3 = 216$ distinct values of the multivariate random variable (Y1, Y2, Y3) is equally likely and

$$P(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3) = p(y_1, y_2, y_3) = 1/216, \qquad y_i = 1, 2, \ldots, 6, \quad i = 1, 2, 3.$$

Therefore,

$$\begin{aligned} P(\bar{Y} = 1) &= P(W = 3) = p(1, 1, 1) = 1/216 \\ P(\bar{Y} = 4/3) &= P(W = 4) = p(1, 1, 2) + p(1, 2, 1) + p(2, 1, 1) = 3/216 \\ P(\bar{Y} = 5/3) &= P(W = 5) = p(1, 1, 3) + p(1, 3, 1) + p(3, 1, 1) + p(1, 2, 2) + p(2, 1, 2) + p(2, 2, 1) = 6/216 \\ &\;\;\vdots \end{aligned}$$

The probabilities $P(\bar{Y} = i/3)$, i = 7, 8, ..., 18 are obtained similarly.
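The sample point calculation in Example 7.1 is tedious by hand but easy to enumerate by machine. The following Python sketch (not part of the text; the variable names are our own) lists all 216 equally likely outcomes, tabulates the exact distribution of W = Y1 + Y2 + Y3, and recovers the mean and variance of the sample mean as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 6^3 = 216 equally likely outcomes (y1, y2, y3) and count
# how many of them give each value of W = y1 + y2 + y3.
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))

# P(W = w) for w = 3, ..., 18; the sample mean equals w / 3.
pmf_w = {w: Fraction(counts[w], 216) for w in sorted(counts)}

# Exact mean and variance of the sample mean, computed from its distribution.
mean_ybar = sum(Fraction(w, 3) * p for w, p in pmf_w.items())
var_ybar = sum((Fraction(w, 3) - mean_ybar) ** 2 * p for w, p in pmf_w.items())

for w in pmf_w:
    print(f"P(Ybar = {w}/3) = {counts[w]}/216")
print("E(Ybar) =", mean_ybar, " V(Ybar) =", var_ybar)
```

Working with `Fraction` rather than floating point keeps every probability exact, so the results can be compared directly with the values 1/216, 3/216, and 6/216 obtained above and with V = σ²/3.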


FIGURE 7.1 (a) Simulated sampling distribution for $\bar{Y}$, Example 7.1; (b) mean and standard deviation of the 4000 simulated values of $\bar{Y}$

[Panel (a), not reproduced: histogram with Number of Rolls = 4000; horizontal axis "Mean of 3 Dice" (1 to 6), vertical axis "Frequency".]

[Panel (b): Pop Prob: (1) 0.167 (2) 0.167 (3) 0.167 (4) 0.167 (5) 0.167 (6) 0.167. Population: Mean = 3.500, StDev = 1.708. Samples = 4000 of size 3: Mean = 3.495, StDev = 0.981. +/− 1 StDev: 0.683; +/− 2 StDev: 0.962; +/− 3 StDev: 1.000.]

The derivation of the sampling distribution of the random variable $\bar{Y}$ sketched in Example 7.1 utilizes the sample point approach that was introduced in Chapter 2. Although it is not difficult to complete the calculations in Example 7.1 and give the exact sampling distribution for $\bar{Y}$, the process is tedious. How can we get an idea about the shape of this sampling distribution without going to the bother of completing these calculations? One way is to simulate the sampling distribution by taking repeated independent samples each of size 3, computing the observed value $\bar{y}$ for each sample, and constructing a histogram of these observed values. The result of one such simulation is given in Figure 7.1(a), a plot obtained using the applet DiceSample (accessible at www.thomsonedu.com/statistics/wackerly). What do you observe in Figure 7.1(a)? As predicted, the maximum observed value of $\bar{Y}$ is 6, and the minimum value is 1. Also, the values obtained in the simulation accumulate in a mound-shaped manner approximately centered on 3.5, the theoretical mean of $\bar{Y}$. In Figure 7.1(b), we see that the average and standard deviation of the 4000 simulated values of $\bar{Y}$ are very close to the theoretical values obtained in Example 7.1.
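The simulation just described can also be imitated without the applet. The sketch below is a stand-in written with Python's standard library, not the DiceSample applet itself; the helper name `sample_mean_of_dice` and the seed value are arbitrary choices of ours. It draws 4000 samples of size 3 and summarizes the resulting sample means.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # arbitrary fixed seed so that the run is reproducible


def sample_mean_of_dice(n_dice=3):
    """Toss n_dice balanced dice and return the mean number of spots."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice


ybars = [sample_mean_of_dice() for _ in range(4000)]

sim_mean = statistics.mean(ybars)   # compare with E(Ybar) = 3.5
sim_sd = statistics.stdev(ybars)    # compare with sigma_Ybar = .9860
print(f"mean = {sim_mean:.3f}, sd = {sim_sd:.3f}")

# Crude text histogram: one '*' per 20 simulated values at each mean.
freq = Counter(round(3 * yb) for yb in ybars)   # key is W = 3 * ybar
for w in range(3, 19):
    print(f"{w / 3:5.2f} {'*' * (freq[w] // 20)}")
```

With a different seed the numbers change slightly from run to run, which is exactly the sampling variability that the simulated histogram is meant to display.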


Some of the exercises at the end of this section use the applet DiceSample to explore the simulated sampling distribution of $\bar{Y}$ for different sample sizes and for die tosses involving loaded dice. Other applets are used to simulate the sampling distributions for the mean and variance of samples taken from a mound-shaped distribution. Like the simulated sampling distributions that you will observe in the exercises, the form of the theoretical sampling distribution of any statistic will depend upon the distribution of the observable random variables in the sample. In the next section, we will use the methods of Chapter 6 to derive the sampling distributions for some statistics used to make inferences about the parameters of a normal distribution.

Exercises

7.1

Applet Exercise In Example 7.1, we derived the mean and variance of the random variable $\bar{Y}$ based on a sample of size 3 from a familiar population, the one associated with tossing a balanced die. Recall that if Y denotes the number of spots observed on the upper face on a single toss of a balanced die, as in Exercise 3.22,

P(Y = i) = 1/6, i = 1, 2, ..., 6,
µ = E(Y) = 3.5, Var(Y) = 2.9167.

Use the applet DiceSample (at www.thomsonedu.com/statistics/wackerly) to complete the following.
a Use the button “Roll One Set” to take a sample of size 3 from the die-tossing population. What value did you obtain for the mean of this sample? Where does this value fall on the histogram? Is the value that you obtained equal to one of the possible values associated with a single toss of a balanced die? Why or why not?
b Use the button “Roll One Set” again to obtain another sample of size 3 from the die-tossing population. What value did you obtain for the mean of this new sample? Is the value that you obtained equal to the value you obtained in part (a)? Why or why not?
c Use the button “Roll One Set” eight more times to obtain a total of ten values of the sample mean. Look at the histogram of these ten means. What do you observe? How many different values for the sample mean did you obtain? Were any values observed more than once?
d Use the button “Roll 10 Sets” until you have obtained and plotted 100 realized values for the sample mean, $\bar{Y}$. What do you observe about the shape of the histogram of the 100 realized values? Click on the button “Show Stats” to see the mean and standard deviation of the 100 values ($\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{100}$) that you observed. How does the average of the 100 values of $\bar{y}_i$, i = 1, 2, ..., 100 compare to E(Y), the expected number of spots on a single toss of a balanced die? (Notice that the mean and standard deviation of Y that you computed in Exercise 3.22 are given on the second line of the “Stat Report” pop-up screen.)
e How does the standard deviation of the 100 values of $\bar{y}_i$, i = 1, 2, ..., 100 compare to the standard deviation of Y given on the second line of the “Stat Report” pop-up screen?
f Click the button “Roll 1000 Sets” a few times, observing changes to the histogram as you generate more and more realized values of the sample mean. How does the resulting histogram compare to the graph given in Figure 7.1(a)?


7.2

Refer to Example 7.1 and Exercise 7.1.
a Use the method of Example 7.1 to find the exact value of $P(\bar{Y} = 2)$.
b Refer to the histogram obtained in Exercise 7.1(d). How does the relative frequency with which you observed $\bar{Y} = 2$ compare to your answer to part (a)?
c If you were to generate 10,000 values of $\bar{Y}$, what do you expect to obtain for the relative frequency of observing $\bar{Y} = 2$?

7.3

Applet Exercise Refer to Exercise 7.1. Use the applet DiceSample and scroll down to the next part of the screen that corresponds to taking samples of size n = 12 from the population corresponding to tossing a balanced die.
a Take a single sample of size n = 12 by clicking the button “Roll One Set.” Use the button “Roll One Set” to generate nine more values of the sample mean. How does the histogram of observed values of the sample mean compare to the histogram observed in Exercise 7.1(c) that was based on ten samples each of size 3?
b Use the button “Roll 10 Sets” nine more times until you have obtained and plotted 100 realized values (each based on a sample of size n = 12) for the sample mean $\bar{Y}$. Click on the button “Show Stats” to see the mean and standard deviation of the 100 values ($\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{100}$) that you observed.
i How does the average of these 100 values of $\bar{y}_i$, i = 1, 2, ..., 100 compare to the average of the 100 values (based on samples of size n = 3) that you obtained in Exercise 7.1(d)?
ii Divide the standard deviation of the 100 values of $\bar{y}_i$, i = 1, 2, ..., 100 based on samples of size 12 that you just obtained by the standard deviation of the 100 values (based on samples of size n = 3) that you obtained in Exercise 7.1. Why do you expect to get a value close to 1/2? [Hint: $V(\bar{Y}) = \sigma^2/n$.]
c Click on the button “Toggle Normal.” The (green) continuous density function plotted over the histogram is that of a normal random variable with mean and standard deviation equal to the mean and standard deviation of the 100 values, ($\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{100}$), plotted on the histogram. Does this normal distribution appear to reasonably approximate the distribution described by the histogram?

7.4

Applet Exercise The population corresponding to the upper face observed on a single toss of a balanced die is such that all six possible values are equally likely. Would the results analogous to those obtained in Exercises 7.1 and 7.2 be observed if the die was not balanced? Access the applet DiceSample and scroll down to the part of the screen dealing with “Loaded Die.”
a If the die is loaded, the six possible outcomes are not equally likely. What are the probabilities associated with each outcome? Click on the buttons “1 roll,” “10 rolls,” and/or “1000 rolls” until you have a good idea of the probabilities associated with the values 1, 2, 3, 4, 5, and 6. What is the general shape of the histogram that you obtained?
b Click the button “Show Stats” to see the true values of the probabilities of the six possible values. If Y is the random variable denoting the number of spots on the uppermost face, what is the value for µ = E(Y)? What is the value of σ, the standard deviation of Y? [Hint: These values appear on the “Stat Report” screen.]
c How many times did you simulate rolling the die in part (a)? How do the mean and standard deviation of the values that you simulated compare to the true values µ = E(Y) and σ? Simulate 2000 more rolls and answer the same question.
d Scroll down to the portion of the screen labeled “Rolling 3 Loaded Dice.” Click the button “Roll 1000 Sets” until you have generated 3000 observed values for the random variable $\bar{Y}$.
i What is the general shape of the simulated sampling distribution that you obtained?
ii How does the mean of the 3000 values $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{3000}$ compare to the value of µ = E(Y) computed in part (a)? How does the standard deviation of the 3000 values compare to $\sigma/\sqrt{3}$?
e Scroll down to the portion of the screen labeled “Rolling 12 Loaded Dice.”
i In part (ii), you will use the applet to generate 3000 samples of size 12, compute the mean of each observed sample, and plot these means on a histogram. Before using the applet, predict the approximate value that you will obtain for the mean and standard deviation of the 3000 values of $\bar{y}$ that you are about to generate.
ii Use the applet to generate 3000 samples of size 12 and obtain the histogram associated with the respective sample means, $\bar{y}_i$, i = 1, 2, ..., 3000. What is the general shape of the simulated sampling distribution that you obtained? Compare the shape of this simulated sampling distribution with the one you obtained in part (d).
iii Click the button “Show Stats” to observe the mean and standard deviation of the 3000 values $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{3000}$. How do these values compare to those you predicted in part (i)?

7.5

Applet Exercise What does the sampling distribution of the sample mean look like if samples are taken from an approximately normal distribution? Use the applet Sampling Distribution of the Mean (at www.thomsonedu.com/statistics/wackerly) to complete the following. The population to be sampled is approximately normally distributed with µ = 16.50 and σ = 6.03 (these values are given above the population histogram and denoted M and S, respectively).
a Use the button “Next Obs” to select a single value from the approximately normal population. Click the button four more times to complete a sample of size 5. What value did you obtain for the mean of this sample? Locate this value on the bottom histogram (the histogram for the values of $\bar{Y}$).
b Click the button “Reset” to clear the middle graph. Click the button “Next Obs” five more times to obtain another sample of size 5 from the population. What value did you obtain for the mean of this new sample? Is the value that you obtained equal to the value you obtained in part (a)? Why or why not?
c Use the button “1 Sample” eight more times to obtain a total of ten values of the sample mean. Look at the histogram of these ten means.
i What do you observe?
ii How does the mean of these 10 $\bar{y}$-values compare to the population mean µ?
d Use the button “1 Sample” until you have obtained and plotted 25 realized values for the sample mean $\bar{Y}$, each based on a sample of size 5.
i What do you observe about the shape of the histogram of the 25 values of $\bar{y}_i$, i = 1, 2, . . . , 25?
ii How does the value of the standard deviation of the 25 $\bar{y}$ values compare with the theoretical value for $\sigma_{\bar{Y}}$ obtained in Example 5.27 where we showed that, if $\bar{Y}$ is computed based on a sample of size n, then $V(\bar{Y}) = \sigma^2/n$?
e Click the button “1000 Samples” a few times, observing changes to the histogram as you generate more and more realized values of the sample mean. What do you observe about the shape of the resulting histogram for the simulated sampling distribution of $\bar{Y}$?
f Click the button “Toggle Normal” to overlay (in green) the normal distribution with the same mean and standard deviation as the set of values of $\bar{Y}$ that you previously generated. Does this normal distribution appear to be a good approximation to the sampling distribution of $\bar{Y}$?

7.6

Applet Exercise What is the effect of the sample size on the sampling distribution of $\bar{Y}$? Use the applet SampleSize to complete the following. As in Exercise 7.5, the population to be sampled is approximately normally distributed with µ = 16.50 and σ = 6.03 (these values are given above the population histogram and denoted M and S, respectively).
a Use the up/down arrows in the left “Sample Size” box to select one of the small sample sizes that are available and the arrows in the right “Sample Size” box to select a larger sample size.
b Click the button “1 Sample” a few times. What is similar about the two histograms that you generated? What is different about them?
c Click the button “1000 Samples” a few times and answer the questions in part (b).
d Are the means and standard deviations of the two sampling distributions close to the values that you expected? [Hint: $V(\bar{Y}) = \sigma^2/n$.]
e Click the button “Toggle Normal.” What do you observe about the adequacy of the approximating normal distributions?

7.7

Applet Exercise What does the sampling distribution of the sample variance look like if we sample from a population with an approximately normal distribution? Find out using the applet Sampling Distribution of the Variance (Mound Shaped Population) (at www.thomsonedu.com/statistics/wackerly) to complete the following.
a Click the button “Next Obs” to take a sample of size 1 from the population with distribution represented by the top histogram. The value obtained is plotted on the middle histogram. Click four more times to complete a sample of size 5. The value of the sample variance is computed and given above the middle histogram. Is the value of the sample variance equal to the value of the population variance? Does this surprise you?
b When you completed part (a), the value of the sample variance was also plotted on the lowest histogram. Click the button “Reset” and repeat the process in part (a) to generate a second observed value for the sample variance. Did you obtain the same value as you observed in part (a)? Why or why not?
c Click the button “1 Sample” a few times. You will observe that different samples lead to different values of the sample variance. Click the button “1000 Samples” a few times to quickly generate a histogram of the observed values of the sample variance (based on samples of size 5). What is the mean of the values of the sample variance that you generated? Is this mean close to the value of the population variance?
d In the previous exercises in this section, you obtained simulated sampling distributions for the sample mean. All these sampling distributions were well approximated (for large sample sizes) by a normal distribution. Although the distribution that you obtained is mound-shaped, does the sampling distribution of the sample variance seem to be symmetric (like the normal distribution)?
e Click the button “Toggle Theory” to overlay the theoretical density function for the sampling distribution of the variance of a sample of size 5 from a normally distributed population. Does the theoretical density provide a reasonable approximation to the values represented in the histogram?
f Theorem 7.3, in the next section, states that if a random sample of size n is taken from a normally distributed population, then $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with (n − 1) degrees of freedom. Does this result seem consistent with what you observed in parts (d) and (e)?

7.8

Applet Exercise What is the effect of the sample size on the sampling distribution of $S^2$? Use the applet VarianceSize to complete the following. As in some previous exercises, the population to be sampled is approximately normally distributed with µ = 16.50 and σ = 6.03.
a What is the value of the population variance $\sigma^2$?
b Use the up/down arrows in the left “Sample Size” box to select one of the small sample sizes that are available and the arrows in the right “Sample Size” box to select a larger sample size.
i Click the button “1 Sample” a few times. What is similar about the two histograms that you generated? What is different about them?
ii Click the button “1000 Samples” a few times and answer the questions in part (i).
iii Are the means of the two sampling distributions close to the value of the population variance? Which of the two sampling distributions exhibits smaller variability?
iv Click the button “Toggle Theory.” What do you observe about the adequacy of the approximating theoretical distributions?
c Select sample sizes of 10 and 50 for a new simulation and click the button “1000 Samples” a few times.
i Which of the sampling distributions appear to be more similar to a normal distribution?
ii Refer to Exercise 7.7(f). In Exercise 7.97, you will show that, for a large number of degrees of freedom, the $\chi^2$ distribution can be approximated by a normal distribution. Does this seem reasonable based on your current simulation?

7.2 Sampling Distributions Related to the Normal Distribution

We have already noted that many phenomena observed in the real world have relative frequency distributions that can be modeled adequately by a normal probability distribution. Thus, in many applied problems, it is reasonable to assume that the observable random variables in a random sample, $Y_1, Y_2, \ldots, Y_n$, are independent with the same normal density function. In Exercise 6.43, you established that the statistic $\bar{Y} = (1/n)(Y_1 + Y_2 + \cdots + Y_n)$ actually has a normal distribution. Because this result is used so often in our subsequent discussions, we present it formally in the following theorem.

THEOREM 7.1

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
is normally distributed with mean $\mu_{\bar{Y}} = \mu$ and variance $\sigma_{\bar{Y}}^2 = \sigma^2/n$.

Proof

Because $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, the $Y_i$, i = 1, 2, . . . , n, are independent, normally distributed variables, with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Further,
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{n}(Y_1) + \frac{1}{n}(Y_2) + \cdots + \frac{1}{n}(Y_n) = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n,$$
where $a_i = 1/n$, i = 1, 2, . . . , n.

Thus, $\bar{Y}$ is a linear combination of $Y_1, Y_2, \ldots, Y_n$, and Theorem 6.3 can be applied to conclude that $\bar{Y}$ is normally distributed with
$$E(\bar{Y}) = E\left[\frac{1}{n}(Y_1) + \cdots + \frac{1}{n}(Y_n)\right] = \frac{1}{n}(\mu) + \cdots + \frac{1}{n}(\mu) = \mu$$
and
$$V(\bar{Y}) = V\left[\frac{1}{n}(Y_1) + \cdots + \frac{1}{n}(Y_n)\right] = \frac{1}{n^2}(\sigma^2) + \cdots + \frac{1}{n^2}(\sigma^2) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.$$

That is, the sampling distribution of $\bar{Y}$ is normal with mean $\mu_{\bar{Y}} = \mu$ and variance $\sigma_{\bar{Y}}^2 = \sigma^2/n$.
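
The conclusion of Theorem 7.1 can be checked informally by simulation. The following is a minimal sketch, not part of the original text: the population values (mean 10, standard deviation 2), the sample size, the number of replications, and the helper name `simulate_sample_means` are all illustrative assumptions.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical population values chosen only for illustration.
mu, sigma, n = 10.0, 2.0, 9
means = simulate_sample_means(mu, sigma, n, reps=20000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.3f} (theory: {sigma**2 / n:.3f})")
```

The simulated mean and variance should land close to $\mu$ and $\sigma^2/n$, although a simulation of this kind only illustrates the theorem and does not prove it.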

Notice that the variance of each of the random variables $Y_1, Y_2, \ldots, Y_n$ is $\sigma^2$ and the variance of the sampling distribution of the random variable $\bar{Y}$ is $\sigma^2/n$. In the discussions that follow, we will have occasion to refer to both of these variances. The notation $\sigma^2$ will be retained for the variance of the random variables $Y_1, Y_2, \ldots, Y_n$, and $\sigma_{\bar{Y}}^2$ will be used to denote the variance of the sampling distribution of the random variable $\bar{Y}$. Analogously, $\sigma$ will be retained as the notation for the standard deviation of the $Y_i$’s, and the standard deviation of the sampling distribution of $\bar{Y}$ is denoted $\sigma_{\bar{Y}}$.

Under the conditions of Theorem 7.1, $\bar{Y}$ is normally distributed with mean $\mu_{\bar{Y}} = \mu$ and variance $\sigma_{\bar{Y}}^2 = \sigma^2/n$. It follows that
$$Z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\left(\frac{\bar{Y} - \mu}{\sigma}\right)$$
has a standard normal distribution. We will illustrate the use of Theorem 7.1 in the following example.

EXAMPLE 7.2

A bottling machine can be regulated so that it discharges an average of µ ounces per bottle. It has been observed that the amount of ﬁll dispensed by the machine is normally distributed with σ = 1.0 ounce. A sample of n = 9 ﬁlled bottles is randomly selected from the output of the machine on a given day (all bottled with the same machine setting), and the ounces of ﬁll are measured for each. Find the probability that the sample mean will be within .3 ounce of the true mean µ for the chosen machine setting.

Solution

If $Y_1, Y_2, \ldots, Y_9$ denote the ounces of fill to be observed, then we know that the $Y_i$’s are normally distributed with mean $\mu$ and variance $\sigma^2 = 1$ for i = 1, 2, . . . , 9. Therefore, by Theorem 7.1, $\bar{Y}$ possesses a normal sampling distribution with mean $\mu_{\bar{Y}} = \mu$ and variance $\sigma_{\bar{Y}}^2 = \sigma^2/n = 1/9$. We want to find
$$P(|\bar{Y} - \mu| \le .3) = P[-.3 \le (\bar{Y} - \mu) \le .3] = P\left(-\frac{.3}{\sigma/\sqrt{n}} \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le \frac{.3}{\sigma/\sqrt{n}}\right).$$
Because $(\bar{Y} - \mu_{\bar{Y}})/\sigma_{\bar{Y}} = (\bar{Y} - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution, it follows that
$$P(|\bar{Y} - \mu| \le .3) = P\left(-\frac{.3}{1/\sqrt{9}} \le Z \le \frac{.3}{1/\sqrt{9}}\right) = P(-.9 \le Z \le .9).$$

Using Table 4, Appendix 3, we find P(−.9 ≤ Z ≤ .9) = 1 − 2P(Z > .9) = 1 − 2(.1841) = .6318. Thus, the probability is only .6318 that the sample mean will be within .3 ounce of the true population mean.
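
The table lookup in Example 7.2 can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `prob_mean_within` is a hypothetical helper introduced here, not something defined in the text.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_within(delta, sigma, n):
    """P(|Ybar - mu| <= delta) when Ybar is normal with variance sigma^2 / n."""
    z = delta / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

p = prob_mean_within(delta=0.3, sigma=1.0, n=9)
print(round(p, 4))
```

The result should agree with the table-based value .6318 up to the rounding in Table 4.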

EXAMPLE 7.3

Refer to Example 7.2. How many observations should be included in the sample if we wish $\bar{Y}$ to be within .3 ounce of $\mu$ with probability .95?

Solution

Now we want
$$P(|\bar{Y} - \mu| \le .3) = P[-.3 \le (\bar{Y} - \mu) \le .3] = .95.$$
Dividing each term of the inequality by $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$ (recall that $\sigma = 1$), we have
$$P\left(\frac{-.3}{\sigma/\sqrt{n}} \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le \frac{.3}{\sigma/\sqrt{n}}\right) = P(-.3\sqrt{n} \le Z \le .3\sqrt{n}) = .95.$$
But using Table 4, Appendix 3, we obtain P(−1.96 ≤ Z ≤ 1.96) = .95. It must follow that
$$.3\sqrt{n} = 1.96 \quad \text{or, equivalently,} \quad n = \left(\frac{1.96}{.3}\right)^2 = 42.68.$$

From a practical perspective, it is impossible to take a sample of size 42.68. Our solution indicates that a sample of size 42 is not quite large enough to reach our objective. If n = 43, $P(|\bar{Y} - \mu| \le .3)$ slightly exceeds .95.
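
The sample-size calculation of Example 7.3 can be sketched in code as follows. This is an illustration under the example's assumptions (normal population, known $\sigma$); the helper names `coverage` and `required_n` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

def coverage(delta, sigma, n):
    """P(|Ybar - mu| <= delta) for a sample of size n from N(mu, sigma^2)."""
    z = delta * sqrt(n) / sigma
    return std_normal.cdf(z) - std_normal.cdf(-z)

def required_n(delta, sigma, prob):
    """Smallest integer n with P(|Ybar - mu| <= delta) >= prob."""
    z = std_normal.inv_cdf((1 + prob) / 2)  # roughly 1.96 when prob = .95
    return ceil((z * sigma / delta) ** 2)

n = required_n(delta=0.3, sigma=1.0, prob=0.95)
print(n, round(coverage(0.3, 1.0, n - 1), 4), round(coverage(0.3, 1.0, n), 4))
```

Printing the coverage for n − 1 and n shows directly that the smaller sample falls just short of .95 while the larger one just exceeds it.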

In succeeding chapters we will be interested in statistics that are functions of the squares of the observations in a random sample from a normal population. Theorem 7.2 establishes the sampling distribution of the sum of the squares of independent, standard normal random variables.

THEOREM 7.2

Let $Y_1, Y_2, \ldots, Y_n$ be defined as in Theorem 7.1. Then $Z_i = (Y_i - \mu)/\sigma$ are independent, standard normal random variables, i = 1, 2, . . . , n, and
$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{Y_i - \mu}{\sigma}\right)^2$$
has a $\chi^2$ distribution with n degrees of freedom (df).

Proof

Because $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, Example 6.10 implies that $Z_i = (Y_i - \mu)/\sigma$ has a standard normal distribution for i = 1, 2, . . . , n. Further, the random variables $Z_i$ are independent because the random variables $Y_i$’s are independent, i = 1, 2, . . . , n. The fact that $\sum_{i=1}^{n} Z_i^2$ has a $\chi^2$ distribution with n df follows directly from Theorem 6.4.

From Table 6, Appendix 3, we can find values $\chi^2_\alpha$ so that $P(\chi^2 > \chi^2_\alpha) = \alpha$ for random variables with $\chi^2$ distributions (see Figure 7.2). For example, if the $\chi^2$ random variable of interest has 10 df, Table 6, Appendix 3, can be used to find $\chi^2_{.90}$. To do so, look in the row labeled 10 df and the column headed $\chi^2_{.90}$ and read the value 4.86518. Therefore, if Y has a $\chi^2$ distribution with 10 df, P(Y > 4.86518) = .90. It follows that P(Y ≤ 4.86518) = .10 and that 4.86518 is the .10 quantile, $\phi_{.10}$, of a $\chi^2$ random variable with 10 df. In general, $P(\chi^2 > \chi^2_\alpha) = \alpha$ implies that $P(\chi^2 \le \chi^2_\alpha) = 1 - \alpha$ and that $\chi^2_\alpha = \phi_{1-\alpha}$, the $(1 - \alpha)$ quantile of the $\chi^2$ random variable.

FIGURE 7.2 A $\chi^2$ distribution showing upper-tail area $\alpha$ (horizontal axis $u$, vertical axis $f(u)$; the area $\alpha$ lies to the right of $\chi^2_\alpha$)

Table 6, Appendix 3, contains $\chi^2_\alpha = \phi_{1-\alpha}$ for ten values of $\alpha$ (.005, .01, .025, .05, .1, .90, .95, .975, .99 and .995) for each of 37 different $\chi^2$ distributions (those with degrees of freedom 1, 2, . . . , 30 and 40, 50, 60, 70, 80, 90 and 100). Considerably more information about these distributions, and those associated with degrees of freedom not covered in the table, is provided by available statistical software. If Y has a $\chi^2$ distribution with $\nu$ df, the R (and S-Plus) command pchisq(y0, ν) gives $P(Y \le y_0)$ whereas qchisq(p, ν) yields the pth quantile, the value $\phi_p$ such that $P(Y \le \phi_p) = p$. Probabilities and quantiles associated with $\chi^2$ random variables are also easily obtained using the Chi-Square Probabilities and Quantiles applet (accessible at www.thomsonedu.com/statistics/wackerly). The following example illustrates the combined use of Theorem 7.2 and the $\chi^2$ tables.

EXAMPLE 7.4

If $Z_1, Z_2, \ldots, Z_6$ denotes a random sample from the standard normal distribution, find a number b such that
$$P\left(\sum_{i=1}^{6} Z_i^2 \le b\right) = .95.$$

Solution

By Theorem 7.2, $\sum_{i=1}^{6} Z_i^2$ has a $\chi^2$ distribution with 6 df. Looking at Table 6, Appendix 3, in the row headed 6 df and the column headed $\chi^2_{.05}$, we see the number 12.5916. Thus,
$$P\left(\sum_{i=1}^{6} Z_i^2 > 12.5916\right) = .05, \quad \text{or, equivalently,} \quad P\left(\sum_{i=1}^{6} Z_i^2 \le 12.5916\right) = .95,$$
and b = 12.5916 is the .95 quantile (95th percentile) of the sum of the squares of six independent standard normal random variables.
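
The table values used above can be checked without statistical software when the degrees of freedom are even. The sketch below relies on a closed form that is not given in the text: for an even number of df, $P(X \le x) = 1 - e^{-x/2}\sum_{j=0}^{\nu/2 - 1} (x/2)^j/j!$. The function names are hypothetical, and the quantile is found by simple bisection rather than by any method the text describes.

```python
from math import exp, factorial

def chi2_cdf_even_df(x, df):
    """P(X <= x) for a chi-square variable with an even number of df.

    Uses the closed form 1 - exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which holds only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 0.0
    half = x / 2
    tail = exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))
    return 1.0 - tail

def chi2_quantile_even_df(p, df, tol=1e-10):
    """Solve chi2_cdf_even_df(b, df) = p for b by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, df) < p:  # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b = chi2_quantile_even_df(0.95, df=6)
print(round(b, 4))                              # compare with the table value 12.5916
print(round(chi2_cdf_even_df(4.86518, 10), 4))  # compare with .10 from the text
```

For odd degrees of freedom this shortcut does not apply, and the R commands pchisq and qchisq mentioned in the text are the more general route.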

The $\chi^2$ distribution plays an important role in many inferential procedures. For example, suppose that we wish to make an inference about the population variance $\sigma^2$ based on a random sample $Y_1, Y_2, \ldots, Y_n$ from a normal population. As we will show in Chapter 8, a good estimator of $\sigma^2$ is the sample variance
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

The following theorem gives the probability distribution for a function of the statistic $S^2$.

THEOREM 7.3

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{(n - 1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
has a $\chi^2$ distribution with (n − 1) df. Also, $\bar{Y}$ and $S^2$ are independent random variables.

Proof

The complete proof of this theorem is outlined in Exercise 13.93. To make the general result more plausible, we will consider the case n = 2 and show that $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with 1 df. In the case n = 2, $\bar{Y} = (1/2)(Y_1 + Y_2)$, and, therefore,
$$\begin{aligned}
S^2 &= \frac{1}{2 - 1}\sum_{i=1}^{2} (Y_i - \bar{Y})^2 \\
&= \left[Y_1 - \frac{1}{2}(Y_1 + Y_2)\right]^2 + \left[Y_2 - \frac{1}{2}(Y_1 + Y_2)\right]^2 \\
&= \left[\frac{1}{2}(Y_1 - Y_2)\right]^2 + \left[\frac{1}{2}(Y_2 - Y_1)\right]^2 \\
&= 2\left[\frac{1}{2}(Y_1 - Y_2)\right]^2 = \frac{(Y_1 - Y_2)^2}{2}.
\end{aligned}$$

It follows that, when n = 2,
$$\frac{(n - 1)S^2}{\sigma^2} = \frac{(Y_1 - Y_2)^2}{2\sigma^2} = \left(\frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}\right)^2.$$

We will show that this quantity is equal to the square of a standard normal random variable; that is, it is a $Z^2$, which, as we have already shown in Example 6.11, possesses a $\chi^2$ distribution with 1 df. Because $Y_1 - Y_2$ is a linear combination of independent, normally distributed random variables ($Y_1 - Y_2 = a_1 Y_1 + a_2 Y_2$ with $a_1 = 1$ and $a_2 = -1$), Theorem 6.3 tells us that $Y_1 - Y_2$ has a normal distribution with mean $1\mu - 1\mu = 0$ and variance $(1)^2\sigma^2 + (-1)^2\sigma^2 = 2\sigma^2$. Therefore,
$$Z = \frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}$$
has a standard normal distribution. Because for n = 2
$$\frac{(n - 1)S^2}{\sigma^2} = \left(\frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}\right)^2 = Z^2,$$
it follows that $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with 1 df.

In Example 6.13, we proved that $U_1 = (Y_1 + Y_2)/\sigma$ and $U_2 = (Y_1 - Y_2)/\sigma$ are independent random variables. Notice that, because n = 2,
$$\bar{Y} = \frac{Y_1 + Y_2}{2} = \frac{\sigma U_1}{2} \quad \text{and} \quad S^2 = \frac{(Y_1 - Y_2)^2}{2} = \frac{(\sigma U_2)^2}{2}.$$

Because $\bar{Y}$ is a function of only $U_1$ and $S^2$ is a function of only $U_2$, the independence of $U_1$ and $U_2$ implies the independence of $\bar{Y}$ and $S^2$.
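
For sample sizes beyond n = 2, the first claim of Theorem 7.3 can be made plausible by simulation. The sketch below is only an illustration: the sample size n = 5, the population values, and the helper names are assumptions. It compares the simulated mean and variance of $(n - 1)S^2/\sigma^2$ with the mean $\nu$ and variance $2\nu$ of a $\chi^2$ distribution with $\nu = n - 1$ df.

```python
import random
import statistics

def sample_variance(xs):
    """S^2 with the n - 1 divisor, as defined in the text."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def scaled_sample_variances(mu, sigma, n, reps, seed=1):
    """Return reps simulated values of (n - 1) * S^2 / sigma^2."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append((n - 1) * sample_variance(sample) / sigma**2)
    return values

# Hypothetical settings for illustration only.
n = 5
w = scaled_sample_variances(mu=16.5, sigma=6.03, n=n, reps=20000)

w_mean = statistics.fmean(w)
w_var = statistics.variance(w)
print(f"mean {w_mean:.2f} (chi-square with {n - 1} df has mean {n - 1})")
print(f"variance {w_var:.2f} (chi-square with {n - 1} df has variance {2 * (n - 1)})")
```

Matching two moments is of course weaker than matching the whole distribution; the full argument is the one outlined in Exercise 13.93.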

EXAMPLE 7.5

In Example 7.2, the ounces of ﬁll from the bottling machine are assumed to have a normal distribution with σ 2 = 1. Suppose that we plan to select a random sample of ten bottles and measure the amount of ﬁll in each bottle. If these ten observations are used to calculate S 2 , it might be useful to specify an interval of values that will include S 2 with a high probability. Find numbers b1 and b2 such that P(b1 ≤ S 2 ≤ b2 ) = .90.

Solution

Notice that
$$P(b_1 \le S^2 \le b_2) = P\left[\frac{(n - 1)b_1}{\sigma^2} \le \frac{(n - 1)S^2}{\sigma^2} \le \frac{(n - 1)b_2}{\sigma^2}\right].$$

Because $\sigma^2 = 1$, it follows that $(n - 1)S^2/\sigma^2 = (n - 1)S^2$ has a $\chi^2$ distribution with (n − 1) df. Therefore, we can use Table 6, Appendix 3, to find two numbers $a_1$ and $a_2$ such that
$$P[a_1 \le (n - 1)S^2 \le a_2] = .90.$$
One method of doing this is to find the value of $a_2$ that cuts off an area of .05 in the upper tail and the value of $a_1$ that cuts off .05 in the lower tail (.95 in the upper tail). Because there are n − 1 = 9 df, Table 6, Appendix 3, gives $a_2 = 16.919$ and $a_1 = 3.325$. Consequently, values for $b_1$ and $b_2$ that satisfy our requirements are given by
$$3.325 = a_1 = \frac{(n - 1)b_1}{\sigma^2} = 9b_1 \quad \text{or} \quad b_1 = \frac{3.325}{9} = .369$$
and
$$16.919 = a_2 = \frac{(n - 1)b_2}{\sigma^2} = 9b_2 \quad \text{or} \quad b_2 = \frac{16.919}{9} = 1.880.$$

Thus, if we wish to have an interval that will include $S^2$ with probability .90, one such interval is (.369, 1.880). Notice that this interval is fairly wide.
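
A quick simulation gives a sanity check on the interval found in Example 7.5. This sketch assumes the example's setting (n = 10, $\sigma = 1$); the population mean is set to 0 arbitrarily because $S^2$ does not depend on it, and the function names are hypothetical.

```python
import random

def sample_variance(xs):
    """S^2 with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def fraction_s2_inside(b1, b2, n, sigma, reps, seed=2):
    """Estimate P(b1 <= S^2 <= b2) for samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        if b1 <= sample_variance(sample) <= b2:
            hits += 1
    return hits / reps

# Interval endpoints from Example 7.5: 3.325/9 and 16.919/9.
b1, b2 = 3.325 / 9, 16.919 / 9
estimate = fraction_s2_inside(b1, b2, n=10, sigma=1.0, reps=20000)
print(round(estimate, 3))
```

The estimated proportion should be close to .90, with some simulation error.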

The result given in Theorem 7.1 provides the basis for development of inference-making procedures about the mean $\mu$ of a normal population with known variance $\sigma^2$. In that case, Theorem 7.1 tells us that $\sqrt{n}(\bar{Y} - \mu)/\sigma$ has a standard normal distribution. When $\sigma$ is unknown, it can be estimated by $S = \sqrt{S^2}$, and the quantity
$$\sqrt{n}\left(\frac{\bar{Y} - \mu}{S}\right)$$
provides the basis for developing methods for inferences about $\mu$. We will show that $\sqrt{n}(\bar{Y} - \mu)/S$ has a distribution known as Student’s t distribution with n − 1 df. The general definition of a random variable that possesses a Student’s t distribution (or simply a t distribution) is as follows.

DEFINITION 7.2

Let Z be a standard normal random variable and let W be a $\chi^2$-distributed variable with $\nu$ df. Then, if Z and W are independent,
$$T = \frac{Z}{\sqrt{W/\nu}}$$
is said to have a t distribution with $\nu$ df.

If $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a normal population with mean $\mu$ and variance $\sigma^2$, Theorem 7.1 may be applied to show $Z = \sqrt{n}(\bar{Y} - \mu)/\sigma$ has a standard normal distribution. Theorem 7.3 tells us that $W = (n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution with $\nu = n - 1$ df and that Z and W are independent (because $\bar{Y}$ and $S^2$ are independent). Therefore, by Definition 7.2,
$$T = \frac{Z}{\sqrt{W/\nu}} = \frac{\sqrt{n}(\bar{Y} - \mu)/\sigma}{\sqrt{\left[(n - 1)S^2/\sigma^2\right]/(n - 1)}} = \sqrt{n}\left(\frac{\bar{Y} - \mu}{S}\right)$$
has a t distribution with (n − 1) df.

The equation for the t density function will not be given here, but it can be found in Exercise 7.98 where hints about its derivation are given. Like the standard normal density function, the t density function is symmetric about zero. Further, for $\nu > 1$, E(T) = 0; and for $\nu > 2$, $V(T) = \nu/(\nu - 2)$. These results follow directly from results developed in Exercises 4.111 and 4.112 (see Exercise 7.30). Thus, we see that, if $\nu > 1$, a t-distributed random variable has the same expected value as a standard normal random variable. However, a standard normal random variable always has variance 1 whereas, if $\nu > 2$, the variance of a random variable with a t distribution always exceeds 1.

A standard normal density function and a t density function are sketched in Figure 7.3. Notice that both density functions are symmetric about the origin but that the t density has more probability mass in its tails. Values of $t_\alpha$ such that $P(T > t_\alpha) = \alpha$ are given in Table 5, Appendix 3. For example, if a random variable has a t distribution with 21 df, $t_{.100}$ is found by looking in the row labeled 21 df and the column headed $t_{.100}$. Using Table 5, we see that $t_{.100} = 1.323$ and that for 21 df, P(T > 1.323) = .100. It follows that 1.323 is the .90 quantile (the 90th percentile) of the t distribution with 21 df and in general that $t_\alpha = \phi_{1-\alpha}$, the $(1 - \alpha)$ quantile [the $100(1 - \alpha)$th percentile] of a t-distributed random variable.
FIGURE 7.3 A comparison of the standard normal and t density functions (curves labeled “Standard Normal” and “t”, both centered at 0).

Table 5, Appendix 3, contains $t_\alpha = \phi_{1-\alpha}$ for five values of $\alpha$ (.005, .010, .025, .050 and .100) and 30 different t distributions (those with degrees of freedom 1, 2, . . . , 29 and ∞). Considerably more information about these distributions, and those associated with degrees of freedom not covered in the table, is provided by available statistical software. If Y has a t distribution with $\nu$ df, the R (and S-Plus) command pt(y0, ν) gives $P(Y \le y_0)$ whereas qt(p, ν) yields the pth quantile, the value of $\phi_p$ such that $P(Y \le \phi_p) = p$. Probabilities and quantiles associated with t-distributed random variables are also easily obtained using the Student’s t Probabilities and Quantiles applet (at www.thomsonedu.com/statistics/wackerly).
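
Definition 7.2 translates directly into a simulation recipe: draw Z, draw an independent W as a sum of squared standard normals (Theorem 7.2), and form $Z/\sqrt{W/\nu}$. The following sketch, with the hypothetical helper `simulate_t`, compares the simulated upper-tail area beyond 1.323 and the simulated variance for 21 df with the values quoted in the text.

```python
import random
import statistics
from math import sqrt

def simulate_t(df, reps, seed=3):
    """Simulate T = Z / sqrt(W / df) with Z standard normal and W chi-square(df)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        # W is built as a sum of df squared standard normals (Theorem 7.2),
        # generated independently of Z.
        w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        draws.append(z / sqrt(w / df))
    return draws

df = 21
t_draws = simulate_t(df, reps=40000)

upper_tail = sum(t > 1.323 for t in t_draws) / len(t_draws)
t_var = statistics.variance(t_draws)
print(f"P(T > 1.323) is roughly {upper_tail:.3f}; Table 5 gives .100")
print(f"variance is roughly {t_var:.3f}; nu/(nu - 2) = {df / (df - 2):.3f}")
```

Simulation is used here only because the Python standard library has no t distribution function; the R commands pt and qt named in the text give the values directly.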

EXAMPLE 7.6

The tensile strength for a type of wire is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$. Six pieces of wire were randomly selected from a large roll; $Y_i$, the tensile strength for portion i, is measured for i = 1, 2, . . . , 6. The population mean $\mu$ and variance $\sigma^2$ can be estimated by $\bar{Y}$ and $S^2$, respectively. Because $\sigma_{\bar{Y}}^2 = \sigma^2/n$, it follows that $\sigma_{\bar{Y}}^2$ can be estimated by $S^2/n$. Find the approximate probability that $\bar{Y}$ will be within $2S/\sqrt{n}$ of the true population mean $\mu$.

Solution

We want to find
$$P\left[-\frac{2S}{\sqrt{n}} \le (\bar{Y} - \mu) \le \frac{2S}{\sqrt{n}}\right] = P\left[-2 \le \sqrt{n}\left(\frac{\bar{Y} - \mu}{S}\right) \le 2\right] = P(-2 \le T \le 2),$$
where T has a t distribution with, in this case, n − 1 = 5 df. Looking at Table 5, Appendix 3, we see that the upper-tail area to the right of 2.015 is .05. Hence, P(−2.015 ≤ T ≤ 2.015) = .90, and the probability that $\bar{Y}$ will be within 2 estimated standard deviations of $\mu$ is slightly less than .90. In Exercise 7.24, the exact value for P(−2 ≤ T ≤ 2) will be found using the Student’s t Probabilities and Quantiles applet available at www.thomsonedu.com/statistics/wackerly.

Notice that, if $\sigma^2$ were known, the probability that $\bar{Y}$ will fall within $2\sigma_{\bar{Y}}$ of $\mu$ would be given by
$$P\left[-2\left(\frac{\sigma}{\sqrt{n}}\right) \le (\bar{Y} - \mu) \le 2\left(\frac{\sigma}{\sqrt{n}}\right)\right] = P\left[-2 \le \sqrt{n}\left(\frac{\bar{Y} - \mu}{\sigma}\right) \le 2\right] = P(-2 \le Z \le 2) = .9544.$$
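
The contrast drawn in Example 7.6 between estimating $\sigma$ and knowing it can be illustrated by simulation. In the sketch below, the population mean and standard deviation (50 and 4) are arbitrary assumptions, since the probability in question does not depend on them, and `coverage_within_2_se` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_within_2_se(n, reps, mu=50.0, sigma=4.0, seed=4):
    """Estimate P(|Ybar - mu| <= 2 S / sqrt(n)) for normal samples of size n.

    mu and sigma are arbitrary illustrative values; the probability
    does not depend on them.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        s = sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
        if abs(ybar - mu) <= 2 * s / sqrt(n):
            hits += 1
    return hits / reps

t_based = coverage_within_2_se(n=6, reps=20000)
z_based = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(f"sigma estimated by S (5 df): about {t_based:.3f}")
print(f"sigma known:                 {z_based:.4f}")
```

The first figure should come out a little below .90, in line with the example, while the second is the normal probability, which agrees with the text's .9544 up to table rounding.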

Suppose that we want to compare the variances of two normal populations based on information contained in independent random samples from the two populations. Samples of sizes $n_1$ and $n_2$ are taken from the two populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. If we calculate $S_1^2$ from the observations in sample 1, then $S_1^2$ estimates $\sigma_1^2$. Similarly, $S_2^2$, calculated from the observations in the second sample, estimates $\sigma_2^2$. Thus, it seems intuitive that the ratio $S_1^2/S_2^2$ could be used to make inferences about the relative magnitudes of $\sigma_1^2$ and $\sigma_2^2$. If we divide each $S_i^2$ by $\sigma_i^2$, then the resulting ratio
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \left(\frac{\sigma_2^2}{\sigma_1^2}\right)\left(\frac{S_1^2}{S_2^2}\right)$$
has an F distribution with $(n_1 - 1)$ numerator degrees of freedom and $(n_2 - 1)$ denominator degrees of freedom. The general definition of a random variable that possesses an F distribution appears next.

DEFINITION 7.3

Let $W_1$ and $W_2$ be independent $\chi^2$-distributed random variables with $\nu_1$ and $\nu_2$ df, respectively. Then
$$F = \frac{W_1/\nu_1}{W_2/\nu_2}$$
is said to have an F distribution with $\nu_1$ numerator degrees of freedom and $\nu_2$ denominator degrees of freedom.

The density function for an F-distributed random variable is given in Exercise 7.99 where the method for its derivation is outlined. It can be shown (see Exercise 7.34) that if F possesses an F distribution with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom, then $E(F) = \nu_2/(\nu_2 - 2)$ if $\nu_2 > 2$. Also, if $\nu_2 > 4$, then $V(F) = [2\nu_2^2(\nu_1 + \nu_2 - 2)]/[\nu_1(\nu_2 - 2)^2(\nu_2 - 4)]$. Notice that the mean of an F-distributed random variable depends only on the number of denominator degrees of freedom, $\nu_2$.

Considering once again two independent random samples from normal distributions, we know that $W_1 = (n_1 - 1)S_1^2/\sigma_1^2$ and $W_2 = (n_2 - 1)S_2^2/\sigma_2^2$ have independent $\chi^2$ distributions with $\nu_1 = (n_1 - 1)$ and $\nu_2 = (n_2 - 1)$ df, respectively. Thus, Definition 7.3 implies that
$$F = \frac{W_1/\nu_1}{W_2/\nu_2} = \frac{\left[(n_1 - 1)S_1^2/\sigma_1^2\right]/(n_1 - 1)}{\left[(n_2 - 1)S_2^2/\sigma_2^2\right]/(n_2 - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with $(n_1 - 1)$ numerator degrees of freedom and $(n_2 - 1)$ denominator degrees of freedom.

A typical F density function is sketched in Figure 7.4. Values of $F_\alpha$ such that $P(F > F_\alpha) = \alpha$ are given in Table 7, Appendix 3, for values of $\alpha$ = .100, .050, .025, .010, and .005. In Table 7, the column headings are the numerator degrees of freedom whereas the denominator degrees of freedom are given in the main-row headings. Opposite each denominator degrees of freedom (row heading), the values of $\alpha$ = .100, .050, .025, .010, and .005 appear. For example, if the F variable of interest has 5 numerator degrees of freedom and 7 denominator degrees of freedom, then $F_{.100} = 2.88$, $F_{.050} = 3.97$, $F_{.025} = 5.29$, $F_{.010} = 7.46$, and $F_{.005} = 9.52$. Thus, if F has an F distribution with 5 numerator degrees of freedom and 7 denominator degrees of freedom, then P(F > 7.46) = .01. It follows that 7.46 is the .99 quantile of the F distribution with 5 numerator degrees of freedom and 7 denominator degrees of freedom. In general, $F_\alpha = \phi_{1-\alpha}$, the $(1 - \alpha)$ quantile [the $100(1 - \alpha)$th percentile] of an F-distributed random variable.

FIGURE 7.4 A typical F probability density function (horizontal axis $u$, vertical axis $f(u)$; the area $\alpha$ lies to the right of $F_\alpha$)

For the five previously mentioned values of $\alpha$, Table 7, Appendix 3 gives the values of $F_\alpha$ for 646 different F distributions (those with numerator degrees of freedom 1, 2, . . . , 10, 12, 15, 20, 24, 30, 40, 60, 120, and ∞, and denominator degrees of freedom 1, 2, . . . , 30, 40, 60, 120, and ∞). Considerably more information about these distributions, and those associated with degrees of freedom not covered in the table, is provided by available statistical software. If Y has an F distribution with $\nu_1$ numerator degrees of freedom and $\nu_2$ denominator degrees of freedom, the R (and S-Plus) command pf(y0, ν1, ν2) gives $P(Y \le y_0)$ whereas qf(p, ν1, ν2) yields the pth quantile, the value of $\phi_p$ such that $P(Y \le \phi_p) = p$. Probabilities and quantiles associated with F-distributed random variables are also easily obtained using the F-Ratio Probabilities and Quantiles applet (at www.thomsonedu.com/statistics/wackerly).
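
Definition 7.3 can likewise be turned into a short simulation. The sketch below builds each $\chi^2$ variable as a sum of squared standard normals (Theorem 7.2) and compares the simulated mean and the upper-tail area beyond 7.46 with the values stated above for 5 and 7 df. The helper names are hypothetical, and because the F distribution has a heavy right tail, the simulated mean is a noisier estimate than the tail area.

```python
import random
import statistics

def chi_square_draw(rng, df):
    """One chi-square(df) draw as a sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def simulate_f(df1, df2, reps, seed=5):
    """Simulate F = (W1/df1) / (W2/df2) with independent chi-square W1, W2."""
    rng = random.Random(seed)
    return [
        (chi_square_draw(rng, df1) / df1) / (chi_square_draw(rng, df2) / df2)
        for _ in range(reps)
    ]

df1, df2 = 5, 7
f_draws = simulate_f(df1, df2, reps=40000)

f_mean = statistics.fmean(f_draws)
tail_01 = sum(f > 7.46 for f in f_draws) / len(f_draws)
print(f"mean is roughly {f_mean:.3f}; nu2/(nu2 - 2) = {df2 / (df2 - 2):.3f}")
print(f"P(F > 7.46) is roughly {tail_01:.4f}; Table 7 gives .010")
```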

EXAMPLE 7.7

If we take independent samples of size $n_1 = 6$ and $n_2 = 10$ from two normal populations with equal population variances, find the number b such that
$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = .95.$$

Solution

Because $n_1 = 6$, $n_2 = 10$, and the population variances are equal, then
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2}{S_2^2}$$
has an F distribution with $\nu_1 = n_1 - 1 = 5$ numerator degrees of freedom and $\nu_2 = n_2 - 1 = 9$ denominator degrees of freedom. Also,
$$P\left(\frac{S_1^2}{S_2^2} \le b\right) = 1 - P\left(\frac{S_1^2}{S_2^2} > b\right).$$
Therefore, we want to find the number b cutting off an upper-tail area of .05 under the F density function with 5 numerator degrees of freedom and 9 denominator degrees of freedom. Looking in column 5 and row 9 in Table 7, Appendix 3, we see that the appropriate value of b is 3.48.

Even when the population variances are equal, the probability that the ratio of the sample variances exceeds 3.48 is still .05 (assuming sample sizes of $n_1 = 6$ and $n_2 = 10$).
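
The answer to Example 7.7 can be checked by simulating the two sample variances themselves. This sketch assumes a common standard deviation of 1 and population means of 0, both arbitrary choices that do not affect the ratio's distribution; the function names are hypothetical.

```python
import random

def sample_variance(xs):
    """S^2 with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def variance_ratio_below(b, n1, n2, reps, sigma=1.0, seed=6):
    """Estimate P(S1^2 / S2^2 <= b) for two independent normal samples
    with a common standard deviation sigma (an arbitrary illustrative value)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, sigma) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, sigma) for _ in range(n2)])
        if s1_sq / s2_sq <= b:
            hits += 1
    return hits / reps

estimate = variance_ratio_below(b=3.48, n1=6, n2=10, reps=20000)
print(round(estimate, 3))
```

The estimated proportion should be close to .95, subject to simulation error.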

This section has been devoted to developing the sampling distributions of various statistics calculated by using the observations in a random sample from a normal population (or independent random samples from two normal populations). In particular, if $Y_1, Y_2, \ldots, Y_n$ represents a random sample from a normal population with mean $\mu$ and variance $\sigma^2$, we have seen that $\sqrt{n}(\bar{Y} - \mu)/\sigma$ has a standard normal distribution. Also, $(n - 1)S^2/\sigma^2$ has a $\chi^2$ distribution, and $\sqrt{n}(\bar{Y} - \mu)/S$ has a t distribution (both with n − 1 df). If we have two independent random samples from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, then $F = (S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ has an F distribution. These sampling distributions will enable us to evaluate the properties of inferential procedures in later chapters. In the next section, we discuss approximations to certain sampling distributions. These approximations can be very useful when the exact form of the sampling distribution is unknown or when it is difficult or tedious to use the exact sampling distribution to compute probabilities.

Exercises

7.9

Refer to Example 7.2. The amount of fill dispensed by a bottling machine is normally distributed with σ = 1 ounce. If n = 9 bottles are randomly selected from the output of the machine, we found that the probability that the sample mean will be within .3 ounce of the true mean is .6318. Suppose that $\bar{Y}$ is to be computed using a sample of size n.
a If n = 16, what is $P(|\bar{Y} - \mu| \le .3)$?
b Find $P(|\bar{Y} - \mu| \le .3)$ when $\bar{Y}$ is to be computed using samples of sizes n = 25, n = 36, n = 49, and n = 64.
c What pattern do you observe among the values for $P(|\bar{Y} - \mu| \le .3)$ that you observed for the various values of n?
d Do the results that you obtained in part (b) seem to be consistent with the result obtained in Example 7.3?

7.10

Refer to Exercise 7.9. Assume now that the amount of fill dispensed by the bottling machine is normally distributed with σ = 2 ounces.
a If n = 9 bottles are randomly selected from the output of the machine, what is $P(|\bar{Y} - \mu| \le .3)$? Compare this with the answer obtained in Example 7.2.
b Find $P(|\bar{Y} - \mu| \le .3)$ when $\bar{Y}$ is to be computed using samples of sizes n = 25, n = 36, n = 49, and n = 64.
c What pattern do you observe among the values for $P(|\bar{Y} - \mu| \le .3)$ that you observed for the various values of n?
d How do the respective probabilities obtained in this problem (where σ = 2) compare to those obtained in Exercise 7.9 (where σ = 1)?

7.11

A forester studying the effects of fertilization on certain pine forests in the Southeast is interested in estimating the average basal area of pine trees. In studying basal areas of similar trees for many years, he has discovered that these measurements (in square inches) are normally distributed with standard deviation approximately 4 square inches. If the forester samples n = 9 trees, find the probability that the sample mean will be within 2 square inches of the population mean.

7.12

Suppose the forester in Exercise 7.11 would like the sample mean to be within 1 square inch of the population mean, with probability .90. How many trees must he measure in order to ensure this degree of accuracy?

7.13

The Environmental Protection Agency is concerned with the problem of setting criteria for the amounts of certain toxic chemicals to be allowed in freshwater lakes and rivers. A common measure of toxicity for any pollutant is the concentration of the pollutant that will kill half of the test species in a given amount of time (usually 96 hours for ﬁsh species). This measure is called LC50 (lethal concentration killing 50% of the test species). In many studies, the values contained in the natural logarithm of LC50 measurements are normally distributed, and, hence, the analysis is based on ln(LC50) data. Studies of the effects of copper on a certain species of ﬁsh (say, species A) show the variance of ln(LC50) measurements to be around .4 with concentration measurements in milligrams per liter. If n = 10 studies on LC50 for copper are to be completed, ﬁnd the probability that the sample mean of ln(LC50) will differ from the true population mean by no more than .5.

7.14

If in Exercise 7.13 we want the sample mean to differ from the population mean by no more than .5 with probability .95, how many tests should be run?

7.15

Suppose that $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples, with the variables $X_i$ normally distributed with mean $\mu_1$ and variance $\sigma_1^2$ and the variables $Y_i$ normally distributed with mean $\mu_2$ and variance $\sigma_2^2$. The difference between the sample means, $\bar{X} - \bar{Y}$, is then a linear combination of m + n normally distributed random variables and, by Theorem 6.3, is itself normally distributed. a Find $E(\bar{X} - \bar{Y})$. b Find $V(\bar{X} - \bar{Y})$. c Suppose that $\sigma_1^2 = 2$, $\sigma_2^2 = 2.5$, and m = n. Find the sample sizes so that $(\bar{X} - \bar{Y})$ will be within 1 unit of $(\mu_1 - \mu_2)$ with probability .95.

7.16

Referring to Exercise 7.13, suppose that the effects of copper on a second species (say, species B) of ﬁsh show the variance of ln(LC50) measurements to be .8. If the population means of ln(LC50) for the two species are equal, ﬁnd the probability that, with random samples of ten measurements from each species, the sample mean for species A exceeds the sample mean for species B by at least 1 unit.

7.17

Applet Exercise Refer to Example 7.4. Use the applet Chi-Square Probabilities and Quantiles to find $P\left(\sum_{i=1}^{6} Z_i^2 \le 6\right)$. (Recall that $\sum_{i=1}^{6} Z_i^2$ has a $\chi^2$ distribution with 6 df.)

7.18

Applet Exercise Refer to Example 7.5. If σ 2 = 1 and n = 10, use the applet Chi-Square Probabilities and Quantiles to ﬁnd P(S 2 ≥ 3). (Recall that, under the conditions previously given, 9S 2 has a χ 2 distribution with 9 df.)

7.19

Ammeters produced by a manufacturer are marketed under the speciﬁcation that the standard deviation of gauge readings is no larger than .2 amp. One of these ammeters was used to make ten independent readings on a test circuit with constant current. If the sample variance of these ten measurements is .065 and it is reasonable to assume that the readings are normally distributed, do the results suggest that the ammeter used does not meet the marketing speciﬁcations? [Hint: Find the approximate probability that the sample variance will exceed .065 if the true population variance is .04.]

Chapter 7

Sampling Distributions and the Central Limit Theorem

7.20

a If U has a χ 2 distribution with ν df, ﬁnd E(U ) and V (U ). b Using the results of Theorem 7.3, ﬁnd E(S 2 ) and V (S 2 ) when Y1 , Y2 , . . . , Yn is a random sample from a normal distribution with mean µ and variance σ 2 .

7.21

Refer to Exercise 7.13. Suppose that n = 20 observations are to be taken on ln(LC50) measurements and that σ 2 = 1.4. Let S 2 denote the sample variance of the 20 measurements. a Find a number b such that P(S 2 ≤ b) = .975. b Find a number a such that P(a ≤ S 2 ) = .975. c If a and b are as in parts (a) and (b), what is P(a ≤ S 2 ≤ b)?

7.22

Applet Exercise As we stated in Definition 4.10, a random variable Y has a $\chi^2$ distribution with $\nu$ df if and only if Y has a gamma distribution with $\alpha = \nu/2$ and $\beta = 2$. a Use the applet Comparison of Gamma Density Functions to graph $\chi^2$ densities with 10, 40, and 80 df. b What do you notice about the shapes of these density functions? Which of them is most symmetric? c In Exercise 7.97, you will show that for large values of $\nu$, a $\chi^2$ random variable has a distribution that can be approximated by a normal distribution with $\mu = \nu$ and $\sigma = \sqrt{2\nu}$. How do the mean and standard deviation of the approximating normal distribution compare to the mean and standard deviation of the $\chi^2$ random variable Y? d Refer to the graphs of the $\chi^2$ densities that you obtained in part (a). In part (c), we stated that, if the number of degrees of freedom is large, the $\chi^2$ distribution can be approximated with a normal distribution. Does this surprise you? Why?

7.23

Applet Exercise a Use the applet Chi-Square Probabilities and Quantiles to ﬁnd P[Y > E(Y )] when Y has χ 2 distributions with 10, 40, and 80 df. b What did you notice about P[Y > E(Y )] as the number of degrees of freedom increases as in part (a)? c How does what you observed in part (b) relate to the shapes of the χ 2 densities that you obtained in Exercise 7.22?

7.24

Applet Exercise Refer to Example 7.6. Suppose that T has a t distribution with 5 df. a Use the applet Student’s t Probabilities and Quantiles to find the exact probability that T is greater than 2. b Use the applet Student’s t Probabilities and Quantiles to find the exact probability that T is less than −2. c Use the applet Student’s t Probabilities and Quantiles to find the exact probability that T is between −2 and 2. d Your answer to part (c) is considerably less than 0.9544 = P(−2 ≤ Z ≤ 2). Refer to Figure 7.3 and explain why this is as expected.

7.25

Applet Exercise Suppose that T is a t-distributed random variable. a If T has 5 df, use Table 5, Appendix 3, to find $t_{.10}$, the value such that $P(T > t_{.10}) = .10$. Find $t_{.10}$ using the applet Student’s t Probabilities and Quantiles. b Refer to part (a). What quantile does $t_{.10}$ correspond to? Which percentile? c Use the applet Student’s t Probabilities and Quantiles to find the value of $t_{.10}$ for t distributions with 30, 60, and 120 df. d When Z has a standard normal distribution, P(Z > 1.282) = .10 and $z_{.10} = 1.282$. What property of the t distribution (when compared to the standard normal distribution) explains the fact that all of the values obtained in part (c) are larger than $z_{.10} = 1.282$? e What do you observe about the relative sizes of the values of $t_{.10}$ for t distributions with 30, 60, and 120 df? Guess what $t_{.10}$ “converges to” as the number of degrees of freedom gets large. [Hint: Look at the row labeled ∞ in Table 5, Appendix 3.]

7.26

Refer to Exercise 7.11. Suppose that in the forest fertilization problem the population standard deviation of basal areas is not known and must be estimated from the sample. If a random sample of n = 9 basal areas is to be measured, ﬁnd two statistics g1 and g2 such that P[g1 ≤ (Y − µ) ≤ g2 ] = .90.

7.27

Applet Exercise Refer to Example 7.7. If we take independent samples of sizes $n_1 = 6$ and $n_2 = 10$ from two normal populations with equal population variances, use the applet F-Ratio Probabilities and Quantiles to find a $P(S_1^2/S_2^2 > 2)$. b $P(S_1^2/S_2^2 < 0.5)$. c the probability that one of the sample variances is at least twice as big as the other.

7.28

Applet Exercise Suppose that Y has an F distribution with ν1 = 4 numerator degrees of freedom and ν2 = 6 denominator degrees of freedom. a Use Table 7, Appendix 3, to ﬁnd F.025 . Also ﬁnd F.025 using the applet F-Ratio Probabilities and Quantiles. b Refer to part (a). What quantile of Y does F.025 correspond to? What percentile? c Refer to parts (a) and (b). Use the applet F-Ratio Probabilities and Quantiles to ﬁnd F.975 , the .025 quantile (2.5th percentile) of the distribution of Y . d If U has an F distribution with ν1 = 6 numerator and ν2 = 4 denominator degrees of freedom, use Table 7, Appendix 3, or the F-Ratio Probabilities and Quantiles applet to ﬁnd F.025 . e In Exercise 7.29, you will show that if Y is a random variable that has an F distribution with ν1 numerator and ν2 denominator degrees of freedom, then U = 1/Y has an F distribution with ν2 numerator and ν1 denominator degrees of freedom. Does this result explain the relationship between F.975 from part (c) (4 numerator and 6 denominator degrees of freedom) and F.025 from part (d) (6 numerator and 4 denominator degrees of freedom)? What is this relationship?

7.29

If Y is a random variable that has an F distribution with ν1 numerator and ν2 denominator degrees of freedom, show that U = 1/Y has an F distribution with ν2 numerator and ν1 denominator degrees of freedom.

*7.30

Suppose that Z has a standard normal distribution and that Y is an independent $\chi^2$-distributed random variable with $\nu$ df. Then, according to Definition 7.2,

\[ T = \frac{Z}{\sqrt{Y/\nu}} \]

has a t distribution with $\nu$ df.¹

a If Z has a standard normal distribution, give $E(Z)$ and $E(Z^2)$. [Hint: For any random variable, $E(Z^2) = V(Z) + (E(Z))^2$.]

b According to the result derived in Exercise 4.112(a), if Y has a $\chi^2$ distribution with $\nu$ df, then

\[ E(Y^a) = \frac{\Gamma([\nu/2] + a)}{\Gamma(\nu/2)}\, 2^a, \quad \text{if } \nu > -2a. \]

Use this result, the result from part (a), and the structure of T to show the following. [Hint: Recall the independence of Z and Y.] i $E(T) = 0$, if $\nu > 1$. ii $V(T) = \nu/(\nu - 2)$, if $\nu > 2$.

¹ Exercises preceded by an asterisk are optional.

7.31

a Use Table 7, Appendix 3, to find $F_{.01}$ for F-distributed random variables, all with 4 numerator degrees of freedom, but with denominator degrees of freedom of 10, 15, 30, 60, 120, and ∞. b Refer to part (a). What do you observe about the values of $F_{.01}$ as the number of denominator degrees of freedom increases? c What is $\chi^2_{.01}$ for a $\chi^2$-distributed random variable with 4 df? d Divide the value of $\chi^2_{.01}$ (4 df) from part (c) by the value of $F_{.01}$ (numerator df = 4; denominator df = ∞). Explain why the value that you obtained is a reasonable value for the ratio. [Hint: Consider the definition of an F-distributed random variable given in Definition 7.3.]

7.32

Applet Exercise a Find $t_{.05}$ for a t-distributed random variable with 5 df. b Refer to part (a). What is $P(T^2 > t_{.05}^2)$? c Find $F_{.10}$ for an F-distributed random variable with 1 numerator degree of freedom and 5 denominator degrees of freedom. d Compare the value of $F_{.10}$ found in part (c) with the value of $t_{.05}^2$ from parts (a) and (b). e In Exercise 7.33, you will show that if T has a t distribution with $\nu$ df, then $U = T^2$ has an F distribution with 1 numerator degree of freedom and $\nu$ denominator degrees of freedom. How does this explain the relationship between the values of $F_{.10}$ (1 num. df, 5 denom df) and $t_{.05}^2$ (5 df) that you observed in part (d)?

7.33

Use the structures of T and F given in Deﬁnitions 7.2 and 7.3, respectively, to argue that if T has a t distribution with ν df, then U = T 2 has an F distribution with 1 numerator degree of freedom and ν denominator degrees of freedom.

*7.34

Suppose that $W_1$ and $W_2$ are independent $\chi^2$-distributed random variables with $\nu_1$ and $\nu_2$ df, respectively. According to Definition 7.3,

\[ F = \frac{W_1/\nu_1}{W_2/\nu_2} \]

has an F distribution with $\nu_1$ and $\nu_2$ numerator and denominator degrees of freedom, respectively. Use the preceding structure of F, the independence of $W_1$ and $W_2$, and the result summarized in Exercise 7.30(b) to show

a $E(F) = \nu_2/(\nu_2 - 2)$, if $\nu_2 > 2$.
b $V(F) = [2\nu_2^2(\nu_1 + \nu_2 - 2)]/[\nu_1(\nu_2 - 2)^2(\nu_2 - 4)]$, if $\nu_2 > 4$.

7.35

Refer to Exercise 7.34. Suppose that F has an F distribution with $\nu_1 = 50$ numerator degrees of freedom and $\nu_2 = 70$ denominator degrees of freedom. Notice that Table 7, Appendix 3, does not contain entries for 50 numerator degrees of freedom and 70 denominator degrees of freedom. a What is E(F)? b Give V(F). c Is it likely that F will exceed 3? [Hint: Use Tchebysheff’s theorem.]

*7.36

Let $S_1^2$ denote the sample variance for a random sample of ten ln(LC50) values for copper and let $S_2^2$ denote the sample variance for a random sample of eight ln(LC50) values for lead, both samples using the same species of fish. The population variance for measurements on copper is assumed to be twice the corresponding population variance for measurements on lead. Assume $S_1^2$ to be independent of $S_2^2$.

a Find a number b such that

\[ P\left(\frac{S_1^2}{S_2^2} \le b\right) = .95. \]

b Find a number a such that

\[ P\left(a \le \frac{S_1^2}{S_2^2}\right) = .95. \]

[Hint: Use the result of Exercise 7.29 and notice that $P(U_1/U_2 \le k) = P(U_2/U_1 \ge 1/k)$.]

c If a and b are as in parts (a) and (b), find

\[ P\left(a \le \frac{S_1^2}{S_2^2} \le b\right). \]

7.37

Let $Y_1, Y_2, \ldots, Y_5$ be a random sample of size 5 from a normal population with mean 0 and variance 1 and let $\bar{Y} = (1/5)\sum_{i=1}^{5} Y_i$. Let $Y_6$ be another independent observation from the same population. What is the distribution of a $W = \sum_{i=1}^{5} Y_i^2$? Why? b $U = \sum_{i=1}^{5} (Y_i - \bar{Y})^2$? Why? c $\sum_{i=1}^{5} (Y_i - \bar{Y})^2 + Y_6^2$? Why?

7.38

Suppose that $Y_1, Y_2, \ldots, Y_5$, $Y_6$, $\bar{Y}$, W, and U are as defined in Exercise 7.37. What is the distribution of a $\sqrt{5}\,Y_6/\sqrt{W}$? Why? b $2Y_6/\sqrt{U}$? Why? c $2(5\bar{Y}^2 + Y_6^2)/U$? Why?

*7.39

Suppose that independent samples (of sizes $n_i$) are taken from each of k populations and that population i is normally distributed with mean $\mu_i$ and variance $\sigma^2$, i = 1, 2, . . . , k. That is, all populations are normally distributed with the same variance but with (possibly) different means. Let $\bar{X}_i$ and $S_i^2$, i = 1, 2, . . . , k be the respective sample means and variances. Let $\theta = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$, where $c_1, c_2, \ldots, c_k$ are given constants.

a Give the distribution of $\hat{\theta} = c_1\bar{X}_1 + c_2\bar{X}_2 + \cdots + c_k\bar{X}_k$. Provide reasons for any claims that you make.

b Give the distribution of

\[ \frac{\text{SSE}}{\sigma^2}, \quad \text{where } \text{SSE} = \sum_{i=1}^{k} (n_i - 1)S_i^2. \]

Provide reasons for any claims that you make.

c Give the distribution of

\[ \frac{\hat{\theta} - \theta}{\sqrt{\left(\dfrac{c_1^2}{n_1} + \dfrac{c_2^2}{n_2} + \cdots + \dfrac{c_k^2}{n_k}\right)\text{MSE}}}, \quad \text{where } \text{MSE} = \frac{\text{SSE}}{n_1 + n_2 + \cdots + n_k - k}. \]

Provide reasons for any claims that you make.

7.3 The Central Limit Theorem

In Chapter 5, we showed that if $Y_1, Y_2, \ldots, Y_n$ represents a random sample from any distribution with mean $\mu$ and variance $\sigma^2$, then $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$. In this section, we will develop an approximation for the sampling distribution of $\bar{Y}$ that can be used regardless of the distribution of the population from which the sample is taken.

If we sample from a normal population, Theorem 7.1 tells us that $\bar{Y}$ has a normal sampling distribution. But what can we say about the sampling distribution of $\bar{Y}$ if the variables $Y_i$ are not normally distributed? Fortunately, $\bar{Y}$ will have a sampling distribution that is approximately normal if the sample size is large. The formal statement of this result is called the central limit theorem. Before we state this theorem, however, we will look at some empirical investigations that demonstrate the sampling distribution of $\bar{Y}$.

A computer was used to generate random samples of size n from an exponential density function with mean 10, that is, from a population with density

\[ f(y) = \begin{cases} (1/10)e^{-y/10}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

A graph of this density function is given in Figure 7.5. The sample mean was computed for each sample, and the relative frequency histogram for the values of the sample means for 1000 samples, each of size n = 5, is shown in Figure 7.6. Notice that Figure 7.6 portrays a histogram that is roughly mound-shaped, but the histogram is slightly skewed. Figure 7.7 is a graph of a similar relative frequency histogram of the values of the sample mean for 1000 samples, each of size n = 25. In this case, Figure 7.7 shows a mound-shaped and nearly symmetric histogram, which can be approximated quite closely with a normal density function.

Figure 7.5 An exponential density function

Figure 7.6 Relative frequency histogram: sample means for 1000 samples (n = 5) from an exponential distribution

Figure 7.7 Relative frequency histogram: sample means for 1000 samples (n = 25) from an exponential distribution
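The simulation study described above can be repeated with a few lines of code. The following is a minimal sketch under stated assumptions, not the program used to produce the figures: the function name `exponential_sample_means`, the seed, and the use of Python's standard `random` module are illustrative choices. It generates 1000 sample means from an exponential population with mean 10 for n = 5 and n = 25 and reports their average and variance next to the values µ = 10 and σ²/n = 100/n.

```python
import random
import statistics


def exponential_sample_means(n, reps=1000, mean=10.0, seed=0):
    """Return `reps` sample means, each from n exponential observations.

    The seed and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean  # random.expovariate is parameterized by the rate 1/mean
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(reps)
    ]


if __name__ == "__main__":
    for n in (5, 25):
        means = exponential_sample_means(n)
        print(f"n = {n}: average = {statistics.fmean(means):.2f}, "
              f"variance = {statistics.variance(means):.2f}, "
              f"theory = 10 and {100 / n:.0f}")
```

A histogram of the returned list for each n should show the same qualitative behavior as the figures: some right skew for n = 5 and a nearly symmetric shape for n = 25.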

Recall from Chapter 5 that $E(\bar{Y}) = \mu_{\bar{Y}} = \mu$ and $V(\bar{Y}) = \sigma_{\bar{Y}}^2 = \sigma^2/n$. For the exponential density function used in the simulations, $\mu = E(Y_i) = 10$ and $\sigma^2 = V(Y_i) = (10)^2 = 100$. Thus, for this example, we see that

\[ \mu_{\bar{Y}} = E(\bar{Y}) = \mu = 10 \quad \text{and} \quad \sigma_{\bar{Y}}^2 = V(\bar{Y}) = \frac{\sigma^2}{n} = \frac{100}{n}. \]

For each value of n (5 and 25), we calculated the average of the 1000 sample means generated in the study. The observed variance of the 1000 sample means was also calculated for each value of n. The results are shown in Table 7.1. In each empirical study (n = 5 and n = 25), the average of the observed sample means and the variance of the observed sample means are quite close to the theoretical values. We now give a formal statement of the central limit theorem.

Table 7.1 Calculations for 1000 sample means

Sample Size   Average of 1000 Sample Means   $\mu_{\bar{Y}} = \mu$   Variance of 1000 Sample Means   $\sigma_{\bar{Y}}^2 = \sigma^2/n$
n = 5         9.86                           10                      19.63                           20
n = 25        9.95                           10                      3.93                            4

THEOREM 7.4

Central Limit Theorem: Let $Y_1, Y_2, \ldots, Y_n$ be independent and identically distributed random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2 < \infty$. Define

\[ U_n = \frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma\sqrt{n}} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}, \quad \text{where } \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i. \]

Then the distribution function of $U_n$ converges to the standard normal distribution function as $n \to \infty$. That is,

\[ \lim_{n \to \infty} P(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \quad \text{for all } u. \]

The central limit theorem implies that probability statements about $U_n$ can be approximated by corresponding probabilities for the standard normal random variable if n is large. (Usually, a value of n greater than 30 will ensure that the distribution of $U_n$ can be closely approximated by a normal distribution.)

As a matter of convenience, the conclusion of the central limit theorem is often replaced with the simpler statement that $\bar{Y}$ is asymptotically normally distributed with mean $\mu$ and variance $\sigma^2/n$. The central limit theorem can be applied to a random sample $Y_1, Y_2, \ldots, Y_n$ from any distribution as long as $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$ are both finite and the sample size is large.

We will give some examples of the use of the central limit theorem but will defer the proof until the next section (coverage of which is optional). The proof is not needed for an understanding of the applications of the central limit theorem that appear in this text.
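The convergence asserted by the central limit theorem can be illustrated numerically. The following is a minimal sketch, assuming an exponential population with mean 10 (so that µ = σ = 10); the function names `standardized_means` and `empirical_cdf`, the seed, and the number of replications are illustrative assumptions. It simulates the standardized mean for several sample sizes and compares the observed proportion of values at or below u with the standard normal distribution function.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def standardized_means(n, reps=10000, mean=10.0, seed=0):
    """Simulate U_n = (Ybar - mu) / (sigma / sqrt(n)) for exponential data."""
    rng = random.Random(seed)
    mu, sigma = mean, mean  # an exponential with mean 10 has sigma = 10
    return [
        (fmean([rng.expovariate(1.0 / mean) for _ in range(n)]) - mu)
        / (sigma / sqrt(n))
        for _ in range(reps)
    ]


def empirical_cdf(values, u):
    """Proportion of simulated values that are at most u."""
    return sum(v <= u for v in values) / len(values)


if __name__ == "__main__":
    phi = NormalDist().cdf  # standard normal distribution function
    for n in (5, 30, 100):
        u_vals = standardized_means(n)
        row = ", ".join(
            f"u={u:+.0f}: {empirical_cdf(u_vals, u):.3f} vs {phi(u):.3f}"
            for u in (-1, 0, 1)
        )
        print(f"n = {n:3d}  {row}")
```

For a skewed population such as this one, the agreement is typically rougher for small n and improves as n grows, which is the behavior the theorem describes.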

EXAMPLE 7.8

Achievement test scores of all high school seniors in a state have mean 60 and variance 64. A random sample of n = 100 students from one large high school had a mean score of 58. Is there evidence to suggest that this high school is inferior? (Calculate the probability that the sample mean is at most 58 when n = 100.)

Solution

Let $\bar{Y}$ denote the mean of a random sample of n = 100 scores from a population with $\mu = 60$ and $\sigma^2 = 64$. We want to approximate $P(\bar{Y} \le 58)$. We know from Theorem 7.4 that $(\bar{Y} - \mu)/(\sigma/\sqrt{n})$ has a distribution that can be approximated by a standard normal distribution. Hence, using Table 4, Appendix 3, we have

\[ P(\bar{Y} \le 58) = P\left(\frac{\bar{Y} - 60}{8/\sqrt{100}} \le \frac{58 - 60}{.8}\right) \approx P(Z \le -2.5) = .0062. \]

Because this probability is so small, it is unlikely that the sample from the school of interest can be regarded as a random sample from a population with $\mu = 60$ and $\sigma^2 = 64$. The evidence suggests that the average score for this high school is lower than the overall average of $\mu = 60$. This example illustrates the use of probability in the process of testing hypotheses, a common technique of statistical inference that will be further discussed in Chapter 10.
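The calculation in Example 7.8 can also be carried out without a normal table. The following is a minimal sketch using Python's standard library; the helper name `approx_prob_mean_at_most` is a hypothetical name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_prob_mean_at_most(y, mu, sigma, n):
    """Normal approximation to P(Ybar <= y) based on the central limit theorem."""
    z = (y - mu) / (sigma / sqrt(n))  # standardize using sigma / sqrt(n)
    return z, NormalDist().cdf(z)


# Values from Example 7.8: mu = 60, sigma^2 = 64 (sigma = 8), n = 100.
z, p = approx_prob_mean_at_most(58, mu=60, sigma=8, n=100)
print(f"z = {z:.2f}, P(Ybar <= 58) is approximately {p:.4f}")
# prints: z = -2.50, P(Ybar <= 58) is approximately 0.0062
```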

EXAMPLE 7.9

The service times for customers coming through a checkout counter in a retail store are independent random variables with mean 1.5 minutes and variance 1.0. Approximate the probability that 100 customers can be served in less than 2 hours of total service time.

Solution

If we let $Y_i$ denote the service time for the ith customer, then we want

\[ P\left(\sum_{i=1}^{100} Y_i \le 120\right) = P\left(\bar{Y} \le \frac{120}{100}\right) = P(\bar{Y} \le 1.20). \]

Because the sample size is large, the central limit theorem tells us that $\bar{Y}$ is approximately normally distributed with mean $\mu_{\bar{Y}} = \mu = 1.5$ and variance $\sigma_{\bar{Y}}^2 = \sigma^2/n = 1.0/100$. Therefore, using Table 4, Appendix 3, we have

\[ P(\bar{Y} \le 1.20) = P\left(\frac{\bar{Y} - 1.50}{1/\sqrt{100}} \le \frac{1.20 - 1.50}{1/\sqrt{100}}\right) \approx P[Z \le (1.2 - 1.5)10] = P(Z \le -3) = .0013. \]

Thus, the probability that 100 customers can be served in less than 2 hours is approximately .0013. This small probability indicates that it is virtually impossible to serve 100 customers in only 2 hours.
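Example 7.9 concerns a total rather than a mean, and the same answer is obtained by standardizing the sum directly with mean nµ and standard deviation σ√n, the first form of $U_n$ in Theorem 7.4. The following is a minimal sketch using Python's standard library; the helper name `approx_prob_sum_at_most` is a hypothetical name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_prob_sum_at_most(total, mu, sigma, n):
    """Normal approximation to P(Y_1 + ... + Y_n <= total)."""
    z = (total - n * mu) / (sigma * sqrt(n))  # sum has mean n*mu, sd sigma*sqrt(n)
    return z, NormalDist().cdf(z)


# Values from Example 7.9: mu = 1.5 minutes, sigma^2 = 1.0, n = 100, 120 minutes.
z, p = approx_prob_sum_at_most(120, mu=1.5, sigma=1.0, n=100)
print(f"z = {z:.2f}, probability is approximately {p:.4f}")
# prints: z = -3.00, probability is approximately 0.0013
```

Standardizing the sum or the mean gives the same z value, because dividing both the numerator and the denominator of the first form by n yields the second.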

Exercises

7.40

Applet Exercise Suppose that the population of interest does not have a normal distribution. What does the sampling distribution of $\bar{Y}$ look like, and what is the effect of the sample size on the sampling distribution of $\bar{Y}$? Use the applet SampleSize to complete the following. Use the up/down arrow to the left of the histogram of the population distribution to select the “Skewed” distribution. What is the mean and standard deviation of the population from which samples will be selected? [These values are labeled M and S, respectively, and are given above the population histogram.] a Use the up/down arrows in the left and right “Sample Size” boxes to select samples of size 1 and 3. Click the button “1 Sample” a few times. What is similar about the two histograms that you generated? What is different about them? b Click the button “1000 Samples” a few times and answer the questions in part (b). Do the generated histograms have the shapes that you expected? Why? c Are the means and standard deviations of the two sampling distributions close to the values that you expected? [Hint: $V(\bar{Y}) = \sigma^2/n$.] d Click the button “Toggle Normal.” What do you observe about the adequacy of the approximating normal distributions? e Click on the two generated sampling distributions to pop up windows for each. Use the up/down arrows in the left and right “Sample Size” boxes to select samples of size 10 and 25. Click the button “Toggle Normal.” You now have graphs of the sampling distributions of the sample means based on samples of size 1, 3, 10, and 25. What do you observe about the adequacy of the normal approximation as the sample size increases?

7.41

Applet Exercise Refer to Exercise 7.40. Use the applet SampleSize to complete the following. Use the up/down arrow to the left of the histogram of the population distribution to select the “U-shaped” distribution. What is the mean and standard deviation of the population from which samples will be selected? a Answer the questions in parts (a) through (e) of Exercise 7.40. b Refer to part (a). When you examined the sampling distribution of Y for n = 3, the sampling distribution had a “valley” in the middle. Why did this occur? Use the applet Basic to ﬁnd out. Select the “U-shaped” population distribution and click the button “1 Sample.” What do you observe about the values of individual observations in the sample. Click the button “1 Sample” several more times. Do the values in the sample tend to be either (relatively) large or small with few values in the “middle”? Why? What effect does this have on the value of the sample mean? [Hint: 3 is an odd sample size.]

7.42

The fracture strength of tempered glass averages 14 (measured in thousands of pounds per square inch) and has standard deviation 2. a What is the probability that the average fracture strength of 100 randomly selected pieces of this glass exceeds 14.5? b Find an interval that includes, with probability 0.95, the average fracture strength of 100 randomly selected pieces of this glass.

7.43

An anthropologist wishes to estimate the average height of men for a certain race of people. If the population standard deviation is assumed to be 2.5 inches and if she randomly samples 100 men, ﬁnd the probability that the difference between the sample mean and the true population mean will not exceed .5 inch.

7.44

Suppose that the anthropologist of Exercise 7.43 wants the difference between the sample mean and the population mean to be less than .4 inch, with probability .95. How many men should she sample to achieve this objective?

7.45

Workers employed in a large service industry have an average wage of $7.00 per hour with a standard deviation of $.50. The industry has 64 workers of a certain ethnic group. These workers have an average wage of $6.90 per hour. Is it reasonable to assume that the wage rate of the ethnic group is equivalent to that of a random sample of workers from those employed in the service industry? [Hint: Calculate the probability of obtaining a sample mean less than or equal to $6.90 per hour.]

7.46

The acidity of soils is measured by a quantity called the pH, which may range from 0 (high acidity) to 14 (high alkalinity). A soil scientist wants to estimate the average pH for a large field by randomly selecting n core samples and measuring the pH in each sample. Although the population standard deviation of pH measurements is not known, past experience indicates that most soils have a pH value of between 5 and 8. If the scientist selects n = 40 samples, find the approximate probability that the sample mean of the 40 pH measurements will be within .2 unit of the true average pH for the field. [Hint: See Exercise 1.17.]

7.47

Suppose that the scientist of Exercise 7.46 would like the sample mean to be within .1 of the true mean with probability .90. How many core samples should the scientist take?

7.48

An important aspect of a federal economic plan was that consumers would save a substantial portion of the money that they received from an income tax reduction. Suppose that early estimates of the portion of total tax saved, based on a random sampling of 35 economists, had mean 26% and standard deviation 12%. a What is the approximate probability that a sample mean estimate, based on a random sample of n = 35 economists, will lie within 1% of the mean of the population of the estimates of all economists? b Is it necessarily true that the mean of the population of estimates of all economists is equal to the percent tax saving that will actually be achieved?

7.49

The length of time required for the periodic maintenance of an automobile or another machine usually has a mound-shaped probability distribution. Because some occasional long service times will occur, the distribution tends to be skewed to the right. Suppose that the length of time required to run a 5000-mile check and to service an automobile has mean 1.4 hours and standard deviation .7 hour. Suppose also that the service department plans to service 50 automobiles per 8-hour day and that, in order to do so, it can spend a maximum average service time of only 1.6 hours per automobile. On what proportion of all workdays will the service department have to work overtime?

7.50

Shear strength measurements for spot welds have been found to have standard deviation 10 pounds per square inch (psi). If 100 test welds are to be measured, what is the approximate probability that the sample mean will be within 1 psi of the true population mean?

7.51

Refer to Exercise 7.50. If the standard deviation of shear strength measurements for spot welds is 10 psi, how many test welds should be sampled if we want the sample mean to be within 1 psi of the true mean with probability approximately .99?

7.52

Resistors to be used in a circuit have average resistance 200 ohms and standard deviation 10 ohms. Suppose 25 of these resistors are randomly selected to be used in a circuit. a What is the probability that the average resistance for the 25 resistors is between 199 and 202 ohms? b Find the probability that the total resistance does not exceed 5100 ohms. [Hint: see Example 7.9.]

7.53

One-hour carbon monoxide concentrations in air samples from a large city average 12 ppm (parts per million) with standard deviation 9 ppm. a Do you think that carbon monoxide concentrations in air samples from this city are normally distributed? Why or why not? b Find the probability that the average concentration in 100 randomly selected samples will exceed 14 ppm.

7.54

Unaltered bitumens, as commonly found in lead–zinc deposits, have atomic hydrogen/carbon (H/C) ratios that average 1.4 with standard deviation .05. Find the probability that the average H/C ratio is less than 1.3 if we randomly select 25 bitumen samples.

7.55

The downtime per day for a computing facility has mean 4 hours and standard deviation .8 hour. a Suppose that we want to compute probabilities about the average daily downtime for a period of 30 days. i What assumptions must be true to use the result of Theorem 7.4 to obtain a valid approximation for probabilities about the average daily downtime? ii Under the assumptions described in part (i), what is the approximate probability that the average daily downtime for a period of 30 days is between 1 and 5 hours? b Under the assumptions described in part (a), what is the approximate probability that the total downtime for a period of 30 days is less than 115 hours?

7.56

Many bulk products—such as iron ore, coal, and raw sugar—are sampled for quality by a method that requires many small samples to be taken periodically as the material is moving along a conveyor belt. The small samples are then combined and mixed to form one composite sample. Let Yi denote the volume of the ith small sample from a particular lot and suppose that Y1 , Y2 , . . . , Yn constitute a random sample, with each Yi value having mean µ (in cubic inches) and variance σ 2 . The average volume µ of the samples can be set by adjusting the size of the sampling device. Suppose that the variance σ 2 of the volumes of the samples is known to be approximately 4. The total volume of the composite sample must exceed 200 cubic inches with probability approximately .95 when n = 50 small samples are selected. Determine a setting for µ that will allow the sampling requirements to be satisﬁed.

7.57

Twenty-ﬁve heat lamps are connected in a greenhouse so that when one lamp fails, another takes over immediately. (Only one lamp is turned on at any time.) The lamps operate independently, and each has a mean life of 50 hours and standard deviation of 4 hours. If the greenhouse is not checked for 1300 hours after the lamp system is turned on, what is the probability that a lamp will be burning at the end of the 1300-hour period?

7.58

Suppose that $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples from populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Show that the random variable

\[ U_n = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} \]

satisfies the conditions of Theorem 7.4 and thus that the distribution function of $U_n$ converges to a standard normal distribution function as $n \to \infty$. [Hint: Consider $W_i = X_i - Y_i$, for i = 1, 2, . . . , n.]

7.59

An experiment is designed to test whether operator A or operator B gets the job of operating a new machine. Each operator is timed on 50 independent trials involving the performance of a certain task using the machine. If the sample means for the 50 trials differ by more than 1 second, the operator with the smaller mean time gets the job. Otherwise, the experiment is considered to end in a tie. If the standard deviations of times for both operators are assumed to be 2 seconds, what is the probability that operator A will get the job even though both operators have equal ability?

7.60

The result in Exercise 7.58 holds even if the sample sizes differ. That is, if $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ constitute independent random samples from populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then $\bar{X} - \bar{Y}$ will be approximately normally distributed, for large $n_1$ and $n_2$, with mean $\mu_1 - \mu_2$ and variance $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$. The flow of water through soil depends on, among other things, the porosity (volume proportion of voids) of the soil. To compare two types of sandy soil, $n_1 = 50$ measurements are to be taken on the porosity of soil A and $n_2 = 100$ measurements are to be taken on soil B. Assume that $\sigma_1^2 = .01$ and $\sigma_2^2 = .02$. Find the probability that the difference between the sample means will be within .05 unit of the difference between the population means $\mu_1 - \mu_2$.

7.61

Refer to Exercise 7.60. Suppose that $n_1 = n_2 = n$, and find the value of $n$ that allows the difference between the sample means to be within .04 unit of $\mu_1 - \mu_2$ with probability .90.

7.62

The times that a cashier spends processing individual customers’ orders are independent random variables with mean 2.5 minutes and standard deviation 2 minutes. What is the approximate probability that it will take more than 4 hours to process the orders of 100 people?

7.63

Refer to Exercise 7.62. Find the number of customers n such that the probability that the orders of all n customers can be processed in less than 2 hours is approximately .1.

7.4 A Proof of the Central Limit Theorem (Optional)

We will sketch a proof of the central limit theorem for the case in which the moment-generating functions exist for the random variables in the sample. The proof depends upon a fundamental result of probability theory, which cannot be proved here but is stated in Theorem 7.5.

THEOREM 7.5

Let $Y$ and $Y_1, Y_2, Y_3, \ldots$ be random variables with moment-generating functions $m(t)$ and $m_1(t), m_2(t), m_3(t), \ldots$, respectively. If
\[ \lim_{n \to \infty} m_n(t) = m(t) \quad \text{for all real } t, \]
then the distribution function of $Y_n$ converges to the distribution function of $Y$ as $n \to \infty$.

We now give the proof of the central limit theorem, Theorem 7.4.

Proof

Write
\[ U_n = \sqrt{n}\left(\frac{\bar{Y} - \mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\left(\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i, \quad \text{where } Z_i = \frac{Y_i - \mu}{\sigma}. \]

Because the random variables $Y_i$ are independent and identically distributed, the $Z_i$, $i = 1, 2, \ldots, n$, are independent and identically distributed with $E(Z_i) = 0$ and $V(Z_i) = 1$. Since the moment-generating function of the sum of independent random variables is the product of their individual moment-generating functions,
\[ m_{\sum Z_i}(t) = m_{Z_1}(t) \times m_{Z_2}(t) \times \cdots \times m_{Z_n}(t) = [m_{Z_1}(t)]^n \]


and
\[ m_{U_n}(t) = m_{\sum Z_i}\left(\frac{t}{\sqrt{n}}\right) = \left[m_{Z_1}\left(\frac{t}{\sqrt{n}}\right)\right]^n. \]

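By Taylor’s theorem with remainder (see your Calculus II text),
\[ m_{Z_1}(t) = m_{Z_1}(0) + m'_{Z_1}(0)\,t + m''_{Z_1}(\xi)\,\frac{t^2}{2}, \quad \text{where } 0 < \xi < t, \]
and because $m_{Z_1}(0) = E(e^{0 Z_1}) = E(1) = 1$ and $m'_{Z_1}(0) = E(Z_1) = 0$,
\[ m_{Z_1}(t) = 1 + \frac{m''_{Z_1}(\xi)}{2}\,t^2, \quad \text{where } 0 < \xi < t. \]
Therefore,
\[ m_{U_n}(t) = \left[1 + \frac{m''_{Z_1}(\xi_n)}{2}\left(\frac{t}{\sqrt{n}}\right)^2\right]^n = \left[1 + \frac{m''_{Z_1}(\xi_n)\,t^2/2}{n}\right]^n, \quad \text{where } 0 < \xi_n < \frac{t}{\sqrt{n}}. \]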

Notice that as $n \to \infty$, $\xi_n \to 0$ and $m''_{Z_1}(\xi_n)\,t^2/2 \to m''_{Z_1}(0)\,t^2/2 = E(Z_1^2)\,t^2/2 = t^2/2$ because $E(Z_1^2) = V(Z_1) = 1$. Recall that
\[ \text{if } \lim_{n \to \infty} b_n = b, \text{ then } \lim_{n \to \infty}\left(1 + \frac{b_n}{n}\right)^n = e^b. \]
Finally,
\[ \lim_{n \to \infty} m_{U_n}(t) = \lim_{n \to \infty}\left[1 + \frac{m''_{Z_1}(\xi_n)\,t^2/2}{n}\right]^n = e^{t^2/2}, \]
the moment-generating function for a standard normal random variable. Applying Theorem 7.5, we conclude that $U_n$ has a distribution function that converges to the distribution function of the standard normal random variable.
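The limit at the end of the proof can be checked numerically for one concrete population. The sketch below is only an illustration under an assumed choice that does not come from the text: each $Y_i$ is taken to be Bernoulli with $p = 1/2$, so that the standardized variable $Z_i$ equals $+1$ or $-1$ with probability $1/2$ each and $m_{Z_1}(t) = \cosh(t)$. The function names are hypothetical.

```python
import math


def mgf_z(t: float) -> float:
    """MGF of Z = (Y - 1/2) / (1/2) for Y ~ Bernoulli(1/2) (an assumed example).

    Z takes the values +1 and -1 with probability 1/2 each, so
    E(e^{tZ}) = (e^t + e^{-t}) / 2 = cosh(t).
    """
    return math.cosh(t)


def mgf_un(t: float, n: int) -> float:
    """MGF of U_n = (Z_1 + ... + Z_n) / sqrt(n), namely [m_Z(t / sqrt(n))]^n."""
    return mgf_z(t / math.sqrt(n)) ** n


def mgf_standard_normal(t: float) -> float:
    """MGF of a standard normal random variable, e^{t^2 / 2}."""
    return math.exp(t * t / 2.0)


if __name__ == "__main__":
    t = 1.0
    for n in (1, 10, 100, 10_000):
        # The second column should approach the third as n grows.
        print(n, mgf_un(t, n), mgf_standard_normal(t))
```

For a fixed $t$, the printed values of $m_{U_n}(t)$ should move toward $e^{t^2/2}$ as $n$ increases, which is the convergence that Theorem 7.5 turns into convergence of distribution functions.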

7.5 The Normal Approximation to the Binomial Distribution

The central limit theorem also can be used to approximate probabilities for some discrete random variables when the exact probabilities are tedious to calculate. One useful example involves the binomial distribution for large values of the number of trials $n$.

Suppose that $Y$ has a binomial distribution with $n$ trials and probability of success on any one trial denoted by $p$. If we want to find $P(Y \le b)$, we can use the binomial probability function to compute $P(Y = y)$ for each nonnegative integer $y$ less than or equal to $b$ and then sum these probabilities. Tables are available for some values of the sample size $n$, but direct calculation is cumbersome for large values of $n$ for which tables may be unavailable. Alternatively, we can view $Y$, the number of successes in $n$ trials, as a sum of a sample consisting of 0s and 1s; that is,
\[ Y = \sum_{i=1}^{n} X_i, \]
where
\[ X_i = \begin{cases} 1, & \text{if the $i$th trial results in success,} \\ 0, & \text{otherwise.} \end{cases} \]

The random variables $X_i$ for $i = 1, 2, \ldots, n$ are independent (because the trials are independent), and it is easy to show that $E(X_i) = p$ and $V(X_i) = p(1 - p)$ for $i = 1, 2, \ldots, n$. Consequently, when $n$ is large, the sample fraction of successes,
\[ \frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}, \]
possesses an approximately normal sampling distribution with mean $E(X_i) = p$ and variance $V(X_i)/n = p(1 - p)/n$.

Thus, we have used Theorem 7.4 (the central limit theorem) to establish that if $Y$ is a binomial random variable with parameters $n$ and $p$ and if $n$ is large, then $Y/n$ has approximately the same distribution as $U$, where $U$ is normally distributed with mean $\mu_U = p$ and variance $\sigma_U^2 = p(1 - p)/n$. Equivalently, for large $n$, we can think of $Y$ as having approximately the same distribution as $W$, where $W$ is normally distributed with mean $\mu_W = np$ and variance $\sigma_W^2 = np(1 - p)$.
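The mean $np$ and variance $np(1 - p)$ used for the approximating normal variable can be confirmed directly from the binomial probability function. The following is a minimal sketch; the function names and the choice $n = 25$, $p = .4$ are assumptions made only for illustration.

```python
import math


def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def binomial_mean_and_variance(n: int, p: float) -> tuple[float, float]:
    """Compute E(Y) and V(Y) by summing over the binomial probability function."""
    mean = sum(y * binomial_pmf(y, n, p) for y in range(n + 1))
    second_moment = sum(y * y * binomial_pmf(y, n, p) for y in range(n + 1))
    return mean, second_moment - mean**2


if __name__ == "__main__":
    n, p = 25, 0.4
    mean, variance = binomial_mean_and_variance(n, p)
    # Compare the summed values with the formulas np and np(1 - p).
    print(mean, n * p)
    print(variance, n * p * (1 - p))
```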

EXAMPLE 7.10

Candidate A believes that she can win a city election if she can earn at least 55% of the votes in precinct 1. She also believes that about 50% of the city’s voters favor her. If n = 100 voters show up to vote at precinct 1, what is the probability that candidate A will receive at least 55% of their votes?

Solution

Let $Y$ denote the number of voters at precinct 1 who vote for candidate A. We must approximate $P(Y/n \ge .55)$ when $p$ is the probability that a randomly selected voter from precinct 1 favors candidate A. If we think of the $n = 100$ voters at precinct 1 as a random sample from the city, then $Y$ has a binomial distribution with $n = 100$ and $p = .5$. We have seen that the fraction of voters who favor candidate A is
\[ \frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i, \]
where $X_i = 1$ if the $i$th voter favors candidate A and $X_i = 0$ otherwise. Because it is reasonable to assume that $X_i$, $i = 1, 2, \ldots, n$, are independent, the central limit theorem implies that $\bar{X} = Y/n$ is approximately normally distributed with mean $p = .5$ and variance $pq/n = (.5)(.5)/100 = .0025$. Therefore,
\[ P\left(\frac{Y}{n} \ge .55\right) = P\left(\frac{Y/n - .5}{\sqrt{.0025}} \ge \frac{.55 - .50}{.05}\right) \approx P(Z \ge 1) = .1587 \]
from Table 4, Appendix 3.
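The arithmetic in Example 7.10 can be reproduced without the normal table. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for Table 4; the function name and its default arguments are hypothetical and simply restate the numbers of the example.

```python
import math
from statistics import NormalDist


def fraction_at_least(threshold: float = 0.55, n: int = 100, p: float = 0.5):
    """Normal approximation to P(Y/n >= threshold) for binomial Y.

    Returns the standardized value z and the upper-tail probability P(Z >= z).
    No continuity correction is applied, matching Example 7.10.
    """
    sd_of_fraction = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)
    z = (threshold - p) / sd_of_fraction
    return z, 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    z, prob = fraction_at_least()
    print(round(z, 2), round(prob, 4))
```

The returned probability should agree with the tabled value .1587 to four decimal places.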

The normal approximation to binomial probabilities works well even for moderately large $n$ as long as $p$ is not close to zero or one. A useful rule of thumb is that the normal approximation to the binomial distribution is appropriate when $p \pm 3\sqrt{pq/n}$ lies in the interval (0, 1); that is, if
\[ 0 < p - 3\sqrt{pq/n} \quad \text{and} \quad p + 3\sqrt{pq/n} < 1. \]
In Exercise 7.70, you will show that a more convenient but equivalent criterion is that the normal approximation is adequate if
\[ n > 9\left(\frac{\text{larger of } p \text{ and } q}{\text{smaller of } p \text{ and } q}\right). \]
As you will see in Exercise 7.71, for some values of $p$, this criterion is sometimes met for moderate values of $n$. Especially for moderate values of $n$, substantial improvement in the approximation can be made by a slight adjustment on the boundaries used in the calculations.

If we look at the segment of a binomial distribution graphed in Figure 7.8, we can see what happens when we try to approximate a discrete distribution represented by a histogram with a continuous density function. If we want to find $P(Y \le 3)$ by using the binomial distribution, we can find the total area in the four rectangles (above 0, 1, 2, and 3) illustrated in the binomial histogram (Figure 7.8). Notice that the total area in the rectangles can be approximated by an area under the normal curve. The area under the curve includes some areas not in the histogram and excludes the portion of the histogram that lies above the curve. If we want to approximate $P(Y \le 3)$ by calculating an area under the density function, the area under the density function to the left of 3.5 provides a better approximation than does the area to the left of 3.0. The following example illustrates how close the normal approximation is for a case in which some exact binomial probabilities can be found.

FIGURE 7.8 The normal approximation to the binomial distribution: $n = 10$ and $p = .5$
[Figure: binomial histogram of $p(y)$ against $y$ with the approximating normal curve.]
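The two forms of the rule of thumb stated before Figure 7.8 can be compared numerically. This is only a spot check on a few assumed $(n, p)$ pairs, not a proof of the equivalence requested in Exercise 7.70, and the function names are hypothetical.

```python
import math


def three_sigma_rule(n: int, p: float) -> bool:
    """True when p - 3*sqrt(pq/n) > 0 and p + 3*sqrt(pq/n) < 1."""
    q = 1 - p
    half_width = 3 * math.sqrt(p * q / n)
    return 0 < p - half_width and p + half_width < 1


def ratio_rule(n: int, p: float) -> bool:
    """True when n > 9 * (larger of p and q) / (smaller of p and q)."""
    q = 1 - p
    return n > 9 * max(p, q) / min(p, q)


if __name__ == "__main__":
    # Assumed test cases, chosen away from the boundary to avoid rounding issues.
    for n, p in [(10, 0.5), (25, 0.4), (25, 0.1), (100, 0.1), (1000, 0.99)]:
        print(n, p, three_sigma_rule(n, p), ratio_rule(n, p))
```

Both functions should return the same answer for every pair that is not exactly on the boundary, where floating-point rounding could make them differ.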

EXAMPLE 7.11

Suppose that $Y$ has a binomial distribution with $n = 25$ and $p = .4$. Find the exact probabilities that $Y \le 8$ and $Y = 8$ and compare these to the corresponding values found by using the normal approximation.

Solution

From Table 1, Appendix 3, we find that $P(Y \le 8) = .274$ and
\[ P(Y = 8) = P(Y \le 8) - P(Y \le 7) = .274 - .154 = .120. \]
As previously stated, we can think of $Y$ as having approximately the same distribution as $W$, where $W$ is normally distributed with $\mu_W = np$ and $\sigma_W^2 = np(1 - p)$. Because we want $P(Y \le 8)$, we look at the normal curve area to the left of 8.5. Thus,
\[ P(Y \le 8) \approx P(W \le 8.5) = P\left(\frac{W - np}{\sqrt{np(1 - p)}} \le \frac{8.5 - 10}{\sqrt{25(.4)(.6)}}\right) = P(Z \le -.61) = .2709 \]
from Table 4, Appendix 3. This approximate value is close to the exact value for $P(Y \le 8) = .274$, obtained from the binomial tables.

To find the normal approximation to the binomial probability $p(8)$, we will find the area under the normal curve between the points 7.5 and 8.5 because this is the interval included in the histogram bar over $y = 8$ (see Figure 7.9). Because $Y$ has approximately the same distribution as $W$, where $W$ is normally distributed with $\mu_W = np = 25(.4) = 10$ and $\sigma_W^2 = np(1 - p) = 25(.4)(.6) = 6$, it follows that
\[ P(Y = 8) \approx P(7.5 \le W \le 8.5) = P\left(\frac{7.5 - 10}{\sqrt{6}} \le \frac{W - 10}{\sqrt{6}} \le \frac{8.5 - 10}{\sqrt{6}}\right) = P(-1.02 \le Z \le -.61) = .2709 - .1539 = .1170. \]

FIGURE 7.9 $P(Y = 8)$ for binomial distribution of Example 7.11
[Figure: binomial histogram of $p(y)$ against $y$, with the bar over $y = 8$ marked from 7.5 to 8.5.]


Again, we see that this approximate value is very close to the actual value, P(Y = 8) = .120, calculated earlier.

In the above example, we used an area under a normal curve to approximate P(Y ≤ 8) and P(Y = 8) when Y had a binomial distribution with n = 25 and p = .4. To improve the approximation, .5 was added to the largest value of interest (8) when we used the approximation P(Y ≤ 8) ≈ P(W ≤ 8.5) and W had an appropriate normal distribution. Had we been interested in approximating P(Y ≥ 6), we would have used P(Y ≥ 6) ≈ P(W ≥ 5.5); that is, we would have subtracted .5 from the smallest value of interest (6). The .5 that we added to the largest value of interest (making it a little larger) and subtracted from the smallest value of interest (making it a little smaller) is commonly called the continuity correction associated with the normal approximation. The only time that this continuity correction is used in this text is when we approximate a binomial (discrete) distribution with a normal (continuous) distribution.
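The continuity correction described above can be sketched in a few lines. The helper names below are hypothetical, and `statistics.NormalDist` is assumed as a replacement for Table 4. Because the code does not round $z$ to two decimals as the table lookup does, its approximate values can differ from the text's .2709 and .1170 in the third or fourth decimal place.

```python
import math
from statistics import NormalDist


def binomial_cdf(b: int, n: int, p: float) -> float:
    """Exact P(Y <= b) for binomial Y, summed from the probability function."""
    return sum(math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(b + 1))


def approximating_normal(n: int, p: float) -> NormalDist:
    """Normal distribution W with mean np and variance np(1 - p)."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))


def approx_at_most(b: int, n: int, p: float) -> float:
    """P(Y <= b) approximated by P(W <= b + .5): add .5 to the largest value."""
    return approximating_normal(n, p).cdf(b + 0.5)


def approx_at_least(a: int, n: int, p: float) -> float:
    """P(Y >= a) approximated by P(W >= a - .5): subtract .5 from the smallest value."""
    return 1.0 - approximating_normal(n, p).cdf(a - 0.5)


if __name__ == "__main__":
    n, p = 25, 0.4
    print("P(Y <= 8):", binomial_cdf(8, n, p), approx_at_most(8, n, p))
    print("P(Y = 8): ",
          binomial_cdf(8, n, p) - binomial_cdf(7, n, p),
          approx_at_most(8, n, p) - approx_at_most(7, n, p))
    print("P(Y >= 6):", 1.0 - binomial_cdf(5, n, p), approx_at_least(6, n, p))
```

Each printed line shows the exact binomial value followed by the continuity-corrected normal approximation, so the two columns can be compared directly.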

Exercises 7.64

Applet Exercise Access the applet Normal Approximation to Binomial Distribution (at www.thomsonedu.com/statistics/wackerly). When the applet is started, it displays the details in Example 7.11 and Figure 7.9. Initially, the display contains only the binomial histogram and the exact value (calculated using the binomial probability function) for $p(8) = P(Y = 8)$. Scroll down a little and click the button “Toggle Normal Approximation” to overlay the normal density with mean 10 and standard deviation $\sqrt{6} = 2.449$, the same mean and standard deviation as the binomial random variable $Y$. You will get a graph superior to the one in Figure 7.9.

a How many probability mass or density functions are displayed?
b Enter 0 in the box labeled “Begin” and press the enter key. What probabilities do you obtain?
c Refer to part (b). On the line where the approximating normal probability is displayed, you see the expression Normal: P(−0.5 9( p/q) .

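c Combine the results from parts (a) and (b) to obtain that the normal approximation to the binomial is adequate if
\[ n > 9\left(\frac{p}{q}\right) \quad \text{and} \quad n > 9\left(\frac{q}{p}\right), \]
or, equivalently,
\[ n > 9\left(\frac{\text{larger of } p \text{ and } q}{\text{smaller of } p \text{ and } q}\right). \]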


7.71

Refer to Exercise 7.70. a For what values of n will the normal approximation to the binomial distribution be adequate if p = .5? b Answer the question in part (a) if p = .6, .4, .8, .2, .99, and .001.

7.72

A machine is shut down for repairs if a random sample of 100 items selected from the daily output of the machine reveals at least 15% defectives. (Assume that the daily output is a large number of items.) If on a given day the machine is producing only 10% defective items, what is the probability that it will be shut down? [Hint: Use the .5 continuity correction.]

7.73

An airline ﬁnds that 5% of the persons who make reservations on a certain ﬂight do not show up for the ﬂight. If the airline sells 160 tickets for a ﬂight with only 155 seats, what is the probability that a seat will be available for every person holding a reservation and planning to ﬂy?

7.74

According to a survey conducted by the American Bar Association, 1 in every 410 Americans is a lawyer, but 1 in every 64 residents of Washington, D.C., is a lawyer. a If you select a random sample of 1500 Americans, what is the approximate probability that the sample contains at least one lawyer? b If the sample is selected from among the residents of Washington, D.C., what is the approximate probability that the sample contains more than 30 lawyers? c If you stand on a Washington, D.C., street corner and interview the ﬁrst 1000 persons who walked by and 30 say that they are lawyers, does this suggest that the density of lawyers passing the corner exceeds the density within the city? Explain.

7.75

A pollster believes that 20% of the voters in a certain area favor a bond issue. If 64 voters are randomly sampled from the large number of voters in this area, approximate the probability that the sampled fraction of voters favoring the bond issue will not differ from the true fraction by more than .06.

7.76

a Show that the variance of Y /n, where Y has a binomial distribution with n trials and a success probability of p, has a maximum at p = .5, for ﬁxed n. b A random sample of n items is to be selected from a large lot, and the number of defectives Y is to be observed. What value of n guarantees that Y/n will be within .1 of the true fraction of defectives, with probability .95?

7.77

The manager of a supermarket wants to obtain information about the proportion of customers who dislike a new policy on cashing checks. How many customers should he sample if he wants the sample fraction to be within .15 of the true fraction, with probability .98?

7.78

If the supermarket manager (Exercise 7.77) samples n = 50 customers and if the true fraction of customers who dislike the policy is approximately .9, ﬁnd the probability that the sample fraction will be within .15 unit of the true fraction.

7.79

Suppose that a random sample of 25 items is selected from the machine of Exercise 7.72. If the machine produces 10% defectives, ﬁnd the probability that the sample will contain at least two defectives, by using the following methods: a The normal approximation to the binomial b The exact binomial tables

7.80

The median age of residents of the United States is 31 years. If a survey of 100 randomly selected U.S. residents is to be taken, what is the approximate probability that at least 60 will be under 31 years of age?

7.81

A lot acceptance sampling plan for large lots speciﬁes that 50 items be randomly selected and that the lot be accepted if no more than 5 of the items selected do not conform to speciﬁcations. a What is the approximate probability that a lot will be accepted if the true proportion of nonconforming items in the lot is .10? b Answer the question in part (a) if the true proportion of nonconforming items in the lot is .20 and .30.

7.82

The quality of computer disks is measured by the number of missing pulses. Brand X is such that 80% of the disks have no missing pulses. If 100 disks of brand X are inspected, what is the probability that 15 or more contain missing pulses?

7.83

Applet Exercise Vehicles entering an intersection from the east are equally likely to turn left, turn right, or proceed straight ahead. If 50 vehicles enter this intersection from the east, use the applet Normal Approximation to Binomial Distribution to ﬁnd the exact and approximate probabilities that a 15 or fewer turn right. b at least two-thirds of those in the sample turn.

7.84

Just as the difference between two sample means is normally distributed for large samples, so is the difference between two sample proportions. That is, if $Y_1$ and $Y_2$ are independent binomial random variables with parameters $(n_1, p_1)$ and $(n_2, p_2)$, respectively, then $(Y_1/n_1) - (Y_2/n_2)$ is approximately normally distributed for large values of $n_1$ and $n_2$.
a Find $E\left(\dfrac{Y_1}{n_1} - \dfrac{Y_2}{n_2}\right)$.
b Find $V\left(\dfrac{Y_1}{n_1} - \dfrac{Y_2}{n_2}\right)$.

7.85

As a check on the relative abundance of certain species of ﬁsh in two lakes, n = 50 observations are taken on results of net trapping in each lake. For each observation, the experimenter merely records whether the desired species was present in the trap. Past experience has shown that this species appears in lake A traps approximately 10% of the time and in lake B traps approximately 20% of the time. Use these results to approximate the probability that the difference between the sample proportions will be within .1 of the difference between the true proportions.

7.86

An auditor samples 100 of a ﬁrm’s travel vouchers to ascertain what percentage of the whole set of vouchers are improperly documented. What is the approximate probability that more than 30% of the sampled vouchers are improperly documented if, in fact, only 20% of all the vouchers are improperly documented? If you were the auditor and observed more than 30% with improper documentation, what would you conclude about the ﬁrm’s claim that only 20% suffered from improper documentation? Why?

7.87

The times to process orders at the service counter of a pharmacy are exponentially distributed with mean 10 minutes. If 100 customers visit the counter in a 2-day period, what is the probability that at least half of them need to wait more than 10 minutes?

7.6 Summary

To make inferences about population parameters, we need to know the probability distributions for certain statistics, functions of the observable random variables in the sample (or samples). These probability distributions provide models for the relative frequency behavior of the statistics in repeated sampling; consequently, they are referred to as sampling distributions. We have seen that the normal, $\chi^2$, $t$, and $F$ distributions provide models for the sampling distributions of statistics used to make inferences about the parameters associated with normal distributions. For your convenience, Table 7.2 contains a summary of the R (or S-Plus) commands that provide probabilities and quantiles associated with these distributions.

When the sample size is large, the sample mean $\bar{Y}$ possesses an approximately normal distribution if the random sample is taken from any distribution with a finite mean $\mu$ and a finite variance $\sigma^2$. This result, known as the central limit theorem, also provides the justification for approximating binomial probabilities with corresponding probabilities associated with the normal distribution. The sampling distributions developed in this chapter will be used in the inference-making procedures presented in subsequent chapters.

Table 7.2 R (and S-Plus) procedures giving probabilities and percentiles for normal, χ², t, and F distributions.

Distribution                       P(Y ≤ y0)          pth Quantile, φp, Such That P(Y ≤ φp) = p
Normal (µ, σ)                      pnorm(y0, µ, σ)    qnorm(p, µ, σ)
χ² with ν df                       pchisq(y0, ν)      qchisq(p, ν)
t with ν df                        pt(y0, ν)          qt(p, ν)
F with ν1 num. df, ν2 denom. df    pf(y0, ν1, ν2)     qf(p, ν1, ν2)
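For readers working outside R, the normal row of Table 7.2 has a close counterpart in Python's standard library. The wrappers below are hypothetical helpers named after the R commands; they are not part of any library. The $\chi^2$, $t$, and $F$ rows have no standard-library equivalent and are not covered by this sketch.

```python
from statistics import NormalDist


def pnorm(y0: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(Y <= y0) for Y normal with mean mu and standard deviation sigma."""
    return NormalDist(mu, sigma).cdf(y0)


def qnorm(p: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """The pth quantile: the value phi_p such that P(Y <= phi_p) = p, for 0 < p < 1."""
    return NormalDist(mu, sigma).inv_cdf(p)


if __name__ == "__main__":
    print(pnorm(1.96))           # standard normal area to the left of 1.96
    print(qnorm(0.975))          # standard normal .975 quantile
    print(qnorm(0.95, 10, 2.0))  # .95 quantile of a normal with mean 10, sd 2
```

As in the R commands, the third argument is the standard deviation $\sigma$, not the variance.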


Supplementary Exercises 7.88

The efficiency (in lumens per watt) of light bulbs of a certain type has population mean 9.5 and standard deviation .5, according to production specifications. The specifications for a room in which eight of these bulbs are to be installed call for the average efficiency of the eight bulbs to exceed 10. Find the probability that this specification for the room will be met, assuming that efficiency measurements are normally distributed.

7.89

Refer to Exercise 7.88. What should be the mean efﬁciency per bulb if the speciﬁcation for the room is to be met with a probability of approximately .80? (Assume that the variance of efﬁciency measurements remains at .5.)

7.90

Briggs and King developed the technique of nuclear transplantation in which the nucleus of a cell from one of the later stages of an embryo’s development is transplanted into a zygote (a single-cell, fertilized egg) to see if the nucleus can support normal development. If the probability that a single transplant from the early gastrula stage will be successful is .65, what is the probability that more than 70 transplants out of 100 will be successful?

7.91

A retail dealer sells three brands of automobiles. For brand A, her profit per sale, $X$, is normally distributed with parameters $(\mu_1, \sigma_1^2)$; for brand B, her profit per sale $Y$ is normally distributed with parameters $(\mu_2, \sigma_2^2)$; for brand C, her profit per sale $W$ is normally distributed with parameters $(\mu_3, \sigma_3^2)$. For the year, two-fifths of the dealer’s sales are of brand A, one-fifth of brand B, and the remaining two-fifths of brand C. If you are given data on profits for $n_1$, $n_2$, and $n_3$ sales of brands A, B, and C, respectively, the quantity $U = .4\bar{X} + .2\bar{Y} + .4\bar{W}$ will approximate the true average profit per sale for the year. Find the mean, variance, and probability density function for $U$. Assume that $\bar{X}$, $\bar{Y}$, and $\bar{W}$ are independent.

7.92

From each of two normal populations with identical means and with standard deviations of 6.40 and 7.20, independent random samples of 64 observations are drawn. Find the probability that the difference between the means of the samples exceeds .6 in absolute value.

7.93

If $Y$ has an exponential distribution with mean $\theta$, show that $U = 2Y/\theta$ has a $\chi^2$ distribution with 2 df.

7.94

A plant supervisor is interested in budgeting weekly repair costs for a certain type of machine. Records over the past years indicate that these repair costs have an exponential distribution with mean 20 for each machine studied. Let $Y_1, Y_2, \ldots, Y_5$ denote the repair costs for five of these machines for the next week. Find a number $c$ such that $P\left(\sum_{i=1}^{5} Y_i > c\right) = .05$, assuming that the machines operate independently. [Hint: Use the result given in Exercise 7.93.]

7.95

The coefficient of variation (CV) for a sample of values $Y_1, Y_2, \ldots, Y_n$ is defined by $\text{CV} = S/\bar{Y}$. This quantity, which gives the standard deviation as a proportion of the mean, is sometimes informative. For example, the value $S = 10$ has little meaning unless we can compare it to something else. If $S$ is observed to be 10 and $\bar{Y}$ is observed to be 1000, the amount of variation is small relative to the size of the mean. However, if $S$ is observed to be 10 and $\bar{Y}$ is observed to be 5, the variation is quite large relative to the size of the mean. If we were studying the precision (variation in repeated measurements) of a measuring instrument, the first case (CV = 10/1000) might provide acceptable precision, but the second case (CV = 2) would be unacceptable. Let $Y_1, Y_2, \ldots, Y_{10}$ denote a random sample of size 10 from a normal distribution with mean 0 and variance $\sigma^2$. Use the following steps to find the number $c$ such that
\[ P\left(-c \le \frac{S}{\bar{Y}} \le c\right) = .95. \]
a Use the result of Exercise 7.33 to find the distribution of $(10)\bar{Y}^2/S^2$.
b Use the result of Exercise 7.29 to find the distribution of $S^2/[(10)\bar{Y}^2]$.
c Use the answer to (b) to find the constant $c$.


7.96

Suppose that $Y_1, Y_2, \ldots, Y_{40}$ denote a random sample of measurements on the proportion of impurities in iron ore samples. Let each variable $Y_i$ have a probability density function given by
\[ f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
The ore is to be rejected by the potential buyer if $\bar{Y}$ exceeds .7. Find $P(\bar{Y} > .7)$ for the sample of size 40.

*7.97

Let $X_1, X_2, \ldots, X_n$ be independent $\chi^2$-distributed random variables, each with 1 df. Define $Y$ as
\[ Y = \sum_{i=1}^{n} X_i. \]
It follows from Exercise 6.59 that $Y$ has a $\chi^2$ distribution with $n$ df.
a Use the preceding representation of $Y$ as the sum of the $X$’s to show that $Z = (Y - n)/\sqrt{2n}$ has an asymptotic standard normal distribution.
b A machine in a heavy-equipment factory produces steel rods of length $Y$, where $Y$ is a normally distributed random variable with mean 6 inches and variance .2. The cost $C$ of repairing a rod that is not exactly 6 inches in length is proportional to the square of the error and is given, in dollars, by $C = 4(Y - \mu)^2$. If 50 rods with independent lengths are produced in a given day, approximate the probability that the total cost for repairs for that day exceeds $48.

*7.98

Suppose that $T$ is defined as in Definition 7.2.
a If $W$ is fixed at $w$, then $T$ is given by $Z/c$, where $c = \sqrt{w/\nu}$. Use this idea to find the conditional density of $T$ for a fixed $W = w$.
b Find the joint density of $T$ and $W$, $f(t, w)$, by using $f(t, w) = f(t|w) f(w)$.
c Integrate over $w$ to show that
\[ f(t) = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty. \]

*7.99

Suppose $F$ is defined as in Definition 7.3.
a If $W_2$ is fixed at $w_2$, then $F = W_1/c$, where $c = w_2\nu_1/\nu_2$. Find the conditional density of $F$ for fixed $W_2 = w_2$.
b Find the joint density of $F$ and $W_2$.
c Integrate over $w_2$ to show that the probability density function of $F$, say $g(y)$, is given by
\[ g(y) = \frac{\Gamma[(\nu_1 + \nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\; y^{(\nu_1/2) - 1}\left(1 + \frac{\nu_1 y}{\nu_2}\right)^{-(\nu_1 + \nu_2)/2}, \quad 0 < y < \infty. \]

*7.100

Let $X$ have a Poisson distribution with parameter $\lambda$.
a Show that the moment-generating function of $Y = (X - \lambda)/\sqrt{\lambda}$ is given by
\[ m_Y(t) = \exp\left(\lambda e^{t/\sqrt{\lambda}} - \sqrt{\lambda}\,t - \lambda\right). \]
b Use the expansion
\[ e^{t/\sqrt{\lambda}} = \sum_{i=0}^{\infty} \frac{[t/\sqrt{\lambda}]^i}{i!} \]
to show that
\[ \lim_{\lambda \to \infty} m_Y(t) = e^{t^2/2}. \]
c Use Theorem 7.5 to show that the distribution function of $Y$ converges to a standard normal distribution function as $\lambda \to \infty$.

*7.101

In the interest of pollution control, an experimenter wants to count the number of bacteria per small volume of water. Let X denote the bacteria count per cubic centimeter of water and assume that X has a Poisson probability distribution with mean λ = 100. If the allowable pollution in a water supply is a count of 110 per cubic centimeter, approximate the probability that X will be at most 110. [Hint: Use the result in Exercise 7.100(c).]

*7.102

Y , the number of accidents per year at a given intersection, is assumed to have a Poisson distribution. Over the past few years, an average of 36 accidents per year have occurred at this intersection. If the number of accidents per year is at least 45, an intersection can qualify to be redesigned under an emergency program set up by the state. Approximate the probability that the intersection in question will come under the emergency program at the end of the next year.

*7.103

An experimenter is comparing two methods for removing bacteria colonies from processed luncheon meats. After treating some samples by method A and other identical samples by method B, the experimenter selects a 2-cubic-centimeter subsample from each sample and makes bacteria colony counts on these subsamples. Let X denote the total count for the subsamples treated by method A and let Y denote the total count for the subsamples treated by method B. Assume that X and Y are independent Poisson random variables with means λ1 and λ2 , respectively. If X exceeds Y by more than 10, method B will be judged superior to A. Suppose that, in fact, λ1 = λ2 = 50. Find the approximate probability that method B will be judged superior to method A.

*7.104

Let Yn be a binomial random variable with n trials and with success probability p. Suppose that n tends to inﬁnity and p tends to zero in such a way that np remains ﬁxed at np = λ. Use the result in Theorem 7.5 to prove that the distribution of Yn converges to a Poisson distribution with mean λ.

*7.105

If the probability that a person will suffer an adverse reaction from a medication is .001, use the result of Exercise 7.104 to approximate the probability that 2 or more persons will suffer an adverse reaction if the medication is administered to 1000 individuals.

CHAPTER 8

Estimation

8.1 Introduction
8.2 The Bias and Mean Square Error of Point Estimators
8.3 Some Common Unbiased Point Estimators
8.4 Evaluating the Goodness of a Point Estimator
8.5 Confidence Intervals
8.6 Large-Sample Confidence Intervals
8.7 Selecting the Sample Size
8.8 Small-Sample Confidence Intervals for $\mu$ and $\mu_1 - \mu_2$
8.9 Confidence Intervals for $\sigma^2$
8.10 Summary
References and Further Readings

8.1 Introduction

As stated in Chapter 1, the purpose of statistics is to use the information contained in a sample to make inferences about the population from which the sample is taken. Because populations are characterized by numerical descriptive measures called parameters, the objective of many statistical investigations is to estimate the value of one or more relevant parameters. As you will see, the sampling distributions derived in Chapter 7 play an important role in the development of the estimation procedures that are the focus of this chapter.

Estimation has many practical applications. For example, a manufacturer of washing machines might be interested in estimating the proportion $p$ of washers that can be expected to fail prior to the expiration of a 1-year guarantee time. Other important population parameters are the population mean, variance, and standard deviation. For example, we might wish to estimate the mean waiting time $\mu$ at a supermarket checkout station or the standard deviation of the error of measurement $\sigma$ of an electronic instrument. To simplify our terminology, we will call the parameter of interest in the experiment the target parameter.

Suppose that we wish to estimate the average amount of mercury $\mu$ that a newly developed process can remove from 1 ounce of ore obtained at a geographic location. We could give our estimate in two distinct forms. First, we could use a single number (for instance, .13 ounce) that we think is close to the unknown population mean $\mu$. This type of estimate is called a point estimate because a single value, or point, is given as the estimate of $\mu$. Second, we might say that $\mu$ will fall between two numbers, for example, between .07 and .19 ounce. In this second type of estimation procedure, the two values that we give may be used to construct an interval (.07, .19) that is intended to enclose the parameter of interest; thus, the estimate is called an interval estimate.

The information in the sample can be used to calculate the value of a point estimate, an interval estimate, or both. In any case, the actual estimation is accomplished by using an estimator for the target parameter.

DEFINITION 8.1

An estimator is a rule, often expressed as a formula, that tells how to calculate the value of an estimate based on the measurements contained in a sample.

For example, the sample mean

\[ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \]

is one possible point estimator of the population mean µ. Clearly, the expression for $\bar{Y}$ is both a rule and a formula. It tells us to sum the sample observations and divide by the sample size n.

An experimenter who wants an interval estimate of a parameter must use the sample data to calculate two values, chosen so that the interval formed by the two values includes the target parameter with a specified probability. Examples of interval estimators will be given in subsequent sections.

Many different estimators (rules for estimating) may be obtained for the same population parameter. This should not be surprising. Ten engineers, each assigned to estimate the cost of a large construction job, could use different methods of estimation and thereby arrive at different estimates of the total cost. Such engineers, called estimators in the construction industry, base their estimates on specified fixed guidelines and intuition. Each estimator represents a unique human subjective rule for obtaining a single estimate. This brings us to a most important point: Some estimators are considered good, and others, bad. The management of a construction firm must define good and bad as they relate to the estimation of the cost of a job. How can we establish criteria of goodness to compare statistical estimators? The following sections contain some answers to this question.
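To make the idea of an estimator as a rule concrete, the sample mean can be written as a small function that maps any sample to an estimate. This is only an illustrative sketch: the function name `sample_mean` and the five measurements are hypothetical and are not taken from the text.

```python
def sample_mean(sample):
    """Point estimator of the population mean: sum the observations, divide by n."""
    return sum(sample) / len(sample)


# Hypothetical measurements (ounces of mercury removed per ounce of ore).
observations = [0.11, 0.15, 0.12, 0.14, 0.13]

# The estimator is the rule (the function); the estimate is the number it returns.
estimate = sample_mean(observations)
print(round(estimate, 2))
```

The function is the estimator, and the value it returns for one particular sample is the point estimate.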


Chapter 8

Estimation

8.2 The Bias and Mean Square Error of Point Estimators

Point estimation is similar, in many respects, to firing a revolver at a target. The estimator, generating estimates, is analogous to the revolver; a particular estimate is comparable to one shot; and the parameter of interest corresponds to the bull’s-eye. Drawing a single sample from the population and using it to compute an estimate for the value of the parameter corresponds to firing a single shot at the bull’s-eye.

Suppose that a man fires a single shot at a target and that shot pierces the bull’s-eye. Do we conclude that he is an excellent shot? Would you want to hold the target while a second shot is fired? Obviously, we would not decide that the man is an expert marksperson based on such a small amount of evidence. On the other hand, if 100 shots in succession hit the bull’s-eye, we might acquire sufficient confidence in the marksperson and consider holding the target for the next shot if the compensation was adequate. The point is that we cannot evaluate the goodness of a point estimation procedure on the basis of the value of a single estimate; rather, we must observe the results when the estimation procedure is used many, many times. Because the estimates are numbers, we evaluate the goodness of the point estimator by constructing a frequency distribution of the values of the estimates obtained in repeated sampling and note how closely this distribution clusters about the target parameter.

Suppose that we wish to specify a point estimate for a population parameter that we will call θ. The estimator of θ will be indicated by the symbol $\hat{\theta}$, read as “θ hat.” The “hat” indicates that we are estimating the parameter immediately beneath it. With the revolver-firing example in mind, we can say that it is highly desirable for the distribution of estimates (or, more properly, the sampling distribution of the estimator) to cluster about the target parameter as shown in Figure 8.1. In other words, we would like the mean or expected value of the distribution of estimates to equal the parameter estimated; that is, $E(\hat{\theta}) = \theta$. Point estimators that satisfy this property are said to be unbiased. The sampling distribution for a positively biased point estimator, one for which $E(\hat{\theta}) > \theta$, is shown in Figure 8.2.

FIGURE 8.1 A distribution of estimates

FIGURE 8.2 Sampling distribution for a positively biased estimator

DEFINITION 8.2

Let $\hat{\theta}$ be a point estimator for a parameter θ. Then $\hat{\theta}$ is an unbiased estimator if $E(\hat{\theta}) = \theta$. If $E(\hat{\theta}) \neq \theta$, $\hat{\theta}$ is said to be biased.

DEFINITION 8.3

The bias of a point estimator $\hat{\theta}$ is given by $B(\hat{\theta}) = E(\hat{\theta}) - \theta$.

Figure 8.3 shows two possible sampling distributions for unbiased point estimators for a target parameter θ. We would prefer that our estimator have the type of distribution indicated in Figure 8.3(b) because the smaller variance guarantees that in repeated sampling a higher fraction of values of $\hat{\theta}_2$ will be “close” to θ. Thus, in addition to preferring unbiasedness, we want the variance of the distribution of the estimator $V(\hat{\theta})$ to be as small as possible. Given two unbiased estimators of a parameter θ, and all other things being equal, we would select the estimator with the smaller variance.

Rather than using the bias and variance of a point estimator to characterize its goodness, we might employ $E[(\hat{\theta} - \theta)^2]$, the average of the square of the distance between the estimator and its target parameter.

DEFINITION 8.4

The mean square error of a point estimator $\hat{\theta}$ is $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$.

The mean square error of an estimator $\hat{\theta}$, $\mathrm{MSE}(\hat{\theta})$, is a function of both its variance and its bias. If $B(\hat{\theta})$ denotes the bias of the estimator $\hat{\theta}$, it can be shown that

\[ \mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2. \]

We will leave the proof of this result as Exercise 8.1. In this section, we have defined properties of point estimators that are sometimes desirable. In particular, we often seek unbiased estimators with relatively small variances. In the next section, we consider some common and useful unbiased point estimators.
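The relationship among bias, variance, and mean square error can be checked numerically by repeated sampling. The sketch below is an illustration under stated assumptions, not a proof: the parameter value `theta`, the sample size, the number of repetitions, and the choice of the sample maximum of uniform data as a deliberately biased estimator are all hypothetical.

```python
import random

random.seed(1)
theta = 5.0          # true parameter (assumed for the simulation)
n, reps = 10, 2000   # sample size and number of repeated samples (assumed)

# Deliberately biased estimator: the maximum of n Uniform(0, theta) observations.
estimates = [max(random.uniform(0, theta) for _ in range(n)) for _ in range(reps)]

mean_est = sum(estimates) / reps
bias = mean_est - theta                                       # simulated B(theta_hat)
variance = sum((e - mean_est) ** 2 for e in estimates) / reps  # simulated V(theta_hat)
mse = sum((e - theta) ** 2 for e in estimates) / reps          # simulated MSE(theta_hat)

# The two right-hand quantities agree up to floating-point rounding.
print(bias, variance, mse, variance + bias ** 2)
```

For the simulated distribution of estimates the identity holds exactly apart from rounding, which mirrors the algebra requested in Exercise 8.1.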

FIGURE 8.3 Sampling distributions for two unbiased estimators: (a) estimator with large variation; (b) estimator with small variation

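Exercises

8.1

Using the identity $(\hat{\theta} - \theta) = [\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta] = [\hat{\theta} - E(\hat{\theta})] + B(\hat{\theta})$, show that $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + (B(\hat{\theta}))^2$.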

8.2

a If θˆ is an unbiased estimator for θ , what is B(θˆ )? b If B(θˆ ) = 5, what is E(θˆ )?

8.3

Suppose that $\hat{\theta}$ is an estimator for a parameter θ and $E(\hat{\theta}) = a\theta + b$ for some nonzero constants a and b. a In terms of a, b, and θ, what is $B(\hat{\theta})$? b Find a function of $\hat{\theta}$, say $\hat{\theta}^*$, that is an unbiased estimator for θ.

8.4

Refer to Exercise 8.1. a If $\hat{\theta}$ is an unbiased estimator for θ, how does $\mathrm{MSE}(\hat{\theta})$ compare to $V(\hat{\theta})$? b If $\hat{\theta}$ is a biased estimator for θ, how does $\mathrm{MSE}(\hat{\theta})$ compare to $V(\hat{\theta})$?

8.5

Refer to Exercise 8.1 and consider the unbiased estimator $\hat{\theta}^*$ that you proposed in Exercise 8.3. a Express $\mathrm{MSE}(\hat{\theta}^*)$ as a function of $V(\hat{\theta})$. b Give an example of a value of a for which $\mathrm{MSE}(\hat{\theta}^*) < \mathrm{MSE}(\hat{\theta})$. c Give an example of values for a and b for which $\mathrm{MSE}(\hat{\theta}^*) > \mathrm{MSE}(\hat{\theta})$.

8.6

Suppose that $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, $V(\hat{\theta}_1) = \sigma_1^2$, and $V(\hat{\theta}_2) = \sigma_2^2$. Consider the estimator $\hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2$. a Show that $\hat{\theta}_3$ is an unbiased estimator for θ. b If $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, how should the constant a be chosen in order to minimize the variance of $\hat{\theta}_3$?

8.7

Consider the situation described in Exercise 8.6. How should the constant a be chosen to minimize the variance of $\hat{\theta}_3$ if $\hat{\theta}_1$ and $\hat{\theta}_2$ are not independent but are such that $\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2) = c \neq 0$?

8.8

Suppose that $Y_1, Y_2, Y_3$ denote a random sample from an exponential distribution with density function

\[ f(y) = \begin{cases} \dfrac{1}{\theta} e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

Consider the following five estimators of θ:

\[ \hat{\theta}_1 = Y_1, \quad \hat{\theta}_2 = \frac{Y_1 + Y_2}{2}, \quad \hat{\theta}_3 = \frac{Y_1 + 2Y_2}{3}, \quad \hat{\theta}_4 = \min(Y_1, Y_2, Y_3), \quad \hat{\theta}_5 = \bar{Y}. \]

a Which of these estimators are unbiased? b Among the unbiased estimators, which has the smallest variance?

8.9

Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a population with probability density function

\[ f(y) = \begin{cases} \dfrac{1}{\theta + 1} e^{-y/(\theta + 1)}, & y > 0,\ \theta > -1, \\ 0, & \text{elsewhere.} \end{cases} \]

Suggest a suitable statistic to use as an unbiased estimator for θ. [Hint: Consider $\bar{Y}$.]

8.10

The number of breakdowns per week for a type of minicomputer is a random variable Y with a Poisson distribution and mean λ. A random sample $Y_1, Y_2, \ldots, Y_n$ of observations on the weekly number of breakdowns is available. a Suggest an unbiased estimator for λ. b The weekly cost of repairing these breakdowns is $C = 3Y + Y^2$. Show that $E(C) = 4\lambda + \lambda^2$. c Find a function of $Y_1, Y_2, \ldots, Y_n$ that is an unbiased estimator of E(C). [Hint: Use what you know about $\bar{Y}$ and $(\bar{Y})^2$.]

8.11

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population with mean 3. Assume that $\hat{\theta}_2$ is an unbiased estimator of $E(Y^2)$ and that $\hat{\theta}_3$ is an unbiased estimator of $E(Y^3)$. Give an unbiased estimator for the third central moment of the underlying distribution.

8.12

The reading on a voltage meter connected to a test circuit is uniformly distributed over the interval (θ, θ + 1), where θ is the true but unknown voltage of the circuit. Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of such readings. a Show that $\bar{Y}$ is a biased estimator of θ and compute the bias. b Find a function of $\bar{Y}$ that is an unbiased estimator of θ. c Find $\mathrm{MSE}(\bar{Y})$ when $\bar{Y}$ is used as an estimator of θ.

8.13

We have seen that if Y has a binomial distribution with parameters n and p, then Y/n is an unbiased estimator of p. To estimate the variance of Y , we generally use n(Y /n)(1 − Y/n). a Show that the suggested estimator is a biased estimator of V (Y ). b Modify n(Y /n)(1 − Y /n) slightly to form an unbiased estimator of V (Y ).

8.14

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population whose density is given by

\[ f(y) = \begin{cases} \alpha y^{\alpha - 1} / \theta^{\alpha}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere,} \end{cases} \]

where α > 0 is a known, fixed value, but θ is unknown. (This is the power family distribution introduced in Exercise 6.17.) Consider the estimator $\hat{\theta} = \max(Y_1, Y_2, \ldots, Y_n)$. a Show that $\hat{\theta}$ is a biased estimator for θ. b Find a multiple of $\hat{\theta}$ that is an unbiased estimator of θ. c Derive $\mathrm{MSE}(\hat{\theta})$.

8.15

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population whose density is given by

\[ f(y) = \begin{cases} 3\beta^3 y^{-4}, & \beta \le y, \\ 0, & \text{elsewhere,} \end{cases} \]

where β > 0 is unknown. (This is one of the Pareto distributions introduced in Exercise 6.18.) Consider the estimator $\hat{\beta} = \min(Y_1, Y_2, \ldots, Y_n)$. a Derive the bias of the estimator $\hat{\beta}$. b Derive $\mathrm{MSE}(\hat{\beta})$.

*8.16

Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a normal distribution with parameters µ and $\sigma^2$.¹ a Show that $S = \sqrt{S^2}$ is a biased estimator of σ. [Hint: Recall the distribution of $(n - 1)S^2/\sigma^2$ and the result given in Exercise 4.112.] b Adjust S to form an unbiased estimator of σ. c Find an unbiased estimator of $\mu - z_{\alpha}\sigma$, the point that cuts off a lower-tail area of α under this normal curve.

8.17

If Y has a binomial distribution with parameters n and p, then pˆ 1 = Y /n is an unbiased estimator of p. Another estimator of p is pˆ 2 = (Y + 1)/(n + 2). a Derive the bias of pˆ 2 . b Derive MSE( pˆ 1 ) and MSE( pˆ 2 ). c For what values of p is MSE( pˆ 1 ) < MSE( pˆ 2 )?

8.18

Let Y1 , Y2 , . . . , Yn denote a random sample of size n from a population with a uniform distribution on the interval (0, θ). Consider Y(1) = min(Y1 , Y2 , . . . , Yn ), the smallest-order statistic. Use the methods of Section 6.7 to derive E(Y(1) ). Find a multiple of Y(1) that is an unbiased estimator for θ .

8.19

Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population with an exponential distribution whose density is given by

\[ f(y) = \begin{cases} (1/\theta) e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

If $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ denotes the smallest-order statistic, show that $\hat{\theta} = nY_{(1)}$ is an unbiased estimator for θ and find $\mathrm{MSE}(\hat{\theta})$. [Hint: Recall the results of Exercise 6.81.]

*8.20

Suppose that $Y_1, Y_2, Y_3, Y_4$ denote a random sample of size 4 from a population with an exponential distribution whose density is given by

\[ f(y) = \begin{cases} (1/\theta) e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

a Let $X = \sqrt{Y_1 Y_2}$. Find a multiple of X that is an unbiased estimator for θ. [Hint: Use your knowledge of the gamma distribution and the fact that $\Gamma(1/2) = \sqrt{\pi}$ to find $E(\sqrt{Y_1})$. Recall that the variables $Y_i$ are independent.] b Let $W = \sqrt{Y_1 Y_2 Y_3 Y_4}$. Find a multiple of W that is an unbiased estimator for $\theta^2$. [Recall the hint for part (a).]

1. Exercises preceded by an asterisk are optional.

8.3 Some Common Unbiased Point Estimators

Some formal methods for deriving point estimators for target parameters are presented in Chapter 9. In this section, we focus on some estimators that merit consideration on the basis of intuition. For example, it seems natural to use the sample mean $\bar{Y}$ to estimate the population mean µ and to use the sample proportion $\hat{p} = Y/n$ to estimate a binomial parameter p. If an inference is to be based on independent random samples of $n_1$ and $n_2$ observations selected from two different populations, how would we estimate the difference between means $(\mu_1 - \mu_2)$ or the difference in two binomial parameters, $(p_1 - p_2)$? Again, our intuition suggests using the point estimators $(\bar{Y}_1 - \bar{Y}_2)$, the difference in the sample means, to estimate $(\mu_1 - \mu_2)$ and using $(\hat{p}_1 - \hat{p}_2)$, the difference in the sample proportions, to estimate $(p_1 - p_2)$.

Because the four estimators $\bar{Y}$, $\hat{p}$, $(\bar{Y}_1 - \bar{Y}_2)$, and $(\hat{p}_1 - \hat{p}_2)$ are functions of the random variables observed in samples, we can find their expected values and variances by using the expectation theorems of Sections 5.6–5.8. The standard deviation of each of the estimators is simply the square root of the respective variance. Such an effort would show that, when random sampling has been employed, all four point estimators are unbiased and that they possess the standard deviations shown in Table 8.1. To facilitate communication, we use the notation $\sigma_{\hat{\theta}}^2$ to denote the variance of the sampling distribution of the estimator $\hat{\theta}$. The standard deviation of the sampling distribution of the estimator $\hat{\theta}$, $\sigma_{\hat{\theta}} = \sqrt{\sigma_{\hat{\theta}}^2}$, is usually called the standard error of the estimator $\hat{\theta}$.

In Chapter 5, we did much of the derivation required for Table 8.1. In particular, we found the means and variances of $\bar{Y}$ and $\hat{p}$ in Examples 5.27 and 5.28, respectively. If the random samples are independent, these results and Theorem 5.12 imply that

\[ E(\bar{Y}_1 - \bar{Y}_2) = E(\bar{Y}_1) - E(\bar{Y}_2) = \mu_1 - \mu_2, \]

\[ V(\bar{Y}_1 - \bar{Y}_2) = V(\bar{Y}_1) + V(\bar{Y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \]

The expected value and standard error of ( pˆ 1 − pˆ 2 ), shown in Table 8.1, can be acquired similarly.
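The standard errors listed in Table 8.1 translate directly into code. The following is a minimal sketch; the function names are hypothetical, and each function assumes that the population quantities (σ, p, and so on) are supplied by the caller.

```python
from math import sqrt


def se_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def se_proportion(p, n):
    """Standard error of the sample proportion: sqrt(pq / n), with q = 1 - p."""
    return sqrt(p * (1 - p) / n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """Standard error of the difference in sample means (independent samples)."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference in sample proportions (independent samples)."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


print(se_mean(2.0, 4), se_proportion(0.5, 100))
```

In practice the population values are unknown and are replaced by sample estimates, as Examples 8.2 and 8.3 illustrate.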

Table 8.1 Expected values and standard errors of some common point estimators

Target Parameter θ    Sample Size(s)     Point Estimator $\hat{\theta}$    $E(\hat{\theta})$    Standard Error $\sigma_{\hat{\theta}}$
µ                     n                  $\bar{Y}$                         µ                    $\sigma/\sqrt{n}$
p                     n                  $\hat{p} = Y/n$                   p                    $\sqrt{pq/n}$
µ1 − µ2               n1 and n2          $\bar{Y}_1 - \bar{Y}_2$           µ1 − µ2              $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ *†
p1 − p2               n1 and n2          $\hat{p}_1 - \hat{p}_2$           p1 − p2              $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$ †

* $\sigma_1^2$ and $\sigma_2^2$ are the variances of populations 1 and 2, respectively.
† The two samples are assumed to be independent.

Although unbiasedness is often a desirable property for a point estimator, not all estimators are unbiased. In Chapter 1, we defined the sample variance

\[ S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}. \]

It probably seemed more natural to divide by n than by n − 1 in the preceding expression and to calculate

\[ S'^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}. \]

Example 8.1 establishes that $S'^2$ and $S^2$ are, respectively, biased and unbiased estimators of the population variance $\sigma^2$. We initially identified $S^2$ as the sample variance because it is an unbiased estimator.

EXAMPLE 8.1

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Show that

\[ S'^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]

is a biased estimator for $\sigma^2$ and that

\[ S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]

is an unbiased estimator for $\sigma^2$.

Solution

It can be shown (see Exercise 1.9) that

\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} Y_i \right)^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2. \]

Hence,

\[ E\left[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right] = E\left( \sum_{i=1}^{n} Y_i^2 \right) - nE(\bar{Y}^2) = \sum_{i=1}^{n} E(Y_i^2) - nE(\bar{Y}^2). \]

Notice that $E(Y_i^2)$ is the same for i = 1, 2, . . . , n. We use this and the fact that the variance of a random variable is given by $V(Y) = E(Y^2) - [E(Y)]^2$ to conclude that $E(Y_i^2) = V(Y_i) + [E(Y_i)]^2 = \sigma^2 + \mu^2$, $E(\bar{Y}^2) = V(\bar{Y}) + [E(\bar{Y})]^2 = \sigma^2/n + \mu^2$, and that

\[ E\left[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right] = \sum_{i=1}^{n} (\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) = n(\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) = n\sigma^2 - \sigma^2 = (n - 1)\sigma^2. \]

It follows that

\[ E(S'^2) = \frac{1}{n} E\left[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right] = \frac{1}{n}(n - 1)\sigma^2 = \left( \frac{n - 1}{n} \right) \sigma^2 \]

and that $S'^2$ is biased because $E(S'^2) \neq \sigma^2$. However,

\[ E(S^2) = \frac{1}{n - 1} E\left[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right] = \frac{1}{n - 1}(n - 1)\sigma^2 = \sigma^2, \]

so we see that $S^2$ is an unbiased estimator for $\sigma^2$.
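The conclusion of Example 8.1 can be verified exactly for a small case by enumerating every possible sample. The sketch below assumes a hypothetical population in which the values 0, 1, and 2 are equally likely and a sample of size n = 3 drawn independently; these choices are for illustration only.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2]   # hypothetical population: each value equally likely
n = 3                    # sample size (assumed)

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)


def sum_sq_dev(sample):
    """Sum of squared deviations of the observations from their sample mean."""
    ybar = Fraction(sum(sample), len(sample))
    return sum((y - ybar) ** 2 for y in sample)


# All 27 ordered samples of size 3 are equally likely under independent sampling.
samples = list(product(population, repeat=n))
expected_ss = sum(sum_sq_dev(s) for s in samples) / len(samples)

e_s2_prime = expected_ss / n        # expected value using divisor n
e_s2 = expected_ss / (n - 1)        # expected value using divisor n - 1
print(sigma2, e_s2_prime, e_s2)     # prints: 2/3 4/9 2/3
```

Exact rational arithmetic is used so that the comparison with $\sigma^2$ and $(n - 1)\sigma^2/n$ is not blurred by rounding.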

Two final comments can be made concerning the point estimators of Table 8.1. First, the expected values and standard errors for $\bar{Y}$ and $\bar{Y}_1 - \bar{Y}_2$ given in the table are valid regardless of the distribution of the population(s) from which the sample(s) is (are) taken. Second, all four estimators possess probability distributions that are approximately normal for large samples. The central limit theorem justifies this statement for $\bar{Y}$ and $\hat{p}$, and similar theorems for functions of sample means justify the assertion for $(\bar{Y}_1 - \bar{Y}_2)$ and $(\hat{p}_1 - \hat{p}_2)$.

How large is “large”? For most populations, the probability distribution of $\bar{Y}$ is mound-shaped even for relatively small samples (as low as n = 5), and will tend rapidly to normality as the sample size approaches n = 30 or larger. However, you sometimes will need to select larger samples from binomial populations because the required sample size depends on p. The binomial probability distribution is perfectly symmetric about its mean when p = 1/2 and becomes more and more asymmetric as p tends to 0 or 1. As a rough rule, you can assume that the distribution of $\hat{p}$ will be mound-shaped and approaching normality for sample sizes such that $p \pm 3\sqrt{pq/n}$ lies in the interval (0, 1), or, as you demonstrated in Exercise 7.70, if n > 9 (larger of p and q)/(smaller of p and q).

We know that $\bar{Y}$, $\hat{p}$, $(\bar{Y}_1 - \bar{Y}_2)$, and $(\hat{p}_1 - \hat{p}_2)$ are unbiased with near-normal (at least mound-shaped) sampling distributions for moderate-sized samples; now let us use this information to answer some practical questions. If we use an estimator once and acquire a single estimate, how good will this estimate be? How much faith can we place in the validity of our inference? The answers to these questions are provided in the next section.
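The rough rule for binomial sample sizes can be written in either of the two forms mentioned above. The sketch below codes both; the function names and the trial values of p and n are hypothetical.

```python
from math import sqrt


def within_unit_interval(p, n):
    """True if p +/- 3*sqrt(pq/n) lies strictly inside (0, 1)."""
    half_width = 3 * sqrt(p * (1 - p) / n)
    return 0 < p - half_width and p + half_width < 1


def large_enough(p, n):
    """True if n > 9 * (larger of p and q) / (smaller of p and q)."""
    q = 1 - p
    return n > 9 * max(p, q) / min(p, q)


# Hypothetical (p, n) pairs: both forms of the rule give the same answer.
for p, n in [(0.5, 10), (0.1, 50), (0.1, 100)]:
    print(p, n, within_unit_interval(p, n), large_enough(p, n))
```

With p = 0.1, a sample of 50 fails both checks while a sample of 100 passes, which shows how the required sample size grows as p moves away from 1/2.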

8.4 Evaluating the Goodness of a Point Estimator One way to measure the goodness of any point estimation procedure is in terms of the distances between the estimates that it generates and the target parameter. This quantity, which varies randomly in repeated sampling, is called the error of estimation. Naturally we would like the error of estimation to be as small as possible.


DEFINITION 8.5

The error of estimation ε is the distance between an estimator and its target parameter. That is, $\varepsilon = |\hat{\theta} - \theta|$.

Because $\hat{\theta}$ is a random variable, the error of estimation is also a random quantity, and we cannot say how large or small it will be for a particular estimate. However, we can make probability statements about it. For example, suppose that $\hat{\theta}$ is an unbiased estimator of θ and has a sampling distribution as shown in Figure 8.4. If we select two points, (θ − b) and (θ + b), located near the tails of the probability density, the probability that the error of estimation ε is less than b is represented by the shaded area in Figure 8.4. That is,

\[ P(|\hat{\theta} - \theta| < b) = P[-b < (\hat{\theta} - \theta) < b] = P(\theta - b < \hat{\theta} < \theta + b). \]

We can think of b as a probabilistic bound on the error of estimation. Although we are not certain that a given error is less than b, Figure 8.4 indicates that P(ε < b) is high. If b can be regarded from a practical point of view as small, then P(ε < b) provides a measure of the goodness of a single estimate. This probability identifies the fraction of times, in repeated sampling, that the estimator $\hat{\theta}$ falls within b units of θ, the target parameter.

Suppose that we want to find the value of b so that P(ε < b) = .90. This is easy if we know the probability density function of $\hat{\theta}$. Then we seek a value b such that

\[ \int_{\theta - b}^{\theta + b} f(\hat{\theta}) \, d\hat{\theta} = .90. \]
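When the sampling distribution is known, the bound b can be found numerically. As an illustration only, the sketch below assumes that $\hat{\theta}$ is unbiased and normally distributed with a standard error of 1; neither assumption comes from the text, and `standard_error` and `target` are hypothetical names.

```python
from statistics import NormalDist

standard_error = 1.0   # assumed standard error of the estimator
target = 0.90          # desired value of P(error of estimation < b)

# The estimation error (theta_hat - theta) is modeled as Normal(0, standard_error).
error_dist = NormalDist(mu=0.0, sigma=standard_error)

# A central probability of .90 leaves .05 in each tail, so b is the .95 quantile.
b = error_dist.inv_cdf((1 + target) / 2)
coverage = error_dist.cdf(b) - error_dist.cdf(-b)
print(round(b, 3), round(coverage, 2))
```

Under these assumptions b is about 1.645 standard errors; a different sampling distribution would give a different multiplier.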

But whether we know the probability distribution of $\hat{\theta}$ or not, if $\hat{\theta}$ is unbiased we can find an approximate bound on ε by expressing b as a multiple of the standard error of $\hat{\theta}$ (recall that the standard error of an estimator is simply a convenient alternative name for the standard deviation of the estimator). For example, for k ≥ 1, if we let $b = k\sigma_{\hat{\theta}}$, we know from Tchebysheff’s theorem that ε will be less than $k\sigma_{\hat{\theta}}$ with probability at least $1 - 1/k^2$. A convenient and often-used value of k is k = 2. Hence, we know that ε will be less than $b = 2\sigma_{\hat{\theta}}$ with probability at least .75.

You will find that, with a probability in the vicinity of .95, many random variables observed in nature lie within 2 standard deviations of their mean. The probability that Y lies in the interval (µ ± 2σ) is shown in Table 8.2 for the normal, uniform, and exponential probability distributions. The point is that $b = 2\sigma_{\hat{\theta}}$ is a good approximate bound on the error of estimation in most practical situations. According to Tchebysheff’s theorem, the probability that the error of estimation will be less than this bound is at least .75. As we have previously observed, the bounds for probabilities provided by Tchebysheff’s theorem are usually very conservative; the actual probabilities usually exceed the Tchebysheff bounds by a considerable amount.

FIGURE 8.4 Sampling distribution of a point estimator $\hat{\theta}$ (the shaded area between θ − b and θ + b represents P(ε < b))

Table 8.2 Probability that (µ − 2σ) < Y < (µ + 2σ)

Distribution    Probability
Normal          .9544
Uniform         1.0000
Exponential     .9502

EXAMPLE 8.2

A sample of n = 1000 voters, randomly selected from a city, showed y = 560 in favor of candidate Jones. Estimate p, the fraction of voters in the population favoring Jones, and place a 2-standard-error bound on the error of estimation.

Solution

We will use the estimator $\hat{p} = Y/n$ to estimate p. Hence, the estimate of p, the fraction of voters favoring candidate Jones, is

\[ \hat{p} = \frac{y}{n} = \frac{560}{1000} = .56. \]

How much faith can we place in this value? The probability distribution of $\hat{p}$ is very accurately approximated by a normal probability distribution for large samples. Since n = 1000, when $b = 2\sigma_{\hat{p}}$, the probability that ε will be less than b is approximately .95.

From Table 8.1, the standard error of the estimator for p is given by $\sigma_{\hat{p}} = \sqrt{pq/n}$. Therefore,

\[ b = 2\sigma_{\hat{p}} = 2\sqrt{\frac{pq}{n}}. \]

Unfortunately, to calculate b, we need to know p, and estimating p was the objective of our sampling. This apparent stalemate is not a handicap, however, because $\sigma_{\hat{p}}$ varies little for small changes in p. Hence, substitution of the estimate $\hat{p}$ for p produces little error in calculating the exact value of $b = 2\sigma_{\hat{p}}$. Then, for our example, we have

\[ b = 2\sigma_{\hat{p}} = 2\sqrt{\frac{pq}{n}} \approx 2\sqrt{\frac{(.56)(.44)}{1000}} = .03. \]

What is the significance of our calculations? The probability that the error of estimation is less than .03 is approximately .95. Consequently, we can be reasonably confident that our estimate, .56, is within .03 of the true value of p, the proportion of voters in the population who favor Jones.
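The arithmetic of Example 8.2 can be reproduced in a few lines. This sketch uses only the numbers given in the example; the variable names are illustrative.

```python
from math import sqrt

n = 1000   # voters sampled
y = 560    # voters favoring Jones

p_hat = y / n                             # point estimate of p
se_hat = sqrt(p_hat * (1 - p_hat) / n)    # standard error with p_hat substituted for p
b = 2 * se_hat                            # 2-standard-error bound

print(f"p_hat = {p_hat:.2f}, bound = {b:.3f}")   # p_hat = 0.56, bound = 0.031
```

To two decimal places the bound is .03, matching the value reported in the solution.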


EXAMPLE 8.3

A comparison of the durability of two types of automobile tires was obtained by road testing samples of $n_1 = n_2 = 100$ tires of each type. The number of miles until wear-out was recorded, where wear-out was defined as the number of miles until the amount of remaining tread reached a prespecified small value. The measurements for the two types of tires were obtained independently, and the following means and variances were computed:

\[ \bar{y}_1 = 26{,}400 \text{ miles}, \quad \bar{y}_2 = 25{,}100 \text{ miles}, \quad s_1^2 = 1{,}440{,}000, \quad s_2^2 = 1{,}960{,}000. \]

Estimate the difference in mean miles to wear-out and place a 2-standard-error bound on the error of estimation.

Solution

The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{y}_1 - \bar{y}_2) = 26{,}400 - 25{,}100 = 1300$ miles, and the standard error of the estimator (see Table 8.1) is

\[ \sigma_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

We must know $\sigma_1^2$ and $\sigma_2^2$, or have good approximate values for them, to calculate $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$. Fairly accurate values of $\sigma_1^2$ and $\sigma_2^2$ often can be calculated from similar experimental data collected at some prior time, or they can be obtained from the current sample data by using the unbiased estimators

\[ \hat{\sigma}_i^2 = S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2, \quad i = 1, 2. \]

These estimates will be adequate if the sample sizes are reasonably large (say, $n_i \ge 30$) for i = 1, 2. The calculated values of $S_1^2$ and $S_2^2$, based on the two wear tests, are $s_1^2 = 1{,}440{,}000$ and $s_2^2 = 1{,}960{,}000$. Substituting these values for $\sigma_1^2$ and $\sigma_2^2$ in the formula for $\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$, we have

\[ \sigma_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}} = \sqrt{34{,}000} = 184.4 \text{ miles}. \]

Consequently, we estimate the difference in mean wear to be 1300 miles, and we expect the error of estimation to be less than $2\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$, or 368.8 miles, with a probability of approximately .95.
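The computation in Example 8.3 can be checked the same way. The sketch below uses the sample summaries given in the example, with the sample variances standing in for the unknown population variances as the solution describes; the variable names are illustrative.

```python
from math import sqrt

n1 = n2 = 100
ybar1, ybar2 = 26_400, 25_100         # sample means (miles)
s1_sq, s2_sq = 1_440_000, 1_960_000   # sample variances

diff = ybar1 - ybar2                      # point estimate of mu1 - mu2
se_hat = sqrt(s1_sq / n1 + s2_sq / n2)    # estimated standard error
bound = 2 * se_hat                        # 2-standard-error bound

print(diff, f"{se_hat:.1f}", f"{bound:.1f}")   # 1300 184.4 368.8
```

The output agrees with the values of 1300, 184.4, and 368.8 miles reported in the solution.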

Exercises

8.21

An investigator is interested in the possibility of merging the capabilities of television and the Internet. A random sample of n = 50 Internet users yielded that the mean amount of time spent watching television per week was 11.5 hours and that the standard deviation was 3.5 hours. Estimate the population mean time that Internet users spend watching television and place a bound on the error of estimation.

8.22

An increase in the rate of consumer savings frequently is tied to a lack of conﬁdence in the economy and is said to be an indicator of a recessional tendency in the economy. A random sampling of n = 200 savings accounts in a local community showed the mean increase in savings account values to be 7.2% over the past 12 months, with standard deviation 5.6%. Estimate the mean percentage increase in savings account values over the past 12 months for depositors in the community. Place a bound on your error of estimation.

8.23

The Environmental Protection Agency and the University of Florida recently cooperated in a large study of the possible effects of trace elements in drinking water on kidney-stone disease. The accompanying table presents data on age, amount of calcium in home drinking water (measured in parts per million), and smoking activity. These data were obtained from individuals with recurrent kidney-stone problems, all of whom lived in the Carolinas and the Rocky Mountain states.

                                  Carolinas    Rockies
Sample size                       467          191
Mean age                          45.1         46.4
Standard deviation of age         10.2         9.8
Mean calcium component (ppm)      11.3         40.1
Standard deviation of calcium     16.6         28.4
Proportion now smoking            .78          .61

a Estimate the average calcium concentration in drinking water for kidney-stone patients in the Carolinas. Place a bound on the error of estimation. b Estimate the difference in mean ages for kidney-stone patients in the Carolinas and in the Rockies. Place a bound on the error of estimation. c Estimate and place a 2-standard-deviation bound on the difference in proportions of kidney-stone patients from the Carolinas and Rockies who were smokers at the time of the study.

Text not available due to copyright restrictions

8.25

A study was conducted to compare the mean number of police emergency calls per 8-hour shift in two districts of a large city. Samples of 100 8-hour shifts were randomly selected from the police records for each of the two regions, and the number of emergency calls was recorded for each shift. The sample statistics are given in the following table.

                    Region 1    Region 2
Sample size         100         100
Sample mean         2.4         3.1
Sample variance     1.44        2.64

a Estimate the difference in the mean number of police emergency calls per 8-hour shift between the two districts in the city. b Find a bound for the error of estimation.

8.26

The Mars twin rovers, Spirit and Opportunity, which roamed the surface of Mars in the winter of 2004, found evidence that there was once water on Mars, raising the possibility that there was once life on the planet. Do you think that the United States should pursue a program to send humans to Mars? An opinion poll³ indicated that 49% of the 1093 adults surveyed think that we should pursue such a program. a Estimate the proportion of all Americans who think that the United States should pursue a program to send humans to Mars. Find a bound on the error of estimation. b The poll actually asked several questions. If we wanted to report an error of estimation that would be valid for all of the questions on the poll, what value should we use? [Hint: What is the maximum possible value for p × q?]

8.27

A random sample of 985 “likely voters” (those who are judged to be likely to vote in an upcoming election) were polled during a phone-a-thon conducted by the Republican Party. Of those contacted, 592 indicated that they intended to vote for the Republican running in the election. a According to this study, the estimate for p, the proportion of all “likely voters” who will vote for the Republican candidate, is $\hat{p} = .601$. Find a bound for the error of estimation. b If the “likely voters” are representative of those who will actually vote, do you think that the Republican candidate will be elected? Why? How confident are you in your decision? c Can you think of reasons that those polled might not be representative of those who actually vote in the election?

8.28

In a study of the relationship between birth order and college success, an investigator found that 126 in a sample of 180 college graduates were ﬁrstborn or only children; in a sample of 100 nongraduates of comparable age and socioeconomic background, the number of ﬁrstborn or only children was 54. Estimate the difference in the proportions of ﬁrstborn or only children for the two populations from which these samples were drawn. Give a bound for the error of estimation.

8.29

Sometimes surveys provide interesting information about issues that did not seem to be the focus of the survey initially. Results from two CNN/USA Today/Gallup polls, one conducted in March 2003 and one in November 2003, were recently presented online.⁴ Both polls involved samples of 1001 adults, aged 18 years and older. In the March sample, 45% of those sampled claimed to be fans of professional baseball whereas 51% of those polled in November claimed to be fans. a Give a point estimate for the difference in the proportions of Americans who claim to be baseball fans in March (at the beginning of the season) and November (after the World Series). Provide a bound for the error of estimation. b Is there sufficient evidence to conclude that fan support is greater at the end of the season? Explain.

3. Source: “Space Exploration,” Associated Press Poll, http://www.pollingreport.com/science.htm#Space, 5 April 2004.
4. Source: Mark Gillespie, “Baseball Fans Overwhelmingly Want Mandatory Steroid Testing,” http://www.gallup.com/content/print/.aspx?ci=11245, 14 February 2004.

8.30

Refer to Exercise 8.29. Give the point estimate and a bound on the error of estimation for the proportion of adults who would have claimed to be baseball fans in March 2003. Is it likely that the value of your estimate is off by as much as 10%? Why?

8.31

In a study to compare the perceived effects of two pain relievers, 200 randomly selected adults were given the ﬁrst pain reliever, and 93% indicated appreciable pain relief. Of the 450 individuals given the other pain reliever, 96% indicated experiencing appreciable relief. a Give an estimate for the difference in the proportions of all adults who would indicate perceived pain relief after taking the two pain relievers. Provide a bound on the error of estimation. b Based on your answer to part (a), is there evidence that proportions experiencing relief differ for those who take the two pain relievers? Why?

8.32

An auditor randomly samples 20 accounts receivable from among the 500 such accounts of a client’s firm. The auditor lists the amount of each account and checks to see if the underlying documents comply with stated procedures. The data are recorded in the accompanying table (amounts are in dollars, Y = yes, and N = no).

Account   Amount   Compliance      Account   Amount   Compliance
   1        278        Y              11       188        N
   2        192        Y              12       212        N
   3        310        Y              13        92        Y
   4         94        N              14        56        Y
   5         86        Y              15       142        Y
   6        335        Y              16        37        Y
   7        310        N              17       186        N
   8        290        Y              18       221        Y
   9        221        Y              19       219        N
  10        168        Y              20       305        Y

Estimate the total accounts receivable for the 500 accounts of the ﬁrm and place a bound on the error of estimation. Do you think that the average account receivable for the ﬁrm exceeds $250? Why?

8.33

Refer to Exercise 8.32. From the data given on the compliance checks, estimate the proportion of the ﬁrm’s accounts that fail to comply with stated procedures. Place a bound on the error of estimation. Do you think that the proportion of accounts that comply with stated procedures exceeds 80%? Why?

8.34

We can place a 2-standard-deviation bound on the error of estimation with any estimator for which we can find a reasonable estimate of the standard error. Suppose that $Y_1, Y_2, \ldots, Y_n$ represent a random sample from a Poisson distribution with mean $\lambda$. We know that $V(Y_i) = \lambda$, and hence $E(\bar{Y}) = \lambda$ and $V(\bar{Y}) = \lambda/n$. How would you employ $Y_1, Y_2, \ldots, Y_n$ to estimate $\lambda$? How would you estimate the standard error of your estimator?

8.35

Refer to Exercise 8.34. In polycrystalline aluminum, the number of grain nucleation sites per unit volume is modeled as having a Poisson distribution with mean λ. Fifty unit-volume test specimens subjected to annealing under regime A produced an average of 20 sites per unit volume. Fifty independently selected unit-volume test specimens subjected to annealing regime B produced an average of 23 sites per unit volume.


a Estimate the mean number $\lambda_A$ of nucleation sites for regime A and place a 2-standard-error bound on the error of estimation.

b Estimate the difference in the mean numbers of nucleation sites $\lambda_A - \lambda_B$ for regimes A and B. Place a 2-standard-error bound on the error of estimation. Would you say that regime B tends to produce a larger mean number of nucleation sites? Why?

8.36

If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from an exponential distribution with mean $\theta$, then $E(Y_i) = \theta$ and $V(Y_i) = \theta^2$. Thus, $E(\bar{Y}) = \theta$ and $V(\bar{Y}) = \theta^2/n$, or $\sigma_{\bar{Y}} = \theta/\sqrt{n}$. Suggest an unbiased estimator for $\theta$ and provide an estimate for the standard error of your estimator.

8.37

Refer to Exercise 8.36. An engineer observes n = 10 independent length-of-life measurements on a type of electronic component. The average of these 10 measurements is 1020 hours. If these lengths of life come from an exponential distribution with mean θ, estimate θ and place a 2-standard-error bound on the error of estimation.

8.38

The number of persons coming through a blood bank until the first person with type A blood is found is a random variable Y with a geometric distribution. If p denotes the probability that any one randomly selected person will possess type A blood, then $E(Y) = 1/p$ and $V(Y) = (1 - p)/p^2$.

a Find a function of Y that is an unbiased estimator of V(Y).

b Suggest how to form a 2-standard-error bound on the error of estimation when Y is used to estimate 1/p.

8.5 Confidence Intervals

An interval estimator is a rule specifying the method for using the sample measurements to calculate two numbers that form the endpoints of the interval. Ideally, the resulting interval will have two properties: First, it will contain the target parameter θ; second, it will be relatively narrow. One or both of the endpoints of the interval, being functions of the sample measurements, will vary randomly from sample to sample. Thus, the length and location of the interval are random quantities, and we cannot be certain that the (fixed) target parameter θ will fall between the endpoints of any single interval calculated from a single sample. This being the case, our objective is to find an interval estimator capable of generating narrow intervals that have a high probability of enclosing θ.

Interval estimators are commonly called confidence intervals. The upper and lower endpoints of a confidence interval are called the upper and lower confidence limits, respectively. The probability that a (random) confidence interval will enclose θ (a fixed quantity) is called the confidence coefficient. From a practical point of view, the confidence coefficient identifies the fraction of the time, in repeated sampling, that the intervals constructed will contain the target parameter θ. If we know that the confidence coefficient associated with our estimator is high, we can be highly confident that any confidence interval, constructed by using the results from a single sample, will enclose θ.

Suppose that $\hat{\theta}_L$ and $\hat{\theta}_U$ are the (random) lower and upper confidence limits, respectively, for a parameter θ. Then, if

\[ P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha, \]

the probability (1 − α) is the confidence coefficient. The resulting random interval defined by $[\hat{\theta}_L, \hat{\theta}_U]$ is called a two-sided confidence interval.

It is also possible to form a one-sided confidence interval such that

\[ P(\hat{\theta}_L \le \theta) = 1 - \alpha. \]

Although only $\hat{\theta}_L$ is random in this case, the confidence interval is $[\hat{\theta}_L, \infty)$. Similarly, we could have an upper one-sided confidence interval such that

\[ P(\theta \le \hat{\theta}_U) = 1 - \alpha. \]

The implied confidence interval here is $(-\infty, \hat{\theta}_U]$.

One very useful method for finding confidence intervals is called the pivotal method. This method depends on finding a pivotal quantity that possesses two characteristics:

1. It is a function of the sample measurements and the unknown parameter θ, where θ is the only unknown quantity.
2. Its probability distribution does not depend on the parameter θ.

If the probability distribution of the pivotal quantity is known, the following logic can be used to form the desired interval estimate. If Y is any random variable, c > 0 is a constant, and P(a ≤ Y ≤ b) = .7, then certainly P(ca ≤ cY ≤ cb) = .7. Similarly, for any constant d, P(a + d ≤ Y + d ≤ b + d) = .7. That is, the probability of the event (a ≤ Y ≤ b) is unaffected by a change of scale or a translation of Y. Thus, if we know the probability distribution of a pivotal quantity, we may be able to use operations like these to form the desired interval estimator. We illustrate this method in the following examples.

EXAMPLE 8.4

Suppose that we are to obtain a single observation Y from an exponential distribution with mean θ. Use Y to form a confidence interval for θ with confidence coefficient .90.

Solution

The probability density function for Y is given by

\[ f(y) = \begin{cases} \dfrac{1}{\theta}\, e^{-y/\theta}, & y \ge 0, \\ 0, & \text{elsewhere.} \end{cases} \]

By the transformation method of Chapter 6 we can see that U = Y/θ has the exponential density function given by

\[ f_U(u) = \begin{cases} e^{-u}, & u > 0, \\ 0, & \text{elsewhere.} \end{cases} \]

The density function for U is graphed in Figure 8.5. U = Y/θ is a function of Y (the sample measurement) and θ, and the distribution of U does not depend on θ. Thus, we can use U = Y/θ as a pivotal quantity. Because we want an interval estimator with confidence coefficient equal to .90, we find two numbers a and b such that

\[ P(a \le U \le b) = .90. \]

FIGURE 8.5 Density function for U, Example 8.4 (area .05 to the left of a, area .90 between a and b, area .05 to the right of b)

One way to do this is to choose a and b to satisfy

\[ P(U < a) = \int_0^a e^{-u}\,du = .05 \quad\text{and}\quad P(U > b) = \int_b^\infty e^{-u}\,du = .05. \]

These equations yield

\[ 1 - e^{-a} = .05 \quad\text{and}\quad e^{-b} = .05 \]

or, equivalently, a = .051, b = 2.996.

It follows that

\[ .90 = P(.051 \le U \le 2.996) = P\left(.051 \le \frac{Y}{\theta} \le 2.996\right). \]

Because we seek an interval estimator for θ, let us manipulate the inequalities describing the event to isolate θ in the middle. Y has an exponential distribution, so P(Y > 0) = 1, and we maintain the direction of the inequalities if we divide through by Y. That is,

\[ .90 = P\left(.051 \le \frac{Y}{\theta} \le 2.996\right) = P\left(\frac{.051}{Y} \le \frac{1}{\theta} \le \frac{2.996}{Y}\right). \]

Taking reciprocals (and hence reversing the direction of the inequalities), we obtain

\[ .90 = P\left(\frac{Y}{.051} \ge \theta \ge \frac{Y}{2.996}\right) = P\left(\frac{Y}{2.996} \le \theta \le \frac{Y}{.051}\right). \]

Thus, we see that Y/2.996 and Y/.051 form the desired lower and upper confidence limits, respectively. To obtain numerical values for these limits, we must observe an actual value for Y and substitute that value into the given formulas for the confidence limits. We know that limits of the form (Y/2.996, Y/.051) will include the true (unknown) values of θ for 90% of the values of Y we would obtain by repeatedly sampling from this exponential distribution.
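The repeated-sampling interpretation at the end of Example 8.4 can be checked numerically. The following Python sketch is an illustration only: the function names, the value θ = 5, the number of trials, and the seed are assumptions chosen for the demonstration, not part of the example. It recomputes a and b from the two tail equations and estimates how often the interval (Y/b, Y/a) encloses θ.

```python
import math
import random

def exponential_ci(y, conf=0.90):
    """Interval for theta from one exponential observation y, using the
    pivot U = Y/theta with equal areas in the two tails."""
    tail = (1.0 - conf) / 2.0
    a = -math.log(1.0 - tail)   # solves 1 - exp(-a) = tail
    b = -math.log(tail)         # solves exp(-b) = tail
    return y / b, y / a         # (lower limit, upper limit)

def coverage(theta, trials=20000, seed=1):
    """Fraction of simulated intervals that enclose theta (hypothetical settings)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.expovariate(1.0 / theta)   # exponential observation with mean theta
        lower, upper = exponential_ci(y)
        if lower <= theta <= upper:
            hits += 1
    return hits / trials
```

Calling `coverage(5.0)` should return a fraction close to .90, and the same should hold for any other positive θ, because the distribution of the pivot does not depend on θ.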

EXAMPLE 8.5

Suppose that we take a sample of size n = 1 from a uniform distribution defined on the interval [0, θ], where θ is unknown. Find a 95% lower confidence bound for θ.

Solution

Because Y is uniform on [0, θ], the methods of Chapter 6 can be used to show that U = Y/θ is uniformly distributed over [0, 1]. That is,

\[ f_U(u) = \begin{cases} 1, & 0 \le u \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]

FIGURE 8.6 Density function for U, Example 8.5 (area .95 to the left of a, area .05 between a and 1)

Figure 8.6 contains a graph of the density function for U. Again, we see that U satisfies the requirements of a pivotal quantity. Because we seek a 95% lower confidence limit for θ, let us determine the value for a so that P(U ≤ a) = .95. That is,

\[ \int_0^a (1)\,du = .95, \]

or a = .95. Thus,

\[ P(U \le .95) = P\left(\frac{Y}{\theta} \le .95\right) = P(Y \le .95\,\theta) = P\left(\frac{Y}{.95} \le \theta\right) = .95. \]

We see that Y/.95 is a lower confidence limit for θ, with confidence coefficient .95. Because any observed Y must be less than θ, it is intuitively reasonable to have the lower confidence limit for θ slightly larger than the observed value of Y.
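The lower bound of Example 8.5 can be examined in the same way. This short Python sketch (the function names, θ = 12, the number of trials, and the seed are assumptions made for illustration) draws single uniform observations and records how often Y/.95 falls at or below θ.

```python
import random

def uniform_lower_bound(y, conf=0.95):
    """Lower confidence bound for theta from a single Y ~ Uniform[0, theta]."""
    return y / conf

def lower_bound_coverage(theta, trials=20000, seed=2):
    """Fraction of simulated bounds with Y/.95 <= theta (hypothetical settings)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.uniform(0.0, theta)
        if uniform_lower_bound(y) <= theta:
            hits += 1
    return hits / trials
```

A call such as `lower_bound_coverage(12.0)` should give a value near .95. The bound fails exactly when the observed Y exceeds .95θ, an event of probability .05.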

The two preceding examples illustrate the use of the pivotal method for ﬁnding conﬁdence limits for unknown parameters. In each instance, the interval estimates were developed on the basis of a single observation from the distribution. These examples were introduced primarily to illustrate the pivotal method. In the remaining sections of this chapter, we use this method in conjunction with the sampling distributions presented in Chapter 7 to develop some interval estimates of greater practical importance.

Exercises

8.39

Suppose that the random variable Y has a gamma distribution with parameters α = 2 and an unknown β. In Exercise 6.46, you used the method of moment-generating functions to prove a general result implying that 2Y/β has a $\chi^2$ distribution with 4 degrees of freedom (df). Using 2Y/β as a pivotal quantity, derive a 90% confidence interval for β.

8.40

Suppose that the random variable Y is an observation from a normal distribution with unknown mean µ and variance 1. Find a

a 95% confidence interval for µ.
b 95% upper confidence limit for µ.
c 95% lower confidence limit for µ.

8.41

Suppose that Y is normally distributed with mean 0 and unknown variance $\sigma^2$. Then $Y^2/\sigma^2$ has a $\chi^2$ distribution with 1 df. Use the pivotal quantity $Y^2/\sigma^2$ to find a

a 95% confidence interval for $\sigma^2$.
b 95% upper confidence limit for $\sigma^2$.
c 95% lower confidence limit for $\sigma^2$.

8.42

Use the answers from Exercise 8.41 to find a

a 95% confidence interval for σ.
b 95% upper confidence limit for σ.
c 95% lower confidence limit for σ.

8.43

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size n from a population with a uniform distribution on the interval (0, θ). Let $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ and $U = (1/\theta)Y_{(n)}$.

a Show that U has distribution function

\[ F_U(u) = \begin{cases} 0, & u < 0, \\ u^n, & 0 \le u \le 1, \\ 1, & u > 1. \end{cases} \]

b Because the distribution of U does not depend on θ, U is a pivotal quantity. Find a 95% lower confidence bound for θ.

8.44

Let Y have probability density function

\[ f_Y(y) = \begin{cases} \dfrac{2(\theta - y)}{\theta^2}, & 0 < y < \theta, \\ 0, & \text{elsewhere.} \end{cases} \]

a Show that Y has distribution function

\[ F_Y(y) = \begin{cases} 0, & y \le 0, \\ \dfrac{2y}{\theta} - \dfrac{y^2}{\theta^2}, & 0 < y < \theta, \\ 1, & y \ge \theta. \end{cases} \]

b Show that Y/θ is a pivotal quantity.

c Use the pivotal quantity from part (b) to find a 90% lower confidence limit for θ.

8.45

Refer to Exercise 8.44.

a Use the pivotal quantity from Exercise 8.44(b) to find a 90% upper confidence limit for θ.

b If $\hat{\theta}_L$ is the lower confidence bound for θ obtained in Exercise 8.44(c) and $\hat{\theta}_U$ is the upper bound found in part (a), what is the confidence coefficient of the interval $(\hat{\theta}_L, \hat{\theta}_U)$?

8.46

Refer to Example 8.4 and suppose that Y is a single observation from an exponential distribution with mean θ.

a Use the method of moment-generating functions to show that 2Y/θ is a pivotal quantity and has a $\chi^2$ distribution with 2 df.

b Use the pivotal quantity 2Y/θ to derive a 90% confidence interval for θ.

c Compare the interval you obtained in part (b) with the interval obtained in Example 8.4.

8.47

Refer to Exercise 8.46. Assume that $Y_1, Y_2, \ldots, Y_n$ is a sample of size n from an exponential distribution with mean θ.

a Use the method of moment-generating functions to show that $2\sum_{i=1}^{n} Y_i/\theta$ is a pivotal quantity and has a $\chi^2$ distribution with 2n df.

b Use the pivotal quantity $2\sum_{i=1}^{n} Y_i/\theta$ to derive a 95% confidence interval for θ.

c If a sample of size n = 7 yields $\bar{y} = 4.77$, use the result from part (b) to give a 95% confidence interval for θ.

8.48

Refer to Exercises 8.39 and 8.47. Assume that $Y_1, Y_2, \ldots, Y_n$ is a sample of size n from a gamma-distributed population with α = 2 and unknown β.

a Use the method of moment-generating functions to show that $2\sum_{i=1}^{n} Y_i/\beta$ is a pivotal quantity and has a $\chi^2$ distribution with 4n df.

b Use the pivotal quantity $2\sum_{i=1}^{n} Y_i/\beta$ to derive a 95% confidence interval for β.

c If a sample of size n = 5 yields $\bar{y} = 5.39$, use the result from part (b) to give a 95% confidence interval for β.

8.49

Refer to Exercise 8.48. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a sample of size n from a gamma-distributed population with parameters α and β.

a If α = m, where m is a known integer and β is unknown, find a pivotal quantity that has a $\chi^2$ distribution with 2mn df. Use this pivotal quantity to derive a 100(1 − α)% confidence interval for β.

b If α = c, where c is a known constant but not an integer and β is unknown, find a pivotal quantity that has a gamma distribution with parameters α = cn and β = 1. Give a formula for a 100(1 − α)% confidence interval for β.

c Applet Exercise Refer to part (b). If α = c = 2.57 and a sample of size n = 10 yields $\bar{y} = 11.36$, give a 95% confidence interval for β. [Use the applet Gamma Probabilities and Quantiles to obtain appropriate quantiles for the pivotal quantity that you obtained in part (b).]

8.6 Large-Sample Confidence Intervals

In Section 8.3, we presented some unbiased point estimators for the parameters $\mu$, $p$, $\mu_1 - \mu_2$, and $p_1 - p_2$. As we indicated in that section, for large samples all these point estimators have approximately normal sampling distributions with standard errors as given in Table 8.1. That is, under the conditions of Section 8.3, if the target parameter θ is $\mu$, $p$, $\mu_1 - \mu_2$, or $p_1 - p_2$, then for large samples,

\[ Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \]

possesses approximately a standard normal distribution. Consequently, $Z = (\hat{\theta} - \theta)/\sigma_{\hat{\theta}}$ forms (at least approximately) a pivotal quantity, and the pivotal method can be employed to develop confidence intervals for the target parameter θ.

EXAMPLE 8.6

Let $\hat{\theta}$ be a statistic that is normally distributed with mean θ and standard error $\sigma_{\hat{\theta}}$. Find a confidence interval for θ that possesses a confidence coefficient equal to (1 − α).

Solution

The quantity

\[ Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \]

has a standard normal distribution. Now select two values in the tails of this distribution, $z_{\alpha/2}$ and $-z_{\alpha/2}$, such that (see Figure 8.7)

\[ P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha. \]

FIGURE 8.7 Location of $z_{\alpha/2}$ and $-z_{\alpha/2}$ (area 1 − α between the two values, area α/2 in each tail)

Substituting for Z in the probability statement, we have

\[ P\left(-z_{\alpha/2} \le \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \le z_{\alpha/2}\right) = 1 - \alpha. \]

Multiplying by $\sigma_{\hat{\theta}}$, we obtain

\[ P(-z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \hat{\theta} - \theta \le z_{\alpha/2}\,\sigma_{\hat{\theta}}) = 1 - \alpha, \]

and subtracting $\hat{\theta}$ from each term of the inequality, we get

\[ P(-\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le -\theta \le -\hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}) = 1 - \alpha. \]

Finally, multiplying each term by −1 and, consequently, changing the direction of the inequalities, we have

\[ P(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}) = 1 - \alpha. \]

Thus, the endpoints for a 100(1 − α)% confidence interval for θ are given by

\[ \hat{\theta}_L = \hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \quad\text{and}\quad \hat{\theta}_U = \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}. \]

By analogous arguments, we can determine that 100(1 − α)% one-sided confidence limits, often called lower and upper bounds, respectively, are given by

\[ 100(1 - \alpha)\% \text{ lower bound for } \theta = \hat{\theta} - z_{\alpha}\,\sigma_{\hat{\theta}}, \]
\[ 100(1 - \alpha)\% \text{ upper bound for } \theta = \hat{\theta} + z_{\alpha}\,\sigma_{\hat{\theta}}. \]

Suppose that we compute both a 100(1 − α)% lower bound and a 100(1 − α)% upper bound for θ. We then decide to use both of these bounds to form a confidence interval for θ. What will be the confidence coefficient of this interval? A quick look at the preceding confirms that combining lower and upper bounds, each with confidence coefficient 1 − α, yields a two-sided interval with confidence coefficient 1 − 2α.

Under the conditions described in Section 8.3, the results given earlier in this section can be used to find large-sample confidence intervals (one-sided or two-sided) for $\mu$, $p$, $(\mu_1 - \mu_2)$, and $(p_1 - p_2)$. The following examples illustrate applications of the general method developed in Example 8.6.

EXAMPLE 8.7

The shopping times of n = 64 randomly selected customers at a local supermarket were recorded. The average and variance of the 64 shopping times were 33 minutes and 256 minutes², respectively. Estimate µ, the true average shopping time per customer, with a confidence coefficient of 1 − α = .90.

Solution

In this case, we are interested in the parameter θ = µ. Thus, $\hat{\theta} = \bar{y} = 33$ and $s^2 = 256$ for a sample of n = 64 shopping times. The population variance $\sigma^2$ is unknown, so (as in Section 8.3), we use $s^2$ as its estimated value. The confidence interval $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$ has the form

\[ \bar{y} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \approx \bar{y} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}. \]

From Table 4, Appendix 3, $z_{\alpha/2} = z_{.05} = 1.645$; hence, the confidence limits are given by

\[ \bar{y} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 33 - 1.645\left(\frac{16}{8}\right) = 29.71, \]
\[ \bar{y} + z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 33 + 1.645\left(\frac{16}{8}\right) = 36.29. \]

Thus, our confidence interval for µ is (29.71, 36.29). In repeated sampling, approximately 90% of all intervals of the form $\bar{Y} \pm 1.645(S/\sqrt{n})$ include µ, the true mean shopping time per customer. Although we do not know whether the particular interval (29.71, 36.29) contains µ, the procedure that generated it yields intervals that do capture the true mean in approximately 90% of all instances where the procedure is used.
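The general formulas of Example 8.6 translate directly into a few lines of code. The sketch below is one possible arrangement, not a prescribed implementation: the function names are hypothetical, and the z values come from Python's standard library rather than from Table 4, so results may differ from hand calculations in the last decimal place. It recomputes the interval of Example 8.7.

```python
from statistics import NormalDist

def z_upper(tail_area):
    """z value with the given area to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - tail_area)

def two_sided_ci(theta_hat, std_err, conf):
    """Endpoints theta_hat -/+ z_{alpha/2} * std_err, where conf = 1 - alpha."""
    half_width = z_upper((1.0 - conf) / 2.0) * std_err
    return theta_hat - half_width, theta_hat + half_width

def lower_bound(theta_hat, std_err, conf):
    """One-sided lower bound theta_hat - z_alpha * std_err."""
    return theta_hat - z_upper(1.0 - conf) * std_err

def upper_bound(theta_hat, std_err, conf):
    """One-sided upper bound theta_hat + z_alpha * std_err."""
    return theta_hat + z_upper(1.0 - conf) * std_err

# Example 8.7: n = 64, y-bar = 33, s^2 = 256, confidence coefficient .90
n, y_bar, s_squared = 64, 33.0, 256.0
std_err = (s_squared / n) ** 0.5          # s / sqrt(n) = 16 / 8 = 2
low, high = two_sided_ci(y_bar, std_err, 0.90)
print(round(low, 2), round(high, 2))      # 29.71 36.29
```

With the same inputs, `lower_bound(33.0, 2.0, 0.95)` and `upper_bound(33.0, 2.0, 0.95)` reproduce the endpoints of the 90% two-sided interval, which illustrates the remark that two one-sided bounds with coefficient 1 − α combine into an interval with coefficient 1 − 2α.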

EXAMPLE 8.8

Two brands of refrigerators, denoted A and B, are each guaranteed for 1 year. In a random sample of 50 refrigerators of brand A, 12 were observed to fail before the guarantee period ended. An independent random sample of 60 brand B refrigerators also revealed 12 failures during the guarantee period. Estimate the true difference $(p_1 - p_2)$ between proportions of failures during the guarantee period, with confidence coefficient approximately .98.

Solution

The confidence interval $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$ now has the form

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}. \]

Because $p_1$, $q_1$, $p_2$, and $q_2$ are unknown, the exact value of $\sigma_{\hat{\theta}}$ cannot be evaluated. But as indicated in Section 8.3, we can get a good approximation for $\sigma_{\hat{\theta}}$ by substituting $\hat{p}_1$, $\hat{q}_1 = 1 - \hat{p}_1$, $\hat{p}_2$, and $\hat{q}_2 = 1 - \hat{p}_2$ for $p_1$, $q_1$, $p_2$, and $q_2$, respectively. For this example, $\hat{p}_1 = .24$, $\hat{q}_1 = .76$, $\hat{p}_2 = .20$, $\hat{q}_2 = .80$, and $z_{.01} = 2.33$. The desired 98% confidence interval is

\[ (.24 - .20) \pm 2.33 \sqrt{\frac{(.24)(.76)}{50} + \frac{(.20)(.80)}{60}} \]
\[ .04 \pm .1851 \quad\text{or}\quad [-.1451, .2251]. \]


Notice that this conﬁdence interval contains zero. Thus, a zero value for the difference in proportions ( p1 − p2 ) is “believable” (at approximately the 98% conﬁdence level) on the basis of the observed data. However, the interval also includes the value .1. Thus, .1 represents another value of ( p1 − p2 ) that is “believable” on the basis of the data that we have analyzed.
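The arithmetic of Example 8.8 can be reproduced with a short Python sketch. The function name and argument order are assumptions made for illustration. Because the z value is taken from the standard library (about 2.326) instead of the tabled 2.33, the endpoints may differ slightly from those in the example in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(y1, n1, y2, n2, conf):
    """Large-sample interval for p1 - p2 from independent binomial counts y1 and y2."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    # Estimated standard error, with sample proportions substituted for p1 and p2
    std_err = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1_hat - p2_hat
    return diff - z * std_err, diff + z * std_err

# Example 8.8: 12 failures among 50 brand A units, 12 among 60 brand B units
low, high = diff_proportions_ci(12, 50, 12, 60, 0.98)
```

The resulting interval again contains zero, in agreement with the discussion above.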

We close this section with an empirical investigation of the performance of the large-sample interval estimation procedure for a single population proportion p, based on Y, the number of successes observed during n trials in a binomial experiment. In this case, $\theta = p$; $\hat{\theta} = \hat{p} = Y/n$ and $\sigma_{\hat{\theta}} = \sigma_{\hat{p}} = \sqrt{p(1 - p)/n} \approx \sqrt{\hat{p}(1 - \hat{p})/n}$. (As in Section 8.3, $\sqrt{\hat{p}(1 - \hat{p})/n}$ provides a good approximation for $\sigma_{\hat{p}}$.) The appropriate confidence limits then are

\[ \hat{\theta}_L = \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \quad\text{and}\quad \hat{\theta}_U = \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}. \]

Figure 8.8 shows the results of 24 independent binomial experiments, each based on 35 trials when the true value of p = 0.5. For each of the experiments, we calculated the number of successes y, the value of $\hat{p} = y/35$, and the corresponding 95% confidence interval, using the formula $\hat{p} \pm 1.96\sqrt{\hat{p}(1 - \hat{p})/35}$. (Notice that $z_{.025} = 1.96$.) In the first binomial experiment, we observed y = 18, $\hat{p} = 18/35 = 0.5143$, and $\sigma_{\hat{p}} \approx \sqrt{\hat{p}(1 - \hat{p})/n} = \sqrt{(.5143)(.4857)/35} = 0.0845$. So, the interval obtained in the first experiment is .5143 ± 1.96(0.0845) or (0.3487, 0.6799). The estimate for p from the first experiment is shown by the lowest large dot in Figure 8.8, and the resulting confidence interval is given by the horizontal line through that dot. The vertical line indicates the true value of p, 0.5 in this case. Notice that the interval obtained in the first trial (of size 35) actually contains the true value of the population proportion p.

FIGURE 8.8 Twenty-four realized 95% confidence intervals for a population proportion (horizontal axis: estimated probability, from 0.00 to 1.00; vertical line: true probability 0.50)

The remaining 23 confidence intervals contained in this small simulation are given by the rest of the horizontal lines in Figure 8.8. Notice that each individual interval either contains the true value of p or it does not. However, the true value of p is contained in 23 of the 24 intervals observed (95.8%). If the same procedure were used many times, each individual interval would either contain or fail to contain the true value of p, but the percentage of all intervals that capture p would be very close to 95%. You are “95% confident” that the interval contains the parameter because the interval was obtained by using a procedure that generates intervals that do contain the parameter approximately 95% of the times the procedure is used.

The applet ConfidenceIntervalP (accessible at www.thomsonedu.com/statistics/wackerly) was used to produce Figure 8.8. What happens if different values of n or different confidence coefficients are used? Do we obtain similar results if the true value of p is something other than 0.5? Several of the following exercises will allow you to use the applet to answer questions like these.

In this section, we have used the pivotal method to derive large-sample confidence intervals for the parameters $\mu$, $p$, $\mu_1 - \mu_2$, and $p_1 - p_2$ under the conditions of Section 8.3. The key formula is

\[ \hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}, \]

where the values of $\hat{\theta}$ and $\sigma_{\hat{\theta}}$ are as given in Table 8.1. When θ = µ is the target parameter, then $\hat{\theta} = \bar{Y}$ and $\sigma^2_{\hat{\theta}} = \sigma^2/n$, where $\sigma^2$ is the population variance. If the true value of $\sigma^2$ is known, this value should be used in calculating the confidence interval. If $\sigma^2$ is not known and n is large, there is no serious loss of accuracy if $s^2$ is substituted for $\sigma^2$ in the formula for the confidence interval.
Similarly, if $\sigma_1^2$ and $\sigma_2^2$ are unknown and both $n_1$ and $n_2$ are large, $s_1^2$ and $s_2^2$ can be substituted for these values in the formula for a large-sample confidence interval for $\theta = \mu_1 - \mu_2$. When θ = p is the target parameter, then $\hat{\theta} = \hat{p}$ and $\sigma_{\hat{p}} = \sqrt{pq/n}$. Because p is the unknown target parameter, $\sigma_{\hat{p}}$ cannot be evaluated. If n is large and we substitute $\hat{p}$ for p (and $\hat{q} = 1 - \hat{p}$ for q) in the formula for $\sigma_{\hat{p}}$, however, the resulting confidence interval will have approximately the stated confidence coefficient. For large $n_1$ and $n_2$, similar statements hold when $\hat{p}_1$ and $\hat{p}_2$ are used to estimate $p_1$ and $p_2$, respectively, in the formula for $\sigma^2_{\hat{p}_1 - \hat{p}_2}$. The theoretical justification for these substitutions will be provided in Section 9.3.
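The empirical investigation summarized in Figure 8.8 can also be approached by direct calculation. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the exact-coverage calculation is a substitute for the applet's simulation, not a description of how the applet works. It computes the large-sample interval for p and then adds up the binomial probabilities of every outcome y whose interval encloses the true p.

```python
from math import comb, sqrt
from statistics import NormalDist

def proportion_ci(y, n, conf=0.95):
    """Large-sample interval p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def exact_coverage(p, n, conf=0.95):
    """Probability, over Y ~ binomial(n, p), that the realized interval contains p."""
    total = 0.0
    for y in range(n + 1):
        low, high = proportion_ci(y, n, conf)
        if low <= p <= high:
            total += comb(n, y) * p**y * (1 - p) ** (n - y)
    return total

# First experiment of Figure 8.8: y = 18 successes in n = 35 trials
low, high = proportion_ci(18, 35)
```

For y = 18 and n = 35, the endpoints agree with (0.3487, 0.6799) to about four decimal places. Evaluating `exact_coverage(0.5, 35)` gives the long-run fraction of intervals that capture p under these settings, which should be close to, though not exactly, .95. Trying other values of p and n gives a way to explore the questions raised about the applet.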

Exercises

8.50

Refer to Example 8.8. In this example, $p_1$ and $p_2$ were used to denote the proportions of refrigerators of brands A and B, respectively, that failed during the guarantee periods.

a At the approximate 98% confidence level, what is the largest “believable value” for the difference in the proportions of failures for refrigerators of brands A and B?

b At the approximate 98% confidence level, what is the smallest “believable value” for the difference in the proportions of failures for refrigerators of brands A and B?

c If $p_1 - p_2$ actually equals 0.2251, which brand has the larger proportion of failures during the warranty period? How much larger?

d If $p_1 - p_2$ actually equals −0.1451, which brand has the larger proportion of failures during the warranty period? How much larger?

e As observed in Example 8.8, zero is a believable value of the difference. Would you conclude that there is evidence of a difference in the proportions of failures (within the warranty period) for the two brands of refrigerators? Why?

8.51

Applet Exercise What happens if we attempt to use the applet ConfidenceIntervalP (accessible at www.thomsonedu.com/statistics/wackerly) to reproduce the results presented in Figure 8.8? Access the applet. Don’t change the value of p from .50 or the confidence coefficient from .95, but use the “Sample Size” button to change the sample size to n = 35. Click the button “One Sample” a single time. In the top left portion of the display, the sample values are depicted by a set of 35 0s and 1s, and the value of the estimate for p and the resulting 95% confidence interval are given below the sample values.

a What is the value of $\hat{p}$ that you obtained? Is it the same as the first value obtained, 0.5143, when Figure 8.8 was generated? Does this surprise you? Why?

b Use the value of the estimate that you obtained and the formula for a 95% confidence interval to verify that the confidence interval given on the display is correctly calculated.

c Does the interval that you obtained contain the true value of p?

d What is the length of the confidence interval that you obtained? Is it exactly the same as the length of the first interval, (.3487, .6799), obtained when Figure 8.8 was generated? Why?

e Click the button “One Sample” again. Is this interval different from the one previously generated? Click the button “One Sample” three more times. How many distinctly different intervals appear among the first 5 intervals generated? How many of the intervals contain .5?

f Click the button “One Sample” until you have obtained 24 intervals. What percentage of the intervals contain the true value of p = .5? Is the percentage close to the value that you expected?

8.52

Applet Exercise Refer to Exercise 8.51. Don’t change the value of p from .50 or the confidence coefficient from .95, but use the button “Sample Size” to change the sample size to n = 50. Click the button “One Sample” a single time.

a How long is the resulting confidence interval? How does the length of this interval compare to the one that you obtained in Exercise 8.51(d)? Why are the lengths of the intervals different?

b Click the button “25 Samples.” Is the percentage of intervals that contain the true value of p close to what you expected?

c Click the button “100 Samples.” Is the percentage of intervals that contain the true value of p close to what you expected?

d If you were to click the button “100 Samples” several times and calculate the percentage of all of the intervals that contain the true value of p, what percentage of intervals do you expect to capture p?

8.53

Applet Exercise Refer to Exercises 8.51 and 8.52. Change the value of p to .25 (put the cursor on the vertical line and drag it to the left until 0.25 appears as the true probability). Change the sample size to n = 75 and the confidence coefficient to .90.

a Click the button “One Sample” a single time.
  i What is the length of the resulting interval? Is the interval longer or shorter than that obtained in Exercise 8.51(d)?
  ii Give three reasons that the interval you obtained in part (i) is shorter than the interval obtained in Exercise 8.51(d).

b Click the button “100 Samples” a few times. Each click will produce 100 intervals and provide you with the number and proportion of those 100 intervals that contain the true value of p. After each click, write down the number of intervals that captured p = .25.
  i How many intervals did you generate? How many of the generated intervals captured the true value of p?
  ii What percentage of all the generated intervals captured p?

8.54

Applet Exercise Refer to Exercises 8.51–8.53. Change the value of p to .90. Change the sample size to n = 10 and the confidence coefficient to 0.95. Click the button “100 Samples” a few times. After each click, write down the number of intervals that captured p = .90.

a When the simulation produced ten successes in ten trials, what is the resulting realized 95% confidence interval for p? What is the length of the interval? Why? How is this depicted on the display?

b How many intervals did you generate? How many of the generated intervals captured the true value of p?

c What percentage of all of the generated intervals captured p?

d Does the result of part (c) surprise you?

e Does the result in part (c) invalidate the large-sample confidence interval procedures presented in this section? Why?

8.55

Applet Exercise Refer to Exercises 8.51–8.54. Change the value of p to .90. Change the sample size to n = 100 and the conﬁdence coefﬁcient to .95. Click the button “100 Samples” a few times. After each click, write down the number of intervals that captured p = .90 and answer the questions posed in Exercise 8.54, parts (b)–(e).

8.56

Is America’s romance with movies on the wane? In a Gallup Poll⁵ of n = 800 randomly chosen adults, 45% indicated that movies were getting better whereas 43% indicated that movies were getting worse.

a Find a 98% confidence interval for p, the overall proportion of adults who say that movies are getting better.

b Does the interval include the value p = .50? Do you think that a majority of adults say that movies are getting better?

8.57

Refer to Exercise 8.29. According to the result given there, 51% of the n = 1001 adults polled in November 2003 claimed to be baseball fans. Construct a 99% conﬁdence interval for the proportion of adults who professed to be baseball fans in November 2003 (after the World Series). Interpret this interval.

8.58

The administrators for a hospital wished to estimate the average number of days required for inpatient treatment of patients between the ages of 25 and 34. A random sample of 500 hospital patients between these ages produced a mean and standard deviation equal to 5.4 and 3.1 days, respectively. Construct a 95% confidence interval for the mean length of stay for the population of patients from which the sample was drawn.

5. Source: “Movie Mania Ebbing,” Gallup Poll of 800 adults, http://www.usatoday.com/snapshot/news/2001-06-14-moviemania.htm, 16–18 March 2001.

8.59

When it comes to advertising, “’tweens” are not ready for the hard-line messages that advertisers often use to reach teenagers. The Geppeto Group study⁶ found that 78% of ’tweens understand and enjoy ads that are silly in nature. Suppose that the study involved n = 1030 ’tweens.

a Construct a 90% confidence interval for the proportion of ’tweens who understand and enjoy ads that are silly in nature.

b Do you think that “more than 75%” of all ’tweens enjoy ads that are silly in nature? Why?

8.60

What is the normal body temperature for healthy humans? A random sample of 130 healthy human body temperatures provided by Allen Shoemaker⁷ yielded a mean of 98.25 degrees and a standard deviation of 0.73 degrees.

a Give a 99% confidence interval for the average body temperature of healthy people.

b Does the confidence interval obtained in part (a) contain the value 98.6 degrees, the accepted average temperature cited by physicians and others? What conclusions can you draw?

8.61

A small amount of the trace element selenium, from 50 to 200 micrograms (µg) per day, is considered essential to good health. Suppose that independent random samples of $n_1 = n_2 = 30$ adults were selected from two regions of the United States, and a day’s intake of selenium, from both liquids and solids, was recorded for each person. The mean and standard deviation of the selenium daily intakes for the 30 adults from region 1 were $\bar{y}_1 = 167.1$ µg and $s_1 = 24.3$ µg, respectively. The corresponding statistics for the 30 adults from region 2 were $\bar{y}_2 = 140.9$ µg and $s_2 = 17.6$ µg. Find a 95% confidence interval for the difference in the mean selenium intake for the two regions.

8.62

The following statistics are the result of an experiment conducted by P. I. Ward to investigate a theory concerning the molting behavior of the male Gammarus pulex, a small crustacean.8 If a male needs to molt while paired with a female, he must release her, and so loses her. The theory is that the male G. pulex is able to postpone molting, thereby reducing the possibility of losing his mate. Ward randomly assigned 100 pairs of males and females to two groups of 50 each. Pairs in the first group were maintained together (normal); those in the second group were separated (split). The length of time to molt was recorded for both males and females, and the means, standard deviations, and sample sizes are shown in the accompanying table. (The number of crustaceans in each of the four samples is less than 50 because some in each group did not survive until molting time.)

Time to Molt (days)
              Mean     s      n
Males
  Normal      24.8    7.1    34
  Split       21.3    8.1    41
Females
  Normal       8.6    4.8    45
  Split       11.6    5.6    48

a Find a 99% confidence interval for the difference in mean molt time for “normal” males versus those “split” from their mates.
b Interpret the interval.

6. Source: “Caught in the Middle,” American Demographics, July 2001, pp. 14–15.
7. Source: Allen L. Shoemaker, “What’s Normal? Temperature, Gender and Heart Rate,” Journal of Statistics Education (1996).
8. Source: “Gammarus pulex Control Their Moult Timing to Secure Mates,” Animal Behaviour 32 (1984).

8.63

Most Americans love participating in or at least watching sporting events. Some feel that sports have more than just entertainment value. In a survey of 1000 adults, conducted by KRC Research & Consulting, 78% felt that spectator sports have a positive effect on society.9
a Find a 95% confidence interval for the percentage of the public that feel that sports have a positive effect on society.
b The poll reported a margin of error of “plus or minus 3.1%.” Does this agree with your answer to part (a)? What value of p produces the margin of error given by the poll?

8.64

In a CNN/USA Today/Gallup Poll, 1000 Americans were asked how well the term patriotic described themselves.10 Some results from the poll are contained in the following summary table.

                          Age Group
                     All     18–34    60+
Very well            .53     .35      .77
Somewhat well        .31     .41      .17
Not very well        .10     .16      .04
Not well at all      .06     .08      .02

a If the 18–34 and 60+ age groups consisted of 340 and 150 individuals, respectively, find a 98% confidence interval for the difference in proportions of those in these age groups who agreed that patriotic described them very well.
b Based on the interval that you obtained in part (a), do you think that the difference in proportions of those who view themselves as patriotic is as large as 0.6? Explain.

8.65

For a comparison of the rates of defectives produced by two assembly lines, independent random samples of 100 items were selected from each line. Line A yielded 18 defectives in the sample, and line B yielded 12 defectives. a Find a 98% conﬁdence interval for the true difference in proportions of defectives for the two lines. b Is there evidence here to suggest that one line produces a higher proportion of defectives than the other?

8.66

Historically, biology has been taught through lectures, and assessment of learning was accomplished by testing vocabulary and memorized facts. A teacher-developed new curriculum, Biology: A Community Content (BACC), is standards based, activity oriented, and inquiry centered. Students taught using the historical and new methods were tested in the traditional sense on biology concepts that featured biological knowledge and process skills. The results of a test on biology concepts were published in The American Biology Teacher and are given in the following table.11

                                Mean     Sample Size    Standard Deviation
Pretest: all BACC classes       13.38    372            5.59
Pretest: all traditional        14.06    368            5.45
Posttest: all BACC classes      18.50    365            8.03
Posttest: all traditional       16.50    298            6.96

a Give a 90% confidence interval for the mean posttest score for all BACC students.
b Find a 95% confidence interval for the difference in the mean posttest scores for BACC and traditionally taught students.
c Does the confidence interval in part (b) provide evidence that there is a difference in the mean posttest scores for BACC and traditionally taught students? Explain.

9. Source: Mike Tharp, “Ready, Set, Go. Why We Love Our Games—Sports Crazy,” U.S. News & World Report, 15 July 1997, p. 31.
10. Source: Adapted from “I’m a Yankee Doodle Dandy,” Knowledge Networks: 2000, American Demographics, July 2001, p. 9.
11. Source: William Leonard, Barbara Speziale, and John Pernick, “Performance Assessment of a Standards-Based High School Biology Curriculum,” The American Biology Teacher 63(5) (2001): 310–316.

8.67

One suggested method for solving the electric-power shortage in a region involves constructing floating nuclear power plants a few miles offshore in the ocean. Concern about the possibility of a ship collision with the floating (but anchored) plant has raised the need for an estimate of the density of ship traffic in the area. The number of ships passing within 10 miles of the proposed power-plant location per day, recorded for n = 60 days during July and August, possessed a sample mean and variance of $\bar{y} = 7.2$ and $s^2 = 8.8$.
a Find a 95% confidence interval for the mean number of ships passing within 10 miles of the proposed power-plant location during a 1-day time period.
b The density of ship traffic was expected to decrease during the winter months. A sample of n = 90 daily recordings of ship sightings for December, January, and February yielded a mean and variance of $\bar{y} = 4.7$ and $s^2 = 4.9$. Find a 90% confidence interval for the difference in mean density of ship traffic between the summer and winter months.
c What is the population associated with your estimate in part (b)? What could be wrong with the sampling procedure for parts (a) and (b)?

*8.68

Suppose that $Y_1$, $Y_2$, $Y_3$, and $Y_4$ have a multinomial distribution with n trials and probabilities $p_1$, $p_2$, $p_3$, and $p_4$ for the four cells. Just as in the binomial case, any linear combination of $Y_1$, $Y_2$, $Y_3$, and $Y_4$ will be approximately normally distributed for large n.
a Determine the variance of $Y_1 - Y_2$. [Hint: Recall that the random variables $Y_i$ are dependent.]
b A study of attitudes among residents of Florida with regard to policies for handling nuisance alligators in urban areas showed the following. Among 500 people sampled and presented with four management choices, 6% said the alligators should be completely protected, 16% said they should be destroyed by wildlife officers, 52% said they should be relocated live, and 26% said that a regulated commercial harvest should be allowed. Estimate the difference between the population proportion favoring complete protection and the population proportion favoring destruction by wildlife officers. Use a confidence coefficient of .95.

*8.69

The Journal of Communication, Winter 1978, reported on a study of viewing violence on TV. Samples from populations with low viewing rates (10–19 programs per week) and high viewing rates (40–49 programs per week) were divided into two age groups, and Y, the number of persons watching a high number of violent programs, was recorded. The data for two age groups are shown in the accompanying table, with $n_i$ denoting the sample size for each cell. If $Y_1$, $Y_2$, $Y_3$, and $Y_4$ have independent binomial distributions with parameters $p_1$, $p_2$, $p_3$, and $p_4$, respectively, find a 95% confidence interval for $(p_3 - p_1) - (p_4 - p_2)$. This function of the $p_i$ values represents a comparison between the change in viewing habits for young adults and the corresponding change for older adults, as we move from those with low viewing rates to those with high viewing rates. (The data suggest that the rate of viewing violence may increase with young adults but decrease with older adults.)

                                 Age Group
Viewing Rate      16–34                      55 and Over
Low               y1 = 20    n1 = 31         y2 = 13    n2 = 30
High              y3 = 18    n3 = 26         y4 = 7     n4 = 28

8.7 Selecting the Sample Size

The design of an experiment is essentially a plan for purchasing a quantity of information. Like any other commodity, information may be acquired at varying prices depending on the manner in which the data are obtained. Some measurements contain a large amount of information about the parameter of interest; others may contain little or none. Research, scientific or otherwise, is done in order to obtain information. Obviously, we should seek to obtain information at minimum cost.

The sampling procedure, or experimental design as it is usually called, affects the quantity of information per measurement. This, together with the sample size n, controls the total amount of relevant information in a sample. At this point in our study, we will be concerned with the simplest sampling situation: random sampling from a relatively large population. We first devote our attention to selection of the sample size n.

A researcher makes little progress in planning an experiment before encountering the problem of selecting the sample size. Indeed, one of the most frequent questions asked of the statistician is, How many measurements should be included in the sample? Unfortunately, the statistician cannot answer this question without knowing how much information the experimenter wishes to obtain. Referring specifically to estimation, we would like to know how accurate the experimenter wishes the estimate to be.

The experimenter can indicate the desired accuracy by specifying a bound on the error of estimation. For instance, suppose that we wish to estimate the average daily yield µ of a chemical and we wish the error of estimation to be less than 5 tons with probability .95. Because approximately 95% of the sample means will lie within $2\sigma_{\bar{Y}}$ of µ in repeated sampling, we are asking that $2\sigma_{\bar{Y}}$ equal 5 tons (see Figure 8.9). Then

$$\frac{2\sigma}{\sqrt{n}} = 5 \qquad \text{and} \qquad n = \frac{4\sigma^2}{25}.$$

We cannot obtain an exact numerical value for n unless the population standard deviation σ is known. This is exactly what we would expect because the variability associated with the estimator $\bar{Y}$ depends on the variability exhibited in the population from which the sample will be drawn. Lacking an exact value for σ, we use the best approximation available such as an estimate s obtained from a previous sample or knowledge of the range of the measurements in the population. Because the range is approximately equal to 4σ (recall the empirical rule), one-fourth of the range provides an approximate value of σ.

FIGURE 8.9 The approximate distribution of $\bar{Y}$ for large samples (horizontal axis $\bar{y}$, with distances of $2\sigma_{\bar{Y}}$ marked)

For our example, suppose that the range of the daily yields is known to be approximately 84 tons. Then σ ≈ 84/4 = 21 and

$$n = \frac{4\sigma^2}{25} \approx \frac{(4)(21)^2}{25} = 70.56,$$

which we round up to n = 71.

Using a sample size n = 71, we can be reasonably certain (with confidence coefficient approximately equal to .95) that our estimate will lie within 5 tons of the true average daily yield. Actually, we would expect the error of estimation to be much less than 5 tons. According to the empirical rule, the probability is approximately equal to .68 that the error of estimation will be less than $\sigma_{\bar{Y}} = 2.5$ tons. The probabilities .95 and .68 used in these statements are inexact because σ was approximated. Although this method of choosing the sample size is only approximate for a specified accuracy of estimation, it is the best available and is certainly better than selecting the sample size intuitively.

The method of choosing the sample sizes for all the large-sample estimation procedures outlined in Table 8.1 is analogous to that just described. The experimenter must specify a desired bound on the error of estimation and an associated confidence level 1 − α. For example, if the parameter is θ and the desired bound is B, we equate

$$z_{\alpha/2}\,\sigma_{\hat{\theta}} = B,$$

where, as in Section 8.6,

$$P(Z > z_{\alpha/2}) = \frac{\alpha}{2}.$$

We illustrate the use of this method in the following examples.
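As a rough illustration of the rule that equates $z_{\alpha/2}\sigma_{\hat{\theta}}$ to the bound B, the following Python sketch solves $z\sigma/\sqrt{n} = B$ for n in the case of a single mean and rounds up to the next integer. The function names `z_upper` and `sample_size_for_mean` are illustrative choices, not notation from the text; the only library call is `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def z_upper(tail_area):
    """Return z such that P(Z > z) = tail_area for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - tail_area)


def sample_size_for_mean(sigma, bound, z):
    """Smallest integer n satisfying z * sigma / sqrt(n) <= bound."""
    return math.ceil((z * sigma / bound) ** 2)


# Daily-yield example: range of about 84 tons, so sigma is roughly 84 / 4 = 21,
# and the bound on the error of estimation is B = 5 tons.
# The text uses the rounded multiplier 2 in place of z_.025.
n_text = sample_size_for_mean(sigma=84 / 4, bound=5, z=2)

# Same design with the multiplier z_.025 (about 1.96) taken from the normal curve.
n_exact = sample_size_for_mean(sigma=84 / 4, bound=5, z=z_upper(0.025))

print(n_text, n_exact)
```

With the rounded multiplier 2 the sketch reproduces n = 71; with $z_{.025} \approx 1.96$ it gives a slightly smaller n, which shows how much the rounding of the multiplier matters.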

EXAMPLE 8.9

The reaction of an individual to a stimulus in a psychological experiment may take one of two forms, A or B. If an experimenter wishes to estimate the probability p that a person will react in manner A, how many people must be included in the experiment? Assume that the experimenter will be satisﬁed if the error of estimation is less than .04 with probability equal to .90. Assume also that he expects p to lie somewhere in the neighborhood of .6.

Solution

Because we have specified that 1 − α = .90, α must equal .10 and α/2 = .05. The z value corresponding to an area equal to .05 in the upper tail of the standard normal distribution is $z_{\alpha/2} = z_{.05} = 1.645$. We then require that

$$1.645\,\sigma_{\hat{p}} = .04, \qquad \text{or} \qquad 1.645\sqrt{\frac{pq}{n}} = .04.$$

Because the standard error of $\hat{p}$ depends on p, which is unknown, we could use the guessed value of p = .6 provided by the experimenter to obtain an approximate value for n. Then

$$1.645\sqrt{\frac{(.6)(.4)}{n}} = .04 \qquad \text{and} \qquad n = 406.$$

In this example, we assumed that p ≈ .60. How would we proceed if we had no idea about the true value of p? In Exercise 7.76(a), we established that the maximum value for the variance of $\hat{p} = Y/n$ occurs when p = .5. If we did not know that p ≈ .6, we would use p = .5, which would yield the maximum possible value for n: n = 423. No matter what the true value for p, n = 423 is large enough to provide an estimate that is within B = .04 of p with probability .90.
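The calculation in this example can be checked with a short Python sketch that solves $z_{\alpha/2}\sqrt{pq/n} = B$ for n. The function name `sample_size_for_proportion` and its arguments are illustrative assumptions; the default guess of .5 is the conservative choice described in the solution.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(bound, confidence, p_guess=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p*q/n) <= bound, using p_guess for p."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / bound ** 2)


# Example 8.9: bound .04, probability .90, experimenter's guess p = .6.
n_guess = sample_size_for_proportion(bound=0.04, confidence=0.90, p_guess=0.6)

# Conservative choice when nothing is known about p (p = .5 maximizes p*q).
n_worst = sample_size_for_proportion(bound=0.04, confidence=0.90)

print(n_guess, n_worst)
```

Because p(1 − p) is largest at p = .5, the second call returns the largest sample size that any value of p could require for this bound and confidence level.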

EXAMPLE 8.10

An experimenter wishes to compare the effectiveness of two methods of training industrial employees to perform an assembly operation. The selected employees are to be divided into two groups of equal size, the ﬁrst receiving training method 1 and the second receiving training method 2. After training, each employee will perform the assembly operation, and the length of assembly time will be recorded. The experimenter expects the measurements for both groups to have a range of approximately 8 minutes. If the estimate of the difference in mean assembly times is to be correct to within 1 minute with probability .95, how many workers must be included in each training group?

Solution

The experimenter specified 1 − α = .95. Thus, α = .05 and $z_{\alpha/2} = z_{.025} = 1.96$. Equating $1.96\,\sigma_{(\bar{Y}_1 - \bar{Y}_2)}$ to 1 minute, we obtain

$$1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.$$

Alternatively, because we desire $n_1$ to equal $n_2$, we may let $n_1 = n_2 = n$ and obtain the equation

$$1.96\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}} = 1.$$

As noted earlier, the variability of each method of assembly is approximately the same; hence, $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Because the range, 8 minutes, is approximately equal to 4σ, we have 4σ ≈ 8, or equivalently, σ ≈ 2.

Substituting this value for $\sigma_1$ and $\sigma_2$ in the earlier equation, we obtain

$$1.96\sqrt{\frac{(2)^2}{n} + \frac{(2)^2}{n}} = 1.$$

Solving, we obtain n = 30.73. Therefore, each group should contain n = 31 members.
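For the two-sample case with equal group sizes, solving $z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/n} = B$ gives $n = z_{\alpha/2}^2(\sigma_1^2 + \sigma_2^2)/B^2$. The sketch below, with the hypothetical helper name `equal_group_size`, applies that formula to the numbers in this example.

```python
import math
from statistics import NormalDist


def equal_group_size(sigma1, sigma2, bound, confidence):
    """Smallest common n with z * sqrt(sigma1^2/n + sigma2^2/n) <= bound."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / bound ** 2)


# Example 8.10: range of about 8 minutes in each group, so sigma is roughly 8 / 4 = 2;
# the estimate is to be within 1 minute with probability .95.
sigma = 8 / 4
n_per_group = equal_group_size(sigma, sigma, bound=1.0, confidence=0.95)

print(n_per_group)
```

The value before rounding is about 30.73, so the sketch returns the same 31 workers per group found in the solution.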

Exercises

8.70

Let Y be a binomial random variable with parameter p. Find the sample size necessary to estimate p to within .05 with probability .95 in the following situations:
a If p is thought to be approximately .9.
b If no information about p is known (use p = .5 in estimating the variance of $\hat{p}$).

8.71

A state wildlife service wants to estimate the mean number of days that each licensed hunter actually hunts during a given season, with a bound on the error of estimation equal to 2 hunting days. If data collected in earlier surveys have shown σ to be approximately equal to 10, how many hunters must be included in the survey?

8.72

Telephone pollsters often interview between 1000 and 1500 individuals regarding their opinions on various issues. Does the performance of colleges’ athletic teams have a positive impact on the public’s perception of the prestige of the institutions? A new survey is to be undertaken to see if there is a difference between the opinions of men and women on this issue. a If 1000 men and 1000 women are to be interviewed, how accurately could you estimate the difference in the proportions who think that the performance of their athletics teams has a positive impact on the perceived prestige of the institutions? Find a bound on the error of estimation. b Suppose that you were designing the survey and wished to estimate the difference in a pair of proportions, correct to within .02, with probability .9. How many interviewees should be included in each sample?

8.73

Refer to Exercise 8.59. How many ’tweens should have been interviewed in order to estimate the proportion of ’tweens who understand and enjoy ads that are silly in nature, correct to within .02, with probability .99? Use the proportion from the previous sample in approximating the standard error of the estimate.

8.74

Suppose that you want to estimate the mean pH of rainfalls in an area that suffers from heavy pollution due to the discharge of smoke from a power plant. Assume that σ is in the neighborhood of .5 pH and that you want your estimate to lie within .1 of µ with probability near .95. Approximately how many rainfalls must be included in your sample (one pH reading per rainfall)? Would it be valid to select all of your water specimens from a single rainfall? Explain.

8.75

Refer to Exercise 8.74. Suppose that you wish to estimate the difference between the mean acidity for rainfalls at two different locations, one in a relatively unpolluted area along the ocean and the other in an area subject to heavy air pollution. If you wish your estimate to be correct to the nearest .1 pH with probability near .90, approximately how many rainfalls (pH values) must you include in each sample? (Assume that the variance of the pH measurements is approximately .25 at both locations and that the samples are to be of equal size.)


8.76

Refer to the comparison of the daily adult intake of selenium in two different regions of the United States, in Exercise 8.61. Suppose that you wish to estimate the difference in the mean daily intake between the two regions, correct to within 5 µg, with probability .90. If you plan to select an equal number of adults from the two regions (that is, if $n_1 = n_2$), how large should $n_1$ and $n_2$ be?

8.77

Refer to Exercise 8.28. If the researcher wants to estimate the difference in proportions to within .05 with 90% conﬁdence, how many graduates and nongraduates must be interviewed? (Assume that an equal number will be interviewed from each group.)

8.78

Refer to Exercise 8.65. How many items should be sampled from each line if a 95% conﬁdence interval for the true difference in proportions is to have width .2? Assume that samples of equal size will be taken from each line.

8.79

Refer to Exercise 8.66. a Another similar study is to be undertaken to compare the mean posttest scores for BACC and traditionally taught high school biology students. The objective is to produce a 99% conﬁdence interval for the true difference in the mean posttest scores. If we need to sample an equal number of BACC and traditionally taught students and want the width of the conﬁdence interval to be 1.0, how many observations should be included in each group? b Repeat the calculations from part (a) if we are interested in comparing mean pretest scores. c Suppose that the researcher wants to construct 99% conﬁdence intervals to compare both pretest and posttest scores for BACC and traditionally taught biology students. If her objective is that both intervals have widths no larger than 1 unit, what sample sizes should be used?

8.8 Small-Sample Confidence Intervals for µ and µ1 − µ2

The confidence intervals for a population mean µ that we discuss in this section are based on the assumption that the experimenter’s sample has been randomly selected from a normal population. The intervals are appropriate for samples of any size, and the confidence coefficients of the intervals are close to the specified values even when the population is not normal, as long as the departure from normality is not excessive. We rarely know the form of the population frequency distribution before we sample. Consequently, if an interval estimator is to be of any value, it must work reasonably well even when the population is not normal. “Working well” means that the confidence coefficient should not be affected by modest departures from normality. For most mound-shaped population distributions, experimental studies indicate that these confidence intervals maintain confidence coefficients close to the nominal values used in their calculation.

We assume that $Y_1, Y_2, \ldots, Y_n$ represent a random sample selected from a normal population, and we let $\bar{Y}$ and $S^2$ represent the sample mean and sample variance, respectively. We would like to construct a confidence interval for the population mean when $V(Y_i) = \sigma^2$ is unknown and the sample size is too small to permit us to apply the large-sample techniques of the previous section. Under the assumptions just stated, Theorems 7.1 and 7.3 and Definition 7.2 imply that

$$T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$$

has a t distribution with (n − 1) df. The quantity T serves as the pivotal quantity that we will use to form a confidence interval for µ. From Table 5, Appendix 3, we can find values $t_{\alpha/2}$ and $-t_{\alpha/2}$ (see Figure 8.10) so that

$$P(-t_{\alpha/2} \le T \le t_{\alpha/2}) = 1 - \alpha.$$

FIGURE 8.10 Location of $t_{\alpha/2}$ and $-t_{\alpha/2}$ (central area 1 − α, with area α/2 in each tail)

The t distribution has a density function very much like the standard normal density except that the tails are thicker (as illustrated in Figure 7.3). Recall that the values of $t_{\alpha/2}$ depend on the degrees of freedom (n − 1) as well as on the confidence coefficient (1 − α).

The confidence interval for µ is developed by manipulating the inequalities in the probability statement in a manner analogous to that used in the derivation presented in Example 8.6. In this case, the resulting confidence interval for µ is

$$\bar{Y} \pm t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right).$$

Under the preceding assumptions, we can also obtain 100(1 − α)% one-sided confidence limits for µ. Notice that $t_{\alpha}$, given in Table 5, Appendix 3, is such that

$$P(T \le t_{\alpha}) = 1 - \alpha.$$

Substituting T into this expression and manipulating the resulting inequality, we obtain

$$P\left[\bar{Y} - t_{\alpha}(S/\sqrt{n}) \le \mu\right] = 1 - \alpha.$$

Thus, $\bar{Y} - t_{\alpha}(S/\sqrt{n})$ is a 100(1 − α)% lower confidence bound for µ. Analogously, $\bar{Y} + t_{\alpha}(S/\sqrt{n})$ is a 100(1 − α)% upper confidence bound for µ. As in the large-sample case, if we determine both 100(1 − α)% lower and upper confidence bounds for µ and use the respective bounds as endpoints for a confidence interval, the resulting two-sided interval has confidence coefficient equal to 1 − 2α.
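The manipulation of inequalities that leads from the pivotal statement to the two-sided interval is not written out in the text; under the same assumptions it can be sketched as follows (multiply through by $S/\sqrt{n}$, then isolate µ):

```latex
\begin{align*}
1 - \alpha
  &= P\left(-t_{\alpha/2} \le \frac{\bar{Y} - \mu}{S/\sqrt{n}} \le t_{\alpha/2}\right) \\
  &= P\left(-t_{\alpha/2}\frac{S}{\sqrt{n}} \le \bar{Y} - \mu \le t_{\alpha/2}\frac{S}{\sqrt{n}}\right) \\
  &= P\left(\bar{Y} - t_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{Y} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right).
\end{align*}
```

The last line shows that the random endpoints $\bar{Y} \pm t_{\alpha/2}(S/\sqrt{n})$ enclose µ with probability 1 − α.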

EXAMPLE 8.11

A manufacturer of gunpowder has developed a new powder, which was tested in eight shells. The resulting muzzle velocities, in feet per second, were as follows:

3005    2925    2935    2965
2995    3005    2937    2905

Find a 95% confidence interval for the true average velocity µ for shells of this type. Assume that muzzle velocities are approximately normally distributed.

Solution

If we assume that the velocities $Y_i$ are normally distributed, the confidence interval for µ is

$$\bar{Y} \pm t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right),$$

where $t_{\alpha/2}$ is determined for n − 1 df. For the given data, $\bar{y} = 2959$ and s = 39.1. In this example, we have n − 1 = 7 df and, using Table 5, Appendix 3, $t_{\alpha/2} = t_{.025} = 2.365$. Thus, we obtain

$$2959 \pm 2.365\left(\frac{39.1}{\sqrt{8}}\right), \qquad \text{or} \qquad 2959 \pm 32.7,$$

as the observed confidence interval for µ.
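The arithmetic in this example can be reproduced with a small Python sketch. The helper name `t_interval` is an illustrative assumption, and the critical value 2.365 is simply typed in from the t table (7 df, upper-tail area .025) rather than computed, since the Python standard library has no t quantile function.

```python
import math
from statistics import mean, stdev


def t_interval(sample, t_crit):
    """Return (lower, upper) for ybar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    center = mean(sample)
    half_width = t_crit * stdev(sample) / math.sqrt(n)  # stdev uses divisor n - 1
    return center - half_width, center + half_width


# Muzzle velocities (feet per second) for the eight test shells.
velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]

# t_.025 with 7 df, read from the t table.
lower, upper = t_interval(velocities, t_crit=2.365)

print(round(lower, 1), round(upper, 1))
```

The endpoints agree with 2959 ± 32.7 up to rounding.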

Suppose that we are interested in comparing the means of two normal populations, one with mean $\mu_1$ and variance $\sigma_1^2$ and the other with mean $\mu_2$ and variance $\sigma_2^2$. If the samples are independent, confidence intervals for $\mu_1 - \mu_2$ based on a t-distributed random variable can be constructed if we assume that the two populations have a common but unknown variance, $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (unknown). If $\bar{Y}_1$ and $\bar{Y}_2$ are the respective sample means obtained from independent random samples from normal populations, the large-sample confidence interval for $(\mu_1 - \mu_2)$ is developed by using

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

as a pivotal quantity. Because we assumed that the sampled populations are both normally distributed, Z has a standard normal distribution, and using the assumption $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the quantity Z may be rewritten as

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$

Because σ is unknown, we need to find an estimator of the common variance $\sigma^2$ so that we can construct a quantity with a t distribution. Let $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$ denote the random sample of size $n_1$ from the first population and let $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$ denote an independent random sample of size $n_2$ from the second population. Then

$$\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i} \qquad \text{and} \qquad \bar{Y}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_{2i}.$$


The usual unbiased estimator of the common variance $\sigma^2$ is obtained by pooling the sample data to obtain the pooled estimator $S_p^2$:

$$S_p^2 = \frac{\sum_{i=1}^{n_1}(Y_{1i} - \bar{Y}_1)^2 + \sum_{i=1}^{n_2}(Y_{2i} - \bar{Y}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2},$$

where $S_i^2$ is the sample variance from the ith sample, i = 1, 2. Notice that if $n_1 = n_2$, $S_p^2$ is simply the average of $S_1^2$ and $S_2^2$. If $n_1 \ne n_2$, $S_p^2$ is the weighted average of $S_1^2$ and $S_2^2$, with larger weight given to the sample variance associated with the larger sample size. Further,

$$W = \frac{(n_1 + n_2 - 2)S_p^2}{\sigma^2} = \frac{\sum_{i=1}^{n_1}(Y_{1i} - \bar{Y}_1)^2}{\sigma^2} + \frac{\sum_{i=1}^{n_2}(Y_{2i} - \bar{Y}_2)^2}{\sigma^2}$$

is the sum of two independent $\chi^2$-distributed random variables with $(n_1 - 1)$ and $(n_2 - 1)$ df, respectively. Thus, W has a $\chi^2$ distribution with $\nu = (n_1 - 1) + (n_2 - 1) = (n_1 + n_2 - 2)$ df. (See Theorems 7.2 and 7.3.) We now use the $\chi^2$-distributed variable W and the independent standard normal quantity Z defined in the previous paragraph to form a pivotal quantity:

$$T = \frac{Z}{\sqrt{W/\nu}} = \left[\frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}\right] \Bigg/ \sqrt{\frac{(n_1 + n_2 - 2)S_p^2}{\sigma^2(n_1 + n_2 - 2)}} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

a quantity that by construction has a t distribution with $(n_1 + n_2 - 2)$ df. Proceeding as we did earlier in this section, we see that the confidence interval for $(\mu_1 - \mu_2)$ has the form

$$(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $t_{\alpha/2}$ is determined from the t distribution with $(n_1 + n_2 - 2)$ df.

EXAMPLE 8.12

To reach maximum efficiency in performing an assembly operation in a manufacturing plant, new employees require approximately a 1-month training period. A new method of training was suggested, and a test was conducted to compare the new method with the standard procedure. Two groups of nine new employees each were trained for a period of 3 weeks, one group using the new method and the other following the standard training procedure. The length of time (in minutes) required for each employee to assemble the device was recorded at the end of the 3-week period. The resulting measurements are as shown in Table 8.3. Estimate the true mean difference $(\mu_1 - \mu_2)$ with confidence coefficient .95. Assume that the assembly times are approximately normally distributed, that the variances of the assembly times are approximately equal for the two methods, and that the samples are independent.

Table 8.3 Data for Example 8.12

Procedure     Measurements
Standard      32   37   35   28   41   44   35   31   34
New           35   31   29   25   34   40   27   32   31

Solution

For the data in Table 8.3, with sample 1 denoting the standard procedure, we have

$$\bar{y}_1 = 35.22, \qquad \sum_{i=1}^{9}(y_{1i} - \bar{y}_1)^2 = 195.56, \qquad s_1^2 = 24.445,$$

$$\bar{y}_2 = 31.56, \qquad \sum_{i=1}^{9}(y_{2i} - \bar{y}_2)^2 = 160.22, \qquad s_2^2 = 20.027.$$

Hence,

$$s_p^2 = \frac{8(24.445) + 8(20.027)}{9 + 9 - 2} = \frac{195.56 + 160.22}{16} = 22.236 \qquad \text{and} \qquad s_p = 4.716.$$

Notice that, because $n_1 = n_2 = 9$, $s_p^2$ is the simple average of $s_1^2$ and $s_2^2$. Also, $t_{.025} = 2.120$ for $(n_1 + n_2 - 2) = 16$ df. The observed confidence interval is therefore

$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$(35.22 - 31.56) \pm (2.120)(4.716)\sqrt{\frac{1}{9} + \frac{1}{9}}$$

$$3.66 \pm 4.71.$$

This confidence interval can be written in the form [−1.05, 8.37]. The interval is fairly wide and includes both positive and negative values. If $\mu_1 - \mu_2$ is positive, $\mu_1 > \mu_2$ and the standard procedure has a larger expected assembly time than the new procedure. If $\mu_1 - \mu_2$ is really negative, the reverse is true. Because the interval contains both positive and negative values, neither training method can be said to produce a mean assembly time that differs from the other.
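As a check on the pooled-variance calculation, the sketch below recomputes the interval from the raw measurements in Table 8.3. The helper name `pooled_t_interval` is hypothetical, and the critical value 2.120 is entered by hand from the t table (16 df, upper-tail area .025).

```python
import math
from statistics import mean, variance


def pooled_t_interval(sample1, sample2, t_crit):
    """Return (pooled variance, lower, upper) for the equal-variance t interval."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled estimator: weighted average of the two sample variances.
    s2_pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    diff = mean(sample1) - mean(sample2)
    half_width = t_crit * math.sqrt(s2_pooled) * math.sqrt(1 / n1 + 1 / n2)
    return s2_pooled, diff - half_width, diff + half_width


# Assembly times (minutes) from Table 8.3.
standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]

# t_.025 with 16 df, read from the t table.
s2_pooled, lower, upper = pooled_t_interval(standard, new, t_crit=2.120)

print(round(s2_pooled, 3), round(lower, 2), round(upper, 2))
```

Because the sketch carries full precision instead of the rounded intermediate values 3.66 and 4.71, its endpoints can differ from [−1.05, 8.37] in the second decimal place.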


Summary of Small-Sample Confidence Intervals for Means of Normal Distributions with Unknown Variance(s)

Parameter: µ
Confidence interval (ν = df):

$$\bar{Y} \pm t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right), \qquad \nu = n - 1.$$

Parameter: $\mu_1 - \mu_2$
Confidence interval (ν = df):

$$(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $\nu = n_1 + n_2 - 2$ and

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

(requires that the samples are independent and the assumption that $\sigma_1^2 = \sigma_2^2$).

As the sample size (or sizes) gets large, the number of degrees of freedom for the t distribution increases, and the t distribution can be approximated quite closely by the standard normal distribution. As a result, the small-sample confidence intervals of this section are nearly indistinguishable from the large-sample confidence intervals of Section 8.6 for large n (or large $n_1$ and $n_2$). The intervals are nearly equivalent when the degrees of freedom exceed 30.

The confidence intervals for a single mean and the difference in two means were developed under the assumptions that the populations of interest are normally distributed. There is considerable empirical evidence that these intervals maintain their nominal confidence coefficient as long as the populations sampled have roughly mound-shaped distributions. If $n_1 \approx n_2$, the intervals for $\mu_1 - \mu_2$ also maintain their nominal confidence coefficients as long as the population variances are roughly equal. The independence of the samples is the most crucial assumption in using the confidence intervals developed in this section to compare two population means.

Exercises

8.80

Although there are many treatments for bulimia nervosa, some subjects fail to benefit from treatment. In a study to determine which factors predict who will benefit from treatment, Wendy Baell and E. H. Wertheim12 found that self-esteem was one of the important predictors. The mean and standard deviation of posttreatment self-esteem scores for n = 21 subjects were $\bar{y} = 26.6$ and s = 7.4, respectively. Find a 95% confidence interval for the true posttreatment self-esteem scores.

8.81

The carapace lengths of ten lobsters examined in a study of the infestation of the Thenus orientalis lobster by two types of barnacles, Octolasmis tridens and O. lowei, are given in the following table. Find a 95% confidence interval for the mean carapace length (in millimeters, mm) of T. orientalis lobsters caught in the seas in the vicinity of Singapore.13

Lobster Field Number     A061  A062  A066  A070  A067  A069  A064  A068  A065  A063
Carapace Length (mm)       78    66    65    63    60    60    58    56    52    50

12. Source: Wendy K. Baell and E. H. Wertheim, “Predictors of Outcome in the Treatment of Bulimia Nervosa,” British Journal of Clinical Psychology 31 (1992).

8.82

Scholastic Assessment Test (SAT) scores, which have fallen slowly since the inception of the test, have now begun to rise. Originally, a score of 500 was intended to be average. The mean scores for 2005 were approximately 508 for the verbal test and 520 for the mathematics test. A random sample of the test scores of 20 seniors from a large urban high school produced the means and standard deviations listed in the accompanying table: Verbal

Mathematics

505 57

495 69

Sample mean Sample standard deviation

a Find a 90% conﬁdence interval for the mean verbal SAT scores for high school seniors from the urban high school. b Does the interval that you found in part (a) include the value 508, the true mean verbal SAT score for 2005? What can you conclude? c Construct a 90% conﬁdence interval for the mean mathematics SAT score for the urban high school seniors. Does the interval include 520, the true mean mathematics score for 2005? What can you conclude?

8.83

Chronic anterior compartment syndrome is a condition characterized by exercise-induced pain in the lower leg. Swelling and impaired nerve and muscle function also accompany the pain, which is relieved by rest. Susan Beckham and her colleagues14 conducted an experiment involving ten healthy runners and ten healthy cyclists to determine if pressure measurements within the anterior muscle compartment differ between runners and cyclists. The data—compartment pressure, in millimeters of mercury—are summarized in the following table:

                                  Runners           Cyclists
Condition                       Mean     s        Mean     s
Rest                            14.5   3.92       11.1   3.98
80% maximal O2 consumption      12.2   3.49       11.5   4.95

a Construct a 95% conﬁdence interval for the difference in mean compartment pressures between runners and cyclists under the resting condition. b Construct a 90% conﬁdence interval for the difference in mean compartment pressures between runners and cyclists who exercise at 80% of maximal oxygen (O2 ) consumption. c Consider the intervals constructed in parts (a) and (b). How would you interpret the results that you obtained?

13. Source: W. B. Jeffries, H. K. Voris, and C. M. Yang, “Diversity and Distribution of the Pedunculate Barnacle Octolasmis Gray, 1825 Epizoic on the Scyllarid Lobster, Thenus orientalis (Lund 1793),” Crustaceana 46(3) (1984).

14. Source: S. J. Beckham, W. A. Grana, P. Buckley, J. E. Breasile, and P. L. Claypool, “A Comparison of Anterior Compartment Pressures in Competitive Runners and Cyclists,” American Journal of Sports Medicine 21(1) (1993).


8.84

Organic chemists often purify organic compounds by a method known as fractional crystallization. An experimenter wanted to prepare and purify 4.85 g of aniline. Ten 4.85-gram specimens of aniline were prepared and purified to produce acetanilide. The following dry yields were obtained:

3.85, 3.88, 3.90, 3.62, 3.72, 3.80, 3.85, 3.36, 4.01, 3.82

Construct a 95% confidence interval for the mean number of grams of acetanilide that can be recovered from 4.85 grams of aniline.

8.85

Two new drugs were given to patients with hypertension. The ﬁrst drug lowered the blood pressure of 16 patients an average of 11 points, with a standard deviation of 6 points. The second drug lowered the blood pressure of 20 other patients an average of 12 points, with a standard deviation of 8 points. Determine a 95% conﬁdence interval for the difference in the mean reductions in blood pressure, assuming that the measurements are normally distributed with equal variances.

Text not available due to copyright restrictions

8.87

Refer to Exercise 8.86. a Construct a 90% conﬁdence interval for the difference in the mean price for light tuna packed in water and light tuna packed in oil. b Based on the interval obtained in part (a), do you think that the mean prices differ for light tuna packed in water and oil? Why?

8.88

The Environmental Protection Agency (EPA) has collected data on LC50 measurements (concentrations that kill 50% of test animals) for certain chemicals likely to be found in freshwater rivers and lakes. (See Exercise 7.13 for additional details.) For certain species of fish, the LC50 measurements (in parts per million) for DDT in 12 experiments were as follows:

16, 5, 21, 19, 10, 5, 8, 2, 7, 2, 4, 9

Estimate the true mean LC50 for DDT with confidence coefficient .90. Assume that the LC50 measurements have an approximately normal distribution.

8.89

Refer to Exercise 8.88. Another common insecticide, diazinon, yielded LC50 measurements in three experiments of 7.8, 1.6, and 1.3. a Estimate the mean LC50 for diazinon, with a 90% conﬁdence interval. b Estimate the difference between the mean LC50 for DDT and that for diazinon, with a 90% conﬁdence interval. What assumptions are necessary for the method that you used to be valid?

8.90

Do SAT scores for high school students differ depending on the students’ intended field of study? Fifteen students who intended to major in engineering were compared with 15 students who intended to major in language and literature. Given in the accompanying table are the means and standard deviations of the scores on the verbal and mathematics portion of the SAT for the two groups of students:16

                             Verbal                  Math
Engineering           ȳ = 446   s = 42       ȳ = 548   s = 57
Language/literature   ȳ = 534   s = 45       ȳ = 517   s = 52

a Construct a 95% conﬁdence interval for the difference in average verbal scores of students majoring in engineering and of those majoring in language/literature. b Construct a 95% conﬁdence interval for the difference in average math scores of students majoring in engineering and of those majoring in language/literature. c Interpret the results obtained in parts (a) and (b). d What assumptions are necessary for the methods used previously to be valid?

8.91

Seasonal ranges (in hectares) for alligators were monitored on a lake outside Gainesville, Florida, by biologists from the Florida Game and Fish Commission. Five alligators monitored in the spring showed ranges of 8.0, 12.1, 8.1, 18.2, and 31.7. Four different alligators monitored in the summer showed ranges of 102.0, 81.7, 54.7, and 50.7. Estimate the difference between mean spring and summer ranges, with a 95% conﬁdence interval. What assumptions did you make?

8.92

Solid copper produced by sintering (heating without melting) a powder under specified environmental conditions is then measured for porosity (the volume fraction due to voids) in a laboratory. A sample of $n_1 = 4$ independent porosity measurements has mean $\bar{y}_1 = .22$ and variance $s_1^2 = .0010$. A second laboratory repeats the same process on solid copper formed from an identical powder and gets $n_2 = 5$ independent porosity measurements with $\bar{y}_2 = .17$ and $s_2^2 = .0020$. Estimate the true difference between the population means ($\mu_1 - \mu_2$) for these two laboratories, with confidence coefficient .95.

*8.93

A factory operates with two machines of type A and one machine of type B. The weekly repair costs $X$ for type A machines are normally distributed with mean $\mu_1$ and variance $\sigma^2$. The weekly repair costs $Y$ for machines of type B are also normally distributed but with mean $\mu_2$ and variance $3\sigma^2$. The expected repair cost per week for the factory is thus $2\mu_1 + \mu_2$. If you are given a random sample $X_1, X_2, \ldots, X_n$ on costs of type A machines and an independent random sample $Y_1, Y_2, \ldots, Y_m$ on costs for type B machines, show how you would construct a 95% confidence interval for $2\mu_1 + \mu_2$
a if $\sigma^2$ is known.
b if $\sigma^2$ is not known.

16. Source: “SAT Scores by Intended Field of Study,” Riverside (Calif.) Press Enterprise, April 8, 1993.

8.94

Suppose that we obtain independent samples of sizes $n_1$ and $n_2$ from two normal populations with equal variances. Use the appropriate pivotal quantity from Section 8.8 to derive a 100(1 − α)% upper confidence bound for $\mu_1 - \mu_2$.

8.9 Confidence Intervals for $\sigma^2$

The population variance $\sigma^2$ quantifies the amount of variability in the population. Many times, the actual value of $\sigma^2$ is unknown to an experimenter, and he or she must estimate $\sigma^2$. In Section 8.3, we proved that $S^2 = [1/(n-1)] \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is an unbiased estimator for $\sigma^2$. Throughout our construction of confidence intervals for $\mu$, we used $S^2$ to estimate $\sigma^2$ when $\sigma^2$ was unknown.

In addition to needing information about $\sigma^2$ to calculate confidence intervals for $\mu$ and $\mu_1 - \mu_2$, we may be interested in forming a confidence interval for $\sigma^2$. For example, if we performed a careful chemical analysis of tablets of a particular medication, we would be interested in the mean amount of active ingredient per tablet and the amount of tablet-to-tablet variability, as quantified by $\sigma^2$. Obviously, for a medication, we desire a small amount of tablet-to-tablet variation and hence a small value for $\sigma^2$.

To proceed with our interval estimation procedure, we require the existence of a pivotal quantity. Again, assume that we have a random sample $Y_1, Y_2, \ldots, Y_n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$, both unknown. We know from Theorem 7.3 that

$$\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

has a $\chi^2$ distribution with $(n-1)$ df. We can then proceed by the pivotal method to find two numbers $\chi_L^2$ and $\chi_U^2$ such that

$$P\left[\chi_L^2 \le \frac{(n-1)S^2}{\sigma^2} \le \chi_U^2\right] = 1 - \alpha$$

for any confidence coefficient $(1 - \alpha)$. (The subscripts L and U stand for lower and upper, respectively.) The $\chi^2$ density function is not symmetric, so we have some freedom in choosing $\chi_L^2$ and $\chi_U^2$. We would like to find the shortest interval that includes $\sigma^2$ with probability $(1 - \alpha)$. Generally, this is difficult and requires a trial-and-error search for the appropriate values of $\chi_L^2$ and $\chi_U^2$. We compromise by choosing points that cut off equal tail areas, as indicated in Figure 8.11. As a result, we obtain

$$P\left[\chi^2_{1-(\alpha/2)} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{(\alpha/2)}\right] = 1 - \alpha,$$

and a reordering of the inequality in the probability statement gives

$$P\left[\frac{(n-1)S^2}{\chi^2_{(\alpha/2)}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-(\alpha/2)}}\right] = 1 - \alpha.$$

FIGURE 8.11 Location of $\chi^2_{1-(\alpha/2)}$ and $\chi^2_{\alpha/2}$ (each tail has area $\alpha/2$; the horizontal axis is marked at 0, $\chi_L^2$, and $\chi_U^2$)

The confidence interval for $\sigma^2$ is as follows.

A 100(1 − α)% Confidence Interval for $\sigma^2$

$$\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{1-(\alpha/2)}}\right)$$
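The interval above can be sketched in Python using only the standard library. The function names here (`chi2_cdf`, `chi2_upper_point`, `variance_confidence_interval`) are illustrative choices, not part of any library. The $\chi^2$ upper-tail points are found numerically, by a power series for the distribution function followed by bisection, rather than read from Table 6, so results may differ slightly from hand calculations based on rounded table values. As in the text, normality of the sampled population is assumed.

```python
import math


def _regularized_lower_gamma(a, x):
    """P(a, x), the regularized lower incomplete gamma function (power series)."""
    if x <= 0.0:
        return 0.0
    # P(a, x) = x^a e^(-x) / Gamma(a + 1) * sum_{k>=0} x^k / ((a+1)...(a+k))
    term = 1.0
    total = 1.0
    k = 0
    while abs(term) > 1e-16 * abs(total) and k < 100000:
        k += 1
        term *= x / (a + k)
        total += term
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def chi2_cdf(x, df):
    """Distribution function of a chi-square random variable with df degrees of freedom."""
    return _regularized_lower_gamma(df / 2.0, x / 2.0)


def chi2_upper_point(tail_area, df):
    """Value c such that P(chi-square_df > c) = tail_area, found by bisection."""
    target = 1.0 - tail_area
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_confidence_interval(sample, conf_level):
    """Equal-tail 100 * conf_level % interval for sigma^2 (normal population assumed)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((y - mean) ** 2 for y in sample)   # equals (n - 1) * s^2
    alpha = 1.0 - conf_level
    lower = ss / chi2_upper_point(alpha / 2.0, n - 1)
    upper = ss / chi2_upper_point(1.0 - alpha / 2.0, n - 1)
    return lower, upper
```

Note that the larger $\chi^2$ point, $\chi^2_{\alpha/2}$, produces the lower endpoint, and the smaller point, $\chi^2_{1-(\alpha/2)}$, produces the upper endpoint, exactly as in the boxed formula.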

EXAMPLE 8.13

An experimenter wanted to check the variability of measurements obtained by using equipment designed to measure the volume of an audio source. Three independent measurements recorded by this equipment for the same sound were 4.1, 5.2, and 10.2. Estimate σ 2 with conﬁdence coefﬁcient .90.

Solution

If normality of the measurements recorded by this equipment can be assumed, the confidence interval just developed applies. For the data given, $s^2 = 10.57$. With $\alpha/2 = .05$ and $(n - 1) = 2$ df, Table 6, Appendix 3, gives $\chi^2_{.95} = .103$ and $\chi^2_{.05} = 5.991$. Thus, the 90% confidence interval for $\sigma^2$ is

$$\left(\frac{(n-1)s^2}{\chi^2_{.05}},\ \frac{(n-1)s^2}{\chi^2_{.95}}\right) \quad\text{or}\quad \left(\frac{(2)(10.57)}{5.991},\ \frac{(2)(10.57)}{.103}\right),$$

and finally, (3.53, 205.24). Notice that this interval for $\sigma^2$ is very wide, primarily because $n$ is quite small.
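For readers who wish to check the arithmetic in this solution, the intermediate steps can be written out as follows. Only the three measurements and the two table values quoted in the solution are used.

```latex
\begin{align*}
\bar{y} &= \frac{4.1 + 5.2 + 10.2}{3} = 6.5, \\
\sum_{i=1}^{3} (y_i - \bar{y})^2 &= (-2.4)^2 + (-1.3)^2 + (3.7)^2
   = 5.76 + 1.69 + 13.69 = 21.14, \\
s^2 &= \frac{21.14}{3 - 1} = 10.57, \\
\frac{(n-1)s^2}{\chi^2_{.05}} &= \frac{21.14}{5.991} \approx 3.53,
\qquad
\frac{(n-1)s^2}{\chi^2_{.95}} = \frac{21.14}{.103} \approx 205.24.
\end{align*}
```

The upper endpoint is sensitive to rounding in the small table value .103, so a calculation with more decimal places in $\chi^2_{.95}$ gives a slightly different upper limit.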

We have previously indicated that the confidence intervals developed in Section 8.8 for $\mu$ and $\mu_1 - \mu_2$ had confidence coefficients near the nominal level even if the underlying populations were not normally distributed. In contrast, the intervals for $\sigma^2$ presented in this section can have confidence coefficients that differ markedly from the nominal level if the sampled population is not normally distributed.
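The sensitivity to nonnormality can be explored with a small simulation. The Python sketch below is only an illustration under stated assumptions: samples of size $n = 3$ (so that the $\chi^2$ points with 2 df are available in closed form), a nominal confidence coefficient of .90, a standard normal population, and, as one arbitrary example of a skewed population, an exponential population with mean 1 (whose variance is also 1). The function name, the number of repetitions, and the seed are all hypothetical choices.

```python
import math
import random


def coverage_of_variance_interval(draw, true_variance, reps=20000, seed=1):
    """Fraction of nominal 90% intervals for sigma^2 (n = 3) that contain it."""
    rng = random.Random(seed)
    # With 2 df, P(chi-square > c) = exp(-c / 2), so the upper-tail points
    # are available in closed form: c = -2 * ln(tail area).
    chi2_05 = -2.0 * math.log(0.05)   # about 5.991
    chi2_95 = -2.0 * math.log(0.95)   # about 0.103
    hits = 0
    for _ in range(reps):
        ys = [draw(rng) for _ in range(3)]
        mean = sum(ys) / 3.0
        ss = sum((y - mean) ** 2 for y in ys)   # equals (n - 1) * s^2
        if ss / chi2_05 <= true_variance <= ss / chi2_95:
            hits += 1
    return hits / reps


# Estimated coverage when the population is normal (assumption satisfied).
normal_cov = coverage_of_variance_interval(lambda rng: rng.gauss(0.0, 1.0), 1.0)
# Estimated coverage when the population is exponential (assumption violated).
skewed_cov = coverage_of_variance_interval(lambda rng: rng.expovariate(1.0), 1.0)
```

Under the normal population the estimated coverage should be close to .90, whereas under the exponential population one would expect it to fall below the nominal level, in line with the warning above.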


Exercises

8.95

The EPA has set a maximum noise level for heavy trucks at 83 decibels (dB). The manner in which this limit is applied will greatly affect the trucking industry and the public. One way to apply the limit is to require all trucks to conform to the noise limit. A second but less satisfactory method is to require the truck fleet’s mean noise level to be less than the limit. If the latter rule is adopted, variation in the noise level from truck to truck becomes important because a large value of $\sigma^2$ would imply that many trucks exceed the limit, even if the mean fleet level were 83 dB. A random sample of six heavy trucks produced the following noise levels (in decibels):

85.4, 86.8, 86.1, 85.3, 84.8, 86.0

Use these data to construct a 90% confidence interval for $\sigma^2$, the variance of the truck noise-emission readings. Interpret your results.

8.96

In Exercise 8.81, we gave the carapace lengths of ten mature Thenus orientalis lobsters caught in the seas in the vicinity of Singapore. For your convenience, the data are reproduced here. Suppose that you wished to describe the variability of the carapace lengths of this population of lobsters. Find a 90% confidence interval for the population variance $\sigma^2$.

Lobster Field Number   A061  A062  A066  A070  A067  A069  A064  A068  A065  A063
Carapace Length (mm)     78    66    65    63    60    60    58    56    52    50

8.97

Suppose that $S^2$ is the sample variance based on a sample of size $n$ from a normal population with unknown mean and variance. Derive a 100(1 − α)%
a upper confidence bound for $\sigma^2$.
b lower confidence bound for $\sigma^2$.

8.98

Given a random sample of size $n$ from a normal population with unknown mean and variance, we developed a confidence interval for the population variance $\sigma^2$ in this section. What is the formula for a confidence interval for the population standard deviation $\sigma$?

8.99

In Exercise 8.97, you derived upper and lower confidence bounds, each with confidence coefficient 1 − α, for $\sigma^2$. How would you construct a 100(1 − α)%
a upper confidence bound for $\sigma$?
b lower confidence bound for $\sigma$?

8.100

Industrial light bulbs should have a mean life length acceptable to potential users and a relatively small variation in life length. If some bulbs fail too early in their life, users become annoyed and are likely to switch to bulbs produced by a different manufacturer. Large variations above the mean reduce replacement sales; in general, variation in life lengths disrupts the user’s replacement schedules. A random sample of 20 bulbs produced by a particular manufacturer produced the following lengths of life (in hours):

2100  2302  1951  2067  2415  1883  2101  2146  2278  2019
1924  2183  2077  2392  2286  2501  1946  2161  2253  1827

Set up a 99% upper confidence bound for the standard deviation of the lengths of life for the bulbs produced by this manufacturer. Is the true population standard deviation less than 150 hours? Why or why not?

8.101

In laboratory work, it is desirable to run careful checks on the variability of readings produced on standard samples. In a study of the amount of calcium in drinking water undertaken as part of a water quality assessment, the same standard sample was run through the laboratory six times at random intervals. The six readings, in parts per million, were 9.54, 9.61, 9.32, 9.48, 9.70, and 9.26. Estimate the population variance $\sigma^2$ for readings on this standard, using a 90% confidence interval.

8.102

The ages of a random sample of ﬁve university professors are 39, 54, 61, 72, and 59. Using this information, ﬁnd a 99% conﬁdence interval for the population standard deviation of the ages of all professors at the university, assuming that the ages of university professors are normally distributed.

8.103

A precision instrument is guaranteed to read accurately to within 2 units. A sample of four instrument readings on the same object yielded the measurements 353, 351, 351, and 355. Find a 90% conﬁdence interval for the population variance. What assumptions are necessary? Does the guarantee seem reasonable?

8.10 Summary

The objective of many statistical investigations is to make inferences about population parameters based on sample data. Often these inferences take the form of estimates, either point estimates or interval estimates. We prefer unbiased estimators with small variance. The goodness of an unbiased estimator $\hat{\theta}$ can be measured by $\sigma_{\hat{\theta}}$ because the error of estimation is generally smaller than $2\sigma_{\hat{\theta}}$ with high probability. The mean square error of an estimator, $\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2$, is small only if the estimator has small variance and small bias.

Interval estimates of many parameters, such as $\mu$ and $p$, can be derived from the normal distribution for large sample sizes because of the central limit theorem. If sample sizes are small, the normality of the population must be assumed, and the t distribution is used in deriving confidence intervals. However, the interval for a single mean is quite robust in relation to moderate departures from normality. That is, the actual confidence coefficient associated with intervals that have a nominal confidence coefficient of 100(1 − α)% is very close to the nominal level even if the population distribution differs moderately from normality. The confidence interval for a difference in two means is also robust in relation to moderate departures from normality and to the assumption of equal population variances if $n_1 \approx n_2$. As $n_1$ and $n_2$ become more dissimilar, the assumption of equal population variances becomes more crucial.

If sample measurements have been selected from a normal distribution, a confidence interval for $\sigma^2$ can be developed through use of the $\chi^2$ distribution. These intervals are very sensitive to the assumption that the underlying population is normally distributed. Consequently, the actual confidence coefficient associated with the interval estimation procedure can differ markedly from the nominal value if the underlying population is not normally distributed.
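The decomposition of the mean square error quoted in the summary can be recovered in a few lines. The sketch below takes the mean square error to be $E[(\hat{\theta} - \theta)^2]$ and the bias to be $B(\hat{\theta}) = E(\hat{\theta}) - \theta$, which are the usual definitions and are assumed here rather than restated from this section.

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\left[(\hat{\theta} - \theta)^2\right]
   = E\left[\left\{(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\right\}^2\right] \\
  &= E\left[(\hat{\theta} - E(\hat{\theta}))^2\right]
     + 2\,(E(\hat{\theta}) - \theta)\,E\left[\hat{\theta} - E(\hat{\theta})\right]
     + (E(\hat{\theta}) - \theta)^2 \\
  &= V(\hat{\theta}) + 0 + [B(\hat{\theta})]^2 .
\end{align*}
```

The cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$, so an estimator with a small bias can still have a large mean square error if its variance is large, and vice versa.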


Supplementary Exercises

8.104

Multiple Choice. A survey was conducted to determine what adults prefer in cell phone services. The results of the survey showed that 73% of cell phone users wanted e-mail services, with a margin of error of ±4%. What is meant by the phrase “±4%”?
a They estimate that 4% of the surveyed population may change their minds between the time that the poll was conducted and the time that the results were published.
b There is a 4% chance that the true percentage of cell phone users who want e-mail service will not be in the interval (0.69, 0.77).
c Only 4% of the population was surveyed.
d It would be unlikely to get the observed sample proportion of 0.73 unless the actual proportion of cell phone users who want e-mail service is between 0.69 and 0.77.
e The probability is .04 that the sample proportion is in the interval (0.69, 0.77).

8.105

A random sample of size 25 was taken from a normal population with $\sigma^2 = 6$. A confidence interval for the mean was given as (5.37, 7.37). What is the confidence coefficient associated with this interval?

8.106

In a controlled pollination study involving Phlox drummondii, a spring-flowering annual plant common along roadsides in sandy fields in central Texas, Karen Pittman and Donald Levin17 found that seed survival rates were not affected by water or nutrition deprivation. In the experiment, flowers on plants were identified as males when they donated pollen and as females when they were pollinated by donor pollen in three treatment groups: control, low water, and low nutrient. The data in the following table reflect one aspect of the findings of the experiment: the number of seeds surviving to maturity for each of the three groups for both male and female parents.

                        Male                       Female
Treatment        n    Number Surviving      n    Number Surviving
Control         585         543            632         560
Low water       578         522            510         466
Low nutrient    568         510            589         546

a Find a 99% confidence interval for the difference between survival proportions in the low-water group versus the low-nutrient group for male parents.
b Find a 99% confidence interval for the difference between survival proportions in male and female parents subjected to low water.

17. Source: Karen Pittman and Donald Levin, “Effects of Parental Identities and Environment on Components of Crossing Success on Phlox drummondii,” American Journal of Botany 76(3) (1989).


8.107

Refer to Exercise 8.106. Suppose that you plan to estimate the difference in the survival rates of seeds for male parents in low-water and low-nutrient environments to within .03 with probability .95. If you plan to use an equal number of seeds from male parents in each environment (that is, $n_1 = n_2$), how large should $n_1$ and $n_2$ be?

8.108

A chemist who has prepared a product designed to kill 60% of a particular type of insect wants to evaluate the kill rate of her preparation. What sample size should she use if she wishes to be 95% conﬁdent that her experimental results fall within .02 of the true fraction of insects killed?

8.109

To estimate the proportion of unemployed workers in Panama, an economist selected at random 400 persons from the working class. Of these, 25 were unemployed. a Estimate the true proportion of unemployed workers and place bounds on the error of estimation. b How many persons must be sampled to reduce the bound on the error of estimation to .02?

8.110

Past experience shows that the standard deviation of the yearly income of textile workers in a certain state is $400. How many textile workers would you need to sample if you wished to estimate the population mean to within $50.00, with probability .95?

8.111

How many voters must be included in a sample collected to estimate the fraction of the popular vote favorable to a presidential candidate in a national election if the estimate must be correct to within .005? Assume that the true fraction lies somewhere in the neighborhood of .5. Use a conﬁdence coefﬁcient of approximately .95.

8.112

In a poll taken among college students, 300 of 500 fraternity men favored a certain proposition whereas 64 of 100 nonfraternity men favored it. Estimate the difference in the proportions favoring the proposition and place a 2-standard-deviation bound on the error of estimation.

8.113

Refer to Exercise 8.112. How many fraternity and nonfraternity men must be included in a poll if we wish to obtain an estimate, correct to within .05, for the difference in the proportions favoring the proposition? Assume that the groups will be of equal size and that p = .6 will sufﬁce as an approximation of both proportions.

8.114

A chemical process has produced, on the average, 800 tons of chemical per day. The daily yields for the past week are 785, 805, 790, 793, and 802 tons. Estimate the mean daily yield, with conﬁdence coefﬁcient .90, from the data. What assumptions did you make?

8.115

Refer to Exercise 8.114. Find a 90% confidence interval for $\sigma^2$, the variance of the daily yields.

8.116

Do we lose our memory capacity as we get older? In a study of the effect of glucose on memory in elderly men and women, C. A. Manning and colleagues18 tested 16 volunteers (5 men and 11 women) for long-term memory, recording the number of words recalled from a list read to each person. Each person was reminded of the words missed and was asked to recall as many words as possible from the original list. The mean and standard deviation of the long-term word memory scores were $\bar{y} = 79.47$ and $s = 25.25$. Give a 99% confidence interval for the true long-term word memory scores for elderly men and women. Interpret this interval.

8.117

The annual main stem growth, measured for a sample of 17 4-year-old red pine trees, produced a mean of 11.3 inches and a standard deviation of 3.4 inches. Find a 90% conﬁdence interval for the mean annual main stem growth of a population of 4-year-old red pine trees subjected to similar environmental conditions. Assume that the growth amounts are normally distributed.

18. Source: C. A. Manning, J. L. Hall, and P. E. Gold, “Glucose Effects on Memory and Other Neuropsychological Tests in Elderly Humans,” Psychological Science 1(5) (1990).


8.118

Owing to the variability of trade-in allowance, the proﬁt per new car sold by an automobile dealer varies from car to car. The proﬁts per sale (in hundreds of dollars), tabulated for the past week, were 2.1, 3.0, 1.2, 6.2, 4.5, and 5.1. Find a 90% conﬁdence interval for the mean proﬁt per sale. What assumptions must be valid for the technique that you used to be appropriate?

8.119

A mathematics test is given to a class of 50 students randomly selected from high school 1 and also to a class of 45 students randomly selected from high school 2. For the class at high school 1, the sample mean is 75 points, and the sample standard deviation is 10 points. For the class at high school 2, the sample mean is 72 points, and the sample standard deviation is 8 points. Construct a 95% conﬁdence interval for the difference in the mean scores. What assumptions are necessary?

8.120

Two methods for teaching reading were applied to two randomly selected groups of elementary schoolchildren and were compared on the basis of a reading comprehension test given at the end of the learning period. The sample means and variances computed from the test scores are shown in the accompanying table. Find a 95% confidence interval for ($\mu_1 - \mu_2$). What assumptions are necessary?

Statistic                      Method 1   Method 2
Number of children in group       11         14
ȳ                                 64         69
s²                                52         71

8.121

A comparison of reaction times for two different stimuli in a psychological word-association experiment produced the results (in seconds) shown in the accompanying table when applied to a random sample of 16 people. Obtain a 90% confidence interval for ($\mu_1 - \mu_2$). What assumptions are necessary?

Stimulus 1     Stimulus 2
1  3  2  1     4  2  3  3
2  1  3  2     1  2  3  3

8.122

The length of time between billing and receipt of payment was recorded for a random sample of 100 of a certiﬁed public accountant (CPA) ﬁrm’s clients. The sample mean and standard deviation for the 100 accounts were 39.1 days and 17.3 days, respectively. Find a 90% conﬁdence interval for the mean time between billing and receipt of payment for all of the CPA ﬁrm’s accounts. Interpret the interval.

8.123

Television advertisers may mistakenly believe that most viewers understand most of the advertising that they see and hear. A recent research study asked 2300 viewers above age 13 to look at 30-second television advertising excerpts. Of these, 1914 of the viewers misunderstood all or part of the excerpt they saw. Find a 95% conﬁdence interval for the proportion of all viewers (of which the sample is representative) who will misunderstand all or part of the television excerpts used in this study.

8.124

A survey of 415 corporate, government, and accounting executives of the Financial Accounting Foundation found that 278 rated cash flow (as opposed to earnings per share, etc.) as the most important indicator of a company’s financial health. Assume that these 415 executives constitute a random sample from the population of all executives. Use the data to find a 95% confidence interval for the fraction of all corporate executives who consider cash flow the most important measure of a company’s financial health.

8.125

Suppose that independent samples of sizes $n_1$ and $n_2$ are taken from two normally distributed populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. If $S_1^2$ and $S_2^2$ denote the respective sample variances, Theorem 7.3 implies that $(n_1 - 1)S_1^2/\sigma_1^2$ and $(n_2 - 1)S_2^2/\sigma_2^2$ have $\chi^2$ distributions with $n_1 - 1$ and $n_2 - 1$ df, respectively. Further, these $\chi^2$-distributed random variables are independent because the samples were independently taken.
a Use these quantities to construct a random variable that has an F distribution with $n_1 - 1$ numerator degrees of freedom and $n_2 - 1$ denominator degrees of freedom.
b Use the F-distributed quantity from part (a) as a pivotal quantity, and derive a formula for a 100(1 − α)% confidence interval for $\sigma_2^2/\sigma_1^2$.

8.126

A pharmaceutical manufacturer purchases raw material from two different suppliers. The mean level of impurities is approximately the same for both suppliers, but the manufacturer is concerned about the variability in the amount of impurities from shipment to shipment. If the level of impurities tends to vary excessively for one source of supply, this could affect the quality of the final product. To compare the variation in percentage impurities for the two suppliers, the manufacturer selects ten shipments from each supplier and measures the percentage of impurities in each shipment. The sample variances were $s_1^2 = .273$ and $s_2^2 = .094$, respectively. Form a 95% confidence interval for the ratio of the true population variances.

*8.127

Let $\bar{Y}$ denote the mean of a sample of size 100 taken from a gamma distribution with known $\alpha = c_0$ and unknown $\beta$. Show that an approximate 100(1 − α)% confidence interval for $\beta$ is given by

$$\left(\frac{\bar{Y}}{c_0 + .1 z_{\alpha/2}\sqrt{c_0}},\ \frac{\bar{Y}}{c_0 - .1 z_{\alpha/2}\sqrt{c_0}}\right).$$

*8.128

Suppose that we take a sample of size $n_1$ from a normally distributed population with mean and variance $\mu_1$ and $\sigma_1^2$ and an independent sample of size $n_2$ from a normally distributed population with mean and variance $\mu_2$ and $\sigma_2^2$. If it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$, then the results given in Section 8.8 apply. What can be done if we cannot assume that the unknown variances are equal but are fortunate enough to know that $\sigma_2^2 = k\sigma_1^2$ for some known constant $k \ne 1$? Suppose, as previously, that the sample means are given by $\bar{Y}_1$ and $\bar{Y}_2$ and the sample variances by $S_1^2$ and $S_2^2$, respectively.

a Show that $Z$ given below has a standard normal distribution.

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma_1 \sqrt{\dfrac{1}{n_1} + \dfrac{k}{n_2}}}.$$

b Show that $W$ given below has a $\chi^2$ distribution with $n_1 + n_2 - 2$ df.

$$W = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2/k}{\sigma_1^2}.$$

c Notice that $Z$ and $W$ from parts (a) and (b) are independent. Finally, show that

$$T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{k}{n_2}}}, \quad\text{where}\quad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2/k}{n_1 + n_2 - 2},$$

has a t distribution with $n_1 + n_2 - 2$ df.

d Use the result in part (c) to give a 100(1 − α)% confidence interval for $\mu_1 - \mu_2$, assuming that $\sigma_2^2 = k\sigma_1^2$.

e What happens if $k = 1$ in parts (a)–(d)?

*8.129

We noted in Section 8.3 that if

$$S'^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n} \quad\text{and}\quad S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1},$$

then $S'^2$ is a biased estimator of $\sigma^2$, but $S^2$ is an unbiased estimator of the same parameter. If we sample from a normal population,
a find $V(S'^2)$.
b show that $V(S^2) > V(S'^2)$.

*8.130

Exercise 8.129 suggests that $S^2$ is superior to $S'^2$ in regard to bias and that $S'^2$ is superior to $S^2$ because it possesses smaller variance. Which is the better estimator? [Hint: Compare the mean square errors.]

*8.131

Refer to Exercises 8.129 and 8.130. $S'^2$ and $S^2$ are two estimators for $\sigma^2$ that are of the form $c \sum_{i=1}^{n} (Y_i - \bar{Y})^2$. What value for $c$ yields the estimator for $\sigma^2$ with the smallest mean square error among all estimators of the form $c \sum_{i=1}^{n} (Y_i - \bar{Y})^2$?

8.132

Refer to Exercises 6.17 and 8.14. The distribution function for a power family distribution is given by

$$F(y) = \begin{cases} 0, & y < 0, \\ \left(\dfrac{y}{\theta}\right)^{\alpha}, & 0 \le y \le \theta, \\ 1, & y > \theta, \end{cases}$$

where $\alpha, \theta > 0$. Assume that a sample of size $n$ is taken from a population with a power family distribution and that $\alpha = c$ where $c > 0$ is known.

a Show that the distribution function of $Y_{(n)} = \max\{Y_1, Y_2, \ldots, Y_n\}$ is given by

$$F_{Y_{(n)}}(y) = \begin{cases} 0, & y < 0, \\ \left(\dfrac{y}{\theta}\right)^{nc}, & 0 \le y \le \theta, \\ 1, & y > \theta, \end{cases}$$

where $\theta > 0$.

b Show that $Y_{(n)}/\theta$ is a pivotal quantity and that for $0 < k < 1$

$$P\left(k < \frac{Y_{(n)}}{\theta} \le 1\right) = 1 - k^{cn}.$$

c Suppose that $n = 5$ and $\alpha = c = 2.4$.
i Use the result from part (b) to find $k$ so that

$$P\left(k < \frac{Y_{(5)}}{\theta} \le 1\right) = 0.95.$$

ii Give a 95% confidence interval for $\theta$.


*8.133


Suppose that two independent random samples of $n_1$ and $n_2$ observations are selected from normal populations. Further, assume that the populations possess a common variance $\sigma^2$. Let

$$S_i^2 = \frac{\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2}{n_i - 1}, \qquad i = 1, 2.$$

a Show that $S_p^2$, the pooled estimator of $\sigma^2$ (which follows), is unbiased:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

b Find $V(S_p^2)$.

*8.134

The small-sample confidence interval for $\mu$, based on Student's $t$ (Section 8.8), possesses a random width, in contrast to the large-sample confidence interval (Section 8.6), where the width is not random if $\sigma^2$ is known. Find the expected value of the interval width in the small-sample case if $\sigma^2$ is unknown.

*8.135

A confidence interval is unbiased if the expected value of the interval midpoint is equal to the estimated parameter. The expected value of the midpoint of the large-sample confidence interval (Section 8.6) is equal to the estimated parameter, and the same is true for the small-sample confidence intervals for $\mu$ and $(\mu_1 - \mu_2)$ (Section 8.8). For example, the midpoint of the interval $\bar{y} \pm ts/\sqrt{n}$ is $\bar{y}$, and $E(\bar{Y}) = \mu$. Now consider the confidence interval for $\sigma^2$. Show that the expected value of the midpoint of this confidence interval is not equal to $\sigma^2$.

*8.136

The sample mean $\bar{Y}$ is a good point estimator of the population mean $\mu$. It can also be used to predict a future value of $Y$ independently selected from the population. Assume that you have a sample mean $\bar{Y}$ and variance $S^2$ based on a random sample of $n$ measurements from a normal population. Use Student's $t$ to form a pivotal quantity to find a prediction interval for some new value of $Y$ (say, $Y_p$) to be observed in the future. [Hint: Start with the quantity $Y_p - \bar{Y}$.] Notice the terminology: Parameters are estimated; values of random variables are predicted.

CHAPTER 9

Properties of Point Estimators and Methods of Estimation

9.1 Introduction
9.2 Relative Efficiency
9.3 Consistency
9.4 Sufficiency
9.5 The Rao–Blackwell Theorem and Minimum-Variance Unbiased Estimation
9.6 The Method of Moments
9.7 The Method of Maximum Likelihood
9.8 Some Large-Sample Properties of Maximum-Likelihood Estimators (Optional)
9.9 Summary
References and Further Readings

9.1 Introduction

In Chapter 8, we presented some intuitive estimators for parameters often of interest in practical problems. An estimator $\hat{\theta}$ for a target parameter $\theta$ is a function of the random variables observed in a sample and therefore is itself a random variable. Consequently, an estimator has a probability distribution, the sampling distribution of the estimator. We noted in Section 8.2 that, if $E(\hat{\theta}) = \theta$, then the estimator has the (sometimes) desirable property of being unbiased.

In this chapter, we undertake a more formal and detailed examination of some of the mathematical properties of point estimators, particularly the notions of efficiency, consistency, and sufficiency. We present a result, the Rao–Blackwell theorem, that provides a link between sufficient statistics and unbiased estimators for parameters. Generally speaking, an unbiased estimator with small variance is or can be made to be a function of a sufficient statistic. We also demonstrate a method that can sometimes be used to find minimum-variance unbiased estimators for parameters of interest. We then offer two other useful methods for deriving estimators: the method of moments and the method of maximum likelihood. Some properties of estimators derived by these methods are discussed.

9.2 Relative Efficiency

It usually is possible to obtain more than one unbiased estimator for the same target parameter $\theta$. In Section 8.2 (Figure 8.3), we mentioned that if $\hat{\theta}_1$ and $\hat{\theta}_2$ denote two unbiased estimators for the same parameter $\theta$, we prefer to use the estimator with the smaller variance. That is, if both estimators are unbiased, $\hat{\theta}_1$ is relatively more efficient than $\hat{\theta}_2$ if $V(\hat{\theta}_2) > V(\hat{\theta}_1)$. In fact, we use the ratio $V(\hat{\theta}_2)/V(\hat{\theta}_1)$ to define the relative efficiency of two unbiased estimators.

DEFINITION 9.1

Given two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of a parameter $\theta$, with variances $V(\hat{\theta}_1)$ and $V(\hat{\theta}_2)$, respectively, then the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$, denoted $\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2)$, is defined to be the ratio

$$\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)}.$$

If $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators for $\theta$, the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$, $\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2)$, is greater than 1 only if $V(\hat{\theta}_2) > V(\hat{\theta}_1)$. In this case, $\hat{\theta}_1$ is a better unbiased estimator than $\hat{\theta}_2$. For example, if $\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2) = 1.8$, then $V(\hat{\theta}_2) = (1.8)V(\hat{\theta}_1)$, and $\hat{\theta}_1$ is preferred to $\hat{\theta}_2$. Similarly, if $\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2)$ is less than 1 (say, .73), then $V(\hat{\theta}_2) = (.73)V(\hat{\theta}_1)$, and $\hat{\theta}_2$ is preferred to $\hat{\theta}_1$.

Let us consider an example involving two different estimators for a population mean. Suppose that we wish to estimate the mean of a normal population. Let $\hat{\theta}_1$ be the sample median, the middle observation when the sample measurements are ordered according to magnitude ($n$ odd) or the average of the two middle observations ($n$ even). Let $\hat{\theta}_2$ be the sample mean. Although proof is omitted, it can be shown that the variance of the sample median, for large $n$, is $V(\hat{\theta}_1) = (1.2533)^2(\sigma^2/n)$. Then the efficiency of the sample median relative to the sample mean is

$$\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)} = \frac{\sigma^2/n}{(1.2533)^2\,\sigma^2/n} = \frac{1}{(1.2533)^2} = .6366.$$

Thus, we see that the variance of the sample mean is approximately 64% of the variance of the sample median. Therefore, we would prefer to use the sample mean as the estimator for the population mean.
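The comparison of the sample median and the sample mean can be checked numerically. The following is a minimal simulation sketch, not part of the text: the function name `simulate_relative_efficiency`, the sample size $n = 101$, the number of replications, and the seed are all illustrative assumptions. It estimates $V(\hat{\theta}_2)/V(\hat{\theta}_1)$ from repeated normal samples and prints it next to the large-sample value $1/(1.2533)^2$.

```python
import random
import statistics


def simulate_relative_efficiency(n=101, reps=4000, mu=0.0, sigma=1.0, seed=1):
    """Estimate eff(median, mean) = V(mean) / V(median) for normal samples.

    All defaults are arbitrary choices made for illustration.
    """
    rng = random.Random(seed)
    medians, means = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        medians.append(statistics.median(sample))
        means.append(statistics.fmean(sample))
    # Ratio of the simulated sampling variances of the two estimators.
    return statistics.variance(means) / statistics.variance(medians)


eff = simulate_relative_efficiency()
print(f"simulated eff(median, mean) = {eff:.3f}")
print(f"large-n value 1/(1.2533)^2  = {1 / 1.2533**2:.4f}")
```

Because the value .6366 is a large-sample result, the simulated ratio for a finite $n$ should be near it but need not match it exactly.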


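EXAMPLE 9.1

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution on the interval $(0, \theta)$. Two unbiased estimators for $\theta$ are

$$\hat{\theta}_1 = 2\bar{Y} \qquad \text{and} \qquad \hat{\theta}_2 = \left(\frac{n+1}{n}\right) Y_{(n)},$$

where $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$. Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$.

Solution

Because each $Y_i$ has a uniform distribution on the interval $(0, \theta)$, $\mu = E(Y_i) = \theta/2$ and $\sigma^2 = V(Y_i) = \theta^2/12$. Therefore,

$$E(\hat{\theta}_1) = E(2\bar{Y}) = 2E(\bar{Y}) = 2(\mu) = 2\left(\frac{\theta}{2}\right) = \theta,$$

and $\hat{\theta}_1$ is unbiased, as claimed. Further,

$$V(\hat{\theta}_1) = V(2\bar{Y}) = 4V(\bar{Y}) = 4\left[\frac{V(Y_i)}{n}\right] = \left(\frac{4}{n}\right)\left(\frac{\theta^2}{12}\right) = \frac{\theta^2}{3n}.$$

To find the mean and variance of $\hat{\theta}_2$, recall (see Exercise 6.74) that the density function of $Y_{(n)}$ is given by

$$g_{(n)}(y) = n[F_Y(y)]^{n-1} f_Y(y) = \begin{cases} n\left(\dfrac{y}{\theta}\right)^{n-1}\left(\dfrac{1}{\theta}\right), & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$

Thus,

$$E(Y_{(n)}) = \frac{n}{\theta^n}\int_0^{\theta} y^n\,dy = \left(\frac{n}{n+1}\right)\theta,$$

and it follows that $E\{[(n+1)/n]Y_{(n)}\} = \theta$; that is, $\hat{\theta}_2$ is an unbiased estimator for $\theta$. Because

$$E(Y_{(n)}^2) = \frac{n}{\theta^n}\int_0^{\theta} y^{n+1}\,dy = \left(\frac{n}{n+2}\right)\theta^2,$$

we obtain

$$V(Y_{(n)}) = E(Y_{(n)}^2) - [E(Y_{(n)})]^2 = \left[\frac{n}{n+2} - \left(\frac{n}{n+1}\right)^2\right]\theta^2$$

and

$$V(\hat{\theta}_2) = V\left[\left(\frac{n+1}{n}\right)Y_{(n)}\right] = \left(\frac{n+1}{n}\right)^2 V(Y_{(n)}) = \left[\frac{(n+1)^2}{n(n+2)} - 1\right]\theta^2 = \frac{\theta^2}{n(n+2)}.$$

Therefore, the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$ is given by

$$\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)} = \frac{\theta^2/[n(n+2)]}{\theta^2/3n} = \frac{3}{n+2}.$$

This efficiency is less than 1 if $n > 1$. That is, if $n > 1$, $\hat{\theta}_2$ has a smaller variance than $\hat{\theta}_1$, and therefore $\hat{\theta}_2$ is generally preferable to $\hat{\theta}_1$ as an estimator of $\theta$.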
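The result $\mathrm{eff}(\hat{\theta}_1, \hat{\theta}_2) = 3/(n+2)$ from Example 9.1 can be checked by simulation. The sketch below is only an illustration under assumed settings ($n = 10$, $\theta = 5$, 20,000 replications, a fixed seed); the function name `simulate_uniform_estimators` is hypothetical and not taken from the text.

```python
import random
import statistics


def simulate_uniform_estimators(n=10, theta=5.0, reps=20000, seed=2):
    """Draw repeated Uniform(0, theta) samples and return both estimators.

    est1 holds values of 2 * Ybar, est2 holds values of ((n + 1) / n) * Y_(n).
    """
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        est1.append(2.0 * statistics.fmean(sample))
        est2.append((n + 1) / n * max(sample))
    return est1, est2


n, theta = 10, 5.0
est1, est2 = simulate_uniform_estimators(n=n, theta=theta)
v1, v2 = statistics.variance(est1), statistics.variance(est2)

print(f"mean of theta1_hat: {statistics.fmean(est1):.3f} (theta = {theta})")
print(f"mean of theta2_hat: {statistics.fmean(est2):.3f} (theta = {theta})")
print(f"simulated eff: {v2 / v1:.3f}   theoretical 3/(n+2): {3 / (n + 2):.3f}")
```

Both simulated means should sit near $\theta$, reflecting unbiasedness, while the variance ratio should sit near $3/(n+2)$.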


We present some methods for ﬁnding estimators with small variances later in this chapter. For now we wish only to point out that relative efﬁciency is one important criterion for comparing estimators.

Exercises

9.1

In Exercise 8.8, we considered a random sample of size 3 from an exponential distribution with density function given by

$$f(y) = \begin{cases} (1/\theta)e^{-y/\theta}, & 0 < y, \\ 0, & \text{elsewhere,} \end{cases}$$

and determined that $\hat{\theta}_1 = Y_1$, $\hat{\theta}_2 = (Y_1 + Y_2)/2$, $\hat{\theta}_3 = (Y_1 + 2Y_2)/3$, and $\hat{\theta}_5 = \bar{Y}$ are all unbiased estimators for $\theta$. Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_5$, of $\hat{\theta}_2$ relative to $\hat{\theta}_5$, and of $\hat{\theta}_3$ relative to $\hat{\theta}_5$.

9.2

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a population with mean $\mu$ and variance $\sigma^2$. Consider the following three estimators for $\mu$:

$$\hat{\mu}_1 = \frac{1}{2}(Y_1 + Y_2), \qquad \hat{\mu}_2 = \frac{1}{4}Y_1 + \frac{Y_2 + \cdots + Y_{n-1}}{2(n-2)} + \frac{1}{4}Y_n, \qquad \hat{\mu}_3 = \bar{Y}.$$

a Show that each of the three estimators is unbiased.
b Find the efficiency of $\hat{\mu}_3$ relative to $\hat{\mu}_2$ and $\hat{\mu}_1$, respectively.

9.3

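Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the uniform distribution on the interval $(\theta, \theta + 1)$. Let

$$\hat{\theta}_1 = \bar{Y} - \frac{1}{2} \qquad \text{and} \qquad \hat{\theta}_2 = Y_{(n)} - \frac{n}{n+1}.$$

a Show that both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of $\theta$.
b Find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$.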

9.4

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a uniform distribution on the interval $(0, \theta)$. If $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$, the result of Exercise 8.18 is that $\hat{\theta}_1 = (n+1)Y_{(1)}$ is an unbiased estimator for $\theta$. If $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$, the results of Example 9.1 imply that $\hat{\theta}_2 = [(n+1)/n]Y_{(n)}$ is another unbiased estimator for $\theta$. Show that the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$ is $1/n^2$. Notice that this implies that $\hat{\theta}_2$ is a markedly superior estimator.

9.5

Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Two unbiased estimators of $\sigma^2$ are

$$\hat{\sigma}_1^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \qquad \text{and} \qquad \hat{\sigma}_2^2 = \frac{1}{2}(Y_1 - Y_2)^2.$$

Find the efficiency of $\hat{\sigma}_1^2$ relative to $\hat{\sigma}_2^2$.

9.6

Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a Poisson distribution with mean $\lambda$. Consider $\hat{\lambda}_1 = (Y_1 + Y_2)/2$ and $\hat{\lambda}_2 = \bar{Y}$. Derive the efficiency of $\hat{\lambda}_1$ relative to $\hat{\lambda}_2$.

9.7

Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from an exponential distribution with density function given by

$$f(y) = \begin{cases} (1/\theta)e^{-y/\theta}, & 0 < y, \\ 0, & \text{elsewhere.} \end{cases}$$

In Exercise 8.19, we determined that $\hat{\theta}_1 = nY_{(1)}$ is an unbiased estimator of $\theta$ with $\mathrm{MSE}(\hat{\theta}_1) = \theta^2$. Consider the estimator $\hat{\theta}_2 = \bar{Y}$ and find the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$.

*9.8

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a probability density function $f(y)$, which has unknown parameter $\theta$. If $\hat{\theta}$ is an unbiased estimator of $\theta$, then under very general conditions

$$V(\hat{\theta}) \ge I(\theta), \qquad \text{where} \qquad I(\theta) = \left[ n E\left( -\frac{\partial^2 \ln f(Y)}{\partial \theta^2} \right) \right]^{-1}.$$

(This is known as the Cramer–Rao inequality.) If $V(\hat{\theta}) = I(\theta)$, the estimator $\hat{\theta}$ is said to be efficient.¹

a Suppose that $f(y)$ is the normal density with mean $\mu$ and variance $\sigma^2$. Show that $\bar{Y}$ is an efficient estimator of $\mu$.
b This inequality also holds for discrete probability functions $p(y)$. Suppose that $p(y)$ is the Poisson probability function with mean $\lambda$. Show that $\bar{Y}$ is an efficient estimator of $\lambda$.

1. Exercises preceded by an asterisk are optional.

9.3 Consistency

Suppose that a coin, which has probability $p$ of resulting in heads, is tossed $n$ times. If the tosses are independent, then $Y$, the number of heads among the $n$ tosses, has a binomial distribution. If the true value of $p$ is unknown, the sample proportion $Y/n$ is an estimator of $p$. What happens to this sample proportion as the number of tosses $n$ increases? Our intuition leads us to believe that as $n$ gets larger, $Y/n$ should get closer to the true value of $p$. That is, as the amount of information in the sample increases, our estimator should get closer to the quantity being estimated.

Figure 9.1 illustrates the values of $\hat{p} = Y/n$ for a single sequence of 1000 Bernoulli trials when the true value of $p$ is 0.5. Notice that the values of $\hat{p}$ bounce around 0.5 when the number of trials is small but approach and stay very close to $p = 0.5$ as the number of trials increases.

FIGURE 9.1 Values of $\hat{p} = Y/n$ for a single sequence of 1000 Bernoulli trials, $p = 0.5$ [plot of the estimate of $p$ against the number of trials]

The single sequence of 1000 trials illustrated in Figure 9.1 resulted (for larger $n$) in values for the estimate that were very close to the true value, $p = 0.5$. Would additional sequences yield similar results? Figure 9.2 shows the combined results of 50 sequences of 1000 trials. Notice that the 50 distinct sequences were not identical. Rather, Figure 9.2 shows a "convergence" of sorts to the true value $p = 0.5$. This is exhibited by a wider spread of the values of the estimates for smaller numbers of trials but a much narrower spread of values of the estimates when the number of trials is larger. Will we observe this same phenomenon for different values of $p$? Some of the exercises at the end of this section will allow you to use applets (accessible at www.thomsonedu.com/statistics/wackerly) to explore more fully for yourself.

FIGURE 9.2 Values of $\hat{p} = Y/n$ for 50 sequences of 1000 Bernoulli trials, $p = 0.5$ [plot of the estimate of $p$ against the number of trials]

How can we technically express the type of "convergence" exhibited in Figure 9.2? Because $Y/n$ is a random variable, we may express this "closeness" to $p$ in probabilistic terms. In particular, let us examine the probability that the distance between the estimator and the target parameter, $|(Y/n) - p|$, will be less than some arbitrary positive real number $\varepsilon$. Figure 9.2 seems to indicate that this probability might be increasing as $n$ gets larger. If our intuition is correct and $n$ is large, this probability,

$$P\left( \left| \frac{Y}{n} - p \right| \le \varepsilon \right),$$

should be close to 1. If this probability in fact does tend to 1 as $n \to \infty$, we then say that $(Y/n)$ is a consistent estimator of $p$, or that $(Y/n)$ "converges in probability to $p$."
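The behavior described for 50 sequences of 1000 Bernoulli trials can be imitated with a short simulation. This is a hedged sketch rather than the applet referred to in the text: the helper `running_proportions`, the seed, and the choice $\varepsilon = 0.05$ are assumptions made for illustration. For each of several values of $n$ it reports the spread of $\hat{p}_n$ across the 50 sequences and the fraction of sequences with $|\hat{p}_n - p| \le \varepsilon$.

```python
import random


def running_proportions(n_trials, p, rng):
    """Return [p_hat_1, ..., p_hat_n] for one sequence of Bernoulli(p) trials."""
    heads = 0
    path = []
    for n in range(1, n_trials + 1):
        heads += rng.random() < p  # adds 1 for a success, 0 otherwise
        path.append(heads / n)
    return path


rng = random.Random(3)
p, eps = 0.5, 0.05
paths = [running_proportions(1000, p, rng) for _ in range(50)]

for n in (10, 100, 1000):
    values = [path[n - 1] for path in paths]
    within = sum(abs(v - p) <= eps for v in values) / len(values)
    print(f"n={n:4d}  min={min(values):.3f}  max={max(values):.3f}  "
          f"fraction within {eps} of p: {within:.2f}")
```

The fraction within $\varepsilon$ is an empirical stand-in for $P(|Y/n - p| \le \varepsilon)$; it should grow toward 1 as $n$ increases, while the range of the 50 estimates narrows.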


DEFINITION 9.2

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive number $\varepsilon$,

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \le \varepsilon) = 1$$

or, equivalently,

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \varepsilon) = 0.$$

The notation $\hat{\theta}_n$ expresses that the estimator for $\theta$ is calculated by using a sample of size $n$. For example, $\bar{Y}_2$ is the average of two observations whereas $\bar{Y}_{100}$ is the average of the 100 observations contained in a sample of size $n = 100$. If $\hat{\theta}_n$ is an unbiased estimator, the following theorem can often be used to prove that the estimator is consistent.

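THEOREM 9.1

An unbiased estimator $\hat{\theta}_n$ for $\theta$ is a consistent estimator of $\theta$ if

$$\lim_{n \to \infty} V(\hat{\theta}_n) = 0.$$

Proof

If $Y$ is any random variable with $E(Y) = \mu$ and $V(Y) = \sigma^2 < \infty$ and if $k$ is any positive constant, Tchebysheff's theorem (see Theorem 4.13) implies that

$$P(|Y - \mu| > k\sigma) \le \frac{1}{k^2}.$$

Because $\hat{\theta}_n$ is an unbiased estimator for $\theta$, it follows that $E(\hat{\theta}_n) = \theta$. Let $\sigma_{\hat{\theta}_n} = \sqrt{V(\hat{\theta}_n)}$ denote the standard error of the estimator $\hat{\theta}_n$. If we apply Tchebysheff's theorem for the random variable $\hat{\theta}_n$, we obtain

$$P\left( |\hat{\theta}_n - \theta| > k\sigma_{\hat{\theta}_n} \right) \le \frac{1}{k^2}.$$

Let $n$ be any fixed sample size. For any positive number $\varepsilon$,

$$k = \frac{\varepsilon}{\sigma_{\hat{\theta}_n}}$$

is a positive number. Application of Tchebysheff's theorem for this fixed $n$ and this choice of $k$ shows that

$$P\left( |\hat{\theta}_n - \theta| > \varepsilon \right) = P\left( |\hat{\theta}_n - \theta| > \left[ \frac{\varepsilon}{\sigma_{\hat{\theta}_n}} \right] \sigma_{\hat{\theta}_n} \right) \le \frac{1}{(\varepsilon/\sigma_{\hat{\theta}_n})^2} = \frac{V(\hat{\theta}_n)}{\varepsilon^2}.$$

Thus, for any fixed $n$,

$$0 \le P\left( |\hat{\theta}_n - \theta| > \varepsilon \right) \le \frac{V(\hat{\theta}_n)}{\varepsilon^2}.$$

If $\lim_{n \to \infty} V(\hat{\theta}_n) = 0$ and we take the limit as $n \to \infty$ of the preceding sequence of probabilities,

$$\lim_{n \to \infty} (0) \le \lim_{n \to \infty} P\left( |\hat{\theta}_n - \theta| > \varepsilon \right) \le \lim_{n \to \infty} \frac{V(\hat{\theta}_n)}{\varepsilon^2} = 0.$$

Thus, $\hat{\theta}_n$ is a consistent estimator for $\theta$.

The consistency property given in Definition 9.2 and discussed in Theorem 9.1 involves a particular type of convergence of $\hat{\theta}_n$ to $\theta$. For this reason, the statement "$\hat{\theta}_n$ is a consistent estimator for $\theta$" is sometimes replaced by the equivalent statement "$\hat{\theta}_n$ converges in probability to $\theta$."

EXAMPLE 9.2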

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a distribution with mean $\mu$ and variance $\sigma^2 < \infty$. Show that $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is a consistent estimator of $\mu$. (Note: We use the notation $\bar{Y}_n$ to explicitly indicate that $\bar{Y}$ is calculated by using a sample of size $n$.)

Solution

We know from earlier chapters that $E(\bar{Y}_n) = \mu$ and $V(\bar{Y}_n) = \sigma^2/n$. Because $\bar{Y}_n$ is unbiased for $\mu$ and $V(\bar{Y}_n) \to 0$ as $n \to \infty$, Theorem 9.1 establishes that $\bar{Y}_n$ is a consistent estimator of $\mu$. Equivalently, we may say that $\bar{Y}_n$ converges in probability to $\mu$.

The fact that $\bar{Y}_n$ is consistent for $\mu$, or converges in probability to $\mu$, is sometimes referred to as the law of large numbers. It provides the theoretical justification for the averaging process employed by many experimenters to obtain precision in measurements. For example, an experimenter may take the average of the weights of many animals to obtain a more precise estimate of the average weight of animals of this species. The experimenter's feeling, a feeling confirmed by Theorem 9.1, is that the average of many independently selected weights should be quite close to the true mean weight with high probability.

In Section 8.3, we considered an intuitive estimator for $\mu_1 - \mu_2$, the difference in the means of two populations. The estimator discussed at that time was $\bar{Y}_1 - \bar{Y}_2$, the difference in the means of independent random samples selected from two populations. The results of Theorem 9.2 will be very useful in establishing the consistency of such estimators.
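The bound used in the proof of Theorem 9.1, applied to the sample mean of Example 9.2, reads $P(|\bar{Y}_n - \mu| > \varepsilon) \le \sigma^2/(n\varepsilon^2)$. The sketch below compares that bound with a simulated tail probability. It assumes exponential data with mean 1 (so $\sigma^2 = 1$), $\varepsilon = 0.25$, and 1000 replications per sample size; these choices and the helper name `tail_probability` are illustrative and not from the text.

```python
import random


def tail_probability(n, eps, reps, rng, mean=1.0):
    """Estimate P(|Ybar_n - mu| > eps) for exponential data with the given mean."""
    exceed = 0
    for _ in range(reps):
        # expovariate takes the rate, which is 1 / mean.
        sample = [rng.expovariate(1.0 / mean) for _ in range(n)]
        ybar = sum(sample) / n
        exceed += abs(ybar - mean) > eps
    return exceed / reps


rng = random.Random(4)
mu, sigma2, eps = 1.0, 1.0, 0.25  # an exponential with mean 1 has variance 1
results = []
for n in (10, 50, 100, 400):
    empirical = tail_probability(n, eps, reps=1000, rng=rng, mean=mu)
    bound = min(1.0, sigma2 / (n * eps**2))  # Tchebysheff bound, capped at 1
    results.append((n, empirical, bound))
    print(f"n={n:4d}  simulated tail={empirical:.3f}  bound={bound:.3f}")
```

The bound is typically far from tight, but it tends to 0 as $n$ grows, which is all that the consistency argument requires.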

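THEOREM 9.2

Suppose that $\hat{\theta}_n$ converges in probability to $\theta$ and that $\hat{\theta}'_n$ converges in probability to $\theta'$.

a $\hat{\theta}_n + \hat{\theta}'_n$ converges in probability to $\theta + \theta'$.
b $\hat{\theta}_n \times \hat{\theta}'_n$ converges in probability to $\theta \times \theta'$.
c If $\theta' \ne 0$, $\hat{\theta}_n/\hat{\theta}'_n$ converges in probability to $\theta/\theta'$.
d If $g(\cdot)$ is a real-valued function that is continuous at $\theta$, then $g(\hat{\theta}_n)$ converges in probability to $g(\theta)$.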


The proof of Theorem 9.2 closely resembles the corresponding proof in the case where {an } and {bn } are sequences of real numbers converging to real limits a and b, respectively. For example, if an → a and bn → b then an + bn → a + b.

EXAMPLE 9.3

Suppose that $Y_1, Y_2, \ldots, Y_n$ represent a random sample such that $E(Y_i) = \mu$, $E(Y_i^2) = \mu'_2$, and $E(Y_i^4) = \mu'_4$ are all finite. Show that

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2$$

is a consistent estimator of $\sigma^2 = V(Y_i)$. (Note: We use subscript $n$ on both $S^2$ and $\bar{Y}$ to explicitly convey their dependence on the value of the sample size $n$.)

Solution

We have seen in earlier chapters that $S^2$, now written as $S_n^2$, is

$$S_n^2 = \frac{1}{n-1}\left( \sum_{i=1}^{n} Y_i^2 - n\bar{Y}_n^2 \right) = \left( \frac{n}{n-1} \right)\left( \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar{Y}_n^2 \right).$$

The statistic $(1/n)\sum_{i=1}^{n} Y_i^2$ is the average of $n$ independent and identically distributed random variables, with $E(Y_i^2) = \mu'_2$ and $V(Y_i^2) = \mu'_4 - (\mu'_2)^2 < \infty$. By the law of large numbers (Example 9.2), we know that $(1/n)\sum_{i=1}^{n} Y_i^2$ converges in probability to $\mu'_2$. Example 9.2 also implies that $\bar{Y}_n$ converges in probability to $\mu$. Because the function $g(x) = x^2$ is continuous for all finite values of $x$, Theorem 9.2(d) implies that $\bar{Y}_n^2$ converges in probability to $\mu^2$. It then follows from Theorem 9.2(a) that

$$\frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar{Y}_n^2$$

converges in probability to $\mu'_2 - \mu^2 = \sigma^2$. Because $n/(n-1)$ is a sequence of constants converging to 1 as $n \to \infty$, we can conclude that $S_n^2$ converges in probability to $\sigma^2$. Equivalently, $S_n^2$, the sample variance, is a consistent estimator for $\sigma^2$, the population variance.
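The decomposition of $S_n^2$ used in Example 9.3 can be traced numerically. The following sketch assumes Uniform(0, 1) data, for which $\sigma^2 = 1/12$ and all moments are finite; the function name `sample_variance_via_moments`, the seed, and the sample sizes are illustrative assumptions. It computes $S_n^2$ as $[n/(n-1)]\,[(1/n)\sum Y_i^2 - \bar{Y}_n^2]$ on growing prefixes of one simulated sample.

```python
import random
import statistics

rng = random.Random(5)
true_var = 1.0 / 12.0  # variance of a Uniform(0, 1) random variable
data = [rng.random() for _ in range(50000)]


def sample_variance_via_moments(y):
    """Compute S_n^2 as (n / (n - 1)) * (mean of squares - squared mean)."""
    n = len(y)
    mean_sq = sum(v * v for v in y) / n
    ybar = sum(y) / n
    return n / (n - 1) * (mean_sq - ybar**2)


for n in (10, 100, 1000, 50000):
    s2 = sample_variance_via_moments(data[:n])
    print(f"n={n:6d}  S_n^2={s2:.5f}  |S_n^2 - sigma^2|={abs(s2 - true_var):.5f}")

# The moment form agrees with the usual definition of the sample variance.
print(abs(sample_variance_via_moments(data[:100]) - statistics.variance(data[:100])) < 1e-9)
```

A single simulated path cannot prove convergence in probability, but the shrinking error is what consistency leads one to expect.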

In Section 8.6, we considered large-sample confidence intervals for some parameters of practical interest. In particular, if $Y_1, Y_2, \ldots, Y_n$ is a random sample from any distribution with mean $\mu$ and variance $\sigma^2$, we established that

$$\bar{Y} \pm z_{\alpha/2}\left( \frac{\sigma}{\sqrt{n}} \right)$$

is a valid large-sample confidence interval with confidence coefficient approximately equal to $(1 - \alpha)$. If $\sigma^2$ is known, this interval can and should be calculated. However, if $\sigma^2$ is not known but the sample size is large, we recommended substituting $S$ for $\sigma$ in the calculation because this entails no significant loss of accuracy. The following theorem provides the theoretical justification for these claims.

THEOREM 9.3

Suppose that $U_n$ has a distribution function that converges to a standard normal distribution function as $n \to \infty$. If $W_n$ converges in probability to 1, then the distribution function of $U_n/W_n$ converges to a standard normal distribution function.

This result follows from a general result known as Slutsky's theorem (Serfling, 2002). The proof of this result is beyond the scope of this text. However, the usefulness of the result is illustrated in the following example.

EXAMPLE 9.4

Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size $n$ from a distribution with $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$. Define $S_n^2$ as

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2.$$

Show that the distribution function of

$$\sqrt{n}\left( \frac{\bar{Y}_n - \mu}{S_n} \right)$$

converges to a standard normal distribution function.

Solution

In Example 9.3, we showed that $S_n^2$ converges in probability to $\sigma^2$. Notice that $g(x) = +\sqrt{x/c}$ is a continuous function of $x$ if both $x$ and $c$ are positive. Hence, it follows from Theorem 9.2(d) that $S_n/\sigma = +\sqrt{S_n^2/\sigma^2}$ converges in probability to 1. We also know from the central limit theorem (Theorem 7.4) that the distribution function of

$$U_n = \sqrt{n}\left( \frac{\bar{Y}_n - \mu}{\sigma} \right)$$

converges to a standard normal distribution function. Therefore, Theorem 9.3 implies that the distribution function of

$$\sqrt{n}\left( \frac{\bar{Y}_n - \mu}{\sigma} \right) \Big/ (S_n/\sigma) = \sqrt{n}\left( \frac{\bar{Y}_n - \mu}{S_n} \right)$$

converges to a standard normal distribution function.

The result of Example 9.4 tells us that, when $n$ is large, $\sqrt{n}(\bar{Y}_n - \mu)/S_n$ has approximately a standard normal distribution whatever is the form of the distribution from which the sample is taken. If the sample is taken from a normal distribution, the results of Chapter 7 imply that $t = \sqrt{n}(\bar{Y}_n - \mu)/S_n$ has a $t$ distribution with $n - 1$ degrees of freedom (df). Combining this information, we see that, if a large sample is taken from a normal distribution, the distribution function of $t = \sqrt{n}(\bar{Y}_n - \mu)/S_n$ can be approximated by a standard normal distribution function. That is, as $n$ gets large and hence as the number of degrees of freedom gets large, the $t$-distribution function converges to the standard normal distribution function.
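The convergence asserted in Example 9.4 can be explored by simulation for a markedly non-normal population. The sketch below assumes exponential data with mean 1 and evaluates the empirical distribution function of $\sqrt{n}(\bar{Y}_n - \mu)/S_n$ at the single point $z = 1$, comparing it with the standard normal value $\Phi(1)$. The helper names, the evaluation point, the sample sizes, and the number of replications are illustrative assumptions.

```python
import math
import random
import statistics


def studentized_mean(sample, mu):
    """Return sqrt(n) * (Ybar_n - mu) / S_n for one sample."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    return math.sqrt(n) * (ybar - mu) / math.sqrt(s2)


def empirical_cdf_at(z, n, reps, rng, mu=1.0):
    """Estimate P(sqrt(n) * (Ybar_n - mu) / S_n <= z) for exponential data."""
    count = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0 / mu) for _ in range(n)]
        count += studentized_mean(sample, mu) <= z
    return count / reps


rng = random.Random(6)
z = 1.0
target = statistics.NormalDist().cdf(z)  # standard normal distribution function
estimates = {}
for n in (5, 30, 200):
    estimates[n] = empirical_cdf_at(z, n, reps=2000, rng=rng)
    print(f"n={n:3d}  simulated P(T_n <= {z}) = {estimates[n]:.3f}  Phi({z}) = {target:.3f}")
```

For small $n$ the skewness of the exponential population can leave a visible gap; the gap should shrink as $n$ grows, in line with Theorem 9.3.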


If we obtain a large sample from any distribution, we know from Example 9.4 that $\sqrt{n}(\bar{Y}_n - \mu)/S_n$ has approximately a standard normal distribution. Therefore, it follows that

$$P\left[ -z_{\alpha/2} \le \sqrt{n}\left( \frac{\bar{Y}_n - \mu}{S_n} \right) \le z_{\alpha/2} \right] \approx 1 - \alpha.$$

If we manipulate the inequalities in the probability statement to isolate $\mu$ in the middle, we obtain

$$P\left[ \bar{Y}_n - z_{\alpha/2}\left( \frac{S_n}{\sqrt{n}} \right) \le \mu \le \bar{Y}_n + z_{\alpha/2}\left( \frac{S_n}{\sqrt{n}} \right) \right] \approx 1 - \alpha.$$

Thus, $\bar{Y}_n \pm z_{\alpha/2}(S_n/\sqrt{n})$ forms a valid large-sample confidence interval for $\mu$, with confidence coefficient approximately equal to $1 - \alpha$. Similarly, Theorem 9.3 can be applied to show that

$$\hat{p}_n \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_n \hat{q}_n}{n}}$$

is a valid large-sample confidence interval for $p$ with confidence coefficient approximately equal to $1 - \alpha$.

In this section, we have seen that the property of consistency tells us something about the distance between an estimator and the quantity being estimated. We have seen that, when the sample size is large, $\bar{Y}_n$ is close to $\mu$, and $S_n^2$ is close to $\sigma^2$, with high probability. We will see other examples of consistent estimators in the exercises and later in the chapter.

In this section, we have used the notation $\bar{Y}_n$, $S_n^2$, $\hat{p}_n$, and, in general, $\hat{\theta}_n$ to explicitly convey the dependence of the estimators on the sample size $n$. We needed to do so because we were interested in computing

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \le \varepsilon).$$

If this limit is 1, then $\hat{\theta}_n$ is a "consistent" estimator for $\theta$ (more precisely, $\hat{\theta}_n$ is a consistent sequence of estimators for $\theta$). Unfortunately, this notation makes our estimators look overly complicated. Henceforth, we will revert to the notation $\hat{\theta}$ as our estimator for $\theta$ and not explicitly display the dependence of the estimator on $n$. The dependence of $\hat{\theta}$ on the sample size $n$ is always implicit and should be used whenever the consistency of the estimator is considered.
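The large-sample interval $\hat{p}_n \pm z_{\alpha/2}\sqrt{\hat{p}_n\hat{q}_n/n}$ discussed in this section can be examined through its simulated coverage. The following is a sketch under assumed settings ($p = 0.3$, $\alpha = 0.05$, 1000 replications per sample size, a fixed seed); the function names `proportion_interval` and `coverage` are hypothetical.

```python
import math
import random
import statistics


def proportion_interval(successes, n, alpha=0.05):
    """Return the interval p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(p, n, reps, rng):
    """Estimate the proportion of simulated intervals that contain p."""
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        lo, hi = proportion_interval(successes, n)
        hits += lo <= p <= hi
    return hits / reps


rng = random.Random(7)
results = {}
for n in (20, 100, 1000):
    results[n] = coverage(0.3, n, reps=1000, rng=rng)
    print(f"n={n:4d}  simulated coverage = {results[n]:.3f}  (nominal 0.95)")
```

The confidence coefficient is only approximately $1 - \alpha$, so the simulated coverage for small $n$ may differ noticeably from 0.95 and should move closer to it as $n$ grows.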

Exercises

9.9

Applet Exercise How was Figure 9.1 obtained? Access the applet PointSingle at www.thomsonedu.com/statistics/wackerly. The top applet will generate a sequence of Bernoulli trials [$X_i = 1, 0$ with $p(1) = p$, $p(0) = 1 - p$] with $p = .5$, a scenario equivalent to successively tossing a balanced coin. Let $Y_n = \sum_{i=1}^{n} X_i$ = the number of 1s in the first $n$ trials and $\hat{p}_n = Y_n/n$. For each $n$, the applet computes $\hat{p}_n$ and plots it versus the value of $n$.

a If $\hat{p}_5 = 2/5$, what value of $X_6$ will result in $\hat{p}_6 > \hat{p}_5$?
b Click the button "One Trial" a single time. Your first observation is either 0 or 1. Which value did you obtain? What was the value of $\hat{p}_1$? Click the button "One Trial" several more times. How many trials $n$ have you simulated? What value of $\hat{p}_n$ did you observe? Is the value close to .5, the true value of $p$? Is the graph a flat horizontal line? Why or why not?
c Click the button "100 Trials" a single time. What do you observe? Click the button "100 Trials" repeatedly until the total number of trials is 1000. Is the graph that you obtained identical to the one given in Figure 9.1? In what sense is it similar to the graph in Figure 9.1?
d Based on the sample of size 1000, what is the value of $\hat{p}_{1000}$? Is this value what you expected to observe?
e Click the button "Reset." Click the button "100 Trials" ten times to generate another sequence of values for $\hat{p}$. Comment.

9.10

Applet Exercise Refer to Exercise 9.9. Scroll down to the portion of the screen labeled "Try different probabilities." Use the button labeled "p =" in the lower right corner of the display to change the value of $p$ to a value other than .5.

a Click the button "One Trial" a few times. What do you observe?
b Click the button "100 Trials" a few times. What do you observe about the values of $\hat{p}_n$ as the number of trials gets larger?

9.11

Applet Exercise Refer to Exercises 9.9 and 9.10. How can the results of several sequences of Bernoulli trials be simultaneously plotted? Access the applet PointbyPoint. Scroll down until you can view all six buttons under the top graph.

a Do not change the value of $p$ from the preset value $p = .5$. Click the button "One Trial" a few times to verify that you are obtaining a result similar to those obtained in Exercise 9.9. Click the button "5 Trials" until you have generated a total of 50 trials. What is the value of $\hat{p}_{50}$ that you obtained at the end of this first sequence of 50 trials?
b Click the button "New Sequence." The color of your initial graph changes from red to green. Click the button "5 Trials" a few times. What do you observe? Is the graph the same as the one you observed in part (a)? In what sense is it similar?
c Click the button "New Sequence." Generate a new sequence of 50 trials. Repeat until you have generated five sequences. Are the paths generated by the five sequences identical? In what sense are they similar?

9.12

Applet Exercise Refer to Exercise 9.11. What happens if each sequence is longer? Scroll down to the portion of the screen labeled "Longer Sequences of Trials."

a Repeat the instructions in parts (a)–(c) of Exercise 9.11.
b What do you expect to happen if $p$ is not 0.5? Use the button in the lower right corner to change the value of $p$. Generate several sequences of trials. Comment.

9.13

Applet Exercise Refer to Exercises 9.9–9.12. Access the applet Point Estimation.

a Choose a value for $p$. Click the button "New Sequence" repeatedly. What do you observe?
b Scroll down to the portion of the applet labeled "More Trials." Choose a value for $p$ and click the button "New Sequence" repeatedly. You will obtain up to 50 sequences, each based on 1000 trials. How does the variability among the estimates change as a function of the sample size? How is this manifested in the display that you obtained?

9.14

Applet Exercise Refer to Exercise 9.13. Scroll down to the portion of the applet labeled "Mean of Normal Data." Successive observed values of a standard normal random variable can be generated and used to compute the value of the sample mean $\bar{Y}_n$. These successive values are then plotted versus the respective sample size to obtain one "sample path."

a Do you expect the values of $\bar{Y}_n$ to cluster around any particular value? What value?
b If the results of 50 sample paths are plotted, how do you expect the variability of the estimates to change as a function of sample size?
c Click the button "New Sequence" several times. Did you observe what you expected based on your answers to parts (a) and (b)?

9.15

Refer to Exercise 9.3. Show that both $\hat{\theta}_1$ and $\hat{\theta}_2$ are consistent estimators for $\theta$.

9.16

Refer to Exercise 9.5. Is $\hat{\sigma}_2^2$ a consistent estimator of $\sigma^2$?

9.17

Suppose that $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are independent random samples from populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Show that $\bar{X} - \bar{Y}$ is a consistent estimator of $\mu_1 - \mu_2$.

9.18

In Exercise 9.17, suppose that the populations are normally distributed with $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Show that

$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{i=1}^{n} (Y_i - \bar{Y})^2}{2n - 2}$$

is a consistent estimator of $\sigma^2$.

9.19

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function

$$f(y) = \begin{cases} \theta y^{\theta - 1}, & 0 < y < 1, \\ 0, & \text{elsewhere,} \end{cases}$$

where $\theta > 0$. Show that $\bar{Y}$ is a consistent estimator of $\theta/(\theta + 1)$.

9.20

If Y has a binomial distribution with n trials and success probability p, show that Y /n is a consistent estimator of p.

9.21

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from a normal population with mean $\mu$ and variance $\sigma^2$. Assuming that $n = 2k$ for some integer $k$, one possible estimator for $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{1}{2k}\sum_{i=1}^{k} (Y_{2i} - Y_{2i-1})^2.$$

a Show that $\hat{\sigma}^2$ is an unbiased estimator for $\sigma^2$.
b Show that $\hat{\sigma}^2$ is a consistent estimator for $\sigma^2$.

9.22

Refer to Exercise 9.21. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size $n$ from a Poisson-distributed population with mean $\lambda$. Again, assume that $n = 2k$ for some integer $k$. Consider

$$\hat{\lambda} = \frac{1}{2k}\sum_{i=1}^{k} (Y_{2i} - Y_{2i-1})^2.$$

a Show that $\hat{\lambda}$ is an unbiased estimator for $\lambda$.
b Show that $\hat{\lambda}$ is a consistent estimator for $\lambda$.

9.23

Refer to Exercise 9.21. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample of size $n$ from a population for which the first four moments are finite. That is, $m_1 = E(Y_1) < \infty$, $m_2 = E(Y_1^2) < \infty$, $m_3 = E(Y_1^3) < \infty$, and $m_4 = E(Y_1^4) < \infty$. (Note: This assumption is valid for the normal and Poisson distributions in Exercises 9.21 and 9.22, respectively.) Again, assume that $n = 2k$ for some integer $k$. Consider

$$\hat{\sigma}^2 = \frac{1}{2k}\sum_{i=1}^{k} (Y_{2i} - Y_{2i-1})^2.$$

a Show that $\hat{\sigma}^2$ is an unbiased estimator for $\sigma^2$.
b Show that $\hat{\sigma}^2$ is a consistent estimator for $\sigma^2$.
c Why did you need the assumption that $m_4 = E(Y_1^4) < \infty$?

9.24

Let $Y_1, Y_2, Y_3, \ldots, Y_n$ be independent standard normal random variables.

a What is the distribution of $\sum_{i=1}^{n} Y_i^2$?
b Let $W_n = \frac{1}{n}\sum_{i=1}^{n} Y_i^2$. Does $W_n$ converge in probability to some constant? If so, what is the value of the constant?

9.25

Suppose that Y1 , Y2 , . . . , Yn denote a random sample of size n from a normal distribution with mean µ and variance 1. Consider the ﬁrst observation Y1 as an estimator for µ. a Show that Y1 is an unbiased estimator for µ. b Find P(|Y1 − µ| ≤ 1). c Look at the basic deﬁnition of consistency given in Deﬁnition 9.2. Based on the result of part (b), is Y1 a consistent estimator for µ?

*9.26

It is sometimes relatively easy to establish consistency or lack of consistency by appealing directly to Definition 9.2, evaluating $P(|\hat{\theta}_n - \theta| \le \varepsilon)$ directly, and then showing that $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \le \varepsilon) = 1$. Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a uniform distribution on the interval $(0, \theta)$. If $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$, we showed in Exercise 6.74 that the probability distribution function of $Y_{(n)}$ is given by

$$F_{(n)}(y) = \begin{cases} 0, & y < 0, \\ (y/\theta)^n, & 0 \le y \le \theta, \\ 1, & y > \theta. \end{cases}$$

a For each $n \ge 1$ and every $\varepsilon > 0$, it follows that $P(|Y_{(n)} - \theta| \le \varepsilon) = P(\theta - \varepsilon \le Y_{(n)} \le \theta + \varepsilon)$. If $\varepsilon > \theta$, verify that $P(\theta - \varepsilon \le Y_{(n)} \le \theta + \varepsilon) = 1$ and that, for every positive $\varepsilon < \theta$, we obtain $P(\theta - \varepsilon \le Y_{(n)} \le \theta + \varepsilon) = 1 - [(\theta - \varepsilon)/\theta]^n$.
b Using the result from part (a), show that $Y_{(n)}$ is a consistent estimator for $\theta$ by showing that, for every $\varepsilon > 0$, $\lim_{n \to \infty} P(|Y_{(n)} - \theta| \le \varepsilon) = 1$.

*9.27

Use the method described in Exercise 9.26 to show that, if $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ when $Y_1, Y_2, \ldots, Y_n$ are independent uniform random variables on the interval $(0, \theta)$, then $Y_{(1)}$ is not a consistent estimator for $\theta$. [Hint: Based on the methods of Section 6.7, $Y_{(1)}$ has the distribution function

$$F_{(1)}(y) = \begin{cases} 0, & y < 0, \\ 1 - (1 - y/\theta)^n, & 0 \le y \le \theta, \\ 1, & y > \theta.] \end{cases}$$

*9.28

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a Pareto distribution (see Exercise 6.18). Then the methods of Section 6.7 imply that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ has the distribution function given by

$$F_{(1)}(y) = \begin{cases} 0, & y \le \beta, \\ 1 - (\beta/y)^{\alpha n}, & y > \beta. \end{cases}$$

Use the method described in Exercise 9.26 to show that $Y_{(1)}$ is a consistent estimator of $\beta$.

458

Chapter 9

Properties of Point Estimators and Methods of Estimation

*9.29

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample of size $n$ from a power family distribution (see Exercise 6.17). Then the methods of Section 6.7 imply that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ has the distribution function given by
$$F_{(n)}(y) = \begin{cases} 0, & y < 0, \\ (y/\theta)^{\alpha n}, & 0 \le y \le \theta, \\ 1, & y > \theta. \end{cases}$$
Use the method described in Exercise 9.26 to show that $Y_{(n)}$ is a consistent estimator of $\theta$.

9.30

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables, each with probability density function
$$f(y) = \begin{cases} 3y^2, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
Show that $\bar{Y}$ converges in probability to some constant and find the constant.

9.31

If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a gamma distribution with parameters $\alpha$ and $\beta$, show that $\bar{Y}$ converges in probability to some constant and find the constant.

9.32

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function
$$f(y) = \begin{cases} \dfrac{2}{y^2}, & y \ge 2, \\ 0, & \text{elsewhere.} \end{cases}$$
Does the law of large numbers apply to $\bar{Y}$ in this case? Why or why not?

9.33

An experimenter wishes to compare the numbers of bacteria of types A and B in samples of water. A total of n independent water samples are taken, and counts are made for each sample. Let X i denote the number of type A bacteria and Yi denote the number of type B bacteria for sample i. Assume that the two bacteria types are sparsely distributed within a water sample so that X 1 , X 2 , . . . , X n and Y1 , Y2 , . . . , Yn can be considered independent random samples from Poisson distributions with means λ1 and λ2 , respectively. Suggest an estimator of λ1 /(λ1 + λ2 ). What properties does your estimator have?

9.34

The Rayleigh density function is given by
$$f(y) = \begin{cases} \dfrac{2y}{\theta}\, e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
In Exercise 6.34(a), you established that $Y^2$ has an exponential distribution with mean $\theta$. If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Rayleigh distribution, show that $W_n = \frac{1}{n}\sum_{i=1}^{n} Y_i^2$ is a consistent estimator for $\theta$.

9.35

Let $Y_1, Y_2, \ldots$ be a sequence of random variables with $E(Y_i) = \mu$ and $V(Y_i) = \sigma_i^2$. Notice that the $\sigma_i^2$'s are not all equal.
a What is $E(\bar{Y}_n)$?
b What is $V(\bar{Y}_n)$?
c Under what condition (on the $\sigma_i^2$'s) can Theorem 9.1 be applied to show that $\bar{Y}_n$ is a consistent estimator for $\mu$?

9.36

Suppose that $Y$ has a binomial distribution based on $n$ trials and success probability $p$. Then $\hat{p}_n = Y/n$ is an unbiased estimator of $p$. Use Theorem 9.3 to prove that the distribution of
$$\frac{\hat{p}_n - p}{\sqrt{\hat{p}_n \hat{q}_n / n}}$$
converges to a standard normal distribution. [Hint: Write $Y$ as we did in Section 7.5.]

9.4 Sufficiency

Up to this point, we have chosen estimators on the basis of intuition. Thus, we chose $\bar{Y}$ and $S^2$ as the estimators of the mean and variance, respectively, of the normal distribution. (It seems like these should be good estimators of the population parameters.) We have seen that it is sometimes desirable to use estimators that are unbiased. Indeed, $\bar{Y}$ and $S^2$ have been shown to be unbiased estimators of the population mean $\mu$ and variance $\sigma^2$, respectively. Notice that we have used the information in a sample of size $n$ to calculate the value of two statistics that function as estimators for the parameters of interest. At this stage, the actual sample values are no longer important; rather, we summarize the information in the sample that relates to the parameters of interest by using the statistics $\bar{Y}$ and $S^2$. Has this process of summarizing or reducing the data to the two statistics, $\bar{Y}$ and $S^2$, retained all the information about $\mu$ and $\sigma^2$ in the original set of $n$ sample observations? Or has some information about these parameters been lost or obscured through the process of reducing the data?

In this section, we present methods for finding statistics that in a sense summarize all the information in a sample about a target parameter. Such statistics are said to have the property of sufficiency; or more simply, they are called sufficient statistics. As we will see in the next section, “good” estimators are (or can be made to be) functions of any sufficient statistic. Indeed, sufficient statistics often can be used to develop estimators that have the minimum variance among all unbiased estimators.

To illustrate the notion of a sufficient statistic, let us consider the outcomes of $n$ trials of a binomial experiment, $X_1, X_2, \ldots, X_n$, where
$$X_i = \begin{cases} 1, & \text{if the $i$th trial is a success,} \\ 0, & \text{if the $i$th trial is a failure.} \end{cases}$$
If $p$ is the probability of success on any trial then, for $i = 1, 2, \ldots, n$,
$$X_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } q = 1 - p. \end{cases}$$
Suppose that we are given a value of $Y = \sum_{i=1}^{n} X_i$, the number of successes among the $n$ trials. If we know the value of $Y$, can we gain any further information about $p$ by looking at other functions of $X_1, X_2, \ldots, X_n$? One way to answer this question is to look at the conditional distribution of $X_1, X_2, \ldots, X_n$, given $Y$:
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \frac{P(X_1 = x_1, \ldots, X_n = x_n, Y = y)}{P(Y = y)}.$$
The numerator on the right side of this expression is 0 if $\sum_{i=1}^{n} x_i \ne y$, and it is the probability of an independent sequence of 0s and 1s with a total of $y$ 1s and $(n - y)$ 0s if $\sum_{i=1}^{n} x_i = y$. Also, the denominator is the binomial probability of exactly $y$ successes in $n$ trials. Therefore, if $y = 0, 1, 2, \ldots, n$,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \begin{cases} \dfrac{p^y (1-p)^{n-y}}{\binom{n}{y} p^y (1-p)^{n-y}} = \dfrac{1}{\binom{n}{y}}, & \text{if } \sum_{i=1}^{n} x_i = y, \\ 0, & \text{otherwise.} \end{cases}$$
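The conditional probability just derived can be checked by brute force for a small case. The sketch below is illustrative only: the function name `conditional_probs` and the choice $n = 4$, $y = 2$ are assumptions made here, not part of the text. It enumerates every 0/1 sequence with exactly $y$ ones and computes its conditional probability given $Y = y$ using exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_probs(n, y, p):
    """Return {x: P(X = x | Y = y)} over all 0/1 sequences x of length n with sum y."""
    joint = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == y:
            # Probability of one particular independent sequence with y ones.
            joint[x] = p**y * (1 - p) ** (n - y)
    total = sum(joint.values())  # this is the binomial probability P(Y = y)
    return {x: prob / total for x, prob in joint.items()}


# Try several (hypothetical) values of p; the conditional probabilities should not change.
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    probs = conditional_probs(4, 2, p)
    print(p, set(probs.values()), Fraction(1, comb(4, 2)))
```

For each value of $p$ tried, every admissible sequence receives the same conditional probability $1/\binom{4}{2}$, which is the sense in which the conditional distribution is free of $p$.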

It is important to note that the conditional distribution of $X_1, X_2, \ldots, X_n$, given $Y$, does not depend upon $p$. That is, once $Y$ is known, no other function of $X_1, X_2, \ldots, X_n$ will shed additional light on the possible value of $p$. In this sense, $Y$ contains all the information about $p$. Therefore, the statistic $Y$ is said to be sufficient for $p$. We generalize this idea in the following definition.

DEFINITION 9.3

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a probability distribution with unknown parameter $\theta$. Then the statistic $U = g(Y_1, Y_2, \ldots, Y_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $Y_1, Y_2, \ldots, Y_n$, given $U$, does not depend on $\theta$.

In many previous discussions, we have considered the probability function $p(y)$ associated with a discrete random variable [or the density function $f(y)$ for a continuous random variable] to be functions of the argument $y$ only. Our future discussions will be simplified if we adopt notation that will permit us to explicitly display the fact that the distribution associated with a random variable $Y$ often depends on the value of a parameter $\theta$. If $Y$ is a discrete random variable that has a probability mass function that depends on the value of a parameter $\theta$, instead of $p(y)$ we use the notation $p(y \mid \theta)$. Similarly, we will indicate the explicit dependence of the form of a continuous density function on the value of a parameter $\theta$ by writing the density function as $f(y \mid \theta)$ instead of the previously used $f(y)$.

Definition 9.3 tells us how to check whether a statistic is sufficient, but it does not tell us how to find a sufficient statistic. Recall that in the discrete case the joint distribution of discrete random variables $Y_1, Y_2, \ldots, Y_n$ is given by a probability function $p(y_1, y_2, \ldots, y_n)$. If this joint probability function depends explicitly on the value of a parameter $\theta$, we write it as $p(y_1, y_2, \ldots, y_n \mid \theta)$. This function gives the probability or likelihood of observing the event $(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$ when the value of the parameter is $\theta$. In the continuous case when the joint distribution of $Y_1, Y_2, \ldots, Y_n$ depends on a parameter $\theta$, we will write the joint density function as $f(y_1, y_2, \ldots, y_n \mid \theta)$. Henceforth, it will be convenient to have a single name for the function that defines the joint distribution of the variables $Y_1, Y_2, \ldots, Y_n$ observed in a sample.

DEFINITION 9.4

Let y1 , y2 , . . . , yn be sample observations taken on corresponding random variables Y1 , Y2 , . . . , Yn whose distribution depends on a parameter θ. Then, if Y1 , Y2 , . . . , Yn are discrete random variables, the likelihood of the sample, L(y1 , y2 , . . . , yn | θ ), is deﬁned to be the joint probability of y1 , y2 , . . . , yn .


If $Y_1, Y_2, \ldots, Y_n$ are continuous random variables, the likelihood $L(y_1, y_2, \ldots, y_n \mid \theta)$ is defined to be the joint density evaluated at $y_1, y_2, \ldots, y_n$.

If the set of random variables $Y_1, Y_2, \ldots, Y_n$ denotes a random sample from a discrete distribution with probability function $p(y \mid \theta)$, then
$$L(y_1, y_2, \ldots, y_n \mid \theta) = p(y_1, y_2, \ldots, y_n \mid \theta) = p(y_1 \mid \theta) \times p(y_2 \mid \theta) \times \cdots \times p(y_n \mid \theta),$$
whereas if $Y_1, Y_2, \ldots, Y_n$ have a continuous distribution with density function $f(y \mid \theta)$, then
$$L(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1 \mid \theta) \times f(y_2 \mid \theta) \times \cdots \times f(y_n \mid \theta).$$
To simplify notation, we will sometimes denote the likelihood by $L(\theta)$ instead of by $L(y_1, y_2, \ldots, y_n \mid \theta)$. The following theorem relates the property of sufficiency to the likelihood $L(\theta)$.

THEOREM 9.4

Let $U$ be a statistic based on the random sample $Y_1, Y_2, \ldots, Y_n$. Then $U$ is a sufficient statistic for the estimation of a parameter $\theta$ if and only if the likelihood $L(\theta) = L(y_1, y_2, \ldots, y_n \mid \theta)$ can be factored into two nonnegative functions,
$$L(y_1, y_2, \ldots, y_n \mid \theta) = g(u, \theta) \times h(y_1, y_2, \ldots, y_n),$$
where $g(u, \theta)$ is a function only of $u$ and $\theta$ and $h(y_1, y_2, \ldots, y_n)$ is not a function of $\theta$.

Although the proof of Theorem 9.4 (also known as the factorization criterion) is beyond the scope of this book, we illustrate the usefulness of the theorem in the following example.

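EXAMPLE 9.5

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample in which $Y_i$ possesses the probability density function
$$f(y_i \mid \theta) = \begin{cases} (1/\theta)\, e^{-y_i/\theta}, & 0 \le y_i < \infty, \\ 0, & \text{elsewhere,} \end{cases}$$
where $\theta > 0$, $i = 1, 2, \ldots, n$. Show that $\bar{Y}$ is a sufficient statistic for the parameter $\theta$.

Solution

The likelihood $L(\theta)$ of the sample is the joint density
$$L(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1 \mid \theta) \times f(y_2 \mid \theta) \times \cdots \times f(y_n \mid \theta)$$
$$= \frac{e^{-y_1/\theta}}{\theta} \times \frac{e^{-y_2/\theta}}{\theta} \times \cdots \times \frac{e^{-y_n/\theta}}{\theta} = \frac{e^{-\sum y_i/\theta}}{\theta^n} = \frac{e^{-n\bar{y}/\theta}}{\theta^n}.$$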


Notice that $L(\theta)$ is a function only of $\theta$ and $\bar{y}$ and that if
$$g(\bar{y}, \theta) = \frac{e^{-n\bar{y}/\theta}}{\theta^n} \quad \text{and} \quad h(y_1, y_2, \ldots, y_n) = 1,$$
then $L(y_1, y_2, \ldots, y_n \mid \theta) = g(\bar{y}, \theta) \times h(y_1, y_2, \ldots, y_n)$. Hence, Theorem 9.4 implies that $\bar{Y}$ is a sufficient statistic for the parameter $\theta$.
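The factorization in Example 9.5 says that the exponential likelihood depends on the data only through $\bar{y}$. A minimal numerical sketch of that claim follows; the two samples and the helper names `exp_likelihood` and `g` are hypothetical choices made for illustration. Two different samples with the same mean should give the same likelihood at every $\theta$, and both should agree with $g(\bar{y}, \theta)$.

```python
import math


def exp_likelihood(ys, theta):
    """Joint density of an exponential (mean theta) sample, multiplied term by term."""
    return math.prod(math.exp(-y / theta) / theta for y in ys)


def g(ybar, n, theta):
    """The factor g(ybar, theta) = exp(-n * ybar / theta) / theta**n from Example 9.5."""
    return math.exp(-n * ybar / theta) / theta**n


# Two hypothetical samples with the same mean (2.0) but different individual values.
sample_a = [1.0, 2.0, 3.0]
sample_b = [0.5, 0.5, 5.0]

for theta in (0.5, 1.0, 2.0, 4.0):
    la = exp_likelihood(sample_a, theta)
    lb = exp_likelihood(sample_b, theta)
    # Both comparisons are expected to hold up to floating-point rounding.
    print(theta, math.isclose(la, lb), math.isclose(la, g(2.0, 3, theta)))
```

Because $h(y_1, \ldots, y_n) = 1$ here, nothing about the individual observations survives in the likelihood once $\bar{y}$ is fixed.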

Theorem 9.4 can be used to show that there are many possible sufficient statistics for any one population parameter. First of all, according to Definition 9.3 or the factorization criterion (Theorem 9.4), the random sample itself is a sufficient statistic. Second, if $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a distribution with a density function with parameter $\theta$, then the set of order statistics $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$, which is a function of $Y_1, Y_2, \ldots, Y_n$, is sufficient for $\theta$. In Example 9.5, we decided that $\bar{Y}$ is a sufficient statistic for the estimation of $\theta$. Theorem 9.4 could also have been used to show that $\sum_{i=1}^{n} Y_i$ is another sufficient statistic. Indeed, for the exponential distribution described in Example 9.5, any statistic that is a one-to-one function of $\bar{Y}$ is a sufficient statistic.

In our initial example of this section, involving the number of successes in $n$ trials, $Y = \sum_{i=1}^{n} X_i$ reduces the data $X_1, X_2, \ldots, X_n$ to a single value that remains sufficient for $p$. Generally, we would like to find a sufficient statistic that reduces the data in the sample as much as possible. Although many statistics are sufficient for the parameter $\theta$ associated with a specific distribution, application of the factorization criterion typically leads to a statistic that provides the “best” summary of the information in the data. In Example 9.5, this statistic is $\bar{Y}$ (or some one-to-one function of it). In the next section, we show how these sufficient statistics can be used to develop unbiased estimators with minimum variance.

Exercises

9.37

Let $X_1, X_2, \ldots, X_n$ denote $n$ independent and identically distributed Bernoulli random variables such that $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$, for each $i = 1, 2, \ldots, n$. Show that $\sum_{i=1}^{n} X_i$ is sufficient for $p$ by using the factorization criterion given in Theorem 9.4.

9.38

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$.
a If $\mu$ is unknown and $\sigma^2$ is known, show that $\bar{Y}$ is sufficient for $\mu$.
b If $\mu$ is known and $\sigma^2$ is unknown, show that $\sum_{i=1}^{n} (Y_i - \mu)^2$ is sufficient for $\sigma^2$.
c If $\mu$ and $\sigma^2$ are both unknown, show that $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} Y_i^2$ are jointly sufficient for $\mu$ and $\sigma^2$. [Thus, it follows that $\bar{Y}$ and $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ or $\bar{Y}$ and $S^2$ are also jointly sufficient for $\mu$ and $\sigma^2$.]


9.39

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Poisson distribution with parameter $\lambda$. Show by conditioning that $\sum_{i=1}^{n} Y_i$ is sufficient for $\lambda$.

9.40

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Rayleigh distribution with parameter $\theta$. (Refer to Exercise 9.34.) Show that $\sum_{i=1}^{n} Y_i^2$ is sufficient for $\theta$.

9.41

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a Weibull distribution with known $m$ and unknown $\alpha$. (Refer to Exercise 6.26.) Show that $\sum_{i=1}^{n} Y_i^m$ is sufficient for $\alpha$.

9.42

If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a geometric distribution with parameter $p$, show that $\bar{Y}$ is sufficient for $p$.

9.43

Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a power family distribution with parameters $\alpha$ and $\theta$. Then, by the result in Exercise 6.17, if $\alpha, \theta > 0$,
$$f(y \mid \alpha, \theta) = \begin{cases} \alpha y^{\alpha - 1}/\theta^{\alpha}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$
If $\theta$ is known, show that $\prod_{i=1}^{n} Y_i$ is sufficient for $\alpha$.

9.44

Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a Pareto distribution with parameters $\alpha$ and $\beta$. Then, by the result in Exercise 6.18, if $\alpha, \beta > 0$,
$$f(y \mid \alpha, \beta) = \begin{cases} \alpha \beta^{\alpha} y^{-(\alpha + 1)}, & y \ge \beta, \\ 0, & \text{elsewhere.} \end{cases}$$
If $\beta$ is known, show that $\prod_{i=1}^{n} Y_i$ is sufficient for $\alpha$.

9.45

Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a probability density function in the (one-parameter) exponential family so that
$$f(y \mid \theta) = \begin{cases} a(\theta)\, b(y)\, e^{-[c(\theta) d(y)]}, & a \le y \le b, \\ 0, & \text{elsewhere,} \end{cases}$$
where $a$ and $b$ do not depend on $\theta$. Show that $\sum_{i=1}^{n} d(Y_i)$ is sufficient for $\theta$.

9.46

If $Y_1, Y_2, \ldots, Y_n$ denote a random sample from an exponential distribution with mean $\beta$, show that $f(y \mid \beta)$ is in the exponential family and that $\bar{Y}$ is sufficient for $\beta$.

9.47

Refer to Exercise 9.43. If θ is known, show that the power family of distributions is in the exponential family. What is a sufﬁcient statistic for α? Does this contradict your answer to Exercise 9.43?

9.48

Refer to Exercise 9.44. If β is known, show that the Pareto distribution is in the exponential family. What is a sufﬁcient statistic for α? Argue that there is no contradiction between your answer to this exercise and the answer you found in Exercise 9.44.

*9.49

Let Y1 , Y2 , . . . , Yn denote a random sample from the uniform distribution over the interval (0, θ). Show that Y(n) = max(Y1 , Y2 , . . . , Yn ) is sufﬁcient for θ.

*9.50

Let Y1 , Y2 , . . . , Yn denote a random sample from the uniform distribution over the interval (θ1 , θ2 ). Show that Y(1) = min(Y1 , Y2 , . . . , Yn ) and Y(n) = max(Y1 , Y2 , . . . , Yn ) are jointly sufﬁcient for θ1 and θ2 .

*9.51

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function
$$f(y \mid \theta) = \begin{cases} e^{-(y - \theta)}, & y \ge \theta, \\ 0, & \text{elsewhere.} \end{cases}$$
Show that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$.


*9.52

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function
$$f(y \mid \theta) = \begin{cases} \dfrac{3y^2}{\theta^3}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$
Show that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$.

*9.53

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function
$$f(y \mid \theta) = \begin{cases} \dfrac{2\theta^2}{y^3}, & \theta < y < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$
Show that $Y_{(1)} = \min(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$.

*9.54

Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a power family distribution with parameters $\alpha$ and $\theta$. Then, as in Exercise 9.43, if $\alpha, \theta > 0$,
$$f(y \mid \alpha, \theta) = \begin{cases} \alpha y^{\alpha - 1}/\theta^{\alpha}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$
Show that $\max(Y_1, Y_2, \ldots, Y_n)$ and $\prod_{i=1}^{n} Y_i$ are jointly sufficient for $\alpha$ and $\theta$.

*9.55

Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a Pareto distribution with parameters $\alpha$ and $\beta$. Then, as in Exercise 9.44, if $\alpha, \beta > 0$,
$$f(y \mid \alpha, \beta) = \begin{cases} \alpha \beta^{\alpha} y^{-(\alpha + 1)}, & y \ge \beta, \\ 0, & \text{elsewhere.} \end{cases}$$
Show that $\prod_{i=1}^{n} Y_i$ and $\min(Y_1, Y_2, \ldots, Y_n)$ are jointly sufficient for $\alpha$ and $\beta$.

9.5 The Rao–Blackwell Theorem and Minimum-Variance Unbiased Estimation

Sufficient statistics play an important role in finding good estimators for parameters. If $\hat{\theta}$ is an unbiased estimator for $\theta$ and if $U$ is a statistic that is sufficient for $\theta$, then there is a function of $U$ that is also an unbiased estimator for $\theta$ and has no larger variance than $\hat{\theta}$. If we seek unbiased estimators with small variances, we can restrict our search to estimators that are functions of sufficient statistics. The theoretical basis for the preceding remarks is provided in the following result, known as the Rao–Blackwell theorem.

THEOREM 9.5

The Rao–Blackwell Theorem Let $\hat{\theta}$ be an unbiased estimator for $\theta$ such that $V(\hat{\theta}) < \infty$. If $U$ is a sufficient statistic for $\theta$, define $\hat{\theta}^* = E(\hat{\theta} \mid U)$. Then, for all $\theta$,
$$E(\hat{\theta}^*) = \theta \quad \text{and} \quad V(\hat{\theta}^*) \le V(\hat{\theta}).$$

Proof

Because $U$ is sufficient for $\theta$, the conditional distribution of any statistic (including $\hat{\theta}$), given $U$, does not depend on $\theta$. Thus, $\hat{\theta}^* = E(\hat{\theta} \mid U)$ is not a function of $\theta$ and is therefore a statistic.


Recall Theorems 5.14 and 5.15 where we considered how to find means and variances of random variables by using conditional means and variances. Because $\hat{\theta}$ is an unbiased estimator for $\theta$, Theorem 5.14 implies that
$$E(\hat{\theta}^*) = E[E(\hat{\theta} \mid U)] = E(\hat{\theta}) = \theta.$$
Thus, $\hat{\theta}^*$ is an unbiased estimator for $\theta$. Theorem 5.15 implies that
$$V(\hat{\theta}) = V[E(\hat{\theta} \mid U)] + E[V(\hat{\theta} \mid U)] = V(\hat{\theta}^*) + E[V(\hat{\theta} \mid U)].$$
Because $V(\hat{\theta} \mid U = u) \ge 0$ for all $u$, it follows that $E[V(\hat{\theta} \mid U)] \ge 0$ and therefore that $V(\hat{\theta}) \ge V(\hat{\theta}^*)$, as claimed.

Theorem 9.5 implies that an unbiased estimator for $\theta$ with a small variance is or can be made to be a function of a sufficient statistic. If we have an unbiased estimator for $\theta$, we might be able to improve it by using the result in Theorem 9.5.

It might initially seem that the Rao–Blackwell theorem could be applied once to get a better unbiased estimator and then reapplied to the resulting new estimator to get an even better unbiased estimator. If we apply the Rao–Blackwell theorem using the sufficient statistic $U$, then $\hat{\theta}^* = E(\hat{\theta} \mid U)$ will be a function of the statistic $U$, say, $\hat{\theta}^* = h(U)$. Suppose that we reapply the Rao–Blackwell theorem to $\hat{\theta}^*$ by using the same sufficient statistic $U$. Since, in general, $E(h(U) \mid U) = h(U)$, we see that by using the Rao–Blackwell theorem again, our “new” estimator is just $h(U) = \hat{\theta}^*$. That is, if we use the same sufficient statistic in successive applications of the Rao–Blackwell theorem, we gain nothing after the first application. The only way that successive applications can lead to better unbiased estimators is if we use a different sufficient statistic when the theorem is reapplied. Thus, it is unnecessary to use the Rao–Blackwell theorem successively if we use the right sufficient statistic in our initial application.

Because many statistics are sufficient for a parameter $\theta$ associated with a distribution, which sufficient statistic should we use when we apply this theorem?
For the distributions that we discuss in this text, the factorization criterion typically identiﬁes a statistic U that best summarizes the information in the data about the parameter θ. Such statistics are called minimal sufﬁcient statistics. Exercise 9.66 introduces a method for determining a minimal sufﬁcient statistic that might be of interest to some readers. In a few of the subsequent exercises, you will see that this method usually yields the same sufﬁcient statistics as those obtained from the factorization criterion. In the cases that we consider, these statistics possess another property (completeness) that guarantees that, if we apply Theorem 9.5 using U , we not only get an estimator with a smaller variance but also actually obtain an unbiased estimator for θ with minimum variance. Such an estimator is called a minimum-variance unbiased estimator (MVUE). See Casella and Berger (2002), Hogg, Craig, and McKean (2005), or Mood, Graybill, and Boes (1974) for additional details. Thus, if we start with an unbiased estimator for a parameter θ and the sufﬁcient statistic obtained through the factorization criterion, application of the Rao–Blackwell theorem typically leads to an MVUE for the parameter. Direct computation of


conditional expectations can be difficult. However, if $U$ is the sufficient statistic that best summarizes the data and some function of $U$ (say, $h(U)$) can be found such that $E[h(U)] = \theta$, it follows that $h(U)$ is the MVUE for $\theta$. We illustrate this approach with several examples.

EXAMPLE 9.6

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a distribution where $P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$, with $p$ unknown (such random variables are often called Bernoulli variables). Use the factorization criterion to find a sufficient statistic that best summarizes the data. Give an MVUE for $p$.

Solution

Notice that the preceding probability function can be written as
$$P(Y_i = y_i) = p^{y_i}(1 - p)^{1 - y_i}, \qquad y_i = 0, 1.$$
Thus, the likelihood $L(p)$ is
$$L(y_1, y_2, \ldots, y_n \mid p) = p(y_1, y_2, \ldots, y_n \mid p) = p^{y_1}(1 - p)^{1 - y_1} \times p^{y_2}(1 - p)^{1 - y_2} \times \cdots \times p^{y_n}(1 - p)^{1 - y_n}$$
$$= \underbrace{p^{\sum y_i}(1 - p)^{n - \sum y_i}}_{g(\sum y_i,\, p)} \times \underbrace{1}_{h(y_1, y_2, \ldots, y_n)}.$$
According to the factorization criterion, $U = \sum_{i=1}^{n} Y_i$ is sufficient for $p$. This statistic best summarizes the information about the parameter $p$. Notice that $E(U) = np$, or equivalently, $E(U/n) = p$. Thus, $U/n = \bar{Y}$ is an unbiased estimator for $p$. Because this estimator is a function of the sufficient statistic $\sum_{i=1}^{n} Y_i$, the estimator $\hat{p} = \bar{Y}$ is the MVUE for $p$.
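Example 9.6 finds an unbiased function of $U$ directly. As a complementary sketch, the conditioning step of Theorem 9.5 can be carried out by brute force for a small Bernoulli sample: start from the crude unbiased estimator $Y_1$ and compute $E(Y_1 \mid U = u)$ by enumeration. The helper name `rao_blackwell_y1` and the values $n = 5$, $p = 3/10$, and $p = 7/10$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell_y1(n, p):
    """Return {u: E(Y1 | U = u)} for a Bernoulli(p) sample of size n, U = sum of the Y_i."""
    numer = {}  # sum of y1 * P(Y = y) over sequences y with sum u
    denom = {}  # P(U = u)
    for y in product((0, 1), repeat=n):
        u = sum(y)
        prob = p**u * (1 - p) ** (n - u)
        numer[u] = numer.get(u, 0) + y[0] * prob
        denom[u] = denom.get(u, 0) + prob
    return {u: numer[u] / denom[u] for u in denom}


n = 5
for p in (Fraction(3, 10), Fraction(7, 10)):
    improved = rao_blackwell_y1(n, p)
    # Compare the conditional expectation with u / n for every possible u.
    print(p, all(improved[u] == Fraction(u, n) for u in range(n + 1)))
```

Under these assumptions the conditional expectation equals $u/n$ for every $u$ and does not involve $p$, so conditioning $Y_1$ on $U$ leads back to $\bar{Y}$, the estimator identified in the example.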

EXAMPLE 9.7

Suppose that $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the Weibull density function, given by
$$f(y \mid \theta) = \begin{cases} \dfrac{2y}{\theta}\, e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
Find an MVUE for $\theta$.

Solution

We begin by using the factorization criterion to find the sufficient statistic that best summarizes the information about $\theta$.
$$L(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1, y_2, \ldots, y_n \mid \theta) = \left(\frac{2}{\theta}\right)^n (y_1 \times y_2 \times \cdots \times y_n) \exp\left(-\frac{1}{\theta}\sum_{i=1}^{n} y_i^2\right)$$
$$= \underbrace{\left(\frac{2}{\theta}\right)^n \exp\left(-\frac{1}{\theta}\sum_{i=1}^{n} y_i^2\right)}_{g(\sum y_i^2,\, \theta)} \times \underbrace{(y_1 \times y_2 \times \cdots \times y_n)}_{h(y_1, y_2, \ldots, y_n)}.$$
Thus, $U = \sum_{i=1}^{n} Y_i^2$ is the minimal sufficient statistic for $\theta$. We now must find a function of this statistic that is unbiased for $\theta$. Letting $W = Y_i^2$, we have
$$f_W(w) = f(\sqrt{w})\, \frac{d(\sqrt{w})}{dw} = \frac{2}{\theta}\left(\sqrt{w}\, e^{-w/\theta}\right) \frac{1}{2\sqrt{w}} = \frac{1}{\theta}\, e^{-w/\theta}, \qquad w > 0.$$
That is, $Y_i^2$ has an exponential distribution with parameter $\theta$. Because
$$E(Y_i^2) = E(W) = \theta \quad \text{and} \quad E\left(\sum_{i=1}^{n} Y_i^2\right) = n\theta,$$
it follows that
$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} Y_i^2$$
is an unbiased estimator of $\theta$ that is a function of the sufficient statistic $\sum_{i=1}^{n} Y_i^2$. Therefore, $\hat{\theta}$ is an MVUE of the Weibull parameter $\theta$.
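The unbiasedness claim in Example 9.7 can be probed with a small simulation. This is only a sketch under stated assumptions: the values $\theta = 2$, $n = 10$, the number of replications, the seed, and the function names are arbitrary choices made here. Observations are drawn by inverting the distribution function $F(y) = 1 - e^{-y^2/\theta}$ implied by the density, so the simulation does not presuppose that $Y_i^2$ is exponential.

```python
import math
import random


def draw_weibull(theta, rng):
    """Draw Y with density (2y/theta) * exp(-y**2/theta) by inverting F(y) = 1 - exp(-y**2/theta)."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return math.sqrt(-theta * math.log(1.0 - u))


def average_theta_hat(theta, n, reps, seed=0):
    """Average of theta_hat = (1/n) * sum(Y_i**2) over `reps` simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [draw_weibull(theta, rng) for _ in range(n)]
        total += sum(y * y for y in sample) / n
    return total / reps


# The long-run average of theta_hat should be close to the assumed theta = 2.0.
print(average_theta_hat(theta=2.0, n=10, reps=20000))
```

A simulation of this kind can suggest that the estimator is centered at $\theta$; it says nothing about the minimum-variance property, which rests on the sufficiency argument in the example.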

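The following example illustrates the use of this technique for estimating two unknown parameters.

EXAMPLE 9.8

Suppose $Y_1, Y_2, \ldots, Y_n$ denotes a random sample from a normal distribution with unknown mean $\mu$ and variance $\sigma^2$. Find the MVUEs for $\mu$ and $\sigma^2$.

Solution

Again, looking at the likelihood function, we have
$$L(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2) = f(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \mu)^2\right]$$
$$= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n} y_i^2 - 2\mu\sum_{i=1}^{n} y_i + n\mu^2\right)\right]$$
$$= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left(\frac{-n\mu^2}{2\sigma^2}\right) \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n} y_i^2 - 2\mu\sum_{i=1}^{n} y_i\right)\right].$$
Thus, $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} Y_i^2$, jointly, are sufficient statistics for $\mu$ and $\sigma^2$. We know from past work that $\bar{Y}$ is unbiased for $\mu$ and
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right]$$
is unbiased for $\sigma^2$. Because these estimators are functions of the statistics that best summarize the information about $\mu$ and $\sigma^2$, they are MVUEs for $\mu$ and $\sigma^2$.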
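The last step of Example 9.8 relies on $\bar{Y}$ and $S^2$ being functions of the pair $\left(\sum Y_i, \sum Y_i^2\right)$. The short sketch below makes that dependence explicit; the function name `mean_and_var_from_sums` and the data values are hypothetical. It computes both estimators from the two sums alone and compares them with Python's standard-library results on the raw data.

```python
import math
import statistics


def mean_and_var_from_sums(n, sum_y, sum_y_sq):
    """Recover (ybar, s_squared) using only n, sum(y_i) and sum(y_i**2)."""
    ybar = sum_y / n
    s_squared = (sum_y_sq - n * ybar**2) / (n - 1)
    return ybar, s_squared


# Hypothetical data, used only to illustrate the identity.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
ybar, s_squared = mean_and_var_from_sums(
    len(data), sum(data), sum(y * y for y in data)
)

# statistics.variance uses the same n - 1 divisor as S^2.
print(math.isclose(ybar, statistics.mean(data)))
print(math.isclose(s_squared, statistics.variance(data)))
```

The shortcut formula can lose precision in floating point when the mean is large relative to the spread, so this is a check of the algebra rather than a recommended way to compute a sample variance.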


The factorization criterion, together with the Rao–Blackwell theorem, can also be used to find MVUEs for functions of the parameters associated with a distribution. We illustrate the technique in the following example.

EXAMPLE 9.9

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the exponential density function given by
$$f(y \mid \theta) = \begin{cases} \dfrac{1}{\theta}\, e^{-y/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
Find an MVUE of $V(Y_i)$.

Solution

In Chapter 4, we determined that $E(Y_i) = \theta$ and that $V(Y_i) = \theta^2$. The factorization criterion implies that $\sum_{i=1}^{n} Y_i$ is the best sufficient statistic for $\theta$. In fact, $\bar{Y}$ is the MVUE of $\theta$. Therefore, it is tempting to use $\bar{Y}^2$ as an estimator of $\theta^2$. But
$$E(\bar{Y}^2) = V(\bar{Y}) + [E(\bar{Y})]^2 = \frac{\theta^2}{n} + \theta^2 = \left(\frac{n + 1}{n}\right)\theta^2.$$
It follows that $\bar{Y}^2$ is a biased estimator for $\theta^2$. However,
$$\left(\frac{n}{n + 1}\right)\bar{Y}^2$$
is an MVUE of $\theta^2$ because it is an unbiased estimator for $\theta^2$ and a function of the sufficient statistic. No other unbiased estimator of $\theta^2$ will have a smaller variance than this one.
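The bias of $\bar{Y}^2$ and the effect of the factor $n/(n+1)$ can be illustrated by simulation. The sketch below assumes $\theta = 2$ and $n = 5$ (arbitrary illustrative values), a fixed seed, and a hypothetical function name; it uses `random.expovariate`, whose argument is the rate $1/\theta$ rather than the mean.

```python
import random


def mean_of_ybar_squared(theta, n, reps, seed=0):
    """Monte Carlo estimate of E(Ybar**2) for exponential samples with mean theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # expovariate takes the rate, so rate = 1/theta gives mean theta.
        ybar = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        total += ybar**2
    return total / reps


theta, n = 2.0, 5
raw = mean_of_ybar_squared(theta, n, reps=100000)
corrected = n / (n + 1) * raw  # average of the bias-corrected estimator

print(raw, (n + 1) / n * theta**2)  # simulated vs. theoretical E(Ybar^2)
print(corrected, theta**2)          # simulated mean of corrected estimator vs. theta^2
```

With these assumed values the theoretical targets are $4.8$ for $\bar{Y}^2$ and $4$ for the corrected estimator; the simulated averages should land near them, up to Monte Carlo error.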

A sufficient statistic for a parameter $\theta$ often can be used to construct an exact confidence interval for $\theta$ if the probability distribution of the statistic can be found. The resulting intervals generally are the shortest that can be found with a specified confidence coefficient. We illustrate the technique with an example involving the Weibull distribution.

EXAMPLE 9.10

The following data, with measurements in hundreds of hours, represent the lengths of life of ten identical electronic components operating in a guidance control system for missiles:

.637 1.531 .733 2.256 2.364 1.601 .152 1.826 1.868 1.126

The length of life of a component of this type is assumed to follow a Weibull distribution with density function given by
$$f(y \mid \theta) = \begin{cases} \dfrac{2y}{\theta}\, e^{-y^2/\theta}, & y > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
Use the data to construct a 95% confidence interval for $\theta$.

Solution

We saw in Example 9.7 that the sufficient statistic that best summarizes the information about $\theta$ is $\sum_{i=1}^{n} Y_i^2$. We will use this statistic to form a pivotal quantity for constructing the desired confidence interval. Recall from Example 9.7 that $W_i = Y_i^2$ has an exponential distribution with mean $\theta$. Now consider the transformation $T_i = 2W_i/\theta$. Then
$$f_T(t) = f_W\left(\frac{\theta t}{2}\right) \frac{d(\theta t/2)}{dt} = \frac{1}{\theta}\, e^{-(\theta t/2)/\theta} \left(\frac{\theta}{2}\right) = \frac{1}{2}\, e^{-t/2}, \qquad t > 0.$$
Thus, for each $i = 1, 2, \ldots, n$, $T_i$ has a $\chi^2$ distribution with 2 df. Further, because the variables $Y_i$ are independent, the variables $T_i$ are independent, for $i = 1, 2, \ldots, n$. The sum of independent $\chi^2$ random variables has a $\chi^2$ distribution with degrees of freedom equal to the sum of the degrees of freedom of the variables in the sum. Therefore, the quantity
$$\sum_{i=1}^{10} T_i = \frac{2}{\theta}\sum_{i=1}^{10} W_i = \frac{2}{\theta}\sum_{i=1}^{10} Y_i^2$$
has a $\chi^2$ distribution with 20 df. Thus,
$$\frac{2}{\theta}\sum_{i=1}^{10} Y_i^2$$
is a pivotal quantity, and we can use the pivotal method (Section 8.5) to construct the desired confidence interval. From Table 6, Appendix 3, we can find two numbers $a$ and $b$ such that
$$P\left(a \le \frac{2}{\theta}\sum_{i=1}^{10} Y_i^2 \le b\right) = .95.$$
Manipulating the inequality to isolate $\theta$ in the middle, we have
$$.95 = P\left(a \le \frac{2}{\theta}\sum_{i=1}^{10} Y_i^2 \le b\right) = P\left(\frac{1}{b} \le \frac{\theta}{2\sum_{i=1}^{10} Y_i^2} \le \frac{1}{a}\right) = P\left(\frac{2\sum_{i=1}^{10} Y_i^2}{b} \le \theta \le \frac{2\sum_{i=1}^{10} Y_i^2}{a}\right).$$
From Table 6, Appendix 3, the value that cuts off an area of .025 in the lower tail of the $\chi^2$ distribution with 20 df is $a = 9.591$. The value that cuts off an area of .025 in the upper tail of the same distribution is $b = 34.170$. For the preceding data, $\sum_{i=1}^{10} Y_i^2 = 24.643$. Therefore, the 95% confidence interval for the Weibull parameter $\theta$ is
$$\left(\frac{2(24.643)}{34.170},\ \frac{2(24.643)}{9.591}\right), \quad \text{or} \quad (1.442,\ 5.139).$$
This is a fairly wide interval for $\theta$, but it is based on only ten observations.
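The arithmetic in Example 9.10 can be reproduced in a few lines. In this sketch the function name `weibull_theta_interval` is a hypothetical helper, and the two $\chi^2$ quantiles are copied from the example (Table 6, Appendix 3) rather than computed, so no statistical library is assumed.

```python
# Lifetimes (hundreds of hours) from Example 9.10.
data = [0.637, 1.531, 0.733, 2.256, 2.364, 1.601, 0.152, 1.826, 1.868, 1.126]

# Chi-square quantiles with 20 df, taken from the example's table lookup.
A_LOWER = 9.591   # cuts off area .025 in the lower tail
B_UPPER = 34.170  # cuts off area .025 in the upper tail


def weibull_theta_interval(ys, a, b):
    """Return (sum of squares, lower limit, upper limit) from the pivot (2/theta) * sum(y**2)."""
    sum_sq = sum(y * y for y in ys)
    return sum_sq, 2 * sum_sq / b, 2 * sum_sq / a


sum_sq, lower, upper = weibull_theta_interval(data, A_LOWER, B_UPPER)
print(round(sum_sq, 3), round(lower, 3), round(upper, 3))  # 24.643 1.442 5.139
```

Note that the larger quantile $b$ produces the lower limit and the smaller quantile $a$ the upper limit, because $\theta$ appears in the denominator of the pivotal quantity.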

In this section, we have seen that the Rao–Blackwell theorem implies that unbiased estimators with small variances are functions of sufﬁcient statistics. Generally


speaking, the factorization criterion presented in Section 9.4 can be applied to ﬁnd sufﬁcient statistics that best summarize the information contained in sample data about parameters of interest. For the distributions that we consider in this text, an MVUE for a target parameter θ can be found as follows. First, determine the best sufﬁcient statistic, U . Then, ﬁnd a function of U , h(U ), such that E[h(U )] = θ . This method often works well. However, sometimes a best sufﬁcient statistic is a fairly complicated function of the observable random variables in the sample. In cases like these, it may be difﬁcult to ﬁnd a function of the sufﬁcient statistic that is an unbiased estimator for the target parameter. For this reason, two additional methods of ﬁnding estimators—the method of moments and the method of maximum likelihood—are presented in the next two sections. A third important method for estimation, the method of least squares, is the topic of Chapter 11.

Exercises

9.56

Refer to Exercise 9.38(b). Find an MVUE of $\sigma^2$.

9.57

Refer to Exercise 9.18. Is the estimator of $\sigma^2$ given there an MVUE of $\sigma^2$?

9.58

Refer to Exercise 9.40. Use $\sum_{i=1}^{n} Y_i^2$ to find an MVUE of $\theta$.

9.59

The number of breakdowns $Y$ per day for a certain machine is a Poisson random variable with mean $\lambda$. The daily cost of repairing these breakdowns is given by $C = 3Y^2$. If $Y_1, Y_2, \ldots, Y_n$ denote the observed number of breakdowns for $n$ independently selected days, find an MVUE for $E(C)$.

9.60

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function
$$f(y \mid \theta) = \begin{cases} \theta y^{\theta - 1}, & 0 < y < 1,\ \theta > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
a Show that this density function is in the (one-parameter) exponential family and that $\sum_{i=1}^{n} -\ln(Y_i)$ is sufficient for $\theta$. (See Exercise 9.45.)
b If $W_i = -\ln(Y_i)$, show that $W_i$ has an exponential distribution with mean $1/\theta$.
c Use methods similar to those in Example 9.10 to show that $2\theta\sum_{i=1}^{n} W_i$ has a $\chi^2$ distribution with $2n$ df.
d Show that
$$E\left(\frac{1}{2\theta\sum_{i=1}^{n} W_i}\right) = \frac{1}{2(n - 1)}.$$
[Hint: Recall Exercise 4.112.]
e What is the MVUE for $\theta$?

9.61

Refer to Exercise 9.49. Use Y(n) to ﬁnd an MVUE of θ . (See Example 9.1.)

9.62

Refer to Exercise 9.51. Find a function of Y(1) that is an MVUE for θ .

9.63

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a population with density function
$$f(y \mid \theta) = \begin{cases} \dfrac{3y^2}{\theta^3}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$
In Exercise 9.52 you showed that $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is sufficient for $\theta$.
a Show that $Y_{(n)}$ has probability density function
$$f_{(n)}(y \mid \theta) = \begin{cases} \dfrac{3n y^{3n - 1}}{\theta^{3n}}, & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$
b Find the MVUE of $\theta$.

9.64

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance 1.
a Show that the MVUE of $\mu^2$ is $\widehat{\mu^2} = \bar{Y}^2 - 1/n$.
b Derive the variance of $\widehat{\mu^2}$.

*9.65

In this exercise, we illustrate the direct use of the Rao–Blackwell theorem. Let $Y_1, Y_2, \ldots, Y_n$ be independent Bernoulli random variables with

$$p(y_i \mid p) = p^{y_i} (1 - p)^{1 - y_i}, \qquad y_i = 0, 1.$$

That is, $P(Y_i = 1) = p$ and $P(Y_i = 0) = 1 - p$. Find the MVUE of $p(1 - p)$, which is a term in the variance of $Y_i$ or $W = \sum_{i=1}^{n} Y_i$, by the following steps.

a Let

$$T = \begin{cases} 1, & \text{if } Y_1 = 1 \text{ and } Y_2 = 0, \\ 0, & \text{otherwise.} \end{cases}$$

Show that $E(T) = p(1 - p)$.

b Show that

$$P(T = 1 \mid W = w) = \frac{w(n - w)}{n(n - 1)}.$$

c Show that

$$E(T \mid W) = \frac{n}{n-1} \left(\frac{W}{n}\right) \left(1 - \frac{W}{n}\right) = \frac{n}{n-1}\, \bar{Y}(1 - \bar{Y})$$

and hence that $n\bar{Y}(1 - \bar{Y})/(n - 1)$ is the MVUE of $p(1 - p)$.
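The conditional probability in part (b) can be checked by brute-force enumeration for a small n. The sketch below is only a numerical sanity check, not the requested proof. It relies on the fact that, given W = w, every 0/1 sequence with exactly w ones has the same probability $p^w (1-p)^{n-w}$, so the conditional probability is a ratio of counts. The function name `cond_prob_T1` and the choice n = 5 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def cond_prob_T1(n, w):
    """P(T = 1 | W = w), where T = 1 exactly when Y1 = 1 and Y2 = 0.

    Given W = w, all sequences with w ones are equally likely,
    so the conditional probability is a ratio of counts.
    """
    seqs = [s for s in product((0, 1), repeat=n) if sum(s) == w]
    hits = sum(1 for s in seqs if s[0] == 1 and s[1] == 0)
    return Fraction(hits, len(seqs))

n = 5  # hypothetical small sample size, chosen so enumeration is cheap
table = {w: cond_prob_T1(n, w) for w in range(n + 1)}

# Compare with the closed form w(n - w) / (n(n - 1)) from part (b).
closed_form = {w: Fraction(w * (n - w), n * (n - 1)) for w in range(n + 1)}
```

Using `Fraction` keeps the comparison exact, so agreement between `table` and `closed_form` does not depend on floating-point tolerance.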

*9.66

The likelihood function $L(y_1, y_2, \ldots, y_n \mid \theta)$ takes on different values depending on the arguments $(y_1, y_2, \ldots, y_n)$. A method for deriving a minimal sufficient statistic developed by Lehmann and Scheffé uses the ratio of the likelihoods evaluated at two points, $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$:

$$\frac{L(x_1, x_2, \ldots, x_n \mid \theta)}{L(y_1, y_2, \ldots, y_n \mid \theta)}.$$

Many times it is possible to find a function $g(x_1, x_2, \ldots, x_n)$ such that this ratio is free of the unknown parameter θ if and only if $g(x_1, x_2, \ldots, x_n) = g(y_1, y_2, \ldots, y_n)$. If such a function g can be found, then $g(Y_1, Y_2, \ldots, Y_n)$ is a minimal sufficient statistic for θ.

a Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a Bernoulli distribution (see Example 9.6 and Exercise 9.65) with p unknown.

i Show that

$$\frac{L(x_1, x_2, \ldots, x_n \mid p)}{L(y_1, y_2, \ldots, y_n \mid p)} = \left(\frac{p}{1 - p}\right)^{\sum x_i - \sum y_i}.$$

ii Argue that for this ratio to be independent of p, we must have

$$\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i = 0 \qquad \text{or} \qquad \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i.$$

iii Using the method of Lehmann and Scheffé, what is a minimal sufficient statistic for p? How does this sufficient statistic compare to the sufficient statistic derived in Example 9.6 by using the factorization criterion?

b Consider the Weibull density discussed in Example 9.7.

i Show that

$$\frac{L(x_1, x_2, \ldots, x_n \mid \theta)}{L(y_1, y_2, \ldots, y_n \mid \theta)} = \left(\frac{x_1 x_2 \cdots x_n}{y_1 y_2 \cdots y_n}\right) \exp\left[-\frac{1}{\theta}\left(\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} y_i^2\right)\right].$$

ii Argue that $\sum_{i=1}^{n} Y_i^2$ is a minimal sufficient statistic for θ.

*9.67

Refer to Exercise 9.66. Suppose that a sample of size n is taken from a normal population with mean µ and variance $\sigma^2$. Show that $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} Y_i^2$ jointly form minimal sufficient statistics for µ and $\sigma^2$.

*9.68

Suppose that a statistic U has a probability density function that is positive over the interval a ≤ u ≤ b and suppose that the density depends on a parameter θ that can range over the interval α1 ≤ θ ≤ α2 . Suppose also that g(u) is continuous for u in the interval [a, b]. If E[g(U ) | θ ] = 0 for all θ in the interval [α1 , α2 ] implies that g(u) is identically zero, then the family of density functions { fU (u | θ ), α1 ≤ θ ≤ α2 } is said to be complete. (All statistics that we employed in Section 9.5 have complete families of density functions.) Suppose that U is a sufﬁcient statistic for θ , and g1 (U ) and g2 (U ) are both unbiased estimators of θ. Show that, if the family of density functions for U is complete, g1 (U ) must equal g2 (U ), and thus there is a unique function of U that is an unbiased estimator of θ. Coupled with the Rao–Blackwell theorem, the property of completeness of fU (u | θ ), along with the sufﬁciency of U , assures us that there is a unique minimum-variance unbiased estimator (UMVUE) of θ .

9.6 The Method of Moments

In this section, we will discuss one of the oldest methods for deriving point estimators: the method of moments. A more sophisticated method, the method of maximum likelihood, is the topic of Section 9.7.

The method of moments is a very simple procedure for finding an estimator for one or more population parameters. Recall that the kth moment of a random variable, taken about the origin, is $\mu_k = E(Y^k)$. The corresponding kth sample moment is the average

$$m_k = \frac{1}{n} \sum_{i=1}^{n} Y_i^k.$$

The method of moments is based on the intuitively appealing idea that sample moments should provide good estimates of the corresponding population moments.


That is, $m_k$ should be a good estimator of $\mu_k$, for k = 1, 2, .... Then because the population moments $\mu_1, \mu_2, \ldots, \mu_k$ are functions of the population parameters, we can equate corresponding population and sample moments and solve for the desired estimators. Hence, the method of moments can be stated as follows.

Method of Moments

Choose as estimates those values of the parameters that are solutions of the equations $\mu_k = m_k$, for k = 1, 2, ..., t, where t is the number of parameters to be estimated.
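The sample moments that enter these equations are plain averages of powers of the observations. A minimal sketch, assuming the data are held in a Python list; the function name `sample_moment` and the data values are hypothetical, chosen only for illustration:

```python
def sample_moment(ys, k):
    """Return the k-th sample moment about the origin: (1/n) * sum(y_i ** k)."""
    n = len(ys)
    if n == 0:
        raise ValueError("need at least one observation")
    return sum(y ** k for y in ys) / n

data = [1.0, 2.0, 3.0, 4.0]    # hypothetical observations
m1 = sample_moment(data, 1)    # first sample moment, the sample mean
m2 = sample_moment(data, 2)    # second sample moment, the mean of squares
```

With t unknown parameters, the first t of these values would be set equal to the corresponding population moments and the resulting equations solved for the parameters.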

EXAMPLE 9.11

A random sample of n observations, $Y_1, Y_2, \ldots, Y_n$, is selected from a population in which $Y_i$, for i = 1, 2, ..., n, possesses a uniform probability density function over the interval (0, θ) where θ is unknown. Use the method of moments to estimate the parameter θ.

Solution

The value of $\mu_1$ for a uniform random variable is

$$\mu_1 = \mu = \frac{\theta}{2}.$$

The corresponding first sample moment is

$$m_1 = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}.$$

Equating the corresponding population and sample moment, we obtain

$$\mu_1 = \frac{\theta}{2} = \bar{Y}.$$

The method-of-moments estimator for θ is the solution of the above equation. That is, $\hat{\theta} = 2\bar{Y}$.

For the distributions that we consider in this text, the methods of Section 9.3 can be used to show that sample moments are consistent estimators of the corresponding population moments. Because the estimators obtained from the method of moments obviously are functions of the sample moments, estimators obtained using the method of moments are usually consistent estimators of their respective parameters.

EXAMPLE 9.12

Show that the estimator $\hat{\theta} = 2\bar{Y}$, derived in Example 9.11, is a consistent estimator for θ.

Solution

In Example 9.1, we showed that $\hat{\theta} = 2\bar{Y}$ is an unbiased estimator for θ and that $V(\hat{\theta}) = \theta^2/(3n)$. Because $\lim_{n \to \infty} V(\hat{\theta}) = 0$, Theorem 9.1 implies that $\hat{\theta} = 2\bar{Y}$ is a consistent estimator for θ.
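The estimator $\hat{\theta} = 2\bar{Y}$ and its consistency can be illustrated with a small simulation. This is a sketch under stated assumptions: the true value θ = 10, the seed, and the sample sizes are arbitrary choices, and the helper names are hypothetical. A simulation can only suggest, not prove, convergence.

```python
import random

def mom_uniform_theta(ys):
    """Method-of-moments estimate of theta for Uniform(0, theta): twice the sample mean."""
    return 2.0 * sum(ys) / len(ys)

def simulate_uniform(theta, n, seed):
    """Draw n pseudo-random observations from Uniform(0, theta) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, theta) for _ in range(n)]

theta = 10.0  # assumed true parameter, for illustration only
estimates = {
    n: mom_uniform_theta(simulate_uniform(theta, n, seed=1))
    for n in (10, 1000, 100000)
}
for n, est in estimates.items():
    print(n, est)
```

Because $V(\hat{\theta}) = \theta^2/(3n)$, the standard deviation of the estimate at n = 100000 is roughly 0.018 here, so the last estimate would be expected to sit very close to 10.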


Although the estimator $\hat{\theta}$ derived in Example 9.11 is consistent, it is not necessarily the best estimator for θ. Indeed, the factorization criterion yields $Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ to be the best sufficient statistic for θ. Thus, according to the Rao–Blackwell theorem, the method-of-moments estimator will have larger variance than an unbiased estimator based on $Y_{(n)}$. This, in fact, was shown to be the case in Example 9.1.

EXAMPLE 9.13

A random sample of n observations, Y1 , Y2 , . . . , Yn , is selected from a population where Yi , for i = 1, 2, . . . , n, possesses a gamma probability density function with parameters α and β (see Section 4.6 for the gamma probability density function). Find method-of-moments estimators for the unknown parameters α and β.

Solution

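Because we seek estimators for two parameters α and β, we must equate two pairs of population and sample moments. The first two moments of the gamma distribution with parameters α and β are (see the inside of the back cover of the text, if necessary)

$$\mu_1 = \mu = \alpha\beta \qquad \text{and} \qquad \mu_2 = \sigma^2 + \mu^2 = \alpha\beta^2 + \alpha^2\beta^2.$$

Now equate these quantities to their corresponding sample moments and solve for $\hat{\alpha}$ and $\hat{\beta}$. Thus,

$$\mu_1 = \alpha\beta = m_1 = \bar{Y},$$

$$\mu_2 = \alpha\beta^2 + \alpha^2\beta^2 = m_2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2.$$

From the first equation, we obtain $\hat{\beta} = \bar{Y}/\hat{\alpha}$. Substituting into the second equation and solving for $\hat{\alpha}$, we obtain

$$\hat{\alpha} = \frac{\bar{Y}^2}{\sum Y_i^2/n - \bar{Y}^2} = \frac{n\bar{Y}^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}.$$

Substituting $\hat{\alpha}$ into the first equation, we obtain

$$\hat{\beta} = \frac{\bar{Y}}{\hat{\alpha}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n\bar{Y}}.$$

The method-of-moments estimators $\hat{\alpha}$ and $\hat{\beta}$ in Example 9.13 are consistent. $\bar{Y}$ converges in probability to $E(Y_i) = \alpha\beta$, and $(1/n)\sum_{i=1}^{n} Y_i^2$ converges in probability to $E(Y_i^2) = \alpha\beta^2 + \alpha^2\beta^2$. Thus,

$$\hat{\alpha} = \frac{\bar{Y}^2}{\frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar{Y}^2} \quad \text{is a consistent estimator of} \quad \frac{(\alpha\beta)^2}{\alpha\beta^2 + \alpha^2\beta^2 - (\alpha\beta)^2} = \alpha,$$

and

$$\hat{\beta} = \frac{\bar{Y}}{\hat{\alpha}} \quad \text{is a consistent estimator of} \quad \frac{\alpha\beta}{\alpha} = \beta.$$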
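The closed-form gamma estimators from Example 9.13 translate directly into code. The sketch below assumes the observations are positive numbers in a list; the function name `mom_gamma` and the data values are hypothetical and are not taken from the text.

```python
def mom_gamma(ys):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma sample.

    alpha_hat = n * ybar**2 / sum((y - ybar)**2)
    beta_hat  = sum((y - ybar)**2) / (n * ybar)
    """
    n = len(ys)
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)   # sum of squared deviations
    if ss == 0 or ybar == 0:
        raise ValueError("estimates are undefined for a constant or zero-mean sample")
    return n * ybar ** 2 / ss, ss / (n * ybar)

data = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical observations
alpha_hat, beta_hat = mom_gamma(data)

# By construction the fitted moments reproduce the sample moments.
m1 = sum(data) / len(data)
m2 = sum(y ** 2 for y in data) / len(data)
fitted_m1 = alpha_hat * beta_hat
fitted_m2 = alpha_hat * beta_hat ** 2 + alpha_hat ** 2 * beta_hat ** 2
```

For this data set $\bar{y} = 3$ and $\sum (y_i - \bar{y})^2 = 10$, so the formulas give $\hat{\alpha} = 45/10 = 4.5$ and $\hat{\beta} = 10/15 = 2/3$.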


Using the factorization criterion, we can show $\sum_{i=1}^{n} Y_i$ and the product $\prod_{i=1}^{n} Y_i$ to be sufficient statistics for the gamma density function. Because the method-of-moments estimators $\hat{\alpha}$ and $\hat{\beta}$ are not functions of these sufficient statistics, we can find more efficient estimators for the parameters α and β. However, it is considerably more difficult to apply other methods to find estimators for these parameters.

To summarize, the method of moments finds estimators of unknown parameters by equating corresponding sample and population moments. The method is easy to employ and provides consistent estimators. However, the estimators derived by this method are often not functions of sufficient statistics. As a result, method-of-moments estimators are sometimes not very efficient. In many cases, the method-of-moments estimators are biased. The primary virtues of this method are its ease of use and that it sometimes yields estimators with reasonable properties.

Exercises

9.69

Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from the probability density function

$$f(y \mid \theta) = \begin{cases} (\theta + 1) y^{\theta}, & 0 < y < 1;\ \theta > -1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find an estimator for θ by the method of moments. Show that the estimator is consistent. Is the estimator a function of the sufficient statistic $-\sum_{i=1}^{n} \ln(Y_i)$ that we can obtain from the factorization criterion? What implications does this have?

9.70

Suppose that Y1 , Y2 , . . . , Yn constitute a random sample from a Poisson distribution with mean λ. Find the method-of-moments estimator of λ.

9.71

If Y1 , Y2 , . . . , Yn denote a random sample from the normal distribution with known mean µ = 0 and unknown variance σ 2 , ﬁnd the method-of-moments estimator of σ 2 .

9.72

If Y1 , Y2 , . . . , Yn denote a random sample from the normal distribution with mean µ and variance σ 2 , ﬁnd the method-of-moments estimators of µ and σ 2 .

9.73

An urn contains θ black balls and N − θ white balls. A sample of n balls is to be selected without replacement. Let Y denote the number of black balls in the sample. Show that (N /n)Y is the method-of-moments estimator of θ.

9.74

Let $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from the probability density function given by

$$f(y \mid \theta) = \begin{cases} \dfrac{2}{\theta^2} (\theta - y), & 0 \le y \le \theta, \\ 0, & \text{elsewhere.} \end{cases}$$

a Find an estimator for θ by using the method of moments.

b Is this estimator a sufficient statistic for θ?

9.75

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the probability density function given by

$$f(y \mid \theta) = \begin{cases} \dfrac{\Gamma(2\theta)}{[\Gamma(\theta)]^2}\, y^{\theta - 1} (1 - y)^{\theta - 1}, & 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find the method-of-moments estimator for θ.


9.76

Let $X_1, X_2, X_3, \ldots$ be independent Bernoulli random variables such that $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$ for each i = 1, 2, 3, .... Let the random variable Y denote the number of trials necessary to obtain the first success, that is, the value of i for which $X_i = 1$ first occurs. Then Y has a geometric distribution with $P(Y = y) = (1 - p)^{y-1} p$, for y = 1, 2, 3, .... Find the method-of-moments estimator of p based on this single observation Y.

9.77

Let Y1 , Y2 , . . . , Yn denote independent and identically distributed uniform random variables on the interval (0, 3θ ). Derive the method-of-moments estimator for θ.

9.78

Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a power family distribution with parameters α and θ = 3. Then, as in Exercise 9.43, if α > 0,

$$f(y \mid \alpha) = \begin{cases} \alpha y^{\alpha - 1} / 3^{\alpha}, & 0 \le y \le 3, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that $E(Y_1) = 3\alpha/(\alpha + 1)$ and derive the method-of-moments estimator for α.

*9.79

Let $Y_1, Y_2, \ldots, Y_n$ denote independent and identically distributed random variables from a Pareto distribution with parameters α and β, where β is known. Then, if α > 0,

$$f(y \mid \alpha, \beta) = \begin{cases} \alpha \beta^{\alpha} y^{-(\alpha + 1)}, & y \ge \beta, \\ 0, & \text{elsewhere.} \end{cases}$$

Show that $E(Y_i) = \alpha\beta/(\alpha - 1)$ if α > 1 and $E(Y_i)$ is undefined if 0 < α < 1. Thus, the method-of-moments estimator for α is undefined.

9.7 The Method of Maximum Likelihood

In Section 9.5, we presented a method for deriving an MVUE for a target parameter: using the factorization criterion together with the Rao–Blackwell theorem. The method requires that we find some function of a minimal sufficient statistic that is an unbiased estimator for the target parameter. Although we have a method for finding a sufficient statistic, the determination of the function of the minimal sufficient statistic that gives us an unbiased estimator can be largely a matter of hit or miss. Section 9.6 contained a discussion of the method of moments. The method of moments is intuitive and easy to apply but does not usually lead to the best estimators. In this section, we present the method of maximum likelihood, which often leads to MVUEs.

We use an example to illustrate the logic upon which the method of maximum likelihood is based. Suppose that we are confronted with a box that contains three balls. We know that each of the balls may be red or white, but we do not know the total number of either color. However, we are allowed to randomly sample two of the balls without replacement. If our random sample yields two red balls, what would be a good estimate of the total number of red balls in the box? Obviously, the number of red balls in the box must be two or three (if there were zero or one red ball in the box, it would be impossible to obtain two red balls when sampling without replacement). If there are two red balls and one white ball in the box, the probability of randomly selecting two red balls is

$$\frac{\binom{2}{2}\binom{1}{0}}{\binom{3}{2}} = \frac{1}{3}.$$


On the other hand, if there are three red balls in the box, the probability of randomly selecting two red balls is

$$\frac{\binom{3}{2}}{\binom{3}{2}} = 1.$$

It should seem reasonable to choose three as the estimate of the number of red balls in the box because this estimate maximizes the probability of obtaining the observed sample. Of course, it is possible for the box to contain only two red balls, but the observed outcome gives more credence to there being three red balls in the box.

This example illustrates a method for finding an estimator that can be applied to any situation. The technique, called the method of maximum likelihood, selects as estimates the values of the parameters that maximize the likelihood (the joint probability function or joint density function) of the observed sample (see Definition 9.4). Recall that we referred to this method of estimation in Chapter 3 where in Examples 3.10 and 3.13 and Exercise 3.101 we found the maximum-likelihood estimates of the parameter p based on single observations on binomial, geometric, and negative binomial random variables, respectively.

Method of Maximum Likelihood

Suppose that the likelihood function depends on k parameters $\theta_1, \theta_2, \ldots, \theta_k$. Choose as estimates those values of the parameters that maximize the likelihood $L(y_1, y_2, \ldots, y_n \mid \theta_1, \theta_2, \ldots, \theta_k)$.

To emphasize the fact that the likelihood function is a function of the parameters $\theta_1, \theta_2, \ldots, \theta_k$, we sometimes write the likelihood function as $L(\theta_1, \theta_2, \ldots, \theta_k)$. It is common to refer to maximum-likelihood estimators as MLEs. We illustrate the method with an example.

EXAMPLE 9.14

A binomial experiment consisting of n trials resulted in observations y1 , y2 , . . . , yn , where yi = 1 if the ith trial was a success and yi = 0 otherwise. Find the MLE of p, the probability of a success.

Solution

The likelihood of the observed sample is the probability of observing $y_1, y_2, \ldots, y_n$. Hence,

$$L(p) = L(y_1, y_2, \ldots, y_n \mid p) = p^{y} (1 - p)^{n - y}, \qquad \text{where } y = \sum_{i=1}^{n} y_i.$$

We now wish to find the value of p that maximizes L(p). If y = 0, $L(p) = (1 - p)^n$, and L(p) is maximized when p = 0. Analogously, if y = n, $L(p) = p^n$ and L(p) is maximized when p = 1. If y = 1, 2, ..., n − 1, then $L(p) = p^{y}(1 - p)^{n-y}$ is zero when p = 0 and p = 1 and is continuous for values of p between 0 and 1. Thus, for y = 1, 2, ..., n − 1, we can find the value of p that maximizes L(p) by setting the derivative dL(p)/dp equal to 0 and solving for p.

You will notice that ln[L(p)] is a monotonically increasing function of L(p). Hence, both ln[L(p)] and L(p) are maximized for the same value of p. Because L(p) is a product of functions of p and finding the derivative of products is tedious, it is easier to find the value of p that maximizes ln[L(p)]. We have

$$\ln[L(p)] = \ln\left[p^{y}(1 - p)^{n-y}\right] = y \ln p + (n - y) \ln(1 - p).$$

If y = 1, 2, ..., n − 1, the derivative of ln[L(p)] with respect to p is

$$\frac{d \ln[L(p)]}{dp} = y\left(\frac{1}{p}\right) + (n - y)\left(\frac{-1}{1 - p}\right).$$

For y = 1, 2, ..., n − 1, the value of p that maximizes (or minimizes) ln[L(p)] is the solution of the equation

$$\frac{y}{\hat{p}} - \frac{n - y}{1 - \hat{p}} = 0.$$

Solving, we obtain the estimate $\hat{p} = y/n$. You can easily verify that this solution occurs when ln[L(p)] [and hence L(p)] achieves a maximum.

Because L(p) is maximized at p = 0 when y = 0, at p = 1 when y = n, and at p = y/n when y = 1, 2, ..., n − 1, whatever the observed value of y, L(p) is maximized when p = y/n. The MLE, $\hat{p} = Y/n$, is the fraction of successes in the total number of trials n. Hence, the MLE of p is actually the intuitive estimator for p that we used throughout Chapter 8.
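The calculus result $\hat{p} = y/n$ can be cross-checked numerically by evaluating the log-likelihood on a grid of p values. The sketch below is only a sanity check for an interior case (y between 1 and n − 1). The values y = 7 and n = 20, the grid resolution, and the function names are illustrative assumptions.

```python
import math

def log_likelihood(p, y, n):
    """ln L(p) = y ln p + (n - y) ln(1 - p), valid for 0 < p < 1."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def grid_argmax(y, n, steps=1000):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]   # excludes the endpoints 0 and 1
    return max(grid, key=lambda p: log_likelihood(p, y, n))

y, n = 7, 20          # hypothetical data: 7 successes in 20 trials
p_grid = grid_argmax(y, n)
p_closed = y / n      # closed-form MLE from the derivation
```

A grid search is crude compared with the closed form, but it makes the shape of the problem visible: the log-likelihood rises up to y/n and falls afterward, so the stationary point is indeed a maximum.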

EXAMPLE 9.15

Let Y1 , Y2 , . . . , Yn be a random sample from a normal distribution with mean µ and variance σ 2 . Find the MLEs of µ and σ 2 .

Solution

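Because $Y_1, Y_2, \ldots, Y_n$ are continuous random variables, $L(\mu, \sigma^2)$ is the joint density of the sample. Thus, $L(\mu, \sigma^2) = f(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2)$. In this case,

$$L(\mu, \sigma^2) = f(y_1, y_2, \ldots, y_n \mid \mu, \sigma^2) = f(y_1 \mid \mu, \sigma^2) \times f(y_2 \mid \mu, \sigma^2) \times \cdots \times f(y_n \mid \mu, \sigma^2)$$

$$= \left\{\frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{-(y_1 - \mu)^2}{2\sigma^2}\right]\right\} \times \cdots \times \left\{\frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{-(y_n - \mu)^2}{2\sigma^2}\right]\right\}$$

$$= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[\frac{-1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right].$$

[Recall that exp(w) is just another way of writing $e^w$.] Further,

$$\ln\left[L(\mu, \sigma^2)\right] = -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2.$$

The MLEs of µ and $\sigma^2$ are the values that make $\ln[L(\mu, \sigma^2)]$ a maximum. Taking derivatives with respect to µ and $\sigma^2$, we obtain

$$\frac{\partial\{\ln[L(\mu, \sigma^2)]\}}{\partial\mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)$$

and

$$\frac{\partial\{\ln[L(\mu, \sigma^2)]\}}{\partial\sigma^2} = -\left(\frac{n}{2}\right)\left(\frac{1}{\sigma^2}\right) + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2.$$

Setting these derivatives equal to zero and solving simultaneously, we obtain from the first equation

$$\frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i - \hat{\mu}) = 0, \quad \text{or} \quad \sum_{i=1}^{n} y_i - n\hat{\mu} = 0, \quad \text{and} \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}.$$

Substituting $\bar{y}$ for $\hat{\mu}$ in the second equation and solving for $\hat{\sigma}^2$, we have

$$-\frac{n}{\hat{\sigma}^2} + \frac{1}{\hat{\sigma}^4} \sum_{i=1}^{n} (y_i - \bar{y})^2 = 0, \quad \text{or} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Thus, $\bar{Y}$ and $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ are the MLEs of µ and $\sigma^2$, respectively. Notice that $\bar{Y}$ is unbiased for µ. Although $\hat{\sigma}^2$ is not unbiased for $\sigma^2$, it can easily be adjusted to the unbiased estimator $S^2$ (see Example 8.1).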
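The two normal MLEs, and the adjustment from $\hat{\sigma}^2$ to $S^2$, are short to compute. A minimal sketch, assuming the sample is a list of numbers; the function names and the data values are hypothetical. The adjustment uses the standard relation $S^2 = \frac{n}{n-1}\hat{\sigma}^2$, which follows from $S^2$ dividing the same sum of squares by n − 1.

```python
def normal_mle(ys):
    """Return (mu_hat, sigma2_hat), the MLEs of the normal mean and variance."""
    n = len(ys)
    mu_hat = sum(ys) / n
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n   # divides by n, not n - 1
    return mu_hat, sigma2_hat

def unbiased_variance(ys):
    """Return S^2, obtained by rescaling the MLE of the variance by n / (n - 1)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    _, sigma2_hat = normal_mle(ys)
    return n * sigma2_hat / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical observations
mu_hat, sigma2_hat = normal_mle(data)
s2 = unbiased_variance(data)
```

For this data set the mean is 5 and the sum of squared deviations is 32, so $\hat{\sigma}^2 = 32/8 = 4$ while $S^2 = 32/7$. The MLE is always the smaller of the two.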

EXAMPLE 9.16

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of observations from a uniform distribution with probability density function $f(y_i \mid \theta) = 1/\theta$, for $0 \le y_i \le \theta$ and i = 1, 2, ..., n. Find the MLE of θ.

Solution

In this case, the likelihood is given by

$$L(\theta) = f(y_1, y_2, \ldots, y_n \mid \theta) = f(y_1 \mid \theta) \times f(y_2 \mid \theta) \times \cdots \times f(y_n \mid \theta)$$

$$= \begin{cases} \dfrac{1}{\theta} \times \dfrac{1}{\theta} \times \cdots \times \dfrac{1}{\theta} = \dfrac{1}{\theta^n}, & \text{if } 0 \le y_i \le \theta,\ i = 1, 2, \ldots, n, \\ 0, & \text{otherwise.} \end{cases}$$

Obviously, L(θ) is not maximized when L(θ) = 0. You will notice that $1/\theta^n$ is a monotonically decreasing function of θ. Hence, nowhere in the interval 0 < θ < ∞ is $d[1/\theta^n]/d\theta$ equal to zero. However, $1/\theta^n$ increases as θ decreases, and $1/\theta^n$ is maximized by selecting θ to be as small as possible, subject to the constraint that all of the $y_i$ values are between zero and θ. The smallest value of θ that satisfies this constraint is the maximum observation in the set $y_1, y_2, \ldots, y_n$. That is, $\hat{\theta} = Y_{(n)} = \max(Y_1, Y_2, \ldots, Y_n)$ is the MLE for θ. This MLE for θ is not an unbiased estimator of θ, but it can be adjusted to be unbiased, as shown in Example 9.1.
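Example 9.16 is a case where the maximum sits on the boundary of the allowed region rather than at a zero of the derivative, which a direct evaluation of the likelihood makes concrete. In this sketch the data values and function names are hypothetical. The bias adjustment shown, multiplying by (n + 1)/n, rests on the standard fact that $E[Y_{(n)}] = n\theta/(n + 1)$ for a uniform (0, θ) sample.

```python
def likelihood(theta, ys):
    """L(theta) = theta ** (-n) if every observation lies in [0, theta], else 0."""
    if theta <= 0 or any(y < 0 or y > theta for y in ys):
        return 0.0
    return theta ** (-len(ys))

def uniform_mle(ys):
    """MLE of theta for Uniform(0, theta): the largest observation."""
    return max(ys)

def adjusted_unbiased(ys):
    """Bias-adjusted estimator ((n + 1) / n) * max(ys)."""
    n = len(ys)
    return (n + 1) / n * max(ys)

ys = [0.8, 2.9, 1.7, 2.2]      # hypothetical observations
theta_hat = uniform_mle(ys)

# The likelihood is zero just below the sample maximum and decreasing above it.
below = likelihood(theta_hat - 0.1, ys)
at_max = likelihood(theta_hat, ys)
above = likelihood(theta_hat + 0.1, ys)
```

The three evaluations show the jump: any θ smaller than the largest observation is impossible, and any larger θ only lowers $1/\theta^n$.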

We have seen that sufﬁcient statistics that best summarize the data have desirable properties and often can be used to ﬁnd an MVUE for parameters of interest. If U is any sufﬁcient statistic for the estimation of a parameter θ, including the sufﬁcient statistic obtained from the optimal use of the factorization criterion, the MLE is always some function of U . That is, the MLE depends on the sample observations only through the value of a sufﬁcient statistic. To show this, we need only observe


that if U is a sufficient statistic for θ, the factorization criterion (Theorem 9.4) implies that the likelihood can be factored as

$$L(\theta) = L(y_1, y_2, \ldots, y_n \mid \theta) = g(u, \theta)\, h(y_1, y_2, \ldots, y_n),$$

where g(u, θ) is a function of only u and θ and $h(y_1, y_2, \ldots, y_n)$ does not depend on θ. Therefore, it follows that

$$\ln[L(\theta)] = \ln[g(u, \theta)] + \ln[h(y_1, y_2, \ldots, y_n)].$$

Notice that $\ln[h(y_1, y_2, \ldots, y_n)]$ does not depend on θ and therefore maximizing ln[L(θ)] relative to θ is equivalent to maximizing ln[g(u, θ)] relative to θ. Because ln[g(u, θ)] depends on the data only through the value of the sufficient statistic U, the MLE for θ is always some function of U. Consequently, if an MLE for a parameter can be found and then adjusted to be unbiased, the resulting estimator often is an MVUE of the parameter in question.

MLEs have some additional properties that make this method of estimation particularly attractive. In Example 9.9, we considered estimation of $\theta^2$, a function of the parameter θ. Functions of other parameters may also be of interest. For example, the variance of a binomial random variable is np(1 − p), a function of the parameter p. If Y has a Poisson distribution with mean λ, it follows that $P(Y = 0) = e^{-\lambda}$; we may wish to estimate this function of λ. Generally, if θ is the parameter associated with a distribution, we are sometimes interested in estimating some function of θ, say t(θ), rather than θ itself. In Exercise 9.9