PUT YOUR NAME ON EVERY PAGE OF YOUR ANSWERS

1. In a breeding locale, both Canada geese (the kind that populate the UW

campus) and white geese exist. The Canada geese constitute 60% of the

population; white geese constitute 40%. A survey of "next generation chick

broods" was conducted. The survey noted whether the chicks were a result

of two Canada geese mating, or two white geese mating, or a "mix" (one

parent is a Canada goose and one parent is a white goose). The

investigators are interested in whether the mating among members of the

two populations is essentially random, or whether some other phenomenon is

operating.

The actual data are as follows:

CANADA-CANADA BROODS 641

WHITE-WHITE BROODS 130

CANADA-WHITE BROODS 29

a. [09 pts.] Out of a total of 800 broods, write down the expected

number of broods that are:

CANADA-CANADA

WHITE-WHITE

CANADA-WHITE

P(Can-Can) = 0.6*0.6 = 0.36 0.36*800 = 288 Expected Can-Can

P(White-White) = 0.4*0.4 = 0.16 0.16*800 = 128 Expected White-White

P(Can-White, either order) = P(Can,White)+P(White,Can) = 2*0.4*0.6 = 0.48

OR P(Can-White) = 1-(0.36+0.16) = 0.48

0.48*800 = 384 Expected Can-White

Check: 288+128+384 = 800
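
The arithmetic in part a can be reproduced with a short Python sketch. It is not part of the original answer key, and the variable names (p_canada, p_white, n_broods) are illustrative assumptions.

```python
# Expected brood counts if mating is random with respect to goose type.
p_canada, p_white = 0.6, 0.4   # population proportions given in the problem
n_broods = 800

probs = {
    "CC": p_canada * p_canada,
    "WW": p_white * p_white,
    "CW": 2 * p_canada * p_white,   # a mixed pair can occur in either order
}
expected = {pair: p * n_broods for pair, p in probs.items()}

for pair in probs:
    print(f"{pair}: p = {probs[pair]:.2f}, expected = {expected[pair]:.0f}")
print("total expected:", round(sum(expected.values())))
```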

b. [15 pts.] Doing a test at the .05 level of significance, calculate the

test statistic and come to the appropriate conclusion regarding whether

the hypothesis of random mating holds for these data. What do you

conclude?

Ho: The geese mate randomly. Ha: The geese do not mate randomly

Ho: P(CC)=0.36;P(WW)=0.16;P(CW)=0.48

Ha: Mating follows some other distribution.

Chisq_0.05,2 = 5.991, Reject if Chisq_obs > 5.991

Pair   Obs   Exp   (O-E)²/E
CC     641   288   432.67
WW     130   128     0.03125
CW      29   384   328.19

Chisq₂ = 432.67+0.031+328.19 = 760.89>5.991, p<<0.001, Reject H₀

The geese do not mate randomly; they appear to show a mating preference.
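
The goodness-of-fit statistic from part b can be checked with the sketch below. It is an illustration rather than the grader's method: the dictionary names are assumptions, the critical value 5.991 is taken from the answer above, and the p-value uses the fact that a chi-square variable with exactly 2 degrees of freedom has upper-tail area exp(-x/2).

```python
import math

observed = {"CC": 641, "WW": 130, "CW": 29}
expected = {"CC": 288.0, "WW": 128.0, "CW": 384.0}   # from part a

# Per-category contribution (O - E)^2 / E, then the total statistic.
contrib = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chisq = sum(contrib.values())

critical = 5.991                  # tabled value, alpha = 0.05, 2 df
p_value = math.exp(-chisq / 2)    # closed form valid only for 2 df
reject = chisq > critical

for k, v in contrib.items():
    print(f"{k}: {v:.3f}")
print(f"chisq = {chisq:.2f}, reject H0: {reject}")
```

Printing the contributions separately also shows directly which category dominates the statistic, which is what part c asks about.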

c. [02 pts.] Which category is responsible for the largest contribution to

the value of your test statistic?

The CC category gave the largest contribution to the test statistic (432.67 of the total 760.89).

d. [04 pts.] From data inspection (NO further testing here!), what seems

to be going on with these data?

From the data we see many more CC pairings than expected and far fewer CW pairings than expected, while the WW pairings are very close to the expected value. It would seem that Canada geese in particular prefer to mate with other Canada geese ("like mates with like").

2. Scientists from the University of Washington's Department of Urban

Planning and Design studied the relationship between counts of flash

flooding during certain types of storm events, and intensity of

urbanization. Four categories of intensity of urbanization were noted,

with counts of flash flood events recorded.

Intensity of Urbanization: A B C D

No. of flash flood events 20 18 33 19

a. [24 pts] The categories pertain to intensity of urbanization; they are

roughly equally spaced, and they are as follows: A is "No Urbanization",

B is "Low Amount of Urbanization", C is "Moderate Amount of Urbanization",

D is "Intense Urbanization". Using the most powerful available test, test

the null hypothesis of uniformity with respect to the distribution of

flash flood events over the 4 categories of urbanization using the .05

level of significance. What do you conclude?

Since the categories are ordered, the most powerful test would be the K-S test for ordinal data. However, n = 90 is not a multiple of k = 4, so I cannot perform that test. I will therefore use a Chisq goodness-of-fit test.

Ho: The number of flash flood events is uniformly distributed over the intensity of urbanization.

Ha: The number of flash flood events follows some other distribution

Ho: P(A)=P(B)=P(C)=P(D) = 1/4

Ha: at least one inequality exists

Chisq_0.05,3 = 7.815 Reject Ho if Chisq_obs > 7.815

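Intensity:        A       B       C       D
f                 20      18      33      19
fhat              22.5    22.5    22.5    22.5
(f-fhat)²/fhat    0.278   0.900   4.900   0.544

Chisq₃ = (6.25+20.25+110.25+12.25)/22.5 = 149/22.5 = 6.62 < 7.815, 0.05 < p < 0.10, Fail to reject Ho.

The data are consistent with flash flood events being uniformly distributed over the four categories of urbanization.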
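
The uniformity test can be reproduced as follows. This is a sketch under stated assumptions: the variable names are illustrative, the critical value 7.815 comes from the answer above, and the p-value relies on the closed-form upper-tail area for exactly 3 degrees of freedom, erfc(sqrt(x/2)) + sqrt(2x/pi)*exp(-x/2), rather than on a table lookup.

```python
import math

counts = {"A": 20, "B": 18, "C": 33, "D": 19}   # flash flood events per category
n = sum(counts.values())
fhat = n / len(counts)                           # equal expected count under H0

contrib = {k: (f - fhat) ** 2 / fhat for k, f in counts.items()}
chisq = sum(contrib.values())

critical = 7.815   # tabled value, alpha = 0.05, 3 df
# Upper-tail area of a chi-square variable with 3 df (closed form).
p_value = math.erfc(math.sqrt(chisq / 2)) + math.sqrt(2 * chisq / math.pi) * math.exp(-chisq / 2)
reject = chisq > critical

print(f"n = {n}, fhat = {fhat}")
print(f"chisq = {chisq:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```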

b. [04 pts] By data inspection (*no* further testing here!), what seems

to be the relationship between intensity of urbanization and the presence

of flash flood events?

There is no apparent relationship between intensity of urbanization and the presence of flash flood events.

3. Scientists assigned a group of homogeneous volunteers [i.e., all the same

sex, bodymass, and general fitness] randomly to each of four treatment

groups (A, B, C, D).

Each individual was exposed to a virus well-known for causing common warts, and after a specified period of time it was noted whether each individual had

experienced common warts, or not. Results are as follows:

TREATMENT   A    B    C    D
WARTS       21   08   02   10
NO WARTS    05   17   24   17
TOTAL       26   25   26   27

a. [08 pts.] Assuming that the null hypothesis of "homogeneity among the treatments" is TRUE, compute the best estimate of probability of experiencing warts for the Treatment C group.

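R₁ = 21+8+2+10 = 41; N = 26+25+26+27 = 104; R₁/N = 41/104 = 0.394

Under Ho the probability of warts is the same for every treatment, so this pooled estimate applies to Treatment C.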

b. [05 pts.] Assuming that the null hypothesis of "homogeneity among the treatments" is true, compute the EXPECTED COUNT for the "No Warts, Treatment B" group.

R₂ = 5+17+24+17 = 63; expected count = 25*63/104 = 15.14

c. [05 pts.] The computed test statistic to test the null hypothesis

[stated above] comes out to be 30.21. At the .05 level of significance,

what is the appropriate conclusion? Include the critical tabled value to

do the test.

Ho: p₁₁ = p₁₂ = p₁₃ = p₁₄; Ha: at least one inequality exists

Reject Ho if Chisq_obs > Chisq_0.05,3 = 7.815

30.21 > 7.815, Reject Ho, p < 0.001

It seems that the probability of getting warts is not the same among the four treatments.
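
Parts a through c can be checked together with the sketch below, which computes the pooled proportion, the expected counts under homogeneity, and the chi-square statistic for the 2 x 4 table. The list and variable names are illustrative assumptions, not part of the original answers.

```python
warts = [21, 8, 2, 10]       # treatments A, B, C, D
no_warts = [5, 17, 24, 17]

col_totals = [w + nw for w, nw in zip(warts, no_warts)]
row_totals = [sum(warts), sum(no_warts)]      # R1, R2
n = sum(row_totals)

pooled_p = row_totals[0] / n                  # part a: 41/104

# Expected count = (row total) * (column total) / N for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chisq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip((warts, no_warts), expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"pooled P(warts) = {pooled_p:.3f}")
print(f"expected No Warts, B = {expected[1][1]:.2f}")
print(f"chisq = {chisq:.2f} on {df} df")
```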

d. [02 pts.] Find the two treatment groups that look *most like

each other* [this would be "Step 1 in a subdivision process".]

I can calculate column proportions to compare the groups.

Group   A            B           C           D
Wart    21/26=0.81   8/25=0.32   2/26=0.08   10/27=0.37

Treatments B and D have the most similar proportions, hence they look most like each other.

e. [10 pts.] Do the subtest for *only* the two treatment groups from part

[d].

Ho: p₁₂ = p₁₄ (B and D have the same probability of warts); Ha: Not equal; Reject Ho if Chisq_obs > Chisq_0.05,1 = 3.841

Observed Values

            B    D    Total
WARTS       8    10   18
NO WARTS    17   17   34
Total       25   27   52

Short-cut calculation:

(8*17-17*10)²*52/(25*27*18*34) = 0.145 < 3.841, fail to reject Ho. Treatments B and D do not differ significantly in the probability of warts.

Expected values

            B       D       Total
WARTS       8.65    9.35    18
NO WARTS    16.35   17.65   34
Total       25      27      52
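
As a check on the short-cut calculation, the sketch below computes the 2 x 2 statistic both ways: with the short-cut formula and with the general sum of (O-E)²/E over the expected values. The cell names a, b, c, d are illustrative assumptions; the two results should agree up to rounding.

```python
# Observed 2 x 2 table for treatments B and D.
a, b = 8, 10      # warts:    B, D
c, d = 17, 17     # no warts: B, D
n = a + b + c + d

# Short-cut formula: (ad - bc)^2 * N / (product of the four marginal totals).
shortcut = (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))

# General formula using expected counts = row total * column total / N.
observed = [[a, b], [c, d]]
row_totals = [a + b, c + d]
col_totals = [a + c, b + d]
general = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

print(f"short-cut = {shortcut:.3f}, general = {general:.3f}")
print("reject H0:", shortcut > 3.841)   # tabled value, alpha = 0.05, 1 df
```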

4. A statewide Washington Standards for Learning (WASL) test is designed to have a normal distribution of resulting scores, with a population mean score of mu=250 and a standard deviation of sigma=50.

a. [05 pts] For an individual drawn at random from the population, what is

the probability of an individual's score being greater than 205?

P(X>205) = P(Z>(205-250)/50) = P(Z>-0.9) = 1-P(Z>0.9) = 1-0.1841 = 0.8159

b. [06 pts] What two values capture the middle 54% of the population?

1-.54 = 0.46 0.46/2 = 0.23 (area in each tail)

z = +/- 0.74

x = 0.74*50+250 = 287

x = -0.74*50+250 = 213

P(213<X<287) = 0.54

c. [05 pts] 91% of the population will have a score greater than what value?

1-0.91 = 0.09 P(z>1.34) = 0.09, P(z>-1.34) = 0.91

-1.34*50+250 = 183
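
Parts a through c can be checked without a Z table using Python's standard-library statistics.NormalDist. This is a sketch with illustrative variable names; because it does not round z to two decimals, the results differ slightly from the table-based answers above.

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)   # individual scores

# Part a: P(X > 205)
p_above_205 = 1 - scores.cdf(205)

# Part b: middle 54% leaves 0.23 in each tail
lo54 = scores.inv_cdf(0.23)
hi54 = scores.inv_cdf(0.77)

# Part c: 91% of scores lie above the 9th percentile
cut91 = scores.inv_cdf(0.09)

print(f"P(X > 205) = {p_above_205:.4f}")
print(f"middle 54%: {lo54:.1f} to {hi54:.1f}")
print(f"91% score above {cut91:.1f}")
```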

d. [07 pts] If 25 individuals are drawn at random from this population,

what is the probability that their average score will be greater than 245?

Xbar~N(250, 50/sqrt(25)) = N(250, 10), where 10 is the standard deviation of the sample mean

P(Xbar>245) = P(Z>(245-250)/10) = P(Z>-0.5) = 1-P(Z>0.5) = 1-0.3085=0.6915

e. [07 pts] If 25 individuals are drawn at random from this population,

65% of the time their average score will be between what two values?

1-0.65 = 0.35; 0.35/2 = 0.175 (area in each tail)

z = +/- 0.935

0.935*10+250 = 259.35

-0.935*10+250 = 240.65

P(240.65<Xbar<259.35) = 0.65
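
The sample-mean calculations in parts d and e follow the same pattern once the standard deviation is replaced by sigma/sqrt(n). The sketch below uses statistics.NormalDist with illustrative variable names; exact quantiles replace the interpolated table value z = 0.935, so the endpoints may differ in the second decimal.

```python
import math
from statistics import NormalDist

mu, sigma, n = 250, 50, 25
se = sigma / math.sqrt(n)          # standard deviation of the sample mean
xbar = NormalDist(mu=mu, sigma=se)

# Part d: P(Xbar > 245)
p_above_245 = 1 - xbar.cdf(245)

# Part e: middle 65% leaves 0.175 in each tail
lo65 = xbar.inv_cdf(0.175)
hi65 = xbar.inv_cdf(0.825)

print(f"se = {se}")
print(f"P(Xbar > 245) = {p_above_245:.4f}")
print(f"middle 65%: {lo65:.2f} to {hi65:.2f}")
```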