1.  In a breeding locale, both Canada geese (the kind that populate the UW

campus) and white geese exist. The Canada geese constitute 60% of the

population; white geese constitute 40%. A survey of "next generation chick

broods" was conducted. The survey noted whether the chicks were a result

of two Canada geese mating, or two white geese mating, or a "mix" (one

parent is a Canada goose and one parent is a white goose). The

investigators are interested in whether the mating among members of the

two populations is essentially random, or whether some other phenomenon is

operating.

 

The actual data are as follows:

 

CANADA-CANADA BROODS      641

WHITE-WHITE BROODS      130

CANADA-WHITE BROODS      29

 

a. [09 pts.] Out of a total of 800 broods, write down the expected

number of broods that are:

 

CANADA-CANADA

WHITE-WHITE

CANADA-WHITE

P(Can-Can) = 0.6*0.6 = 0.36        0.36*800 = 288 Expected Can-Can

P(White-White) = 0.4*0.4 = 0.16      0.16*800 = 128 Expected White-White

P(Can-White) = P(Can,White)+P(White,Can) = 2*0.6*0.4 = 0.48 (either parent may be the Canada goose)

OR P(Can-White) = 1-(0.36+0.16) = 0.48

0.48*800 = 384 Expected Can-White

Check: 288+128+384 = 800
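The arithmetic above can be checked with a short script. This is only a sketch: the variable names are illustrative, and it assumes random mating means the two parents are drawn independently from the 60%/40% population.

```python
# Expected brood counts under random mating (independent parents).
p_canada, p_white = 0.6, 0.4
n_broods = 800

probs = {
    "CC": p_canada * p_canada,      # both parents Canada
    "WW": p_white * p_white,        # both parents white
    "CW": 2 * p_canada * p_white,   # mixed pair, either order
}
expected = {pair: p * n_broods for pair, p in probs.items()}

for pair, count in expected.items():
    print(pair, round(count, 1))
```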

 

b. [15 pts.] Doing a test at the .05 level of significance, calculate the

test statistic and come to the appropriate conclusion regarding whether

the hypothesis of random mating holds for these data. What do you

conclude?

Ho: The geese mate randomly.         Ha: The geese do not mate randomly

Ho: P(CC)=0.36;P(WW)=0.16;P(CW)=0.48

Ha: Mating follows some other distribution.

Critical value: Chisq(0.05, df=2) = 5.991. Reject Ho if Chisq_obs > 5.991.

 

      Obs    Exp    (O-E)^2/E
CC    641    288    432.67
WW    130    128      0.031
CW     29    384    328.19

Chisq (df=2) = 432.67+0.031+328.19 = 760.89 > 5.991, p << 0.001. Reject Ho.

The geese do not mate randomly; they seem to show some preference.
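As a cross-check, the test statistic and the per-category contributions can be recomputed as below. This is a sketch only; the critical value is copied from the table value quoted above rather than computed.

```python
# Chi-square goodness-of-fit statistic for the brood counts.
observed = {"CC": 641, "WW": 130, "CW": 29}
expected = {"CC": 288, "WW": 128, "CW": 384}

# Per-category contribution (O - E)^2 / E.
contributions = {
    pair: (observed[pair] - expected[pair]) ** 2 / expected[pair]
    for pair in observed
}
chi_sq = sum(contributions.values())

critical_value = 5.991  # tabled value quoted in the text (alpha = 0.05, df = 2)
reject_h0 = chi_sq > critical_value
largest = max(contributions, key=contributions.get)

print(round(chi_sq, 2), reject_h0, largest)
```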

 

c. [02 pts.] Which category is responsible for the largest contribution to

the value of your test statistic?

The CC category gave the largest contribution to the test statistic.

 

d. [04 pts.] From data inspection (NO further testing here!), what seems

to be going on with these data?

 

From the data we see many more CC pairings than expected and far fewer CW pairings than expected, while the WW pairings are very close to the expected value.  It would seem that Canada geese especially prefer to mate with other Canada geese (“like mates with like”).


2. Scientists from the University of Washington's Department of Urban

Planning and Design studied the relationship between counts of flash

flooding during certain types of storm events, and intensity of

urbanization.  Four categories of intensity of urbanization were noted,

with counts of flash flood events recorded.

 

Intensity of Urbanization:     A      B      C      D    

 

No. of flash flood events       20      18      33      19

 

a. [24 pts] The categories pertain to intensity of urbanization; they are

roughly equally spaced, and they are as follows:  A is "No Urbanization",

B is "Low Amount of Urbanization", C is "Moderate Amount of Urbanization",

D is "Intense Urbanization".  Using the most powerful available test, test

the null hypothesis of uniformity with respect to the distribution of

flash flood events over the 4 categories of urbanization using the .05

level of significance.  What do you conclude?

 

Since the categories are ordered, the most powerful test would be the K-S test for ordinal data.  However, n = 90 is not a multiple of k = 4, so I cannot perform that test.  I will therefore use a Chisq goodness of fit test.

 

Ho: The number of flash flood events is uniformly distributed over the intensity of urbanization.

Ha: The number of flash flood events follows some other distribution

 

Ho: P(A)=P(B)=P(C)=P(D) = 1/4

Ha: at least one inequality exists

Critical value: Chisq(0.05, df=3) = 7.815. Reject Ho if Chisq_obs > 7.815.

 

 

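Intensity:      A       B       C       D
f               20      18      33      19
fhat            22.5    22.5    22.5    22.5

Chisq (df=3) = 6.62 < 7.815. Fail to reject Ho. There is no significant evidence that flash flood events are distributed non-uniformly over the four categories.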
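The statistic can be reproduced with a few lines. A minimal sketch, assuming the uniform null hypothesis (each category expects n/4 events) and using the tabled critical value quoted above:

```python
# Chi-square goodness-of-fit test against a uniform distribution.
observed = [20, 18, 33, 19]           # categories A, B, C, D
n = sum(observed)                     # total number of flash flood events
expected = [n / len(observed)] * len(observed)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 7.815  # tabled value quoted in the text (alpha = 0.05, df = 3)
reject_h0 = chi_sq > critical_value

print(n, expected[0], round(chi_sq, 2), reject_h0)
```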

 

 

b. [04 pts] By data inspection (*no* further testing here!), what seems

to be the relationship between intensity of urbanization and the presence

of flash flood events?

 

There is no apparent relationship between intensity of urbanization and the presence of flash flood events.


3. Scientists assigned a group of homogeneous volunteers [i.e., all the same

sex, body mass, and general fitness] randomly to each of four treatment

groups (A, B, C, D).

 

Each individual was exposed to a virus well-known for causing common warts, and after a specified period of time it was noted whether each individual had

experienced common warts, or not. Results are as follows:

             A      B      C      D
WARTS       21      8      2     10
NO WARTS     5     17     24     17
TOTAL       26     25     26     27

 

a. [08 pts.] Assuming that the null hypothesis of "homogeneity among the treatments" is TRUE, compute the best estimate of probability of experiencing warts for the Treatment C group.

R1 = 21+8+2+10 = 41 (total with warts);  N = 26+25+26+27 = 104;  41/104 = 0.394. Under Ho this pooled estimate applies to every treatment group, including C.

 

b. [05 pts.] Assuming that the null hypothesis of "homogeneity among the treatments" is true, compute the EXPECTED COUNT for the "No Warts, Treatment B" group.

R2 = 5+17+24+17 = 63 (total without warts);  expected count = 25*63/104 = 15.14
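Parts (a) and (b) can be verified as follows. This is a sketch with illustrative variable names; it assumes the usual homogeneity estimate, pooling all four treatment groups.

```python
# Pooled estimate and one expected count under homogeneity.
warts = [21, 8, 2, 10]        # treatments A, B, C, D
no_warts = [5, 17, 24, 17]

col_totals = [w + nw for w, nw in zip(warts, no_warts)]
n_total = sum(col_totals)

# Under Ho every group shares the same pooled probability of warts.
p_warts = sum(warts) / n_total

# Expected count = (column total * row total) / grand total.
expected_no_warts_b = col_totals[1] * sum(no_warts) / n_total

print(round(p_warts, 3), round(expected_no_warts_b, 2))
```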

 

c. [05 pts.] The computed test statistic to test the null hypothesis

[stated above] comes out to be 30.21. At the .05 level of significance,

what is the appropriate conclusion? Include the critical tabled value to

do the test.

Ho: p11 = p12 = p13 = p14;        Ha: at least one inequality exists

Reject Ho if Chisqobs > Chisq0.05,3 = 7.815

30.21>7.815, Reject Ho         p<0.001

It seems that the probability of getting a wart is not the same among the four treatments.
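The quoted statistic of 30.21 can be reproduced from the table. A minimal sketch, assuming the standard Pearson chi-square for a 2 x 4 contingency table with no continuity correction:

```python
# Pearson chi-square statistic for the full 2 x 4 table.
table = [
    [21, 8, 2, 10],    # warts: A, B, C, D
    [5, 17, 24, 17],   # no warts
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n_total
        chi_sq += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)
critical_value = 7.815  # tabled value quoted in the text (alpha = 0.05, df = 3)

print(round(chi_sq, 2), df, chi_sq > critical_value)
```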

 

d. [02 pts.] Find the two treatment groups that look *most like

each other* [this would be "Step 1 in a subdivision process".]

 

I can calculate column proportions to compare the groups.

          A             B            C            D
Wart      21/26=0.81    8/25=0.32    2/26=0.08    10/27=0.37

Treatments B and D have the most similar proportions, hence they look most like each other.
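The comparison can also be done mechanically. In this sketch, "most alike" is taken to mean the smallest absolute difference in wart proportions; that criterion is an assumption made for illustration, matching the inspection above.

```python
from itertools import combinations

warts = {"A": 21, "B": 8, "C": 2, "D": 10}
totals = {"A": 26, "B": 25, "C": 26, "D": 27}

# Proportion with warts in each treatment group.
prop = {g: warts[g] / totals[g] for g in warts}

# Pair of groups whose proportions differ the least.
closest = min(
    combinations(prop, 2),
    key=lambda pair: abs(prop[pair[0]] - prop[pair[1]]),
)

print({g: round(p, 2) for g, p in prop.items()}, closest)
```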

 

e. [10 pts.] Do the subtest for *only* the two treatment groups from part

[d].

Ho: P(warts | B) = P(warts | D)       Ha: Not equal      Reject Ho if Chisq_obs > Chisq(0.05, df=1) = 3.841

Observed Values

             B      D      Total
WARTS        8      10     18
NO WARTS     17     17     34
Total        25     27     52

Short-cut calculation:

      (8*17-17*10)^2*52/(25*27*18*34) = 0.146 < 3.841, fail to reject Ho

Expected values

             B        D        Total
WARTS        8.65     9.35     18
NO WARTS     16.35    17.65    34
Total        25       27       52
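Both routes to the 2 x 2 statistic can be checked against each other. A sketch, assuming no continuity correction (as in the short-cut calculation above); the names a, b, c, d for the four cells are illustrative.

```python
# 2 x 2 subtable for treatments B and D.
a, b = 8, 10     # warts:    B, D
c, d = 17, 17    # no warts: B, D
n = a + b + c + d
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d

# Short-cut formula: n (ad - bc)^2 / (product of the four margins).
chi_sq_short = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Long form: sum of (O - E)^2 / E over the four cells.
observed = [[a, b], [c, d]]
expected = [
    [row1 * col1 / n, row1 * col2 / n],
    [row2 * col1 / n, row2 * col2 / n],
]
chi_sq_long = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

critical_value = 3.841  # tabled value quoted in the text (alpha = 0.05, df = 1)
print(round(chi_sq_short, 4), chi_sq_short > critical_value)
```

The two forms agree, so the expected-value table serves as a check on the short-cut result.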


4.  A statewide Washington Standards for Learning (WASL) test is designed to have a normal distribution of resulting scores, with a population mean score of mu=250 and a standard deviation of sigma=50.

 

 

a. [05 pts] For an individual drawn at random from the population, what is

the probability of an individual's score being greater than 205?

P(X>205) = P(Z>(205-250)/50) = P(Z>-0.9) = 1-P(Z>0.9) = 1-0.1841 = 0.8159
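The table lookup can be confirmed with Python's standard library. A sketch using statistics.NormalDist in place of the printed Z table:

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)

# P(X > 205) is the complement of the cumulative probability at 205.
p_above_205 = 1 - scores.cdf(205)

print(round(p_above_205, 4))
```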

 

b. [06 pts] What two values capture the middle 54% of the population?

1-.54 = 0.46      0.46/2 = 0.23 (area in each tail)

z = +/- 0.74

x = 0.74*50+250 = 287

x = -0.74*50+250 = 213

P(213<X<287) = 0.54
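The two cut-offs can be checked with the inverse normal CDF. A sketch; the exact quantiles differ slightly from the table-based answer because the table rounds z to 0.74.

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)

tail = (1 - 0.54) / 2               # area in each tail
lower = scores.inv_cdf(tail)        # 23rd percentile
upper = scores.inv_cdf(1 - tail)    # 77th percentile

print(round(lower), round(upper))
```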

 

c. [05 pts] 91% of the population will have a score greater than what value?

1-0.91 = 0.09      P(z>1.34) = 0.09, P(z>-1.34) = 0.91

x = -1.34*50+250 = 183
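The same value follows from the 9th percentile of the score distribution. A sketch, again using statistics.NormalDist rather than the printed table:

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)

# 91% of scores lie above the 9th percentile.
cutoff = scores.inv_cdf(1 - 0.91)

print(round(cutoff))
```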

 

 

d. [07 pts] If 25 individuals are drawn at random from this population,

what is the probability that their average score will be greater than 245?

 

Xbar ~ N(250, 50/sqrt(25)) = N(250, 10), where 10 is the standard deviation of the sample mean

P(Xbar>245) = P(Z>(245-250)/10) = P(Z>-0.5) = 1-P(Z>0.5) = 1-0.3085=0.6915 
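For the sample mean, the only change is the smaller standard deviation. A sketch, assuming independent draws so that the standard error is sigma/sqrt(n):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 250, 50, 25
standard_error = sigma / sqrt(n)    # standard deviation of the sample mean

sample_mean = NormalDist(mu=mu, sigma=standard_error)
p_mean_above_245 = 1 - sample_mean.cdf(245)

print(standard_error, round(p_mean_above_245, 4))
```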

 

e. [07 pts] If 25 individuals are drawn at random from this population,

65% of the time their average score will be between what two values?

 

1-0.65 = 0.35; 0.35/2 = 0.175 (area in each tail)

z = +/- 0.935

0.935*10+250 = 259.35

-0.935*10+250 = 240.65

P(240.65<Xbar<259.35) = 0.65
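The interval can be checked the same way. A sketch, assuming the middle 65% is taken symmetrically about the mean; the exact quantile is about z = 0.9346, close to the interpolated table value of 0.935.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = NormalDist(mu=250, sigma=50 / sqrt(25))

tail = (1 - 0.65) / 2                    # area in each tail
lower = sample_mean.inv_cdf(tail)
upper = sample_mean.inv_cdf(1 - tail)

print(round(lower, 2), round(upper, 2))
```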