1. In a breeding locale, both Canada geese
(the kind that populate the UW
campus) and white geese exist. The
Canada geese constitute 60% of the
population; white geese constitute
40%. A survey of "next generation chick
broods" was conducted. The
survey noted whether the chicks were a result
of two Canada geese mating, or
two white geese mating, or a "mix" (one
parent is a Canada goose and one
parent is a white goose). The
investigators are interested in whether
the mating among members of the
two populations is essentially
random, or whether some other phenomenon is
operating.
The actual data are as
follows:
CANADA-CANADA BROODS 641
WHITE-WHITE BROODS 130
CANADA-WHITE BROODS 29
a. [09 pts.]
Out of a total of 800
broods, write down the expected
number of broods that are:
CANADA-CANADA
WHITE-WHITE
CANADA-WHITE
P(Can-Can) = 0.6*0.6 = 0.36;  0.36*800 = 288 expected Can-Can
P(White-White) = 0.4*0.4 = 0.16;  0.16*800 = 128 expected White-White
P(Can-White) = P(Canada x White) + P(White x Canada) = 2*0.6*0.4 = 0.48
OR
P(Can-White) = 1 - (0.36 + 0.16) = 0.48
0.48*800 = 384 expected Can-White
Check: 288 + 128 + 384 = 800
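The arithmetic in part (a) can be cross-checked with a short Python sketch. It assumes, as the solution does, that the two parents of a brood are drawn independently in proportion to the 60/40 population shares; the helper name expected_broods is hypothetical, introduced only for illustration.

```python
# Expected brood counts under random mating (hypothetical helper, for checking).
# Assumption: each brood's two parents are drawn independently from the population.
P_CANADA = 0.6
P_WHITE = 0.4
N_BROODS = 800


def expected_broods(p_c: float, p_w: float, n: int) -> dict[str, float]:
    """Return expected counts of CC, WW and mixed broods out of n."""
    probs = {
        "CANADA-CANADA": p_c * p_c,
        "WHITE-WHITE": p_w * p_w,
        # A mixed brood can arise in two orders (C x W or W x C), hence the 2.
        "CANADA-WHITE": 2 * p_c * p_w,
    }
    return {category: p * n for category, p in probs.items()}


expected = expected_broods(P_CANADA, P_WHITE, N_BROODS)
for category, count in expected.items():
    print(f"{category}: {count:.0f}")
```

The three probabilities sum to 1, so the expected counts automatically sum to 800, which is the same check done by hand above.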
b. [15 pts.]
Doing a test at the .05 level of significance, calculate the
test statistic and come to the
appropriate conclusion regarding whether
the hypothesis of random mating
holds for these data. What do you
conclude?
Ho: The geese mate randomly.
Ha: The geese do not mate randomly.
Ho: P(CC) = 0.36; P(WW) = 0.16; P(CW) = 0.48
Ha: Mating follows some other distribution.
df = k - 1 = 3 - 1 = 2;  Chisq(0.05, 2 df) = 5.991;  reject Ho if Chisq(obs) > 5.991
       Obs    Exp    (O-E)^2/E
CC     641    288    432.67
WW     130    128      0.031
CW      29    384    328.19
Chisq(obs) = 432.67 + 0.031 + 328.19 = 760.89 > 5.991, p << 0.001; reject Ho.
The geese do not mate randomly; they seem to show some preference in choice of mate.
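The goodness-of-fit computation in part (b), and the "largest contribution" asked for in part (c), can be reproduced with the minimal Python sketch below. The critical value 5.991 is hard-coded from the table quoted above rather than computed; the p-value uses the fact that a chi-square variable with 2 df has upper-tail probability exp(-x/2). The names chi_square_components and CRITICAL_05_DF2 are hypothetical.

```python
import math

observed = {"CC": 641, "WW": 130, "CW": 29}
expected = {"CC": 288.0, "WW": 128.0, "CW": 384.0}  # from part (a)


def chi_square_components(obs, exp):
    """Per-category contributions (O - E)^2 / E."""
    return {k: (obs[k] - exp[k]) ** 2 / exp[k] for k in obs}


components = chi_square_components(observed, expected)
chi_sq = sum(components.values())
df = len(observed) - 1  # k - 1; the 0.6/0.4 shares are given, not estimated
CRITICAL_05_DF2 = 5.991  # tabled value quoted in the solution

# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

largest = max(components, key=components.get)  # category with the biggest term
print({k: round(v, 3) for k, v in components.items()})
print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.3g}")
print("Reject H0" if chi_sq > CRITICAL_05_DF2 else "Fail to reject H0")
print("Largest contribution:", largest)
```

As a sanity check on the closed form, exp(-5.991/2) is about 0.05, matching the tabled critical value.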
c. [02 pts.]
Which category is responsible for the largest contribution to
the value of your test
statistic?
The CC category
gave the largest contribution to the test statistic.
d. [04 pts.]
From data inspection (NO further testing here!), what seems
to be going on with these
data?
From the data we
see many more CC pairings than expected (641 vs. 288) and far fewer CW pairings than
expected (29 vs. 384), while the WW pairings are very close to the expected value (130 vs. 128). It would seem that Canada geese especially
prefer to mate with their own kind (“like mates with like”).
2. Scientists from the
University of Washington's Department of Urban
Planning and Design studied
the relationship between counts of flash
flooding during certain types of
storm events and intensity of
urbanization. Four categories of intensity of urbanization were noted,
with counts of flash flood
events recorded.
Intensity of Urbanization: A B C D
No. of flash flood events 20 18 33 19
a. [24 pts] The categories pertain to intensity of urbanization; they
are
roughly equally spaced, and they are
as follows: A is "No Urbanization", B is "Low Amount of Urbanization",
C is "Moderate Amount of Urbanization", D is "Intense Urbanization".
Using the most
powerful available test, test
the null hypothesis of
uniformity with respect to the distribution of
flash flood events over the 4
categories of urbanization using the .05
level of significance. What do you conclude?
Since the
categories are ordered, the most powerful test would be the K-S test for ordinal
data. However, n = 90 is not a multiple of k = 4 (90/4 = 22.5),
so I cannot perform that test. I will
therefore use a chi-square goodness-of-fit test.
Ho: The number of flash flood events is uniformly distributed over the intensity of urbanization.
Ha: The number of flash flood events follows some other distribution.
Ho: P(A) = P(B) = P(C) = P(D) = 1/4
Ha: At least one inequality exists.
df = k - 1 = 3;  Chisq(0.05, 3 df) = 7.815;  reject Ho if Chisq(obs) > 7.815
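Intensity:     A      B      C      D
f             20     18     33     19
fhat        22.5   22.5   22.5   22.5    (fhat = 90/4)
Chisq(obs) = 0.28 + 0.90 + 4.90 + 0.54 = 6.62 < 7.815; fail to reject Ho.
The data give no significant evidence against uniformity.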
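The uniform goodness-of-fit test in part (a) can be checked with the short Python sketch below. It assumes equal expected counts n/k in every category (the stated null hypothesis) and hard-codes the tabled critical value 7.815; CRITICAL_05_DF3 is a hypothetical name.

```python
observed = [20, 18, 33, 19]  # flash flood events in categories A, B, C, D
n = sum(observed)            # total number of events
k = len(observed)            # number of categories
expected = n / k             # equal expected count per category under Ho

# Pearson chi-square: sum of (O - E)^2 / E over the categories.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
CRITICAL_05_DF3 = 7.815      # tabled value, df = k - 1 = 3

print(f"n = {n}, expected = {expected}, chi-square = {chi_sq:.2f}")
print("Reject H0" if chi_sq > CRITICAL_05_DF3 else "Fail to reject H0")
```

Category C alone contributes 4.90 of the 6.62, which is why it is the only cell that stands out on inspection.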
b. [04 pts] By data inspection (*no* further testing here!), what seems
to be the relationship between
intensity of urbanization and the presence
of flash flood events?
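There is no
apparent relationship between intensity of urbanization and the presence of
flash flood events: the counts show no trend from A to D, and only category C (33 events vs. 22.5 expected) stands out.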
3. Scientists assigned a
group of homogeneous volunteers [i.e., all the same
sex, body mass,
and general fitness] randomly to each of four treatment
groups (A, B, C, D).
Each individual was exposed
to a virus well-known for causing common warts, and after a specified period of
time it was noted whether each individual had
experienced common warts, or not.
Results are as follows:
A B C D
WARTS 21 08 02 10
NO WARTS 05 17 24 17
TOTAL 26 25 26 27
a. [08 pts.] Assuming that the null hypothesis of "homogeneity among the
treatments" is TRUE, compute the best estimate of the probability of
experiencing warts for the Treatment C group.
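R1 = 21 + 8 + 2 + 10 = 41;  N = 26 + 25 + 26 + 27 = 104
Under homogeneity all four treatments share one wart probability, so the best
estimate for Treatment C is the pooled proportion R1/N = 41/104 = 0.394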
b. [05 pts.]
Assuming that the null hypothesis of "homogeneity among the treatments"
is true, compute the EXPECTED COUNT for the "No Warts, Treatment B"
group.
R2 = 5 + 17 + 24 + 17 = 63
Expected (No Warts, B) = (column total for B) * R2 / N = 25*63/104 = 15.14
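Parts (a) and (b) both follow from the row and column totals of the table. The Python sketch below is one way to compute them, assuming the layout of the table above (rows = warts/no warts, columns = A to D); the variable names are illustrative only.

```python
# Rows: WARTS, NO WARTS; columns: treatments A, B, C, D.
table = [
    [21, 8, 2, 10],   # WARTS
    [5, 17, 24, 17],  # NO WARTS
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Under homogeneity every treatment shares the pooled wart probability R1 / N.
pooled_p_warts = row_totals[0] / n

# Expected count for cell (i, j) = R_i * C_j / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(f"pooled P(warts) = {pooled_p_warts:.3f}")
print(f"expected No Warts, Treatment B = {expected[1][1]:.2f}")
```

Equivalently, each expected count is the pooled proportion for its row times the column total, e.g. (63/104) * 25 for "No Warts, Treatment B".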
c. [05 pts.]
The computed test statistic to test the null hypothesis
[stated
above] comes out to be 30.21. At the .05 level of significance,
what is the appropriate
conclusion? Include the critical tabled value to
do the test.
Ho: p11 = p12 = p13 = p14 (the probability of warts is the same in all four treatments);
Ha: at least one inequality exists
df = (2 - 1)(4 - 1) = 3;  reject Ho if Chisq(obs) > Chisq(0.05, 3 df) = 7.815
30.21 > 7.815; reject Ho (p < 0.001).
It seems that the
probability of getting a wart is not the same among the four treatment groups.
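The test statistic given in part (c) can be reproduced from the observed table with a plain Pearson chi-square for an r x c table, as in the Python sketch below. The function name homogeneity_chi_square is hypothetical, and the critical value 7.815 is hard-coded from the table rather than computed.

```python
table = [
    [21, 8, 2, 10],   # WARTS:    A, B, C, D
    [5, 17, 24, 17],  # NO WARTS: A, B, C, D
]


def homogeneity_chi_square(table):
    """Pearson chi-square for an r x c table; returns (statistic, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # R_i * C_j / N
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = homogeneity_chi_square(table)
CRITICAL_05_DF3 = 7.815  # tabled value quoted in the solution
print(f"chi-square = {stat:.2f} on {df} df")
print("Reject H0" if stat > CRITICAL_05_DF3 else "Fail to reject H0")
```

The largest terms come from the A and C columns, which are the groups whose wart proportions sit furthest from the pooled 0.394.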
d. [02 pts.]
Find the two treatment groups that look *most like
each other* [this would be
"Step 1 in a subdivision process".]
I can calculate
column proportions to compare the groups.
           A              B             C             D
Warts   21/26 = 0.81   8/25 = 0.32   2/26 = 0.08   10/27 = 0.37
Treatments B and D
have the most similar proportions (0.32 vs. 0.37), hence they look most like each other.
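Part (d) compares column proportions by eye. The Python sketch below automates that comparison under one simple, assumed criterion: the pair of treatments whose wart proportions differ least in absolute value. That criterion is an illustration, not necessarily the course's prescribed rule.

```python
from itertools import combinations

warts = {"A": 21, "B": 8, "C": 2, "D": 10}
totals = {"A": 26, "B": 25, "C": 26, "D": 27}

# Proportion of volunteers with warts in each treatment group.
props = {g: warts[g] / totals[g] for g in warts}

# Assumed criterion: smallest absolute difference in proportions.
closest = min(
    combinations(sorted(props), 2),
    key=lambda pair: abs(props[pair[0]] - props[pair[1]]),
)

print({g: round(p, 2) for g, p in props.items()})
print("Most similar pair:", closest)
```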
e. [10 pts.] Do
the subtest for *only* the two treatment groups from part
[d].
Ho: p12 = p14 (Treatments B and D);  Ha: p12 and p14 are not equal
Reject Ho if Chisq(obs) > Chisq(0.05, 1 df) = 3.841
Observed values
            B     D    Total
WARTS       8    10      18
NO WARTS   17    17      34
Total      25    27      52
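Shortcut calculation:
Chisq(obs) = (ad - bc)^2 * N / (C1*C2*R1*R2) = (8*17 - 10*17)^2 * 52 / (25*27*18*34) = 0.1455 < 3.841;
fail to reject Ho. Treatments B and D do not differ significantly in the proportion of volunteers with warts.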
Expected values
             B       D     Total
WARTS      8.65    9.35     18
NO WARTS  16.35   17.65     34
Total     25      27        52
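The 2 x 2 shortcut formula used in part (e) is easy to express as a function. The Python sketch below is a minimal version, assuming the table is laid out as [[a, b], [c, d]] with rows warts/no warts and columns B/D, and with no continuity correction (matching the hand calculation); chi_square_2x2 is a hypothetical name.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut Pearson chi-square for the table [[a, b], [c, d]]
    (no continuity correction): N * (ad - bc)^2 / (R1 * R2 * C1 * C2)."""
    n = a + b + c + d
    r1, r2 = a + b, c + d  # row totals
    c1, c2 = a + c, b + d  # column totals
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


# Rows: WARTS, NO WARTS; columns: Treatment B, Treatment D.
stat = chi_square_2x2(8, 10, 17, 17)
CRITICAL_05_DF1 = 3.841  # tabled value quoted in the solution
print(f"chi-square = {stat:.4f}")
print("Reject H0" if stat > CRITICAL_05_DF1 else "Fail to reject H0")
```

The shortcut gives exactly the same value as summing (O - E)^2 / E over the four cells using the expected values above; it just avoids computing them.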
4. A statewide Washington Standards for Learning (WASL) test is
designed to have a normal distribution of resulting scores, with a population
mean score of mu = 250 and a standard deviation of
sigma = 50.
a. [05 pts] For an individual drawn at random from the population, what
is
the probability of an
individual's score being greater than 205?
P(X>205) =
P(Z>(205-250)/50) = P(Z>-0.9) = 1-P(Z>0.9) = 1-0.1841 = 0.8159
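Table lookups like the one in part (a) can be checked with the standard library's statistics.NormalDist (Python 3.8+), used here in place of the printed z-table; results may differ from table values in the fourth decimal.

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)  # WASL score distribution
p_above_205 = 1 - scores.cdf(205)      # P(X > 205)
print(f"P(X > 205) = {p_above_205:.4f}")
```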
b. [06 pts] What two values capture the middle 54% of the population?
1 - 0.54 = 0.46;  0.46/2 = 0.23 (area in each tail)
z = +/- 0.74
x = 0.74*50 + 250 = 287
x = -0.74*50 + 250 = 213
P(213 < X < 287) = 0.54
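The inverse lookup in part (b) can be sketched with NormalDist.inv_cdf from the Python standard library. Because inv_cdf solves for z exactly (about 0.739) rather than rounding to the table's 0.74, the bounds come out near 213.1 and 286.9 instead of exactly 213 and 287.

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)
middle = 0.54
tail = (1 - middle) / 2           # 0.23 in each tail
lower = scores.inv_cdf(tail)      # 23rd percentile
upper = scores.inv_cdf(1 - tail)  # 77th percentile
print(f"middle 54%: {lower:.1f} to {upper:.1f}")
```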
c. [05 pts] 91% of the
population will have a score greater than what value?
1 - 0.91 = 0.09;  P(Z > 1.34) = 0.09, so P(Z > -1.34) = 0.91
x = -1.34*50 + 250 = 183
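For part (c), a value exceeded by 91% of the population is the 9th percentile. A minimal check with the standard library's NormalDist, which gives about 183.0 (the table answer rounds z to -1.34):

```python
from statistics import NormalDist

scores = NormalDist(mu=250, sigma=50)
# 91% of scores lie above the cutoff, so 9% lie below it.
cutoff = scores.inv_cdf(1 - 0.91)
print(f"91% of scores exceed {cutoff:.1f}")
```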
d. [07 pts] If 25
individuals are drawn at random from this population,
what is the probability that
their average score will be greater than 245?
Xbar ~ N(mean = 250, SD = sigma/sqrt(n) = 50/sqrt(25) = 10)
P(Xbar > 245) = P(Z > (245-250)/10) = P(Z > -0.5) = 1 - P(Z > 0.5) = 1 - 0.3085 = 0.6915
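Part (d) uses the sampling distribution of the mean, whose standard deviation (the standard error) is sigma/sqrt(n). A minimal sketch with the standard library's NormalDist:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 250, 50, 25
xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))  # standard error = 50/5 = 10
p = 1 - xbar.cdf(245)                            # P(Xbar > 245)
print(f"SE = {xbar.stdev}, P(Xbar > 245) = {p:.4f}")
```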
e. [07 pts] If 25
individuals are drawn at random from this population,
65% of the time their
average score will be between what two values?
1 - 0.65 = 0.35;  0.35/2 = 0.175 (area in each tail)
z = +/- 0.935 (interpolating between z = 0.93 and z = 0.94)
0.935*10 + 250 = 259.35
-0.935*10 + 250 = 240.65
P(240.65 < Xbar < 259.35) = 0.65
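The interval in part (e) is the central 65% of the sampling distribution of Xbar. The sketch below finds it with NormalDist.inv_cdf from the Python standard library instead of table interpolation; the exact z (about 0.9346) agrees with the interpolated 0.935 to two decimals in the final bounds.

```python
from math import sqrt
from statistics import NormalDist

xbar = NormalDist(mu=250, sigma=50 / sqrt(25))  # sampling distribution, SE = 10
coverage = 0.65
tail = (1 - coverage) / 2                        # 0.175 in each tail
lower, upper = xbar.inv_cdf(tail), xbar.inv_cdf(1 - tail)
print(f"65% of sample means fall between {lower:.2f} and {upper:.2f}")
```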