QSCI 482, Fall 2001
ANSWERS TO REVIEW QUESTIONS FOR EXAM 1
1. Scientists have noticed that in certain tropical locations, muddying of the
waters from soil erosion is so bad that certain species of colorful fish are
having difficulty distinguishing between mates of their particular species and
mates of other species, so mating may essentially occur "at random" in terms
of which species a fish mates with. Suppose that 28 percent of the fish are of
species A and that the remaining 72 percent are of species B. 52 matings have
been observed, as follows:
TYPE OF MATING    AA    BB    AB
NUMBER            12    34     6
a. Write out the expected counts for the number of: A-A matings, B-B matings,
and A-B matings under the null hypothesis of random mating.
Under random mating, the probabilities associated with AA, BB, AB matings are,
in order: (.28)^2 = .0784, (.72)^2 = .5184, 2(.28)(.72) = .4032. Multiplying
each probability by 52 gives the expected count in each category: 4.08, 26.96,
20.97.
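The expected counts above can be reproduced with a short Python sketch. The variable names are illustrative only; the proportions and the total of 52 matings come from the problem statement.

```python
# Species proportions and number of observed matings, from the problem.
p_a = 0.28
p_b = 0.72
n_matings = 52

# Under random mating the two partners are drawn independently,
# so the pair probabilities are p^2, q^2 and 2pq.
probs = {
    "AA": p_a * p_a,
    "BB": p_b * p_b,
    "AB": 2 * p_a * p_b,  # A-B and B-A count as the same type of mating
}

# Expected count = total matings x probability of that mating type.
expected = {mating: n_matings * p for mating, p in probs.items()}

for mating in ("AA", "BB", "AB"):
    print(f"{mating}: prob = {probs[mating]:.4f}, expected = {expected[mating]:.2f}")
```

The three probabilities sum to 1, so the expected counts sum to 52, which is a quick check on the arithmetic.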
b. Calculate the residual for any ONE of the three categories and interpret
its value.
The AA, BB, AB residuals (observed minus expected) are: 7.92, 7.04, -14.97.
That is: there are 7.92 more matings in the AA category than are expected
under random mating; there are 7.04 more matings in the BB category than are
expected under random mating; there are 14.97 fewer matings in the AB category
than are expected under random mating.
c. If we are testing the null hypothesis of random mating at the alpha=.05
level of significance, and the value of the observed test statistic is 27.9,
what is the appropriate conclusion? Write down BOTH the appropriate tabled
critical value and the P-value associated with your test statistic.
The tabled critical value for a chi-square with 2 df, .05 level of
significance, is 5.991. Since 27.9 exceeds this critical value, we can reject
the null hypothesis of random mating and conclude that some mating process
other than random is going on. In fact, under the random mating hypothesis,
the probability of seeing a test statistic as large as 27.9 is small,
P<<.001.
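As a check on the test statistic and P-value, here is a minimal Python sketch of the chi-square goodness-of-fit calculation. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2); that shortcut is an addition here and not part of the course tables.

```python
import math

observed = [12, 34, 6]                 # AA, BB, AB matings
probs = [0.0784, 0.5184, 0.4032]       # random-mating probabilities
n = sum(observed)
expected = [n * p for p in probs]

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 2 df the upper-tail probability has the closed form exp(-x/2),
# so the .05 critical value is -2 * ln(.05).
p_value = math.exp(-chi_sq / 2)
critical_05 = -2 * math.log(0.05)

print(f"chi-square = {chi_sq:.1f}, critical value = {critical_05:.3f}")
print(f"reject Ho: {chi_sq > critical_05}, P = {p_value:.1e}")
```

The statistic is computed from unrounded expected counts, so it may differ slightly in the second decimal from a hand calculation using 4.08, 26.96 and 20.97.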
d. From data inspection of observed and expected values (no further
calculations necessary!), what seems to be going on?
This seems to be a case of "like pairs with like": more AA, BB matings than
expected, fewer AB matings than expected. So despite the muddying of the
waters, A fish and B fish still seem to be able to seek out their own kind.
2. Fruit flies are counted in the wild by putting out traps with yeast
cultures (to lure the flies in with food). Five different types of yeast
cultures, with different amounts of "Agent X", were used to investigate
whether the different yeast cultures perform similarly as bait for fruit
flies, or whether "Agent X" actually makes a difference as a lure. We do not
know the exact amount of Agent X in each of the cultures; we do know that we
may consider the categories as being roughly equally spaced.
Level of Agent X:       None   Low   Moderate   Higher   Highest
No. of flies caught:      42    28         15       12         3
a. Test the null hypothesis that the level of Agent X has no effect upon the
number of flies caught in the five yeast cultures. (Level of significance =
0.05.) Use the most powerful test available, the one that is most likely to
reject the null hypothesis when it is in fact false. What is your conclusion?
The Kolmogorov-Smirnov [K-S] test takes into account the ordered nature of
the categories [the chi-square goodness-of-fit test does not]. Under the null
hypothesis, the 100 flies are spread equally over the 5 cultures, 20 per
culture. The cumulative OBSERVED counts are: 42, 70, 85, 97, 100. Cumulative
EXPECTED counts are: 20, 40, 60, 80, 100. Max D_i is therefore |70-40| = 30,
which exceeds the .05 tabled K-S value of 11 for k=5, n=100. In fact, P<<.001
for these data. So we handily reject the null hypothesis of "no effect of
Agent X Treatment upon # of flies caught"; some effect is indeed present.
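The cumulative comparison used in the K-S goodness-of-fit answer can be sketched in Python as follows. The critical value of 11 is simply copied from the answer above (tabled .05 value for k=5, n=100); it is not computed here.

```python
from itertools import accumulate

observed = [42, 28, 15, 12, 3]   # None, Low, Moderate, Higher, Highest
n = sum(observed)                # 100 flies in total
k = len(observed)                # 5 ordered categories

# Under Ho every culture is equally attractive: n/k flies per category.
expected = [n / k] * k

cum_obs = list(accumulate(observed))
cum_exp = list(accumulate(expected))

# K-S statistic: largest absolute gap between the cumulative counts.
diffs = [abs(o - e) for o, e in zip(cum_obs, cum_exp)]
d_max = max(diffs)

critical_value = 11              # tabled value quoted in the answer, not derived
reject = d_max > critical_value

print("cumulative observed:", cum_obs)
print("cumulative expected:", cum_exp)
print("max D =", d_max, "reject Ho:", reject)
```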
b. By data inspection (no further testing here), what seems to be the
relationship between the level of Agent X and the number of flies caught?
More flies than expected are caught in the None and Low categories; fewer
than expected are caught in the Moderate, Higher, Highest categories. So by
inspection, Agent X seems to be acting as a repellent.
3. Scientists in an Arctic expedition measured the thermal dependence of
bacterial enzyme rates in 4 different environments: permanent sea-ice, new
(1st year) sea-ice, water column, and sediment trap. Enzymes were classified
as "Psychrophile" if the enzyme maximal activity occurred below 15 degrees,
or "Psychrotolerant" if the enzyme maximal activity occurred above 15
degrees.
                     Result:
Environment          Psychrophile   Psychrotolerant   Total samples
_____________________________________________________________________
Permanent sea-ice               3                 1               4
1st year sea-ice                3                12              15
Water column                    3                10              13
Sediment trap                  10                14              24
Totals:                        19                37              56
a. From the description above, write down the appropriate null hypothesis
(in specific statistical terms) that corresponds to "no difference among
environments in terms of the distribution of maximal enzyme activity".
Denote the 4 environments, in order, by the numbers 1, 2, 3, 4. Then under
Ho, P1=P2=P3=P4, where P = Pr{psychrophile}. It would therefore follow that
under Ho, Q1=Q2=Q3=Q4, where Q = Pr{psychrotolerant} = 1 - P.
b. If the null hypothesis is true, fill in the expected counts for any 2 of
the 4 environments for the "Psychrophile" category.
For the 4 environments respectively, the expected counts are: 1.36, 5.09,
4.41, 8.14. [Use the "row total x column total/N" formula; for example,
permanent sea-ice gives 4 x 19/56 = 1.36.]
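The "row total x column total/N" rule can be applied to all four environments at once. The following Python sketch uses the counts from the table; the dictionary layout is just one convenient choice.

```python
# Observed counts per environment: (psychrophile, psychrotolerant).
table = {
    "Permanent sea-ice": (3, 1),
    "1st year sea-ice": (3, 12),
    "Water column": (3, 10),
    "Sediment trap": (10, 14),
}

row_totals = {env: sum(counts) for env, counts in table.items()}
psychrophile_total = sum(counts[0] for counts in table.values())   # 19
n = sum(row_totals.values())                                       # 56

# Expected psychrophile count = row total x column total / N.
expected_psychrophile = {
    env: row_totals[env] * psychrophile_total / n for env in table
}

for env, value in expected_psychrophile.items():
    print(f"{env}: {value:.2f}")
```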
c. If the value of the observed test statistic is 5.63, state the statistical
outcome AND the associated P-value that goes along with the observed test
statistic.
Since no level of significance is stated, use the conventional level of .05.
The tabled value for a chi-square with 3 df is 7.815, so we DO NOT reject Ho
of "no difference among the four environments in terms of distribution of
maximal enzyme activity". And the P-value associated with our test statistic
is 0.10 < P < 0.25 [note that the P-value exceeds .05].
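For readers who want to verify the statistic of 5.63 and see a numerical P-value rather than a tabled range, here is a hedged Python sketch. The helper chi2_sf_3df is a hypothetical name; it uses the closed-form upper tail of the chi-square distribution that holds only for exactly 3 degrees of freedom, which is an addition to the tabled approach used in the course.

```python
import math

# Observed counts (psychrophile, psychrotolerant) per environment.
observed = [
    (3, 1),     # Permanent sea-ice
    (3, 12),    # 1st year sea-ice
    (3, 10),    # Water column
    (10, 14),   # Sediment trap
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic over all 8 cells: sum of (O - E)^2 / E.
chi_sq = 0.0
for row, r_tot in zip(observed, row_totals):
    for obs, c_tot in zip(row, col_totals):
        expected_count = r_tot * c_tot / n
        chi_sq += (obs - expected_count) ** 2 / expected_count

df = (len(observed) - 1) * (len(col_totals) - 1)   # (4-1) x (2-1) = 3


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(chi_sq)
print(f"chi-square = {chi_sq:.2f}, df = {df}, P = {p_value:.3f}")
```

The computed P-value should fall inside the tabled bracket 0.10 < P < 0.25 given in the answer.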
d. Write down the appropriate biological interpretation from the results of
the statistical test.
From this data set, it does not appear that any of the four environments has
a statistically significantly higher proportion of psychrophile (or
psychrotolerant) enzyme maximal activity than the others. [Pooling the 4
environments, the overall proportion of psychrophile is 19/56 = 0.34; overall
proportion of psychrotolerant is 37/56 = 0.66.]
4. A normally distributed population of sardine lengths processed by Ocean
Beauty Seafoods has a mean length of 4.54 inches with a standard deviation of
0.25 inches.
a. What proportion of the sardines will be *greater than* 4.1 inches?
Take the value 4.1 and standardize it to get z = (4.1 - 4.54)/.25 = -1.76.
Table B.2 gives the righthand tail probability for z = 1.76 as .0392; by
symmetry, this is also the prob. to the LEFT of -1.76. Therefore the prob. to
the RIGHT of -1.76 is (1-.0392) = .9608. Hence 96.1% of this population of
sardine lengths will exceed 4.1 inches.
b. What is the probability that a sardine drawn at random from this
population will have a length less than 4.3 inches?
Take the value 4.3 and standardize to get z = (4.3 - 4.54)/.25 = -.96. Table
B.2 gives the righthand tail prob. for .96 as .1685, and this is the same as
the lefthand tail prob. for -.96. Hence 16.85% of the population of sardine
lengths will be shorter than 4.3 inches.
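The table lookups in parts a and b can be cross-checked with Python's standard library, which provides a normal distribution class. This is only a sketch for verification; the exam itself expects Table B.2.

```python
from statistics import NormalDist

# Population of sardine lengths: mean 4.54 in., standard deviation 0.25 in.
lengths = NormalDist(mu=4.54, sigma=0.25)

# Part a: proportion longer than 4.1 inches (area to the right of z = -1.76).
p_greater_than_4_1 = 1 - lengths.cdf(4.1)

# Part b: probability of a length below 4.3 inches (area left of z = -.96).
p_less_than_4_3 = lengths.cdf(4.3)

print(f"P(X > 4.1) = {p_greater_than_4_1:.4f}")
print(f"P(X < 4.3) = {p_less_than_4_3:.4f}")
```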
c. 90% of the sardines will have a length less than what value?
Here we are given the probability and must find the length associated with
that probability. 1.28 is the z-value that cuts off .10 to the right, and
therefore .90 to the left. Now "unstandardize" by taking (z*sigma) + mu =
(1.28*.25) + 4.54 = 4.86 inches. Indeed, if you start with the value of 4.86
and standardize, you get the z-value of 1.28.
d. What values capture the middle 95% of the population?
In standardized form, +/- 1.96 capture the middle 95% of a standard normal
population. Now "unstandardize" again by taking (z*sigma) + mu. -1.96
unstandardized is (-1.96*.25) + 4.54 = 4.05 inches; 1.96 unstandardized is
(1.96*.25) + 4.54 = 5.03 inches. So 95% of the population lengths occur
between 4.05 and 5.03 inches.
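Parts c and d run the table lookup in reverse, from a probability to a length. A minimal Python sketch of the same "unstandardizing" step, using the standard library's inverse normal CDF, might look like this:

```python
from statistics import NormalDist

lengths = NormalDist(mu=4.54, sigma=0.25)

# Part c: the length below which 90% of sardines fall (z is about 1.28).
length_90th = lengths.inv_cdf(0.90)

# Part d: the middle 95% leaves .025 in each tail (z is about +/- 1.96).
lower_95 = lengths.inv_cdf(0.025)
upper_95 = lengths.inv_cdf(0.975)

print(f"90th percentile: {length_90th:.2f} inches")
print(f"middle 95%: {lower_95:.2f} to {upper_95:.2f} inches")
```

The code uses unrounded z-values, so results agree with the hand calculation to two decimal places but not necessarily beyond that.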
e. If 50 sardines are drawn at random from this population, what is the
probability that their *average length* will be less than 4.50 inches?
This is a question about average length, x-bar, so when we standardize,
we'll be using sigma/sqrt(n) in the denominator. So Z = (4.50 -
4.54)/[.25/sqrt(50)] = -1.13. Using Table B.2, we deduce that the prob. to
the left of -1.13 is .1292. So there is a 12.92% (about 13%) chance that a
sample of size 50 would yield a sample average less than 4.50 inches.
f. If we wanted to ensure that the standard error of the mean (Zar, Section
6.3) for a random sample drawn from this population was no greater than 5%
of the population mean, how large a random sample would we have to take?
First, figure out what 5% of the population mean is. That is .05*4.54 = .227
inches. So, we want sigma/sqrt(n) to be no greater than .227 inches. Sigma is
0.25 inches, so we want .25/sqrt(n) to be no greater than .227 inches.
Solving for sqrt(n), we need sqrt(n) to be at least .25/.227 = 1.1, so n has
to be at least as large as (1.1)^2 = 1.21. Since n must be a whole number, n
only has to be 2 or larger! And indeed, the standard error for n=2 is
sigma/sqrt(2) = .25/1.414 = .18, which meets the stated criterion, while n=1
gives .25, which does not.
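The sample-size reasoning in part f amounts to solving sigma/sqrt(n) <= .05*mu for n and rounding up to a whole number. A small Python sketch under that reading of the question:

```python
import math

mu = 4.54       # population mean length (inches)
sigma = 0.25    # population standard deviation (inches)

# Largest standard error allowed: 5% of the population mean.
max_standard_error = 0.05 * mu

# sigma / sqrt(n) <= max_se  is equivalent to  n >= (sigma / max_se)^2.
n_exact = (sigma / max_standard_error) ** 2
n_min = math.ceil(n_exact)


def standard_error(n):
    """Standard error of the mean for a sample of size n."""
    return sigma / math.sqrt(n)


print(f"max SE = {max_standard_error:.3f}, n must be at least {n_exact:.2f}")
print(f"smallest whole n = {n_min}, SE = {standard_error(n_min):.2f}")
```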