QSCI 482, Fall 2001 ANSWERS TO REVIEW QUESTIONS FOR EXAM 1

1. Scientists have noticed that in certain tropical locations, muddying of

the waters from soil erosion is so bad that certain species of colorful

fish are having difficulty distinguishing between mates of their

particular species and mates of other species, so mating may essentially

occur "at random" in terms of which species a fish mates with. Suppose

that 28 percent of the fish are of species A and that the remaining 72

percent are of species B. 52 matings have been observed, as follows:

TYPE OF MATING     AA    BB    AB
NUMBER             12    34     6

a. Write out the expected counts for the number of: A-A matings, B-B

matings, and A-B matings under the null hypothesis of random mating.

Probabilities associated with AA, BB, AB mating are, in order: .0784,

.5184, .4032. Multiplying each probability by 52 gives the expected

count in each category: 4.08, 26.96, 20.97.

b. Calculate the residual for any ONE of the three

categories and interpret its value. The AA, BB, AB residuals (observed

minus expected) are: 7.92, 7.04, -14.97. That is, there were 7.92 more AA

matings than expected under random mating, 7.04 more BB matings than

expected, and 14.97 fewer AB matings than expected.
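The arithmetic in parts a and b can be checked with a short script. This is only a sketch: the variable names are illustrative, and it assumes the random-mating pair probabilities p^2, q^2 and 2pq used in the answer above.

```python
# Species proportions and number of matings stated in the problem.
p_a, p_b = 0.28, 0.72
n_matings = 52
observed = {"AA": 12, "BB": 34, "AB": 6}

# Under random mating the pair probabilities are p^2, q^2 and 2pq.
probs = {"AA": p_a ** 2, "BB": p_b ** 2, "AB": 2 * p_a * p_b}

# Expected count = probability x number of matings; residual = observed - expected.
expected = {k: n_matings * p for k, p in probs.items()}
residuals = {k: observed[k] - expected[k] for k in observed}

for k in observed:
    print(f"{k}: expected {expected[k]:.2f}, residual {residuals[k]:+.2f}")
```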

c. If we are testing the null hypothesis of random mating at the

alpha=.05 level of significance, and the value of the observed test

statistic is 27.9, what is the appropriate conclusion? Write down BOTH the

appropriate tabled critical value and the P-value associated with your

test statistic. The tabled critical value for a chi-square with 2 df, .05

level of significance, is 5.991. Since 27.9 exceeds this critical value,

we can reject the null hypothesis of random mating and conclude that

some mating process other than random is going on. In fact, under the

random mating hypothesis, the probability of seeing a test statistic as

large as 27.9 is small, P<<.001.
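As a rough check on part c, the test statistic and its P-value can be reproduced without a table. The sketch below relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2); the variable names are illustrative.

```python
import math

observed = [12, 34, 6]                                   # AA, BB, AB
expected = [52 * p for p in (0.0784, 0.5184, 0.4032)]    # from part a

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 2 df the upper-tail probability has the closed form exp(-x / 2),
# so the .05 critical value is -2 * ln(.05), about 5.991.
p_value = math.exp(-chi_square / 2)
critical_05 = -2 * math.log(0.05)

print(f"chi-square = {chi_square:.1f}, critical value = {critical_05:.3f}")
print(f"reject H0: {chi_square > critical_05}, P = {p_value:.1e}")
```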

d. From data inspection of observed and expected values (no further

calculations necessary!), what seems to be going on? This seems to be a

case of "like pairs with like"--more AA, BB matings than expected, fewer

AB matings than expected. So despite the muddying of the waters, A fish

and B fish still seem to be able to seek out their own kind.

2. Fruit flies are counted in the wild by putting out traps

with yeast cultures (to lure the flies in with food). Five different

types of yeast cultures, with different amounts of "Agent X"

were used to investigate whether the different yeast cultures perform

similarly as bait for fruit flies, or whether "Agent X" actually

makes a difference as a lure. We do not know the exact amount of

Agent X in each of the cultures; we do know that we may consider

the categories as being roughly equally spaced.

Level of Agent X:       None   Low   Moderate   Higher   Highest
No. of flies caught:      42    28         15       12         3

a. Test the null hypothesis that the level of Agent X has

no effect upon the number of flies caught in the five yeast cultures.

(Level of significance = 0.05.) Use the most powerful test available,

the one that is most likely to reject the null hypothesis when it is in

fact false. What is your conclusion?

The Kolmogorov-Smirnov [K-S] test takes into account the ordered nature of

the categories [the chi-square goodness-of-fit test does not]. The

cumulative OBSERVED counts are: 42,70,85,97,100. Under Ho each culture is

expected to catch 100/5 = 20 flies, so the cumulative EXPECTED counts are:

20,40,60,80,100. The differences |D_i| are 22,30,25,17,0, so max |D_i| =

|70-40| = 30, which exceeds the .05 tabled K-S value of 11 for k=5, n=100.

In fact, P<<.001 for

these data. So we handily reject the null hypothesis of "no effect of

Agent X Treatment upon # of flies caught"; some effect is indeed present.
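The cumulative counts and the K-S statistic above can be reproduced as follows. This is a minimal sketch with illustrative names; the critical value of 11 is simply copied from the answer above, not computed.

```python
from itertools import accumulate

observed = [42, 28, 15, 12, 3]      # None, Low, Moderate, Higher, Highest
n = sum(observed)
k = len(observed)
expected = [n / k] * k              # equal catch in every culture under H0

# Running totals across the ordered categories.
cum_observed = list(accumulate(observed))
cum_expected = list(accumulate(expected))

# K-S statistic for ordered categories: largest absolute cumulative difference.
d = [abs(o - e) for o, e in zip(cum_observed, cum_expected)]
d_max = max(d)

tabled_critical = 11                # .05 value for k=5, n=100, as quoted above
print(cum_observed, d, d_max, d_max > tabled_critical)
```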


b. By data inspection (no further testing here), what seems to be

the relationship between the level of Agent X and the number of flies

caught?

More flies than expected were caught in the None and Low categories;

fewer than expected were caught in the Moderate, Higher, and Highest

categories. So by inspection, Agent X seems to be acting as a repellent.
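The inspection in part b amounts to looking at the sign of observed minus expected in each category. A small sketch, assuming the same equal-catch expectation of 20 flies per culture (names are illustrative):

```python
levels = ["None", "Low", "Moderate", "Higher", "Highest"]
observed = [42, 28, 15, 12, 3]
expected_each = sum(observed) / len(observed)   # 20 flies per culture under H0

# Positive values mean more flies than expected; negative means fewer.
differences = {lvl: obs - expected_each for lvl, obs in zip(levels, observed)}

for lvl, diff in differences.items():
    print(f"{lvl:>8}: {diff:+.0f}")
```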

3. Scientists in an Arctic expedition measured the thermal dependence of

bacterial enzyme rates in 4 different environments: permanent sea-ice, new

sea ice, water-column, and sediment trap. Enzymes were classified as

"Psychrophile" if the enzyme maximal activity occurred below 15 degrees;

or "Psychrotolerant" if the enzyme maximal activity occurred above 15

degrees.

                      Result:
Environment           Psychrophile   Psychrotolerant   Total samples
_____________________________________________________________________
Permanent sea-ice                3                 1               4
1st year sea-ice                 3                12              15
Water column                     3                10              13
Sediment trap                   10                14              24
Totals:                         19                37              56

a. From the description above, write down the appropriate null

hypothesis (in specific statistical terms) that corresponds to "no

difference among environments in terms of the distribution of maximal

enzyme activity".

Denote the 4 environments, in order, by the numbers 1, 2, 3, 4. Then

under Ho, P1=P2=P3=P4, where P = Pr{psychrophile}. It would therefore

follow that under Ho, Q1=Q2=Q3=Q4, where Q = Pr{psychrotolerant}.

b. If the null hypothesis is true, fill in the expected counts for

any 2 of the 4 environments for the "Psychrophile" category.

For the 4 environments respectively, the expected counts are:

1.36, 5.09, 4.41, 8.14. [Use "row total x column total/N" formula]
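The "row total x column total/N" rule can be applied to the whole table at once. The sketch below uses illustrative names and the counts from the table above; the first element of each tuple is the Psychrophile count.

```python
# (Psychrophile, Psychrotolerant) counts for each environment.
table = {
    "Permanent sea-ice": (3, 1),
    "1st year sea-ice": (3, 12),
    "Water column": (3, 10),
    "Sediment trap": (10, 14),
}

row_totals = {env: sum(counts) for env, counts in table.items()}
col_totals = [sum(counts[j] for counts in table.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Expected count = row total x column total / N.
expected = {
    env: tuple(row_totals[env] * ct / grand_total for ct in col_totals)
    for env in table
}

for env, (e_phile, e_tolerant) in expected.items():
    print(f"{env:<18} {e_phile:6.2f} {e_tolerant:6.2f}")
```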

c. If the value of the observed test statistic is 5.63, state

the statistical outcome AND the associated P-value that goes along with

the observed test statistic.

Since no level of significance is stated, use the conventional level of

.05. The tabled value for a chi-square with 3 df is 7.815, so we DO NOT

reject Ho of "no difference among the four environments in terms of

distribution of maximal enzyme activity". And the P-value associated

with our test statistic is 0.10 < P < 0.25 [note that the P-value exceeds

.05].
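For readers who want to verify part c numerically, the statistic and a more exact P-value can be computed with the standard library alone. This is a sketch: the helper chi2_sf_3df is a hypothetical name, and it uses the closed-form upper-tail probability that holds only for 3 degrees of freedom.

```python
import math

observed = [(3, 1), (3, 12), (3, 10), (10, 14)]   # rows: the 4 environments
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E with E = row total x column total / N.
chi_square = sum(
    (obs - rt * ct / n) ** 2 / (rt * ct / n)
    for row, rt in zip(observed, row_totals)
    for obs, ct in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(col_totals) - 1)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    half = x / 2
    return math.erfc(math.sqrt(half)) + 2 * math.sqrt(half / math.pi) * math.exp(-half)


p_value = chi2_sf_3df(chi_square)
print(f"chi-square = {chi_square:.2f}, df = {df}, P = {p_value:.3f}")
```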

d. Write down the appropriate biological interpretation from

the results of the statistical test.

From this data set, it does not appear that any of the four environments

has a statistically significantly higher proportion of psychrophile (or

psychrotolerant) enzyme maximal activity than the others. [Pooling the 4

environments, the overall proportion of psychrophile is 19/56 = 0.34;

overall proportion of psychrotolerant is 37/56 = 0.66].

4. A normally distributed population of sardine lengths processed by

Ocean Beauty Seafoods has a mean length of 4.54 inches with a standard

deviation of 0.25 inches.

a. What proportion of the sardines will be *greater than* 4.1 inches?

Take the value 4.1 and standardize it to get z = (4.1-4.54)/.25 = -1.76.

Table B.2 gives the righthand tail prob. for 1.76 as .0392; by symmetry,

this is also the prob. to the LEFT of -1.76. Therefore the prob. to the

RIGHT of -1.76 is (1-.0392)=.9608.

Hence 96.1% of this population of sardine lengths will exceed

4.1 inches.

b. What is the probability that a sardine drawn at random from this

population will have a length less than 4.3 inches?

Take the value 4.3 and standardize to get -.96. Table B.2 gives the

righthand tail prob. for .96 as .1685, and this is the same as the

lefthand tail prob. for -.96. Hence 16.85% of the population of sardine

lengths will be shorter than 4.3 inches.
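Parts a and b can be checked without Table B.2 by using Python's statistics.NormalDist (available from Python 3.8). A minimal sketch, with illustrative variable names:

```python
from statistics import NormalDist

# Population of sardine lengths: mean 4.54 in., standard deviation 0.25 in.
lengths = NormalDist(mu=4.54, sigma=0.25)

# Part a: proportion longer than 4.1 in. is 1 minus the lefthand tail.
z_a = (4.1 - 4.54) / 0.25
p_greater_than_4_1 = 1 - lengths.cdf(4.1)

# Part b: probability of a length shorter than 4.3 in. is the lefthand tail.
z_b = (4.3 - 4.54) / 0.25
p_less_than_4_3 = lengths.cdf(4.3)

print(f"a: z = {z_a:.2f}, P(X > 4.1) = {p_greater_than_4_1:.4f}")
print(f"b: z = {z_b:.2f}, P(X < 4.3) = {p_less_than_4_3:.4f}")
```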

c. 90% of the sardines will have a length less than what value?

Here we are given the probability and must find the length associated

with that probability. 1.28 is the z-value that cuts off .10 to the

right, and therefore .90 to the left. Now "unstandardize" by taking

(z*sigma) + mu = (1.28*.25) + 4.54 = 4.86 inches. Indeed, if you start

with the value of 4.86 and standardize, you get the z-value of 1.28.

d. What values capture the middle 95% of the population?

In standardized form, +/- 1.96 capture the middle 95% of a standard

normal population. Now "unstandardize" again by taking (z*sigma) + mu.

-1.96 unstandardized is 4.05 inches; 1.96 unstandardized is 5.03 inches.

So 95% of the population lengths occur between 4.05 and 5.03 inches.
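Parts c and d go from a probability back to a length. The sketch below does the same "unstandardizing" with NormalDist.inv_cdf, which returns the value with a given lefthand-tail probability; names are illustrative.

```python
from statistics import NormalDist

lengths = NormalDist(mu=4.54, sigma=0.25)

# Part c: the length with 90% of the population below it.
p90 = lengths.inv_cdf(0.90)

# Part d: the middle 95% leaves 2.5% in each tail.
lower = lengths.inv_cdf(0.025)
upper = lengths.inv_cdf(0.975)

print(f"c: 90th percentile = {p90:.2f} inches")
print(f"d: middle 95% lies between {lower:.2f} and {upper:.2f} inches")
```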

e. If 50 sardines are drawn at random from this population, what is the

probability that their *average length* will be less than 4.50 inches?

This is a question about average length, x-bar, so when we standardize,

we'll be using sigma/sqrt(n) in the denominator. So Z =

(4.50 - 4.54)/[.25/sqrt(50)] = -1.13. Using Table B.2, we deduce that the

prob. to the left of -1.13 is .1292. So there is a 12.92%, or about 13%, chance

that a sample of size 50 would yield a sample average less than 4.50

inches.
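The calculation in part e can be sketched as below. Because the script does not round z to two decimals before looking up the probability, its result may differ from the table-based .1292 in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 4.54, 0.25, 50

# The sample mean has standard deviation sigma / sqrt(n) (the standard error).
standard_error = sigma / math.sqrt(n)
z = (4.50 - mu) / standard_error

# Lefthand tail of the standard normal at z.
p_mean_below_4_50 = NormalDist().cdf(z)

print(f"SE = {standard_error:.4f}, z = {z:.2f}, P = {p_mean_below_4_50:.4f}")
```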

f. If we wanted to ensure that the standard error of the mean (Zar,

Section 6.3) for a random sample drawn from this population, was no

greater than 5% of the population mean, how large a random sample would

we have to take?

First, figure out what 5% of the population mean is. That is .05*4.54 =

.227 inches. So, we want sigma/sqrt(n) to be no greater than .227

inches. Sigma is 0.25 inches, so we want .25/sqrt(n) to be no greater

than .227 inches. Solving for sqrt(n), we get sqrt(n) >= .25/.227 = 1.1,

so n has to be at least as large as (1.1)^2 = 1.21, so n only has to be

2 or larger! And indeed, the standard error for n=2 is sigma/sqrt(2) =

.25/1.414 = .18, which indeed meets the stated criterion.
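The sample-size reasoning in part f can be checked by solving sigma/sqrt(n) <= .05*mu for n and rounding up to a whole number. A minimal sketch with illustrative names:

```python
import math

mu, sigma = 4.54, 0.25
max_se = 0.05 * mu                      # largest acceptable standard error

# sigma / sqrt(n) <= max_se  is equivalent to  n >= (sigma / max_se)^2.
n_min = math.ceil((sigma / max_se) ** 2)

se_at_n_min = sigma / math.sqrt(n_min)
print(f"max SE = {max_se:.3f}, n >= {n_min}, SE at n = {n_min}: {se_at_n_min:.2f}")
```

A sample of size 1 would not meet the criterion, since its standard error is sigma itself, .25, which exceeds .227.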