Homework 3: due 10/22

1. (Weir Exercise 2.2: data of S. Ohba, discussed by Yasuda and Kimura 1968) In a system with 4 alleles, O is recessive to S, M and F, which are codominant. In a sample of 1786 individuals, 1149 are SS or SO, 36 are MM or MO, 17 are FF or FO, 20 are OO, 336 are SM, 25 are MF, and 203 are SF.
(a) Use the EM algorithm to estimate the frequencies of the four alleles. (Assume Hardy-Weinberg equilibrium)
(b) Explain how you would estimate the variances and covariances of the estimators, assuming Hardy-Weinberg equilibrium. (Use the theory of MLEs, but don't try to do it explicitly -- it is very messy.)
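
The EM iteration asked for in part (a) can be sketched as follows. This is a minimal sketch, not a prescribed solution: the dictionary keys, the function names em_step and em, the equal starting frequencies and the stopping tolerance are all assumptions made for illustration. The E-step splits each dominant phenotype (for example "SS or SO") into expected homozygote and heterozygote counts using the current frequencies, and the M-step is simple allele counting.

```python
# Phenotype counts from the problem statement; the keys are hypothetical labels.
COUNTS = {"S": 1149, "M": 36, "F": 17, "O": 20, "SM": 336, "MF": 25, "SF": 203}


def em_step(freq, counts):
    """One EM update of the four allele frequencies under Hardy-Weinberg."""
    n = sum(counts.values())
    o = freq["O"]
    # OO individuals contribute two O alleles each.
    alleles = {"S": 0.0, "M": 0.0, "F": 0.0, "O": 2.0 * counts["O"]}
    # E-step: split each dominant phenotype X into expected XX and XO counts.
    for a in "SMF":
        x = freq[a]
        hom = counts[a] * x / (x + 2.0 * o)  # P(XX | phenotype X) = x^2 / (x^2 + 2xo)
        het = counts[a] - hom
        alleles[a] += 2.0 * hom + het
        alleles["O"] += het
    # Codominant heterozygotes (SM, MF, SF) are observed directly.
    for pair in ("SM", "MF", "SF"):
        for a in pair:
            alleles[a] += counts[pair]
    # M-step: frequency = expected allele count / (2N).
    return {a: c / (2.0 * n) for a, c in alleles.items()}


def em(counts, tol=1e-12, max_iter=10000):
    """Iterate em_step from equal starting frequencies (an assumed choice)."""
    freq = {a: 0.25 for a in "SMFO"}
    for _ in range(max_iter):
        new = em_step(freq, counts)
        if max(abs(new[a] - freq[a]) for a in freq) < tol:
            return new
        freq = new
    return freq


estimates = em(COUNTS)
for allele, value in estimates.items():
    print(allele, round(value, 4))
```

Trying a few different starting values is a cheap check that the iteration is not stuck at a boundary or a local maximum.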

2. (Example from Crow and Kimura, p. 58) There are two plausible hypotheses for the much greater incidence of premature baldness in males than in females: (1) An autosomal dominant trait usually expressed only in males (known as a sex-limited trait), and (2) an X-linked recessive. In the following, "bald" refers to this genetic premature baldness.

Assuming Hardy-Weinberg equilibrium, and that the frequency of the allele for premature baldness is q,
(a) Under hypothesis (1), what proportion of the sons of bald fathers are expected to be bald?
What proportion of the sons of non-bald fathers are expected to be bald?
(b) Under hypothesis (2), what proportion of the sons of bald fathers are expected to be bald?
What proportion of the sons of non-bald fathers are expected to be bald?
(c) Show that the probability that the father of a bald son is also bald is the same as the probability that the son of a bald father is also bald. (This is useful because it is easier to get data by selecting bald men and asking about their fathers than by waiting for their sons to grow up.)
(d) Harris (1946) found that 13.3% of males in a British sample were prematurely bald. He also found that of 100 bald men, 56 had bald fathers. Show this is consistent with hypothesis (1) but not (2).

3. (From Crow and Kimura, pp. 41-43) Waaler (1927) obtained the following data on red/green color-blindness among school children in Oslo. Out of 9049 boys, 725 were color-blind. Out of 9072 girls, 40 were color-blind. Assume Hardy-Weinberg equilibrium, and that the frequency of the allele for X-linked recessive color-blindness is p.
(a) Show that the log-likelihood for p is
725 log p + 8324 log (1-p) + 40 log (p^2) + 9032 log (1-p^2)
= 805 log p + 17356 log (1-p) + 9032 log (1+p)
Hence verify that the MLE of p is 0.0772
(Differentiating the log-likelihood and solving the resulting quadratic equation is probably the easiest way.)
(b) Hence test whether Waaler's data are in accordance with the hypothesis of a simple X-linked recessive.
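
The MLE quoted in part (a) can be checked numerically. The sketch below sets the derivative of the log-likelihood to zero, clears denominators by multiplying through by p(1-p)(1+p), and solves the resulting quadratic; the names log_lik, a, b, c and p_hat are assumptions introduced for illustration.

```python
import math


def log_lik(p):
    """Log-likelihood from part (a), in its simplified form."""
    return 805 * math.log(p) + 17356 * math.log(1 - p) + 9032 * math.log(1 + p)


# Score equation: 805/p - 17356/(1-p) + 9032/(1+p) = 0.
# Multiplying by p(1-p)(1+p):
#   805(1 - p^2) - 17356 p(1 + p) + 9032 p(1 - p) = 0,
# which rearranges to a p^2 + b p + c = 0 with
a = 805 + 17356 + 9032
b = 17356 - 9032
c = -805

# Only the positive root is a valid allele frequency.
p_hat = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
print(round(p_hat, 4))  # 0.0772, matching the value stated in part (a)
```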

An easier method of estimation, which is usually not bad, is to use the frequency in males, and then test the trait frequency in females using this allele frequency. For the rest of this question, use this simpler method.
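
As a rough illustration of this simpler method applied to the pooled data from the start of the question, the sketch below estimates p from the boys and compares the observed number of color-blind girls with the Hardy-Weinberg expectation. The variable names are assumptions, and the choice of a Pearson chi-square statistic with 1 degree of freedom is one reasonable reading of "test", not the only one. It treats the male estimate as a fixed value and so ignores its sampling error.

```python
boys_total, boys_cb = 9049, 725
girls_total, girls_cb = 9072, 40

# Males are hemizygous, so the trait frequency in boys estimates p directly.
p_male = boys_cb / boys_total

# Under a simple X-linked recessive, affected girls are homozygous: frequency p^2.
expected_cb = girls_total * p_male ** 2
expected_normal = girls_total - expected_cb

# Pearson chi-square over the two female classes (affected, unaffected).
chi2 = ((girls_cb - expected_cb) ** 2 / expected_cb
        + ((girls_total - girls_cb) - expected_normal) ** 2 / expected_normal)

print("p from males:", round(p_male, 4))
print("expected color-blind girls:", round(expected_cb, 1))
print("chi-square (1 df):", round(chi2, 2))
```

The statistic can then be compared with the usual 5% critical value of 3.84 for 1 degree of freedom.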

Now in fact there are 4 kinds of red/green color-blindness (1) protanopia (red-blind), (2) protanomaly (partial red-blind), (3) deuteranopia (green-blind) and (4) deuteranomaly (partial green-blind). Let the frequencies of the four types, in males, be p1, p2, p3 and p4.
Now, in females, the capacities for red and green vision are complementary; that is, a female with one red-deficient allele and one green-deficient allele has normal vision. (There is evidence for this in a woman with normal vision who had some sons with red-deficient vision and some with green-deficient vision.) However, the deuteranomaly allele is dominant to deuteranopia, and protanomaly is dominant to protanopia.

(c) Show the frequencies of the four vision deficiencies in females should be p1^2, p2^2 + 2 p1 p2, p3^2 and p4^2 + 2 p3 p4, respectively.
(d) Of the 725 color-vision deficient boys in Waaler's sample, he classified 80, 94, 93, and 458 of types (1), (2), (3) and (4) respectively. Of the 40 color-vision deficient girls, he classified 0, 3, 1, 36 of types (1), (2), (3) and (4) respectively. Are these observed proportions in accordance with the above model for the four color-blindness types?
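
One way to set up the comparison in part (d) is sketched below, using the simpler method: take p1, ..., p4 from the boys and apply the female frequencies from part (c). The variable names are assumptions, and the sketch stops at the expected counts; how to judge the fit is left open. Several expected counts are small, so a chi-square approximation should be treated with caution.

```python
boys_total, girls_total = 9049, 9072
boys_by_type = [80, 94, 93, 458]   # types (1) to (4) among boys
girls_by_type = [0, 3, 1, 36]      # types (1) to (4) among girls

# Male type frequencies estimate the four allele frequencies directly.
p1, p2, p3, p4 = (x / boys_total for x in boys_by_type)

# Female frequencies from part (c): the "-anomaly" allele is dominant
# to the "-anopia" allele within each of the red and green series.
female_freq = [
    p1 ** 2,
    p2 ** 2 + 2 * p1 * p2,
    p3 ** 2,
    p4 ** 2 + 2 * p3 * p4,
]
expected_girls = [girls_total * f for f in female_freq]

for kind, (obs, exp) in enumerate(zip(girls_by_type, expected_girls), start=1):
    print(f"type ({kind}): observed {obs}, expected {exp:.2f}")
```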

4. (Example, due to Weir.) Suppose a population consists of a mixture of two randomly-mating subpopulations of equal size. Consider a locus with three codominant alleles. In one subpopulation, the three alleles have frequencies p=0.6, q=0.3, and r=0.1, while in the other subpopulation they are p=0.4, q=0.1, and r=0.5. Which genotypes have a smaller frequency than they would have in a single random-mating population with the same overall allele frequencies?
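
The comparison can be tabulated with a short script. This is a sketch under stated assumptions: the allele labels A1, A2, A3 (with frequencies p, q, r respectively) and the helper hw_genotypes are hypothetical names, not part of the problem. Each subpopulation is given Hardy-Weinberg proportions, the two are averaged with equal weight, and the result is set beside the Hardy-Weinberg proportions at the pooled allele frequencies.

```python
from itertools import combinations_with_replacement

# Allele frequencies (p, q, r) in the two subpopulations; labels are hypothetical.
sub1 = {"A1": 0.6, "A2": 0.3, "A3": 0.1}
sub2 = {"A1": 0.4, "A2": 0.1, "A3": 0.5}


def hw_genotypes(freq):
    """Hardy-Weinberg genotype frequencies for a dict of allele frequencies."""
    out = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        out[(a, b)] = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
    return out


g1, g2 = hw_genotypes(sub1), hw_genotypes(sub2)

# Equal-sized subpopulations: the mixture is the plain average.
mixture = {g: 0.5 * (g1[g] + g2[g]) for g in g1}

# Single random-mating population with the same overall allele frequencies.
pooled = {a: 0.5 * (sub1[a] + sub2[a]) for a in sub1}
single = hw_genotypes(pooled)

for g in mixture:
    print(g, "mixture:", round(mixture[g], 4), "single:", round(single[g], 4))
```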