STAT/BIOST 550; Homework 6:

STAT 550 (DL): Homework 6

1. (based on Crow and Kimura, Ch 4, #9)
This makes exact the notion due to Wright that gene identity by descent leads to correlations between relatives.
Consider a particular allele A with allele frequency q, and define indicator random variables I(g), for a gene g, where I(g)=1 if the allelic type of gene g is A, and 0 otherwise.
(a) Show E(I(g)) = q, and var(I(g)) = q(1-q)
(b) Show that for genes g1 and g2 segregating from B and from C, the correlation between I(g1) and I(g2) is the kinship coefficient between B and C.
(c) If g1 and g2 are the two genes in an individual, show that the variance of (I(g1)+I(g2)) is 2q(1-q)(1+f), where f is the inbreeding coefficient of the individual.
(d) If g1 and g2 are the genes in a parent B, and g2 and g3 are the genes in his child C (that is, the g2 gene is the one inherited by C from B), show that the correlation between (I(g1)+I(g2)) and (I(g2)+I(g3)) is (1+2f_C+f_B)/(2((1+f_C)(1+f_B))^(1/2)), where f_B is the inbreeding coefficient of B, and f_C is the inbreeding coefficient of C.
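
A numerical sanity check of the variance formula in part (c) can be sketched as follows. It is not a proof. It assumes the usual identity-by-descent model: with probability f the two genes are identical by descent and so share one allelic type, and otherwise they are independent draws with allele frequency q. The function names are illustrative only.

```python
def var_sum_exact(q, f):
    """Variance of I(g1) + I(g2), computed from the joint distribution.

    Assumed model: with probability f the two genes are IBD (same type),
    otherwise they are independent, each of type A with probability q.
    """
    p_both = f * q + (1 - f) * q * q      # P(I(g1) = 1 and I(g2) = 1)
    p_one = 2 * (1 - f) * q * (1 - q)     # P(exactly one indicator is 1)
    mean = 2 * p_both + p_one             # E(I(g1) + I(g2))
    second_moment = 4 * p_both + p_one    # E((I(g1) + I(g2))^2)
    return second_moment - mean ** 2


def var_sum_formula(q, f):
    """The closed form stated in part (c)."""
    return 2 * q * (1 - q) * (1 + f)


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5):
        for f in (0.0, 0.0625, 0.25):
            print(q, f, var_sum_exact(q, f), var_sum_formula(q, f))
```

The two columns printed should agree up to floating-point rounding; the algebra behind that agreement is what part (c) asks you to write out.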

2. This question relates to the same study you saw in Homework-4, by Dr. Arno Motulsky and coworkers, and published in Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124). There were three population samples, all from around Seattle (Caucasian, African American, and Japanese American), and three tightly linked diallelic loci, designated M, P and S.
In the Japanese subsample, the 68 individuals (136 chromosomes) were scored for all three loci. As in the Homework-4 question, there were few double-heterozygotes for any pair of the loci, so you can treat the estimated sample haplotype frequencies as though they were observed.
(a) For loci S and M, the estimated haplotype frequencies are 0.551, 0.082, 0.008, 0.360.
Is there evidence for disequilibrium between loci S and M?
(b) For loci P and M, the estimated haplotype frequencies are 0.514, 0.398, 0.045, 0.043.
Is there evidence for disequilibrium between loci P and M?
(c) For loci S and P, the estimated haplotype frequencies are 0.592, 0.041, 0.320, 0.047.
Is there evidence for disequilibrium between loci S and P?
(You can use a 2-by-2 contingency table chi-squared to test for significantly non-zero associations.)
(The interesting fact is that all three loci are known to be very tightly linked, and locus P is between the other two.)
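
One way to set up the 2-by-2 chi-squared test from haplotype frequencies is sketched below. It assumes the four frequencies are listed in the order (1-1, 1-2, 2-1, 2-2), so that the first two share an allele at the first locus and the first and third share an allele at the second locus; the question does not state the order, so check it against the Homework-4 convention. The frequencies are renormalised in case of rounding, and the function name `ld_chi2` is illustrative.

```python
import math


def ld_chi2(freqs, n_chrom):
    """Disequilibrium D, Pearson chi-squared (1 df), and p-value.

    freqs: haplotype frequencies in the ASSUMED order (11, 12, 21, 22).
    n_chrom: number of chromosomes sampled.
    """
    total = sum(freqs)
    p11, p12, p21, p22 = (p / total for p in freqs)  # guard against rounding
    p_row = p11 + p12            # allele 1 frequency at the first locus
    p_col = p11 + p21            # allele 1 frequency at the second locus
    d = p11 - p_row * p_col      # disequilibrium coefficient
    # For a 2-by-2 table, Pearson's statistic equals n D^2 / (pA qA pB qB).
    chi2 = n_chrom * d * d / (p_row * (1 - p_row) * p_col * (1 - p_col))
    # Upper tail of chi-squared with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return d, chi2, p_value


if __name__ == "__main__":
    # Part (a), loci S and M, with 136 chromosomes.
    print(ld_chi2((0.551, 0.082, 0.008, 0.360), 136))
```

Where a haplotype is very rare, the expected count in that cell is small and the chi-squared approximation may be rough, so the p-value should be read with that caveat.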

3. In a simple backcross experiment between two inbred lines, hybrid AB/ab individuals are crossed with ab/ab individuals, and the numbers of recombinant offspring are counted.
Among a total of 120 offspring in which the hybrid individual was female, 50 were of the recombinant types.
Among 90 offspring in which the hybrid individual was male, 38 were of the recombinant types.
(a) Is there evidence for linkage, using only the data on the offspring of hybrid males?
(b) Is there evidence for linkage, using only the data on the offspring of hybrid females?
(c) Assuming male and female recombination frequencies are the same, is there evidence for linkage?
(d) Is there evidence from this experiment that male and female recombination frequencies differ?
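
The tests in this question can be sketched with two small chi-squared statistics, each on 1 degree of freedom. This is one possible approach, not the only one: the first function tests recombinant versus non-recombinant counts against the 1:1 ratio expected with no linkage, and the second is the usual 2-by-2 contingency statistic comparing the two sexes. All function names are illustrative.

```python
import math


def chi2_pvalue_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def linkage_chi2(recomb, total):
    """Goodness of fit to 1:1 recombinant : non-recombinant (r = 1/2)."""
    nonrecomb = total - recomb
    return (nonrecomb - recomb) ** 2 / total


def heterogeneity_chi2(rec1, tot1, rec2, tot2):
    """2-by-2 contingency chi-squared: do two groups share the same r?"""
    a, b = rec1, tot1 - rec1
    c, d = rec2, tot2 - rec2
    n = tot1 + tot2
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    female = (50, 120)   # recombinants, total offspring (hybrid female)
    male = (38, 90)      # recombinants, total offspring (hybrid male)
    pooled = (female[0] + male[0], female[1] + male[1])
    for label, (rec, tot) in (("female", female), ("male", male), ("pooled", pooled)):
        x = linkage_chi2(rec, tot)
        print(label, x, chi2_pvalue_1df(x))
    x = heterogeneity_chi2(*female, *male)
    print("heterogeneity", x, chi2_pvalue_1df(x))
```

Since linkage corresponds to r < 1/2 only, a one-sided version of the linkage test (halving the p-value when fewer than half the offspring are recombinant) is arguably more appropriate; the sketch reports the two-sided value.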

4. Suppose we sample families with two offspring, in which one parent is heterozygous for a dominant genetic disease allele, and marker type ab, while the unaffected spouse has marker type aa. The families are divided into two groups:
Group I: the two offspring share the same disease status, or the same marker type, but not both
Group II: all other combinations.
Suppose the recombination frequency between the disease locus and the marker locus is r.

(a) Show that the probability a given family is of Group-I type is s = 2 r (1-r).
(b) Suppose n families are sampled, and k are of Group-I type. Show that the log-likelihood is, up to an additive constant,
k log s + (n-k) log(1-s), or equivalently k log(2r(1-r)) + (n-k) log(1 - 2r(1-r)).
(c) Show that, provided k is not bigger than n/2, the maximum likelihood estimate of r is (1 - sqrt(1 - 2k/n))/2.
(d) It can be shown that the Fisher information for estimating r is ( 2 n (1-2 r)² )/ (r (1-r) (1 - 2 r (1-r)) )
You do not need to show this.
What does this tell you about estimating r, when in fact r is close to 1/2 ?
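
The estimate in part (c) and the information in part (d) are easy to tabulate, which may help in thinking about the last question. The sketch below uses the formulas exactly as stated above. The function names are illustrative, and the handling of k > n/2 (returning r = 1/2) is an added assumption for the case the question excludes: s = 2r(1-r) cannot exceed 1/2, so the likelihood is then maximised at the boundary.

```python
import math


def s_of_r(r):
    """Probability that a family is of Group-I type."""
    return 2 * r * (1 - r)


def mle_r(k, n):
    """Maximum likelihood estimate of r from k Group-I families out of n."""
    if 2 * k > n:
        # Outside the case covered by part (c): boundary value, an assumption.
        return 0.5
    return (1 - math.sqrt(1 - 2 * k / n)) / 2


def fisher_information(r, n):
    """Fisher information for r, as given in part (d); requires 0 < r < 1."""
    return 2 * n * (1 - 2 * r) ** 2 / (r * (1 - r) * (1 - 2 * r * (1 - r)))


if __name__ == "__main__":
    n = 100
    for r in (0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.49, 0.5):
        print(r, fisher_information(r, n))
```

Printing the information over a grid of r values, as in the loop above, shows how it behaves as r approaches 1/2; recall that the large-sample variance of the estimate is roughly the reciprocal of the information.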