1. (based on Crow and Kimura, Ch 4, #9)
This makes exact the notion due to Wright that gene identity by
descent leads to correlations between relatives.
Consider a particular allele A with allele frequency q, and define
indicator random variables I(g), for a gene g, where I(g)=1
if the allelic type of gene g is A, and 0 otherwise.
(a) Show E(I(g)) = q, and var(I(g)) = q(1-q)
(b) Show that for genes g1 and g2 segregating from B and from C,
the correlation between I(g1) and I(g2) is the kinship
coefficient between B and C.
(c) If g1 and g2 are the two genes in an individual, show that
the variance of (I(g1)+I(g2)) is 2q(1-q)(1+f), where f is the
inbreeding coefficient of the individual.
(d) If g1 and g2 are the genes in a parent B, and g2 and g3 are the
genes in his child C (that is, the g2 gene is the one inherited
by C from B), show that the correlation between (I(g1)+I(g2)) and
(I(g2)+I(g3)) is
(1 + 2 f_C + f_B) / (2 sqrt((1 + f_C)(1 + f_B))),
where f_B is the inbreeding coefficient of B, and f_C
is the inbreeding coefficient of C.
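As a numerical sanity check on part (c), not a proof, the variance can be computed exactly by enumerating the joint distribution of (I(g1), I(g2)). The sketch below assumes the usual model in which the two genes are identical by descent with probability f (and so share an allelic type) and are otherwise independent draws with allele frequency q. The function name is illustrative and does not come from the problem.

```python
from itertools import product


def genotype_sum_variance(q, f):
    """Exact variance of I(g1) + I(g2) under an assumed IBD mixture model.

    Assumption: with probability f the two genes are identical by descent
    (hence of the same allelic type); otherwise they are independent.
    """
    probs = {}
    for i1, i2 in product((0, 1), repeat=2):
        p1 = q if i1 else 1 - q
        p2 = q if i2 else 1 - q
        independent = p1 * p2
        ibd = p1 if i1 == i2 else 0.0
        probs[(i1, i2)] = f * ibd + (1 - f) * independent

    mean = sum(p * (i1 + i2) for (i1, i2), p in probs.items())
    second_moment = sum(p * (i1 + i2) ** 2 for (i1, i2), p in probs.items())
    return second_moment - mean ** 2
```

Comparing `genotype_sum_variance(q, f)` with `2 * q * (1 - q) * (1 + f)` for a few values of q and f is a quick way to check an algebraic derivation.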
2. This question relates to the same
study you saw in Homework-4, by Dr. Arno Motulsky and coworkers, and
published in Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124).
There were three population samples, all from around Seattle
(Caucasian, African American, and
Japanese American), and three tightly linked diallelic loci,
designated M, P and S.
In the Japanese subsample, the 68 individuals (136 chromosomes)
were scored at all three loci. As in the Homework-4 question, there were few
double-heterozygotes for any pair of the loci. So you can treat
the estimated sample haplotype frequencies as though they were observed.
(a) For loci S and M, the estimated haplotype frequencies are
0.551, 0.082, 0.008, 0.360.
Is there evidence for disequilibrium between loci S and M?
(b) For loci P and M, the estimated haplotype frequencies are
0.514, 0.398, 0.045, 0.043.
Is there evidence for disequilibrium between loci P and M?
(c)
For loci S and P, the estimated haplotype frequencies are
0.592, 0.041, 0.320, 0.047.
Is there evidence for disequilibrium between loci S and P?
(You can use a 2-by-2 contingency table chi-squared to test for significantly
non-zero associations.)
(The interesting fact is that all three loci are known to be very tightly
linked, and locus P is between the other two.)
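One possible way to set up the suggested 2-by-2 chi-squared test is sketched below. It assumes the four frequencies are listed in the order (11, 12, 21, 22), that is, both cells of the first row and then both cells of the second; the problem does not state the order, so check this against the Homework-4 convention. No continuity correction is applied, and the function name is illustrative.

```python
def ld_chi_squared(freqs, n_chromosomes):
    """Disequilibrium D and the 1-df Pearson chi-squared for a 2x2 haplotype table.

    Assumption: freqs = (p11, p12, p21, p22), read row by row.
    """
    total = sum(freqs)  # renormalise in case of rounding (e.g. a sum of 1.001)
    p11, p12, p21, p22 = (p / total for p in freqs)

    row1 = p11 + p12  # frequency of the first allele at the first locus
    col1 = p11 + p21  # frequency of the first allele at the second locus
    d = p11 * p22 - p12 * p21  # equals p11 - row1 * col1

    # Pearson chi-squared for a 2x2 table, written in terms of D.
    chi2 = n_chromosomes * d * d / (row1 * (1 - row1) * col1 * (1 - col1))
    return d, chi2
```

With the 136 chromosomes of the Japanese subsample, a call such as `ld_chi_squared((0.551, 0.082, 0.008, 0.360), 136)` gives the statistic for part (a), to be compared with 3.84, the 5% critical value of chi-squared on 1 degree of freedom.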
3. In a simple backcross experiment between two inbred lines, hybrid
AB/ab individuals are crossed with ab/ab individuals, and the numbers of
recombinant offspring are counted.
Among a total of
120 offspring in which the hybrid individual was female, 50 were of
the recombinant types.
Among 90 offspring in which the hybrid individual was male, 38 were of
the recombinant types.
(a) Is there evidence for linkage, using only the data on the offspring of
hybrid males?
(b) Is there evidence for linkage, using only the data on the offspring of
hybrid females?
(c) Assuming male and female recombination frequencies are the same,
is there evidence for linkage?
(d) Is there evidence from this experiment that male and female
recombination frequencies differ?
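The tests in parts (a) to (d) can be set up as chi-squared statistics on 1 degree of freedom. The following is a minimal sketch, assuming a goodness-of-fit test against r = 1/2 for linkage and a 2-by-2 Pearson test for the male/female comparison. Both function names are illustrative, and other choices (an exact binomial test, or a one-sided test, since linkage implies r < 1/2) are equally reasonable.

```python
def linkage_chi_squared(recombinants, total):
    """1-df chi-squared comparing the recombinant count with the r = 1/2 expectation."""
    expected = total / 2
    nonrecombinants = total - recombinants
    return ((recombinants - expected) ** 2
            + (nonrecombinants - expected) ** 2) / expected


def heterogeneity_chi_squared(rec1, n1, rec2, n2):
    """1-df Pearson chi-squared for equal recombination fractions in two groups."""
    pooled = (rec1 + rec2) / (n1 + n2)  # common estimate under the null
    chi2 = 0.0
    for rec, n in ((rec1, n1), (rec2, n2)):
        for observed, p in ((rec, pooled), (n - rec, 1 - pooled)):
            expected = n * p
            chi2 += (observed - expected) ** 2 / expected
    return chi2
```

For example, `linkage_chi_squared(38, 90)` uses only the offspring of hybrid males, `linkage_chi_squared(88, 210)` pools both sexes, and `heterogeneity_chi_squared(50, 120, 38, 90)` compares the two sexes.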
4. Suppose we sample families with two offspring, in which
one parent is heterozygous for a dominant genetic disease allele,
and marker type ab,
while the unaffected spouse has marker type aa. The families are
divided into two groups:
Group I: the two offspring
share the same disease status, or the same marker type,
but not both.
Group II: all other combinations.
Suppose the recombination frequency between the disease locus and the
marker locus is r.
(a) Show that the probability a given family is of Group-I type is
s = 2 r (1-r).
(b) Suppose n families are sampled, and k are of Group-I type.
Show that the log-likelihood is, up to an additive
constant,
k log s + (n-k) log (1-s) or
k log(2 r (1-r )) + (n-k) log( 1 - 2 r (1-r ))
(c) Show that, provided k is not bigger than n/2
the maximum likelihood estimate
of r
is (1 - sqrt(1 - 2k/n))/2 .
(d) It can be shown that the Fisher information for estimating r is
2 n (1 - 2r)^2 / ( r (1-r) (1 - 2r(1-r)) )
You do not need to show this.
What does this tell you about estimating r, when in fact r
is close to 1/2 ?
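The estimate in part (c) and the information in part (d) are easy to evaluate numerically. The sketch below simply codes the two formulas as stated. The function names are illustrative, and the case k > n/2, which the problem excludes, is rejected rather than handled.

```python
import math


def mle_r(k, n):
    """Maximum likelihood estimate of r from k Group-I families out of n."""
    if 2 * k > n:
        # The closed form is stated only for k <= n/2.
        raise ValueError("formula requires k <= n/2")
    return (1 - math.sqrt(1 - 2 * k / n)) / 2


def fisher_information(r, n):
    """Fisher information for r from n families, using the stated formula."""
    s = 2 * r * (1 - r)  # probability that a family is of Group-I type
    return 2 * n * (1 - 2 * r) ** 2 / (r * (1 - r) * (1 - s))
```

Evaluating `fisher_information(r, n)` at a few values of r approaching 1/2, and comparing with values at small r, gives a concrete feel for the behaviour part (d) asks about.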