2. Information in families of size 2 for phase-unknown backcross.
In families in which one parent has a dominant disease, and is
heterozygous for the marker, and the other parent is unaffected and
homozygous for the marker, we saw that there is a small amount of
information for linkage. Each family can be classified as one of two
types, one type (in which one kid is recombinant and the other not)
having probability 2r(1-r) where r is the recombination probability
between the trait locus and the marker.
(a) Suppose n families (i.e. pairs of kids) are typed, and k families of
the type having probability 2r(1-r) are observed. Show that we can reject
the null hypothesis of no linkage (r = 1/2) with a type I error
probability of approximately 0.025 if k < (n/2) - sqrt(n).
(b) Suppose the true value of r is 0.2. How many families must be typed
in order to have power 0.975 to detect linkage, using the test of (a)?
(c) What is the expected lod score (base 10)
per family, if the true value of r is 0.2? About how many families would
be needed to get a lod score of 3?
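A numerical sanity check of the rejection rule in 2(a) can be sketched as follows. It assumes that under no linkage (r = 1/2) each family is of the 2r(1-r) type with probability 1/2, so that k is Binomial(n, 1/2). The function name type_one_error and the sample sizes tried are illustrative choices, not part of the assignment, and the check does not replace the requested argument.

```python
from math import comb, sqrt


def type_one_error(n: int) -> float:
    """Exact P(K < n/2 - sqrt(n)) for K ~ Binomial(n, 1/2), i.e. under r = 1/2."""
    threshold = n / 2 - sqrt(n)
    # Count the equally likely outcomes that fall strictly below the threshold.
    favourable = sum(comb(n, k) for k in range(n + 1) if k < threshold)
    return favourable / 2 ** n


# Illustrative sample sizes (assumed, not from the problem).
for n in (25, 100, 400):
    print(n, round(type_one_error(n), 4))
```

Because k is discrete, the exact error rate only approximates 0.025 and varies with n.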
3. (based on Lange, Chapter 7, number 10)
Suppose that in the same situation as above there is just one family, but
with n kids (n at least 2). In this case the n kids can be divided into
two groups, one receiving AB or ab from the heterozygous
parent, and the other Ab or aB. One group is recombinant, and the other
not, but we do not know which is which. Let k be the number of kids in
the first group, and without loss of generality assume 2k is less than or
equal to n.
(a) Assuming nothing is known about any population association between the
alleles at the two loci, show that the likelihood is
(1/2) r^k (1-r)^{n-k} + (1/2) (1-r)^k r^{n-k}
(b) Show the maximum likelihood estimate of r is 0 if k=0,
is 1/2 if (n-2k)^2 is less than or equal to n, and is between
0 and 1/2 otherwise.
(Hint: Look at the derivative of the likelihood at r=0, and r=1/2. Look
at the second derivative to find whether the stationary point at r=1/2 is
a max or a min.)
(c) Suppose the true r is strictly between 0 and 1/2. As family
size n becomes very large (fruit flies maybe, not humans), show that this
case becomes equivalent to the phase-known backcross case.
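To get a feel for the three cases in 3(b) before working through the derivatives, the likelihood of 3(a) can be evaluated on a grid over [0, 1/2]. This is only a rough numerical sketch: the function names likelihood and grid_mle, the grid resolution, and the example values of n and k are assumptions made for illustration, and a grid search does not prove the result.

```python
def likelihood(r: float, n: int, k: int) -> float:
    """Phase-unknown likelihood: (1/2) r^k (1-r)^(n-k) + (1/2) (1-r)^k r^(n-k)."""
    return 0.5 * r**k * (1 - r) ** (n - k) + 0.5 * (1 - r) ** k * r ** (n - k)


def grid_mle(n: int, k: int, steps: int = 5000) -> float:
    """Grid point in [0, 1/2] with the largest likelihood (approximate MLE)."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: likelihood(r, n, k))


# Illustrative cases (assumed values): k = 0, (n-2k)^2 <= n, and (n-2k)^2 > n.
for n, k in ((10, 0), (10, 4), (10, 1)):
    print(n, k, (n - 2 * k) ** 2 <= n, grid_mle(n, k))
```

Comparing the printed estimate with the condition on (n-2k)^2 for a few more (n, k) pairs is a quick way to check an analytic answer.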
4. Suppose two full sibs (mice, not humans) have an offspring.
Find a formula for the two-locus inbreeding coefficient of the offspring,
as a function of the recombination frequency r between the two loci.
5. Suppose we are trying to map the locus for a rare recessive trait, with
allele frequency 0.01. We have 20 affected individuals who are the
offspring of first-cousin marriages (f=1/16) and 20 who are the offspring
of second-cousin marriages (f=1/64). By good luck, we have a marker so
tightly linked to the trait locus that the recombination frequency is
essentially 0; at this marker there are four alleles, each having
frequency 0.25.
(a) What is the probability that an affected offspring of a first-cousin
marriage is ibd at the trait locus? And for the offspring of a
second-cousin marriage?
(b) What is the probability that an affected offspring of a first-cousin
marriage is homozygous at the marker locus? And for the offspring of a
second-cousin marriage?
(c) What is the probability that all 40 affected individuals are
homozygous at this marker locus?
(d) What is the expected lod score from this sample of 40 individuals?
Computing; for discussion only
1) Copy the contents of /user0/thompson/Class578C/Test to your Test
directory again. (In particular, the Makefile and autolink.c have been
updated.)
2) make ultraclean (always good practice to tidy up)
3) make autolink.out
The output file, called twoloc_out, gives two-locus inbreeding
coefficients for two different individuals in our standard jv-pedigree.
One (531) is the person at the bottom. The other is one of his parents --
remember these parents are also offspring of a first-cousin marriage.
What do you notice, if anything, about these two-locus inbreeding
coefficients?
4) make ultraclean (always good practice to tidy up)
5) make test_sim_ibd_link
This program simulates multilocus inbreeding in pedigrees. With the
current version there are three small component pedigrees: the jv
pedigree, the child of a first-cousin marriage, and the child of a
quadruple-half-first-cousin marriage. The multilocus ibd of the
individual at the bottom of the pedigree is scored.
The number of realizations is read from the same file, n_iter, that you
used before.
6) more out7 (out7 is the output file for this one)
The format for this output is a bit different, and not well formulated yet.
We have a binary outcome (in our case ibd (1) or non-ibd (0)) at each locus.
So over three loci, we have eight outcomes 000, 001, ..., 111. We give the
probabilities of these outcomes at overlapping trios of loci
-- locus 0,1,2, then 1,2,3, then 2,3,4, ....
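The six loci in this example are arranged in a chain: locus 0 is unlinked
to locus 1, locus 1 has recombination 0.2 with locus 2, locus 2 is
unlinked to locus 3, locus 3 has recombination 0.1 with locus 4, and
locus 4 has recombination 0.1 with locus 5. So you always have one
unlinked locus as a check, then a linked pair, then a linked trio.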
7) make ultraclean
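The layout of out7 described in step 6 can be made concrete with a small sketch that lists the eight ibd outcomes and the overlapping trios of loci. It does not read out7 or reproduce its file format. The names ADJACENT_RECOMB, ibd_outcomes and overlapping_trios are hypothetical, and the recombination values encode one reading of step 6, with 0.5 standing for "unlinked".

```python
from itertools import product

# Assumed reading of step 6: recombination fraction between loci i and i+1,
# with 0.5 standing for "unlinked".
ADJACENT_RECOMB = [0.5, 0.2, 0.5, 0.1, 0.1]
N_LOCI = len(ADJACENT_RECOMB) + 1  # six loci, numbered 0..5


def ibd_outcomes(width: int = 3) -> list:
    """All ibd (1) / non-ibd (0) patterns over `width` loci: 000, 001, ..., 111."""
    return ["".join(bits) for bits in product("01", repeat=width)]


def overlapping_trios(n_loci: int = N_LOCI) -> list:
    """Consecutive trios of loci: (0,1,2), (1,2,3), ..."""
    return [(i, i + 1, i + 2) for i in range(n_loci - 2)]


for trio in overlapping_trios():
    first = trio[0]
    # The two adjacent recombination fractions spanned by this trio.
    print(trio, "adjacent recombination:", ADJACENT_RECOMB[first:first + 2])
print(ibd_outcomes())
```

Under this reading, only the last trio (3,4,5) consists entirely of linked loci, while every other trio spans at least one unlinked interval, which is what makes those trios usable as a check.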