Homework 7: due Monday 12/6

1. (based on Lange, Chapter 7, number 7)
A couple, each of whom is unaffected, have two kids affected by a recessive disease. At a linked marker, the parents have four distinct alleles. One parent is ab, and the other is cd. Both the affected kids are bd at the marker, and the recombination probability between trait and marker is r. A fetus is typed as bc. Show that the probability the baby will be affected is
r(1-r)((1-r)^4 + r(1-r)R + r^4) / R^2
where R = r^2 + (1-r)^2.
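A quick numerical sanity check of the target formula can catch algebra slips before attempting the proof. The sketch below only evaluates the stated expression; the function name affected_probability is made up for illustration. With no recombination the fetus cannot be affected, and with free recombination the marker is uninformative, so the value should fall back to the usual 1/4.

```python
def affected_probability(r):
    """Evaluate the formula stated in problem 1 for recombination probability r."""
    R = r**2 + (1 - r)**2
    return r * (1 - r) * ((1 - r)**4 + r * (1 - r) * R + r**4) / R**2


# Limiting cases: complete linkage and free recombination.
print(affected_probability(0.0))   # no recombination
print(affected_probability(0.5))   # marker carries no information
```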

2. Information in families of size 2 for phase-unknown backcross.
In families in which one parent has a dominant disease, and is heterozygous for the marker, and the other parent is unaffected and homozygous for the marker, we saw that there is a small amount of information for linkage. Each family can be classified as one of two types, one type (in which one kid is recombinant and the other not) having probability 2r(1-r) where r is the recombination probability between the trait locus and the marker.
(a) Suppose n families (i.e. pairs of kids) are typed, and k families of the type having probability 2r(1-r) are observed. Show that we can reject the hypothesis of no linkage (r = 1/2) with a type I error probability of approximately 0.025 if k < (n/2) - sqrt(n).
(b) Suppose the true value of r is 0.2. How many families must be typed in order to have power 0.975 to detect linkage, using the test of (a)?
(c) What is the expected lod score (base 10) per family, if the true value of r is 0.2? About how many families would be needed to get a lod score of 3?
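The quantities in problem 2 are easy to explore numerically. The sketch below is not a solution method prescribed by the problem. It assumes k is Binomial(n, 1/2) when r = 1/2, and it assumes the per-family lod compares the likelihood at the true r with the likelihood at r = 1/2. The function names are invented for illustration.

```python
from math import comb, log10, sqrt


def type_one_error(n):
    """Exact P(k < n/2 - sqrt(n)) when k ~ Binomial(n, 1/2), i.e. when r = 1/2."""
    threshold = n / 2 - sqrt(n)
    favourable = sum(comb(n, k) for k in range(n + 1) if k < threshold)
    return favourable / 2**n


def expected_lod_per_family(r):
    """Expected base-10 lod of one family of two kids, evaluated at the true r.

    Assumption: the lod is log10 of the likelihood at r over that at r = 1/2.
    """
    p = 2 * r * (1 - r)          # probability of the "one recombinant" type
    return p * log10(p / 0.5) + (1 - p) * log10((1 - p) / 0.5)


print(type_one_error(100))           # compare with the nominal 0.025
print(expected_lod_per_family(0.2))  # small, since each family says little
```

Dividing a target lod score by the expected lod per family gives a rough count of families needed. It is only a rough count because it ignores the variability of the realized lod.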

3. (based on Lange, Chapter 7, number 10)
Suppose that, in the same situation as above, there is just one family, but with n kids (n at least 2). In this case the n kids can be divided into two groups, one receiving AB or ab from the heterozygous parent, and the other Ab or aB. One group is recombinant, and the other not, but we do not know which is which. Let k be the number of kids in the first group, and without loss of generality assume 2k is less than or equal to n.
(a) Assuming nothing is known about any population association between the alleles at the two loci, show that the likelihood is
(1/2) r^k (1-r)^(n-k) + (1/2) (1-r)^k r^(n-k)
(b) Show the maximum likelihood estimate of r is 0 if k=0, is 1/2 if (n-2k)^2 is less than or equal to n, and is between 0 and 1/2 otherwise.
(Hint: Look at the derivative of the likelihood at r=0, and r=1/2. Look at the second derivative to find whether the stationary point at r=1/2 is a max or a min.)
(c) Suppose the true r is strictly between 0 and 1/2. As family size n becomes very large (fruit flies maybe, not humans), show that this case becomes equivalent to the phase-known backcross case.
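The three cases in part (b) can be checked numerically before proving them. The following is a minimal sketch that maximizes the phase-unknown likelihood over a grid on [0, 1/2]. The grid search and the names likelihood and grid_mle are illustrative choices, not part of the problem.

```python
def likelihood(r, n, k):
    """Phase-unknown likelihood: the two phases are weighted 1/2 each."""
    return 0.5 * r**k * (1 - r)**(n - k) + 0.5 * (1 - r)**k * r**(n - k)


def grid_mle(n, k, steps=1000):
    """Approximate MLE of r on an evenly spaced grid over [0, 1/2]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: likelihood(r, n, k))


# n = 10 kids: k = 0, then (n-2k)^2 = 4 <= n, then (n-2k)^2 = 64 > n.
for k in (0, 4, 1):
    print(k, grid_mle(10, k))
```

A grid only locates the maximum to within its spacing, so it illustrates the claim rather than proving it.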

4. Suppose two full sibs (mice, not humans) have an offspring.
Find a formula for the two-locus inbreeding coefficient of the offspring, as a function of the recombination frequency r between the two loci.
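Any formula derived for problem 4 can be compared against a small gene-drop simulation. The sketch below assumes the two parents of the sibs are unrelated and non-inbred, labels their four haplotypes 0 to 3 at both loci, and counts how often the offspring carries ibd genes at both loci. The function names, iteration count, and seed are arbitrary illustrative choices.

```python
import random


def gamete(parent, r, rng):
    """Pass on one two-locus haplotype from a parent (a pair of haplotypes)."""
    strand = rng.randrange(2)        # strand copied at the first locus
    allele1 = parent[strand][0]
    if rng.random() < r:             # recombination between the two loci
        strand = 1 - strand
    allele2 = parent[strand][1]
    return (allele1, allele2)


def two_locus_inbreeding(r, n_iter=20000, seed=1):
    """Monte Carlo estimate of P(offspring of full sibs is ibd at both loci)."""
    rng = random.Random(seed)
    father = ((0, 0), (1, 1))        # founder haplotypes, distinct labels
    mother = ((2, 2), (3, 3))
    both = 0
    for _ in range(n_iter):
        sib1 = (gamete(father, r, rng), gamete(mother, r, rng))
        sib2 = (gamete(father, r, rng), gamete(mother, r, rng))
        child = (gamete(sib1, r, rng), gamete(sib2, r, rng))
        if child[0][0] == child[1][0] and child[0][1] == child[1][1]:
            both += 1
    return both / n_iter


for r in (0.0, 0.1, 0.5):
    print(r, two_locus_inbreeding(r))
```

At r = 0 the estimate should be close to the single-locus coefficient 1/4, and at r = 1/2 it should be close to 1/16, which gives two fixed points for checking a candidate formula.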

5. Suppose we are trying to map the locus for a rare recessive trait, with allele frequency 0.01. We have 20 affected individuals who are the offspring of first-cousin marriages (f=1/16) and 20 who are the offspring of second-cousin marriages (f=1/64). By good luck, we have a marker so tightly linked to the trait locus that the recombination frequency is essentially 0; at this marker there are four alleles, each having frequency 0.25.
(a) What is the probability that an affected offspring of a first-cousin marriage is ibd at the trait locus? And for the offspring of a second-cousin marriage?
(b) What is the probability that an affected offspring of a first-cousin marriage is homozygous at the marker locus? And for the offspring of a second-cousin marriage?
(c) What is the probability that all 40 affected individuals are homozygous at this marker locus?
(d) What is the expected lod score from this sample of 40 individuals?
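The conditional probabilities in problem 5 can be tabulated with a few lines of code. The sketch below rests on assumptions that are not stated in the problem: an affected individual is homozygous for the trait allele, non-ibd genes are drawn independently from the population, recombination is exactly 0 so the marker is ibd whenever the trait locus is, and marker alleles are independent of the trait allele when the genes are not ibd. Function names are illustrative.

```python
def prob_ibd_given_affected(f, q):
    """P(ibd at trait locus | affected), by Bayes' rule.

    Assumes P(affected | ibd) = q and P(affected | not ibd) = q**2.
    """
    return f * q / (f * q + (1 - f) * q**2)


def prob_marker_homozygous(f, q, marker_freqs):
    """P(homozygous at the marker | affected), assuming zero recombination."""
    p_ibd = prob_ibd_given_affected(f, q)
    hom_by_state = sum(p**2 for p in marker_freqs)   # homozygous without ibd
    return p_ibd + (1 - p_ibd) * hom_by_state


marker = [0.25] * 4
for f in (1 / 16, 1 / 64):
    print(f, prob_ibd_given_affected(f, 0.01), prob_marker_homozygous(f, 0.01, marker))
```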

Computing; for discussion only
1) Copy again the contents of /user0/thompson/Class578C/Test to your Test directory. (In particular, the Makefile and the autolink.c programs have been updated.)
2) make ultraclean (always good practice to tidy up)

3) make autolink.out
The output file, called twoloc_out, gives two-locus inbreeding coefficients for two different individuals in our standard jv-pedigree. One (531) is the person at the bottom. The other is one of his parents -- remember these parents are also offspring of a first-cousin marriage.
What do you notice, if anything, about these two-locus inbreeding coefficients?

4) make ultraclean (always good practice to tidy up)

5) make test_sim_ibd_link
This program simulates multilocus inbreeding in pedigrees. With the current version there are three small component pedigrees: the jv pedigree, the child of a first-cousin marriage, and the child of a quadruple-half-first-cousin marriage. The multilocus ibd of the individual at the bottom of the pedigree is scored.
The number of realizations is from that same file n_iter you used before.
6) more out7 (out7 is the output file for this one)
The format for this output is a bit different, and not well formulated yet. We have a binary outcome (in our case ibd (1) or non-ibd (0)) at each locus. So over three loci, we have eight outcomes 000, 001, ..., 111. We give the probabilities of these outcomes at overlapping trios of loci -- locus 0,1,2, then 1,2,3, then 2,3,4, ....
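To make the layout concrete, the sketch below lists the eight binary ibd outcomes for a trio of loci and the overlapping trios of locus indices for a given number of loci. It only illustrates the indexing described above and does not read or reproduce out7. The function names and the n_loci parameter are made up for this example.

```python
from itertools import product


def ibd_outcomes(n_loci=3):
    """All binary ibd patterns over n_loci loci, e.g. '000', '001', ..., '111'."""
    return ["".join(bits) for bits in product("01", repeat=n_loci)]


def overlapping_trios(n_loci):
    """Index triples (0,1,2), (1,2,3), ... covering n_loci loci."""
    return [(i, i + 1, i + 2) for i in range(n_loci - 2)]


print(ibd_outcomes())
print(overlapping_trios(6))
```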
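The six loci in this example are arranged as follows: locus 0 is unlinked to locus 1, which has recombination 0.2 with locus 2, which is unlinked to locus 3, which has recombination 0.1 with locus 4, which has recombination 0.1 with locus 5. So you always have one unlinked locus as a check, then a pair of linked loci, then a trio of linked loci.

7) make ultraclean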