lab 5; 2012

Stat550: Lab 5: MORGAN program lm_linkage

Finally we have another MORGAN program, which uses MCMC to estimate lod scores across a chromosome of markers. This is the program lm_linkage, and we will use essentially the same example used for lm_auto in Lab-4. You can find out more about lm_linkage in the first part of Chapter 11 of the Tutorial.

Download the following four files to the IBD subdirectory of your MORGAN_Examples directory.

The pedigree file ped30,
the first lm_linkage parameter file lm_link1.par,
the second lm_linkage parameter file lm_link2.par,
and, as before, a seed file sampler.seed.

You do not have to keep the same file names when you download them, but those are the names I will use in describing them, so you will find it easier if you do. For information about how the seed file works, see Lab-4.

The Pedigree file: ped30

Since we are computing linkage lod scores, we require both trait and marker data. The first pedigree of the ped45 file had no trait data, so we now use ped30, which consists only of the second and third components of ped45. The trait and marker data on these components are exactly as before.

The parameter files: lm_link1.par and lm_link2.par

The file lm_link1.par is almost exactly the same as the lm_auto.par of Lab-4. The only differences are that the proband gamete statements have been removed, and instead we have two statements that specify the locations for lod score computation:
map test tloc 11 all interval proportions 0.2 0.5 0.8
map test tloc 11 external recomb frac 0.05 0.2 0.4
This says that the lod scores will be computed at 3 points within each marker interval, at 20%, 50% and 80% of the interval, and also at 3 points beyond each end of the map, at recombination fractions 0.05, 0.2 and 0.4 before the first marker and after the last.

The only difference between the two parameter files is that in lm_link2.par the marker allele frequencies have been changed, so that the C allele (allele "3"), which may be segregating with the recessive disease allele in this family, now has a higher frequency (and the other marker alleles correspondingly lower frequencies, since they must sum to 1). All markers now have the same allele frequencies.

A: Running the lm_linkage program:

Run lm_linkage twice, by typing
% lm_linkage lm_link1.par > link1.out
and
% lm_linkage lm_link2.par > link2.out
This may print some warning messages to the screen (perhaps about seeds), but the main output will go to link1.out and link2.out. The program generates quite a bit of output, so it is probably easier to look at it in a file.

Now look at the output. Much of MORGAN's output consists of the program telling you what it understood you to have asked it to do. This can be a bit tedious, but is well worth checking! Most of the silly errors we have made in running MORGAN arose because something was slightly wrong in a parameter file, and the program did exactly what the parameter file said.

The initial output is now as before, except that in addition to the marker map it prints additional maps showing the locations ("T") at which lod scores will be computed. All interval proportions and recombination fractions are converted to centiMorgans.

Then it summarizes all the other input just as in Lab-4, and finally it runs its MCMC and prints out a table of the estimated lod scores. The table gives the cM positions (setting the first marker as the origin, since we did not tell it otherwise), then the base-10 lod score (which is the key output), and finally an estimated Monte Carlo standard error (which you need not worry about).
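As a reminder of what a base-10 lod score means: it is the log (base 10) of the ratio of the likelihood with the trait locus at the test position to the likelihood with the trait locus unlinked to the markers. The short Python sketch below is only an illustration of that definition, not part of MORGAN, and the function names are made up here.

```python
import math


def lod_to_likelihood_ratio(lod):
    """Likelihood ratio (linked at the test position vs. unlinked) for a base-10 lod."""
    return 10.0 ** lod


def likelihood_ratio_to_lod(ratio):
    """Base-10 lod score corresponding to a positive likelihood ratio."""
    if ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(ratio)
```

So a lod score of 3 corresponds to the data being 1000 times more probable with the trait locus at that position than with it unlinked, while a negative lod score means the data are less probable under linkage at that position.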

Note that lod scores are not computed at the marker locations. This is because our trait is genotypic, and so it is possible to generate marker inheritances that are inconsistent with the trait. If we were using a continuous trait, or any trait where each phenotype can arise from any genotype (with varying probabilities), then we would also get lod scores at the marker locations.

Question 1 Compare the lod scores between the two component pedigrees, in each of your two output files. Recall the only difference between the two components is that in the second we assume we know that certain grandparents of the affected child carry the recessive disease allele.
What is the effect on the lod scores?
How does this relate to the results you found for this same data change in lm_auto in Lab-4?

Question 2 Compare the lod scores in your two separate output files. Recall the only difference between these was a change in the allele frequencies at the markers.
What is the effect on the lod scores?
Explain this change, qualitatively, given that the marker allele associated with the disease in this family now has a higher frequency.

Write a brief paragraph giving your answers to these questions, and/or summarizing anything else you find interesting in the results.