For the last lab we have another MORGAN program, which uses MCMC to estimate lod scores across a chromosome of markers. This is the program lm_linkage. You can find out more about lm_linkage in the first part of Chapter 11 of the MORGAN Tutorial.
Note: The required parameter and pedigree files to run the examples below have been added to /datafiles/Lab4_auto on statgen.stat.washington.edu.
We will use the same example used for lm_auto in Lab-4. If you were unlucky and
(i) your affected inbred individual was homozygous at neither marker-2 nor marker-3, and/or
(ii) your pair of affected bilateral relatives had the same genotype at neither marker-2 nor marker-3,
you may want to rerun markerdrop to see if you can have better luck.
It may take several tries; you are basically hoping to get data where your affected individuals are aa.
However, the frequency of aa in the population is 0.0001, and most people are AA at the trait locus. The aa people are almost certainly affected, but the AA people are affected with probability 0.0001. The inbreeding and relatedness of the affected individuals increase our chances of finding them aa, but ...
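To see how much inbreeding can help, here is a minimal sketch of the standard calculation of the prior probability of genotype aa, before conditioning on affection status. It assumes Hardy-Weinberg proportions in the population, so that an aa frequency of 0.0001 corresponds to an allele frequency q = 0.01. The inbreeding coefficient f = 1/16 (offspring of first cousins) is a purely illustrative value and is not taken from the lab pedigree.

```python
import math

def prob_aa(q, f=0.0):
    """Prior P(genotype aa) for allele frequency q and inbreeding coefficient f.

    With probability f the two alleles are identical by descent (both are a
    with probability q); otherwise they are independent draws (both are a
    with probability q**2).
    """
    return f * q + (1.0 - f) * q * q

q = math.sqrt(0.0001)     # frequency of allele a, assuming Hardy-Weinberg
f_example = 1.0 / 16.0    # hypothetical value: offspring of first cousins

p_outbred = prob_aa(q)
p_inbred = prob_aa(q, f_example)
print(f"P(aa), outbred individual : {p_outbred:.6f}")
print(f"P(aa), inbred with f=1/16 : {p_inbred:.6f}")
print(f"ratio                     : {p_inbred / p_outbred:.1f}")
```

Even under this assumption the inbred individual is aa with probability well below 1%, which is why several tries of markerdrop may be needed.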
Recall, I was lucky, with marker data
I will again show the lab by working through my own example.
To run lm_linkage you first need the input files you had before:
This file is almost exactly the same as the lm_auto_3.par of Lab-4. The only differences are that I have removed redundant statements about trait-1 and trait-2 (we will use only the full trait data, trait-3) and that the proband gamete statements have been replaced by statements that specify the locations for lod score computation:
map test tloc 33 all interval proportions 0.2 0.5 0.8
map test tloc 33 external recomb frac 0.05 0.2 0.4
This says that the lod scores will be computed at 3 points in each marker interval, at 20%, 50% and 80% of the interval, and also at 3 points beyond each end of the map, at recombination fractions 0.05, 0.2 and 0.4 before the first marker and after the last.
Now look at the output link1.out. Much of MORGAN's output simply reports back what it understood you to have asked for. This can be a bit tedious, but is well worth checking! Most of the silly errors we have made in running MORGAN arose because something was slightly wrong in a parameter file, and the program did exactly what the parameter file said.
The initial output is now as before, except that in addition to the marker map it prints additional maps showing the locations ("T") at which lod scores will be computed. All interval proportions and recombination fractions are converted to centiMorgans.
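The conversion to centiMorgans can be sketched as follows. This is an illustration only, not MORGAN's own code: it assumes the Haldane map function (check the echoed parameters in your output for the map function actually in use), it assumes the interval proportions apply to the cM length of each interval, and the marker positions are hypothetical values chosen for the example.

```python
import math

def haldane_cm(theta):
    """Genetic distance in cM for recombination fraction theta (Haldane)."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

def lod_positions(markers_cm, proportions, external_thetas):
    """Test positions (cM) inside each marker interval and beyond both ends."""
    positions = []
    # Points before the first marker, listed from farthest to nearest.
    for theta in sorted(external_thetas, reverse=True):
        positions.append(markers_cm[0] - haldane_cm(theta))
    # Points inside each interval between adjacent markers.
    for left, right in zip(markers_cm, markers_cm[1:]):
        for p in proportions:
            positions.append(left + p * (right - left))
    # Points after the last marker, listed from nearest to farthest.
    for theta in sorted(external_thetas):
        positions.append(markers_cm[-1] + haldane_cm(theta))
    return positions

markers_cm = [0.0, 10.0, 20.0, 30.0, 40.0]   # hypothetical 5-marker map
positions = lod_positions(markers_cm, [0.2, 0.5, 0.8], [0.05, 0.2, 0.4])
print(", ".join(f"{x:.2f}" for x in positions))
```

With five markers this gives 3 points in each of the 4 intervals plus 3 points beyond each end, 18 test positions in all. Positions before the first marker come out negative because the first marker is taken as the origin.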
Then it summarizes all the other input just as in Lab-4, and finally it runs its MCMC and prints out a table of the estimated lod scores. The table gives the cM positions (setting the first marker as the origin, since we did not tell it otherwise), then the base-10 lod score (which is the key output), and finally an estimated Monte Carlo standard error (which you need not worry about, except to note that it should be quite small).
Note that it finds quite a good signal (although on this one pedigree we would not get a lod score of 3!). It does not pinpoint the trait locus, and this is characteristic: with few meioses, there are few recombinants that can differentiate exactly which loci the DNA of the trait appears to be segregating with. But we do get a lod score close to 1.5 from marker-2 to beyond marker-4. The highest lod score (1.81) is at marker-4.
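Since the lod score is a base-10 logarithm of a likelihood ratio, it can help to convert a few values back to the ratio scale. The short sketch below does only that arithmetic; the function name is made up for the illustration.

```python
def lod_to_likelihood_ratio(lod):
    """Likelihood ratio (linkage at the test position vs. no linkage)."""
    return 10.0 ** lod

for lod in (1.0, 1.5, 1.81, 3.0):
    ratio = lod_to_likelihood_ratio(lod)
    print(f"lod {lod:4.2f}  ->  likelihood ratio about {ratio:7.1f} : 1")
```

A lod score of 1.81 thus corresponds to a likelihood ratio of roughly 65 to 1, compared with 1000 to 1 for a lod score of 3.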
I ran lm_linkage by typing
% lm_linkage lm_link_1r.par > link1r.out
and looked at the output link1r.out.
Note that the final lod scores are reduced: the run still manages a lod score of 1 at marker-2 and marker-3, but barely.
To examine the effect of this, I changed the marker allele frequencies in the parameter file lm_link_2r.par. The only change from lm_link_1r.par is in these marker allele frequencies, which are now:
set markers 1 4 5 allele freqs 0.4 0.3 0.2 0.1
set markers 2 allele freqs 0.4 0.05 0.25 0.3
set markers 3 allele freqs 0.05 0.3 0.25 0.4
That is, I have made the alleles for which fred is homozygous (allele 2 at marker-2, allele 1 at marker-3) relatively rare, and adjusted the other frequencies at each marker so that they still sum to 1.
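When editing allele frequencies by hand it is easy to leave a set that no longer sums to 1. The sketch below is a simplified stand-alone check, not MORGAN's own parser: it reads only statements of the form shown above and verifies the sum for each marker.

```python
import math

PARAM_LINES = """\
set markers 1 4 5 allele freqs 0.4 0.3 0.2 0.1
set markers 2 allele freqs 0.4 0.05 0.25 0.3
set markers 3 allele freqs 0.05 0.3 0.25 0.4
"""

def parse_allele_freqs(text):
    """Return {marker number: [allele frequencies]} from 'set markers' lines."""
    freqs = {}
    for line in text.splitlines():
        words = line.split()
        if words[:2] != ["set", "markers"] or "allele" not in words:
            continue  # not an allele-frequency statement
        split_at = words.index("allele")
        markers = [int(w) for w in words[2:split_at]]
        values = [float(w) for w in words[split_at + 2:]]  # skip 'allele freqs'
        for marker in markers:
            freqs[marker] = values
    return freqs

parsed = parse_allele_freqs(PARAM_LINES)
for marker in sorted(parsed):
    total = sum(parsed[marker])
    status = "ok" if math.isclose(total, 1.0) else "DOES NOT SUM TO 1"
    print(f"marker {marker}: {parsed[marker]}  sum = {total:.4f}  {status}")
```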
I ran lm_linkage by typing
% lm_linkage lm_link_2r.par > link2r.out
and looked at the output link2r.out.
My lod scores in the region marker-2 to marker-3 are increased.
Why should I have expected this?
Note: This is important. Many of the false-positive linkage findings in the literature, especially in the case of marker data available only on affected individuals, have been due to assuming that the associated marker alleles are rare when they are not. This can happen when marker allele frequencies in the population of interest are not well studied, so that database frequencies from a different population are used instead.
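One way to see the role of the assumed allele frequency is to ask how strongly homozygosity for a marker allele points to identity by descent. The sketch below applies Bayes' rule at a single marker locus. The prior probability f = 1/16 and the "common" frequency 0.3 are illustrative assumptions, not values taken from the lab pedigree or parameter files; 0.05 matches the rare frequency set above.

```python
def prob_ibd_given_homozygous(p, f):
    """P(two alleles are identical by descent | homozygous for an allele of frequency p).

    f is the prior probability of identity by descent at the locus.
    P(homozygous for this allele) = f*p + (1 - f)*p**2, of which f*p is
    the part due to identity by descent.
    """
    return f * p / (f * p + (1.0 - f) * p * p)

f_prior = 1.0 / 16.0   # hypothetical prior, e.g. offspring of first cousins
for p in (0.3, 0.05):
    post = prob_ibd_given_homozygous(p, f_prior)
    print(f"allele frequency {p:4.2f}: P(IBD | homozygous) = {post:.3f}")
```

Under these assumptions, declaring the allele rare raises the apparent evidence for identity by descent from about 0.18 to about 0.57, whether or not the allele really is rare in the population.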
Note: Please remember to tell me who is your inbred affected individual and who are your affected bilateral relatives, as well as any important information about them that influences the lod score.
Comment as specifically as you can on
(1) the lod scores you obtain in the first run,
(2) the reasons for any observed changes in the lod score when you only have marker data on your affected individuals, and
(3) the reasons for any observed changes in the lod score when you change the marker allele frequencies of the alleles carried by your affected individuals at the flanking markers marker-2 and marker-3.