Stat 550 (DL): Lab 5: Computing lod scores on pedigrees
In 2006, I switched this lab from Genehunter to Merlin. Both programs, among other things, calculate LOD scores on small pedigrees when there are data at multiple marker loci, but Merlin is the better software: it has better algorithms, is more flexible, and is better supported at this time.
I installed an updated version of Merlin on the biostat computers in 2006, and again more recently. Hopefully all the links and commands are up to date, but if anything is not right, please email me.
A disadvantage of Merlin is that it requires quite a few input files. For the
examples for this lab I have put the data files in the subdirectory
merlin_2006. I have also made a new description of these files, with links to
the data files and to a description of the problem of genetic mapping of the
Werner's syndrome disease gene.
For the first part of this lab, we will be exploring how misspecified allele
frequencies can lead to spurious evidence for linkage. You should read the
contents of the
homoApoB.dat file. It explains the situation and
summarizes the data we will be using.
In short, our data
consists of pedigree and marker information on nineteen inbred Japanese
families. When this data set was originally analyzed, the researchers used
standard Caucasian allele frequencies in their model. Incorrect allele
frequencies caused the analysis to indicate that the locus responsible for
Werner's syndrome was located on chromosome 2 near the ApoB marker.
Since that time, the true locus has been found on chromosome 8. In the first
part of this lab, we will look at how different choices for marker allele
frequencies change the outcome of a LOD score linkage analysis.
Copy the files from the data subdirectory
merlin_2006 into your abacus account.
Just for reference, here is a list of the (major) allele frequencies in the
three groups:
Marker allele          1       2       3       4       5       6
Standard Caucasian     0.2190  0.0776  0.4040  0.0500  0.0756  0.0378
UW Caucasian sample    0.2000  0.0333  0.4667  0.0500  0.0833  0.0667
UW Japanese sample     0.6382  0.0263  0.1250  0.1184  0.0263  0.0132
To run merlin on these data, using the first of the three allele frequency
files, type:
% merlin -d apob3_merlin.dat -p apob3_merlin.ped -m apob3_merlin.mp -f apob3_ca.freq --model ws_merlin.model --step 20
You probably want to cut-and-paste this lengthy command! Your browser has
probably split the line, but make sure you enter it as a single command --
all on one line. Also remember we do not type the ``%''.
You will get a warning message about allele frequencies not adding to 1,
but you can ignore that.
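The warning is not surprising given the table above: the six listed frequencies do not sum to 1 in any of the three groups. A rough check is sketched below. The numbers are copied from the table, sum_freqs is a helper name made up for this example, and the actual .freq files may list additional rare alleles.

```shell
# Sum the six major-allele frequencies listed in the table for each group.
# sum_freqs is a hypothetical helper, not part of Merlin.
sum_freqs() {
    # Print the sum of all arguments to four decimal places.
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.4f\n", total }'
}

echo "Standard Caucasian:  $(sum_freqs 0.2190 0.0776 0.4040 0.0500 0.0756 0.0378)"
echo "UW Caucasian sample: $(sum_freqs 0.2000 0.0333 0.4667 0.0500 0.0833 0.0667)"
echo "UW Japanese sample:  $(sum_freqs 0.6382 0.0263 0.1250 0.1184 0.0263 0.0132)"
```

The sums come out to roughly 0.86, 0.90, and 0.95, so the listed alleles leave some frequency unaccounted for in every group.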
Caution: As I just learned the hard way, if you misname your
allele frequency file, Merlin will just go ahead and estimate its own
frequencies from the data. Then, of course, you get the same lod
scores every time. So do not totally ignore Merlin's messages!
You should finally get a table of lod scores. There are 4 columns: the first
two are the position and the lod score at that position, summed over all the
pedigrees. The final two columns relate to heterogeneity among pedigrees. If
the ALPHA column is less than 1, it indicates that the pedigrees are giving
strongly conflicting information, and the final HLOD column gives the LOD
score obtained when we model only a fraction ALPHA of the pedigrees as linked
and the rest as unlinked.
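For reference, the usual admixture form of the heterogeneity lod score is written out below. This is the standard textbook definition, and I am assuming it is what Merlin reports in the HLOD column; check the Merlin documentation to be sure. Here LOD_i(x) denotes the lod score of pedigree i at position x (my notation, not Merlin's).

```latex
\mathrm{HLOD}(x) \;=\; \max_{0 \le \alpha \le 1} \; \sum_{i} \log_{10}\!\left[ \alpha \, 10^{\mathrm{LOD}_i(x)} + (1 - \alpha) \right]
```

Setting alpha = 1 recovers the summed lod score in the second column, and alpha = 0 gives 0, so under this definition HLOD is never smaller than the total LOD and never negative.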
Run merlin twice more, using the UW Caucasian frequencies
apob3_uw.freq and using the Japanese frequencies apob3_jp.freq.
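If you would rather not retype the command, the three runs can be scripted. This is only a sketch: the file names and options are the ones used in this lab, while run_apob and the .log output names are my own choices. It checks that each frequency file is readable first, since a misnamed file would otherwise go unnoticed.

```shell
# Run the Part 1 analysis once per allele frequency file, saving each log.
# run_apob and the log file names are hypothetical; the merlin options are
# exactly those of the command shown earlier in this lab.
run_apob() {
    freq=$1
    # Refuse to run if the frequency file is missing or misnamed; otherwise
    # Merlin would estimate allele frequencies from the data instead.
    if [ ! -r "$freq" ]; then
        echo "missing frequency file: $freq" >&2
        return 1
    fi
    merlin -d apob3_merlin.dat -p apob3_merlin.ped -m apob3_merlin.mp \
           -f "$freq" --model ws_merlin.model --step 20 > "${freq%.freq}.log"
}

for f in apob3_ca.freq apob3_uw.freq apob3_jp.freq; do
    run_apob "$f" || echo "skipped $f"
done
```

Each table of lod scores then ends up in its own file (apob3_ca.log and so on), which makes the three runs easy to compare side by side.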
Here are some questions for you to answer (and turn in):
1. Families 1, 3, 6, 7, 8, 9, 11, 15, and 17 produce
negative LOD scores while families 2, 4, 5, 10, 12, 13, 14, and 16 produce
positive LOD scores. Why?
2. Families 2 and 4 share the same pedigree structure, but
family 2 produces LOD scores that are considerably lower than for family 4.
Why?
3. Looking at the total lod scores from your 3 merlin runs, the two using the
Caucasian allele frequencies should indicate some evidence for linkage near
the ApoB marker (LOD scores about 2.0).
Using the Japanese allele frequencies, there is no longer much support for
linkage in the region. Why?
In this part of this lab, we will reproduce the analysis that finally
located the disease locus on chromosome 8.
We have the same pedigrees as above, but now have data on the marker types at
13 markers along chromosome 8.
As described in the
description of these files the ones you need for this part are
To run merlin using ws_ca.freq
type:
% merlin -d ws_merlin.dat -p ws_merlin.ped -m ws_merlin.mp -f ws_ca.freq --model ws_merlin.model --grid 1 --pdf
and similarly with ws_jp.freq.
This time, instead of the --step option (you can use "--step 3", for example,
if you prefer), I have used --grid 1 --pdf.
This means merlin will compute lod scores at 1 cM intervals along the
chromosome and will produce a PDF file with a plot of the lod scores.
It will call this file merlin.pdf, so you probably want to rename it
before you do the second run, or it will be overwritten.
(Use % mv merlin.pdf merlin1.pdf, for example.)
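The two runs and the renaming step can also be combined in a short script. As before this is a sketch: run_ws and pdf_name_for are helper names I made up, and the output names (ws_ca.pdf, ws_jp.pdf) are just one reasonable choice.

```shell
# Run the chromosome 8 analysis with each frequency file and keep both plots.
# pdf_name_for and run_ws are hypothetical helpers, not part of Merlin.
pdf_name_for() {
    # ws_ca.freq -> ws_ca.pdf
    echo "${1%.freq}.pdf"
}

run_ws() {
    freq=$1
    [ -r "$freq" ] || { echo "missing frequency file: $freq" >&2; return 1; }
    merlin -d ws_merlin.dat -p ws_merlin.ped -m ws_merlin.mp \
           -f "$freq" --model ws_merlin.model --grid 1 --pdf || return 1
    # Rename the plot right away so the next run cannot overwrite it.
    mv merlin.pdf "$(pdf_name_for "$freq")"
}

for f in ws_ca.freq ws_jp.freq; do
    run_ws "$f" || echo "skipped $f"
done
```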
Here are some questions for you to answer (and turn in):
4. Compare the lod score curves from the two analyses. You should
see that the LOD scores are fairly similar. Does this mean that the Japanese
and Caucasian allele frequencies are similar for the markers along this
chromosome? Is it possible to have some of the allele frequencies wrong, but
still end up with reasonably correct LOD scores? Comment.
We have seen that misspecified marker allele frequencies can
make the LOD score artificially high. Can misspecified marker allele
frequencies make the LOD score lower than it should be?
Part I: The ApoB marker
As described in the
description of these files the ones you need for this part are
You can see here we are mostly just specifying the data files, "-d" for the
data specification, "-p" for the actual pedigree data, "-m" for the marker map,
"-f" for the marker allele frequencies, and "--model" for the model file.
The last option "--step 20" specifies that lod scores will be computed
at 20 points between the ApoB3 marker and the dummy marker I added.
Part II: Chromosome 8
We will run merlin twice, once
with the Japanese allele frequencies (file ws_jp.freq) and once with
the Caucasian allele frequencies (file ws_ca.freq).