Stat550 2014: Lab 3: MORGAN/kin and MORGAN/ibddrop

This lab will cover two MORGAN programs. The first is the small program, kin, which computes the kinship coefficient between specified pairs of individuals, and the inbreeding coefficient of specified individuals. The second (ibddrop) simulates the descent of DNA at linked loci down a pedigree to estimate the probabilities of more complicated patterns of ibd.

A. MORGAN/kin assignment

A1. Practising Using Kin:

We will use an example vey similar to that MORGAN 3.2 online Tutorial and downloaded examples for practice. You can find out more about kin in Chapter 4 of the online tutorial.

To run kin you will need a parameter file and a pedigree file. You may find these files here:

Parameter file jv_3rep_kin.par
Pedigree file jv_3rep.ped

(I will also put these files in the /datafiles directory on statgen.stat.washington.edu.)
Take a look at jv_3rep.ped. The comments up the top say it all. As the name suggests, this pedigree is just 3 copies of our usual JV pedigree, where I changed a few names to make it more interesting (??). The fourth column is gender, as usual. The fifth column, which we have ignored up to now, will still be ignored by kin and ibddrop. The first two parameter statements specify that there are 45 individuals in the pedigree, each of whom has the usual trio of names, followed by 2 integers. The other two parameter statements are self-explanatory, and are really just there for information, since these are the default options.

Now take a look at jv_3rep_kin.par. The first line tells the program to use its most verbose output.
The second line tells the program where to find the pedigree file. The next line requests the program to compute kinship coefficients between two pairs of individuals, the final individual (531) and her dad, and between her two parents. Another line asks for a couple of inbreeding coefficients on the second copy ("component") of the JV pedigree. You can specify as many pairs (for kinship) and individuals (for inbreeding) as you want -- the program should figure it out. However, if there is more than one component pedigree in the data file, you do need to tell it which component pedigree the individuals are in. (This should not be a problem for your own single-component example pedigrees.) Note we have not specified anything to be computed for component 3: MORGAN will give a warning on this, but just runs on what it is given.

Finally the program asks for a couple of two-locus inbreeding coefficients, which may or may not be talked about in class. It is just the probability that the maternal and paternal gametes of an individual are IBD at both of two linked loci. This probability will depend on the recombination rate between the loci, so the final line specifies the recombination values at which this two-locus inbreeding is to be computed. (This paragraph is just here for info: you can ignore this bit if you wish.)

To run the program kin on the example file, type:
% kin jv_3rep_kin.par
or, if you would like to send your output to a file, such as kin.out, type
% kin jv_3rep_kin.par > kin.out
or, if you decided to call the pedigree file something different such as my-pedfile
% kin kin.par ped my-pedfile
(The ped key on the command line overrides whatever is in the parameter file.)
Look at your output, or output file. It should be self-explanatory.

kin has been fixed to allow computation of kinship coefficient of an individual with himself. However kin is also supposed to ignore duplicate requests which it sems it no longer does. OK, always something to be fixed.

A2 Your MORGAN/kin Assignment

Create a parameter file to run kin on a copy of your own pedigree file that you made for pedcheck in Lab 2. (It is simplest to leave the pedigree size and pedigree input record ... statements in your pedigree file from when you ran pedcheck on it, as in the copy here of jv_3rep.ped, and to put the other statements into your own version of a parameter file kin.par, although in fact MORGAN does not care which statements are in which file.)

Also note that for your single-component pedigrees you just leave out the component 1 bit of the statement. You can ask for any number of pairs for kinship in one statement (as the two pairs in the example), and any number of individuals for inbreeding (as the two in the example). You should have (for each component if you have more than 1), just one kinship request statement and just one inbreeding request statement in your parameter file.

Your output should include the following:

(i) Calculations of the kinship coefficients between members of three pairs of individuals in your pedigree. Try to find pairs with different kinship coefficients, or with different relationships having the same kinship coefficient. At most one of your pairs should have kinship coefficient 0, and at least one of the pairs should be a pair of bilateral relatives other than siblings.

(ii) Calculations of the inbreeding coefficients for three non-founder individuals in your pedigree. At most one should have inbreeding coefficient 0. Try to find individuals with different inbreeding coefficients if you have them.

Turn in a sheet of paper with your results, with brief explanation of the relationships between the individuals you have chosen (for kinship) or between their parents (for inbreeding).

B. MORGAN/ibddrop assignment

B1. Practising Using ibddrop:

This one takes me longer to write and explain, than it will take you to do it!

The MORGAN ibddrop program estimates ibd probabilities by simulating descent on pedigrees. You can find out more about ibddrop in Chapter 7 of the Tutorial.

One important thing is that the ibddrop talks about marker loci and trait loci, but for ibddrop these are just locations on the chromosome -- it does not care about genetic markers or traits or genetic data for markers or traits on the individuals. We do it this way, so that the same parameter files can be used for a later MORGAN program where we do Monte Carlo realizations of ibd conditonal on marker and trait data.

We will use the same pedigree file jv_3rep.ped, which is 3 copies of the JV pedigree.

In addition to jv_3rep.ped download the following three files, to the directory from which you are running this week's lab programs:

the first ibddrop parameter file ibddrop1.par,
the second ibddrop parameter file ibddrop2.par,
and a seed file seedfile.seed.

You do not have to keep the same file names when you download them, but those are the names I will use in describing them, so you will find it easier if you do.

Random Seeds in MORGAN

In addition to the parameter and pedigree files, the programs will use the file seedfile.seed which provides a random seed to the MCMC. Since we are doing Monte Carlo, we need a seed for the random number generator -- our particular generator uses a pair of numbers in hexadecimal -- that's why they look weird. We could use (fixed) default seeds, but that is boring: we get the same answers every time, and don't see the Monte Carlo variation. Some random number generators use a random seed based on the exact time, but we don't want to do that -- in developing software especially, it is sometimes very useful to be able to rerun from the same seed to check things. We therefore read a seed from a seed file, using a parameter statement. If you look in the parameter files you will find the statement input seed file . The program will read its starting seed from the specified file. There is also an output overwrite seed file statement. If this statement in included, at the end the program outputs the seed to the named output seed file, overwriting what is there. (When you run the programs you will get a warning about the overwriting of the seed file: this is OK.) Then when another run is started, the program can use this new seed.

B1.1: First version of ibddrop

Now we are ready to start on the first ibddrop example. Look at the parameter file ibddrop1.par First we specify the pedigree file and seed files as described above. Next we tell it that it is to simulate markers (it will figure that there should be 5 markers from the marker map) and to simulate 1 tloc which it arbitrarily names "11". (Remember only the first 4 letters of each key word are significant.) A tloc is MORGAN's abbreviation for a trait locus. Note from above, to ibddrop, 5 markers and one tloc just means 6 loci, since there are no data, but it is convenient to use the same statements that we will use for other MORGAN programs.

You do not need to worry about details of maps, map specification etc. yet -- the information here is just for those who already are more familiar with genetic maps.

Next we specify the marker map, in terms of recombination fractions between adjacent markers. Since we don't give the map a gender, it assumes it is both the` male and the female marker map. Then we tell it where to put the tloc (tloc 11) relative to the markers: "marker 2 recomb frac .112". This means put the tloc to the right of marker 2, at recombination fraction .112 to marker 2. (If we wanted it to the left of all the markers, we would say "marker 0 reco frac 0.112", and in this special case this would mean the recombination fraction to marker 1 is 0.112.) In fact, I have given a very slightly different recombination fraction for the male and female meioses, just to show we can!

Next, we have to tell ibddrop what ibd patterns to score. In this version, we give it some sets of gametes, and it will score all the possible patterns of ibd among them, scoring locus by locus. (This will probably be clearer when you look at the output.) We specify a gametes by the ID "name" of the individual, and by a 0 (for maternal) or 1 (for paternal) indicator. Thus grandma 0 3v3 1 means score the ibd between the maternal gamete of grandma and the paternal gamete of 3v1. To keep things manageable, we are here scoring ibd only on the second pedigree component. We score ibd between these two gametes, and then ibd among a set of 4 gametes consisting of the two of the final individual and these two. (Recall, there will be up to 15 possible patterns among 4 ordered gametes.) Finally, we tell ibddrop how many realizations to do: that is, how many times it will simulate descent at these 6 linked loci, down the pedigree. The number 40000 specified here does not take long, and is enough to get good estimates.

Now run ibddrop by typing % ibddrop ibddrop1.par > ibd1.out
This may give you some warning messages to the screen -- maybe about seeds, but will send the main output to ibd1.out. It generates quite a bit of output, so it is probably easier to look at it in a file.

Now look at the output. A lot of MORGAN output is involved in it telling you what it understood you to tell it to do. This can be a bit tedious, but is well worth checking!! Most of the silly errors we have made in running MORGAN were because there was something a little bit wrong with a parameter file, and it did what the parameter file said.

First it prints little pictures of the maps, showing the markers and tlocs. It converts recombination fractions to centiMorgans (we will meet this very soon). Since we told it a slightly different male from female map it prints both. Next it tells us what proband gametes we told it to score, on each pedigree component -- and it will check that the specified individuals are indeed in the right component pedigree. Then it reminds us that we asked for 40000 realizations (it calls them MC iterations). This is the end of its checking of parameter statements -- and it tells us so!

Now it opens the pedigree file, and does the by now familiar checking and summary of the pedigree. Then it reopens it to start its simulations: this is because of the way pedcheck works, producing a reordered pedigree for other MORGAN programs, if necessary. Next it tells us how many meioses (remember the meiosis indicators are 0/1 "switches") it will simulate: 18 on each copy of the JV pedigree. You may find this a bit odd: there are 10 non-founders, each with a maternal and paternal meiosis. The reason there are only 18, not 20, is because it does not bother with one from any founder who has only one kid. (We may discuss this, some time.) It tells us its seeds (useful if we ever want to rerun the exact same thing), its map again (this time in recombination fractions, which is what it actually uses to simulate, and since it has found we asked for sets of 2 and of 4 gametes, it reminds us that for 2 there are 2 "patterns" (ibd or not) while for 4 there are 15.

Then, it does its stuff, and prints out the results, just counting up what it has simulated, locus by locus. The pair of individuals are cousins -- the true probability of ibd at each locus is 0.25. You will see some Monte Carlo variation, but you should see that at every locus the probability of ibd (the pattern 1 1) is about 0.25, and of non-ibd (the pattern 1 2) is about 0.75. Now look at the set of 15 patterns among the four gametes. Remember the inbreeding coefficient of the final individual is 7/64 -- how would you find the Monte Carlo estimate of this from this table? Which loci do you expect to have most similar results? Why?

B1.2 ibddrop, version 2

This version of ibddrop shows a different way of scoring the multilocus ibd. In B1.1 we scored locus-by-locus, and so ended up with probabilities that were the same (on average) at each locus, and didn't really show the dependence between linked loci.

Look at the parameter file ibddrop2.par. Down to the scoring the information is the same, although the marker map information looks a bit different. I gave the male map in (Haldane) centiMorgans rather than recombination fractions, but if fact it's the same map (22.3 cM is a recombination fraction of 0.18, and 11.15 cM is recombination fraction 0.1). The request for 40000 Monte Carlo realizations at the bottom of the file is also as before.

The difference is in the scoring patterns: now we request the program to score over overlapping windows of 3 consecutive loci. Since our loci are ordered on the chromosome as (M1,M2,T,M3,M4,M5) these windows will be (M1,M2,T), (M2,T,M3), (T,M3,M4), and (M3,M4,M5). What it will score will be Yes(1)/No(0) events, and so simplest is just to give it pairs of proband gametes and ask it to score ibd (Yes=1) or not (No =0). For simplicity, we here give it just one air of gametes, the two of the final individual fred.

This will be clearer on looking at the output, so now run ibddrop on this second parameter file by typing
% ibddrop ibddrop2.par > ibd2.out

The output consists of a list of possible ibd patterns for the three loci and four columns of relative frequencies. Remember that we are looking at 3 loci at a time. The ibd pattern 0 1 0, for example, is the event that the individual's two alleles are not ibd at the first locus, are ibd at the second, and aren't ibd at the third, where first, second, and third refer to each set of 3 loci in a window.

Here are some things you could think about to see whether the output makes sense:

1. Window 4 consists of markers 3,4 and 5, which are equally spaced. This should mean that, for this final column of relative frequencies, there should be symmetry between patterns 001 & 100, 011 & 110.

2. Is the output from this program consistent with the two-locus inbreeding probabilities you computed using kin (if you did). For example, look at the first column. Here, the recombination fraction between the first two loci is 0.18. For the simulations that ibddropgenerated, the relative frequency of fred being ibd at both of the first two loci is Pr(110) + Pr(111) = 0.0346 (for my simulations). According to kin, the probability should be 0.03455. That looks good to me.

3. One more thing: Note that, in the absence of interference, one interval with r = 0.18 is the same as two adjacent intervals with r = 0.1. In column 4, Pr(101) + Pr(111) = Pr( both ends of an interval with r = 0.18 are ibd). Also, in column 1, Pr(110) + Pr(111) = Pr( both ends of an interval with r = 0.18 are ibd). My simulations give pretty good agreement between these.

B2. Your ibddrop assignment.

Using the pedigree file you created for pedcheck in Lab-2, modify the two ibddrop parameters files to run on your pedigree. You will need to change the input pedigree file name, and the proband gamete statements. I do not think you should need to modify anything else.

B2.1 Choose the two gametes of an inbred individual, and the four gametes of two bilateral relatives who are not sibs, and run the first version of ibddrop on these two sets of gametes (size 2 and size 4). Submit the estimated ibd probability tables from your output, and comment on whether the results make sense, given the relationships of the individuals you have given it.

B2.2 Choose just the two gametes of an inbred individual, and run the second version of ibddrop to find the probabilities of autozygosity/not over windows of three loci. Submit the estimated ibd probability table from your output, and comment on whether the results make sense, given the inbreeding coefficient of your chosen individual and the various distances or recombination probabilities among the 6 genetic loci.