Stat550: 2014: Lab 4: MORGAN programs markerdrop and lm_auto

Note to me: Next time -- explicitly ask for explanations for changes in probs at linked markers as we add trait data.

Note: The required parameter and pedigree files to run the examples below have been added to /datafiles/Lab4_auto on statgen.stat.washington.edu.

The goal of this week's lab is to run the program lm_auto which simulates inheritance and estimates ibd probabilities on specified individuals with an output very similar to that of ibddrop, but now the simulation is done conditional on marker and trait data.

However, in order to be able to run this and the final linkage analysis lab on your own pedigrees, we need first to create some data that is consistent with linkage etc. We do this using the MORGAN program markerdrop. Rather than give practice example, I will walk you through what I did in my example pedigree, so that you can do parallel things with yours.

The markerdrop program

The version of marker drop we will use takes trait data, and first simulates inheritance at that locus, consistently with those data. Given the inheritance vector at the trait locus and a genetic map, it can then easily simuate inheritance at marker loci to the lect amd/or right, and then assign allelic types to the individuals in accordance with this inheritance and the given allele frequencies. You can find out more about markerdrop at Chapter 6 of the MORGAN online tutorial.

In order to make m pedigree a bit more similar to yours, I have augmented my JV pedigree with two extra founders, making the spouses 2v6 and 2v4 into siblings, so that their offspring are double first cousins. The new pedigree file is at jvped_2014.ped and there is a picture of it here.

First you will need to modify your pedigree file, adding columns for trait data. In jvped_2014.ped I added 3 columns, In the first, there are no observed trait data (all values 0). In the second, the inbred individual fred is affected (value 2) and his two parents unaffected value 1). In the third both fred and a pair of bilateral relatives (in my file grandma and 3v3) are affected (2) and several other individuals including fred's two parents are unaffected (1).

The markerdrop parameter file markerdrop.par specifies that we will simulate marker data conditional on the trait data in this third trait column ("integer 4", as the first integer is gender). It then gives a genetic map for our markers and the location of the trait locus relative to the markers (just as in ibddrop) and also now gives allele frequencies for the five 4-allele markers (the same for each marker in this example). It gives the trait parameters as a fairly rare, almost (but not absolutely) recessive trait. and then the frequencies of the two alleles at the trait locus.

Finally it also give the pattern of observed marker data we want to simlate. In my example, I have specified that the seven individuals observed in that third trait column will be the ones observed, and they will all be observed for each of their two alleles and each of the 5 markers. That is why there are 10 "1"'s after each name.

Now we are ready to run markerdrop by saying
% markerdrop markerdrop.par > markerdrop.out
Note that to run markerdrop I need a seed file such as sampler.seed which I just continued from my Lab-3.

In output file markerdrop.out is a lot of info about what it thinks it simulated, but the important part is the seven lines of marker data right at the end. These are going to be my marker data for the rest of my analyses, and I cut and paste them into another file jvped_2014.markers which I am going to use in all the other programs in Lab-4 and Lab-5. (These data ended up a little weird in some respects -- any set of enough random numbers contain some a bit weird! -- But it seems to give OK results so we go with it.)

Markerdrop assignment:

  • (1) Do the same with your pedigree file -- add the analogous three columns of data -- one with no data (all 0), one with your inbred individual affected (2) and his/her parents unaffected (1), and the third with your pair of bilateral relatives also affected, and a few other individuals unaffected (1).
  • (2) Create your own markerdrop.par file by editing the pedigree file name, and also specifying at the end the individuals on whom you want to have marker data -- maybe, like mine, the same individals observed for the third trait.
  • (3) Run the markerdrop program as above; check the output to see it is running sensibly, and create your own small marker data file by cutting and pasting the final output table, as I did to get jvped_2014.markers.

    The lm_auto program

    The lm_auto program estimates ibd probabilities in a format that looks similar to ibddrop, but rather than simply simulating descent in the pedigree, this program is using MCMC to sample descent conditional on marker data. You can find out more about lm_auto in the first part of Chapter 9 of the MORGAN online tutorial.

    As in ibddrop we need to decide what gametes we are going to score, and I chose to do the four gametes of my bilateral relatives (the double-first-cousins, grandma and 3v3), and also another set of 4 consisting of the two of my inbred individual fred, with one from each of grandma and 3v3. To check things were working I first ran ibddrop using the parameter file ibddrop_2014.par:
    % ibddrop ibddrop_2014.par > ibddrop_2014.out

    At the end of my output file ibddrop_2014.out are the two probability tables, each of (potentially) 15 states, for the two sets of 4 gametes.

    Preliminary assignment

  • Choose your sets of gametes: you could do two sets of 4, as here, or you could stick with the ones fom your Lab-3 -- the 4 gametes of your bilateral relatives, and the two in your inbred individual. If using the same ones as before, you do not need to rerun ibddrop, but it will still be useful to review your outsput, as our goal is to see how these ibd state probabilities change when we condition on trait and/or marker data.

    Now on to the lm_auto program: let's look at the first parameter file lm_auto_1.par. (In fact the only difference between the three lm_auto_n.par parameter files is the "select trait n" statement, where n is 1, 2 or 3.)

    First we tell it the input files in the usual way -- pedigree file, marker data file, and seed file. Then we specify the genetic maps, and now must also specify the allele frequencies at the marker loci and trait locus. For convenience, I have given the same 4 marker allele frequencies at each of markers 1,2,3,4,5, so I can condense it to one statement.

    Then there is a section wich looks messy, but only because I decided to put in all three traits -- in our case the trait modes are all the same -- our rare, almost recessive -- but we use the idea of different traits to accommodate our three sets of different trait data. Then we specify our sets of proband gametes, just as we did for ibddrop.

    Finally, there is some information on the MCMC, which you do not need to worry about. For those who care, the first three statements tell it how to get its initial pattern of inheritance, and how to get started. The last 5 statements tell it about how to do the MCMC. The multiple meiosis sampler is our better MCMC sampler; however, it takes longer, so I have reduced the number of MCMC scans to 10,000. Sample by scan means that it will update all the inheritance patterns in turn, before the next round of updating. The number of MC iterations is now the number of scans. Finally, our MCMC sampler is a mixture or two samplers, which we call L-sampler and M-sampler. Here we are asking it to use the L-sampler a random 20% of the time, and the remaining 80% will be M-sampler.

    Now run lm_auto on this parameter file by typing
    % lm_auto lm_auto_1.par > auto1.out
    Unlike our previous programs, this is not instantaneous -- it could take up to about 1 minute to run. Be patient!

    Now look at the output file auto1.out. The first part should be very similar to your ibddrop output. It also gives various information about what it is doing, both with regard to its data and trait information -- note it notices that in this first example is has "no trait data", but it still produced ibd probabilities conditional on marker data.

    There are various summaries/diagnosics on the MCMC, but we just focus on the last part -- the tables of ibd probabilities at each locus for each of our two gamete sets.

    Note these probabilities are quite different from the prior probabilities produced by ibddrop. Some states are excluded (prob 0) at some loci by the marker data at that locus. Because the loci are linked, neighboring loci are also affected. Probabilities are quite different at the different markers, and generally the probabilty mass is concentrated into a few states (at least if your marker data are reasonably informative, as these are).

    Now run % lm_auto lm_auto_2.par > auto2.out
    with output file auto2.out, and
    % lm_auto lm_auto_3.par > auto3.out
    with output file auto3.out.

    Each time we add more trait data, the ibd probabilities change dramtically, especially at the trait locus, but also at nearby markers.

    lm_auto assignment

  • (1) Modify the parameter files to use your own pedgree and marker data, and to score your proband gametes. Note once you have the first one, the others different only in the "select trait n" statement.

  • (2) Look at your output for the first case -- see if you can explain some of the 0's, and one case of a big change from the prior.

  • (3) Look at your output for the second two cases, and explain why the probabilities, especially at the trait locus, change as they do. Recall that your inbred individual is affected with a (close to) recessive trait.