Stat550 (DL): Lab 3: PHASE and MORGAN/kin

This week's lab will cover two programs.
Note that some write-up is requested for each program.

The program PHASE is due to Matthew Stephens, and produces estimates of haplotypes of individuals given their genotypes at mutiple loci. It uses a model for similarities among haplotypes in a population, and samples phasings of the genotypic data under this model.
PHASE is installed on the Biostat computers.
Alternatively a better version of PHASE or fastPHASE either for linux or for Windows can be downloaded from Matthew Stephens old UW download page. (You may be able to find a more recent source, but this was the clearest I found.)

The second program is a MORGAN program, kin, which computes the kinship coefficient between specified pairs of individuals, and the inbreeding coefficient of specified individuals.

If using the Biostat linux computers:
Remember that if you did not discover how to put the source ~statgen/.statgen.cshrc
into your own .cshrc (or equiv.) file, you will need to give this command each time you log on, to access the statgen programs.

A: PHASE Assignment

A1) Recall Clark's algorithm from the lectures page 1.7.3. Try applying Clark's algorithm (by hand) to estimate the haplotypes of each individual given the following genotype data on 5 SNPs in 6 individuals. I have reformatted the data, so each pair of digits is the genotype at a SNP locus, and each row is an individual.
Does the algorithm give a unique solution?
10, 10, 10, 10, 10
00, 00, 00, 11, 10
00, 00, 00, 10, 00
10, 10, 10, 11, 00
11, 11, 11, 11, 00
00, 00, 00, 10, 11

A2) Use the PHASE program to estimate the haplotype frequencies and haplotypes for each individual in the problem above. The data set in the file phase.inp is the same data as above, but in Matthew Stephens' format!
Compare your estimates with the estimates you get from Clark's algorithm.

To run PHASE:

  • 1. Go to the small PHASE example input data file phase.inp. Copy the PHASE example file phase.inp, into your b550 working directory on abacus -- or to wherever you want to run PHASE.
  • 2. run PHASE by typing, for example :
    % PHASE phase.inp phase.out
    The program prints some information to the screen and also produces several output files with names starting phase.out_... .
  • 3. examine the output in the file phase.out_pairs (which lists the most probable haplotype pairs for each individual, along with their estimated probabilities), and the file phase.out_freqs which gives estimates of the population frequencies of each haplotype. Write a couple of brief paragraphs explaining the output, and comparing the results with Clark's algorithm.

    Here are the full instructions for the PHASE software, if you need them, or are interested in understanding the input format etc. However, if you have problems, the first thing to do is to email me!
    (The PHASE instructions are still here as of May 1, 2009, but may go now that Matthew Stephens has left UW-- but they will still be somewhere on the web -- try Google if interested. Also, if you are into this area, you may wish to try the more modern fastPHASE, which is available through Mathew Stephens page at Chicago.)

    B: MORGAN/kin assignment

    B1 Practising Using Kin:

    Again, we will use the example in the MORGAN Tutorial and downloaded examples for practice. You can find out more about kin in Chapter 4 of the online tutorial.

    In the MORGAN_Examples directory you downloaded for Lab-1, there is a subdirectory IBD. Go into ("cd") this subdirectory. There is a parameter file jv_rep_kin.par which specifies the pedigree file jv_rep.ped.
    Alternatively, you may find these files here:
    Parameter file jv_rep_kin.par
    Pedigree file jv_rep.ped.


    Take a look at jv_rep.ped. As the name suggests, this pedigree is just 2 copies of our usual JV pedigree. The first two parameter statements specify that there are 30 individuals in the pedigree, each of whom has the usual trio of names, followed by 2 integers. The other two parameter statements are self-explanatory, and are really just there for information, since these are the default options.

    Now take a look at jv_rep_kin.par. The first line tells the program where to find the pedigree file. The next line requests the program to compute kinship coefficients between two pairs of individuals, the final individual (531) and her dad, and between her two parents. The next line asks for a couple of inbreeding coefficients on the second copy ("component") of the JV pedigree. You can specify as many pairs (for kinship) and individuals (for inbreeding) as you want -- the program should figure it out. However, if there is more than one component pedigree in the data file, you do need to tell it which component pedigree the individuals are in. (This should not be a problem for your own single-component example pedigrees.)

    Finally the program asks for a couple of two-locus inbreeding coefficients, which may or may not be talked about in class. It is just the probability that the maternal and paternal haplotypes of an individual are IBD at both of two linked loci. This probability will depend on the recombination rate between the loci, so the final line specifies the recombination values at which this two-locus inbreeding is to be computed. (This paragraph is just here for info: you can ignore this bit if you wish.)

    To run the program kin on the example file, type:
    % kin jv_rep_kin.par
    or, if you would like to send your output to a file, such as kin.out, type
    % kin jv_rep_kin.par > kin.out
    or, if you decided to call the pedigree file something different such as my-pedfile
    % kin kin.par ped my-pedfile
    (The ped key on the command line overrides whatever is in the parameter file.)
    Look at your output, or output file. It should be self-explanatory.

    Practice modifying the jv_rep_kin.par file to calculate an inbreeding coefficient or kinship coefficient for another individual or pair of individuals. Apparently, kin thinks it is an error to try to compute a kinship of an individual with itself: I need to fix this, but it is not yet fixed!! .

    B2 Your MORGAN/kin Assignment

    Create a parameter file to run kin on a copy of your own pedigree file that you made for pedcheck in Lab 1. (It is simplest to leave the pedigree size and pedigree input record ... statements in your pedigree file from when you ran pedcheck on it, as in the copy here of jv_rep.ped, and to put the other statements into your own version of a parameter file kin.par, although in fact MORGAN does not care which statements are in which file.)

    Also note that for your single-component pedigrees you just leave out the component 1 bit of the statement. You can ask for any number of pairs for kinship in one statement (as the two pairs in the example), and any number of individuals for inbreeding (as the two in the example). You should have (for each component if you have more than 1), just one kinship request statement and just one inbreeding request statement in your parameter file.

    Your output should include the following:

    (i) Calculations of the kinship coefficients between at least two pairs of individuals in your pedigree. One of these pairs should be a pair of bilateral relatives other than siblings.

    (ii) Calculations of the inbreeding coefficients for at least two individuals in your pedigree, at least one of whom is inbred.

    Turn in a sheet of paper with your results, with brief explanation of the relationships between the individuals you have chosen (for kinship) or between their parents (for inbreeding).