[Top] [Contents] [Index] [ ? ]

Morgan Tutorial

This is a tutorial for the MORGAN (Monte Carlo Genetic Analysis) package, focusing on how to use the provided programs.

The current version of this document is written for MORGAN Version 2.8.1.


1. Get Started


1.1 Overview of MORGAN

MORGAN (Monte Carlo Genetic Analysis) is a collection of programs and libraries developed at the University of Washington under the PANGAEA (Pedigree Analysis for Genetics and Epidemiological Attributes) umbrella. This software implements a number of methods for the analysis of data observed on members of a pedigree, with the main programs implementing Markov Chain Monte Carlo (MCMC) methods. As of the date of this tutorial, the latest MORGAN version is 2.8.1 which was released in April 2006. It is available for download through the MORGAN V2.6 home page at the Department of Statistics, University of Washington.

The MORGAN programs are grouped into four categories:

  1. Programs using deterministic algorithms: pedcheck checks for errors in pedigree structure and data format, see Checking Pedigree Validity. kin computes kinship and inbreeding coefficients for members of the pedigree, see Computing Kinship and One- or Two-Locus Inbreeding Coefficients.
  2. Programs using simple Monte Carlo techniques (by simulating data on founders and "dropping" genes down the pedigree): genedrop simulates data on a pedigree for analysis by other programs, see Simulating Marker and Trait Data in Pedigrees. markerdrop simulates marker data at loci linked to a potential trait locus, see Simulating Marker Data Conditional on Trait Data in Pedigrees. ibddrop uses Monte Carlo to estimate gene ibd (identity by descent) probabilities in the absence of data, see Estimating a priori ibd Probabilities by Monte Carlo.
  3. Programs using Markov chain Monte Carlo (MCMC) techniques: MORGAN's MCMC programs are split into two sections, "Autozyg" and "Lodscore". The Autozyg programs, lm_auto and lm_pval, estimate conditional gene ibd probabilities, see Estimating Conditional ibd Probabilities by MCMC. The Lodscore programs, lm_lods, lm_markers, lm_bayes and lm_schnell estimate multilocus LOD scores, see Estimating Location LOD Scores by MCMC. A brief introduction to the MCMC techniques employed by MORGAN can be found in Using MCMC to Estimate Parameters of Interest in Pedigree Data.
  4. Programs using EM algorithm for segregation analysis with quantitative traits: includes univar, unibig, bivar and multivar, see Polygenic Modeling of Quantitative Traits by EM Algorithm.

This tutorial is based on the computing notes of Dr. Elizabeth Thompson. Descriptions of the parameter statements are taken from the `README_userdoc' files written by Myrna Jewett. The original version of this document was written in 2002 by Michael Na Li, and revised by Myrna Jewett and Adele Mitchell in 2006. The current version is revised by Audrey Q. Fu.

Combined with hands-on examples, this tutorial gives a brief introduction to the usage of the main MORGAN programs. For further information, please refer to the MORGAN documentation and to the references cited.


1.2 Get the Tutorial

This tutorial is available on-line at

http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml#tut

Several formats of this tutorial are also available to download for off-line reading or printing.


1.3 Get and set up the examples

This tutorial assumes that someone else has already installed the MORGAN software somewhere. If not, please contact your local system administrator or download the software yourself and follow the instructions therein.

Follow the following steps to download and set up the examples:

  1. Download the examples (gzipped tar files): morgan-examples.tar.gz
  2. Unpack the examples by typing the following command in a shell window,
     
    user$ tar zxvf morgan-examples.tar.gz
    

    Or if the above command fails (you don't have GNU tar), use

     
    user$ gunzip -c morgan-examples.tar.gz | tar xvf -
    

    This will produce a `MORGAN_Examples' directory under your current directory.

    (Note: Throughout the text file and directory names are enclosed in single quotes; these single quotes are not part of the file or directory name.)

  3. Use `Makefile' to establish links under the `MORGAN_Examples' directory to the MORGAN programs. A link under the `MORGAN_Examples' directory serves as a shortcut to a MORGAN program installed elsewhere.

    Before making links, you first need to edit the `Makefile' (using your favorite text editor, for instance vi or pico) in the `morgan-examples' directory to make sure the paths to your MORGAN programs and those to the `MORGAN_Examples' directory are correct. Most often, it is necessary to change the `MORGANDIR' and `EXAMPLEDIR' statements to reflect the locations of the MORGAN files on your system and the examples, respectively. Here is the relevant part of the `Makefile',

     
    # Change the following macros to where MORGAN and the examples 
    # are installed on your system.
    # This is the only change you need to make in this file.
    
    MORGANDIR = /h4/audrey/MORGAN_V2.8.1
    EXAMPLEDIR = /h4/audrey/MORGAN_Examples
    BINDIR = /h4/audrey/bin
    
    # Note: the paths may happen to be same for MORGANDIR and EXAMPLEDIR. 
    # In general they are different: 
    #          MORGANDIR is where MORGAN is installed on your system
    #          EXAMPLEDIR is the MORGAN_Examples directory you have made
    #          BINDIR is your bin directory
    # BINDIR is needed only if you prefer to link to executables from your
    #    bin directory, rather than running from a current directory: for
    #    example, if your current directory is not in your standard path.
    

    For more information on how to use Makefile to build links, etc., you may type:

     
    make help
    

    To make symbolic links to those programs in the current directory, type

     
    make links 
    

Notes for Microsoft Windows users:

MORGAN may be (in principle) installed under Windows: executables should then be placed in the directory in which programs are to be run. See the documentation for more information. We cannot currently answer any questions regarding Windows installation. Instead, we recommend the use of a linux-system emulator such as Cygwin .


1.4 Overview of the pedigrees used in the examples

Except for some small pedagogical pedigrees for pedcheck under `Pedcheck', two main pedigree files are used to illustrate the usage of MORGAM programs.

File `jv_rep.ped', located under `IBD', is composed of two replicates of the JV pedigree. The 30-individual 5-generation JV pedigree derives from a real study of a rare recessive trait by Goddard et al (1996 AJHG 58: 1286-1302).

The other pedigree in `ped73.ped', located under `MORGAN_Examples', consists of three components and 73 individuals: component one has 47 individuals from 6 generations, component two 11 individualus from 3 generations, and component three 15 individuals from 3 generations. In general, individuals from later generations are observed. The three components are displayed in `ped47.pdf', `ped11.pdf' and `ped15.pdf'.


1.5 Future directions of MORGAN

  1. General Library functions:

    Only functions affecting general MORGAN capabilities or enabling new programs are listed here. Many other general improvements in Morgan library functions and MORGAN set-up routines have been made. Some details may be found in MORGAN web release notes, or in the README_relnotes of each MORGAN release.

    1. Hidden Markov computations:

      In MORGAN V2.8.2, forward HMM computation for multiple meioses has been replaced by a factored version (FHMM), enabling much faster exact computation on small pedigree components and multiple-meiosis sampling for larger numbers of meioses.

      Exact computation of lodscores on small pedigree components has been implemented for lm_markers: computation uses the FHMM version of the Baum algorithm.

    2. Multiple meiosis sampler:

      MM-sampler updates multiple meioses jointly and is therefore a generalization of the meiosis sampler (M-sampler). There are four types of update in MM-sampler: random meiosis update, individual update, sib update and 3-generation update. This is based on work by Liping Tong.

      For more detail on L- and M-samplers, see Using MCMC to Estimate Parameters of Interest in Pedigree Data.

    3. MCMC and pedigree peeling by component

      Up to MORGAN V2.8.2, MCMC was performed globally over pedigree components (except those small enough for exact computation). The L-sampler peeling and lod score estimation could be done either by component (using "set peeling by component") or globally (the default).

      With MORGAN V2.8.3, and specifically to accommodate the new lm_haplotype program, the preferred option is to do both MCMC and pedigree peeling (lod score estimation) by component, and to use exact computation on all sufficiently small component pedigrees. The alternative, retained so that older data sets can be rerun, is to use "set global MCMC", in which case no exact computation will be done, and MCMC will be done globally over all component pedigrees. In this case, the "set peeling by component" option is retained.

    4. Pedigree peeling for multiallelic loci with general penetrance;

      As yet, loci are either multiallelic marker loci assumed observed without error, or trait loci which may have general penetrance functions but are diallelic. In order to allow models for "non-genotypic" markers, general joint peeling programs have been implemented, based on Thompson (1976: UU Tech Rept, #6). These peeling routines are used by the lm_map program which allows for errors in marker data. They are not yet released, as they are still in process of testing: they may be released in MORGAN V2.8.3.

    5. Penetrance functions and trait models:

      In MORGAN V2.8.2, liability penetrances (previously available only for lm_bayes) have been implemented for lm_markers. Penetrances for each liability class are now read from an input file using the "input extra data file S" parameter statement.

      Additionally, an age-based penetrance function for a qualititative trait has been implemented. That is, penetrances are directly dependent on age, rather than going through a liability class specification.

      In two-locus models for a quantitative trait, penetrances may be specified as additive, with a genotypic mean for each trait genotype for each locus. Alternatively, a matrix array of 2-locus genotypic means may be specified, allowing for epistasis (see Sung & Wijsman, 2007, Human Heredity 63: 144-152.).

    6. Traits and trait loci:

      With more complex trait models, including those of lm_twoqtl (see Sung et al., 2007, Genetic Epidemiology 31: 103-114), a more general specification of traits is required. In MORGAN V3.0, completely new structures have been introduced, separating traits (phenotypes) from trait loci ("tlocs"). Traits may be affected by genotypes at several tlocs; the genotypes at a tloc may affect several traits. This more general structure is not yet released - see lm_twoqtl below.

  2. Autozyg programs:
    1. Map estimation: lm_map with and without error models:

      Some updates and corrections to the lm_map program are made in MORGAN V2.8.2. Additonally under development is a version of lm_map which allows for error in observation of marker genotypes. This version may be released with MORGAN V2.8.3.

    2. Latent p-values: (including programs using Lodscore statistics)

      The version of the program lm_pval released in MORGAN V.2.8 and subsequent, and described in this tutorial, uses the latent p-value distribution of Thompson & Geyer (2007, Biometrika). Additional programs using these ideas are under development, including programs for the distribution of latent lod scores obtained in MCMC sampling (lm_fuzlod), p-values and randomized tests based on latent lod score statistics (lm_fzplod), and randomized confidence sets for the location of a trait locus (lm_fzconf). These are working names only. The methods are described in Thompson (2007: submitted).

    3. Gold (Gold2) and Gold1 subdirectories:

      In MORGAN V2.8.3, Gold replaces the previous Gold2 subdirectory, for tests of lm_auto, lm_pval, lm_map and lm_ibdtests.

      Gold1 lm_auto tests remain temporarily, since they provides the only tests of MCMC samplers on looped pedigrees. Gold1 lm_auto gold standards were omitted from the released MORGAN V2.8.2, due to delays in checking looped pedigree peeling routines: they will be reinstated in MORGAN V2.8.3.

  3. LR_Lods programs:

    To make room for new Lodscore programs being released in MORGAN V2.8.2, 2.8.3 and 3.0, the two older programs lm_schnell and lm_lods have been moved to the new directory LR_Lods. These two programs differ in several ways from newer programs, but the principal one is that they use the methods of combining likelihood ratios (LR) along the chromosome in order to estimate lod scores (see Thompson & Guo, 1991, IMA J Math Appl in Med & Biol).

    The Gold1 subdirectory remains temporarily, since it provides the only tests of MCMC samplers on looped pedigrees. Gold1 lm_lods gold standards were omitted from the released MORGAN V2.8.2, due to delays in checking looped pedigree peeling routines: they will be reinstated in MORGAN V2.8.3.

  4. Lodscore programs:
    1. lm_multiple and lm_markers

      A new lod score calculation program lm_multiple was released in MORGAN V2.8.2 (Spring, 2006). The lm_markers program is still made as a separate executable, but is compiled as a special case of lm_multiple code. In V2.8.2, both programs perform exact lodscore computations on small pedigree components. In V2.8.3 (unreleased) this is optional: see lm_haplotype below.

      The difference between lm_multiple and lm_markers is that the new lm_multiple substitutes the old single-meoisis M-sampler updates for the new multiple-meiosis (MM) sampler that is the work of Liping Tong (Tong & Thompson, 2007, Human Heredity:in press): see above.

      lm_multiple runs with the same parameter file and other input files required by lm_markers. The output is also essentially the same as that from lm_markers.

      More information on lod score calculation programs in MORGAN V.2.8.1 (and previous) can be found in Estimating Location LOD Scores by MCMC.

    2. lm_haplotype

      The lm_haplotype program is a generalization of lm_multiple in which haplotypes of key individuals dividing the pedigree are sampled in addition to meiosis indicators. To facilitate efficient implementation of this algorithm, new peeling-by-component routines need to be implemented and checked. This program is the work of Liping Tong (Tong & Thompson, 2007, Human Heredity:in press). The program is in process of release: probably in MORGAN V2.8.3.

    3. Gold subdirectory (previously Gold1 and Gold2)

      In MORGAN V2.8.2, Gold standards for exact computation and for lm_multiple are added in the Lodscore/Gold2 subdirectory.

      In MORGAN V2.8.3, the Gold directory replaces the previous Gold2 directory. Gold directories for lm_lods and lm_schnell are moved to the new LR_Lods program directory. Thus Gold1 no longer exists in Lodscore.

  5. MORGAN 3.0: lm_twoqtl

    The new program lm_twoqtl allows two (linked or unlinked) quantitative trait loci to contribute additively or epistatistically to a single trait (see Sung et al., 2007, Genetic Epidemiology 31: 103-114).

    An unreleased version of this program exists under MORGAN V2.8, but a user-friendly version requires a new MORGAN structure in which traits are separated from trait loci or "tlocs": see above. This new structure is implemented in MORGAN V3.0. The program lm_twoqtl will be the first to be released under this structure, although eventually all current programs will be converted.


2. Common Features and File Formats

All MORGAN programs use the same command line syntax, share many statements, and use the same pedigree data format. Most of the MORGAN programs need at least two input files in order to run: one parameter file and one pedigree data file. The parameter file contains computing requests, model parameters and input/output file options. It may also contain genotype data or other information specific to a particular MORGAN program. The pedigree file contains, at minimum, information on family relationships among the individuals in the sample.

It is worth pointing out that white space in any input file is defined to be any of these characters: `,' (comma), `\t' (horizontal tab), `\v' (vertical tab), `\n' (line feed, or newline), `\f' (form-feed), `\r' (carriage return).


2.1 Command syntax

The parameter file name must be passed to MORGAN on the command line when calling the program. Other file names can be passed to MORGAN on the command line or in the parameter file. The minimum syntax to call a MORGAN program is:

 
progname parfile 

In the statement above, progname is the name of one of several MORGAN main programs, such as genedrop or lm_lods. The parfile is the name of the parameter file which must be present. For example, to run genedrop using a parameter file named `genedrop.par', the command is:

 
genedrop genedrop.par

Additional file names can be passed to MORGAN on the command line, but these file names must be accompanied by a file type to identify them. The syntax is:

 
progname parfile [filetype filename]...

Square brackets indicate optional arguments. Possible filetype options include:

  ped

Input pedigree file

  mark

Input marker data file (Note that only Autozyg programs use marker data)

  oped

Output pedigree file

  seed

Input seeds for random number generator

  oseed

Output random seeds

  oscor

Output score file

If the name for a particular file type is given both in the command line and in a parameter statement, the name in the command line takes precedence.

The programs put informational messages to stdout and error messages to stderr which default to the screen. It is possible to redirect either or both to a named file.


2.2 Parameter file

A MORGAN parameter file contains a series of statements. Many statements are common to all MORGAN programs, particularly those that define the format of the pedigree file and identify other files to be used for program input or output. Many statements are optional, with some default behavior. If statements irrelevant to the MORGAN program called by the user are included in the parameter file, those statements are ignored and a warning message is issued.

Each statement must begin on a new line and begins with one of the MORGAN statement keywords. A statement consists of any number of lines. Case is not significant for the keywords. Only the first four letters of the keywords are significant; the remainder of the word is ignored. The order of the statements does not matter. If the same statement is repeated, the last one overrides previous ones and a warning is given in the output file. A # starts a comment so that the rest of the line is ignored. Either single or double quotation marks (' or ") can be used to delimit strings such as file names. Look at the warnings issued by MORGAN to make sure the parameters are as you intended.

The most common statements are for identifying input and output files (counterparts of the command line options) and for describing the input pedigree file format.

Below is a simple parameter file, `simple.par', from the examples included with the MORGAN software.

 
input pedigree file `simple.ped'
output pedigree file `simple.out'
output pedigree chronological

A brief description of the most commonly used parameter file statement follows in the next section. For a complete and more detailed description of MORGAN statements, please see the sections of this tutorial relevant to specific MORGAN programs and the documentation that comes with MORGAN in the files `README_userdoc' in the various program subdirectories.


2.3 File identification statements

Within the parameter file, file names are delimited with single or double quotation marks (` or "). File names submitted on the command line are not delimited with quotation marks. In a parameter file, either of the two statements below would identify `pedchk.ped' as the pedigree file to be read.

 
input pedigree file "pedchk.ped"
input pedigree file `pedchk.ped'

The most commonly used file identification statements are:

input pedigree file filename

The input pedigree file is required for most programs and may be specified either in the parameter file or through command line options.

output [overwrite] pedigree file filename

The output pedigree file is required by genedrop. Other programs also check for errors in the pedigree. If there are errors that the program is able to correct or if there are requested changes to the pedigree file format, the new pedigree data is written to this file.

input marker data file filename

Marker data, such as marker allele frequencies, map distances between markers and individuals' genotypes, can be included in the parameter file itself or in a separate file, called the marker data file. This statement is used when the marker data are not included within the parameter file. The marker file contains the `set marker data' statements. Marker data are used by Autozyg programs. See section Autozyg computational parameters.

input seed file filename

This file contains statements to set random seeds for the Monte Carlo based programs. The seed file may contain multiple lines (as in the case when the input seed file is also used for the output seed file). If so, the seeds in the last line override previous ones (with warnings issued). If no seed file is named on the command line or in a parameter statement and there are no statements to set random seeds in the parameter file, default seeds (12345, 1073 (hexadecimal 0x3039, 0x431)) are used.

output [overwrite] seed file filename

The final random seeds are saved if an output seed file is named. This file could be the same as the input seed file. New entries are appended to the old file.


2.4 Pedigree file

The pedigree file may contain two sections, formatting statements and pedigree data, separated by the file separator `****'. The first section is optional; if present, it contains statements that describe the contents and format of the pedigree file, as some MORGAN users find it convenient to describe the pedigree data within the file itself. The alternative is to put these formatting statements in the parameter file.

The pedigree data begin below the file separator. Data for each individual must be placed on a separate line. Each line begins with three names, followed by integers, then real numbers. The only required fields, the three "names", are identifiers for each individual and his or her parents. Names may include up to 15 alphanumeric characters. Whitespace (comma, space, tabs, linefeed), single (') and double (") quotes, and the hash mark (#) cannot be included in names. Names longer than 15 characters are truncated to 15 characters. Pedigree founders should be given parents with names `0'.

Gender, if present, is the fourth item in each line. These three or four values may be followed by an "observed" indicator, with values of `0', indicating an unobserved individual, or `1', indicating an observed individual. The optional "observed" indicator is followed by other integers, if present, and real numbers, if present. Integers and real numbers can represent individuals' trait data.

The format of the file is flexible and is specified by the user with `input pedigree record …' statements, described in the next section.

Unlike LINKAGE format pedigree files, genotype data are not included in a MORGAN pedigree file.


2.5 Pedigree file description statements

Any of the following statements can be placed either in the parameter file or in the top section of the pedigree file, above the file separator, `****'. Most parameters have default values, in which case the statement is usually not required.

allow pedigree size N

This statement overrides the program-defined maximum pedigree size (presently 20,000 individuals).

input pedigree size N

Here, N is the number of records to be read. It may be less than the actual number of individuals in the pedigree file.

input pedigree record names 3 [integers I] [reals J]

This specifies the numbers of entries in each line of the pedigree file. There must be three names (up to 12 alphanumeric characters each) identifying an individual and his or her parents. Optional integers include gender and phenotypic or discrete trait data. Real numbers could be covariates or quantitative trait values.

input pedigree record (father mother | mother father)

This statement specifies the order of parental names. `father mother' is the default.

input pedigree record gender (present | absent)

Gender, which follows the required triplet of names, is optional. If this statement is not included, the default is `gender present'. Gender is coded as an integer, such that `1', `2' and `0' represent male, female, and unsexed, respectively.

input pedigree record observed (absent | present)

The observed indicator designates which members of the pedigree are observed and which are unobserved, indicated by `1' and `0', respectively. When the observed indicator is present, it follows gender (or parents if gender is not present). If this statement is absent, all pedigree members are assumed to be observed.

input pedigree record traits K1 K2... integers I1 I2...

This statement is needed when integer data for traits are included, and the trait values do not immediately and consecutively follow gender (if present). Use this statement to specify the correspondence between trait numbers and integers in the record.

Below are the first several lines of the sample pedigree file, `ped73.ped' in `MORGAN_Examples'.

 
input pedigree size 73
input pedigree record names 3 integers 6 reals 1

***************************************************
101 0 0 1 0 0 0 -1 -1 999.5
102 0 0 2 0 0 0 -1 -1 999.5
201 101 102 1 0 0 0 0 1 999.5

Note that genotype data are not contained in the pedigree file; genotype data, if required for the MORGAN program invoked, are contained in the parameter file or in a marker data file specified in the parameter file using the `input marker data file' statement. The second parameter statement in the file, `input pedigree record names 3 integers 9 reals 1' describes the format of the data on each line (also called a record) in the file. The first three values in each row, the names, give an individual's identification number followed by those of his or her father, then mother. Because there is no `input pedigree record gender' statement, gender is assumed to be present and to directly follow the three names. Absence of an `input pedigree record observed' statement means that all individuals are assumed to be observed. The 8 integers following gender and the real number in the final column represent trait data. Lack of an `input pedigree record traits integers' statement indicates that the first integer following gender corresponds to trait 1, the second to trait 2, etc. These (and other) parameter statement defaults apply only if there is no overriding statement in any of the parameter files used. Programs will generally provide a warning statement (coded "(W)") when default values are being used due to absence of a relevant parameter statement.


3. Checking Pedigree Validity


3.1 Introduction to pedcheck

pedcheck reads the pedigree file and checks for errors in the pedigree structure. Specifically, it checks for the following errors:

If no errors are found, pedcheck reports the number of components (connected pedigrees) found and lists for each component:

If there are changes to the file, pedcheck writes an output pedigree file. Requested changes may include reordering of the pedigree chronologically (by component, then by name), the addition of gender, the addition of an observed indicator, and reversing the order of the parental names.

Other MORGAN programs do their own pedigree checking by calling the relevant pedcheck functions, but it is still useful to do preliminary processing of data files first.


3.2 Sample pedcheck parameter file

Files for pedcheck may be found in the `Pedcheck' subdirectory of `MORGAN_Examples'. Below is the sample parameter file `check.par' for pedcheck:

 
input pedigree file `check.ped'
input pedigree size 30
input pedigree record gender absent
input pedigree record observed present
assign gender
output pedigree chronological
output pedigree file `check.oped'

The `assign gender' statement requests that pedcheck determine gender, when possible, and output that information to the output pedigree file. The gender determination is made based on the default order for the listing of parents, which is father followed by mother. Notice that individuals who are not parents (531 and 541) have missing gender, `0', in the fourth column of `check.oped'.

`output pedigree chronological' causes the pedigree to be sorted into chronological order in the output pedigree file, first by component, then by individual name. MORGAN refers to each connected pedigree (i.e., distinct family) in a file as a component. The first individual in the input listing who is not genealogically connected to individual 1 defines component 2, and the first who is not connected to either of these defines component 3, etc. Although pedcheck groups individuals by their MORGAN-assigned component numbers in the output pedigree file, it does not list the component numbers. That is, the first three columns of the output file are just as they were in the input file: individual name, father's name, mother's name.


3.3 Running pedcheck examples

Examples for the program pedcheck are under the subdirectory `Pedcheck/'. The commands using example files are listed below. Have a look inside the pedigree and parameter files, then verify that the output files are as you would expect them to be. If error messages are generated, verify that they make sense and see if you can make the necessary corrections so that pedcheck will run.

pedcheck  check.par

runs on input pedigree file `check.ped'. The pedigree contains no errors, but has no gender specified and is not in chronological order. These deviations from the standard format must be specified in the parameter file. The output pedigree file `check.oped' has gender assigned and has the members reordered. You will get an error message and the program will quit if `check.oped' already exists. If this occurs, delete the file and try again or use another output file name.

pedcheck  imp.par

runs on input pedigree file `imp.ped'. The pedigree contains an individual who is his own ancestor.

pedcheck  empty.par  ped sex.ped

runs with an empty parameter file, with input pedigree file `sex.ped' specified on the command line. What does the output say is wrong with this pedigree?

pedcheck  empty.par  ped dup.ped

runs with an empty parameter file, with input pedigree file `dup.ped' specified on the command line. What does the output say is wrong with this pedigree?


3.4 pedcheck statements

pedcheck statements apply to other MORGAN programs since the programs call the pedcheck functions first to check the pedigree file before doing computations on the pedigree data.

(assign | ignore) gender

Optional. `assign gender' causes gender to be determined by parentage, whether or not gender is included in the pedigree file. `ignore gender', causes the program to not check or assign gender. The default action is to assign gender when it is absent and to check gender if it is present.

output pedigree chronological

Optional. If this statement is present and if the input file is not in chronological order, the pedigree is sorted and written out in chronological order. The pedigree is sorted by components, and within each component, each non-founding member is preceded by her or his parents. If this statement is not given, the input order is preserved in the output file, if written. See the previous section of this chapter for further discussion of pedigree components.

output pedigree record (father mother | mother father)

Optional. This statement causes the parents to be named in the specified order. The default arrangement for each triplet of names is the input order.


4. Computing Kinship and One- or Two-Locus Inbreeding Coefficients


4.1 Introduction to kin

kin computes kinship coefficients for pairs of pedigree members. It also computes single-locus and two-locus inbreeding coefficients for members of the pedigree. Briefly, the kinship coefficient between individuals i and j is the probability that a randomly-drawn allele from i is identical by descent (ibd) to randomly-drawn allele from individual j at the same locus. A single-locus inbreeding coefficient is the probability that an individual carries two copies of a gene that are ibd, at a given autosomal locus. In other words, an individual's single-locus inbreeding coefficient is equal to the kinship coefficient of his parents, as an individual's gametes can be thought of as random draws from his parents' chromosomes. A two-locus inbreeding coefficient is the probability that an individual carries two ibd copies of a gene at each of two linked loci. kin presents two-locus inbreeding coefficients as a function of the recombination fraction between the two loci.


4.2 Sample kin parameter file

Files for kin may be found in the `IBD' subdirectory of `MORGAN_Examples'. Below is a sample kin parameter file, `jv_rep_kin.par'.

 
input pedigree file 'jv_rep.ped'
compute component 1 kinship coeff            531 431   431 432
compute component 2 inbreeding coeff         441   541
compute component 1 two-locus inbreed coeff  531
compute component 2 two-locus inbreed coeff  541
set recomb freqs .01 .04 .05 .10 .18 .30 .50 .0

The statements on lines 2 - 5 request computation of kinship coefficients for the pairs `531 431' and `431 432' from component 1, inbreeding coefficients for individuals `441' and `541' from component 2, and the two-locus inbreeding coefficient for `531' from component 1 and `541' from component 2. The two-locus inbreeding coefficient will be computed for two loci at distances specified in the `set recomb freqs' statement.


4.3 Running kin example and sample output

Under the subdirectory `IBD/', run the example above using the command below. To send the output to a file instead of the screen, include `> filename' (without quotes) after the parameter file name on the command line.

 
kin jv_rep_kin.par

Below is the relevant part of the kin output.

 
 Component 1:

    Kinship coefficients:

    531  431   .32031
    431  432   .10938


    2-locus inbreeding coefficients:
    (g4link is probability of IBD at both of 2 linked loci)

    proband  recomb   g4link
               freq     prob

        531    .000   .10938
               .010   .10234
               .040   .08386
               .050   .07849
               .100   .05660
               .180   .03455
               .300   .01910
               .500   .01196


 Component 2:

    Inbreeding coefficients:

    441   .06250
    541   .10938


    2-locus inbreeding coefficients:
    (g4link is probability of IBD at both of 2 linked loci)

    proband  recomb   g4link
               freq     prob

        541    .000   .10938
               .010   .10234
               .040   .08386
               .050   .07849
               .100   .05660
               .180   .03455
               .300   .01910
               .500   .01196

Note that when the recombination frequency is 0.0, the two-locus inbreeding coefficient is the same as the one-locus inbreeding coefficient, as there is no recombination between the loci, thus they act as a single locus. When the recombination frequency is 0.5, the two loci are independent and the two-locus inbreeding coefficient is the squared one-locus inbreeding coefficient.


4.4 kin statements

At least one of the following `compute ...' statements are required to run program kin. If there is more than one component (connected pedigree) in the file, the component number must be specified. MORGAN assigns component numbers to the connected pedigrees within the pedigree file. If your data set contains more than one component, you may first run pedcheck to determine which individuals are assigned which component numbers. pedcheck will sort by component number in the output pedigree file, although it will not list component numbers in the file. The screen output generated when running pedcheck will give component nubmers and the number of individuals in each component. Check the error and warning messages when running kin to verify that component numbers were correctly specified. The program will quit if an individual's component number is incorrectly specified in the parameter file or if there is more than one component in the data set and no component is specified.

compute [component M] kinship coefficient N1 N2...

This statement names one or more pairs of pedigree members for which the kinship coefficient is to be computed.

compute [component M] inbreeding coefficient N1...

This statement names one or more pedigree members for whom the inbreeding coefficient is to be computed.

compute [component M] two-locus inbreeding coefficient N1...

This statement requests the computation of two-locus inbreeding coefficients, i.e. the probability of ibd at both loci, for the named individual. For the recombination frequencies at which the coefficients are computed, see the following statement.

set recombination frequencies X1 X2...

Two-locus inbreeding coefficients are computed for each of the list of recombination frequencies, in the range of 0.0 to 0.5. If frequencies are not given, the default values are: 0.0, 0.01, 0.04, 0.05, 0.10, 0.18, 0.30, and 0.50.


5. Simulating Marker and Trait Data in Pedigrees


5.1 Introduction to genedrop.

genedrop simulates pedigree data for analysis by other programs. Given a genetic map, it simulates genotypes at marker loci (linked or unlinked) and the discrete genotypes and polygenic values contributing to quantitative traits. The trait loci may or may not be linked to marker maps. Thus, one or more of three kinds of loci are simulated on a chromosome: markers, traits linked to markers, and traits not linked to markers.

Using a random number generator, genedrop first assigns marker and trait genotypes and polygenic trait values to the founders. Then meiosis indicators are simulated for non-founders in chronological order, thus determining the genes inherited and the founder labels. The order of simulation of markers and traits for each individual is: first, the marker genes, if any, are simulated, in the order mapped on the chromosome. Then linked traits, if any, are simulated in map order. Finally, unlinked traits, if any, are simulated.

Because founders of a pedigree are assumed to be unrelated, a unique identifier, a founder gene label or founder label, is assigned to each of the two haploid genomes of each founder. The user may choose to include these founder labels with the output pedigree, identifying the ancestral source of each gene at each locus in non-founders.

The user may provide random number seeds for both the marker simulation and the trait simulation. This permits multiple simulations, for a pedigree, of identical marker genotypes, but with different quantitative trait values.

The population and segregation model parameters (allele frequencies, trait genotype means, additive and residual variances) may be specified by the user and take default values if not specified. Several different trait models can be specified as in the following table:

       

Equal Genotypic Means

Zero Additive Variance

        non-genetic model

YES

YES

        polygenic model

YES

NO

        major gene model

NO

YES

        mixed model

NO

NO

The trait locus must be diallelic and the trait residual variance must be greater than zero. A very small residual variance can be specified if one desires to simulate a qualitative trait.

Genetic data on all individuals may be included in the simulated pedigree, or some individuals may be specified as "missing". If any individuals are to be missing genetic data, an "observed" indicator column must be included in the pedigree file. See section Pedigree file, for details.


5.2 Sample genedrop parameter file

Files for genedrop may be found in the `Simulation' subdirectory of `MORGAN_Examples'. The example here refers to `ped73_gdrop.par'. The seed file can be accessed as through the command line or in the parameter file. If from the parameter file following statements are needed:

 
input seed file 'marker.seed'
output marker seeds only
output overwrite seed file 'marker.seed'

The above statements specify `marker.seed' as both the input and output seed file. The statement `output marker seeds only' causes only the marker seeds to be saved before exiting the program. The default is to save both marker and trait seeds. Also, the `overwrite' option in line 4 enables the program to replace the current content in the seed file with the newly generated random numbers, which may be used for simulation in the future. The seed file contains one or more statements like `set marker seeds 0xde5e8d39 0x8ec584dd'. In the example, we have chosen to access the seed file from the command line, hence the first and third lines in the above example are commented out in `ped73_gdrop.par'. See the command line used to simulate data in the next section.

With the statement `output pedigree chronological', the output pedigree will be in chronological order, a requirement for other MORGAN programs to run properly. In addition, a file name must be specified for the output pedigree file. This can be done here in the parameter file using an `output pedigree file' statement or it can be specified on the command line, as shown in the next section.

 
simulate chrom 1 markers 10 traits 1

The above statement asks genedrop to simulate ten markers loci and one trait locus on chromosome 1. If no trait locus is to be simulated, the part `traits 1' can be removed.

 
map chrom 1 marker dist  10 10 10 10 10 10 10 10 10

The above statements indicates marker maps on chromosome 1, as specified by map distances (in Haldane centiMorgans). They can also be specified by recombination fractions, e.g. (does not correspond to the above map distances):

 
map chrom 1 marker recomb fracs 0.1 0.5 0.2

Marker allele frequencies are set by the following lines:

 
set chrom 1 markers 1	freqs 0.13 0.66 0.16 0.05
set chrom 1 markers 2   freqs 0.06 0.23 0.41 0.25 0.05
set chrom 1 markers 3   freqs 0.11 0.02 0.01 0.06 0.24 0.56
set chrom 1 markers 4   freqs 0.07 0.04 0.89
set chrom 1 markers 5	freqs 0.12 0.11 0.03 0.03 0.50 0.21
set chrom 1 markers 6	freqs 0.50 0.44 0.06
set chrom 1 markers 7	freqs 0.01 0.33 0.62 0.04
set chrom 1 markers 8	freqs 0.20 0.05 0.42 0.27 0.06
set chrom 1 markers 9	freqs 0.18 0.18 0.25 0.16 0.08 0.15
set chrom 1 markers 10	freqs 0.17 0.35 0.04 0.29 0.15

In the case where several markers have the same number of alleles and allele frequencies, one can group those markers together into one line such as

 
set chrom 1 markers 11 12 13 15 freqs 0.2 0.8

although we consider it good practice to specify the frequencies separately for each marker.

The following five lines describe the trait locus. The trait locus is between markers 2 and 3 on chromosome 5, with recombination fraction 0.1127 to marker 2. The trait locus can have only two alleles; here the frequencies are 0.95 and 0.05, for alleles 1 and 2, respectively. The mean values of the trait for each trait locus genotype are on the next line. Values correspond to the (1 1), (1 2) and (2 2) genotypes, respectively. A non-zero residual variance indicates that there is an environmental component to the trait.

 
map chrom 1 trait 1 marker 5 dist 5
set trait 1 freqs 0.5 0.5

set trait 1 geno means 90 100 110
set trait 1 residual variance 25.0
set trait 1 additive variance 0.0

The final three lines of the parameter file request that founder gene (or genome) labels and latent variable values for the trait be included in the output file, and data be simulated for all (observed and unobserved) individuals. Founder gene labels indicate, for all non-founders, which founder alleles were passed to the individual. The latent trait variables mentioned in the final statement are the trait locus genotypes and the additive and residual components of the trait value. They will precede the trait value in the output file.

 
output pedigree record founder gene labels
output pedigree record trait latent variables
output pedigree record unobserved variables

5.3 Running genedrop examples and sample output

Two examples are available under the subdirectory `Simulation/'. The commands to run these examples are similar to the following:

 
genedrop ped73_gdrop.par ped ../ped73.ped seed ../marker.seed oped ../ped73_gdrop_simu.oped

Upon running the genedrop example, notice (but do not worry about) the appearance of a warning message on the screen: `Overrides previous statement of same type (W)'. The warning was triggered by the specification of `seeds' as both the input and output seed file, without inclusion of `overwrite' in the `output seed file' statement. When an overwrite is not requested, MORGAN appends the new output seeds to the existing file at the end of the run. Thus, at the next run, more than one `set marker seeds' statement exists in the seed file. The program uses only the last `set marker seeds' statement in the file and issues the warning to indicate this to the user. To avoid this warning (and an ever-growing seed file), one can access the seed file from the parameter file as at the beginning of the previous section Sample genedrop parameter file, using the `overwrite' option when outputting the seeds.

Unlike other MORGAN programs, genedrop always creates an output pedigree file, as the function of the program is to simulate marker and trait data. The output file, `ped73_gdrop_simu.dat' here, has the same structure as the input file `ped73.ped', with one individual per record (line), but the output file does not include the parameter statements at the top of the input file and contains additional columns. The first four items are the individual's name, the names of the parents and gender. If no addition output options are set, the next items are the genotypes of the markers (two items per marker) in their order on the chromosomes, and the trait values in the order of trait labels.

Since `output pedigree record trait latent variables' was included in the parameter file, the output file contains four additional columns preceding the trait value. They are: trait locus genotype (2 columns), additive component of the trait value and residual component of the trait value. In this example, everyone has a `0.000' in the additive component column, as we set the additive variance to zero in the parameter file.

Because `output pedigree record founder gene labels' is set, the founder gene lables (FGLs) for markers precede the marker genotypes and the trait FGLs precede the trait locus genotypes (only if latent variables are requested, as in this example).

Also, the `output pedigree record unobserved variables' statement is included in `gdrop.par', so an observed indicator would follow gender in the output pedigree file.


5.4 genedrop statements


5.4.1 genedrop computing requests

simulate [chromosome I] markers J [traits K]

One statement is given for each chromosome on which markers or both markers and traits are to be simulated. Only unlinked traits are simulated if no such statement is given. The `chromosome' keyword can be omitted if all markers and linked traits are on the same chromosome. The markers and traits are labeled as positive integers. The marker labels are relative to each chromosome whereas the trait labels are absolute. That is, the markers on chromosome 5 are labeled 1, 2, …, and so are the markers on chromosome 7. The traits however are labeled, for example, as 1 on chromosome 5 and as 2 on chromosome 7. There can be at most 10 traits and 500 markers.

simulate unlinked traits K1...

Optional. This statement specifies traits to be simulated which are not linked to markers, and hence, have no map specifications.


5.4.2 genedrop mapping model parameters

map [chromosome I] [gender (F | M)] marker ( [Kosambi] distances | recombination fractions | [Kosambi] positions) X1 X2...

This statement is required if simulation of more than one marker is requested. One statement is used per chromosome. This statement specifies the marker map or positions given in units of genetic distances (cM), or recombination fractions between markers. Marker map or positions can be sex-specific if gender is included in the statement. If `distances' is chosen, intermarker distances must be provided such that the number of distances is one less than the number of markers. If `positions' is chosen, the number of positions must equal the number of markers, as these are absolute positions relative to a zero point to the left of all of the markers. The Haldane mapping function is used to convert between the genetic distances and recombination fractions unless Kosambi is specified.

map [chromosome I] [gender (F | M)] traits K1 K2 ... markers J1 J2 ... ( [Kosambi] distances | recombination fractions) X1 X2...

This statement is required if simulated traits are to be linked to markers. That is, it is not required if no traits or only unlinked traits are to be simulated. The statement specifies the location of each trait locus with respect to one of the marker loci. Thus, the number of traits listed in the statement must be equal to the number of markers listed and to the number of distances (or recombination fractions) listed. The trait locus will follow the corresponding marker locus (to the right, so to speak) at the distance specified. To simulate a trait locus that precedes all marker loci, list marker `0' in the statement. For example, with `map trait 3 2 marker 6 0 distances 5 4', traits 3 and 2 will be placed 5 cM to the right of marker 6 and 4 cM to the left of marker 1, respectively.


5.4.3 genedrop population model parameters

set [chromosome I] markers K1... frequencies X1 X2...

This statement specifies markers allele frequencies. Allele frequencies for a marker should sum to between 0.9999 and 1.0001. Otherwise they are normalized. Multiple markers can be specified in a single statement if they reside on the same chromosome and have the same number of alleles with the same allele frequencies.

set traits K1... frequencies X1 X2

This statement specifies the trait allele frequencies. Allele frequencies for a trait should sum to between 0.9999 and 1.0001. Otherwise they are normalized. Multiple trait loci can be specified in a single statement if they have the same allele frequencies. Trait loci must be biallelic.

set normalized frequencies

If the set of allele frequencies for each marker and trait is to be normalized, this statement is given. Normalization of the frequencies is recommended when simulating pedigree data, but not recommended when using the other programs.

set traits K1 ... genotype means X1 X2 X3

Since two alleles are simulated for each trait locus, three means must be specified for the polygenic trait values: one each for the (1 1), the (1 2) or (2 1), and the (2 2) genotypes. The default values 0.0, 0.0, and 0.0.

set traits K1 ... additive variance X

Here we specify the genetic variance for one or more trait. One of there statements is given for each value assigned. The default variance is 0.0.

set traits K1 ... residual variance X

This statement is like the preceding one. The environmental contribution to the trait is set using this statement.


5.4.4 genedrop computational parameters

set marker seeds H1 H2

This statement initializes the seeds for the random number generator in the gene dropping algorithms. The seeds are to be positive and no greater than hexadecimal 0xFFFFFFFF, with the first seed (congruential seed) odd, and the second seed (Tausworthe seed) nonzero. If no seeds are specified for marker simulation, default seeds are used.

set trait seeds H1 H2

This statement initializes the seeds for trait simulation. If no seeds are given, the starting seeds for trait simulation are the seeds returned by the random number generator at completion of marker simulation.


5.4.5 genedrop output pedigree options

output pedigree record founder gene labels

When this option is selected, each record contains, for each locus, a pair of founder gene labels. Each founder is assigned a pair of labels, which are in the same order as the names of the parents. Then, for each locus for each descendant, founder labels are determined by the simulated meiosis indicators.

This statement is useful in cases where we want the "disease" allele in the pedigree to come from a particular founder.

output pedigree record trait latent variables

This statement requests that the quantitative trait value be output for each trait. With this option, additional items appear in the record: the genotype at each trait locus and the additive and residual component of each quantitative trait.

output pedigree record unobserved variables

When this option is not selected, unobserved individuals have the genotype at each locus represented as `0 0', the founder gene label (if requested) at each locus represented as `0 0', and each quantitative trait value given as `999'.

If this option is set, then genotypes, gene labels and trait values are output for unobserved individuals, as well as for the observed members of the pedigree. An additional data field, following the gender indicator, specifies whether or not the individual is observed. `0' indicates unobserved, `1', observed.

input pedigree record observed (absent | present)

The observed indicator is used to designate which members are observed, with '0' indicating unobserved, '1' indicating observed. When the observed indicator is present in the pedigree file, it follows gender (or parents, if gender is not present). If this statement is not given, all pedigree members are assumed to be observed. See also the next statement `assume all observed'.

If individuals are flagged, in the pedigree file, as unobserved, the default is to indicate, in the output pedigree file, the data for these individuals as missing.

assume all observed

This statement results in all members of the pedigree "observed" in the simulation, with the observed indicator, if present in the input file, ignored.


5.4.6 genedrop output seed file options

output (marker | trait) seeds only

If an output seed file is given, both ending marker and trait seeds are saved unless one or the other is requested in this statement.


6. Simulating Marker Data Conditional on Trait Data in Pedigrees


6.1 Introduction to markerdrop

markerdrop simulates marker data at markers linked to a potential trait locus. The user must specify whether marker data simulation is to be conditional on a trait model or on an inheritance pattern at the trait locus. The choice of a trait model or an inheritance pattern will dictate which additional parameter statements must (or may) be included in the parameter file.


6.2 Sample markerdrop parameter file - conditional on trait

Files for markerdrop may be found in the `Simulation' subdirectory of `MORGAN_Examples'. The sample parameter file, `ped73_mdrop_trait.par', requests simulation of marker data conditional on a trait model. The trait is assumed to be discrete when simulation is conditional on a trait model. The relevant section of the file is:

 
map trait 2 marker 5 distance 5.0 
simulate markers 10 using trait
map marker positions 10 20 30 40 50 60 70 80 90 100
set incomplete penetrances .05 .6 .95
set trait 2  freqs 0.5 0.5

set markers 1	freqs 0.13 0.66 0.16 0.05
set markers 2   freqs 0.06 0.23 0.41 0.25 0.05
.
.
.
set markers 10 data

101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
201 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
302 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
304 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
.
.
.

The first three lines of each of the parameter files above are required for markerdrop, regardless of simulation of marker data on a trait or on an inheritance pattern. The `map trait' statement identifies the trait locus to be used in the simulation and gives its position relative to the markers on which we wish to simulate data. In this example, the trait locus follows marker 5 at a distance of 5 centiMorgans. The `simulate markers 10 using trait' statement states the number of markers to be simulated and specifies simulation using a trait model. The `map marker positions' statement specifies the spacing of the markers to be simulated.

Compared with the parameter file needed for simulation conditional on inheritance pattern (see next section), `ped73_mdrop_trait.par' contains two additional lines (lines 4 and 5), which are required for discrete traits (the default for simulation on a trait). `set incomplete penetrances' specifies the probability of exhibiting the trait for individuals with trait locus genotypes `1 1', `1 2' (or `2 1') and `2 2', respectively. `set trait ... freqs' specifies trait allele frequencies.

Starting from line 6, the statements specify allele frequencies at the first two markers and are required.

Following the statement `set markers 10 data', patterns of marker data to be simulated are given for each individual. Ten markers correspond to 10 pairs of 0's or 1's, with 0's representing "unobserved" and 1's "observed".

File `ped73_mdrop_trait.par' uses pedigree file `ped73_ph.ped'. The file format section and first few lines of the pedigree data setion of this file are below.

 
input pedigree size 73
input pedigree record names 3 integers 3

***************************************************
101 0 0 1 0 0
102 0 0 2 0 0
201 101 102 1 0 0
202 101 102 2 0 0
2010 0 0 2 0 0
301 201 2010 1 0 0
302 201 2010 2 1 1

The first three columns are indices of each individual and his/her parents, whereas the last three columns are sex (1=male, 2=female), observed status (0=unobserved, 1=observed) and affectation status (0=missing, 1=unaffected, 2=affected). The affectation status is actually specified through a statement in the parameter file:

 
input pedigree record trait 2 integer 3

This statement can be included in the pedigree file instead. But markerdrop can simulate data for markers linked to only one trait locus, as specified in the `map' statement in the parameter file.


6.3 Sample markerdrop parameter file - conditional on inheritance pattern

The sample parameter file, `ped73_mdrop_inhe.par', requests simulation of marker data conditional on an inheritance pattern. The relevant section of the file is:

 
map trait 3 marker 4 recomb frac 0.01 
simulate markers 10 using inheritance
map marker positions 10 20 30 40 50 60 70 80 90 100

set markers 1	freqs 0.13 0.66 0.16 0.05
set markers 2   freqs 0.06 0.23 0.41 0.25 0.05
.
.
.
set markers 10 data

101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
201 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
302 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
304 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
.
.
.

Again, the first three lines of each of the parameter files above are required for markerdrop. The `map trait' statement identifies the trait locus to be used in the simulation and gives its position relative to the markers on which we wish to simulate data. In this example, the trait locus follows marker 4 with a recombination fraction of 0.01. The `simulate' statement states the number of markers to be simulated and specifies simulation using the inheritance pattern. The `map marker positions' statement specifies the spacing of the markers to be simulated. Lines 4 and 5 specify allele frequencies at the first two of the ten markers. Statement `set markers 10 data' and lines below specify the missing data pattern with 0 for missing and 1 for observed.

File `ped73_mdrop_inhe.par' uses pedigree file `ped73_mdrop_inhe.ped'. The file format section and first few lines of the pedigree data setion of this file are below.

 
input pedigree size 73
input pedigree record names 3 integers 4
input pedigree record trait 3 integer pairs 3 4

***************************************************
101 0 0 1 0 -1 -1
102 0 0 2 0 -1 -1
201 101 102 1 0 0 1
202 101 102 2 0 1 1

The first three columns are indices of individuals and their parents. The next two are sex and observation status. The last two columns are inheritance indicators with the first being the paternal ones and the second the maternal ones. A founder's inheritance indicators are `-1 -1'.

markerdrop can simulate data for markers linked to only one trait locus, as specified in the `map' statement in the parameter file.


6.4 Running markerdrop examples and sample output

To run the two examples under the subdirectory `Simulation/', one may use the following commands:

 
markerdrop ped73_mdrop_inhe.par

and

 
markerdrop ped73_mdrop_trait.par

Below is a section of `mdrop_inhe.out', the output file generated using the parameter file `ped73_mdrop_inhe.par' and the pedigree file `ped73_mdrop_inhe.ped'. Similar output would be obtained using `ped73_mdrop_trait.par'.

 
 Inter-locus distances in cM, using Haldane map function:

                            T3                                            
 ----------------------------+------------------------------------------  
   10.0   10.0   10.0    1.0    9.0   10.0   10.0   10.0   10.0   10.0    
 +------+------+------+-------------+------+------+------+------+------+  
M1     M2     M3     M4            M5     M6     M7     M8     M9     M10 


 assigned FGL in all listed individuals:
 trait locus, followed by 10 marker loci
 101  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1
 102  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3
 201  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3
 202  2 4  2 3  2 3  2 3  2 4  2 4  2 4  2 4  2 4  2 4  2 4
 2010  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5
 301  2 6  2 5  2 6  2 6  2 6  2 6  2 5  2 5  2 5  2 5  2 5
 302  2 6  3 6  2 6  2 6  2 6  2 5  2 5  2 5  2 6  2 6  2 6
 
 assigned marker genotypes in accordance with data availability:
 101  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 102  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 201  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 202  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 2010  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 301  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 302  4 2  4 3  5 6  3 3  6 1  1 2  3 3  1 4  1 6  1 4

In the output file above, the marker map is shown, as specified in the parameter file. Below the map, founder genome labels (FGL) are listed. In this section of the pedigree, individuals 101, 102 and 1020 are founders. They have been assigned two unique FGL each. One of each founder's FGL has been randomly selected to be passed to their offspring. Using the FGL, marker genotypes have been assigned to individuals on whom data were specified as available in the parameter file, individual 302 for example.


6.5 markerdrop statements


6.5.1 markerdrop computing requests

markerdrop requires one of the following two statements:

simulate markers J using trait

This statement requests the simulation of J markers conditional on a trait model. If marker data are simulated conditional on a trait model, the user must specify trait allele frequencies, genotypic penetrances and a map position for the trait locus within the parameter file. Affection status of each individual must be specified in the pedigree file following gender, if present.

simulate markers J using inheritance

This statement requests the simulation of J markers conditional on an inheritance pattern at the trait locus. If marker data are to be simulated conditional on an inheritance pattern, the user must specify a map position for the trait within the parameter file. In addition, a pair of inheritance indicators for each individual must be included in the pedigree file following gender, if present. The first of the pair describes paternal inheritance at the trait locus and the second describes maternal inheritance. Inheritance indicators are coded as `0', `1' or `-1', corresponding to segregation of the trait allele from the individual's grandmother, grandfather, or unknown, respectively. For example, `0 0' indicates that the individual inherited the alleles carried by both grandmothers at the trait locus, while `0 1' indicates inheritance of the paternal grandmother's and maternal grandfather's alleles.


6.5.2 markerdrop mapping model parameters

map [gender (F | M)] marker ( [Kosambi] distances | recombination fractions | [Kosambi] positions) X1 X2

This statement is required for markerdrop if more than one marker is to be simulated. It specifies the marker map (optionally a sex-specific map), in units of genetic distance (cM), marker position (cM), or recombination fraction. If distance is selected, markerdrop will expect one fewer values than the number of markers, as these are intermarker distances. If position is expected, the same number of values as markers will be expected, as these are the positions of the markers relative to some zero point to the left of marker 1. If Kosambi is not specified, the Haldane mapping function is used to convert between genetic distance and recombination fraction.

map [gender (F | M)] trait K marker J ( [Kosambi] distance | recombination fraction ) X

This statement is required for markerdrop; it tells the program which trait to use in the simulation of marker data and gives a location for the trait locus, either as a map distance or recombination fraction, following the marker listed in the statement. As with genedrop, to simulate a trait locus position that precedes all markers, list the marker number as `0'.


6.5.3 markerdrop population model parameters

set trait K1 frequencies X1 X2

This statement specifies trait allele frequencies. Trait must be biallelic; both allele frequencies must be listed and must sum to a value between 0.9999 and 1.0001. Otherwise markerdrop automatically normalizes the allele frequencies and issues a warning. Only one trait may be included in this statement.

set marker names N1 N2

This statement specifies marker names in the order of their position along the chromosome. Default names are marker-1, marker-2, etc.

set markers J1 … frequencies X1 X2

Marker allele frequencies are specified using this statement. A marker can have up to 100 alleles and all allele frequencies must be listed. For each marker, the allele frequencies should sum to between 0.9999 and 1.0001. Otherwise they are automatically normalized and a warning message will be issued. Multiple markers can be included in a single statement if they have the same number of alleles with the same frequencies.


6.5.4 markerdrop computational parameters

set incomplete penetrances

This statement is required for markerdrop when using a trait model or when using inheritance indicators with a discrete trait. A penetrance, the probability of expressing the trait given a particular trait locus genotype, must be specified for each of the 3 possible genotypes at the trait locus. For example `set incomplete penetrances 0.15 0.85 0.99' specifies that the probability of expressing the trait is 0.15, 0.85 and 0.99 for (1,1), (1,2) and (2,2) trait locus genotypes, respectively.

set trait data discrete

This statement is optional. A discrete trait is the default when simulating using a trait model.

As with genedrop, marker seeds and trait seeds can be specified or the default values can be used, See section genedrop computational parameters.


6.5.5 markerdrop input file options

The statements below are optional for markerdrop; they are used to indicate a change from the default order of trait values in the pedigree file. The first statement may be included if marker data are to be simulated conditional on a trait model and the second may be included if data are to be simulated conditional on an inheritance pattern.

input pedigree record traits K1 K2 K3 ... integers I1 I2 I3 ...

Unless this statement is present, the first integer following gender, if present, is assumed to be data for trait 1, the next integer for trait 2, and so on. Use this statement to specify an alternate correspondence between integer values in the record and trait numbers.

input pedigree record traits K1 K2 ... integer pairs I11 I12 I21 I22 ...

Unless this statement is present, the first two integers following gender, if present, in the pedigree file are assumed to be the inheritance indicators at the locus for trait 1. The next two integers are assumed to be the inheritance indicators at the locus for trait 2, and so on.


7. Estimating a priori ibd Probabilities by Monte Carlo


7.1 Introduction to ibddrop

ibddrop estimates probabilities of gene identity by descent, ibd, (such as kinship, inbreeding, or multi-gene identities) by Monte Carlo in the absence of data. Given the pedigree and a genetic map, ibddrop simulates meioses indicators and scores them to estimate the ibd probabilities among a set of gametes.

The simplest example of estimation of ibd probabilities among a set of gametes is the computation of an individual's inbreeding coefficient. In this example, the set of gametes in question are the maternal and paternal gametes that make up the individual. A set of two gametes can be either ibd or not-ibd. To keep track of ibd status among the gametes, we can label the paternal allele `1'. If the two alleles are ibd, the maternal allele would also be labeled `1', and the resulting ibd pattern would be `1 1'. If the two alleles are not ibd, the maternal allele would be labeled `2' and the resulting pattern would be `2 2'. The individual's inbreeding coefficient is the probability that the two alleles follow the `1 1' pattern.

If there are three gametes in the set, there are five potential ibd patterns: `1 1 1' (all three gametes are ibd), `1 1 2' (the first two are ibd and the third is not), `1 2 1' (the first and third are ibd) , `1 2 2' (the last two are ibd), and `1 2 3' (none are ibd). ibddrop can estimate probabilities of ibd patterns among up to 10 gametes in a set. ibddrop outputs a probability for each ibd pattern at each marker.

Gene identity can be scored either for each locus separately, in which patterns of identity among up to ten haplotypes can be scored, or it can be scored jointly over a moving window of several loci. If the moving window option is selected, genedrop calculates the probability that all of the gametes in the set are ibd at each locus in the window. As a result, it is then possible to determine the probability that all or some of the gametes are ibd for a particular haplotype.


7.2 Sample ibddrop parameter file

Files for ibddrop may be found in the `IBD' subdirectory of `MORGAN_Examples'. There are two sample parameter files for ibddrop; let's take a look at `jv_rep_ibd.par' first.

 
input pedigree file "jv_rep.ped"

simulate markers 5  trait 1

map          markers   distances 44.6 44.6 11.2 11.2
map trait 1  marker 2  distances 22.3

set component 1  proband gametes 331 0 333 1
set component 2  proband gametes 541 0 541 1 341 0 343 1

input seed file '../sampler.seed'

set MC iterations 20000

The parameter file specifies the pedigree file name `jv_rep.ped' and then asks for five markers and one trait locus. Since there are no data, the distinction between marker and trait doesn't mean anything - it is just a way to specify a set of loci, one of which may be unlinked. `jv_rep.ped' contains data on 30 individuals, including gender and one trait.

The two `map' statements specify the genetic map. From the first statement, the genetic distances between the markers are 44.6, 44.6, 11.2 and 11.2 centiMorgans. From the second statement, the trait lies between markers 2 and 3, at 22.3 centiMorgans with marker 2.

The `set proband gametes' statements tell ibddrop which gametes to score. Here we selected, from component 1 (the first family in the data set), the maternal (0) gamete of `331' and the paternal (1) gamete of `333'. The next statement selected four gametes to score from family 2. Note that characters are allowed in the names of individuals.

The `input seed file' statement enables the file to use the seeds from file `sampler.seed'. When the program finishes running, new seeds will be generated and appended at the end of this file. Seeds can also be set using the `set sampler seeds' statement (see ibddrop statements).

The number of Monte Carlo iterations is set to be 20,000 by the `set MC iterations' statement.

Note that if one would like to compute multilocus ibd probability, statement `set locus window' can be used to specify number of loci to score simultaneously.


7.3 Running ibddrop example and sample output

Under the subdirectory `IBD/', run the example with the command:

 
ibddrop jv_rep_ibd.par

The genetic map specified by the statements `map markers distances' and `map trait 1 marker 2 distances' is below. Note the position of the trait locus (T1) with respect to the marker loci.

 
 Distances (cM):

              T1                       
 --------------+---------------------  
   44.6   22.3   22.3   11.2   11.2    
 +------+-------------+------+------+  
M1     M2            M3     M4     M5  

Since the parameter file contains two `set proband gametes' statements, ibddrop will produce two sets of results. The results for the second component are detailed below.

 
 Summary for component 2:

    Probabilities of IBD patterns

       Proband gamete set 1:  541 0  541 1  341 0  343 1

       pattern marker-1 marker-2  trait-1 marker-3 marker-4 marker-5    label

       1 1 1 1    .0322    .0312    .0289    .0301    .0294    .0306        0
       1 1 1 2    .0291    .0294    .0295    .0284    .0299    .0295        1
       1 1 2 1    .0129    .0143    .0147    .0140    .0142    .0134        3
       1 1 2 2    .0103    .0098    .0097    .0097    .0089    .0092        4
       1 1 2 3    .0273    .0280    .0263    .0269    .0276    .0267        5
       1 2 1 1    .0654    .0638    .0634    .0652    .0633    .0649        6
       1 2 1 2    .0060    .0063    .0057    .0062    .0062    .0058        7
       1 2 1 3    .0571    .0576    .0581    .0601    .0602    .0584        8
       1 2 2 1    .0692    .0680    .0661    .0685    .0703    .0728        9
       1 2 2 2    .0483    .0527    .0488    .0493    .0486    .0485       10
       1 2 2 3    .1389    .1405    .1386    .1384    .1371    .1326       11
       1 2 3 1    .1348    .1366    .1354    .1379    .1369    .1391       12
       1 2 3 2    .0267    .0250    .0259    .0266    .0267    .0285       13
       1 2 3 3    .0949    .0945    .0980    .0949    .0968    .0975       14
       1 2 3 4    .2469    .2424    .2508    .2437    .2442    .2427       15

The probabilities are summarized by the ibd pattern. Each integer in the pattern represents one of the gametes that ibddrop was asked to score. Same numbers indicate gametes that are ibd. For instance, `1 1 1 1' means all four gametes are ibd; `1 2 1 1' means gametes 1, 3, and 4 are ibd, while gamete 2 is not ibd with the others; `1 2 3 4' means all four gametes are not ibd.

The ibd patterns are scored for each locus separately; there is a column for each of the five markers and one for the trait.

To compute multilocus ibd probabilities, say, 3 loci, use `set locus window 3' in the parameter file and re-run the example using the same command line. And the interesting part of the output is

 
 Summary for component 2:

    Probabilities of IBD patterns for windows of 3 loci

       Proband gamete set 1:  541 0  541 1

         IBD  wndw 1 wndw 2 wndw 3 wndw 4

       0 0 0   .7291  .7443  .7657  .7881
       0 0 1   .0698  .0655  .0482  .0478
       0 1 0   .0640  .0532  .0365  .0266
       0 1 1   .0279  .0252  .0369  .0284
       1 0 0   .0806  .0696  .0703  .0493
       1 0 1   .0087  .0080  .0067  .0049
       1 1 0   .0135  .0238  .0177  .0268
       1 1 1   .0063  .0105  .0180  .0281

This time, ibddrop was asked to compute ibd probabilities in windows of three loci at a time. This was done using the `set locus window' statement. Since the trait locus is unlinked to the marker loci in this example, it is placed to the left of the five marker loci on the map. Thus, the first window, `wndw 1' in the table above, includes the trait locus and the first two marker loci, `wndw 2' includes the first three marker loci, `wndw 3' includes marker loci 2, 3 and 4, etc. The values in the `ibd' column at the left of the table, represent `ibd' patterns. The pattern `0 0 0' means that the selected gametes are not ibd at the three loci in each window. The pattern `0 0 1' means that the selected gametes are not ibd at the first two loci in the window, but are ibd at the third. The values in the columns give the probability of the ibd pattern at the left for each of the four windows. For example, the probability that the maternal and paternal gametes of individual 541 are ibd at marker loci 3 and 5, but not at marker locus 4 is 0.0049.


7.4 ibddrop statements

set [component M] proband gametes N1 K1  N2 K2...

In this statement, the user specifies which gametes ibddrop is to score. Each statement must contain gametes from a single component, as the components are assumed to be independent, i.e., the probability of ibd between gametes from different components is zero. Pairs consisting of an individual's name and a meiosis indicator are listed, with `0' indicating the individual's maternal gamete and `1' indicating their paternal gamete.

In the current version of MORGAN, the number of proband gametes in a set is limited to 10.

set [chromosome I] locus window K

This statement gives the window size (number of loci) for which the multilocus ibd probabilities are scored. If no size is given, each locus is scored separately.

set sampler seeds H1 H2

This statement initializes a pair of seeds for the random number generator. The seeds must be positive and no greater than `0xFFFFFFFF', with the first seed (congruential seed) odd, and the second seed (Tausworthe seed) nonzero. If no seeds are specified, default seeds are used.

set MC iterations I

Required. This statement specifies the total number of Monte Carlo iterations.

simulate markers K1 trait 1

This statement specifies the number of markers to be simulated, as well as a linked trait. A linked or unlinked trait must be specified.

simulate unlinked trait 1

This statement specifies a trait to be simulated that is not linked to markers. Only one trait can be simulated and this trait will be placed to the left of all markers.


8. Using MCMC to Estimate Parameters of Interest in Pedigree Data


8.1 Genetic model

For MORGAN programs, genetic relationships between individuals in a data set are specified in the pedigree file. Individuals at the top of a pedigree (family), whose parents are unspecified, are the founders of the pedigree; other individuals are non-founders. By definition, founders are unrelated to one another. Descent through the pedigree of genes at marker and trait loci is tracked by meiosis indicators, also called inheritance indicators. At each locus, non-founders are assigned two 0/1 meiosis indicators, representing genes inherited from the individual's father and mother. The first indicator is coded as `0' if the non-founder inherited the gene carried by her father's mother and `1' if she inherited the gene carried by her father's father, i.e., her paternal grandmother and grandfather, respectively. The second indicator is coded as `0' if the non-founder inherited the gene carried by her mother's mother and `1' if she inherited the gene carried by her mother's father, i.e., her maternal grandmother and grandfather, respectively. We use the term gene to refer to a segment of DNA that is copied from parents to offspring, the concept captured by Mendel's term factor.

The set of all inheritance indicators is notated as S = $\{S_{i,j}\}$ , where

    $S_{i,j}$  =

0 if DNA involved in meiosis i at locus j is the gamete's parent's maternal DNA

1 if DNA involved in meiosis i at locus j is the gamete's parent's paternal DNA

$S_{\bullet j}$ is referred to as the inheritance vector at locus j. It is the list of the inheritance indicators for all meioses in the sample at a single locus (locus j). The elements of $S_{\bullet j}$ are independent of one another, as they represent the grandmaternal or grandpaternal inheritance for each gamete at the locus and meiosis events are independent of one another. $S_{i
\bullet}$ is the list of the inheritance indicators at all loci for a single meiosis (i.e., within a single gamete). The elements of $S_{i
\bullet}$ have first-order Markov dependence. That is, the probability of a `0' or `1' at locus j + 1 is a function of the the value at locus j and the recombination fraction between the loci. In other words, the value at locus j + 1 is conditionally independent of all other loci, given the value at locus j.

If inheritance indicators are known, identity by descent (ibd) is also known. If probabilities can be assigned to patterns of inheritance indicators in a pedigree, the probability that any two gametes in the pedigree are ibd can be computed.


8.2 LM-sampler

MORGAN's Autozyg and Lodscore programs use MCMC to estimate ibd probabilities and multilocus LOD scores, respectively, in pedigrees. The latent (unobserved) parameters of interest in MCMC estimation of ibd probabilities and LOD scores are the meiosis indicators at marker and/or trait loci for each non-founder in the pedigree. Observed data are trait values and unphased marker genotypes for some or all pedigree members. With unphased genotypes, it may or may not be possible to determine the grandparental source (i.e., the inheritance indicator) of each allele unambiguously. MORGAN uses MCMC to sample meiosis indicators (S) conditional on observed data (Y).

MORGAN implements two different block Gibbs samplers, a locus- and a meiosis-sampler, for sampling from S conditional on Y. Each method updates a subset, $S_u$ , of S conditional on Y and on the rest of S ($S_f$ ). The difference between the two methods is the choice of $S_u$ .

The locus-sampler (or L-sampler) chooses $S_u$ to be $S_{\bullet j}$ for some j. In other words, a single locus is selected and inheritance indicators at that locus are updated based on the genotype data at all loci and on the current realization of inheritance indicators at all loci other than j. The MORGAN user can determine whether a locus is to be selected at random each time or if loci are taken in a pre-determined random order, as described in the next section. This method is a modification of the Elston-Stewart algorithm (Elston RC and Stewart J (1971) Human Heredity 21:523-542) and it can be used whenever single locus pedigree peeling is possible. If inter-locus recombination fractions are strictly positive, the L-sampler is irreducible. On the downside, mixing is poor if loci are tightly linked.

The meiosis-sampler (or M-sampler) chooses $S_u$ to be $S_{i
\bullet}$ for some i. It is , in a sense, perpendicular to the L-sampler in that at each iteration a single meiosis is selected and inheritance indicators for that meiosis are updated conditional on the genotype data at all loci and the current realization of inheritance indicators for all other meioses. The M-sampler is a modification of the Lander-Green algorithm (Lander ES and Green P (1987) PNAS 84:2363-2367) for peeling along a chromosome using the Baum algorithm (Baum LE (1972) in O. Shea, ed., `Inequalities-III; Proceedings of the Third Symposium on Inequalities, UCLA, 1969', Academic Press, NY pp. 1-8). At each iteration, a single meiosis is randomly selected or meioses can be updated sequentially. As with locus selection in the L-sampler, MORGAN allows the user to choose the meiosis selection The M-sampler mixes well in the presence of tightly linked loci, but it can perform poorly in large pedigrees with missing data.

MORGAN's Autozyg and Lodscore programs use a combination of the L- and M-samplers, referred to as the LM-sampler. The user may choose the fraction of updates that are of each type; the default is 20% L-sampler, 80% M-sampler.

For mathematical details on the L-, M- and LM-samplers, see Thompson EA (2000) Statistical Inference from Genetic Data on Pedigrees, in `NSF-CBMS Regional Conference Series in Probability and Statistics, Volume 6. Institute of Mathematical Sciences, Beachwood OH and American Statistical Association, Alexandria VA.


8.3 MCMC parameters and options

MORGAN can obtain a starting configuration for S in one of two ways. The default method is by sequential imputation. The alternative is to contruct an L-sampler realization independently for each locus, conditional on the genotype data at that locus only (the locus-by-locus option). Sequential imputation tends to produce initial configurations that have higher conditional probabilities, but locus-by-locus sampling can sometimes reveal other modes in the complex space of S values. The MORGAN user can select the L-sampler setup method by including the `use locus-by-locus for setup' statement. If sequential imputation is selected, the user can specify the number of sequential imputation samples from which the starting configuration of meiosis indicators is to be selected, using the `use I sequential imputation realizations for setup' statement. The default is 10% of the total MC iterations.

At each MCMC iteration, MORGAN selects a locus (with L-sampler) or meiosis (with M-sampler) to update. Two different selection methods are available: sample by step and sample by scan. If `sample by scan' is chosen, all loci or meioses are updated one-at-a-time in a predetermined random order. This option is the default. If `sample by step' is chosen, a single locus or meiosis is randomly selected for updating at each iteration. The sampling method selected applies to the entire MCMC run, including burn-in, pseudo-prior computation and main iterations.

When running a MORGAN MCMC program, the user must specify the desired number of several types of iterations. For all programs, some number of initial burn-in iterations must be performed. These realizations are discarded and, if the burn-in period is sufficiently long, subsequent points will be dependent samples from the desired stationary distibution. The `set burn-in iterations' statement is used to specify the number of desired burn-in iterations, with the default value varying by program. The desired number of "main" iterations must be specified using the `set MC iterations' statement; there is no default number of main iterations. Recommended number of iterations is on the order of $10^5$ . lm_bayes performs a third type of iteration to calculate pseudo-priors. Alternatively, pseudo-priors can be read from an input file. They encourage the MC sampler to visit test positions of low conditional probability. The number of iterations for calculation of pseudo-priors is set using the `set pseudo-prior iterations' statement, or the default value of 50% of the number of main iterations can be used.

Specific Autozyg and Lodscore programs have additional parameters and options that are described in the relevant sections of the next two chapters of the tutorial.


9. Estimating Conditional ibd Probabilities by MCMC


9.1 Introduction to lm_auto and lm_pval

The MORGAN programs lm_auto and lm_pval are referred to as "Autozyg" programs, as they estimate autozygosity, or identity by descent (ibd). The Autozyg programs use MCMC to perform multipoint linkage analysis on large pedigrees where many individuals may be unobserved and exact computation is infeasible. The data are the genotypes at marker loci of observed individuals in pedigrees and affectation status (affected / unaffected / unknown) for the trait of interest. lm_auto and lm_pval estimate conditional probabilities of gene ibd states, given the trait and marker data.

lm_auto uses the LM-sampler to realize ibd configurations from their conditional distribution given the marker data. Given marker data, it estimates conditional probabilities of genome sharing patterns (gene ibd) among specified haplotypes, usually from affected individuals. The marker data are used jointly in the sampling, but the resulting ibd is scored marginally at each marker locus.

lm_pval also uses LM-sampling to provide the conditional distribution of an ibd measure given marker data. In principle it can be used to provide Monte Carlo estimates of any NPL (Non-Parametric Linkage) statistics for detecting linkage. Trait information provided to the program consists of the list of affected members of the pedigree, provided either as a list of names in the parameter file or as the phenotypic status in the pedigree file.

With lm_pval, marker data are assumed available on some pedigree members, at some of the marker loci. The distribution of the ibd measure conditional on marker data is compared to the unconditional distribution under the null hypothesis of no linkage to produce quantiles of a fuzzy p-value distribution. A fuzzy p-value distribution corrected for multiple testing is also produced, by scoring the maximum of the ibd measure over loci.


9.2 Sample lm_auto parameter file

Files for lm_auto may be found in the `IBD' subdirectory of `MORGAN_Examples'. Below is an annotated parameter file, `jv_rep_auto.par', for use with lm_auto:

 
input pedigree file 'jv_rep.ped'
input seed file '../sampler.seed'

select all markers  traits 1

map gender F  markers            distances 25.5 25.5 25.5 25.5 
map gender M  markers            distances 11.2 45.8 11.2 45.8 
map gender F  trait 1  marker 2  distances 12.8
map gender M  trait 1  marker 2  distances 5.8

set markers 1 2 3 4    freqs  .2 .2 .4 .1 .06 .04
set markers 5          freqs  .3 .2 .3 .1
set trait   1          freqs  .95 .05

set marker data 5
  333   1 3  1 3  1 3  1 3  1 3 
  331   3 4  3 4  3 4  3 4  3 4 
  334   2 3  2 3  2 3  2 3  2 3 
  431   3 4  3 4  3 4  3 4  3 4 
  531   3 3  3 3  3 3  3 3  3 3 

  343   1 3  1 3  1 3  1 3  1 3 
  341   3 5  3 5  3 5  0 0  3 3 
  344   4 6  4 6  4 6  2 4  2 4 
  441   3 4  3 4  0 0  3 4  3 4 
  541   3 3  3 3  3 3  3 3  3 3 

set window patterns 0 4
set locus window 3

set component 1 proband gametes  531 1  531 0  331 0  333 1
set component 2 proband gametes  541 1  541 0 

set L-sampler probability 0.2
set MC iterations 2000

The pedigree file specified here, `jv_rep.ped', is a 30-member, two-component pedigree in which the final individuals (named 531 and 541) have trait value `4'. All other individuals in the file have trait value `0'. Because the trait type is not specified in the parameter file via a `set trait data' statement, the trait type is assumed to be genotypic. This means that the trait locus genotype can be inferred from the trait value, i.e., there are three distinct trait values, each corresponding to a distinct genotype at the trait locus. The trait values are specified in the parameter file and are coded as `1', `3', `4' or `0', corresponding to trait locus genotypes of `1 1', `1 2' (or `2 1'), `2 2' or `missing', respectively. Unless otherwise specified, the trait value is listed after names and gender in the pedigree file.

The `map' statements specify the marker map and trait position in terms of genetic distances (centiMorgan). In this example there are five markers with sex-specific maps. The trait locus is between markers 2 and 3, with genetic distances given in terms of the marker to its left, which is marker 2. See section genedrop mapping model parameters. The first four markers each have six alleles (labeled 1-6) with frequencies 0.2, 0.2, 0.4, 0.1, 0.06 and 0.04. The fifth marker has four alleles with frequencies 0.3, 0.2, 0.3 and 0.1. The trait locus has two alleles; alleles `1' and `2' have frequencies 0.95 and 0.05, respectively. The `select' statement is analogous to genedrop's `simulate' statement (see section genedrop computing requests).

Following the `set marker data' statement are genotype data for typed individuals. The number of markers, 5, is given in the `set marker data' statement. Alternatively, genotype data could be provided to lm_auto in a separate file, specified using an `input marker data file' statement.

`set proband gametes' is the key statement for lm_auto. With it, the user specifies which haplotypes are to be scored with ibd probabilities. The statement requires a list of individual IDs and an indicator of paternal (1) or maternal (0) haplotype. For example, `531 1' refers to the paternal (1) haplotype of individual 531. In the example, the statement requests scoring both haplotypes of 531, the maternal(0) haplotype of 331 and the paternal (1) haplotype of 333. See section ibddrop statements, for more discussion of the `set proband gametes' statement.

The `set window patterns' and `set locus window' statements go together. They instruct lm_auto to compute probabilities that the gametes named in the `set proband gametes' statement have a particular ibd pattern (also called state) across several loci. In this case, we are interested in patterns `0' and/or `4' across windows of three markers at a time. Recall the output of the ibddrop program generated when using the parameter file, `ibd.par'. In the section of the program output headed "Probabilities of IBD patterns", each of the ibd patterns listed in the left-most column is associated with a label in the right-most column. The `set window patterns' statement expects one or more of these labels, instructing ibddrop to calculate probabilites of the associated pattern(s). The `set locus window' statement specifies the number of loci to be examined simultaneously, in this case 3. In the example here, window patterns `0' and `4' are requested, which correspond to ibd patterns `1 1 1 1' and `1 1 2 2', respectively. In other words, we are interested in the probability that all four of the gametes named in the `set proband gametes' statement are ibd across 3-locus windows or that the first two gametes (maternal and paternal haplotypes of individual 531) are ibd and the second two gametes (maternal haplotype of individual 331 and paternal haplotype of individual 333) are ibd, but the pairs are not ibd with one another.

As with all of MORGAN's MCMC-based programs, the user can specify the desired number of MC iterations using the `set MC iterations' statement, the desired number of burn-in iterations using `set burn-in iterations', and the probability that the L-sampler is selected instead of the M-sampler using `set L-sampler probability'. In this example, 2000 sampling iterations are to be performed, using the L-sampler 20 percent of the time. These iterations are preceded by burn-in iterations. Because the number of burn-in iterations are not specified, lm_auto will use the default value of 10 percent of the number of main iterations. In practice, one would run the MCMC sampler much longer than 2000 iterations. This is just for demonstration.


9.3 Running lm_auto example and sample output

Under the subdirectory `IBD/', run the example by typing:

 
lm_auto jv_rep_auto.par > auto.out

Below are sections of the output file `auto.out', generated by running lm_auto using the parameter file `jv_rep_auto.par'. The first table gives probabilities of gene ibd patterns for each marker and the trait locus (in the map order).

 
    Probabilities of IBD patterns

       Proband gamete set 1:  531 1  531 0  331 0  333 1

       pattern marker-1 marker-2 trt-geno marker-3 marker-4 marker-5    label

       1 1 1 1    .1985    .2960    .3480    .2920    .2430    .1840        0
       1 1 1 2    .1400    .2280    .2690    .2205    .1720    .1145        1
       1 1 2 1    .1265    .1495    .2050    .1225    .0975    .0985        3
       1 1 2 2    .0145    .0165    .0200    .0145    .0125    .0070        4
       1 1 2 3    .0340    .0350    .0515    .0255    .0220    .0130        5
       1 2 1 1    .0235    .0150    .0030    .0095    .0185    .0275        6
       1 2 1 2    .0755    .0425    .0110    .0640    .0705    .0775        7
       1 2 1 3    .0595    .0330    .0085    .0310    .0515    .0570        8
       1 2 2 1    .0380    .0260    .0030    .0275    .0300    .0435        9
       1 2 2 2    .1035    .0605    .0255    .0860    .1175    .1530       10
       1 2 2 3    .0540    .0370    .0150    .0385    .0490    .0580       11
       1 2 3 1    .0200    .0080    .0035    .0060    .0140    .0210       12
       1 2 3 2    .0495    .0255    .0155    .0385    .0645    .0700       13
       1 2 3 3    .0100    .0045    .0060    .0030    .0095    .0170       14
       1 2 3 4    .0530    .0230    .0155    .0210    .0280    .0585       15

    Probabilities of IBD for pattern set for windows of 3 loci

       Proband gamete set 1

       Pattern set:   0  4 

         IBD  wndw 1 wndw 2 wndw 3 wndw 4

       0 0 0   .5325  .4900  .4740  .5385
       0 0 1   .0975  .0870  .0625  .0685
       0 1 0   .0295  .0440  .0400  .0550
       0 1 1   .1275  .0665  .0555  .0315
       1 0 0   .0445  .0465  .1330  .1125
       1 0 1   .0130  .0085  .0240  .0250
       1 1 0   .0255  .1130  .0975  .1030
       1 1 1   .1300  .1445  .1135  .0660

Interpretation of these results is similar to that of ibddrop See section Running ibddrop example and sample output. Briefly, the probabilities are summarized by ibd pattern. A pattern is a series of integers, one representing each gamete listed in the `set proband gametes' statement. The order of gametes in the output file patterns is the same as the order in which the gametes were listed in `set proband gametes'. Numbers that are the same indicate gametes that are ibd. For instance, in the first row of the table above, the pattern is `1 1 1 1', which means that the values in row one represent probabilities that all four gametes are ibd at each marker locus and at the trait locus. Likewise, `1 2 1 1' means gametes 1, 3, and 4 are ibd while gamete 2 is not ibd with the others; `1 2 3 4' means all four gametes are not ibd.

The second table in the above output is due to the window size and ibd pattern statements in the parameter file. Its interpretation is similar to the output of ibddrop when statement `set locus window' was used, See section Running ibddrop example and sample output. In ibddrop, the values in the `IBD' column of the output indicate whether the gametes specified in the `set proband gametes' statement are all ibd (indicated by a `1') or if at least one of the gametes is not ibd with the others (indicated by a `0'). With lm_auto, the user can specify additional ibd patterns of interest. In this example, the parameter file `jv_rep_auto.par' includes the statement `set window patterns 0 4', indicating that we are interested in ibd patterns `0' and `4', corresponding to `1 1 1 1' and `1 1 2 2', respectively, as discussed in the previous section. That is, we would like to know the probability that either all four gametes are ibd or that the first two are ibd and the second two are ibd, but the pairs are not ibd with one another for each window of three loci. Consequently, interpretation of the `IBD' column of the lm_auto output is as follows. The row headed by `0 0 0' gives probabilities that the gametes do not follow either of the two ibd patterns of interest at all three loci for each window. The row headed by `0 0 1' gives probabilities that the gametes do not follow either of the two ibd patterns of interest at the first two loci in the window, but at the third loci either all four gametes are ibd or the first two are ibd and the last two are ibd, but the pairs are not ibd with one another.

In this section of the lm_auto output, the order of the marker and trait loci is the same as above. In this example, the trait locus was between markers 2 and 3. Therefore, the windows are as below:

window

loci

wndw 1

marker 1, marker 2, trait

wndw 2

marker 2, trait, marker 3

wndw 3

trait, marker 3, marker 4

wndw 4

marker 3, marker 4, marker 5


9.4 Sample lm_pval parameter file

Files for lm_pval may be found in the `Nonparametric' subdirectory of `MORGAN_Examples'. The parameter file, `ped73_pval.par' is similar to the parameter file used for lm_auto.

 
input pedigree file '../ped73_ph.ped'
input pedigree record trait 1 integer 3
set component 1 affected individuals 410 511 512
set component 2 affected individuals 1301 1302 1303 1305

input seed file '../sampler.seed'

input marker data file '../ped73.marker.missing'
select all markers

set L-sampler probability 0.2

## recommended number of iterations is on the order of 10^5
set MC iterations 2000

For lm_pval, markers but no trait locus are selected. Therefore, we do not need `map trait marker' statements and we do not include the trait locus in the `select' statement. The marker map and genotypes are in a separate file `ped73.marker.missing', included using the statement `input marker data file'. lm_pval requires identification of pedigree members affected with the disease; this is done using the `set affected individuals' statement.


9.5 Running lm_pval example and sample output

Under the subdirectory `Nonparametric/', run the lm_pval example by typing:

 
lm_pval ped73_pval.par > pval.out

A portion of the output giving fuzzy p-values is below. See `pval.out' for the entire output file.

 
 Combined distribution of fuzzy p-values, by locus:
pval maxim  marker-1  marker-2  marker-3  marker-4  marker-5  marker-6  marker-7
            marker-8  marker-9 marker-10
0.00 0.000     0.000     0.000     0.000     0.000     0.000     0.001     0.001
               0.000     0.000     0.000
0.01 0.004     0.000     0.000     0.000     0.004     0.000     0.009     0.011
               0.000     0.000     0.000
0.02 0.008     0.000     0.000     0.000     0.009     0.005     0.020     0.024
               0.005     0.005     0.005
0.03 0.011     0.000     0.000     0.000     0.016     0.019     0.033     0.036
               0.019     0.019     0.019
0.04 0.015     0.000     0.000     0.000     0.023     0.032     0.045     0.049
               0.032     0.032     0.032
0.05 0.019     0.000     0.000     0.000     0.029     0.046     0.058     0.062
               0.046     0.046     0.046

Interpretation of the output is as follows. For a certain p value, say, `0.05', the columns labelled `marker-1' through `marker-10' give the probability of obtaining a p-value less than this p value 0.05 over the sampling run at each marker, i.e., the power. The column labelled `maxim' gives the maximum over all loci. Because the trait used in simulating the marker data for this example does not have a strong signal, each marker has very little power (ranging from 0.000 to 0.062) and the multilocus power is only 1.9%.


9.6 Autozyg statements

Many of the lm_auto and lm_pval statements following are also used for the location LOD scores programs.


9.6.1 Autozyg computing requests

select [chromosome I] all markers [traits 1]

This statement selects all markers on the chromosome for the computation; if not all markers are to be used, use the next statement. If the trait for lm_auto is linked, `traits 1' is included in the statement. Omit `traits 1' for lm_pval.

select [chromosome I] markers J1 J2... [traits 1]

This alternate form of the `select markers' statement specifies a subset of the markers. As in the previous statement, specify a linked trait for lm_auto, but not for lm_pval here.

select unlinked trait K

Use this statement for lm_auto if the trait is unlinked. K is the trait number.


9.6.2 Autozyg file identification statements

All of the general MORGAN file identification statements can be used with the Autozyg programs. For a list of these statements, see section File identification statements. Some additional file identification statements are specific to Autozyg.

output scores file filename

This statement, optional for lm_auto, is used to save interim scores.

input rescue file filename

A rescue file may be used to continue an lm_auto run instead of restarting at the beginning. This file contains intermediate data, which is periodically saved when an output rescue file has been specified in a preceding run.

output rescue file filename

This statement, which is optional for lm_auto, specifies the periodic dumping of intermediate results to files that may be used to restart the program midstream. Data are written alternately to files with "1" and "2" appended to the file name.

input seed file filename

This statement is optional for both lm_auto and lm_pval. It specifies random number seeds for L- and M- samplers.


9.6.3 Autozyg pedigree file description

Both Autozyg programs use the general MORGAN pedigree file description statements; see section Pedigree file description statements.

description statements; see section Pedigree file description statements.


9.6.4 Autozyg output file description

Two output file description statements are optional for lm_auto.

output rescue data I iterations

This statement can be used to specify the frequency of dumping program data if an output rescue file is specified.

output scores every I MC iterations

If an output scores file is specified, but this statement is not present, lm_auto writes the scores file at the conclusion of the last iteration only (see `set MC iterations' statement).


9.6.5 Autozyg mapping model parameters


9.6.6 Autozyg population model parameters

set [chromosome I] marker names N1 N2...

This statement, which is optional for both lm_auto and lm_pval, specifies the names of the markers in the order of their position in the marker data file, for example, `set marker names D1S306 D1S249 D1S245'.


9.6.7 Autozyg computational parameters

set [chromosome I] markers K data N1 M11 M12 ... [N2 M21 M22 ...] ...

Individuals with at least one observed marker are named, together with their marker genotypes. The number of allele pairs for an individual is the same as the number of markers mapped on the chromosome. Marker loci not observed for an individual are given alleles `0 0'. (Those individuals with no observed markers need not be included in this statement.)

In the example, there are 5 markers mapped for the chromosome:

 
set markers 5 data   343   1 3  1 3  1 3  1 3  1 3
                     331   3 4  3 4  3 4  3 4  3 4
                     334   2 3  2 3  2 3  2 3  2 3
                     431   3 4  3 4  3 4  3 4  3 4
                     531   3 4  3 3  0 0  3 3  3 3
set [component M] [scoreset N] proband gametes N1 K1 N2 K2...

This statement is required for lm_auto. One or more scoring sets may be given for each pedigree component, where a scoring set consists of two or more haplotypes. If there is more than one set for the component, each set is assigned a number 1 or greater. The maximum number of haplotypes in each set is limited to 10, due to computer memory considerations.

Pairs of names and meiosis indicators are given, with 0 indicating maternal inheritance and 1 indicating paternal inheritance. In the example, there are two sets for the component:

 
set component 1 scoreset 2 proband gametes 531 1 531 0 331 0 331 1
set component 1 scoreset 4 proband gametes 561 1 362 0 364 1

At least one proband gamete set must be specified when running lm_auto.

set [chromosome I] locus window K

This statement is optional for lm_auto and gives the window size (number of loci) for which the multi-locus ibd probabilities are scored. If no size is given, each locus is scored separately.

set [component M] [scoreset N] window patterns L1...

This statement is a companion to `set locus window' and is required for lm_auto when the window option is chosen. It identifies the ibd patterns to be jointly scored for the proband gamete set N assigned by the `set proband gametes' statement. A prior run, with the same proband gametes, but without the window option is needed to select the ibd patterns. That is, the user is required to list ibd patterns of interest by label; the labeling of the patterns is not obvious without first running lm_auto. In the example, we were interested in ibd patterns `1 1 1 1' and `1 1 2 2', which are assigned labels `0' and `4', respectively, in the output table headed `Probabilities of IBD patterns'. One needs to run lm_auto to obtain these labels.

set trait data (genotypic | discrete | quantitative)

Trait data are specified as genotypic, discrete (phenotypic), or quantitative (continuous). With a genotypic trait, the trait locus genotype can be inferred from the trait value. There are four possible trait values: `0' = missing, `1' = homozygous for allele 1, `3' = heterozygous, and `4' = homozygous for allele 2. There are three possible trait values with a discrete (or phenotypic trait): `0' = missing, `1' = unaffected, and `2' = affected. If a discrete trait is chosen, the next statement, `set incomplete penetrances', must be included. With a quantitative trait, a missing value is denoted as a real number with integer portion `999'. For example, `999', `999.3' and `999.543' all mean `missing'. The default trait type is genotypic.

set incomplete penetrances X1 X2 X3

This statement is required for discrete trait data. Penetrances (probability of expressing the trait) are provided for the (1 1), the (1 2), and the (2 2) genotypes, respectively.

set [component M] affected individuals N1 N2 ...

lm_pval needs to know which members of the pedigree are affected with the disease: there are two ways to specify this. Affected individuals (for at least one component) may be named in this statement. Or, use the following statement if the affected individuals are to be determined from the pedigree data file.

set affected individuals trait K

Here, lm_pval is to determine the affected individuals from the trait data in the pedigree file. A trait genotypic code of 3 (genotype (1 2) or (2 1)) or 4 (genotype (2 2)) indicates an affected individual. The trait number, K, determines the position of this genotypic code in the pedigree records (see section Pedigree file description statements).


9.6.8 Autozyg MCMC parameters and options

These statements which set the parameters for the MCMC algorithms, apply to both Autozyg programs, and to the location LOD score programs, lm_lods, lm_markers, lm_bayes and lm_schnell unless otherwise noted.

use (locus-by-locus sampling | sequential imputation) for setup

There are two setup methods available to find a starting configuration for the meiosis indicators prior to the MCMC: using sequential imputation (with the trait treated as unlinked), or using locus-by-locus sampling (by assuming all markers and trait are unlinked). Sequential imputation is the default method.

use I sequential imputation realizations for setup

When sequential imputation is selected above, this statement specifies the number of sequential imputation samples from which the starting configuration of meiosis indicators is to be selected. The default is 20 iterations.

set MC iterations I

Required. It specifies the total number of main L- and M-sampling iterations. There is no default number of MCMC iterations; the total number of "main" L- and M- sampling iterations must be specified for all Autozyg programs. The total MCMC run length is the sum of the number of burn-in iterations specified by the `set burn-in iterations' statement and the number of main iterations specified in `set MC iterations'.

set burn-in iterations I

Burn-in iterations are performed initially, with the trait locus (if any) unlinked to the marker map. The default number of burn-in iterations is specific to each program.

sample by (scan | step)

By default (sample by scan), all loci (L-sampler) or all meioses (M-sampler) are updated successively in an order determined by random permutation. When sampling by step, a single locus (L-sampler) or single meiosis (M-sampler) is randomly selected for updating. lm_bayes presently samples by scan only.

set L-sampler probability X

The L-sampler probability, between 0.0 and 1.0, specifies the probability in each MCMC iteration, of locus-sampling rather than meiosis-sampling. The default is 0.0, that is, to use M-sampler only.

compute scores every I iterations

The default is to score recombinations, total log-probabilities or the Rao-Blackwellized estimator every MCMC iteration. This statement specifies the frequency with which to compute the contributions to the ibd scores or the location LOD scores.

check progress I MC iterations

Use this statement to monitor the progress of the program as it is running. It will print out the iteration number every Ith iteration.


10. Estimating ibd Based Test Statistics by MCMC


10.1 Introduction to lm_ibdtests

Program lm_ibdtests uses identity-by-descent (ibd) based and likelihood-ratio based statistics to construct linkage detection tests. The current version allows only discrete trait data (affected or unaffected or unknown phenotypic status).

The ibd scoring approach involves construction of an ibd measure (S) that is a function of the inheritance vectors and affectation status of the individuals in pedigrees. The program uses realizations of the inheritance vectors conditional only on the marker data (Y) to compute a Monte Carlo estimate of the test statistic E(S|Y). Four different ibd measures are implemented in the program. Two of these measures, Slambda and Saffunaff (developed by Saonli Basu), allows incorporation both of affected and of unaffected individuals in the analysis. The test statistic is used to test the null hypothesis of no linkage between the trait and a set of markers. For this approach, two different testing options have been implemented; one is a normality-based test and the other is a permutation test. The permutation test keeps the observed marker data unchanged and permutes the affectation status. In the normality-based test, test statistics (Spairs, for example) are computed for each realization and averaged over realizations. The program finally reports the p-values from each test at the marker loci.

A new (lambda,p) model has been implemented in lm_ibdtests. The (lambda,p) model models the trait-dependent segregation of inheritance vectors at a locus given the trait data on individuals and constructs a chi-square test for linkage detection. The (lambda,p) model incorporates both affected and unaffected individuals in the analysis. The delta model is also implemented in the program. The current version of lm_ibdtests only allows the ibd measure Spairs in the delta model set-up. The program returns the p-values of the likelihood-ratio statistics under each of these two models. (For a detailed description of the (lambda,p) and delta models, see Saonli Basu's PhD thesis; for a real data analysis using lm_ibdtests, see Sieh et al. 2005. Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data. BMC Genetics 2005, 6 Suppl 1 S11)


10.2 Sample lm_ibdtests parameter file

Parameter files for lm_ibdtests may be found in the `Nonparametric' subdirectory of `MORGAN-examples'.

 
input pedigree file '../ped73_ph.ped'
input pedigree record trait 1 integer 3
input marker data file  '../ped73.marker.missing'

output sampler seeds only
input seed file '../sampler.seed'
output overwrite seed file '../sampler.seed'

select all markers
set affected individuals trait 1

sample by scan
set L-sampler probability 0.5
set MC iterations 3000

compute ibd statistics
set ibd measures Spairs Srobdom
set ibd tests norm permu
set ibd permutations 999

compute scores every 100 iterations

10.3 Sample lm_ibdtests output

Under the subdirectory `Nonparametric/', run the example with the following command

 
lm_ibdtests ped73_ibdt_IBD.par

Output that has test statistics and p values is shown below:

 
 ************************************
 p Value for Permutation Test for IBD
 ************************************

            pos(Haldane cM)   Spairs  Srobdom
     locus     male  female  p-value  p-value

  marker-1    0.000   0.000   0.9020   0.9300
  marker-2   10.000  10.000   0.8780   0.8450
  marker-3   20.000  20.000   0.8130   0.7800
  marker-4   30.000  30.000   0.5080   0.5190
  marker-5   40.000  40.000   0.2550   0.2480
  marker-6   50.000  50.000   0.2950   0.2510
  marker-7   60.000  60.000   0.3850   0.5090
  marker-8   70.000  70.000   0.5100   0.6660
  marker-9   80.000  80.000   0.6610   0.7750
 marker-10   90.000  90.000   0.5640   0.7470

 *******************************
 p Value for Normal Test for IBD
 *******************************

            pos(Haldane cM)
    locus     male  female   Spairs p-value  Srobdom p-value

  marker-1    0.000   0.000  -0.7843  0.7951  -0.2867  0.6167
  marker-2   10.000  10.000  -0.9574  0.8166  -0.3841  0.6567
  marker-3   20.000  20.000  -1.1825  0.8816  -0.2260  0.5692
  marker-4   30.000  30.000  -0.6437  0.7381  -0.1272  0.5552
  marker-5   40.000  40.000   0.2478  0.4103   0.0986  0.4743
  marker-6   50.000  50.000  -0.2270  0.5752  -0.3275  0.6252
  marker-7   60.000  60.000  -0.1503  0.5612  -0.3514  0.6437
  marker-8   70.000  70.000  -0.3096  0.6372  -0.3587  0.6557
  marker-9   80.000  80.000  -0.4877  0.6902  -0.2706  0.6037
 marker-10   90.000  90.000  -0.2924  0.6222  -0.1136  0.5662

10.4 lm_ibdtests statements

compute ibd statistics

Required.

set ibd measures [Spairs] [Srobdom] [Saffect] [Slambda]

Optional. lm_ibdtests uses 1 to 4 measures to perform ibd tests for linage; these are specified in the order [Spairs] [Srobdom] [Saffect] [Slambda]. Spairs, Srobdom, and Slambda may be specified for both normal and permutation tests; Saffect may not currently be specified with the normal tests option.

set ibd tests [normal] [permutation]

Optional. Normal and/or permutation tests may be specified.

set ibd permutations I

Optional. Need to be specified when the permutation test is requested through `set ibd tests'. The default is 999. It is recommended that at least 50 permutations are used.


11. Estimating Location LOD Scores by MCMC


11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell

The programs lm_lods, lm_markers, lm_bayes and lm_schnell are referred to as "Lodscore" programs. The Lodscore programs use MCMC to perform multipoint linkage analysis and trait mapping on large pedigrees where many individuals may be unobserved and exact computation is infeasible. The data are the genotypes of observed individuals in the pedigree at marker loci and discrete or continuous trait data. As with exact methods of computing lod scores, the genetic model is assumed known. The only unknown parameter is the location of the trait locus. Therefore, the user is required to specify the marker locations, trait and marker allele frequencies and penetrance function. Presently, users are very limited in their choice of penetrance function, but this is currently under revision and will change in future releases of MORGAN.

lm_lods estimates location LOD scores for genotypic or discrete traits by working along the chromosome, estimating likelihood ratios between adjacent locations of the trait locus, starting from unlinked, and proceeding through the linkage group to unlinked again. We have three methods of combining these local likelihood ratios into an overall LOD score method. One reduces to an eigenvalue method used by Thompson (2000: sec 9.2, P.118). Other alternatives are simply to combine the ratios from the left, or from the right. Weighted combinations do a better job (William Stewart), but we do not pursue this here as better methods are available in lm_bayes and lm_markers.

lm_markers is an implementation of the Lange-Sobel estimator, using our LM-sampler. It is so-named because only the meiosis indicators at marker loci are sampled, and only conditional on the marker data. The Lange-Sobel estimate works reasonably well in reasonable time, provided a good MCMC sampler is used, and provided the trait data do not have strong impact on the conditional distribution of meiosis indicators. Recall that the method samples meiosis indicators conditionally only on the marker data. Because of this the method can produce quite accurate LOD scores in the absence of linkage, but can be inaccurate in estimating the strength of linkage signals. As well as producing the LOD score, our current method provides a batch-means pointwise estimate of the Monte Carlo standard error of the LOD-score estimate. lm_markers can work with genotypic, discrete or quantitative traits.

lm_bayes is an alternative method implemented for genotypic or discrete traits. The MCMC performance is better than for lm_markers, but it has other computational overheads. lm_bayes samples trait locations from a posterior distribution, and then divides it by the prior to produce the likelihood and hence the LOD score. Estimation is in two phases. A preliminary run with discrete uniform prior gives order-of-magnitude relative likelihoods. Then, using the inverse of these likelihoods as prior weights (to produce an approximately uniform posterior) a second run is made to estimate the likelihood. It is important that the initial run is long enough for all points to be sampled, and for the unlinked trait position to have a reasonable number of realizations. For locations at which LOD scores are very negative, or for the unlinked position when there is some location with strong positive LOD score this can be problematic.

Our current implementation of lm_bayes provides two LOD score estimates. The first is a crude estimate which counts realizations of locations sampled to estimate the posterior: as can be seen from the output this can be quite erratic. The Rao-Blackwellized estimator is much preferred, and produces good estimates in reasonable time.

lm_schnell uses MCMC realizations of segregation indicators, conditional on marker and quantitative trait data, to estimate local likelihood ratios between alternative hypothesized trait locations. It is based on the program SCHNELL (Single CHromosome Non-Exponential Linkage Likelihoods), originally written by Greg Snow. Because lm_schnell uses the same local-likelihood-ratio based method of lodscore estimation as lm_lods, it suffers from the same disadvantages, namely extensive MCMC requirements and frequent difficulty estimating local likelihood ratios across the positions of highly polymorphic markers. However, because lm_schnell models a quantitative, rather than qualitative trait, MCMC mixing performance should be better. Also, uniquely among our currently released programs, lm_schnell models a polygenic component in addition to the major trait locus. The sampling of this component is by single-site updating, and testing of this feature has been limited. Joint updating of polygenic values is implemented in programs under development, and lm_schnell will be improved or replaced in future releases.


11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell

Since lm_markers is our preferred program and it can analyze genotypic, discrete and quantitative traits, we will focus on the parameters files for this program. In the example package, under the `Lodscores' subdirectory, these three files run with lm_markers: `ped73_ge.par', `ped73_ph.par' and `ped73_qu.par'.

Except for the statements specifying the trait, the three parameter files share the following statements:

 
input pedigree file  '../ped73.ped'

input seed file         '../sampler.seed'
output overwrite seed file        '../sampler.seed'

input marker data file '../ped73.marker.missing'
select all markers traits 1

map trait 1 all interval proportions 0.3 0.7
map trait 1 external recomb fracts   0.05 0.15 0.3 0.4 0.45 
set trait 1  freqs 0.5 0.5

sample by scan
set burn-in iterations 150
set L-sampler probability 0.2
set MC iterations 3000
check progress MC iterations 1000

The pedigree file specified by the `input pedigree file' statement can contain multiple traits. Additional statements (see below) are needed to specify the type of trait and the column number in the pedigree file.

The marker map, allele frequencies and genotypes are given in a separate file `ped73.marker.missing', included using the `input marker data file' statement. Genotypes, missing at random at some markers for some individuals, are for only observed individuals in `ped73'. Alternatively, as demonstrated in the parameter file `ped73_qu.par', you may include marker information using statements `map marker dist' and `set markers ... freqs', leaving only marker genotypes in a separate input file.

The two `map trait' statements give trait locus positions at which the LOD scores should be calculated, for intervals between marker loci and for locations outside the range of marker loci. Between two marker loci, the positions are specified by proportions, at 30 and 70 percent of the interval (it can deal with gender-specific maps easily this way). Outside the marker map, the positions are specified explicitly in terms of recombination fractions with the nearest marker locus. Note that an external recombination fraction of 0.5 is not necessary since the likelihood of unlinked trait locus is always used as reference when computing the LOD scores.

The `set trait ... freqs' statement specifies allele frequencies at the trait locus. If the allele frequencies sum to below 1, a warning message will be issued:

 
Sum of allele frequencies is not in range .9999, 1.0001  (W)

Instead, if the allele frequencies sum to above 1.0001, the program quits and generates an error message.

The final five statements give MCMC specifications. The `sample by scan' statement asks that each iteration update all the meiosis indicators, S. The alternative `sample by step' updates one locus (L-sampler) or one meiosis (M-sampler) only in each iteration. In the `set burn-in iterations' statement, 100 burn-in iterations, with an unlinked trait, are requested. The L-sampler probability is set at 20 percent, which seems to be a good choice. For a detailed discussion of effects of varying L- to M-sampler ratio, see section 10.6 in Thompson (2000). The next statement requests 3000 MCMC iterations per test position of the trait locus. This is for demonstration purposes only. For real data analysis, use longer runs, on the order of 10^5 iterations per test position. With the last statement, the program checks the progress every 1000 iterations.

To specify different types of trait, different statements are needed. For a genotypic trait as in `ped73_ge.par', the part specifying the trait is

 
set trait data genotypic 
input pedigree record trait 1 integer 3

`ped73_ph.par' specifies a phenotpic trait as follows:

 
set trait data discrete
input pedigree record trait 1 integer 3
set incomplete penetrance 0.05 0.6 0.95

And a quantitative trait is specified in `ped73_qu.par':

 
set trait data quantitative
input pedigree record trait 1 real 1

set trait 1 genotype  mean 90.0 100.0 110.0
set trait 1 residual  variance 25.0

All three files use `input pedigree record trait' to specify which column in that file to be used as the trait value. The `set trait data' statement specifies the type of trait. Penetrances are needed for a phenotypic trait, whereas genotypic means and residual variance are needed for a quantitative trait. Additive variance may be needed to run lm_schnell and can be specified using the statement `set trait ... additive variance'. The default is 0.

Note that lm_markers can do the peeling by component using statement `set peeling by component'. Parameter file `ped73_qu_cmpt.par' has this option turned on, everything else being the same as file `ped73_qu.par'.

Since lm_bayes works for genotypic and discrete traits, parameters files `ped73_ge.par' and `ped73_ph.par' run with lm_bayes.

Parameter files for lm_lods and lm_schnell require slight modification: because MCMC is performed at each position, the number of MC iterations is much smaller in general than that used for lm_markers. In `ped73_lods_ph.par' and `ped73_schn_qu.par', the number of MC iterations is set to be 300 instead of 3000.


11.3 Running lm_lods example and sample output

Under the subdirectory `Lodscores/', run the lm_lods example on the phenotypic trait by typing:

 
lm_lods ped73_lods_ph.par

The main results of interest from lm_lods are the LOD scores which are given at the end of the output for each component (connected pedigree) at each position requested.

Since there are 73 individuals in the pedigree, it will take a while to finish. But the LOD scores from this example look like this (some outputs omitted to save space):

 
 ESTIMATED LOD SCORES

  Component   1

    The largest eigenvalue        :  1.86626

    The second largest eigenvalue :  1.57587

    Cumulative from left          :  2.21620

    Cumulative from right         :  0.45122

 LodScore estimates:

 Trait pos #     position (Haldane cM)
   or marker        male     female         eigen       left      right 

           1    -115.129   -115.129       0.04481    0.00015   -0.34546
           2     -80.472    -80.472       0.13091   -0.00410   -0.34971
           3     -45.815    -45.815       0.28046   -0.05817   -0.40378
           4     -17.834    -17.834       0.43549   -0.18552   -0.53113
           5      -5.268     -5.268       0.83469   -0.13949   -0.48510
    marker-1       0.000      0.000            NA         NA         NA
           6       3.000      3.000       1.33851    0.00175   -0.34386
           7       7.000      7.000       1.68532    0.05630   -0.28931
    marker-2      10.000     10.000            NA         NA         NA
           8      13.000     13.000       2.30193    0.17720   -0.16841
           9      17.000     17.000       2.58626    0.23244   -0.11317
    marker-3      20.000     20.000            NA         NA         NA
          10      23.000     23.000       3.46676    0.62422    0.27861
          11      27.000     27.000       3.95936    0.78769    0.44208
    marker-4      30.000     30.000            NA         NA         NA
          12      33.000     33.000       5.05419    0.97612    0.63051
          13      37.000     37.000       5.42606    1.08006    0.73445
    marker-5      40.000     40.000            NA         NA         NA
          14      43.000     43.000       6.14399    1.17323    0.82762
          15      47.000     47.000       6.39609    1.23478    0.88917
    marker-6      50.000     50.000            NA         NA         NA
          16      53.000     53.000       5.68067    1.06031    0.71470
          17      57.000     57.000       5.37402    0.98868    0.64307
    marker-7      60.000     60.000            NA         NA         NA
          18      63.000     63.000       4.32190    1.05923    0.71362
          19      67.000     67.000       3.91308    1.05841    0.71280
    marker-8      70.000     70.000            NA         NA         NA
          20      73.000     73.000       3.35744    1.01417    0.66856
          21      77.000     77.000       3.07940    1.04257    0.69696
    marker-9      80.000     80.000            NA         NA         NA
          22      83.000     83.000       2.96763    1.33124    0.98563
          23      87.000     87.000       2.79748    1.45101    1.10540
   marker-10      90.000     90.000            NA         NA         NA
          24      95.268     95.268       1.78970    1.13057    0.78496
          25     107.834    107.834       1.13899    0.95320    0.60759
          26     135.815    135.815       0.41807    0.59170    0.24609
          27     170.472    170.472       0.10067    0.43978    0.09417
          28     205.129    205.129       0.01432    0.39120    0.04559

As we mentioned earlier, there are three methods to combine the likelihood ratios (for each test position over the position to the left, and over the position to the right): the eigenvalue method, simple averaging starting from the left, and simple averaging starting from the right.

The largest real eigenvalue should, in theory, be equal to 2.0 and the eigenvector corresponding to the largest real eigenvalue is given as the LOD scores. However, when the second largest eigenvalue is very close to the largest one, the eigenvector can be very unstable and sometimes gives very bad LOD scores. When that happens, the "left" and "right" method, though, simpler, actually perform better.

The "Cumulative from left" and "Cumulative from right" values should, ideally, be one (their product is always one). Usually they are not and one can see that the LOD scores differ a lot for these three methods. This was a very short MCMC run. For longer runs, the LOD scores can be more consistent for the three methods. Nevertheless, lm_lods is now giving way to our newer method, lm_bayes.


11.4 Running lm_markers examples and sample output

Under the subdirectory `Lodscores/', four parameter files may be run with lm_markers: `ped73_ge.par' for a genotypic trait, `ped73_ph.par' for a phenotypic trait, `ped73_qu.par' and `ped73_qu_cmpt.par' both for a quantitative trait. The command to run these examples has format as the following

 
lm_markers ped73_qu.par

Here is the interesting part of the output. For each test position, we have the LOD score and the standard deviation.

 
LodScore estimates by Rao-Blackwellized computation:
 
 Trait pos #     position (Haldane cM)
   or marker        male     female       LodScore    StdErr

           1    -115.129   -115.129         0.0312    0.0006
           2     -80.472    -80.472         0.0532    0.0012
           3     -45.815    -45.815         0.0561    0.0031
           4     -17.834    -17.834        -0.0989    0.0078
           5      -5.268     -5.268        -0.3636    0.0160
    marker-1       0.000      0.000        -0.5626    0.0248
           6       3.000      3.000        -0.5639    0.0248
           7       7.000      7.000        -0.6048    0.0280
    marker-2      10.000     10.000        -0.6660    0.0316
           8      13.000     13.000        -0.6474    0.0208
           9      17.000     17.000        -0.6788    0.0133
    marker-3      20.000     20.000        -0.7628    0.0147
          10      23.000     23.000        -0.4993    0.0183
          11      27.000     27.000        -0.2416    0.0292
    marker-4      30.000     30.000        -0.0863    0.0407
          12      33.000     33.000         0.5250    0.0352
          13      37.000     37.000         0.9361    0.0275
    marker-5      40.000     40.000         1.1317    0.0182
          14      43.000     43.000         0.9977    0.0165
          15      47.000     47.000         0.7870    0.0153
    marker-6      50.000     50.000         0.5955    0.0153
          16      53.000     53.000         0.4909    0.0152
          17      57.000     57.000         0.2737    0.0151
    marker-7      60.000     60.000         0.0359    0.0181
          18      63.000     63.000        -0.2081    0.0103
          19      67.000     67.000        -0.6324    0.0066
    marker-8      70.000     70.000        -1.0908    0.0080
          20      73.000     73.000        -0.6078    0.0125
          21      77.000     77.000        -0.3453    0.0109
    marker-9      80.000     80.000        -0.2890    0.0083
          22      83.000     83.000        -0.0640    0.0064
          23      87.000     87.000         0.1769    0.0068
   marker-10      90.000     90.000         0.3237    0.0107
          24      95.268     95.268         0.4945    0.0070
          25     107.834    107.834         0.5835    0.0037
          26     135.815    135.815         0.3911    0.0014
          27     170.472    170.472         0.1810    0.0004
          28     205.129    205.129         0.0812    0.0001

By turning on the option of peeling by component, we can compute lod scores for each component separately, as well as over estimated jointly over all components. To run the example, use the following command

 
lm_markers ped73_qu_cmpt.par

11.5 Running lm_bayes examples and sample output

Under the subdirectory `Lodscores/', run the lm_bayes example by typing:

 
lm_bayes ped73_ph.par

The results from lm_bayes are the LOD scores toward the end of the output. Two methods of computing the LOD scores are available: (1) count realizations of locations sampled to estimate the posterior probability (crude) and (2) Rao-Blackwellized estimator (R-B). The latter is the preferred method.

 
 LodScore estimates:

 Trait pos #     position (Haldane cM)     pseudo     freq        LodScore    
   or marker        male     female         prior  visited     crude       R-B

           0    unlinked   unlinked      0.020178      116        NA        NA
           1    -115.129   -115.129      0.020560      100   -0.0726   -0.0080
           2     -80.472    -80.472      0.021170      143    0.0700   -0.0203
           3     -45.815    -45.815      0.023718      165    0.0828   -0.0681
           4     -17.834    -17.834      0.035627      122   -0.2250   -0.2402
           5      -5.268     -5.268      0.060088      122   -0.4520   -0.4592
    marker-1       0.000      0.000            NA       NA        NA        NA
           6       3.000      3.000      0.089923      127   -0.6097   -0.6299
           7       7.000      7.000      0.098887      150   -0.5786   -0.6763
    marker-2      10.000     10.000            NA       NA        NA        NA
           8      13.000     13.000      0.106325      102   -0.7776   -0.7095
           9      17.000     17.000      0.105926      111   -0.7393   -0.7091
    marker-3      20.000     20.000            NA       NA        NA        NA
          10      23.000     23.000      0.067969       93   -0.6234   -0.5206
          11      27.000     27.000      0.043461      105   -0.3765   -0.3216
    marker-4      30.000     30.000            NA       NA        NA        NA
          12      33.000     33.000      0.022509      118   -0.0401   -0.0447
          13      37.000     37.000      0.014511       87    0.0182    0.1340
    marker-5      40.000     40.000            NA       NA        NA        NA
          14      43.000     43.000      0.009146      157    0.4751    0.3324
          15      47.000     47.000      0.007793       82    0.2625    0.4142
    marker-6      50.000     50.000            NA       NA        NA        NA
          16      53.000     53.000      0.010758      101    0.2130    0.3376
          17      57.000     57.000      0.017022      121    0.0922    0.1807
    marker-7      60.000     60.000            NA       NA        NA        NA
          18      63.000     63.000      0.025113      117   -0.0913   -0.0129
          19      67.000     67.000      0.027450       45   -0.5449   -0.1115
    marker-8      70.000     70.000            NA       NA        NA        NA
          20      73.000     73.000      0.027063       83   -0.2729   -0.1503
          21      77.000     77.000      0.026552       67   -0.3576   -0.1337
    marker-9      80.000     80.000            NA       NA        NA        NA
          22      83.000     83.000      0.023659       96   -0.1513   -0.0742
          23      87.000     87.000      0.019262       55   -0.3039    0.0179
   marker-10      90.000     90.000            NA       NA        NA        NA
          24      95.268     95.268      0.012658       81    0.0465    0.2023
          25     107.834    107.834      0.011441       87    0.1215    0.2475
          26     135.815    135.815      0.014517       76   -0.0406    0.1441
          27     170.472    170.472      0.017645       90   -0.0520    0.0588
          28     205.129    205.129      0.019069       81   -0.1314    0.0248

11.6 Running lm_schnell example and sample output

lm_schnell runs on quantitative traits only. So under the subdirectory `Lodscores/', use the following command:

 
lm_schnell ped73_schn_qu.par

Since additive variance is not specified in the parameter file, the default value 0 is used. It takes a while to finish.

 
 ESTIMATED LOD SCORES

  Component 1

    The largest eigenvalue:          2.38058

    The second largest eigenvalue:   1.84655

    Cumulative from left  :          3.47316

    Cumulative from right :          0.28792

    LOD scores:
    position     female       male      eigen       left      right
           0 -115.12925 -115.12925    0.04135    0.01177   -0.52896
           1  -80.47190  -80.47190   -0.04939    0.04327   -0.49746
           2  -45.81454  -45.81454   -0.20917    0.10753   -0.43320
           3  -17.83375  -17.83375   -0.72080    0.20552   -0.33520
           4   -5.26803   -5.26803   -1.42463   -0.11293   -0.65366
           5    3.00000    3.00000   -1.90715   -0.16779   -0.70852
           6    7.00000    7.00000   -2.20055   -0.23907   -0.77979
           7   13.00000   13.00000   -2.78293   -0.06788   -0.60860
           8   17.00000   17.00000   -3.33181   -0.14982   -0.69055
           9   23.00000   23.00000   -4.36919    0.01748   -0.52324
          10   27.00000   27.00000   -4.61200    0.09763   -0.44309
          11   33.00000   33.00000   -4.76603    0.93494    0.39421
          12   37.00000   37.00000   -4.92469    1.19967    0.65894
          13   43.00000   43.00000   -5.47771    1.22628    0.68555
          14   47.00000   47.00000   -5.74826    1.36085    0.82013
          15   53.00000   53.00000   -6.35674    0.74257    0.20184
          16   57.00000   57.00000   -6.15321    0.48212   -0.05860
          17   63.00000   63.00000   -6.19531   -0.03028   -0.57100
          18   67.00000   67.00000   -6.27084   -0.56559   -1.10631
          19   73.00000   73.00000   -4.94832   -0.61375   -1.15447
          20   77.00000   77.00000   -4.35057   -0.29956   -0.84028
          21   83.00000   83.00000   -3.34770    0.11962   -0.42110
          22   87.00000   87.00000   -2.86960    0.22810   -0.31263
          23   95.26803   95.26803   -1.58121    0.97457    0.43385
          24  107.83375  107.83375   -0.68635    1.08899    0.54826
          25  135.81454  135.81454   -0.36267    0.83854    0.29781
          26  170.47190  170.47190   -0.24994    0.67770    0.13698
          27  205.12925  205.12925   -0.08595    0.61535    0.07462

The `eigen', `left' and `right' columns have the same interpretation as in lm_lods.


11.7 Location LOD scores statements

New statements for these programs include maps for test positions, and parameters for some additional MCMC algorithms.


11.7.1 Location LOD scores computing requests


11.7.2 Location LOD scores file identification statements

All Lodscore programs use the general MORGAN file identification statements (see section File identification statements) and the Autozyg rescue file statements (see section Autozyg file identification statements). One additional statement is optional for lm_bayes:

output Rao-Blackwellized estimates file

If this file is specified, the set of Rao-Blackwellized lod score estimates at each trait position is written at the frequency specified in the `compute scores' statement.


11.7.3 Location LOD scores pedigree file description

All Lodscore programs use the general MORGAN pedigree file description statements; See section Pedigree file description statements. One additional statement is optional for lm_markers and lm_schnell.

input pedigree record traits K1 K2... reals I1 I2...

This statement is analogous to `input pedigree record traits K1 K2... integers I1 I2...' (see section Pedigree file description statements), when the trait is quantitative, rather than discrete.


11.7.4 Location LOD scores output file description

All Lodscore programs use the Autozyg output file description statements; See section Autozyg output file description.


11.7.5 Location LOD scores mapping model parameters

map [chromosome I] [gender (M | F)] trait K all interval proportions X1 X2...

Interval proportions specify the proportional genetic distance between markers for the trial positions for the trait. The same ratios are used between each marker pair, regardless of the inter-genetic distance (in cM).

map [chromosome I] [gender (M | F)] trait K intervals J1... proportions X1 X2...

This statement specifies interval proportions, but between specific pairs of markers. Interval 1 is between markers 1 and 2, interval 2 is between markers 2 and 3, etc.

map [chromosome I] [gender (M | F)] trait K (beginning | ending | external) ([Kosambi] distances | recombination fractions) X1 X2...

This statement specifies trial trait positions on the chromosome before the first marker and/or after the last marker.


11.7.6 Location LOD scores population model parameters


11.7.7 Location LOD scores computational parameters

set pseudo-priors X1 X2 ...

This statment is optional for lm_bayes. The number of pseudo-priors is the number of trial trait positions plus one. The first pseudo-prior is for the unlinked position; this should be assigned a positive value. All other pseudo-priors must be positive or zero. The set of pseudo-priors need not be normalized.


11.7.8 Location LOD scores MCMC parameters and options

As with the Autozg programs, the number of desired MC iterations must be specified, as there is no default value.

set MC iterations I

This statement sets the total number of "main" L- and M-sampler iterations. For lm_markers, the total MCMC run length is the sum of the number of burn-in iterations and main iterations. For lm_lods and lm_schnell, the total MCMC run length is the number of burn-in iterations, plus the product of the number of test positions for the trait, (see section Location LOD scores mapping model parameters), and the number of main iterations. For lm_bayes, the total MCMC run length is the sum of the number of burn-in, pseudo-prior (see below) and main iterations.

Additional statements for lm_bayes include the following:

set pseudo-prior iterations I

Following burn-in, lm_bayes performs iterations to calculate the pseudo-priors. These pseudo-priors are used to encourage the MCMC sampler to visit test positions of low posterior probability. The default number of iterations to compute pseudo-priors is 50% of the number of main iterations specified in the `set MC iterations' statement.

set sequential imputation proposals every I iterations

This option applies to lm_bayes's pseudo-prior and main MCMC iterations. It allows the MCMC chain to "restart" every Ith iteration. Sequential imputation is used to propose potential restart configurations which are accepted/rejected with Metropolis-Hastings probability.

set test position window I

This lm_bayes statement specifies the window size for the Metropolis-Hastings algorithm. I is the number of hypothesized trait positions on either side of the current position, with equal weight given to the 2*I + 1 trait positions. The default is window size is 6.


12. Polygenic Modeling of Quantitative Traits by EM Algorithm


12.1 Introduction to PolyEM programs

PolyEM is a set of programs to evaluate the likelihood and compute MLEs for polygenic models of quantitative traits by EM algorithms. There are four main programs whose features are summarized below:

All programs can work with looped pedigrees. The exception is that looped pedigrees cannot be used for the polygenic peeling algorithm in univar. The other programs do not use polygenic peeling to evaluate the likelihood.

Only examples and statement references for multivar are given since it has the most complete features. The statements for other programs are similar with some exceptions (e.g., any statements with covariances do not apply to univar or unibig).


12.2 Sample multivar parameter file

The pedigree file for PolyEM programs is similar to that for other MORGAN programs. The first three entries in each line consist of the individual's name, father's name and mother's name. Integers starting with the fourth column (usually gender) can be fixed effects (gender, age class, etc.) or discrete phenotypes.

For quantitative traits, real numbers follow the names and integers. These real numbers represent trait measurements. Missing values are coded with integer part `999', such as 999.5 in the following example.

Here is part of the pedigree file `polyem.ped'. This file can be found in the `PolyEM' subdirectory of `MORGAN_Examples'.

 
 input pedigree size 90
 input pedigree record names 3 integers 3 reals 2
****************************************
     1     0     0     1     1     0    0.0246   -1.0125
     2     0     0     2     1     0   -0.5978    1.5963
     3     0     0     1     1     0   -0.8124    0.5662
     4     0     0     2     1     0    0.4334    1.7721
     5     1     2     1     1     0    0.1802   -1.4672
     6     1     2     1     1     0   -1.7557    0.8091
     7     3     4     2     1     0    999.5     999.5
     8     3     4     2     1     0    1.9128    0.9780
     9     0     0     2     1     0    0.9530    2.3473
     ...       

Below is the example multivar parameter file, `polyem.par'. In this example, the starting values are chosen to be close to the final values obtained from an earlier run.

 
input pedigree file `polyem.ped'

select trait 1
start trait 1 mean  0.06515

select trait 2 effects 1 2
start trait 2 mean  1.80671
start trait 2  effect 1  -0.76589  0.58353
start trait 2  effect 2  -0.99161 -0.55306  1.15920

start residual variance     1.10   0.65
start additive variance     0.037  0.0288
start residual covariance  -0.09 
start additive covariance  -0.0017

fit environmental model            # for multivar
limit EM iterations 200
output spacing 20 EM iterations

The `start trait mean' statements indicate that the first trait is the first real number in the pedigree file, with starting mean value 0.06515 and the second trait is in the second column with starting mean value 1.80671. Note that each trait is identified by a `select trait' statement, which is followed by one or more `start trait mena' statements. statement specifies that, for trait 2, two fixed effects (also called covariates) are to be modeled. The `1 2' in this statement gives fixed effect value locations starting with column 4 in the pedigree file. Important: a fixed effect location of `1' indicates that the effect value will be found in column 4. In this example, the fixed effects are to be found in columns 4 and 5. The most commonly modeled fixed effect is gender, which, if present, resides in column 4 of a MORGAN pedigree file.

In this example, the first fixed effect has two levels with starting values -0.76589 and 0.58353. The second effect has three levels with corresponding starting values specified in the `start trait' statement. The starting values represent deviations from the global mean and they are normalized so that the weighted (by the number of individuals in that level) sum is zero to achieve identifiability. The pedigree file column corresponding to a given fixed effect, may take values between zero (to indicate a missing value) and the number of levels assigned to that effect (inclusive) in the `start trait' statement. Any other values in that column will not trigger an error. multivar will simply assume that an additional level exists, but no starting value was assigned to it by the user. Be sure to read the output comments to verify that the number of levels present in the file was as intended.

Initial values for additive and residual variances and covariances are specified in the next four statements. These statements are required. In the variance statements, the number of arguments must be the same as the number of traits selected and they go in order of increasing trait number. In the covariance statements, the number of arguments must be the same as the number of pairs of traits selected. See section multivar segregation model parameters, for discussion of the order of arguments.

In the `fit environmental model' statement, we ask multivar to fit a purely environmental model, with no genetic variance. This null hypothesis model is produced in additional to the genetic/environmental model.

The final two statements specify the number of EM iterations and how often the EM estimates are printed out.


12.3 Running multivar example and sample output

Under the subdirectory `PolyEM/', run the example by typing:

 
multivar polyem.par

Toward the end of the multivar output, are the parameter estimates and the log-likelihood from the last iteration of the EM algorithm. If you chose to fit a null (purely environmental) model, using the `fit environmental model' statement, those parameter estimates and log-likelihood are also given. A likelihood ratio test can then be performed, with test statistic equal to the absolute value of 2 times the difference between the log-likelihoods of the two models. A conservative test is provided by comparing the test statistic to a chi-squared distribution, with the degrees of freedom being the difference in the numbers of estimated parameters between these two models.

 
 iteration #201:

    additive variance estimates (traits 1, 2)
        0.816    0.037
    covariances
        0.138

    residual variance estimates (traits 1, 2)
        0.223    0.610
    covariances
       -0.239
    
    trait 1
       overall mean      -0.063
    trait 2
       overall mean       1.780
       fixed effect  1   -0.717    0.546
       fixed effect  2   -1.008   -0.552    1.167

 current log-likelihood = -183.098

 estimates of environmental model

    residual variance estimates (traits 1, 2)
        1.136    0.642
    covariances
       -0.102
    
    trait 1
       overall mean       0.062
    trait 2
       overall mean       1.801
       fixed effect  1   -0.773    0.589
       fixed effect  2   -1.010   -0.553    1.170

 environmental model log-likelihood = -197.799

12.4 multivar statements

multivar can fit a polygenic model with one to five traits, which can be modeled as dependent and/or independent. It can also fit a purely environmental model with no genetic component.

The command to run multivar (unibig and bivar have the same set of options) is:

 
multivar parfile [ped pedfile] [eigen eigenfile]

where parfile is the name of the parameter file and is required. pedfile overrides the `input pedigree file' statement and eigenfile overrides the `input eigenvalue file' statement in the parameter files.


12.4.1 multivar computing requests

select trait I [effects M1...]

One `select trait' statement is required for each trait to be modeled. Up to five traits are allowed for multivar.

The trait number, I, corresponds to the column of real numbers in the pedigree file, with the first column of real numbers being trait 1, the second column trait 2 and so on.

M's are the fixed effects to be modeled for a specified trait I. They are the columns in which they appear in the pedigree file, with fixed effect 1 in the 4th column (usually gender), effect 2 in column 5, and so on.


12.4.2 multivar segregation model parameters

start trait I mean X

There is one statement per trait, specifying the starting value for the mean trait values. The PolyEM programs will compute the initial values if not given.

start trait I effect M X1 X2...

Starting values for the fixed effect levels for the traits are computed, unless specified in this statement.

start additive variances X1 X2...

The starting values for the variances of the traits are required. The number of values must be the same as the number of traits selected, in the order of increasing trait number.

start residual variances X1 X2 ...

Starting values for residual variances are also required.

start additive (covariances | correlations) X12...

Starting values for the covariances (or correlations) between the traits are required. They are given the order: X12, X13, …, X1n, X23, …, X2n, …, where Xij is the covariance for the ith and jth selected traits from the pedigree file.

start residual (covariances | correlations) X12...

See the `start additive...' statements.


12.4.3 multivar computational parameters

fit additive (covariances | correlations) X12...

This statement specifies which covariances to be estimated and which to be fixed at 0. The order of values is the same as the `start additive covariances' statement with '1' indicating a covariance to be fit and '0' a covariance to be fixed.

Note that if trait 1 is correlated with trait 3, and so is trait 2 with trait 3, the correlation between 1 and 2 cannot be zero. So we have to be a bit careful in specifying the correlation structure.

fit residual (covariances | correlations) X12...

Similar statement for residual covariances.

set eigenvalues X1 X2...

Optional. This statement is used to provide predetermined eigenvalues of the G-matrix of observed individuals, with the number of values the same as the number of observed individuals.

input eigenvalue file eigenfile

Optional. If present, it overrides the `set eigenvalues' statements.

limit breeding iterations I

This statement specifies the maximum number of breeding-values iterations. The default number currently is 20.

set breeding convergence X

This statement specifies the convergence criterion for breeding-values iterations. The default number is currently is 1.0e-8.

limit EM iterations I

This statement specifies the number of EM iterations. The default number presently is 200. There is no option to specify convergence criterion. If convergence has not been achieved, the final estimates can be used as starting values to rerun the program.


12.4.4 multivar computational options

compute eigenvalues

If this statement is present, the values given in either the eigenfile or the statement `set eigenvalues' are ignored and the eigenvalues are computed by the program. This is the default action if no eigenvalues are given.

use (full | partitioned) EM

Use this statement to choose between two iterative procedures in maximum likelihood estimates. With the `full EM' option, the fixed effects, additive and residual variance and covariance are simultaneously updated. This is the default action.

With the `partitioned EM' option, the maximization step is partitioned into two parts. The first part is to maximize the likelihood over additive and residual variances/covariances; the second part over residual variances/covariances and fixed effects. The expectation step is run after each part. Partitioned EM takes more computer time.

fit environmental model

This statement asks a purely environmental model with no genetic variances to be fit, in additional to the genetic/environmental model.


12.4.5 multivar output options

output statistics (covariances | correlations)

By default, covariances are printed out.

output final adjusted phenotypes

If this option is specified, trait values adjusted for all fixed effects are computed and output.

output spacing I EM iterations

This statement requests a print out of the EM estimates every Ith iteration. The default number is defined in the program header file.

check gmatrix

This statement requests a print out of the G matrix for observed individuals and quit without doing the likelihood computation.

check ginverse

This statement requests a print out of the G inverse matrix and the program quits unless `check eigenvalues' has also been specified.

check eigenvalues

This statement requests the program to print out of the eigenvalues, whether computed or input, and then to quit. These eigenvalues can then be used as input in subsequent runs.

check eigenvalue computation

This statements causes some comments to be printed by the function that computes the eigenvalues.

check trace

This statements requests the trace of the G-inverse matrix to be printed.


13. Estimating Genetic Map from Marker Data


13.1 Introduction to lm_map

The program lm_map finds the maximum likelihood estimate (MLE) of the marker map, estimates the variance of the MLE and tests hypotheses about the true map. All inference is based on the multilocus, unphased genotypic data of members of a pedigree.

To find the MLE, lm_map uses a hybrid of two stochastic algorithms: Monte Carlo expectation-maximization (MCEM) and stochastic approximation (SA). The number of MC iterations is used in conjunction with the map to initiate the MCEM algorithm. After a small number of MCEM steps, lm_map switches to SA. Then, a relatively large number of SA steps are used to find the MLE. Once the MLE is obtained, a large amount of MCMC is used to estimate the variance of the MLE via the Louis missing information principle. Finally, the Monte Carlo likelihood ratio formula is adapted to facilitate the estimation of the likelihood ratio test statistic for testing the simple and/or composite null hypotheses.

For more details, see Stewart, WCL and Thompson, EA (2006) Improving estimates of genetic maps: A maximum likelihood approach. Biometrics 62, 728-734.


13.2 Sample lm_map parameter file

Files for lm_map can be found in the `Map' subdirectory of `MORGAN_Examples'.

 
input pedigree file 'map.ped'
input seed file    '../sampler.seed'
output overwrite seed file   '../sampler.seed'

select all markers

map gender F mark recomb fract .12 .18
map gender M mark recomb fract .10 .20 
output maps gender averaged specific
limit recomb fracts .01

set EM convergence .2
set SA convergence .001
set ascent iterations 100
set LRT stat iterations 500
set curvature iterations 500
set gradient iterations 10

set mark 1 2 3 freqs .25 .25 .25 .25

set markers 3 data

  1  0 0  0 0  0 0
  2  0 0  0 0  0 0
  3  0 0  0 0  0 0
  4  0 0  0 0  0 0
  5  0 0  0 0  0 0
  6  0 0  0 0  0 0
  7  1 1  3 3  2 3
  8  1 2  3 4  2 3
  9  1 4  3 4  2 3
 10  1 4  4 4  2 3
 11  0 0  0 0  0 0
 12  3 4  2 4  3 3
 13  3 4  2 4  3 3
 14  0 0  0 0  0 0
 15  0 0  0 0  0 0
 16  0 0  0 0  0 0
 17  0 0  0 0  0 0
 18  0 0  0 0  0 0
 19  0 0  0 0  0 0
 20  0 0  0 0  0 0
 21  2 4  1 4  3 4
 22  2 4  1 2  2 3
 23  2 4  1 4  3 4
 24  3 4  1 3  3 4
 25  1 2  3 4  3 4
 26  3 4  1 2  4 4
 27  1 3  1 3  3 4
 28  3 4  2 4  1 4
 29  2 4  4 4  1 3
 30  2 3  4 4  2 3
 31  1 1  2 3  2 3
 32  1 3  2 2  2 4
 33  1 4  1 3  3 3
 34  3 4  1 2  3 4
 35  0 0  0 0  0 0
 36  0 0  0 0  0 0
 37  0 0  0 0  0 0
 38  0 0  0 0  0 0
 39  0 0  0 0  0 0
 40  0 0  0 0  0 0
 41  1 4  2 3  3 4
 42  2 4  2 3  3 4
 43  1 3  2 4  2 4
 44  2 4  1 4  3 4
 45  0 0  0 0  0 0
 46  2 4  1 3  1 4
 47  3 4  3 3  1 2
 48  0 0  0 0  0 0
 49  0 0  0 0  0 0
 50  0 0  0 0  0 0
 51  0 0  0 0  0 0
 52  0 0  0 0  0 0
 53  0 0  0 0  0 0
 54  0 0  0 0  0 0
 55  3 3  1 3  2 3
 56  3 4  1 3  1 4
 57  4 4  1 3  1 4
 58  1 4  1 3  2 4
 59  4 4  3 3  1 4
 60  4 4  3 3  3 4
 61  4 4  1 3  3 4
 62  1 4  1 1  4 4
 63  1 4  1 1  3 4
 64  4 4  1 3  3 4
 65  1 2  1 1  4 4
 66  1 2  1 1  1 4
 67  1 2  1 3  4 4
 68  2 2  1 2  1 4
 69  0 0  0 0  0 0
 70  0 0  0 0  0 0
 71  0 0  0 0  0 0
 72  0 0  0 0  0 0
 73  0 0  0 0  0 0
 74  0 0  0 0  0 0
 75  1 2  4 4  2 4
 76  1 2  2 4  2 4
 77  1 2  4 4  1 4
 78  1 1  3 4  2 2
 79  0 0  0 0  0 0
 80  1 4  2 3  4 4
 81  1 4  2 3  1 4
 82  0 0  0 0  0 0
 83  0 0  0 0  0 0
 84  0 0  0 0  0 0
 85  0 0  0 0  0 0
 86  0 0  0 0  0 0
 87  0 0  0 0  0 0
 88  0 0  0 0  0 0
 89  3 4  3 3  3 4
 90  1 3  1 2  1 4
 91  1 3  1 2  1 4
 92  1 1  2 4  2 4
 93  1 4  2 3  2 4
 94  1 4  2 3  1 2
 95  1 4  1 4  1 2
 96  1 4  2 4  4 4
 97  1 4  3 3  4 4
 98  1 4  2 4  1 4
 99  1 4  3 4  3 4
100  1 3  2 4  1 4
101  1 4  3 4  2 3
102  1 3  2 4  1 2

use 100 sequential imputation realizations for setup
sample by scan 
set L-sampler probability .8 
set burn-in iterations 100
set MC iterations 50 

13.3 Running lm_map example and sample output

Run the example with the following command

 
lm_map map.par
 

13.4 lm_map statements

limit recombination fraction X

Optional. The default value is .01. For markers that are separated by little or no recombination, the MCEM algorithm may yield estimated recombination fractions of zero which could lead to a severe bias in the results. As a safeguard against such events, this statement places lower (X) and upper (0.5 - X) bounds on the estimated recombination fractions of the map. The value of X cannot exceed 0.1.

output maps gender [averaged] [specific]

Optional. The default is to output both types. This statement determines what type of map will be estimated. If both options are chosen, then the likelihood ratio test statistic for testing the null hypothesis of a sex-averaged map is automatically computed.

set EM convergence X

Optional. The default value is .2. The MCEM algorithm is initiated to find a suitable starting value for the SA algorithm. The MCEM algorithm automatically terminates when the percent change in successive parameter estimates is less than X. Values of X that are less than .2 may substantially increase the total amount of CPU time.

set SA convergence X

Optional. The default value is .005. The parameter estimate obtained from the final step of the MCEM algorithm is used to initiate the SA algorithm. The SA algorithm automatically terminates when the absolute change in successive map estimates is less than X for each recombination fraction in the map.

set ascent iterations I
set LRT stat iterations I

Optional. The default value is 500. The likelihood ratio test (LRT) statistic for testing nested hypotheses is estimated from a random number of iterations with lower bound I. If only one option is used in `output maps gender', then the estimated LRT statistic compares the MLE and the map specified by the `map' statements. If both options in `output maps gender' are used, then two LRT statistics are estimated: the first compares the MLE of the sex-averaged map to the sex-averaged map based on that given by the `map' statements, while the second compares the sex-specific and sex-averaged maximum likelihood (ML) map estimates.

set curvature iterations I

Optional. The default value is 5000. In order to execute the SA algorithm it is necessary to estimate the value of the curvature of the log likelihood function at a point near the MLE. A random number of MC iterations with lower bound I are used to estimate the value of the curvature near the MLE.

set gradient iterations I

Optional. The default value is 100. In order to execute the SA algorithm it is necessary to estimate the gradient of the log likelihood function for each SA step. The gradient is estimated from a random numuber of MC iterations with lower bound I.

compute estimates I times

Optional. The default value is 1. This statement sets the number of independent MC runs used to find the (MLE). For each consecutive run of the lm_map program, the current and previous map estimates are compared. The estimate that yields the higher likelihood is retained. Over multiple independent runs, this offers some protection against convergence to local maxima.


Concept Index

Jump to:   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R   S   T   U   W  
Index Entry Section

A
a priori ibd probabilities7. Estimating a priori ibd Probabilities by Monte Carlo
additive variance5.1 Introduction to genedrop.
additive variance5.2 Sample genedrop parameter file
additive variance5.4.3 genedrop population model parameters
affected individuals9.6.7 Autozyg computational parameters
allele frequencies5.4.3 genedrop population model parameters
Autozyg computational parameters9.6.7 Autozyg computational parameters
Autozyg computing requests9.6.1 Autozyg computing requests
Autozyg file identification statements9.6.2 Autozyg file identification statements
Autozyg mapping model parameters9.6.5 Autozyg mapping model parameters
Autozyg MCMC parameters and options9.6.8 Autozyg MCMC parameters and options
Autozyg output file description9.6.4 Autozyg output file description
Autozyg pedigree file description9.6.3 Autozyg pedigree file description
Autozyg population model parameters9.6.6 Autozyg population model parameters
Autozyg programs9.1 Introduction to lm_auto and lm_pval
Autozyg statements9.6 Autozyg statements
autozygosity9.1 Introduction to lm_auto and lm_pval

B
bivar12.1 Introduction to PolyEM programs
block Gibbs sampler8.2 LM-sampler
burn-in8.3 MCMC parameters and options
burn-in9.2 Sample lm_auto parameter file
burn-in9.6.8 Autozyg MCMC parameters and options
burn-in9.6.8 Autozyg MCMC parameters and options
burn-in11.7.8 Location LOD scores MCMC parameters and options

C
command line options2.1 Command syntax
command syntax2.1 Command syntax
component3.1 Introduction to pedcheck
component3.2 Sample pedcheck parameter file
component3.4 pedcheck statements
component4.4 kin statements
conditional ibd probabilities9. Estimating Conditional ibd Probabilities by MCMC
continuous trait specification for lod score calculation11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell

D
discrete trait6.5.4 markerdrop computational parameters
discrete trait9.6.7 Autozyg computational parameters
discrete trait specification for lod score calculation11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell

E
eigenvalue method11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
eigenvalue method11.3 Running lm_lods example and sample output
EM algorithm12. Polygenic Modeling of Quantitative Traits by EM Algorithm
exact computation of lod scores1.5 Future directions of MORGAN

F
file names2.3 File identification statements
file separator2.4 Pedigree file
file type codes2.1 Command syntax
fixed effects The `select trait 2'12.2 Sample multivar parameter file
founder gene labels5.1 Introduction to genedrop.
founder gene labels5.2 Sample genedrop parameter file
founder gene labels5.4.5 genedrop output pedigree options
founder gene labels6.4 Running markerdrop examples and sample output
founder gene labels8.1 Genetic model
future directions of MORGAN1.5 Future directions of MORGAN
fuzzy p-value9.5 Running lm_pval example and sample output

G
gender2.5 Pedigree file description statements
gender3.4 pedcheck statements
gender-specific maps5.4.2 genedrop mapping model parameters
gender-specific maps9.2 Sample lm_auto parameter file
gender-specific maps11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
genedrop computational parameters5.4.4 genedrop computational parameters
genedrop computing requests5.4.1 genedrop computing requests
genedrop introduction5.1 Introduction to genedrop.
genedrop mapping model parameters5.4.2 genedrop mapping model parameters
genedrop output pedigree options5.4.5 genedrop output pedigree options
genedrop output seed file5.4.6 genedrop output seed file options
genedrop population model parameters5.4.3 genedrop population model parameters
genedrop sample output5.3 Running genedrop examples and sample output
genedrop sample parameter file5.2 Sample genedrop parameter file
genedrop statements5.4 genedrop statements
genetic map estimation13. Estimating Genetic Map from Marker Data
genetic model8.1 Genetic model
genotypic trait9.2 Sample lm_auto parameter file
genotypic trait9.6.7 Autozyg computational parameters
genotypic trait specification for lod score calculation11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell

H
Haldane map function5.4.2 genedrop mapping model parameters
Haldane map function6.5.2 markerdrop mapping model parameters
Haldane map function6.5.2 markerdrop mapping model parameters
Haldane map function7.2 Sample ibddrop parameter file
Haldane map function11.7.5 Location LOD scores mapping model parameters
how to get the examples1.3 Get and set up the examples
how to get the tutorial1.2 Get the Tutorial

I
ibd4.1 Introduction to kin
ibd7. Estimating a priori ibd Probabilities by Monte Carlo
ibd8.1 Genetic model
ibd9. Estimating Conditional ibd Probabilities by MCMC
ibd9.1 Introduction to lm_auto and lm_pval
ibd measures10.4 lm_ibdtests statements
ibd pattern7.1 Introduction to ibddrop
ibd pattern7.3 Running ibddrop example and sample output
ibd pattern9.3 Running lm_auto example and sample output
ibd-based tests10. Estimating ibd Based Test Statistics by MCMC
ibddrop introduction7.1 Introduction to ibddrop
ibddrop sample output7.3 Running ibddrop example and sample output
ibddrop sample parameter file7.2 Sample ibddrop parameter file
ibddrop statements7.4 ibddrop statements
identity by descent7. Estimating a priori ibd Probabilities by Monte Carlo
identity by descent9. Estimating Conditional ibd Probabilities by MCMC
identity by descent9.1 Introduction to lm_auto and lm_pval
inbreeding coefficient4.1 Introduction to kin
inbreeding coefficient4.4 kin statements
incomplete penetrance6.1 Introduction to markerdrop
incomplete penetrance6.5.4 markerdrop computational parameters
incomplete penetrance9.6.7 Autozyg computational parameters
inheritance indicators5.4.5 genedrop output pedigree options
inheritance indicators6.1 Introduction to markerdrop
inheritance indicators7.1 Introduction to ibddrop
inheritance indicators7.4 ibddrop statements
inheritance indicators8.1 Genetic model
inheritance indicators9.1 Introduction to lm_auto and lm_pval
inheritance indicators9.6.8 Autozyg MCMC parameters and options
inheritance vector8.1 Genetic model

K
kin introduction4.1 Introduction to kin
kin sample output4.3 Running kin example and sample output
kin sample parameter file4.2 Sample kin parameter file
kin statements4.4 kin statements
kinship coefficient4.1 Introduction to kin
kinship coefficient4.4 kin statements
Kosambi map function5.4.2 genedrop mapping model parameters
Kosambi map function6.5.2 markerdrop mapping model parameters
Kosambi map function6.5.2 markerdrop mapping model parameters
Kosambi map function11.7.5 Location LOD scores mapping model parameters

L
L-sampler8.2 LM-sampler
L-sampler9.2 Sample lm_auto parameter file
L-sampler9.6.8 Autozyg MCMC parameters and options
L-sampler11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
L-sampler probability8.2 LM-sampler
likelihood ratio test statistic for maps13.4 lm_map statements
LM-sampler8.2 LM-sampler
LM-sampler9.1 Introduction to lm_auto and lm_pval
lm_auto introduction9.1 Introduction to lm_auto and lm_pval
lm_auto sample output9.3 Running lm_auto example and sample output
lm_auto sample parameter file9.2 Sample lm_auto parameter file
lm_auto statements9.6 Autozyg statements
lm_bayes introduction11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
lm_bayes sample output11.5 Running lm_bayes examples and sample output
lm_bayes statements11.7 Location LOD scores statements
lm_ibdtests introduction10.1 Introduction to lm_ibdtests
lm_ibdtests sample output10.3 Sample lm_ibdtests output
lm_ibdtests statements10.4 lm_ibdtests statements
lm_lods introduction11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
lm_lods sample output11.3 Running lm_lods example and sample output
lm_lods statements11.7 Location LOD scores statements
lm_map introduction13.1 Introduction to lm_map
lm_map sample output13.3 Running lm_map example and sample output
lm_map statements13.4 lm_map statements
lm_markers introduction11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
lm_markers sample output11.4 Running lm_markers examples and sample output
lm_markers statements11.7 Location LOD scores statements
lm_multiple1.5 Future directions of MORGAN
lm_pval introduction9.1 Introduction to lm_auto and lm_pval
lm_pval sample output9.5 Running lm_pval example and sample output
lm_pval sample parameter file9.4 Sample lm_pval parameter file
lm_pval statements9.6 Autozyg statements
lm_schnell introduction11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
lm_schnell sample output11.6 Running lm_schnell example and sample output
location LOD scores computational parameters11.7.7 Location LOD scores computational parameters
location LOD scores computing requests11.7.1 Location LOD scores computing requests
location LOD scores estimates11. Estimating Location LOD Scores by MCMC
location LOD scores file identification statements11.7.2 Location LOD scores file identification statements
location LOD scores mapping model parameters11.7.5 Location LOD scores mapping model parameters
location LOD scores MCMC parameters and options11.7.8 Location LOD scores MCMC parameters and options
location LOD scores output file description11.7.4 Location LOD scores output file description
location LOD scores pedigree file description11.7.3 Location LOD scores pedigree file description
location LOD scores population model parameters11.7.6 Location LOD scores population model parameters
location LOD scores statements11.7 Location LOD scores statements
locus window9.6.7 Autozyg computational parameters
locus-by-locus sampling9.6.8 Autozyg MCMC parameters and options
locus-by-locus setup8.3 MCMC parameters and options

M
M-sampler8.2 LM-sampler
M-sampler9.2 Sample lm_auto parameter file
M-sampler9.6.8 Autozyg MCMC parameters and options
M-sampler11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
major gene model5.1 Introduction to genedrop.
marker allele frequencies6.5.3 markerdrop population model parameters
marker data9.6.7 Autozyg computational parameters
marker data file2.3 File identification statements
marker data file9.2 Sample lm_auto parameter file
marker names6.5.3 markerdrop population model parameters
marker seeds6.5.4 markerdrop computational parameters
marker simulation using inheritance indicators6.5.1 markerdrop computing requests
marker simulation using trait6.5.1 markerdrop computing requests
markerdrop computational parameters6.5.4 markerdrop computational parameters
markerdrop computing requests6.5.1 markerdrop computing requests
markerdrop input file options6.5.5 markerdrop input file options
markerdrop introduction6.1 Introduction to markerdrop
markerdrop mapping model parameters6.5.2 markerdrop mapping model parameters
markerdrop output6.4 Running markerdrop examples and sample output
markerdrop parameter file - conditional on inheritance pattern6.3 Sample markerdrop parameter file - conditional on inheritance pattern
markerdrop parameter file - conditional on trait6.2 Sample markerdrop parameter file - conditional on trait
markerdrop population model parameters6.5.3 markerdrop population model parameters
markerdrop statements6.5 markerdrop statements
Markov chain Monte Carlo8. Using MCMC to Estimate Parameters of Interest in Pedigree Data
Markov chain Monte Carlo9.1 Introduction to lm_auto and lm_pval
Markov chain Monte Carlo11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
maximum likelihood (ML) map estimates13.4 lm_map statements
MC iterations8.3 MCMC parameters and options
MC iterations9.6.8 Autozyg MCMC parameters and options
MC iterations11.7.8 Location LOD scores MCMC parameters and options
MCMC introduction8. Using MCMC to Estimate Parameters of Interest in Pedigree Data
MCMC options8.3 MCMC parameters and options
MCMC parameters8.3 MCMC parameters and options
meiosis indicators5.4.5 genedrop output pedigree options
meiosis indicators6.1 Introduction to markerdrop
meiosis indicators7.1 Introduction to ibddrop
meiosis indicators7.4 ibddrop statements
meiosis indicators8.1 Genetic model
meiosis indicators9.1 Introduction to lm_auto and lm_pval
meiosis indicators9.6.8 Autozyg MCMC parameters and options
meiosis indicators11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
meiosis indicators11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
missing marker data9.6.7 Autozyg computational parameters
missing quantitative trait data12.2 Sample multivar parameter file
mixed model5.1 Introduction to genedrop.
MM-sampler1.5 Future directions of MORGAN
MORGAN1.1 Overview of MORGAN
MORGAN files2.1 Command syntax
multivar12.1 Introduction to PolyEM programs
multivar computational options12.4.4 multivar computational options
multivar computational parameters12.4.3 multivar computational parameters
multivar computing requests12.4.1 multivar computing requests
multivar output options12.4.5 multivar output options
multivar sample output12.3 Running multivar example and sample output
multivar sample parameter file12.2 Sample multivar parameter file
multivar segregation model parameters12.4.2 multivar segregation model parameters
multivar statements12.4 multivar statements

N
names of individuals2.4 Pedigree file
non-genetic model5.1 Introduction to genedrop.

O
observed individuals2.4 Pedigree file
observed individuals12.4.3 multivar computational parameters
overview of MORGAN1.1 Overview of MORGAN

P
parameter file2.2 Parameter file
parameter statements2.2 Parameter file
parameter statements in pedigree files2.4 Pedigree file
pedcheck examples3.3 Running pedcheck examples
pedcheck introduction3.1 Introduction to pedcheck
pedcheck sample parameter file3.2 Sample pedcheck parameter file
pedcheck statements3.4 pedcheck statements
pedigree component3.2 Sample pedcheck parameter file
pedigree component3.4 pedcheck statements
pedigree component4.4 kin statements
pedigree file2.3 File identification statements
pedigree file2.4 Pedigree file
pedigree file descriptions2.5 Pedigree file description statements
pedigree options3.4 pedcheck statements
pedigree order3.4 pedcheck statements
pedigree record format2.5 Pedigree file description statements
pedigree record format6.5.5 markerdrop input file options
pedigree size2.5 Pedigree file description statements
pedigree trait data order2.5 Pedigree file description statements
pedigree validity3.1 Introduction to pedcheck
penetrance6.5.4 markerdrop computational parameters
phenotypic trait specification for lod score calculation11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
PolyEM12. Polygenic Modeling of Quantitative Traits by EM Algorithm
PolyEM introduction12.1 Introduction to PolyEM programs
PolyEM statements12.4 multivar statements
polygenic model5.1 Introduction to genedrop.
polygenic model12. Polygenic Modeling of Quantitative Traits by EM Algorithm
proband gametes7.2 Sample ibddrop parameter file
proband gametes7.4 ibddrop statements
proband gametes9.2 Sample lm_auto parameter file
proband gametes9.3 Running lm_auto example and sample output
proband gametes9.6.7 Autozyg computational parameters
pseudo-prior11.7.7 Location LOD scores computational parameters
pseudo-prior iterations8.3 MCMC parameters and options

Q
quantitative trait5.1 Introduction to genedrop.
quantitative trait9.6.7 Autozyg computational parameters
quantitative trait11.7.3 Location LOD scores pedigree file description
quantitative trait12. Polygenic Modeling of Quantitative Traits by EM Algorithm
quantitative trait specification for lod score calculation11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell

R
Rao-Blackwellized estimates11.5 Running lm_bayes examples and sample output
Rao-Blackwellized estimates11.7.2 Location LOD scores file identification statements
recombination4.4 kin statements
rescue file9.6.2 Autozyg file identification statements
residual variance5.2 Sample genedrop parameter file
residual variance5.4.3 genedrop population model parameters
running genedrop examples5.3 Running genedrop examples and sample output
running ibddrop example7.3 Running ibddrop example and sample output
running kin example4.3 Running kin example and sample output
running lm_auto example9.3 Running lm_auto example and sample output
running lm_bayes examples11.5 Running lm_bayes examples and sample output
running lm_lods example11.3 Running lm_lods example and sample output
running lm_map example13.3 Running lm_map example and sample output
running lm_markers examples11.4 Running lm_markers examples and sample output
running lm_pval example9.5 Running lm_pval example and sample output
running lm_schnell example11.6 Running lm_schnell example and sample output
running markderdrop examples6.4 Running markerdrop examples and sample output
running multivar example12.3 Running multivar example and sample output

S
sample by scan8.3 MCMC parameters and options
sample by scan9.6.8 Autozyg MCMC parameters and options
sample by step8.3 MCMC parameters and options
sample by step9.6.8 Autozyg MCMC parameters and options
sample parameter file for lm_bayes11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
sample parameter file for lm_ibdtests10.2 Sample lm_ibdtests parameter file
sample parameter file for lm_lods11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
sample parameter file for lm_map13.2 Sample lm_map parameter file
sample parameter file for lm_markers11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
sample parameter file for lm_schnell11.2 Sample parameter files for lm_lods, lm_markers, lm_bayes and lm_schnell
schnell11.1 Introduction to lm_lods, lm_markers, lm_bayes and lm_schnell
scores file9.6.2 Autozyg file identification statements
scoreset9.6.7 Autozyg computational parameters
scoreset9.6.7 Autozyg computational parameters
seed file2.3 File identification statements
seed file5.2 Sample genedrop parameter file
seed file7.2 Sample ibddrop parameter file
seed file9.6.2 Autozyg file identification statements
seeds for data simulation5.1 Introduction to genedrop.
seeds for data simulation5.3 Running genedrop examples and sample output
seeds for data simulation5.4.4 genedrop computational parameters
seeds for data simulation5.4.6 genedrop output seed file options
seeds for sampler7.2 Sample ibddrop parameter file
seeds for sampler7.4 ibddrop statements
sequential imputation8.3 MCMC parameters and options
sequential imputation9.6.8 Autozyg MCMC parameters and options
sequential imputation proposals11.7.8 Location LOD scores MCMC parameters and options
sex-averaged map13.4 lm_map statements
sex-specific map13.4 lm_map statements
simulating marker data5. Simulating Marker and Trait Data in Pedigrees
simulating marker data5.4.4 genedrop computational parameters
simulating trait data5. Simulating Marker and Trait Data in Pedigrees
simulating trait data5.4.4 genedrop computational parameters

T
trait data6.5.4 markerdrop computational parameters
trait data9.2 Sample lm_auto parameter file
trait data9.6.7 Autozyg computational parameters
trait frequencies6.5.3 markerdrop population model parameters
trait model6.1 Introduction to markerdrop
trait seeds6.5.4 markerdrop computational parameters
trait test positions11.7.5 Location LOD scores mapping model parameters
Tutorial for LOD score examples with `ped73.ped'1.4 Overview of the pedigrees used in the examples

U
unibig12.1 Introduction to PolyEM programs
univar12.1 Introduction to PolyEM programs
unobserved individuals2.4 Pedigree file
unobserved individuals5.1 Introduction to genedrop.
unobserved individuals5.4.5 genedrop output pedigree options

W
white space2. Common Features and File Formats
whitespace2.4 Pedigree file
window patterns9.2 Sample lm_auto parameter file
window patterns9.6.7 Autozyg computational parameters

Jump to:   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R   S   T   U   W  

Statement Index

Jump to:   A   C   F   I   L   M   O   S   U  
Index Entry Section

A
allow pedigree size2.5 Pedigree file description statements
assign gender3.4 pedcheck statements
assume all observed5.4.5 genedrop output pedigree options

C
check eigenvalue computation12.4.5 multivar output options
check eigenvalues12.4.5 multivar output options
check ginverse12.4.5 multivar output options
check gmatrix12.4.5 multivar output options
check progress MC iterations9.6.8 Autozyg MCMC parameters and options
check trace12.4.5 multivar output options
component compute two-locus inbreeding coefficient4.4 kin statements
compute component inbreeding coefficient4.4 kin statements
compute component kinship coefficient4.4 kin statements
compute eigenvalues12.4.4 multivar computational options
compute estimates times13.4 lm_map statements
compute ibd statistics10.4 lm_ibdtests statements
compute inbreeding coefficient4.4 kin statements
compute kinship coefficient4.4 kin statements
compute scores every iterations9.6.8 Autozyg MCMC parameters and options
compute two-locus inbreeding coefficient4.4 kin statements

F
fit additive correlations12.4.3 multivar computational parameters
fit additive covariances12.4.3 multivar computational parameters
fit environmental model12.4.4 multivar computational options
fit residual correlations12.4.3 multivar computational parameters
fit residual covariances12.4.3 multivar computational parameters

I
ignore gender3.4 pedcheck statements
input eigenvalue file12.4.3 multivar computational parameters
input marker data file2.1 Command syntax
input marker data file2.3 File identification statements
input pedigree file2.1 Command syntax
input pedigree file2.3 File identification statements
input pedigree record father mother2.5 Pedigree file description statements
input pedigree record gender absent2.5 Pedigree file description statements
input pedigree record gender present2.5 Pedigree file description statements
input pedigree record mother father2.5 Pedigree file description statements
input pedigree record names 32.5 Pedigree file description statements
input pedigree record names 3 integers2.5 Pedigree file description statements
input pedigree record names 3 integers reals2.5 Pedigree file description statements
input pedigree record observed2.5 Pedigree file description statements
input pedigree record observed2.5 Pedigree file description statements
input pedigree record observed absent5.4.5 genedrop output pedigree options
input pedigree record observed present5.4.5 genedrop output pedigree options
input pedigree record traits integer pairs6.5.5 markerdrop input file options
input pedigree record traits integers2.5 Pedigree file description statements
input pedigree record traits integers6.5.5 markerdrop input file options
input pedigree record traits reals11.7.3 Location LOD scores pedigree file description
input pedigree size2.5 Pedigree file description statements
input rescue file9.6.2 Autozyg file identification statements
input seed file2.1 Command syntax
input seed file2.3 File identification statements
input seet file9.6.2 Autozyg file identification statements

L
limit breeding iterations12.4.3 multivar computational parameters
limit EM iterations12.4.3 multivar computational parameters
limit recombination fraction13.4 lm_map statements

M
map chromosome gender marker distances5.4.2 genedrop mapping model parameters
map chromosome gender marker Kosambi distances5.4.2 genedrop mapping model parameters
map chromosome gender marker Kosambi positions5.4.2 genedrop mapping model parameters
map chromosome gender marker positions5.4.2 genedrop mapping model parameters
map chromosome gender marker recombination fractions5.4.2 genedrop mapping model parameters
map chromosome gender trait all interval proportions11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait beginning distance11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait beginning Kosambi distance11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait beginning recombination fractions11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait ending distance11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait ending Kosambi distance11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait ending recombination fractions11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait external distance11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait external Kosambi distance11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait external recombination fractions11.7.5 Location LOD scores mapping model parameters
map chromosome gender trait intervals proportions11.7.5 Location LOD scores mapping model parameters
map chromosome gender traits markers distances5.4.2 genedrop mapping model parameters
map chromosome gender traits markers Kosambi distances5.4.2 genedrop mapping model parameters
map chromosome gender traits markers recombination fractions5.4.2 genedrop mapping model parameters
map chromosome marker distances5.4.2 genedrop mapping model parameters
map chromosome marker Kosambi distances5.4.2 genedrop mapping model parameters
map chromosome marker Kosambi positions5.4.2 genedrop mapping model parameters
map chromosome marker positions5.4.2 genedrop mapping model parameters
map chromosome marker recombination fractions5.4.2 genedrop mapping model parameters
map chromosome trait all interval proportions11.7.5 Location LOD scores mapping model parameters
map chromosome trait beginning distance11.7.5 Location LOD scores mapping model parameters
map chromosome trait beginning Kosambi distance11.7.5 Location LOD scores mapping model parameters
map chromosome trait beginning recombination fractions11.7.5 Location LOD scores mapping model parameters
map chromosome trait ending distance11.7.5 Location LOD scores mapping model parameters
map chromosome trait ending Kosambi distance11.7.5 Location LOD scores mapping model parameters
map chromosome trait ending recombination fractions11.7.5 Location LOD scores mapping model parameters
map chromosome trait external distance11.7.5 Location LOD scores mapping model parameters
map chromosome trait external Kosambi distance11.7.5 Location LOD scores mapping model parameters
map chromosome trait external recombination fractions11.7.5 Location LOD scores mapping model parameters
map chromosome trait intervals proportions11.7.5 Location LOD scores mapping model parameters
map chromosome traits markers distances5.4.2 genedrop mapping model parameters
map chromosome traits markers Kosambi distances5.4.2 genedrop mapping model parameters
map chromosome traits markers recombination fractions5.4.2 genedrop mapping model parameters
map gender marker distance6.5.2 markerdrop mapping model parameters
map gender marker distances5.4.2 genedrop mapping model parameters
map gender marker Kosambi distance6.5.2 markerdrop mapping model parameters
map gender marker Kosambi distances5.4.2 genedrop mapping model parameters
map gender marker Kosambi position6.5.2 markerdrop mapping model parameters
map gender marker Kosambi positions5.4.2 genedrop mapping model parameters
map gender marker position6.5.2 markerdrop mapping model parameters
map gender marker positions5.4.2 genedrop mapping model parameters
map gender marker recombination fraction6.5.2 markerdrop mapping model parameters
map gender marker recombination fractions5.4.2 genedrop mapping model parameters
map gender trait all interval proportions11.7.5 Location LOD scores mapping model parameters
map gender trait beginning distance11.7.5 Location LOD scores mapping model parameters
map gender trait beginning recombination fractions11.7.5 Location LOD scores mapping model parameters
map gender trait ending distance11.7.5 Location LOD scores mapping model parameters
map gender trait ending recombination fractions11.7.5 Location LOD scores mapping model parameters
map gender trait external distance11.7.5 Location LOD scores mapping model parameters
map gender trait external Kosambi distance11.7.5 Location LOD scores mapping model parameters
map gender trait external recombination fractions11.7.5 Location LOD scores mapping model parameters
map gender trait intervals proportions11.7.5 Location LOD scores mapping model parameters
map gender trait Kosambi beginning distance11.7.5 Location LOD scores mapping model parameters
map gender trait Kosambi ending distance11.7.5 Location LOD scores mapping model parameters
map gender trait marker distance6.5.2 markerdrop mapping model parameters
map gender trait marker Kosambi distance6.5.2 markerdrop mapping model parameters
map gender trait marker recombination fraction6.5.2 markerdrop mapping model parameters
map gender traits markers distances5.4.2 genedrop mapping model parameters
map gender traits markers Kosambi distances5.4.2 genedrop mapping model parameters
map gender traits markers recombination fractions5.4.2 genedrop mapping model parameters
map marker distance6.5.2 markerdrop mapping model parameters
map marker distances5.4.2 genedrop mapping model parameters
map marker Kosambi distance6.5.2 markerdrop mapping model parameters
map marker Kosambi distances5.4.2 genedrop mapping model parameters
map marker Kosambi position6.5.2 markerdrop mapping model parameters
map marker Kosambi positions5.4.2 genedrop mapping model parameters
map marker position6.5.2 markerdrop mapping model parameters
map marker positions5.4.2 genedrop mapping model parameters
map marker recombination fraction6.5.2 markerdrop mapping model parameters
map marker recombination fractions5.4.2 genedrop mapping model parameters
map trait all interval proportions11.7.5 Location LOD scores mapping model parameters
map trait beginning distance11.7.5 Location LOD scores mapping model parameters
map trait beginning Kosambi distance11.7.5 Location LOD scores mapping model parameters
map trait beginning recombination fractions11.7.5 Location LOD scores mapping model parameters
map trait ending distance11.7.5 Location LOD scores mapping model parameters
map trait ending Kosambi distance11.7.5 Location LOD scores mapping model parameters
map trait ending recombination fractions11.7.5 Location LOD scores mapping model parameters
map trait external distance11.7.5 Location LOD scores mapping model parameters
map trait external Kosambi distance11.7.5 Location LOD scores mapping model parameters
map trait external recombination fractions11.7.5 Location LOD scores mapping model parameters
map trait intervals proportions11.7.5 Location LOD scores mapping model parameters
map trait marker distance6.5.2 markerdrop mapping model parameters
map trait marker Kosambi distance6.5.2 markerdrop mapping model parameters
map trait marker recombination fraction6.5.2 markerdrop mapping model parameters
map traits markers distances5.4.2 genedrop mapping model parameters
map traits markers Kosambi distances5.4.2 genedrop mapping model parameters
map traits markers recombination fractions5.4.2 genedrop mapping model parameters

O
output final adjusted phenotypes12.4.5 multivar output options
output maps gender13.4 lm_map statements
output marker seeds only5.4.6 genedrop output seed file options
output overwrite pedigree file2.3 File identification statements
output overwrite seed file2.3 File identification statements
output pedigree chronological3.4 pedcheck statements
output pedigree file2.1 Command syntax
output pedigree file2.3 File identification statements
output pedigree record father mother3.4 pedcheck statements
output pedigree record founder gene labels5.4.5 genedrop output pedigree options
output pedigree record mother father3.4 pedcheck statements
output pedigree record trait latent variables5.4.5 genedrop output pedigree options
output pedigree record unobserved variables5.4.5 genedrop output pedigree options
output Rao-Blackwellized estimates file11.7.2 Location LOD scores file identification statements
output rescue data iterations9.6.4 Autozyg output file description
output rescue file9.6.2 Autozyg file identification statements
output score file2.1 Command syntax
output scores every MC iterations9.6.4 Autozyg output file description
output scores file9.6.2 Autozyg file identification statements
output seed file2.1 Command syntax
output seed file2.3 File identification statements
output spacing EM iterations12.4.5 multivar output options
output statistics correlations12.4.5 multivar output options
output statistics covariances12.4.5 multivar output options
output trait seeds only5.4.6 genedrop output seed file options

S
sample by scan9.6.8 Autozyg MCMC parameters and options
sample by step9.6.8 Autozyg MCMC parameters and options
select all markers9.6.1 Autozyg computing requests
select all markers traits9.6.1 Autozyg computing requests
select chromosome all markers9.6.1 Autozyg computing requests
select chromosome all markers traits9.6.1 Autozyg computing requests
select chromosome markers9.6.1 Autozyg computing requests
select chromosome markers traits9.6.1 Autozyg computing requests
select markers9.6.1 Autozyg computing requests
select markers traits9.6.1 Autozyg computing requests
select trait12.4.1 multivar computing requests
select trait effects12.4.1 multivar computing requests
select unlinked trait9.6.1 Autozyg computing requests
set affected individuals9.6.7 Autozyg computational parameters
set affected individuals trait9.6.7 Autozyg computational parameters
set ascent iterations13.4 lm_map statements
set breeding convergence12.4.3 multivar computational parameters
set burn-in iterations9.6.8 Autozyg MCMC parameters and options
set chromosome locus window7.4 ibddrop statements
set chromosome locus window9.6.7 Autozyg computational parameters
set chromosome marker names9.6.6 Autozyg population model parameters
set chromosome markers frequencies5.4.3 genedrop population model parameters
set component affected individuals9.6.7 Autozyg computational parameters
set component proband gametes7.4 ibddrop statements
set component proband gametes9.6.7 Autozyg computational parameters
set component scoreset proband gametes9.6.7 Autozyg computational parameters
set component scoreset window patterns9.6.7 Autozyg computational parameters
set component window patterns9.6.7 Autozyg computational parameters
set curvature iterations13.4 lm_map statements
set eigenvalues12.4.3 multivar computational parameters
set EM convergence13.4 lm_map statements
set gradient iterations13.4 lm_map statements
set ibd measures10.4 lm_ibdtests statements
set ibd permutations10.4 lm_ibdtests statements
set ibd tests10.4 lm_ibdtests statements
set incomplete penetrances6.5.4 markerdrop computational parameters
set incomplete penetrances9.6.7 Autozyg computational parameters
set L-sampler probability9.6.8 Autozyg MCMC parameters and options
set locus window7.4 ibddrop statements
set locus window9.6.7 Autozyg computational parameters
set LRT stat iterations13.4 lm_map statements
set marker data9.6.7 Autozyg computational parameters
set marker frequencies6.5.3 markerdrop population model parameters
set marker names6.5.3 markerdrop population model parameters
set marker names9.6.6 Autozyg population model parameters
set marker seeds5.4.4 genedrop computational parameters
set markers frequencies5.4.3 genedrop population model parameters
set MC iterations7.4 ibddrop statements
set MC iterations9.6.8 Autozyg MCMC parameters and options
set MC iterations11.7.8 Location LOD scores MCMC parameters and options
set normalized frequencies5.4.3 genedrop population model parameters
set proband gametes7.4 ibddrop statements
set proband gametes9.6.7 Autozyg computational parameters
set pseudo-prior iterations11.7.8 Location LOD scores MCMC parameters and options
set pseudo-priors11.7.7 Location LOD scores computational parameters
set recombination frequencies4.4 kin statements
set SA convergence13.4 lm_map statements
set sampler seeds7.4 ibddrop statements
set scoreset proband gametes9.6.7 Autozyg computational parameters
set scoreset window patterns9.6.7 Autozyg computational parameters
set sequential imputation proposals every iterations11.7.8 Location LOD scores MCMC parameters and options
set test position window11.7.8 Location LOD scores MCMC parameters and options
set trait data discrete6.5.4 markerdrop computational parameters
set trait data discrete9.6.7 Autozyg computational parameters
set trait data genotypic9.6.7 Autozyg computational parameters
set trait data quantitative9.6.7 Autozyg computational parameters
set trait frequencies6.5.3 markerdrop population model parameters
set trait seeds5.4.4 genedrop computational parameters
set traits additive variance5.4.3 genedrop population model parameters
set traits frequencies5.4.3 genedrop population model parameters
set traits genotype means5.4.3 genedrop population model parameters
set traits residual variance5.4.3 genedrop population model parameters
set window patterns9.6.7 Autozyg computational parameters
simulate chromosome markers traits5.4.1 genedrop computing requests
simulate markers5.4.1 genedrop computing requests
simulate markers trait7.4 ibddrop statements
simulate markers traits5.4.1 genedrop computing requests
simulate markers using inheritance6.5.1 markerdrop computing requests
simulate markers using trait6.5.1 markerdrop computing requests
simulate unlinked trait7.4 ibddrop statements
simulate unlinked traits5.4.1 genedrop computing requests
start additive correlations12.4.2 multivar segregation model parameters
start additive covariances12.4.2 multivar segregation model parameters
start additive variances12.4.2 multivar segregation model parameters
start residual correlations12.4.2 multivar segregation model parameters
start residual covariances12.4.2 multivar segregation model parameters
start residual variances12.4.2 multivar segregation model parameters
start trait effect12.4.2 multivar segregation model parameters
start trait mean12.4.2 multivar segregation model parameters

U
use full EM12.4.4 multivar computational options
use locus-by-locus sampling for setup9.6.8 Autozyg MCMC parameters and options
use partitioned EM12.4.4 multivar computational options
use sequential imputation for setup9.6.8 Autozyg MCMC parameters and options
use sequential imputation realizations for setup9.6.8 Autozyg MCMC parameters and options

Jump to:   A   C   F   I   L   M   O   S   U  

[Top] [Contents] [Index] [ ? ]

About This Document

This document was generated on August, 11 2007 using texi2html 1.76.

The buttons in the navigation panels have the following meaning:

Button Name Go to From 1.2.3 go to
[ < ] Back previous section in reading order 1.2.2
[ > ] Forward next section in reading order 1.2.4
[ << ] FastBack beginning of this chapter or previous chapter 1
[ Up ] Up up section 1.2
[ >> ] FastForward next chapter 2
[Top] Top cover (top) of document  
[Contents] Contents table of contents  
[Index] Index index  
[ ? ] About about (help)  

where the Example assumes that the current position is at Subsubsection One-Two-Three of a document of the following structure:


This document was generated on August, 11 2007 using texi2html 1.76.