[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
14.1 Introduction to lm_map | ||
14.2 Sample lm_map parameter file | ||
14.3 Running lm_map with genotypic data | ||
14.4 Running lm_map with phenotypic data | ||
14.5 lm_map statements |
See Concept Index for: genetic map estimation.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_map
See References, for details of the cited papers.
The program lm_map
finds the maximum likelihood estimate (MLE) of the
marker map, estimates the
(statistical not Monte Carlo) variance of the MLE,
and tests hypotheses about the
true map. All inference is based on the analysis of multilocus marker data
obtained from some (possibly all) members of a set of independent families
(pedigree components).
To find the MLE, lm_map uses either Monte Carlo expectation-maximization (MCEM)
or a hybrid of MCEM and stochastic approximation (SA). In either case, the user
must supply an initial map estimate, and an initial Monte Carlo (MC) sample size
for the MCEM algorithm. The MCEM sample size is automatically increased with
each successive step of the algorithm, and only a small number of MCEM steps are
needed to estimate the MLE. If the hybrid option is chosen, lm_map
uses
the MCEM estimate to seed the SA algorithm. Then, a relatively large number of
SA steps are used to estimate the MLE with greater precision.
Once the MLE is obtained, a long Markov chain is used to estimate the variance of the MLE. Finally, a slight adaptation of the MC likelihood ratio formula is used to estimate the likelihood ratio test (LRT) statistics for testing the simple and/or composite null hypotheses. For more details, see [ST06].
See Concept Index for:
lm_map
introduction.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_map
parameter fileThe two sample parameter files for lm_map
can be found in the directory
‘MORGAN_Examples/Map’. The two files are ‘map_G.par’ and
‘map_P.par’, along with the corresponding marker data files
‘map_G.markers’ and ‘map_P.markers’. Thus there are two examples, one
for genotypic markers (G) and one for phenotypic markers (P).
‘G’ denotes that
marker genotypes are observed without error.
‘P’ denotes the possibility of
error, so that the observed marker phenotype is not the same as the underlying
true marker genotype. This example uses the pedigree file ‘map.ped’,
but different marker data files depending on the choice of ‘P’ or
‘G’.
‘map_G.par’ and ‘map_P.par’ have the following statements in common:
input pedigree file './map.ped' input marker data file './map_G.markers' # or './map_P.markers' for 'map_P.par' select all markers set marker 1 2 3 allele freqs .2 .2 .2 .2 .2 set marker names DS123 DS456 DS789 map gender F marker recomb fract .18 .18 # true F map (cM): 20 20 map gender M marker recomb fract .08 .08 # true M map (cM): 10 10 limit recomb fracts .001 use sequential imputation for setup use 100 sequential imputation realizations for setup set burn-in iterations 100 sample by scan set L-sampler probability .8 set MC iterations 50 # The initial number of MCMC scans per step limit EM iterations 10 # The total number of MCEM steps |
As seen in previous examples, the ‘select all markers’ statement instructs the program to use all markers on the chromosome for computation. The alternative is to use only selected markers for computation, which can be achieved by using the ‘select markers’ statement (see Autozyg computing requests). The ‘set marker 1 2 3 allele freqs .2 .2 .2 .2’ statement specifies the marker allele frequencies for markers 1, 2, and 3. This statement, as constructed, requires markers 1, 2, and 3 to each have five alleles with frequencies of 0.2 for each allele. If the number of alleles per marker varies from marker to marker, or if the allele frequencies vary from marker to marker, a separate ‘set marker freqs’ statement is needed for each marker (see markerdrop population model parameters). The ‘set marker names’ statement overrides the default behavior, which labels markers consecutively: marker-1, marker-2, etc.
The two ‘map gender marker recomb fract’ statements specify the marker map in terms of recombination fractions. This is the initial starting estimate of the map.
The ‘limit recomb fracts 0.001’ statement is optional and places lower and upper bounds on the estimated recombination fractions of the map. For markers that are separated by little or no recombination, the MCEM algorithm may yield estimated recombination fractions of zero which could lead to a severe bias in the results. As a safeguard against such events, this statement places a lower bound 0.001 and an upper bound 0.5 - 0.001 on the estimated recombination fractions of the map.
The statement ‘use sequential imputation for setup’ instructs lm_map
to initialize the set of maternal and paternal meiosis indicators for all
members of the pedigree who are not founders; this is done prior to the Monte
Carlo simulation. The default behavior is specified in this statement, with the
alternative being to ‘use locus-by-locus sampling for setup’. The
statement ‘use 100 sequential imputation realizations for setup’ is
optional and modifies the default behavior for setup by sequential imputation
(which is 10% of the MC iterations). The next three lines in the parameter
files contain statements introduced in the Autozyg
examples of this
tutorial. For explanation of ‘set burn-in iterations’, ‘sample by
scan’, and ‘set L-sampler probability’ see
Autozyg MCMC parameters and options. The statement ‘set MC
iterations 50’ indicates how many MC iterations are to be performed at each
EM iteration. The statement ‘limit EM iterations’ was introduced in the
multivar
example and puts an upper bound on the number of MCEM
iterations.
Now we’ll take a look at the remaining statements in ‘map_G.par’:
output maps gender averaged specific set map estimation model with no mistyping set EM convergence .01 use MCEM and SA for maximization set SA curvature iterations 10 set SA ascent iterations 10 set SA gradient iterations 10 set SA convergence .001 |
The ‘output maps gender averaged specific’ statement specifies the type of
map to be estimated by lm_map
. In this example, the default behavior is
specified, which instructs lm_map
to automatically compute the likelihood
ratio test statistic for testing the null hypothesis of a sex-averaged map. The
statement ‘set map estimation model with no mistyping’ instructs
lm_map
to assume that the genotypes are observed without error. The
‘set EM convergence’ statement instructs lm_map
to stop the MCEM
algorithm if all recombination fraction updates are within 0.01 of their
previous values.
The statement ‘use MCEM and SA for maximization’ in
‘map_G.par’ instructs lm_map
to
attempt to refine its MCEM-based estimate of the MLE by performing additional SA
steps. The alternative is to ‘use MCEM only for maximization’, with no
further refining. There are several statements that allow additional control of
the SA algorithm. First, an estimate of the curvature of the likelihood is
needed to initiate the SA algorithm. The statement ‘set SA curvature
iterations 10’ instructs lm_map
to use at least 10 MCMC realizations to
estimate the curvature of the likelihood. Also, lm_map
will not initiate
the SA algorithm with a step that decreases likelihood. So, when the SA
algorithm is used for refining the likelihood estimate, the statement ‘set
SA ascent iterations 10’ instructs lm_map
to use at least 10 MCMC
realizations to determine whether a proposed first step increases the
likelihood. The SA algorithm also requires an estimate of the gradient of the
likelihood at each SA step. The statement ‘set SA gradient iterations 10’
instructs lm_map
to use at least 10 MCMC realizations to estimate the
gradient of the likelihood. Finally, the map estimate obtained from the final
step of the MCEM algorithm is used to seed the SA algorithm. The ‘set SA
convergence 0.001’ statement instructs lm_map
to terminate the SA
algorithm when the absolute change in successive map estimates is less than
0.001 for each recombination fraction in the map.
The file ‘map_P.par’ shows some different Monte Carlo and estimation options. Here are the remaining statements in that file:
output maps gender averaged set map estimation model with mistyping set genotyping error rate .02 use MCEM only for maximization |
In this parameter file, a gender averaged map is specified by using the
‘output maps gender averaged’ statement. Unlike in the previous parameter
file, ‘map_P.par’ does not assume the genotypes are recorded without error;
this is indicated by the statement ‘set map estimation model with
mistyping’. When ‘with mistyping’ is chosen, one has the option of
specifying an estimate of the error rate with the statement ‘set genotyping
error rate E’. In this example, the error rate is set at 0.02. Finally,
the statement ‘use MCEM only for maximization’ instructs lm_map
not
to use the SA algorithm to further refine the MCEM-based estimate of the MLE.
Since the SA algorithm will not be used, none of the ‘SA’ statements are
used in ‘map_P.par’.
See Concept Index for:
sample parameter file for lm_map
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_map
with genotypic dataRun the genotypic example in the ‘Map’ subdirectory of the ‘MORGAN-examples’ directory with the following command
./lm_map map_G.par |
The lm_map
program is one of the more computationally intensive
MORGAN programs. Even running this small example takes about 30
seconds to run (depending on the computer used, of course). Again,
different random seeds will result in different outputs with each run.
Here is the output from one of the runs: The maximum likelihood estimates (MLEs) of marker map recombination frequencies are given for each marker interval and for male and female meioses. Also given is the estimated variance-covariance matrix of the MLEs. Of course, the MLE will not be identical to the true parameter value, but the variance-covariance matrix gives an estimate of the precision. The ‘effective number of meioses’ is also a measure of this precision, giving the number of fully informative meioses required for the same precision of the MLEs.
MAXIMUM LIKELIHOOD ESTIMATES Interval Female (RF) Male (RF) -------- ----------- --------- 1 0.2231 0.0264 2 0.2745 0.0801 ESTIMATED VARIANCE OF SEX-SPECIFIC MAP [F1,M1,F2,M2,... x F1,M1,F2,M2,...] 0.004402 -0.000034 -0.000422 -0.000040 -0.000034 0.000621 0.000075 -0.000025 -0.000422 0.000075 0.006546 -0.000136 -0.000040 -0.000025 -0.000136 0.001482 EFFECTIVE NUMBER OF MEIOSES Interval Female Male -------- -------- ------- 1: 40 42 2: 31 50 |
See Concept Index for:
running lm_map
with genotypic data,
lm_map
sample output for genotypic data.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_map
with phenotypic data./lm_map map_P.par |
Running this example takes a noticeable amount of time. Given are the MLEs of the sex-averaged recombination frequency in each of the two marker intervals and of the mistyping (error) rate. Also given is the estimated variance-covariance matrix of these MLEs and the effective number of meioses (see the previous section).
MAXIMUM LIKELIHOOD ESTIMATES Interval Sex-Averaged (RF) -------- ---------------- 1 0.1510 2 0.1787 MISTYPING RATE: 1.479401% ESTIMATED VARIANCE OF (MAP,MISTYPING RATE) SEX-AVERAGED ------------------------------------------------------ 0.001432 -0.000482 0.000007 -0.000482 0.001517 -0.000016 0.000007 -0.000016 0.000072 |
Following this section, there is a table of the estimated error probability for each individual at each marker. From your output you should see that the program detects errors in individual 32 and individual 49 for marker-3 and in individual 90 for marker-1. Some other instances of data with low (non-error) probability also show non-zero estimated probability of error. The exact values of these probabilities will depend on the random seeds used in the run.
See Concept Index for:
running lm_map
with phenotypic data,
lm_map
sample output for phenotypic data.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_map
statementslimit recombination fractions L
This statement is optional and places lower and upper bounds on the estimated recombination fractions of the map. For markers that are separated by little or no recombination, the MCEM algorithm may yield estimated recombination fractions of zero which could lead to a severe bias in the results. As a safeguard against such events, this statement places a lower bound L and an upper bound 0.5 - L on the estimated recombination fractions of the map.
output maps gender [averaged] [specific]
This statement specifies the type of map to be estimated. The default behavior is to select both options, which instructs lm_map to automatically compute the likelihood ratio test statistic for testing the null hypothesis of a sex-averaged map.
use MCEM and SA for maximization
If the statement ‘use MCEM only for maximization’ is replaced by this
statement, lm_map
will attempt to refine its MCEM based estimate of the
MLE by performing additional SA steps.
set EM convergence X
The MCEM algorithm is used to find a suitable starting value for the SA algorithm. The MCEM algorithm terminates when the percent change in successive parameter estimates is less than X. The default value of X is 0.2: smaller values may substantially increase the total CPU time.
set genotyping error rate E
When the statement ‘set map estimation with mistyping’ is used, the genotype observations are assumed to have an associated error rate. This statement allows for the specification of the ‘mistyping’ rate.
set SA curvature iterations I
An estimate of the curvature of the likelihood is needed to initiate the SA
algorithm. This statement tells lm_map
to use at least I MCMC
realizations to estimate the curvature of the likelihood. The curvature is only
estimated once.
set SA ascent iterations I
lm_map
will not initiate the SA algorithm with a step that decreases the
likelihood. This statement tells lm_map
to use at least I MCMC
realizations to determine whether a proposed first step increases the
likelihood.
set SA gradient iterations I
If SA is initiated, this tells lm_map
to use at least I MCMC
realizations to estimate the gradient of the likelihood. An estimate of the
gradient is needed for each SA step.
set SA convergence R
The SA algorithm is terminated, if all recombination fraction updates are within R of their previous values. In addition, the maximum possible runtime for the SA algorithm is proportional to the total runtime of the MCEM algorithm.
set map estimation (with | with no) mistyping
This statement can be used to specify whether or not errors were made during the observation of genotype. If ‘with no’ is selected, the genotypes are assumed to have been observed without error. If ‘with’ is selected, the genotype observations are assumed to have some error associated with them, which can be specified using the ‘set genotyping error rate’ statement.
set LRT statistics iterations I
This statement tells lm_map
to use at least I MCMC realizations to
estimate the LRT statistics. If only one option is used in ‘output maps
gender ...’, then the estimated LRT statistic compares the MLE to the initial
map. Otherwise, two LRT statistics are estimated. The first compares the MLE of
the sex-averaged map to the initial sex-averaged map, while the second compares
the MLE of the sex-specific map to the MLE of sex-averaged map.
compute estimates I times
This statement tells lm_map
to conduct its entire analysis I times,
and to report the map with the highest likelihood. While this statement offers
some protection against convergence to local modes, the default value is 1.
See Concept Index for:
lm_map
statements.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.