[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
See Concept Index for: location lod scores estimates.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_linkage
, lm_bayes
, lm_twoqtl
, gl_lods
and base_trait_lods
The programs lm_linkage
, lm_bayes
, gl_lods
and lm_twoqtl
are
referred to as ‘Lodscore’ programs.
The program lm_linkage
replaces
the two programs lm_markers
and lm_multiple
of pre-2011
versions of MORGAN. As of 2013, the program lm_twoqtl
remains a beta test version, but the new program gl_lods
is now
fully implemented and documented.
The Lodscore programs use MCMC to perform multipoint linkage analysis and trait mapping on large pedigrees where many individuals may be unobserved and exact computation is infeasible. The data are the genotypes of observed individuals in the pedigree at marker loci and discrete or continuous trait data. As with exact methods of computing lod scores, the genetic model is assumed known. The only unknown parameter is the location of the trait locus. Therefore, the user is required to specify the marker locations, trait and marker allele frequencies and penetrance function. Presently, users are limited in their choice of penetrance function, but this is under revision and will change in future releases of MORGAN.
lm_linkage
is an implementation of the Lange-Sobel
estimator, using either the single- or multiple-meiosis LM-sampler:
See Single and multiple meiosis LM-samplers.
The
Lange-Sobel estimate works reasonably well in reasonable time, provided a good
MCMC sampler is used, and provided the trait data do not have strong impact on
the conditional distribution of meiosis indicators.
The lm_linkage
program samples only the meiosis indicators at
marker loci, and only conditional on the marker data.
Even when the trait inheritance information is strong,
the method can produce quite accurate lod scores in the absence of linkage,
but it can be inaccurate in estimating the strength of linkage signals.
As well as
producing the lod score, our current
implementation provides a batch-means pointwise
estimate of the Monte Carlo standard error of the lod-score estimate.
lm_linkage
can work with genotypic, discrete or quantitative traits.
lm_linkage
combines the earlier programs lm_markers
and
lm_multiple
.
The original lm_multiple
program and multiple-meiosis
sampler are the work
of Liping Tong [TT08].
As well as allowing use of either the single- or multiple-meiosis
LM-sampler, the lm_linkage
program
optionally perform exact lodscore computations on small pedigree components,
and
includes better exact computation and pedigree peeling options for use
in the lod score estimator (see Exact HMM computations).
lm_bayes
is an alternative method implemented for genotypic or discrete
traits. The MCMC performance is better than for the old
lm_markers
program, but it has
other computational overheads. lm_bayes
samples trait locations from a
posterior distribution, and then divides it by the prior to produce the
likelihood and hence the lod score. Estimation is in two phases. A preliminary
run with discrete uniform prior gives order-of-magnitude relative likelihoods.
Then, using the inverse of these likelihoods as prior weights of a
‘pseudo-prior’ distribution. Using this ‘pseudo-prior’
a second run is made to estimate the likelihood.
The purpose of the ‘pseudo-prior’ is to produce an
approximately uniform posterior, so that likelihoods will be well estimated
at all test positions.
It is important that the initial run is long enough for all test positions
to be sampled, and for the unlinked trait position to have a reasonable number
of realizations. For locations at which lod scores are very negative, or for the
unlinked position when there is some linked
location with strong positive lod score,
this can be problematic.
Our current implementation of lm_bayes
provides two lod score estimates.
The first is a crude estimate which counts realizations of locations sampled to
estimate the posterior: as can be seen from the output this can be quite
erratic. The Rao-Blackwellized estimator is much preferred, and produces good
estimates in reasonable time. The lm_bayes
program is the work of
Andrew George [GT03,GWT05].
The beta-test program lm_twoqtl
does parametric linkage analysis for a quantitative trait model having
one or two linked QTL and a polygenic component. Each QTL has two alleles
with three
different genotypic means.
The Normally distributed polygenic component does not include dominance,
and the environmental contribution is has a Normal distribution
with mean zero and uncorrelated among individuals.
The program output consists of
MCMC-based lod score estimates
of the joint locations of the one or two contributing QTL.
As of 2011, the program uses the same MCMC options as lm_linkage
for sampling descent at marker loci conditional on marker data. Conditionally
on these realizations the program then uses exact computation (on very small
pedigrees) or an additional level of Monte Carlo to estimate the relevant
lod score contributions. The original versions of the lm_twoqtl
were the work on YunJu Sung [STW07,SW07].
The program gl_lods
computes lod score contributions for
a discrete or
continuous trait given a set of ibd_graphs across the chromosome, produced
by gl_auto
:
See Introduction to lm_auto gl_auto and lm_pval.
If the gl_auto
run uses the
‘set MCMC markers only’
option, then the overall lod score computed by gl_lods
is identical to
that produces by lm_linkage
when the same MCMC options are used in the
in gl_auto
and in lm_linkage
.
gl_lods
many of
the same trait definition and mapping request
parameter statements as lm_linkage
(Location lod scores statements). However its input consists
of ibd_graphs and an individuals file; there are no marker data
or pedigree data.
For further information on the motivation for splitting of the
lm_linkage
lod score
computation into the generation of marker-based ibd graphs (using
gl_auto
) followed by trait-likelihood computation on the ibd graphs:
See Parameter files for the gl_lods program. See also [Tho11].
Because the gl_lods
program
computes log-likelihood contributions for given
IBD, and does not use a pedigree, there is no way to compute the usual
normalizing factor of the pedigree-based probability of observed trait data.
This must be supplied to the program, and may be computed via the
base_trait_lods
program, provided a pedigree and the same trait model
and data are used. An alternative approach using permutation of trait
data conditional on the locus-specific IBD was developed by Chris Glazner
[GT15] and has been implemented as an option in gl_lods
.
For quantitative data using variance
component models, we are investigating the used of the Kullback Leibler
information (expected lod score) conditional on the locus-specific IBD as
a locus-specific normalizing factor.
See References, for details of the cited papers.
See Concept Index for:
Markov chain Monte Carlo,
lm_linkage
introduction,
lm_bayes
introduction,
lm_twoqtl
introduction,
meiosis indicators,
multiple meiosis sampler.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_linkage
and lm_bayes
There are three example parameter files in the ‘Lodscores’ subdirectory:
‘ped73_ge.par’, ‘ped73_ph.par’ and ‘ped73_qu.par’. These files
are examples of how to analyze genotypic, discrete (phenotypic), and
quantitative (continuous) traits,
respectively. Each of these files is written for use with lm_linkage
since this
is our preferred program and can analyze genotypic, discrete, and
quantitative traits. The program lm_bayes
will run with the same
parameter files ‘ped73_ge.par’ and ‘ped73_ph.par’, but will adopt
defaults for several statements specific to this program and will generate
warning for others not implemented for lm_bayes
. If lm_bayes
is run using ‘ped73_qu.par’, all statements regarding quantitative
traits will be ignored, and the program will use default genotypic
data.
The marker and MCMC information is very similar for all three parameter files. For ‘ped73_qu.par’ it is as follows:
set printlevel 5 input pedigree file '../ped73.ped' input marker data file '../ped73.marker.missing' input seed file '../sampler.seed' output overwrite seed file '../sampler.seed' set trait 1 data quantitative input pedigree record trait 1 real 1 select all markers select trait 1 set trait 1 tloc 1 map test tloc 1 all interval proportions 0.3 0.7 map test tloc 1 external recomb fracts 0.05 0.15 0.3 0.4 0.45 sample by scan set L-sampler probability 0.2 set burn-in iterations 150 set MC iterations 3000 check progress MC iterations 1000 set global MCMC use single meiosis sampler |
The pedigree file specified by the ‘input pedigree file’ statement can contain multiple traits. As discussed in previous sections, the marker map, allele frequencies and genotypes can be contained in the parameter file or in a separate file specified by the ‘input marker data file’ statement as in the example above.
As in other programs, the trait data are included in the pedigree file. The ‘select trait’ statement tells the program which trait in this file is to be analyzed, and the ‘input pedigree record trait’ indicates where the data are to be found, while the ‘set trait ...tloc...’ statement connects the trait with a specific tloc for this analysis.
The two ‘map test tloc’ statements give trait locus test positions at which the lod scores should be calculated. When the trait locus is located between two markers, the position is specified in terms of the proportional genetic distance between the two markers (this option makes handling gender-specific maps easy). In this example, the test trait positions are specified to be at 30 and 70 percent of the interval. The second ‘map test tloc’ statement allows test trait locus positions located before the first marker or after the last marker to be specified; the positions are specified explicitly in terms of recombination fractions (or genetic distances) with the nearest marker locus. Note that an external recombination fraction of 0.5 is not necessary since the likelihood of an unlinked trait locus is always used as a reference when computing the lod scores.
The final seven statements give MCMC specifications. The ‘sample by scan’ statement instructs the program to update all the meiosis indicators, S, at each iteration, in an order determined by random permutation. The alternative ‘sample by step’ updates only one locus (L-sampler) or only one meiosis (M-sampler) in each iteration. The ‘sample by scan’ statement is the default and strongly recommended. The L-sampler probability is set at 20 percent, which is often a good choice. For a detailed discussion of effects of varying L- to M-sampler ratio, see section 10.6 in [Tho00].
In the ‘set burn-in iterations’ statement, 150 burn-in iterations, are requested. The next statement requests 3000 MCMC iterations; for each realized set of marker-location inheritance vectors the trait-likelihood contribution will be computed at each test position of the trait locus. This is for demonstration purposes only. For real data analyses, use longer runs, on the order of 10^5 MCMC iterations. The last statement in this group tells the program to report progress every 1000 iterations.
Although the lm_linkage
program
can use the multiple-meiosis sampler, and this is recommended, the final
two statements here specify ‘set global MCMC’ and
‘use single meiosis sampler’.
Thus, for this example, the
single-meiosis sampler will be used (as in the old lm_markers
program)
and MCMC will be performed globally over all pedigree components,
rather than component-by-component. This provides an example of how
these options may be used for compatibility with older examples.
As an example of MORGAN parameter statement flexibility we give also a part of a version of the above parameter file in which lines are broken, words mis-spelled, and lower/upper case used arbitrarily::
# This version of ped73_qu.par is designed to show the flexibility # of MORGAN parameter statements set prinlev 5 inPU Pedi file '../ped73.ped' input mark data FILE '../ped73.marker.missing' input seed file '../sampler.seed' outp over seed file '../sampler.seed' set trait data 1 quant Input pedigree Recod trait 1 real 1 sele traix 1 set trait 1 tloc 1 map test tloc 1 all intevl props 0.3 0.7 map test tloc 1 exte reco frac 0.05 0.15 0.3 0.4 0.45 seLect all markers samp BY scan set L-sam prob 0.2 set 3000 mc ITER set burn-in iter 150 check progss 1000 MC iters set glob mcmc USE singe meio samp |
Note that statements may be split over lines, that only the first four characters of each word is required, that upper and lower case are not distinguished, and that in some statements (for example, the ‘check progress’ and ‘set MC iterations’ statements) even the location of the integer count within the statement is flexible.
For more details of the MCMC specifications see MCMC parameter statements.
Specifying Trait Data Type
Trait data type is set by using the ‘set trait data’ statement. Recall that the ‘input pedigree record trait’ statement must be used to specify which column in the file is to be used as the trait value (see Pedigree file description statements). The three trait data types discussed in this example are implemented by including the following statements in the parameter file discussed above. Note the trait and numbers are arbitrary, but the connection must be made consistently through the file.
‘ped73_ge.par’ specifies a genotypic trait with the following statements:
set trait 3 data genotypic input pedigree record trait 3 integer 3 select trait 3 set trait 3 tloc 1 set tloc 1 allele freqs 0.4 0.6 |
‘ped73_ph.par’ specifies a phenotypic trait with the following statements:
set trait 2 data discrete input pedigree record trait 2 integer 4 select trait 2 set traits 2 tlocs 1 set traits 2 for tlocs 1 incomplete penetrance 0.05 0.6 0.95 set tlocs 1 allele freqs 0.4 0.6 |
Recall that for discrete data, one must specify the penetrances (see Autozyg computational parameters).
‘ped73_qu.par’ specifies a quantitative trait with the following statements:
set trait 1 data quantitative input pedigree record trait 1 real 1 select trait 1 set trait 1 tloc 1 set trait 1 for tlocs 1 genotype mean 90.0 100.0 110.0 set trait 1 residual variance 25.0 set tloc 1 allele freqs 0.4 0.6 |
When using a quantitative trait, genotypic means and residual variance must be specified. Additive variance can be specified with the statement ‘set trait ... additive variance’. The default value is zero.
The ‘set tloc ... allele freqs’ statement specifies allele frequencies at the trait locus. If the allele frequencies sum to less than 1, a warning message will be issued:
Sum of allele frequencies is not in range .9999, 1.0001 (W) |
If the allele frequencies sum to above 1.0001, the program quits and generates an error message.
Below is a summary of the trait data types accepted for each program:
Genotypic ped73_ge.par | Phenotypic ped73_ph.par | Quantitative ped73_qu.par | |
lm_linkage | Yes | Yes | Yes |
lm_bayes | Yes | Yes | No |
Additional penetrance options are available for liability classes and age-of-onset data.
An example is given in the file ‘xact_ph_liab.par’ in the ‘Lodscore’ Gold standards. The statement gives the liability class specific penetrance matrix. The penetrances are for the 11, 12, 22 genotypes. Morgan will set the penetrances for genotype 21 equal to that for 12.
set trait data 1 discrete with liability input pedigree record trait 1 integer pairs 8 11 set 3 liability classes with penetrances 0.025 0.325 0.325 0.150 0.625 0.625 0.350 0.950 0.950 |
An example is given in the file ‘xact_ph_age.par’ in the ‘Lodscore’ Gold standards:
input pedigree record trait 1 integer pairs 8 10 # For using ages of onset. set trait 1 data discrete with covariate set trait 1 for tloc 1 genotypic means 90.0 70.0 45.5 set standard deviations for genotypes 11.0 15.5 20.5 set width of ages of onset window 2.0 |
See Concept Index for:
sample parameter file for lm_linkage
,
sample parameter file for lm_bayes
,
gender–specific maps,
meiosis indicators,
L-sampler,
M-sampler,
multiple meiosis sampler,
genotypic trait specification,
phenotypic trait specification,
discrete trait specification,
quantitative trait specification,
continuous trait specification,
liability class penetrances,
age-of-onset penetrances.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_linkage
examples and sample outputlm_linkage
can be run with all three parameter files in the
‘Lodscores/’ subdirectory. As usual, the syntax for running the
program is:
./lm_linkage <parameter file> |
This section describes the output obtained by using the parameter file ‘ped73_qu.par’. To run the example, type:
./lm_linkage ped73_qu.par |
The interesting part of the output is the LodScore estimates. For each test position, we have the estimated lod score and the estimated Monte Carlo standard error.
LodScore estimates by Rao-Blackwellized computation: Trait pos # position (Haldane cM) or marker male female LodScore StdErr 1 -115.129 -115.129 0.0303 0.0005 2 -80.472 -80.472 0.0558 0.0012 3 -45.815 -45.815 0.0779 0.0031 4 -17.834 -17.834 -0.0306 0.0080 5 -5.268 -5.268 -0.2811 0.0142 marker-1 0.000 0.000 -0.4986 0.0195 6 3.000 3.000 -0.4469 0.0141 7 7.000 7.000 -0.4342 0.0230 marker-2 10.000 10.000 -0.4605 0.0363 8 13.000 13.000 -0.4254 0.0247 9 17.000 17.000 -0.4454 0.0209 marker-3 20.000 20.000 -0.5301 0.0197 10 23.000 23.000 -0.3174 0.0211 11 27.000 27.000 -0.1176 0.0233 marker-4 30.000 30.000 -0.0052 0.0259 12 33.000 33.000 0.5058 0.0208 13 37.000 37.000 0.8794 0.0159 marker-5 40.000 40.000 1.0772 0.0138 14 43.000 43.000 0.9832 0.0156 15 47.000 47.000 0.8432 0.0213 marker-6 50.000 50.000 0.7210 0.0252 16 53.000 53.000 0.6558 0.0256 17 57.000 57.000 0.5140 0.0271 marker-7 60.000 60.000 0.3522 0.0288 18 63.000 63.000 0.0113 0.0225 19 67.000 67.000 -0.5473 0.0123 marker-8 70.000 70.000 -0.9543 0.0095 20 73.000 73.000 -0.4578 0.0212 21 77.000 77.000 -0.1866 0.0178 marker-9 80.000 80.000 -0.1135 0.0116 22 83.000 83.000 0.0888 0.0091 23 87.000 87.000 0.3132 0.0064 marker-10 90.000 90.000 0.4544 0.0071 24 95.268 95.268 0.6010 0.0046 25 107.834 107.834 0.6423 0.0028 26 135.815 135.815 0.4017 0.0011 27 170.472 170.472 0.1762 0.0003 28 205.129 205.129 0.0758 0.0001 |
For more information regarding the MCMC parameters and diagnostic output, See MCMC computational options.
See Concept Index for:
running lm_linkage
examples,
lm_linkage
sample output.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_bayes
examples and sample outputUnder the subdirectory ‘Lodscores/’, run the lm_bayes
example on
the discrete (phenotypic) trait data by typing:
./lm_bayes ped73_ph.par |
The results from lm_bayes
are the lod scores toward the end of the
output. Two estimates of the lod scores are provided: (1) count
realizations of locations sampled to estimate the posterior probability
(‘crude’)
and (2) Rao-Blackwellized estimator (‘R-B’).
Both are provided for comparison,
but the latter should be more accurate.
LodScore estimates: Trait pos # position (Haldane cM) pseudo freq LodScore or marker male female prior visited crude R-B 0 unlinked unlinked 0.025023 94 NA NA 1 -115.129 -115.129 0.025276 66 -0.1580 -0.0046 2 -80.472 -80.472 0.025727 77 -0.0987 -0.0125 3 -45.815 -45.815 0.027843 96 -0.0372 -0.0473 4 -17.834 -17.834 0.037973 71 -0.3030 -0.1825 5 -5.268 -5.268 0.057289 96 -0.3506 -0.3583 marker-1 0.000 0.000 NA NA NA NA 6 3.000 3.000 0.078826 89 -0.5221 -0.4919 7 7.000 7.000 0.086379 88 -0.5667 -0.5255 marker-2 10.000 10.000 NA NA NA NA 8 13.000 13.000 0.092502 87 -0.6014 -0.5456 9 17.000 17.000 0.090858 94 -0.5600 -0.5386 marker-3 20.000 20.000 NA NA NA NA 10 23.000 23.000 0.063483 109 -0.3400 -0.3738 11 27.000 27.000 0.044111 103 -0.2065 -0.2086 marker-4 30.000 30.000 NA NA NA NA 12 33.000 33.000 0.026053 114 0.0663 0.0203 13 37.000 37.000 0.018403 103 0.1731 0.1698 marker-5 40.000 40.000 NA NA NA NA 14 43.000 43.000 0.011818 100 0.3527 0.3585 15 47.000 47.000 0.009347 90 0.4088 0.4600 marker-6 50.000 50.000 NA NA NA NA 16 53.000 53.000 0.010351 121 0.4930 0.4236 17 57.000 57.000 0.014614 121 0.3432 0.2804 marker-7 60.000 60.000 NA NA NA NA 18 63.000 63.000 0.023348 96 0.0392 0.0769 19 67.000 67.000 0.030506 123 0.0307 -0.0412 marker-8 70.000 70.000 NA NA NA NA 20 73.000 73.000 0.033357 136 0.0356 -0.0903 21 77.000 77.000 0.030400 124 0.0358 -0.0514 marker-9 80.000 80.000 NA NA NA NA 22 83.000 83.000 0.024811 96 0.0128 0.0282 23 87.000 87.000 0.019535 144 0.2928 0.1160 marker-10 90.000 90.000 NA NA NA NA 24 95.268 95.268 0.013755 110 0.3281 0.2561 25 107.834 107.834 0.013714 125 0.3849 0.2600 26 135.815 135.815 0.018372 132 0.2816 0.1339 27 170.472 170.472 0.022361 108 0.1091 0.0489 28 205.129 205.129 0.023966 87 -0.0149 0.0188 |
Note that lm_bayes
does not provide lod scores at the marker locations.
See Concept Index for:
running lm_bayes
examples,
lm_bayes
sample output,
Rao-Blackwellized estimates.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_twoqtl
examples and sample outputThe program lm_twoqtl
remains beta test, so that instead of examples
in ‘MORGAN_Examples’ we describe the gold standard example in
the main MORGAN source directory in subdirectory
‘Lodscore/Gold’. However, much work has
been done on lm_twoqtl
, so that its marker-based MCMC is now as for
lm_linkage
.
Four gold-standard
examples of the lm_twoqtl
parameter files and output may be
found in the ‘Lodscore/Gold’ directory of
MORGAN.
Examples 1 and 3 are for a single trait locus, and 2 and 4 use two QTL.
Examples 1 and 2 use exact computation for the trait lod score contributions
on these very small examples. Examples 3 and 4 use Monte Carlo.
We will use example 4 in this tutorial description, since this is the most
general and novel.
To create other examples,
copy one of these files and replace the parameters in the
file with those that you want to specify.
The various trait-model options for lm_twoqtl
are summarized
in the following table:
Additive Genetic Variance: | Zero | Positive | ||
Number of QTL: 1 | one locus | one locus plus polygene | ||
2 | two loci | two loci plus polygene |
Trait models can be any of the above four entries. However, for a one-locus
trait model with no polygenic component, the program lm_linkage
will provide more accurate results more quickly.
The lod score is estimated on a one-dimensional grid of points for one QTL, and a two-dimensional grid of points for two QTL. In the future the new parameter statement
map [chromosome I] test tlocs L1 L2 jointly at markers J11 J12 ... |
will allow two-locus lod score programs that provide lod scores at arbitrary pairs of marker positions.
The content of file ‘twoqtl4.par’ (reordered slightly for clarity) is:
use single meiosis sampler # Select the MCMC sampler to be used. set printlevel 5 # Include everything in the output file. set sampler seeds 0x53f78285 0xdfbca001 set trait seeds 0x53f78285 0xdfbca001 input marker data file `./twoqtl.markers' input pedigree file `./twoqtl.ped' output extra file `./twoqtl_batch4' select all markers select trait 1 set trait 1 multiple tlocs 1 2 set tloc 1 allele freqs 0.1 0.9 set tloc 2 allele freqs 0.3 0.7 # requests for grid of tloc positions for lod scores map test tloc 1 all interval proportions 0.5 map test tloc 1 external recomb fracts 0.3 map test tloc 2 all interval proportions 0.5 map test tloc 2 no default external positions # standard MCMC requests use sequential imputation for setup use 100 sequential imputation realizations for setup sample by scan set L-sampler probability 0.2 set burn-in iterations 10 set MC iterations 60 compute scores every 10 iterations # lodscore scoring requests set 3 batches MC variance estimation check progress 20 MC iterations # For clarity, the wording `MC' has been removed from the following three parameter statements: # MC in parameter statements now refers only to MCMC. use Monte Carlo summation for trait simulate 5 IBD realizations for trait use multiplier 1 realization for null # quantitative trait model specification set trait 1 data quantitative set trait 1 for tloc 1 genotype mean -2.0 0.0 2.0 set trait 1 for tloc 2 genotype mean -3.0 0.0 3.0 set trait 1 residual variance 1.0 set trait 1 additive variance 1.0 |
Note the number of MCMC scans (60) is very small, as also are the number
of Monte Carlo realizations to be used in evaluating the trait likelihood
contributions (5). Additionally, only every 10th MCMC scan is used
for computing lod score contributions. This is reasonable, in that for
lm_twoqtl
lod score computation is computationally intensive, so
that the standard procedure of scoring every scan is not efficient.
However, with only 60 total scans, this means lod scores are based on only
6 realizations of inheritance conditional on the marker data.
The example is for illustrative purposes only; in
real examples much more Monte Carlo would be required both in the
marker-based MCMC and for estimating trait contributions to
each score.
Most statements are as for earlier lod-score programs and can be found in the Statement Index. The statements included in this example that require additional comment are
use single meiosis sampler
The old single-meiosis sampler is
specified for consistency with earlier results; in practice the
multiple meiosis sampler
is preferred.
set trait 1 multiple tlocs 1 2
This statement specifies that trait 1 is contributed to by both tloc 1 and tloc 2.
set trait 1 for tloc 1 genotype mean -2.0 0.0 2.0
set trait 1 for tloc 2 genotype mean -3.0 0.0 3.0
The genotypic means for tlocs 1 and are set separately, which will imply their additive contribution to trait 1. If the tlocs are not to contribute additively, the user should instead use the statement
set trait 1 for tlocs 1 2 joint genotype means ...
followed by 9 genotype means for the two tloc genotype combinations.
output extra file ‘./twoqtl_batch4’
If an
‘extra file’ is specified, it
is used by lm_twoqtl
for output of batched
means used in variance estimation. Most users will not require this file,
although it can be used in MCMC diagnostics.
set 3 batches MC variance estimation
In this minimal example, the 6 scored realizations are divided into 3 batches, each of size 2. Again, real examples would use much larger number of realizations, and likely the default number of batches (20).
use Monte Carlo summation for trait
simulate 5 IBD realizations for trait
In this example, Monte Carlo summation is to be used for evaluating each trait-locus likelihood contributions conditional on marker-based realizations of inheritance, and for each such realizations there will be 5 realizations of trait allele descent.
use multiplier 1 realization for null
In this example the same number of realizations will be used to evaluate the marginal probability of trait data as are used for each lodscore grid point. In real examples, it may be advisable to increase this ratio, to obtain an accurate base level for the lodscore estimate.
The default procedure of estimation of lod scores on each of the two components separately is used. These are then summed, giving the final concluding output for this example:
# Lod score estimates and MC sd for entire pedigree: # Index TLoc1 TLoc2 LodScore StdErr 1 -45.815 -45.815 1.4058 0.7605 2 -45.815 0.000 0.3872 0.6212 3 -45.815 10.000 0.4379 0.6232 4 -45.815 20.000 0.6616 0.6856 5 -45.815 65.815 0.4661 0.5916 6 0.000 -45.815 0.3503 0.6529 7 0.000 0.000 0.1343 0.6034 8 0.000 10.000 -0.0129 0.5600 9 0.000 20.000 1.0364 0.5772 10 0.000 65.815 1.1077 0.7724 11 10.000 -45.815 0.0983 0.6902 12 10.000 0.000 0.0461 0.5605 13 10.000 10.000 0.8585 0.6088 14 10.000 20.000 1.2174 0.5866 15 10.000 65.815 0.3512 0.7539 16 20.000 -45.815 1.5180 0.6479 17 20.000 0.000 0.7387 0.6567 18 20.000 10.000 0.9330 0.7020 19 20.000 20.000 1.2473 0.6162 20 20.000 65.815 1.3465 0.7664 21 65.815 -45.815 -0.3066 0.7764 22 65.815 0.000 -0.1259 0.6895 23 65.815 10.000 0.5714 0.7124 24 65.815 20.000 1.3537 0.6550 25 65.815 65.815 0.5343 0.7140 |
These results consist of base-10 lodscore estimates with MCMC standard deviations, estimated at the requested grid of test positions.
See Concept Index for:
running lm_twoqtl
examples,
lm_twoqtl
sample output,
map test tlocs jointly at markers.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
gl_lods
programSee References for details of the cited papers.
The program gl_lods
computes lod score contributions for
a discrete or
continuous trait given a set of ibd_graphs across the chromosome, produced
by gl_auto
:
See Introduction to lm_auto gl_auto and lm_pval.
If the gl_auto
run uses the
‘set MCMC markers only’
option, then the overall lod score computed by gl_lods
is identical to
that produced by lm_linkage
when the same MCMC options are used in the
in gl_auto
and in lm_linkage
.
gl_lods
many of
the same trait definition and mapping request
parameter statements as lm_linkage
(Location lod scores statements). However its input consists
of ibd_graphs and an individuals file; there are no marker data
or pedigree data.
The goal of using gl_auto
and gl_lods
is to separate the
lod score computation from the marker-based MCMC that produces realizations
of the inheritance vectors at loci across the chromosome. The input to
gl_lods
consists of these realizations,
in the format of gl_auto
output
compressed ibd_graphs, a specification of a trait model, and a list
of individuals with their trait data.
gl_lods
computes lod score contributions
for a specified trait model and data on specified individuals, directly
using the ibd graphs on these individuals. See also [Tho11].
From MORGAN 3.2.
the gl_lods
program is no longer beta-test; parameter statements,
capabilities, and examples have been significantly updated.
The separation of lod-score computation and marker-based MCMC has several advantages:
Examples for gl_lods
are still under development, but we describe
here two examples based on the gl_lods
gold standards in the ‘Lodscore/Gold’ subdirectory of
MORGAN. In the case of the first ped47_...
example
the files are only in that directory and can be run there as
‘make gold.6’. The other example simCmps_fgl_...
is both
included as a gold standard, and has also been added to the
‘MORGAN_Examples’ files.
ped47
example is a single pedigree component.
The following shows the
parameter file ‘ped47_gl_lods.par’.
The trait model specifications for the two cases of a discrete and
a continuous trait are given in supplementary parameter files below.
# ped47 main parameter file for gl_lods program. set printlevel 5 # Include everything in the output file. input individuals file "./ped47.ind" input extra file "./ped47_fgl.oscor" output overwrite extra file "./ped47_fgl.reduce" # Use the overwrite option to avoid appending multiple outputs. # The number of ibdgraphs provides the number of replicates in gl_auto file. set maximum 1000 ibdgraphs per component # This map test tloc statement indicates at which markers lodscores are to be # calculated. map test tloc 11 at markers 2 4 5 8 10 select trait 1 set trait 1 tloc 11 set tloc 11 allele freqs 0.5 0.5 |
Note that the program uses an individuals file, not a pedigree file. The format is similar, but mother and father identifiers are replaced by a component number (in this case ‘1’ for all individuals). See Creating an individuals file. Note that the component number is referred to as a "name", so that the columns of integers and reals in the file correspond to those in an analogous pedigree file. The individuals file ‘ped47.ind’ is as follows:
# Lines or parts of lines preceded by a hash sign, such as this one, # are treated as comments by the software. # This individuals file contains a subset of the 47 individuals in the gl_auto # output file; the subset must include all those whose trait data are # to be analyzed -- it may include additional individuals. input pedigree size 42 # pedigree size must be the number of individuals in the fgl-graphs of the # gl_auto output file. # The number read here may be only a subset; if so make sure file ends # with the last individual. input individuals record names 2 integers 7 reals 1 # The first item is individual's "name" which is a character string. # The second of these is individual's component. # # There follow 7 integers # The first of these (third item) is dummy gender (0) # The next is an "observed" indicator used only by genedrop. # The next is a trait genotype: # 0 is unobserved. 1 is genotype (1,1); 3 is (1,2), 4 is (2,2) # The next is a trait phenotype: # 0 is unobserved. 1 is unaffected, 2 is affected # The next 2 code trait-locus inheritance patterns, used by 1 version of # markerdrop. # The final is a dummy trait indicating data for reduced fgl file # (but it is not required/used by the gl_lods program) # # Finally there is one real (read as double); this is a quantitative trait. # Numbers with integer part 999 code for unobserved. # # In fact, many individuals with no data are dropped; # This individuals file has only 35 individuals. # Since 35<42 (the requested pedigree size), all will be read). *************************************************** 302 1 1 1 3 2 1 1 1 105.945 306 1 2 1 3 1 1 1 1 99.822 307 1 0 1 4 2 1 1 1 111.696 308 1 0 0 0 0 1 1 0 999.5 <lines omitted here> 5080 1 0 0 0 0 -1 -1 0 999.5 601 1 0 1 4 2 1 1 1 112.285 5160 1 0 1 3 1 -1 -1 0 97.043 5150 1 0 1 3 1 -1 -1 1 97.043 602 1 0 1 4 2 0 1 1 105.991 |
Note that the input/output extra file is used to process the file of IBD
graphs that were produced (normally by gl_auto
) in the analysis
of marker data. In this example the file is a
gl_auto
output scores file,
input here as an extra
file:
‘ped47_fgl.oscor’.
This file contains 100 ibd graphs on 47 individuals:
101, 102, 201, 202, 2010, 301, 302, 304, 2020, 305, 306, 307, 308, 3010, 404, 3040, 405, 406, 407, 408, 3050, 410, 411, 412, 3080, 414, 415, 416, 4040, 505, 506, 507, 508, 4050, 509, 510, 511, 512, 4080, 513, 514, 515, 516, 5080, 601, 5150, 602. |
One reason for including this example is to show how the program deals with discrepancies between the list of individuals in the individuals file and those in ibd graphs file. Comparing with the 35-member individuals file:
There will remain 27 individuals who will be in the reduced ibd graph file
created by gl_lods
.
Finally we require a trait-model specification; lod scores are computed
under this model. Note that the input pedigree record
statement
is used for the individuals files just as it would be for a pedigree
file. The example file ‘ped47_D.par’ provides
a provides the model for a discrete trait:
output overwrite scores file "./ped47_gl_D.scores" set trait 1 data discrete input pedigree record trait 1 integer 4 set trait 1 for tloc 11 incomplete penetrances 0.1 0.6 0.9 |
The file also specifies the output scores file ’‘./ped47_gl_D.scores’’: the log-likelihood contributions from each IBD graph will be saved to this file.
Alternatively, the example file ‘ped46_Q.par’ provides the models for a quantitative trait:
output overwrite scores file "./ped47_gl_Q.scores" set trait 1 data quantitative input pedigree record trait 1 real 1 set trait 1 for tloc 11 genotype means 90.0 100.0 110.0 set trait 1 residual variance 25.0 set number of permutations 5 |
This example differs from the previous one, in that a permutation-based scheme is used to provide a locus-specific base lod score conditional on the IBD at that locus: see [GT15].
gl_lods
will also run on data sets containing multiple
components. Provided the collection of ibd graphs are generated
by component, the gl_lods
program will compute a lod score
for each component, as does lm_linkage
. This example is based on a
pedigree file and data originally used as an lm_linkage
gold standard.
It consists of three almost identical pedigree components with very
similar data. The date in ‘SimCmps.ind’ used as a gold standard in the
MORGAN ‘Lodscore/Gold’ directory differs slightly from that
in the same filename under ‘MORGAN_Examples/Lodscores’, and so also
therefore do the output lod scores, but functionally the two are identical.
The parameter file for
gl_lods
is as follows:
set printlevel 5 input individuals file "./simCmps.ind" # files for processing IBD graphs input extra file "./simCmps_fgl.oscor" output overwrite extra file "./simCmps_fgl.reduce" # File for outputting the log-likelihood contributions output overwrite scores file "./simCmps_gl.scores" # The number of ibdgraphs provides the number of replicates in the gl_auto # output scores file. set maximum 100 ibdgraphs per component # This map test tloc statement indicates at which markers lodscores are to be # calculated. map test tloc 11 at markers 1 3 5 6 9 10 select trait 1 set trait 1 data quantitative input pedigree record trait 1 reals 1 set trait 1 tloc 11 set tloc 11 allele freqs 0.3 0.7 set trait 1 for tloc 11 genotype means 485.0 500.0 515.0 set trait 1 residual variance 10.0 set trait 1 additive variance 10.0 # Provide externally computed base log likelihoods for each component. set base log likelihoods -24.56078 -26.30071 -24.85028 |
The parameter statements are mostly standard specifications familiar
from lm_linkage
. There are two main differences:
(1) A file of ibd graphs and an individuals file are specified rather
than pedigree file and marker data, and (2) in order for gl_lods
to
convert its log-likelihood contributions to an estimated lod score a
"base log likelihood" must be provided for each component. This should
be the probability of the trait data under the model; this probability
forms the denominator of the lod score. It may be computed using the
base_trait_lods
program: See Running the base_trait_lods program.
Similarly to the previous example, the simCmps.ind
file contains
the (potential) trait
data on 156 individuals (3 sets of 52);
in fact only a total of 55 are observed for
the quantitative trait.
The file simCmp_fgl.oscor
contains
100 ibd graphs on the 159 individuals of the original lm_linkage
simCmps example; this includes the 156 individuals of the current example.
Note that the ordering of the
IBD graphs is by component– first 100 graphs for the first component
simL_..
, then 100 for simJ_..
, and finally 100 for
simK_..
.
See Concept Index for: ibd graph, base log likelihood, individuals file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
gl_lods
examples and sample outputped47
example, which exists only
in the gold standards files. The gl_lods
gold standards may
be run as make gold.6
in ‘MORGAN/Lodscore/Gold’ or the
output ped47_gl_lods_[D/Q].gold
files may be consulted.
We describe here the output file ‘ped47_gl_lods_D.gold’.
The program first summarizes the input information in the usual way:
Trait data are discrete Discrete data at trait locus are to be read from integer 4 in each individual's record Incomplete penetrances for the trait are: genotype code penetrance -------- ---- ---------- (1 1) 0 0.100 (1 2), (2 1) 1 0.600 (2 2) 2 0.900 Number of IBD graphs requested is 1000 Number of batches for MC variance estimation is 20 === Checking of parameter statements completed === |
Note that the batches for MC variance is the default as no value
was provided in the parameter file: this statement is not used
by gl_lods
.
Next the information from the individuals file is checked, analogous to pedigree file checking. Since there is no pedigree, only any input genders will be scored. Also note that we specified (up to) 42 individuals, but there are only 35 in the file.
Opened input individuals file "./ped47.ind" 35 individuals read 42 individuals expected (W) Component information: size females males unsexed first 35 1 1 33 302 === Checking of individuals file completed === |
Then follows the usual MORGAN summary of the phenotypes of individuals:
Opened input individuals file "./ped47.ind" Pedigree records contain: names, 7 integers, 1 real number Trait data: Component 1: phenotype 2: 302 307 406 407 408 411 414 416 505 507 508 511 512 513 514 516 601 602 phenotype 1: 306 404 410 412 415 506 509 510 5160 5150 |
Finally it reports on the processing of the ibd graphs, including the result of the IBDgraph determination of equivalence classes. It also find the max number of FGL it will need: 26 in this example. It also provides ‘NLoci’; one more than the number of locations for which log-likelihoods will be computed.
Number of individuals in IBD graphs file is 47 1000 graphs were requested, but only 100 were given (W) Observed individual 5160 is not in the input IBD graphs file: The trait data on 5160 will be ignored (W) Number of individuals in nghd structure is 35 Have alloced gen_pen nFGL=26, NLoci=6 Opened output scores file "./ped47_gl_D.scores" ====== Computing LodScore for component 1 ====== Number of observed individuals in each IBD graph for component 1 is 27 Grouped 1292 locations into 988 equivalence classes Log likelihoods for 100 ibd graphs at markers 2 4 5 8 10 will be printed to the output scores file |
Then the program produces the log-likelihood contributions at each of the 5 marker locations for each of the 100 reduced ibd graphs. Note the final warning:
gl_lods is not able to provide lod scores unless the user provides a base log likelihood using the "set base log likelihood X1 ..." parameter statement (W) |
In order to produce lod scores, rather than only log-likelihoods a base
value is needed – corresponding to the marginal probability of trait data
in the normal lod score computation. Since gl_lods
does not have
pedigree information, it is unable to compute this value, and it must be
supplied via a parameter statement. It can be computed by running
base_trait_lods
.
Note also that the log-likelihood contributions
are base-e. The base log-likelihood to be input, and the final lod scores
produced, are base-10.
For the quantitative trait data and model, the gold-standard output file is ‘ped47_gl_lods_Q.gold’. The format of this file is identical to that for the discrete trait, except that now the data and trait model are for a quantitative trait. Additionally this example uses a permutation-based approach to produce the analogue of a lod score; see [GT15]. The gold standard uses only 5 permutations; in practice many more would be used.
Permutation lod scores will be computed Version 1 of permutation lods Component 1 lod scores found by gl_lods: Marker number LodScore StdErr marker-2 2.0274 marker-4 2.7382 marker-5 3.8031 marker-8 1.5405 marker-10 2.4196 Version 2 of permutation lods Component 1 lod scores found by gl_lods: Marker number LodScore StdErr marker-2 1.3984 marker-4 1.9594 marker-5 3.3204 marker-8 1.2037 marker-10 2.0212 |
This permutation approach is still in process of testing: two alternative estimates are presented. Note that although this approach produces a “LodScore” the interpretation is not as for a classical lod score; see [GT15]
gold.6
gold standard,
but may also be run in the
Lodscore subdirectory of ‘MORGAN_Examples’ using the command
(Note that the ‘MORGAN_Examples’ version is from MORGAN version
3.3: it is slightly modified in the gold standards for version 3.3.2)
./gl_lods simCmps_gl_lods.par > simCmps_gl.out |
The results in the output file ‘simCmps_gl.out’ are very similar to those for the previous example. The section summarizing the ibd graphs is
Opened input extra file "./simCmps_fgl.oscor" Number of individuals in IBD graphs file is 159 Number of individuals in nghd structure is 156 Have alloced gen_pen nFGL=96, NLoci=7 Opened output scores file "./simCmps_gl.scores" |
Then for each component the log-likelihoods are computed, but this time base log-likelihoods were provided in the parameter file. The log-likelihoods are therefore converted to estimated lod scores at the 6 requested marker positions. For component 1:
====== Computing LodScore for component 1 ====== Number of observed individuals in each IBD graph for component 1 is 19 Grouped 1260 locations into 544 equivalence classes Lod scores for 100 ibd graphs at markers: ....... ====== Computing LodScore for component 1 ====== Number of observed individuals in each IBD graph for component 1 is 18 Grouped 1260 locations into 536 equivalence classes Log likelihoods for 100 ibd graphs at markers 1 3 5 6 9 10 will be printed to the output scores file Component 1 lod scores found by gl_lods: Marker number LodScore StdErr marker-1 -6.9215 marker-3 -7.2357 marker-5 3.1253 marker-6 3.4153 marker-9 -4.8130 marker-10 -4.7849 |
This is then repeated for components 2 and 3.
There is
no Monte Carlo within this non-permutation version of gl_lods
.
However,
even if the
three pedigree components were identical,
there is Monte Carlo variation in the
generation of the ibd graphs by the gl_auto
program.
See Concept Index for: base log likelihood.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
base_trait_lods
programThe
base_trait_lods
program provides the
probability of trait data on a pedigree. (Previously, this information had
to be extracted from a dummy run of the lm_linkage
program.) The output
base log-likelihoods may be input to gl_lods
, enabling it to compute
a classical normalized lod score;
See Parameter files for the gl_lods program.
Typically the computation will be an exact single-locus
computation by pedigree peeling. However, in the event the pedigree is too
large or complex for exact computation a Monte Carlo alternative is provided.
An example of each is provided in the ‘Lodscore/Gold’ directory, and
they may be run as make gold.7
.
The input for exact computation required any a pedigree, trait data, and a specification of the trait model
input pedigree file "./xact.ped" output scores file "./bt_lods_pp.scores" input pedigree record trait 1 integer 8 set trait 1 data discrete select trait 1 set trait 1 tloc 15 set tloc 15 allele freqs 0.5 0.5 set trait 1 for tloc 15 incomplete penetrances 0.05 0.6 0.95 |
The output is self-explanatory: a base log likelihood is computed for each pedigree component. In this example there are two components, and the base log-likelihoods are also printed to the output scores file:
1 202 -3.326957 2 408 -1.563501 |
Note the name of an index individual is also given, so that components can be correctly identified. The base log-likelihoods can now be read directly from this file into the g_lods program.
Descent is simulated on the pedigree, and the realized IBD used to compute
a Monte Carlo estimate of the base log-likelihood using the gen_pen
routine. To avoid problems with underflow on large pedigrees,
some preliminary “pseudo-prior”
iterations are used to provide a crude base estimate.
This is then used in each term, and factored out at the end of the
computation. The only difference in the parameter file is that the
specified sampler seeds are used, and there are two Monte Carlo requests:
set sampler seeds 0x53f78285 0xdfbca001 # Replace by seed file for real runs. # input seed file "sampler.seed" # output overwrite seed file "sampler.seed" # Monte Carlo setup and requests set MC realizations 3000 set pseudo_prior iterations 150 |
The output is again self-explanatory. Note that, in addition to a Monte Carlo standard error, the 5th and 95th percentiles of the log-likelihood contributions are provided. once the distribution of contributions is often highly skewed, the quantiles often provide a better measure of precision than does the standard error.
====== Base trait LodScore for component 1 estimated using Monte Carlo ====== Now starting 150 preliminary realizations Crude log-likelihood from preliminary realizations = -8.047451e+00 Beginning 3000 realizations of Monte Carlo Monte Carlo realizations completed Number of Monte Carlo Monte Carlo 5th % 95th % realizations lod score StdErr quantile quantile 3000 -3.3298 0.0075 -4.0353 -2.8788 |
The estimated base log-likelihoods are printed to the output scores file as before. Note that in this very small example, there is almost no difference between the exact base log-likelihoods and the Monte Carlo estimates. However, in general this will not be so; where exact computation is feasible it is to be preferred.
1 202 -3.329781 2 408 -1.565421 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
New statements for these programs include maps for test positions, and parameters for some additional MCMC algorithms.
See Concept Index for:
location lod scores statements,
gl_lods
statements,
lm_linkage
statements,
lm_bayes
statements.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
See Concept Index for: location lod scores computing requests.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
All Lodscore programs use the general MORGAN file identification statements (see File identification statements) and the Autozyg rescue file statements (see Autozyg file identification statements).
One additional statement is optional for lm_bayes
:
output Rao-Blackwellized estimates file
If this file is specified, the set of Rao-Blackwellized lod score estimates at each trait position is written at the frequency specified in the ‘compute scores’ statement.
The same standard out file specifications are available for lm_twoqtl
.
In particular:
output extra file
This statement is used by lm_twoqtl
to output batch mean estimates
used in computing the estimated Monte Carlo standard error.
See Concept Index for: location lod scores file identification statements, Rao-Blackwellized estimates.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Most Lodscore programs use the general MORGAN
pedigree file description
statements (see Pedigree file description statements).
However the gl_lods
program uses an individuals file
rather than a pedigree file (see Individuals file).
The following statement used by gl_lods
is the individuals-file analogue of
the corresponding pedigree file statement:
input individuals record names 2 [integers I] [reals J]
The two "names" consist of the name and component number of the individual.
In addition, there is a
statement optional for lm_linkage
and gl_lods
:
input pedigree record traits K1 K2 … reals X1 X2 …
This statement is analogous to ‘input pedigree record traits K1 K2 … integers I1 I2 …’ (see Pedigree file description statements) when the trait is quantitative, rather than discrete. It allows the file locations of multiple traits to be defined, although only one can be selected for any analysis.
See Concept Index for: location lod scores pedigree file description, quantitative trait.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
All Lodscore programs use the Autozyg output file description statements; See Autozyg output file description.
See Concept Index for: location lod scores output file description.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following statements describe the hypothesized trait locus (tloc) positions which are to be ‘tested’. That is, these are the positions at which lod scores will be computed.
map [chromosome I] [gender (M | F)] test tloc L1 all interval proportions X1 X2 …
Interval proportions specify the proportional genetic distance between markers for the trial positions for the test trait locus. The same ratios are used between each marker pair, regardless of the inter-genetic distance (in cM).
map [chromosome I] [gender (M | F)] test tloc L1 intervals J1 … proportions X1 X2 …
This statement specifies interval proportions, but between specific pairs of markers. Interval 1 is between markers 1 and 2, interval 2 is between markers 2 and 3, etc.
map [chromosome I] [gender (M | F)] test tloc L1 (beginning | ending | external) ([Kosambi] distances | recombination fractions) X1 X2 …
This statement specifies trial trait positions on the chromosome before the first marker and/or after the last marker.
map test tlocs L1 ... no default [interval proportions| external positions]
This pair of statements is used to eliminate computation of lod scores at default interval and/or external positions on the active chromosome.
map [chromosome I] test tloc L at markers J1 ...
This statement (new with MORGAN 3.0) is increasingly used with denser SNP marker data. If used, lod scores will be computed only at the positions of the specific markers. Note the marker indexing is by the count in the marker data file, not by selected marker.
map [chromosome I] test tlocs L1 L2 jointly at markers J11 J12 ...
This statement (not yet implemented) will allow two-locus lod score programs
such as lm_twoqtl
to compute lod scores only at any specified
combination of marker positions rather than, as currently, on a grid.
See Concept Index for: location lod scores mapping model parameters, trait test positions, map function.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
See Concept Index for: location lod scores population model parameters.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The following additional statements are specific to lod score computations:
set maximum I ibdgraphs per component
This statement is used by gl_lods
which needs
to know the upper limit of the number of ibd graphs per pedigree
component provided as input.
set base log likelihood X1 X2 ...
This statement is used by the program gl_lods
. The log of the
marginal probability of the traits data on each pedigree component
must be given to enable gl_lods
to normalize its lod scores.
These may be computed using the
base_trait_lods
program: See Running the base_trait_lods program.
set number of permutations N
This statement may be used by the program gl_lods
. At each location
at which a log-likelihood is computed given the realized IBD, trait-data
permutation provides a normalizing null term conditional on that same IBD
structure. This permutation approach is still being studied; see [GT15].
simulate N ibd realizations
This statement may be used by the program base_trait_lods
, which
provides a probability of trait data on a defined pedigree, under a specified
trait model. If the pedigree is too complex for single-locus exact
pedigree peeling, a Monte Carlo estimate may be obtained by realizing IBD
on the pedigree. This statement provides the number of realizations to be
used.
set pseudo-priors X1 X2 ...
This statement is optional for lm_bayes
. The number of pseudo-priors is
the number of test trait locus positions plus one. The first pseudo-prior is
for the unlinked position; this should be assigned a positive value. All other
pseudo-priors must be positive or zero. The set of pseudo-priors need not be
normalized.
This statement is also optional for base_trait_lods
.
If the base_trait_lods
base log-likelihood is to be estimated by
Monte Carlo then this statement gives the number of realizations on which
a crude estimate is made. This is then used in the main iterations to
lessen the chance of over/under-flow
set I batches MC variance estimation
This statement is optional for lm_linkage
, lm_twoqtl
and gl_lods
.
These programs batch scored realizations in order to provide a
Monte Carlo estimate of the standard deviation in estimating the lod
score. This statement determines the batch size, and hence the number
of batches. By default it is determined such that there are 20 batches.
The following additional statements are specific to tloc specification
and likelihood computation for the program lm_twoqtl
.
set traits K1 ... multiple tlocs L1 ...
This statement is used by the lm_twoqtl
program to specify the
tlocs L1... that contribute to each trait K1...
A statement may be provided for each separate trait.
However, the lm_twoqtl
program expects selection of one trait
with either one or two contributing tlocs.
set trait K1 for tlocs L1 L2 joint genotype means X11 X12 X13 X21 X22 X23 X31 X32 X33
This statement specifies the 9 genotypic means (3x3 matrix) for tlocs L1 and L2 in contributing to trait K1. The first index on X refers to the L1 genotype and the second to L2.
use [exact|MC] summation for trait
This statement specifies whether exact or Monte Carlo (MC) will be used
by lm_twoqtl
for computation of the trait contribution to the
lod score. Exact summation can be used only on pedigrees with six or fewer
founders.
simulate I IBD realizations for trait
If Monte Carlo summation is to be used, this statement specifies the number of realizations of tloc inheritance realizations to be used. If Monte Carlo summation is not to be used, this statement is ignored.
use multiplier I realizations for null
This statement specifies the number of time as many realizations are to be used in estimating the base-line unlinked lod-score. To obtain accurate lod-score estimates it is important this value is accurate, and it may therefore be advisable to use more realizations.
See Concept Index for: location lod scores computational parameters, pseudo-prior iterations.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Please see that section for details regarding:
use (locus-by-locus sampling | sequential imputation) for setup use I sequential imputation realizations for setup set MC iterations I set burn-in iterations I sample by (scan | step) set L-sampler probability X check progress I MC iterations |
As with the Autozyg programs, the number of desired MC iterations must be specified, as there is no default value.
set MC iterations I
This statement sets the total number of
‘main’ L- and M-sampler iterations. For
lm_linkage
, the total MCMC run length is the sum of the number of burn-in
iterations and main iterations. For lm_bayes
, the total MCMC run length
is the sum of the number of burn-in, pseudo-prior (see below) and main
iterations.
Additional statements for lm_bayes
include the following:
set pseudo-prior iterations I
Following burn-in, lm_bayes
performs iterations to calculate the
pseudo-priors. These pseudo-priors are used to encourage the MCMC sampler to
visit test positions of low posterior probability. The default number of
iterations to compute pseudo-priors is 50% of the number of main iterations
specified in the ‘set MC iterations’ statement.
set sequential imputation proposals every I iterations
This option applies to lm_bayes
’s pseudo-prior and main MCMC iterations.
It allows the MCMC chain to “restart” every Ith iteration. Sequential
imputation is used to propose potential restart configurations which are
accepted/rejected with Metropolis-Hastings probability.
set test position window I
This lm_bayes
statement specifies the window size for the
proposed tloc position update in the
Metropolis-Hastings algorithm. I is the number of hypothesized trait
positions on either side of the current position, with equal weight given to the
2*I + 1 trait positions. The default is window size is 6.
See Concept Index for: location lod scores MCMC parameters and options, MC iterations, burn-in, sequential imputation proposals.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.