See Concept Index for: location lod scores estimates.
lm_linkage, lm_bayes, lm_twoqtl, and gl_lods
The programs lm_linkage, lm_bayes, and lm_twoqtl are referred to as `Lodscore' programs. The program lm_linkage replaces the two programs lm_markers and lm_multiple of pre-2011 versions of MORGAN. As of 2011, the program lm_twoqtl remains a beta test version.
The Lodscore programs use MCMC to perform multipoint linkage analysis and trait mapping on large pedigrees where many individuals may be unobserved and exact computation is infeasible. The data are the genotypes of observed individuals in the pedigree at marker loci and discrete or continuous trait data. As with exact methods of computing lod scores, the genetic model is assumed known. The only unknown parameter is the location of the trait locus. Therefore, the user is required to specify the marker locations, trait and marker allele frequencies and penetrance function. Presently, users are limited in their choice of penetrance function, but this is under revision and will change in future releases of MORGAN.
lm_linkage is an implementation of the Lange-Sobel estimator, using either the single- or multiple-meiosis LM-sampler: See Single and multiple meiosis LM-samplers. The Lange-Sobel estimate works reasonably well in reasonable time, provided a good MCMC sampler is used, and provided the trait data do not have strong impact on the conditional distribution of meiosis indicators. The lm_linkage program samples only the meiosis indicators at marker loci, and only conditional on the marker data. Even when the trait inheritance information is strong, the method can produce quite accurate lod scores in the absence of linkage, but it can be inaccurate in estimating the strength of linkage signals. As well as producing the lod score, our current implementation provides a batch-means pointwise estimate of the Monte Carlo standard error of the lod-score estimate.
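As an illustration of the idea of a Monte Carlo average with a batch-means standard error, here is a minimal Python sketch. It assumes that, for a single test position, each scored MCMC realization has already been reduced to a likelihood ratio (trait likelihood at the test position given the sampled meiosis indicators, divided by the likelihood at the unlinked position). The function name, the input format, and the delta-method step that carries the standard error through log10 are assumptions made for this sketch; they are not taken from the MORGAN source.

```python
import math
from statistics import mean, stdev


def lod_with_batch_means_se(ratios, n_batches):
    """Return (lod, standard error) from per-realization likelihood ratios.

    `ratios` is a hypothetical list of positive likelihood ratios, one per
    scored MCMC realization, in the order in which they were generated.
    """
    batch_size = len(ratios) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches of at least 1 realization")
    used = ratios[: batch_size * n_batches]  # drop any leftover tail
    overall = mean(used)
    if overall <= 0:
        raise ValueError("likelihood ratios must have a positive mean")
    # Means of consecutive batches: keeping neighbouring scans together is
    # what lets the estimate absorb correlation between successive scans.
    batch_means = [
        mean(used[b * batch_size:(b + 1) * batch_size])
        for b in range(n_batches)
    ]
    se_mean = stdev(batch_means) / math.sqrt(n_batches)
    lod = math.log10(overall)
    # Delta method (an assumption of this sketch): d/dx log10(x) = 1/(x ln 10).
    se_lod = se_mean / (overall * math.log(10))
    return lod, se_lod


if __name__ == "__main__":
    print(lod_with_batch_means_se([1.0, 1.0, 3.0, 3.0], n_batches=2))
```

With few batches the standard error is itself poorly estimated, which is one reason long runs are advisable for real analyses.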
lm_linkage can work with genotypic, discrete or quantitative traits.
lm_linkage combines the earlier programs lm_markers and lm_multiple. The original lm_multiple program and multiple-meiosis sampler are the work of Liping Tong [TT08]. As well as allowing use of either the single- or multiple-meiosis LM-sampler, the lm_linkage program optionally performs exact lodscore computations on small pedigree components, and includes better exact computation and pedigree peeling options for use in the lod score estimator (see Exact HMM computations).
lm_bayes is an alternative method implemented for genotypic or discrete traits. The MCMC performance is better than for the old lm_markers program, but it has other computational overheads. lm_bayes samples trait locations from a posterior distribution, and then divides the estimated posterior by the prior to produce the likelihood and hence the lod score. Estimation is in two phases. A preliminary run with a discrete uniform prior gives order-of-magnitude relative likelihoods. The inverses of these likelihoods are then used as the prior weights of a `pseudo-prior' distribution, and a second run using this `pseudo-prior' is made to estimate the likelihood. The purpose of the `pseudo-prior' is to produce an approximately uniform posterior, so that likelihoods will be well estimated at all test positions. It is important that the initial run is long enough for all test positions to be sampled, and for the unlinked trait position to have a reasonable number of realizations. For locations at which lod scores are very negative, or for the unlinked position when there is some linked location with strong positive lod score, this can be problematic.
Our current implementation of lm_bayes provides two lod score estimates. The first is a crude estimate which counts realizations of locations sampled to estimate the posterior: as can be seen from the output this can be quite erratic. The Rao-Blackwellized estimator is much preferred, and produces good estimates in reasonable time. The lm_bayes program is the work of Andrew George [GT03,GWT05].
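The role of the `pseudo-prior' can be seen in a small Python sketch. It assumes the preliminary run has produced one positive relative likelihood per test position; the function names and the toy numbers are hypothetical and serve only to show why inverse-likelihood prior weights flatten the posterior.

```python
def pseudo_prior(relative_likelihoods):
    """Prior weights proportional to 1/likelihood, normalized to sum to 1."""
    if any(lik <= 0 for lik in relative_likelihoods):
        raise ValueError("relative likelihoods must be positive")
    inverse = [1.0 / lik for lik in relative_likelihoods]
    total = sum(inverse)
    return [w / total for w in inverse]


def posterior(prior, likelihoods):
    """Posterior over test positions: prior times likelihood, normalized."""
    joint = [p * lik for p, lik in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    first_run = [1.0, 2.0, 4.0]          # toy relative likelihoods
    prior = pseudo_prior(first_run)
    # If the preliminary likelihoods were exact, every position would be
    # visited equally often in the second run.
    print(posterior(prior, first_run))
```

In practice the preliminary likelihoods are only rough, so the second-run posterior is approximately rather than exactly uniform.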
The beta-test program lm_twoqtl does parametric linkage analysis for a quantitative trait model having one or two linked QTL and a polygenic component. Each QTL is diallelic with 3 different genotypic means. The Normally distributed polygenic component does not include dominance, and the environmental contribution has a Normal distribution with mean zero and is uncorrelated among individuals. The program output consists of MCMC-based lod score estimates of the joint locations of the one or two contributing QTL. As of 2011, the program uses the same MCMC options as lm_linkage for sampling descent at marker loci conditional on marker data. Conditionally on these realizations the program then uses exact computation (on very small pedigrees) or an additional level of Monte Carlo to estimate the relevant lod score contributions. The original versions of lm_twoqtl were the work of YunJu Sung [STW07,SW07].
The beta-test program gl_lods computes lod score contributions for a discrete or continuous trait given a set of ibd_graphs across the chromosome, produced by gl_auto: See Introduction to lm_auto gl_auto and lm_pval. If the gl_auto run uses the `set MCMC markers only' option, then the overall lod score computed by gl_lods is identical to that produced by lm_linkage when the same MCMC options are used in gl_auto and in lm_linkage. gl_lods uses the same parameter statements as lm_linkage (Location lod scores statements), but ignores some input statements and uses others in a non-standard way. For further information on the motivation for splitting the lm_linkage lod score computation into the generation of marker-based ibd graphs (using gl_auto) followed by trait-likelihood computation on the ibd graphs: See Parameter files for the gl_lods program. See also [Tho11].
See References, for details of the cited papers.
See Concept Index for: Markov chain Monte Carlo, lm_linkage introduction, lm_bayes introduction, lm_twoqtl introduction, meiosis indicators, multiple meiosis sampler.
lm_linkage and lm_bayes
There are three example parameter files in the `Lodscores' subdirectory: `ped73_ge.par', `ped73_ph.par' and `ped73_qu.par'. These files are examples of how to analyze genotypic, discrete (phenotypic), and quantitative (continuous) traits, respectively. Each of these files is written for use with lm_linkage since this is our preferred program and can analyze genotypic, discrete, and quantitative traits. The program lm_bayes will run with the same parameter files `ped73_ge.par' and `ped73_ph.par', but will adopt defaults for several statements specific to this program and will generate warnings for others not implemented for lm_bayes. If lm_bayes is run using `ped73_qu.par', all statements regarding quantitative traits will be ignored, and the program will use default genotypic data.
The marker and MCMC information is very similar for all three parameter files. For `ped73_qu.par' it is as follows:
    set printlevel 5

    input pedigree file '../ped73.ped'
    input marker data file '../ped73.marker.missing'
    input seed file '../sampler.seed'
    output overwrite seed file '../sampler.seed'

    set trait 1 data quantitative
    input pedigree record trait 1 real 1

    select all markers
    select trait 1
    set trait 1 tloc 1
    map test tloc 1 all interval proportions 0.3 0.7
    map test tloc 1 external recomb fracts 0.05 0.15 0.3 0.4 0.45

    sample by scan
    set L-sampler probability 0.2
    set burn-in iterations 150
    set MC iterations 3000
    check progress MC iterations 1000
    set global MCMC
    use single meiosis sampler
The pedigree file specified by the `input pedigree file' statement can contain multiple traits. As discussed in previous sections, the marker map, allele frequencies and genotypes can be contained in the parameter file or in a separate file specified by the `input marker data file' statement as in the example above.
As in other programs, the trait data are included in the pedigree file. The `select trait' statement tells the program which trait in this file is to be analyzed, and the `input pedigree record trait' indicates where the data are to be found, while the `set trait ...tloc...' statement connects the trait with a specific tloc for this analysis.
The two `map test tloc' statements give trait locus test positions at which the lod scores should be calculated. When the trait locus is located between two markers, the position is specified in terms of the proportional genetic distance between the two markers (this option makes handling gender-specific maps easy). In this example, the test trait positions are specified to be at 30 and 70 percent of each interval. The second `map test tloc' statement allows test trait locus positions located before the first marker or after the last marker to be specified; these positions are specified explicitly in terms of recombination fractions (or genetic distances) with the nearest marker locus. Note that an external recombination fraction of 0.5 is not necessary since the likelihood of an unlinked trait locus is always used as a reference when computing the lod scores.
The final seven statements give MCMC specifications. The `sample by scan' statement instructs the program to update all the meiosis indicators, S, at each iteration, in an order determined by random permutation. The alternative `sample by step' updates only one locus (L-sampler) or only one meiosis (M-sampler) in each iteration. The `sample by scan' statement is the default and strongly recommended. The L-sampler probability is set at 20 percent, which is often a good choice. For a detailed discussion of effects of varying L- to M-sampler ratio, see section 10.6 in [Tho00].
In the `set burn-in iterations' statement, 150 burn-in iterations are requested. The next statement requests 3000 MCMC iterations; for each realized set of marker-location inheritance vectors the trait-likelihood contribution will be computed at each test position of the trait locus. This is for demonstration purposes only. For real data analyses, use longer runs, on the order of 10^5 MCMC iterations. The last statement in this group tells the program to report progress every 1000 iterations.
Although the lm_linkage program can use the multiple-meiosis sampler, and this is recommended, the final two statements here specify `set global MCMC' and `use single meiosis sampler'. Thus, for this example, the single-meiosis sampler will be used (as in the old lm_markers program) and MCMC will be performed globally over all pedigree components, rather than component-by-component. This provides an example of how these options may be used for compatibility with older examples.
For more details of the MCMC specifications see MCMC parameter statements.
Specifying Trait Data Type
Trait data type is set by using the `set trait data' statement. Recall that the `input pedigree record trait' statement must be used to specify which column in the file is to be used as the trait value (see Pedigree file description statements). The three trait data types discussed in this example are implemented by including the following statements in the parameter file discussed above. Note that the trait and tloc numbers are arbitrary, but the connection between them must be made consistently throughout the file.
`ped73_ge.par' specifies a genotypic trait with the following statements:
    set trait 3 data genotypic
    input pedigree record trait 3 integer 3
    select trait 3
    set trait 3 tloc 1
    set tloc 1 allele freqs 0.4 0.6
`ped73_ph.par' specifies a phenotypic trait with the following statements:
    set trait 2 data discrete
    input pedigree record trait 2 integer 4
    select trait 2
    set traits 2 tlocs 1
    set traits 2 for tlocs 1 incomplete penetrance 0.05 0.6 0.95
    set tlocs 1 allele freqs 0.4 0.6
Recall that for discrete data, one must specify the penetrances (see Autozyg computational parameters).
`ped73_qu.par' specifies a quantitative trait with the following statements:
    set trait 1 data quantitative
    input pedigree record trait 1 real 1
    select trait 1
    set trait 1 tloc 1
    set trait 1 for tlocs 1 genotype mean 90.0 100.0 110.0
    set trait 1 residual variance 25.0
    set tloc 1 allele freqs 0.4 0.6
When using a quantitative trait, genotypic means and residual variance must be specified. Additive variance can be specified with the statement `set trait ... additive variance'; its default value is zero. The `set tloc ... allele freqs' statement specifies allele frequencies at the trait locus. If the allele frequencies sum to less than 0.9999, a warning message will be issued:
    Sum of allele frequencies is not in range .9999, 1.0001 (W)
If the allele frequencies sum to above 1.0001, the program quits and generates an error message.
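Before running a long analysis it can be convenient to check allele frequencies ahead of time. The following Python sketch mirrors the rule described above (warning below the range, error above it); the function name and the use of Python's `warnings` module are choices made for this illustration, and the bounds are the ones quoted in the warning message, not values read from the MORGAN source.

```python
import warnings


def check_allele_freqs(freqs, low=0.9999, high=1.0001):
    """Return the sum of freqs; warn if it is below low, fail if above high."""
    total = sum(freqs)
    if total > high:
        raise ValueError(
            f"allele frequencies sum to {total:.6f}, above {high}"
        )
    if total < low:
        warnings.warn(
            f"Sum of allele frequencies is not in range {low}, {high}"
        )
    return total


if __name__ == "__main__":
    print(check_allele_freqs([0.4, 0.6]))   # frequencies used in the examples
```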
Below is a summary of the trait data types accepted for each program:
Program | Genotypic (ped73_ge.par) | Phenotypic (ped73_ph.par) | Quantitative (ped73_qu.par) |
lm_linkage | Yes | Yes | Yes |
lm_bayes | Yes | Yes | No |
See Concept Index for: sample parameter file for lm_linkage, sample parameter file for lm_bayes, gender-specific maps, meiosis indicators, L-sampler, M-sampler, multiple meiosis sampler, genotypic trait specification for lod score calculation, phenotypic trait specification for lod score calculation, discrete trait specification for lod score calculation, quantitative trait specification for lod score calculation, continuous trait specification for lod score calculation.
lm_linkage examples and sample output
lm_linkage can be run with all three parameter files in the `Lodscores/' subdirectory. As usual, the syntax for running the program is:
    ./lm_linkage <parameter file>
This section describes the output obtained by using the parameter file `ped73_qu.par'. To run the example, type:
    ./lm_linkage ped73_qu.par
The interesting part of the output is the LodScore estimates. For each test position, we have the estimated lod score and the estimated Monte Carlo standard error.
    LodScore estimates by Rao-Blackwellized computation:

      Trait pos # position (Haldane cM)
        or marker       male     female   LodScore   StdErr
                1   -115.129   -115.129     0.0303   0.0005
                2    -80.472    -80.472     0.0558   0.0012
                3    -45.815    -45.815     0.0779   0.0031
                4    -17.834    -17.834    -0.0306   0.0080
                5     -5.268     -5.268    -0.2811   0.0142
         marker-1      0.000      0.000    -0.4986   0.0195
                6      3.000      3.000    -0.4469   0.0141
                7      7.000      7.000    -0.4342   0.0230
         marker-2     10.000     10.000    -0.4605   0.0363
                8     13.000     13.000    -0.4254   0.0247
                9     17.000     17.000    -0.4454   0.0209
         marker-3     20.000     20.000    -0.5301   0.0197
               10     23.000     23.000    -0.3174   0.0211
               11     27.000     27.000    -0.1176   0.0233
         marker-4     30.000     30.000    -0.0052   0.0259
               12     33.000     33.000     0.5058   0.0208
               13     37.000     37.000     0.8794   0.0159
         marker-5     40.000     40.000     1.0772   0.0138
               14     43.000     43.000     0.9832   0.0156
               15     47.000     47.000     0.8432   0.0213
         marker-6     50.000     50.000     0.7210   0.0252
               16     53.000     53.000     0.6558   0.0256
               17     57.000     57.000     0.5140   0.0271
         marker-7     60.000     60.000     0.3522   0.0288
               18     63.000     63.000     0.0113   0.0225
               19     67.000     67.000    -0.5473   0.0123
         marker-8     70.000     70.000    -0.9543   0.0095
               20     73.000     73.000    -0.4578   0.0212
               21     77.000     77.000    -0.1866   0.0178
         marker-9     80.000     80.000    -0.1135   0.0116
               22     83.000     83.000     0.0888   0.0091
               23     87.000     87.000     0.3132   0.0064
        marker-10     90.000     90.000     0.4544   0.0071
               24     95.268     95.268     0.6010   0.0046
               25    107.834    107.834     0.6423   0.0028
               26    135.815    135.815     0.4017   0.0011
               27    170.472    170.472     0.1762   0.0003
               28    205.129    205.129     0.0758   0.0001
For more information regarding the MCMC parameters and diagnostic output, see MCMC computational options.
See Concept Index for: running lm_linkage examples, lm_linkage sample output.
lm_bayes examples and sample output
Under the subdirectory `Lodscores/', run the lm_bayes example on the discrete (phenotypic) trait data by typing:
    ./lm_bayes ped73_ph.par
The results from lm_bayes are the lod scores toward the end of the output. Two estimates of the lod scores are provided: (1) an estimate that counts realizations of locations sampled to estimate the posterior probability (`crude') and (2) the Rao-Blackwellized estimator (`R-B'). Both are provided for comparison, but the latter should be more accurate.
    LodScore estimates:

      Trait pos # position (Haldane cM)     pseudo     freq       LodScore
        or marker       male     female      prior  visited     crude       R-B
                0   unlinked   unlinked   0.025023       94        NA        NA
                1   -115.129   -115.129   0.025276       66   -0.1580   -0.0046
                2    -80.472    -80.472   0.025727       77   -0.0987   -0.0125
                3    -45.815    -45.815   0.027843       96   -0.0372   -0.0473
                4    -17.834    -17.834   0.037973       71   -0.3030   -0.1825
                5     -5.268     -5.268   0.057289       96   -0.3506   -0.3583
         marker-1      0.000      0.000         NA       NA        NA        NA
                6      3.000      3.000   0.078826       89   -0.5221   -0.4919
                7      7.000      7.000   0.086379       88   -0.5667   -0.5255
         marker-2     10.000     10.000         NA       NA        NA        NA
                8     13.000     13.000   0.092502       87   -0.6014   -0.5456
                9     17.000     17.000   0.090858       94   -0.5600   -0.5386
         marker-3     20.000     20.000         NA       NA        NA        NA
               10     23.000     23.000   0.063483      109   -0.3400   -0.3738
               11     27.000     27.000   0.044111      103   -0.2065   -0.2086
         marker-4     30.000     30.000         NA       NA        NA        NA
               12     33.000     33.000   0.026053      114    0.0663    0.0203
               13     37.000     37.000   0.018403      103    0.1731    0.1698
         marker-5     40.000     40.000         NA       NA        NA        NA
               14     43.000     43.000   0.011818      100    0.3527    0.3585
               15     47.000     47.000   0.009347       90    0.4088    0.4600
         marker-6     50.000     50.000         NA       NA        NA        NA
               16     53.000     53.000   0.010351      121    0.4930    0.4236
               17     57.000     57.000   0.014614      121    0.3432    0.2804
         marker-7     60.000     60.000         NA       NA        NA        NA
               18     63.000     63.000   0.023348       96    0.0392    0.0769
               19     67.000     67.000   0.030506      123    0.0307   -0.0412
         marker-8     70.000     70.000         NA       NA        NA        NA
               20     73.000     73.000   0.033357      136    0.0356   -0.0903
               21     77.000     77.000   0.030400      124    0.0358   -0.0514
         marker-9     80.000     80.000         NA       NA        NA        NA
               22     83.000     83.000   0.024811       96    0.0128    0.0282
               23     87.000     87.000   0.019535      144    0.2928    0.1160
        marker-10     90.000     90.000         NA       NA        NA        NA
               24     95.268     95.268   0.013755      110    0.3281    0.2561
               25    107.834    107.834   0.013714      125    0.3849    0.2600
               26    135.815    135.815   0.018372      132    0.2816    0.1339
               27    170.472    170.472   0.022361      108    0.1091    0.0489
               28    205.129    205.129   0.023966       87   -0.0149    0.0188
Note that lm_bayes does not provide lod scores at the marker locations.
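The `crude' column can be reproduced from the `pseudo prior' and `freq visited' columns. The Python sketch below is inferred from the numbers in the table rather than taken from the lm_bayes source: it assumes the crude lod score is the base-10 logarithm of (visits / prior) at the test position divided by (visits / prior) at the unlinked position. The function name is hypothetical.

```python
import math


def crude_lod(visits, prior, visits_unlinked, prior_unlinked):
    """Crude lod score from visit counts and pseudo-prior weights.

    Dividing the visit count (proportional to the estimated posterior) by
    the prior gives a quantity proportional to the likelihood; the
    unlinked position supplies the reference likelihood.
    """
    if visits <= 0 or visits_unlinked <= 0:
        raise ValueError("both positions must have been visited")
    ratio = (visits / prior) / (visits_unlinked / prior_unlinked)
    return math.log10(ratio)


if __name__ == "__main__":
    # Unlinked row of the table: prior 0.025023, visited 94 times.
    # Position 1: prior 0.025276, visited 66 times.
    print(round(crude_lod(66, 0.025276, 94, 0.025023), 4))
    # Position 16: prior 0.010351, visited 121 times.
    print(round(crude_lod(121, 0.010351, 94, 0.025023), 4))
```

Because each count is only around 100, a handful of extra visits shifts the crude estimate noticeably, which is consistent with the erratic behaviour of the `crude' column relative to `R-B'.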
See Concept Index for: running lm_bayes examples, lm_bayes sample output, Rao-Blackwellized estimates.
lm_twoqtl examples and sample output
The program lm_twoqtl remains beta test, so that instead of examples in `MORGAN_V30_Examples' we describe the gold standard example in the main MORGAN source directory in subdirectory `Lodscore/Gold'. However, much work has been done on lm_twoqtl, so that its marker-based MCMC is now as for lm_linkage.
Four gold-standard examples of the lm_twoqtl parameter files and output may be found in the `Lodscore/Gold' directory of MORGAN. Examples 1 and 3 are for a single trait locus, and 2 and 4 use two QTL. Examples 1 and 2 use exact computation for the trait lod score contributions on these very small examples. Examples 3 and 4 use Monte Carlo. We will use example 4 in this tutorial description, since this is the most general and novel. To create other examples, copy one of these files and replace the parameters in the file with those that you want to specify.
The various trait-model options for lm_twoqtl are summarized in the following table:
Number of QTL | Additive genetic variance zero | Additive genetic variance positive |
1 | one locus | one locus plus polygene |
2 | two loci | two loci plus polygene |
Trait models can be any of the above four entries. However, for a one-locus trait model with no polygenic component, the program lm_linkage will provide more accurate results more quickly.
The lod score is estimated on a one-dimensional grid of points for one QTL, and a two-dimensional grid of points for two QTL. In the future the new parameter statement
    map [chromosome I] test tlocs L1 L2 jointly at markers J11 J12 ...
The content of file `twoqtl4.par' (reordered slightly for clarity) is:
    use single meiosis sampler     # Select the MCMC sampler to be used.
    set printlevel 5               # Include everything in the output file.

    set sampler seeds 0x53f78285 0xdfbca001
    set trait seeds 0x53f78285 0xdfbca001

    input marker data file `./twoqtl.markers'
    input pedigree file `./twoqtl.ped'
    output extra file `./twoqtl_batch4'

    select all markers
    select trait 1
    set trait 1 multiple tlocs 1 2
    set tloc 1 allele freqs 0.1 0.9
    set tloc 2 allele freqs 0.3 0.7

    # requests for grid of tloc positions for lod scores
    map test tloc 1 all interval proportions 0.5
    map test tloc 1 external recomb fracts 0.3
    map test tloc 2 all interval proportions 0.5
    map test tloc 2 no default external positions

    # standard MCMC requests
    use sequential imputation for setup
    use 100 sequential imputation realizations for setup
    sample by scan
    set L-sampler probability 0.2
    set burn-in iterations 10
    set MC iterations 60
    compute scores every 10 iterations

    # lodscore scoring rquests
    set 3 batches MC variance estimation
    check progress 20 MC iterations
    use MC summation for trait
    use 5 MC realizations for trait
    use multiplier 1 MC realization for null

    # quantitative trait model specification
    set trait 1 data quantitative
    set trait 1 for tloc 1 genotype mean -2.0 0.0 2.0
    set trait 1 for tloc 2 genotype mean -3.0 0.0 3.0
    set trait 1 residual variance 1.0
    set trait 1 additive variance 1.0
Note that the number of MCMC scans (60) is very small, as is the number of Monte Carlo realizations used in evaluating the trait likelihood contributions (5). Additionally, only every 10th MCMC scan is used for computing lod score contributions. This is reasonable, in that for lm_twoqtl lod score computation is computationally intensive, so that the standard procedure of scoring every scan is not efficient. However, with only 60 total scans, this means lod scores are based on only 6 realizations of inheritance conditional on the marker data. The example is for illustrative purposes only; in real examples much more Monte Carlo would be required both in the marker-based MCMC and for estimating trait contributions to each score.
Most statements are as for earlier lod-score programs and can be found in the Statement Index. The statements included in this example that require additional comment are
    use single meiosis sampler
        The multiple meiosis sampler is preferred.
    set trait 1 multiple tlocs 1 2
    set trait 1 for tloc 1 genotype mean -2.0 0.0 2.0
    set trait 1 for tloc 2 genotype mean -3.0 0.0 3.0
    set trait 1 for tlocs 1 2 joint genotype means ...
        followed by 9 genotype means for the two tloc genotype combinations.
    output extra file `./twoqtl_batch4'
        File used by lm_twoqtl for output of batched means used in variance estimation. Most users will not require this file, although it can be used in MCMC diagnostics.
    set 3 batches MC variance estimation
    use MC summation for trait
    use 5 MC realizations for trait
    use multiplier 1 MC realization for null
The default procedure of estimation of lod scores on each of the two components separately is used. These are then summed, giving the final concluding output for this example:
    # Lod score estimates and MC sd for entire pedigree:
    # Index     TLoc1     TLoc2  LodScore   StdErr
          1   -45.815   -45.815    1.4058   0.7605
          2   -45.815     0.000    0.3872   0.6212
          3   -45.815    10.000    0.4379   0.6232
          4   -45.815    20.000    0.6616   0.6856
          5   -45.815    65.815    0.4661   0.5916
          6     0.000   -45.815    0.3503   0.6529
          7     0.000     0.000    0.1343   0.6034
          8     0.000    10.000   -0.0129   0.5600
          9     0.000    20.000    1.0364   0.5772
         10     0.000    65.815    1.1077   0.7724
         11    10.000   -45.815    0.0983   0.6902
         12    10.000     0.000    0.0461   0.5605
         13    10.000    10.000    0.8585   0.6088
         14    10.000    20.000    1.2174   0.5866
         15    10.000    65.815    0.3512   0.7539
         16    20.000   -45.815    1.5180   0.6479
         17    20.000     0.000    0.7387   0.6567
         18    20.000    10.000    0.9330   0.7020
         19    20.000    20.000    1.2473   0.6162
         20    20.000    65.815    1.3465   0.7664
         21    65.815   -45.815   -0.3066   0.7764
         22    65.815     0.000   -0.1259   0.6895
         23    65.815    10.000    0.5714   0.7124
         24    65.815    20.000    1.3537   0.6550
         25    65.815    65.815    0.5343   0.7140
These results consist of base-10 lodscore estimates with MCMC standard deviations, estimated at the requested grid of test positions.
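When reading a two-dimensional grid like this, it helps to look at the largest estimate together with the size of the standard errors. The Python sketch below parses the rows shown above (copied into a string) and reports both; the parsing function and variable names are hypothetical and assume five whitespace-separated fields per data row.

```python
ROWS = """\
1 -45.815 -45.815 1.4058 0.7605
2 -45.815 0.000 0.3872 0.6212
3 -45.815 10.000 0.4379 0.6232
4 -45.815 20.000 0.6616 0.6856
5 -45.815 65.815 0.4661 0.5916
6 0.000 -45.815 0.3503 0.6529
7 0.000 0.000 0.1343 0.6034
8 0.000 10.000 -0.0129 0.5600
9 0.000 20.000 1.0364 0.5772
10 0.000 65.815 1.1077 0.7724
11 10.000 -45.815 0.0983 0.6902
12 10.000 0.000 0.0461 0.5605
13 10.000 10.000 0.8585 0.6088
14 10.000 20.000 1.2174 0.5866
15 10.000 65.815 0.3512 0.7539
16 20.000 -45.815 1.5180 0.6479
17 20.000 0.000 0.7387 0.6567
18 20.000 10.000 0.9330 0.7020
19 20.000 20.000 1.2473 0.6162
20 20.000 65.815 1.3465 0.7664
21 65.815 -45.815 -0.3066 0.7764
22 65.815 0.000 -0.1259 0.6895
23 65.815 10.000 0.5714 0.7124
24 65.815 20.000 1.3537 0.6550
25 65.815 65.815 0.5343 0.7140
"""


def parse_grid(text):
    """Parse rows of: index, TLoc1, TLoc2, LodScore, StdErr."""
    grid = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        tloc1, tloc2, lod, se = map(float, fields[1:])
        grid.append((int(fields[0]), tloc1, tloc2, lod, se))
    return grid


grid = parse_grid(ROWS)
best = max(grid, key=lambda row: row[3])       # row with the largest lod
se_values = [row[4] for row in grid]
print(best)
print(min(se_values), max(se_values))
```

The largest estimate (1.5180, at index 16) comes with a standard error of about 0.65, and every standard error on the grid lies between 0.56 and 0.78, so the ranking of grid points in this tiny run is well within Monte Carlo noise.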
See Concept Index for: running lm_twoqtl examples, lm_twoqtl sample output, map test tlocs jointly at markers.
gl_lods program
The beta-test program gl_lods computes lod score contributions for a discrete or continuous trait given a set of ibd_graphs across the chromosome, produced by gl_auto: See Introduction to lm_auto gl_auto and lm_pval. If the gl_auto run uses the `set MCMC markers only' option, then the overall lod score computed by gl_lods is identical to that produced by lm_linkage when the same MCMC options are used in gl_auto and in lm_linkage. gl_lods uses the same parameter statements as lm_linkage (Location lod scores statements), but ignores some input statements and uses others in a non-standard way.
Basically the goal of using gl_auto and gl_lods is to separate the lod score computation from the marker-based MCMC that produces realizations of the inheritance vectors at loci across the chromosome. The input to gl_lods consists of these realizations, in the format of the compressed ibd_graphs output by gl_auto, a specification of a trait model, and a list of individuals with their trait data. The additional information provided in the gl_lods parameter files is to ensure compatibility with other MORGAN programs, specifically lm_linkage. Some lm_linkage parameter statements are used by gl_lods in a non-standard way. Others are dummy statements.
The separation of lod-score computation and marker-based MCMC has several advantages:
gl_lods
PROGAM)
gl_lods uses no marker data or pedigree data, using instead the already generated ibd graph output of gl_auto. While a `pedigree file' is included to satisfy MORGAN requirements, and to provide the trait data, the pedigree structure information is not used by gl_lods and can be completely `dummy'; an example is given below.
As for other beta-test programs, we describe here the details of the gold standard example in the `Lodscore/Gold' subdirectory of MORGAN. The files mentioned below are in that directory and the gold standard may be run from that directory as `make gold.6'.
The following shows the general section of the parameter file `ped47_gl_lods.par':
    set printlevel 5         # Include everything in the output file.

    # The MORGAN pedigree file provides the individual trait data but is
    # otherwise "dummy" Likewise a few dummy marker statements are required.
    input pedigree file 'ped47_dummy.ped'
    input marker data file 'ped47_dummy.markers'
    input pedigree size 42   # (at least) the number in the dummy pedigree file
                             # In fact, there are 35 individuals in the pedigree file

    # The extra input file contains the ibd graphs output by gl_auto.
    # The number of individuals in these ibd graphs is determined by the program.
    # The output scores file is a reduced version containing only individuals
    # who are found in the pedigree file, and who have data for the trait.
    input extra file 'ped47_fgl.oscor'
    output scores file 'ped47_fgl.reduce'

    # Number of MC iterations provides the number of replicates in gl_auto file.
    # If larger than the number in the file, a warning is given.
    # If less than the number in the file, only this number will be used.
    set MC iterations 1000

    # The "select markers" statement will determine the markers at which
    # lod scores are computed -- the "map test tloc" statment below is DUMMY.
    select markers 2 4 5 8 10

    # The following are required DUMMY statements.
    select trait 1
    set trait 1 tloc 11
    set tloc 11 allele freqs 0.5 0.5
    use single meiosis sampler      # DUMMY statement; there is no MCMC
    map test tloc 11 at marker 1    # DUMMY statement; lm_linkage requires 1
                                    # map test tloc statement
    set L-sampler probability 0.5   # DUMMY statement; there is no MCMC
The pedigree file is `ped47_dummy.ped'. Note that the pedigree structure given in this file is not used, other than to establish to MORGAN that a single pedigree is involved. In reality, there may be several pedigree components, if ibd graphs on these components were generated jointly in the gl_auto output file.
    # This pedigree file contains a subset of the 47 individuals in the gl_auto
    # output file, and maybe other individuals.
    # The pedigree file must include all those whose trait data are
    # to be included:
    # Individuals not in the gl_auto ibd graphs will be dropped.
    # Individuals not observed for specified trait will be dropped.
    # Individuals in the gl_auto file but not the pedigree file will be
    # assumed unobserved (and dropped).

    input pedigree record names 3 integers 7 reals 1

    # The first three items are "names" which are character strings.
    # They are the unique IDs of each individual and his/her parents.
    # This pedigree file is DUMMY: the first two individuals are
    # designated "dad" (male) and "mom" (female) and all others are
    # specified as offspring of these two.
    # The purpose of this structure is to specify a single pedigree component.
    # If preferred the true original pedigree may be used.
    #
    # There follow 7 integers
    # The first of these (4 th. item) is dummy gender (0,for the"kids")
    # The next is an "observed" indicator: not used by gl_lods.
    # The next is a trait genotype: (not used in these examples)
    # The next is a trait phenotype:
    # 0 is unobserved. 1 is unaffected, 2 is affected
    # The next 2 code trait-locus inheritance patterns,
    # not used by the gl_lods program.
    # The final is a dummy trait indicating data for the examples,
    # but not used by the gl_lods progrsm.
    # Finally there is one real (read as double): this is a quantitative trait.
    # Numbers with integer part 999 code for unobserved.
    #
    # Many individuals with no data are dropped: file has only 34 individuals
    ***************************************************
    302    0   0    1  1  3  2   1   1  1  105.945
    306    0   0    2  1  3  1   1   1  1   99.822
    307  302 306    0  1  4  2   1   1  1  111.696
    308  302 306    0  0  0  0   1   1  0  999.5
    3010 302 306    0  0  0  0  -1  -1  0  999.5
    404  302 306    0  1  3  1   1   0  1   89.535
    406  302 306    0  1  4  2   1   0  1  112.197
    407  302 306    0  1  4  2   1   1  1  111.608
    408  302 306    0  1  4  2   0   0  1  107.467
    3050 302 306    0  0  0  0  -1  -1  0  999.5
    410  302 306    0  1  4  1   1   1  1   92.77
    411  302 306    0  1  3  2   1   1  1  106.814
    412  302 306    0  1  4  1   1   0  1   99.992
    3080 302 306    0  0  0  0  -1  -1  0  999.5
    414  302 306    0  1  3  2   0   1  1  102.505
    415  302 306    0  1  3  1   0   1  1   99.415
    416  302 306    0  1  3  2   1   1  1  100.155
    505  302 306    0  1  4  2   0   0  1  111.798
    506  302 306    0  1  1  1   0   0  1   88.576
    507  302 306    0  1  4  2   0   1  1  105.454
    508  302 306    0  1  4  2   1   1  1  112.171
    4050 302 306    0  0  0  0  -1  -1  0  999.5
    509  302 306    0  1  3  1   0   0  1   99.518
    510  302 306    0  1  3  1   1   1  1   98.543
    511  302 306    0  1  3  2   1   1  1  111.349
    512  302 306    0  1  3  2   1   0  1  100.304
    513  302 306    0  1  4  2   1   0  1  103.615
    514  302 306    0  1  4  2   0   0  1  115.385
    515  302 306    0  0  0  0   0   0  0  999.5
    516  302 306    0  1  4  2   0   0  1  111.138
    5080 302 306    0  0  0  0  -1  -1  0  999.5
    601  302 306    0  1  4  2   1   1  1  112.285
    5160 302 306    0  1  3  1  -1  -1  0   97.043
    5150 302 306    0  1  3  1  -1  -1  1   97.043
    602  302 306    0  1  4  2   0   1  1  105.991
To satisfy MORGAN requirements a few dummy marker statements are also needed. The (dummy) marker data file is `ped47_dummy.markers':
    # THESE ARE TOTALLY DUMMY STATEMENTS
    map marker dist 1 1 1 1 1 1 1 1 1                      # dummy marker map
    set markers 1 2 3 4 5 6 7 8 9 10 allel freqs 0.5 0.5   # dummy marker model
    set markers 10 data
    302 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2            # dummy marker data
Finally we require the gl_auto output scores file, `ped47_fgl.oscor'. This file contains 100 ibd graphs on 47 individuals:
    101, 102, 201, 202, 2010, 301, 302, 304, 2020, 305, 306, 307, 308, 3010,
    404, 3040, 405, 406, 407, 408, 3050, 410, 411, 412, 3080, 414, 415, 416,
    4040, 505, 506, 507, 508, 4050, 509, 510, 511, 512, 4080, 513, 514, 515,
    516, 5080, 601, 5150, 602.
Comparing with the 35-member pedigree file we see:
There will remain 27 individuals who will be in the reduced ibd graph file created by gl_lods.
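The count of 27 can be checked directly from the two lists. The Python sketch below applies the rules stated in the pedigree-file comments (individuals unobserved for the trait are dropped, and observed individuals absent from the ibd graphs are ignored) to the IDs in the ibd-graph file and the discrete phenotype column of `ped47_dummy.ped'. The data structures and function name are hypothetical; this is a bookkeeping check, not the gl_lods implementation.

```python
# IDs present in the gl_auto ibd-graph file (the 47 listed above).
IBD_IDS = {
    101, 102, 201, 202, 2010, 301, 302, 304, 2020, 305, 306, 307, 308, 3010,
    404, 3040, 405, 406, 407, 408, 3050, 410, 411, 412, 3080, 414, 415, 416,
    4040, 505, 506, 507, 508, 4050, 509, 510, 511, 512, 4080, 513, 514, 515,
    516, 5080, 601, 5150, 602,
}

# Discrete phenotype (fourth integer column) of each of the 35 individuals
# in ped47_dummy.ped; 0 codes "unobserved".
PED_PHENOTYPE = {
    302: 2, 306: 1, 307: 2, 308: 0, 3010: 0, 404: 1, 406: 2, 407: 2, 408: 2,
    3050: 0, 410: 1, 411: 2, 412: 1, 3080: 0, 414: 2, 415: 1, 416: 2, 505: 2,
    506: 1, 507: 2, 508: 2, 4050: 0, 509: 1, 510: 1, 511: 2, 512: 2, 513: 2,
    514: 2, 515: 0, 516: 2, 5080: 0, 601: 2, 5160: 1, 5150: 1, 602: 2,
}


def reduce_individuals(ibd_ids, phenotype):
    """Split observed individuals into those kept and those ignored."""
    observed = {i for i, ph in phenotype.items() if ph != 0}
    kept = sorted(observed & ibd_ids)       # observed and in the ibd graphs
    ignored = sorted(observed - ibd_ids)    # observed but not in the graphs
    return kept, ignored


kept, ignored = reduce_individuals(IBD_IDS, PED_PHENOTYPE)
print(len(kept), ignored)   # prints: 27 [5160]
```

Of the 35 individuals in the pedigree file, 7 are unobserved for the discrete trait, and the observed individual 5160 does not appear in the ibd graphs, leaving 27. The quantitative trait column gives the same set, since the same 7 individuals carry the unobserved code 999.5.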
Finally we require a trait-model specification; lod scores are computed under this model. The example file `ped47_D.par' provides the model for a discrete trait:
set trait 1 data discrete
input pedigree record trait 1 integer 4
set trait 1 for tloc 11 incomplete penetrances 0.1 0.6 0.9
Alternatively, the example file `ped47_Q.par' provides the model for a quantitative trait:
set trait 1 data quantitative
input pedigree record trait 1 real 1
set trait 1 for tloc 11 genotype means 90.0 100.0 110.0
set trait 1 residual variance 25.0
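As an illustration of what the quantitative-trait statements specify, the following Python sketch evaluates a normal density for each trait-locus genotype, using the genotype means and residual variance from the file above. The normal form of the penetrance and the function name `genotype_densities` are assumptions made for this example; they are not taken from the MORGAN source.

```python
import math

GENOTYPE_MEANS = (90.0, 100.0, 110.0)  # from the `genotype means' statement
RESIDUAL_VARIANCE = 25.0               # from the `residual variance' statement


def genotype_densities(y, means=GENOTYPE_MEANS, variance=RESIDUAL_VARIANCE):
    """Return the normal density of trait value y under each genotype mean."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return [norm * math.exp(-(y - m) ** 2 / (2.0 * variance)) for m in means]


# Trait value of individual 307 in the example pedigree file.
densities = genotype_densities(111.696)

# Index (0, 1 or 2) of the genotype under which this value is most probable.
best = max(range(3), key=lambda g: densities[g])
print(best)
```

A value near 110 is most probable under the third genotype mean, which is the kind of information the trait model contributes to the lod score.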
gl_lods examples and sample output
There are two gold standards, one for a discrete trait and one for a quantitative trait, with gold-standard output files `ped47_gl_lods_D.gold' and `ped47_gl_lods_Q.gold'. We describe here the output file `ped47_gl_lods_D.gold'.
Much of the early output is standard MORGAN processing of the mainly dummy statements and can be ignored. Apart from a summary of the discrete trait phenotypes, the first output of interest occurs around line 145.
First the program summarizes the input pedigree and ibd-graph files:
5 Selected markers: 2 4 5 8 10
nFGL from dummy pedigree input = 4
Opened input extra file "ped47_fgl.oscor"
Number of individuals in dgl-graph file is 47
Opened input extra file "ped47_fgl.oscor"
Then it processes the lists and notes the extra individual 5160 (see above):
Observed individual 5160 is not in DGL file: Trait data on 5160 will be ignored (W)
Opened input extra file "ped47_fgl.oscor"
Opened output score file "ped47_fgl.reduce"
Then it processes its reduced list of observed individuals, reducing the ibd graphs, which now contain only 27 individuals. It also finds the maximum number of FGL it will need: 26 in this example. It allocates storage for these, and also reports `NLoci', which is one greater than the number of selected markers.
1000 Graphs were requested, but only 100 were given (W)
Number of individuals in each DGL graph = 27
Number of individuals in nghd structure = 35
Reset number of FGL = 26
Reopened reduced DGL graph file ped47_fgl.reduce
have alloced gen_pen nFGL=26, NLoci=6
Then the program produces the log-likelihood contributions at each of the 5 marker locations for each of the 100 reduced ibd graphs.
For the quantitative trait data and model, the gold-standard output file is `ped47_gl_lods_Q.gold'. The format of this file is identical to that for the discrete trait, except that now the data and trait model are for a quantitative trait.
New statements for these programs include maps for test positions, and parameters for some additional MCMC algorithms.
See Concept Index for: location lod scores statements, lm_lods statements, lm_linkage statements, lm_bayes statements.
See Concept Index for: location lod scores computing requests.
All Lodscore programs use the general MORGAN file identification statements (see File identification statements) and the Autozyg rescue file statements (see Autozyg file identification statements).
One additional statement is optional for lm_bayes:
output Rao-Blackwellized estimates file
The same standard output file specifications are available for lm_twoqtl. In particular:
output extra file
This statement allows lm_twoqtl to output the batch mean estimates used in computing the estimated Monte Carlo standard error.
See Concept Index for: location lod scores file identification statements, Rao-Blackwellized estimates.
All Lodscore programs use the general MORGAN pedigree file description statements (see Pedigree file description statements). One additional statement is optional for lm_linkage.
input pedigree record traits K1 K2 ... reals I1 I2 ...
This statement is analogous to `input pedigree record traits K1 K2 ... integers I1 I2 ...' (see Pedigree file description statements) when the trait is quantitative, rather than discrete.
See Concept Index for: location lod scores pedigree file description, quantitative trait.
All Lodscore programs use the Autozyg output file description statements (see Autozyg output file description).
See Concept Index for: location lod scores output file description.
The following statements describe the hypothesized trait locus (tloc) positions which are to be `tested'. That is, these are the positions at which lod scores will be computed.
map [chromosome I] [gender (M | F)] test tloc L1 all interval proportions X1 X2 ...
map [chromosome I] [gender (M | F)] test tloc L1 intervals J1 ... proportions X1 X2 ...
map [chromosome I] [gender (M | F)] test tloc L1 (beginning | ending | external) ([Kosambi] distances | recombination fractions) X1 X2 ...
map test tlocs L1 ... no default [interval proportions| external positions]
map [chromosome I] test tloc L at markers J1 ...
map [chromosome I] test tlocs L1 L2 jointly at markers J11 J12 ...
This statement is intended to allow lm_twoqtl to compute lod scores only at specified combinations of marker positions rather than, as currently, on a grid.
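Because test positions may be given either as map distances or as recombination fractions, it can help to see how the two relate. The Python sketch below implements the standard Haldane and Kosambi map functions. The function names are hypothetical, distances are assumed to be in Morgans, and nothing here is taken from the MORGAN source; check the units MORGAN expects before reusing the numbers.

```python
import math


def haldane_recombination(distance_morgans):
    """Haldane map function: no crossover interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def kosambi_recombination(distance_morgans):
    """Kosambi map function: allows for some crossover interference."""
    return 0.5 * math.tanh(2.0 * distance_morgans)


# Compare the two map functions at a few distances (in Morgans).
for d in (0.01, 0.10, 0.50):
    print(d, haldane_recombination(d), kosambi_recombination(d))
```

For short distances both functions return a recombination fraction close to the distance itself, and both approach 0.5 as the distance grows.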
See Concept Index for: location lod scores mapping model parameters, trait test positions, Kosambi map function, Haldane map function.
See Concept Index for: location lod scores population model parameters.
The following additional statements are specific to lod score computations:
set pseudo-priors X1 X2 ...
This statement is optional for lm_bayes. The number of pseudo-priors is the number of test trait locus positions plus one. The first pseudo-prior is for the unlinked position; this should be assigned a positive value. All other pseudo-priors must be positive or zero. The set of pseudo-priors need not be normalized.
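The constraints on the pseudo-priors can be written as a small check. The Python sketch below validates a list of values against the rules just stated and rescales it to sum to one. The function name is hypothetical, and whether MORGAN normalizes internally in exactly this way is an assumption.

```python
def normalize_pseudo_priors(values, n_test_positions):
    """Check the documented constraints and rescale the values to sum to 1."""
    if len(values) != n_test_positions + 1:
        raise ValueError("need one value per test position plus one (unlinked)")
    if values[0] <= 0:
        raise ValueError("the unlinked pseudo-prior must be positive")
    if any(v < 0 for v in values[1:]):
        raise ValueError("pseudo-priors must be positive or zero")
    total = sum(values)
    return [v / total for v in values]


# Three test positions, so four values; the first is for the unlinked position.
print(normalize_pseudo_priors([2.0, 1.0, 0.0, 1.0], 3))
# [0.5, 0.25, 0.0, 0.25]
```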
set I batches MC variance estimation
This statement is optional for lm_linkage and lm_twoqtl. These programs batch scored realizations in order to provide a Monte Carlo estimate of the standard deviation in estimating the lod score. This statement determines the batch size, and hence the number of batches. By default the batch size is chosen so that there are 20 batches.
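The batch-means idea can be sketched generically in Python. The sketch splits a sequence of scored realizations into equal batches and uses the spread of the batch means to estimate the Monte Carlo standard error of the overall mean. It is a textbook batch-means calculation with a hypothetical function name; the details of how MORGAN applies it to lod scores may differ.

```python
import math


def batch_means_se(values, n_batches=20):
    """Return (mean of batch means, batch-means standard error)."""
    batch_size = len(values) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches and 1 value per batch")
    # Realizations beyond n_batches * batch_size are discarded.
    means = [
        sum(values[b * batch_size:(b + 1) * batch_size]) / batch_size
        for b in range(n_batches)
    ]
    grand_mean = sum(means) / n_batches
    # Sample variance of the batch means.
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return grand_mean, math.sqrt(variance / n_batches)


# Four realizations in two batches: the batch means are 0.5 and 2.5.
estimate, std_err = batch_means_se([0.0, 1.0, 2.0, 3.0], n_batches=2)
print(estimate, std_err)
```

Batching is used because successive MCMC realizations are correlated; batch means taken over long enough batches are approximately independent.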
The remaining statements apply to lm_twoqtl.
set traits K1 ... multiple tlocs L1 ...
This statement is used by the lm_twoqtl program to specify the tlocs L1... that contribute to each trait K1... A statement may be provided for each separate trait. However, the lm_twoqtl program expects selection of one trait with either one or two contributing tlocs.
set trait K1 for tlocs L1 L2 joint genotype means X11 X12 X13 X21 X22 X23 X31 X32 X33
This statement specifies the 9 genotypic means (3x3 matrix) for tlocs L1 and L2 in contributing to trait K1. The first index on X refers to the L1 genotype and the second to L2.
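The ordering of the nine values can be made concrete with a short Python sketch that arranges them as a 3x3 matrix, with rows indexed by the L1 genotype and columns by the L2 genotype. The function name and the numerical means are made up for illustration only.

```python
def joint_genotype_means(values):
    """Arrange X11 X12 X13 X21 ... X33 as rows (L1 genotype) by columns (L2)."""
    if len(values) != 9:
        raise ValueError("exactly 9 joint genotype means are required")
    return [list(values[3 * i:3 * i + 3]) for i in range(3)]


# Hypothetical means, listed in the order X11 X12 X13 X21 X22 X23 X31 X32 X33.
means = joint_genotype_means([90, 95, 100, 100, 105, 110, 110, 115, 120])

print(means[0][2])  # X13: first L1 genotype, third L2 genotype -> 100
print(means[2][0])  # X31: third L1 genotype, first L2 genotype -> 110
```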
use [exact|MC] summation for trait
This statement specifies whether exact or Monte Carlo (MC) summation will be used by lm_twoqtl for computation of the trait contribution to the lod score. Exact summation can be used only on pedigrees with six or fewer founders.
use I MC realizations for trait
If MC summation is to be used, this statement specifies the number of realizations of tloc inheritance to be used. If MC summation is not to be used, this statement is ignored.
use multiplier I MC realizations for null
This statement specifies how many times as many realizations are to be used in estimating the baseline unlinked lod score. To obtain accurate lod score estimates it is important that this baseline is accurate, and it may therefore be advisable to use more realizations for it.
See Concept Index for: location lod scores computational parameters, pseudo-prior.
Please see that section for details regarding:
use (locus-by-locus sampling | sequential imputation) for setup
use I sequential imputation realizations for setup
set MC iterations I
set burn-in iterations I
sample by (scan | step)
set L-sampler probability X
check progress I MC iterations
As with the Autozyg programs, the number of desired MC iterations must be specified, as there is no default value.
set MC iterations I
For lm_linkage, the total MCMC run length is the sum of the number of burn-in iterations and main iterations. For lm_bayes, the total MCMC run length is the sum of the number of burn-in, pseudo-prior (see below) and main iterations.
Additional statements for lm_bayes include the following:
set pseudo-prior iterations I
lm_bayes performs iterations to calculate the pseudo-priors. These pseudo-priors are used to encourage the MCMC sampler to visit test positions of low posterior probability. The default number of iterations to compute pseudo-priors is 50% of the number of main iterations specified in the `set MC iterations' statement.
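The run-length rules for the two programs amount to simple arithmetic, sketched below in Python. The function name is hypothetical, and the integer rounding of the 50% default for odd iteration counts is an assumption.

```python
def total_run_length(program, main, burn_in, pseudo_prior=None):
    """Total MCMC iterations implied by the iteration statements."""
    if program == "lm_linkage":
        return burn_in + main
    if program == "lm_bayes":
        if pseudo_prior is None:
            # Default: 50% of the main iterations (rounding is assumed).
            pseudo_prior = main // 2
        return burn_in + pseudo_prior + main
    raise ValueError("unknown program: " + program)


# 3000 main iterations with 150 burn-in iterations.
print(total_run_length("lm_linkage", main=3000, burn_in=150))  # 3150
print(total_run_length("lm_bayes", main=3000, burn_in=150))    # 4650
```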
set sequential imputation proposals every I iterations
This statement applies to lm_bayes's pseudo-prior and main MCMC iterations. It allows the MCMC chain to "restart" every Ith iteration. Sequential imputation is used to propose potential restart configurations, which are accepted or rejected with Metropolis-Hastings probability.
set test position window I
This lm_bayes statement specifies the window size for the proposed tloc position update in the Metropolis-Hastings algorithm. I is the number of hypothesized trait positions on either side of the current position, with equal weight given to each of the 2*I + 1 trait positions. The default window size is 6.
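A window proposal of this kind can be sketched in a few lines of Python. The sketch draws the proposed position index uniformly from the 2*I + 1 indices centred on the current one. The function name is hypothetical, and the handling of proposals that fall off either end of the list of test positions is deliberately left out, since it is not described here.

```python
import random


def propose_test_position(current, window=6, rng=random):
    """Propose a position index uniformly from current-window..current+window."""
    # randint includes both endpoints, giving 2*window + 1 equally likely values.
    return current + rng.randint(-window, window)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
proposals = [propose_test_position(10, window=2, rng=rng) for _ in range(5000)]
print(sorted(set(proposals)))
```

With a window of 2 around index 10, the proposals cover the five indices 8 through 12, each with probability 1/5.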
See Concept Index for: location lod scores MCMC parameters and options, MC iterations, burn-in, sequential imputation proposals.