11. Estimating Location lod Scores by MCMC

See Concept Index for: location lod scores estimates.

11.1 Introduction to `lm_linkage`, `lm_bayes`, `lm_twoqtl`, `gl_lods` and `base_trait_lods`

The programs lm_linkage, lm_bayes, gl_lods and lm_twoqtl are referred to as ‘Lodscore’ programs. The program lm_linkage replaces the two programs lm_markers and lm_multiple of pre-2011 versions of MORGAN. As of 2013, the program lm_twoqtl remains a beta test version, but the new program gl_lods is now fully implemented and documented.

The Lodscore programs use MCMC to perform multipoint linkage analysis and trait mapping on large pedigrees where many individuals may be unobserved and exact computation is infeasible. The data are the genotypes of observed individuals in the pedigree at marker loci and discrete or continuous trait data. As with exact methods of computing lod scores, the genetic model is assumed known. The only unknown parameter is the location of the trait locus. Therefore, the user is required to specify the marker locations, trait and marker allele frequencies and penetrance function. Presently, users are limited in their choice of penetrance function, but this is under revision and will change in future releases of MORGAN.

lm_linkage is an implementation of the Lange-Sobel estimator, using either the single- or multiple-meiosis LM-sampler: See Single and multiple meiosis LM-samplers. The Lange-Sobel estimate works reasonably well in reasonable time, provided a good MCMC sampler is used, and provided the trait data do not have strong impact on the conditional distribution of meiosis indicators. The lm_linkage program samples only the meiosis indicators at marker loci, and only conditional on the marker data. Even when the trait inheritance information is strong, the method can produce quite accurate lod scores in the absence of linkage, but it can be inaccurate in estimating the strength of linkage signals. As well as producing the lod score, our current implementation provides a batch-means pointwise estimate of the Monte Carlo standard error of the lod-score estimate. lm_linkage can work with genotypic, discrete or quantitative traits.
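The identity behind an estimator of this kind can be sketched as follows. The notation is introduced here for illustration only and is not taken from the MORGAN documentation: $Y_M$ and $Y_T$ denote the marker and trait data, $S_M$ the meiosis indicators at the marker loci, $\gamma$ the hypothesized trait position, and $S_M^{(1)},\dots,S_M^{(N)}$ the realizations sampled conditional on the marker data only. The sketch assumes that the trait data are conditionally independent of the marker data given $S_M$.

```latex
\frac{P(Y_M, Y_T; \gamma)}{P(Y_M)\,P(Y_T)}
  = \sum_{S_M} \frac{P(Y_T \mid S_M; \gamma)}{P(Y_T)}\; P(S_M \mid Y_M)
  \approx \frac{1}{N} \sum_{i=1}^{N} \frac{P(Y_T \mid S_M^{(i)}; \gamma)}{P(Y_T)},
\qquad
\widehat{\mathrm{lod}}(\gamma)
  = \log_{10}\!\left( \frac{1}{N} \sum_{i=1}^{N}
      \frac{P(Y_T \mid S_M^{(i)}; \gamma)}{P(Y_T)} \right).
```

Here $P(Y_T)$ is the probability of the trait data for an unlinked trait locus. Since the $S_M^{(i)}$ are drawn without reference to the trait data, one reading of the caveat above is that the average can be dominated by a few realizations when the trait data are highly informative.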

lm_linkage combines the earlier programs lm_markers and lm_multiple. The original lm_multiple program and multiple-meiosis sampler are the work of Liping Tong [TT08]. As well as allowing use of either the single- or multiple-meiosis LM-sampler, the lm_linkage program optionally performs exact lod score computations on small pedigree components, and includes better exact computation and pedigree peeling options for use in the lod score estimator (see Exact HMM computations).

lm_bayes is an alternative method implemented for genotypic or discrete traits. The MCMC performance is better than for the old lm_markers program, but it has other computational overheads. lm_bayes samples trait locations from a posterior distribution, and then divides the posterior by the prior to produce the likelihood and hence the lod score. Estimation is in two phases. A preliminary run with a discrete uniform prior gives order-of-magnitude relative likelihoods. The inverses of these likelihoods are then used as the prior weights of a ‘pseudo-prior’ distribution, and a second run using this ‘pseudo-prior’ is made to estimate the likelihood. The purpose of the ‘pseudo-prior’ is to produce an approximately uniform posterior, so that likelihoods will be well estimated at all test positions. It is important that the initial run is long enough for all test positions to be sampled, and for the unlinked trait position to have a reasonable number of realizations. This can be problematic for locations at which lod scores are very negative, or for the unlinked position when some linked location has a strong positive lod score.
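The two-phase idea can be illustrated with a minimal Python sketch. It is not the lm_bayes implementation: the function name `pseudo_prior` and the visit counts are hypothetical, and the sketch simply assumes that, under a uniform prior, the number of visits to a test position is proportional to its likelihood.

```python
def pseudo_prior(visit_counts):
    """Turn phase-1 visit counts (uniform prior) into pseudo-prior weights.

    Under a uniform prior the visit frequency at each test position is
    proportional to its likelihood, so weights proportional to 1/count
    should roughly flatten the posterior in the second run.
    """
    if any(c <= 0 for c in visit_counts):
        # A position that was never visited gives no usable likelihood estimate.
        raise ValueError("every test position must be visited in the first run")
    inverse = [1.0 / c for c in visit_counts]
    total = sum(inverse)
    return [w / total for w in inverse]


# Hypothetical phase-1 counts for four test positions (index 0 = unlinked).
counts = [100, 400, 50, 450]
weights = pseudo_prior(counts)

# Expected phase-2 posterior is proportional to likelihood * pseudo-prior.
posterior = [c * w for c, w in zip(counts, weights)]
norm = sum(posterior)
posterior = [p / norm for p in posterior]
print(posterior)
```

The `ValueError` branch mirrors the requirement that the initial run be long enough for every test position to be sampled: a position with zero visits has no finite inverse weight.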

Our current implementation of lm_bayes provides two lod score estimates. The first is a crude estimate which counts realizations of locations sampled to estimate the posterior: as can be seen from the output this can be quite erratic. The Rao-Blackwellized estimator is much preferred, and produces good estimates in reasonable time. The lm_bayes program is the work of Andrew George [GT03,GWT05].

The beta-test program lm_twoqtl does parametric linkage analysis for a quantitative trait model having one or two linked QTL and a polygenic component. Each QTL has two alleles with three different genotypic means. The Normally distributed polygenic component does not include dominance, and the environmental contribution has a Normal distribution with mean zero and is uncorrelated among individuals. The program output consists of MCMC-based lod score estimates of the joint locations of the one or two contributing QTL. As of 2011, the program uses the same MCMC options as lm_linkage for sampling descent at marker loci conditional on marker data. Conditionally on these realizations the program then uses exact computation (on very small pedigrees) or an additional level of Monte Carlo to estimate the relevant lod score contributions. The original versions of lm_twoqtl were the work of YunJu Sung [STW07,SW07].

The program gl_lods computes lod score contributions for a discrete or continuous trait given a set of ibd_graphs across the chromosome, produced by gl_auto: See Introduction to lm_auto gl_auto and lm_pval. If the gl_auto run uses the ‘set MCMC markers only’ option, then the overall lod score computed by gl_lods is identical to that produced by lm_linkage when the same MCMC options are used in gl_auto and in lm_linkage. gl_lods uses many of the same trait definition and mapping request parameter statements as lm_linkage (Location lod scores statements). However, its input consists of ibd_graphs and an individuals file; there are no marker data or pedigree data. For further information on the motivation for splitting the lm_linkage lod score computation into the generation of marker-based ibd graphs (using gl_auto) followed by trait-likelihood computation on the ibd graphs: See Parameter files for the gl_lods program. See also [Tho11].

Because the gl_lods program computes log-likelihood contributions for given IBD, and does not use a pedigree, there is no way to compute the usual normalizing factor of the pedigree-based probability of observed trait data. This must be supplied to the program, and may be computed via the base_trait_lods program, provided a pedigree and the same trait model and data are used. An alternative approach using permutation of trait data conditional on the locus-specific IBD was developed by Chris Glazner [GT15] and has been implemented as an option in gl_lods. For quantitative data using variance component models, we are investigating the use of the Kullback-Leibler information (expected lod score) conditional on the locus-specific IBD as a locus-specific normalizing factor.

See References, for details of the cited papers.

See Concept Index for: Markov chain Monte Carlo, lm_linkage introduction, lm_bayes introduction, lm_twoqtl introduction, meiosis indicators, multiple meiosis sampler.

11.2 Sample parameter files for `lm_linkage` and `lm_bayes`

There are three example parameter files in the ‘Lodscores’ subdirectory: ‘ped73_ge.par’, ‘ped73_ph.par’ and ‘ped73_qu.par’. These files are examples of how to analyze genotypic, discrete (phenotypic), and quantitative (continuous) traits, respectively. Each of these files is written for use with lm_linkage since this is our preferred program and can analyze genotypic, discrete, and quantitative traits. The program lm_bayes will run with the same parameter files ‘ped73_ge.par’ and ‘ped73_ph.par’, but will adopt defaults for several statements specific to this program and will generate warnings for others not implemented for lm_bayes. If lm_bayes is run using ‘ped73_qu.par’, all statements regarding quantitative traits will be ignored, and the program will use default genotypic data.

The marker and MCMC information is very similar for all three parameter files. For ‘ped73_qu.par’ it is as follows:

set printlevel 5

input pedigree file '../ped73.ped'
input marker data file '../ped73.marker.missing'
input seed file         '../sampler.seed'
output overwrite seed file        '../sampler.seed'

set trait 1 data quantitative
input pedigree record trait 1 real 1

select all markers 
select trait 1
set trait 1 tloc 1
map test tloc 1 all interval proportions 0.3 0.7
map test tloc 1 external recomb fracts   0.05 0.15 0.3 0.4 0.45 

sample by scan                    
set L-sampler probability 0.2

set burn-in iterations 150
set MC iterations 3000           
check progress MC iterations 1000

set global MCMC
use single meiosis sampler

The pedigree file specified by the ‘input pedigree file’ statement can contain multiple traits. As discussed in previous sections, the marker map, allele frequencies and genotypes can be contained in the parameter file or in a separate file specified by the ‘input marker data file’ statement as in the example above.

As in other programs, the trait data are included in the pedigree file. The ‘select trait’ statement tells the program which trait in this file is to be analyzed, and the ‘input pedigree record trait’ indicates where the data are to be found, while the ‘set trait ...tloc...’ statement connects the trait with a specific tloc for this analysis.

The two ‘map test tloc’ statements give trait locus test positions at which the lod scores should be calculated. When the trait locus is located between two markers, the position is specified in terms of the proportional genetic distance between the two markers (this option makes handling gender-specific maps easy). In this example, the test trait positions are specified to be at 30 and 70 percent of the interval. The second ‘map test tloc’ statement allows test trait locus positions located before the first marker or after the last marker to be specified; the positions are specified explicitly in terms of recombination fractions (or genetic distances) with the nearest marker locus. Note that an external recombination fraction of 0.5 is not necessary since the likelihood of an unlinked trait locus is always used as a reference when computing the lod scores.
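As a check on how these two statements translate into map positions, the following Python sketch recomputes the test positions. It assumes a sex-averaged map with the ten markers at 0, 10, ..., 90 cM (read off the sample output shown later in this chapter) and the Haldane map function; the function names are hypothetical and this is not MORGAN code.

```python
import math


def haldane_cm(theta):
    """Haldane map distance in cM for a recombination fraction theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def trait_positions(markers_cm, proportions, external_thetas):
    """Return sorted trait test positions in cM, excluding the markers themselves."""
    positions = []
    # Interior positions: a given proportion of the way along each marker interval.
    for left, right in zip(markers_cm, markers_cm[1:]):
        positions += [left + p * (right - left) for p in proportions]
    # External positions: measured outward from the first and the last marker.
    for theta in external_thetas:
        d = haldane_cm(theta)
        positions += [markers_cm[0] - d, markers_cm[-1] + d]
    return sorted(positions)


# Assumed marker map: 10 markers at 0, 10, ..., 90 cM.
markers = [10.0 * i for i in range(10)]
positions = trait_positions(markers, [0.3, 0.7], [0.05, 0.15, 0.3, 0.4, 0.45])
print(len(positions))                        # 28
print([round(x, 3) for x in positions[:5]])  # [-115.129, -80.472, -45.815, -17.834, -5.268]
```

Under these assumptions the 28 positions agree with the trait positions listed in the lm_linkage sample output, for example 5.268 cM outside the end markers for a recombination fraction of 0.05.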

The final seven statements give MCMC specifications. The ‘sample by scan’ statement instructs the program to update all the meiosis indicators, S, at each iteration, in an order determined by random permutation. The alternative ‘sample by step’ updates only one locus (L-sampler) or only one meiosis (M-sampler) in each iteration. The ‘sample by scan’ statement is the default and strongly recommended. The L-sampler probability is set at 20 percent, which is often a good choice. For a detailed discussion of the effects of varying the L- to M-sampler ratio, see section 10.6 in [Tho00].

In the ‘set burn-in iterations’ statement, 150 burn-in iterations are requested. The next statement requests 3000 MCMC iterations; for each realized set of marker-location inheritance vectors the trait-likelihood contribution will be computed at each test position of the trait locus. This is for demonstration purposes only. For real data analyses, use longer runs, on the order of 10^5 MCMC iterations. The last statement in this group tells the program to report progress every 1000 iterations.

Although the lm_linkage program can use the multiple-meiosis sampler, and this is recommended, the final two statements here specify ‘set global MCMC’ and ‘use single meiosis sampler’. Thus, for this example, the single-meiosis sampler will be used (as in the old lm_markers program) and MCMC will be performed globally over all pedigree components, rather than component-by-component. This provides an example of how these options may be used for compatibility with older examples.

As an example of MORGAN parameter statement flexibility, we also give part of a version of the above parameter file in which lines are broken, words misspelled, and lower and upper case used arbitrarily:

#  This version of ped73_qu.par is designed to show the flexibility
#     of MORGAN parameter statements

set prinlev 5

inPU Pedi file '../ped73.ped'
input mark 
data FILE '../ped73.marker.missing'
input seed file         '../sampler.seed'
outp over 
seed file        '../sampler.seed'

set trait data 1 quant
Input pedigree Recod trait 1 real 1

sele traix 1
set trait 1 
tloc 1
map test tloc 1 all intevl props 0.3 0.7
map test tloc 1 exte reco frac 0.05 0.15 
0.3 0.4 0.45 
seLect 
all markers 

samp BY scan       
set L-sam prob 0.2
set 3000 mc ITER 
set burn-in iter 150
check progss 1000 MC iters 
set glob mcmc
USE singe meio samp

Note that statements may be split over lines, that only the first four characters of each word are required, that upper and lower case are not distinguished, and that in some statements (for example, the ‘check progress’ and ‘set MC iterations’ statements) even the location of the integer count within the statement is flexible.
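The matching rules just described can be mimicked with a short Python sketch. This is only an illustration of the stated rules (four significant characters, case-insensitive, line breaks ignored); the function `normalize` is hypothetical and is not how MORGAN parses its input. It does not model the flexible placement of integer counts.

```python
def normalize(statement):
    """Reduce a statement to comparable tokens.

    Alphabetic words (hyphens allowed) keep only their first four characters
    in lower case; numbers and quoted file names are compared exactly.
    str.split() treats newlines as whitespace, so broken lines are rejoined.
    """
    tokens = statement.split()
    return [t.lower()[:4] if t.replace("-", "").isalpha() else t for t in tokens]


# Pairs of (flexible spelling, canonical spelling) taken from the two listings above.
pairs = [
    ("set prinlev 5", "set printlevel 5"),
    ("inPU Pedi file '../ped73.ped'", "input pedigree file '../ped73.ped'"),
    ("outp over \nseed file '../sampler.seed'", "output overwrite seed file '../sampler.seed'"),
    ("USE singe meio samp", "use single meiosis sampler"),
]
results = [normalize(a) == normalize(b) for a, b in pairs]
print(results)  # [True, True, True, True]
```

Comparing only a four-character prefix is what allows misspellings such as ‘prinlev’ or ‘singe’ to be accepted, since the errors fall after the fourth character.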

For more details of the MCMC specifications see MCMC parameter statements.

Specifying Trait Data Type

Trait data type is set by using the ‘set trait data’ statement. Recall that the ‘input pedigree record trait’ statement must be used to specify which column in the file is to be used as the trait value (see Pedigree file description statements). The three trait data types discussed in this example are implemented by including the following statements in the parameter file discussed above. Note that the trait and tloc numbers are arbitrary, but the connection must be made consistently throughout the file.

‘ped73_ge.par’ specifies a genotypic trait with the following statements:

set trait 3 data genotypic
input pedigree record trait 3 integer 3

select trait 3
set trait 3 tloc 1
set tloc 1 allele freqs 0.4  0.6

‘ped73_ph.par’ specifies a phenotypic trait with the following statements:

set trait 2 data discrete
input pedigree record trait 2 integer 4

select trait 2
set traits 2 tlocs 1
set traits 2 for tlocs 1 incomplete penetrance 0.05 0.6 0.95
set tlocs 1  allele freqs 0.4  0.6

Recall that for discrete data, one must specify the penetrances (see Autozyg computational parameters).

‘ped73_qu.par’ specifies a quantitative trait with the following statements:

set trait 1 data quantitative
input pedigree record trait 1 real 1

select trait 1
set trait 1 tloc 1
set trait 1 for tlocs 1 genotype  mean 90.0 100.0 110.0
set trait 1 residual variance 25.0
set tloc 1 allele freqs 0.4 0.6

When using a quantitative trait, genotypic means and residual variance must be specified. Additive variance can be specified with the statement ‘set trait ... additive variance’. The default value is zero.
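To make the role of the genotypic means and residual variance concrete, here is a minimal Python sketch. It assumes that, with zero additive variance, the trait value given the trait-locus genotype is Normal with the genotype's mean and the residual variance, and that the three means refer to genotypes 11, 12 and 22 in that order. The observed value `y` is hypothetical.

```python
import math


def normal_pdf(y, mean, variance):
    """Density of a Normal(mean, variance) distribution at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


# Values from the ped73_qu.par statements above; the genotype order is an assumption.
genotype_means = {"11": 90.0, "12": 100.0, "22": 110.0}
residual_variance = 25.0

y = 95.0  # hypothetical observed trait value
densities = {g: normal_pdf(y, m, residual_variance) for g, m in genotype_means.items()}
for genotype, density in densities.items():
    print(genotype, round(density, 6))
```

A value of 95 lies one residual standard deviation from the means of both 11 and 12, so those two genotypes are equally well supported, while 22 is three standard deviations away.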

The ‘set tloc ... allele freqs’ statement specifies allele frequencies at the trait locus. If the allele frequencies sum to less than 1, a warning message will be issued:

Sum of allele frequencies is not in range .9999, 1.0001  (W)

If the allele frequencies sum to above 1.0001, the program quits and generates an error message.
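The behaviour described above can be summarized in a small Python sketch. The thresholds are taken from the warning message; the function name and return values are hypothetical, and the sketch assumes that any sum below 0.9999 triggers the warning.

```python
def check_allele_freqs(freqs, low=0.9999, high=1.0001):
    """Classify a set of allele frequencies by their sum."""
    total = sum(freqs)
    if total > high:
        return "error"      # the program quits with an error message
    if total < low:
        return "warning"    # the program issues a warning
    return "ok"


print(check_allele_freqs([0.4, 0.6]))  # ok
print(check_allele_freqs([0.4, 0.5]))  # warning
print(check_allele_freqs([0.6, 0.6]))  # error
```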

Below is a summary of the trait data types accepted for each program:

Program       Genotypic         Phenotypic        Quantitative
              (ped73_ge.par)    (ped73_ph.par)    (ped73_qu.par)
lm_linkage    Yes               Yes               Yes
lm_bayes      Yes               Yes               No

Additional penetrance options are available for liability classes and age-of-onset data.

An example for liability classes is given in the file ‘xact_ph_liab.par’ in the ‘Lodscore’ Gold standards. The statement gives the liability-class-specific penetrance matrix, with one row per liability class. The penetrances are for the 11, 12 and 22 genotypes. MORGAN will set the penetrance for genotype 21 equal to that for 12.

set trait data 1 discrete with liability
input pedigree record trait 1 integer pairs 8 11

set 3 liability classes with penetrances
         0.025 0.325 0.325
         0.150 0.625 0.625
         0.350 0.950 0.950
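The penetrance matrix above can be read as a lookup table, as in the following Python sketch. It assumes that rows are liability classes numbered from 1, that columns are genotypes 11, 12 and 22, and that a penetrance is the probability of being affected; the function name is hypothetical.

```python
# Rows: liability classes 1..3; columns: genotypes 11, 12, 22 (from xact_ph_liab.par).
PENETRANCES = [
    (0.025, 0.325, 0.325),
    (0.150, 0.625, 0.625),
    (0.350, 0.950, 0.950),
]
# Genotype 21 shares the column of genotype 12.
GENOTYPE_COLUMN = {"11": 0, "12": 1, "21": 1, "22": 2}


def prob_phenotype(affected, liability_class, genotype):
    """Probability of the observed affection status given class and genotype."""
    p = PENETRANCES[liability_class - 1][GENOTYPE_COLUMN[genotype]]
    return p if affected else 1.0 - p


print(prob_phenotype(True, 2, "21"))   # 0.625
print(prob_phenotype(False, 1, "11"))  # 0.975
```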

An example for age-of-onset data is given in the file ‘xact_ph_age.par’ in the ‘Lodscore’ Gold standards:

input pedigree record trait 1 integer pairs 8 10  # For using ages of onset.
set trait 1 data discrete with covariate

set trait 1 for tloc 1 genotypic means  90.0 70.0 45.5
set standard deviations for genotypes   11.0 15.5 20.5

set width of ages of onset window  2.0

See Concept Index for: sample parameter file for lm_linkage, sample parameter file for lm_bayes, gender–specific maps, meiosis indicators, L-sampler, M-sampler, multiple meiosis sampler, genotypic trait specification, phenotypic trait specification, discrete trait specification, quantitative trait specification, continuous trait specification, liability class penetrances, age-of-onset penetrances.

11.3 Running `lm_linkage` examples and sample output

lm_linkage can be run with all three parameter files in the ‘Lodscores/’ subdirectory. As usual, the syntax for running the program is:

./lm_linkage <parameter file>

This section describes the output obtained by using the parameter file ‘ped73_qu.par’. To run the example, type:

./lm_linkage ped73_qu.par

The most relevant part of the output is the table of LodScore estimates. For each test position, it gives the estimated lod score and the estimated Monte Carlo standard error.

LodScore estimates by Rao-Blackwellized computation:
 
 Trait pos #     position (Haldane cM)
   or marker        male     female       LodScore    StdErr

           1    -115.129   -115.129         0.0303    0.0005
           2     -80.472    -80.472         0.0558    0.0012
           3     -45.815    -45.815         0.0779    0.0031
           4     -17.834    -17.834        -0.0306    0.0080
           5      -5.268     -5.268        -0.2811    0.0142
    marker-1       0.000      0.000        -0.4986    0.0195
           6       3.000      3.000        -0.4469    0.0141
           7       7.000      7.000        -0.4342    0.0230
    marker-2      10.000     10.000        -0.4605    0.0363
           8      13.000     13.000        -0.4254    0.0247
           9      17.000     17.000        -0.4454    0.0209
    marker-3      20.000     20.000        -0.5301    0.0197
          10      23.000     23.000        -0.3174    0.0211
          11      27.000     27.000        -0.1176    0.0233
    marker-4      30.000     30.000        -0.0052    0.0259
          12      33.000     33.000         0.5058    0.0208
          13      37.000     37.000         0.8794    0.0159
    marker-5      40.000     40.000         1.0772    0.0138
          14      43.000     43.000         0.9832    0.0156
          15      47.000     47.000         0.8432    0.0213
    marker-6      50.000     50.000         0.7210    0.0252
          16      53.000     53.000         0.6558    0.0256
          17      57.000     57.000         0.5140    0.0271
    marker-7      60.000     60.000         0.3522    0.0288
          18      63.000     63.000         0.0113    0.0225
          19      67.000     67.000        -0.5473    0.0123
    marker-8      70.000     70.000        -0.9543    0.0095
          20      73.000     73.000        -0.4578    0.0212
          21      77.000     77.000        -0.1866    0.0178
    marker-9      80.000     80.000        -0.1135    0.0116
          22      83.000     83.000         0.0888    0.0091
          23      87.000     87.000         0.3132    0.0064
   marker-10      90.000     90.000         0.4544    0.0071
          24      95.268     95.268         0.6010    0.0046
          25     107.834    107.834         0.6423    0.0028
          26     135.815    135.815         0.4017    0.0011
          27     170.472    170.472         0.1762    0.0003
          28     205.129    205.129         0.0758    0.0001

For more information regarding the MCMC parameters and diagnostic output, see MCMC computational options.
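For post-processing, a table like the one above can be read with a few lines of Python. This is a sketch that assumes the five-column row layout shown in the sample output (label, male position, female position, lod score, standard error); the function name and the dictionary keys are hypothetical, and only a few rows are embedded to keep the example short.

```python
SAMPLE = """\
 Trait pos #     position (Haldane cM)
   or marker        male     female       LodScore    StdErr

    marker-4      30.000     30.000        -0.0052    0.0259
          12      33.000     33.000         0.5058    0.0208
          13      37.000     37.000         0.8794    0.0159
    marker-5      40.000     40.000         1.0772    0.0138
          14      43.000     43.000         0.9832    0.0156
"""


def parse_rows(text):
    """Extract data rows; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            male, female, lod, se = (float(x) for x in fields[1:])
        except ValueError:
            continue  # not a numeric data row
        rows.append({"label": fields[0], "male_cM": male,
                     "female_cM": female, "lod": lod, "se": se})
    return rows


rows = parse_rows(SAMPLE)
best = max(rows, key=lambda r: r["lod"])
print(best["label"], best["male_cM"], best["lod"])  # marker-5 40.0 1.0772
```

In the full table the largest estimate is also the one at marker-5 (40 cM), with a lod score of 1.0772 and a standard error of 0.0138.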

See Concept Index for: running lm_linkage examples, lm_linkage sample output.

11.4 Running `lm_bayes` examples and sample output

Under the subdirectory ‘Lodscores/’, run the lm_bayes example on the discrete (phenotypic) trait data by typing:

./lm_bayes ped73_ph.par

The results from lm_bayes are the lod scores toward the end of the output. Two estimates of the lod scores are provided: (1) count realizations of locations sampled to estimate the posterior probability (‘crude’) and (2) Rao-Blackwellized estimator (‘R-B’). Both are provided for comparison, but the latter should be more accurate.

 LodScore estimates:

 Trait pos #  position (Haldane cM)  pseudo  freq        LodScore    
   or marker      male    female     prior  visited   crude     R-B

           0  unlinked  unlinked   0.025023    94        NA        NA
           1  -115.129  -115.129   0.025276    66   -0.1580   -0.0046
           2   -80.472   -80.472   0.025727    77   -0.0987   -0.0125
           3   -45.815   -45.815   0.027843    96   -0.0372   -0.0473
           4   -17.834   -17.834   0.037973    71   -0.3030   -0.1825
           5    -5.268    -5.268   0.057289    96   -0.3506   -0.3583
    marker-1     0.000     0.000         NA    NA        NA        NA
           6     3.000     3.000   0.078826    89   -0.5221   -0.4919
           7     7.000     7.000   0.086379    88   -0.5667   -0.5255
    marker-2    10.000    10.000         NA    NA        NA        NA
           8    13.000    13.000   0.092502    87   -0.6014   -0.5456
           9    17.000    17.000   0.090858    94   -0.5600   -0.5386
    marker-3    20.000    20.000         NA    NA        NA        NA
          10    23.000    23.000   0.063483   109   -0.3400   -0.3738
          11    27.000    27.000   0.044111   103   -0.2065   -0.2086
    marker-4    30.000    30.000         NA    NA        NA        NA
          12    33.000    33.000   0.026053   114    0.0663    0.0203
          13    37.000    37.000   0.018403   103    0.1731    0.1698
    marker-5    40.000    40.000         NA    NA        NA        NA
          14    43.000    43.000   0.011818   100    0.3527    0.3585
          15    47.000    47.000   0.009347    90    0.4088    0.4600
    marker-6    50.000    50.000         NA    NA        NA        NA
          16    53.000    53.000   0.010351   121    0.4930    0.4236
          17    57.000    57.000   0.014614   121    0.3432    0.2804
    marker-7    60.000    60.000         NA    NA        NA        NA
          18    63.000    63.000   0.023348    96    0.0392    0.0769
          19    67.000    67.000   0.030506   123    0.0307   -0.0412
    marker-8    70.000    70.000         NA    NA        NA        NA
          20    73.000    73.000   0.033357   136    0.0356   -0.0903
          21    77.000    77.000   0.030400   124    0.0358   -0.0514
    marker-9    80.000    80.000         NA    NA        NA        NA
          22    83.000    83.000   0.024811    96    0.0128    0.0282
          23    87.000    87.000   0.019535   144    0.2928    0.1160
   marker-10    90.000    90.000         NA    NA        NA        NA
          24    95.268    95.268   0.013755   110    0.3281    0.2561
          25   107.834   107.834   0.013714   125    0.3849    0.2600
          26   135.815   135.815   0.018372   132    0.2816    0.1339
          27   170.472   170.472   0.022361   108    0.1091    0.0489
          28   205.129   205.129   0.023966    87   -0.0149    0.0188

Note that lm_bayes does not provide lod scores at the marker locations.
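The ‘crude’ column can be related to the ‘pseudo prior’ and ‘freq visited’ columns with a short Python sketch. The relationship used here is inferred from the printed table rather than from the program source: the posterior is estimated by the visit count, divided by the pseudo-prior to give a relative likelihood, and compared on the log10 scale with the same ratio for the unlinked position. The function name is hypothetical.

```python
import math


def crude_lod(count, prior, count_unlinked, prior_unlinked):
    """log10 of (visits / prior) at a position relative to the unlinked position."""
    return math.log10((count / prior) / (count_unlinked / prior_unlinked))


unlinked = (94, 0.025023)  # row 0 of the table above: freq visited, pseudo prior
rows = {
    1: (66, 0.025276),
    2: (77, 0.025727),
    16: (121, 0.010351),
}
lods = {pos: crude_lod(count, prior, *unlinked) for pos, (count, prior) in rows.items()}
for pos, lod in lods.items():
    print(pos, round(lod, 4))
```

For these three rows the values agree with the printed crude estimates (-0.1580, -0.0987 and 0.4930) to about three decimal places; exact agreement is not expected because the pseudo-prior is printed to six decimals only. The dependence on raw visit counts also shows why the crude estimate is erratic when counts are small.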

See Concept Index for: running lm_bayes examples, lm_bayes sample output, Rao-Blackwellized estimates.

11.5 Running `lm_twoqtl` examples and sample output

The program lm_twoqtl remains in beta test, so instead of examples in ‘MORGAN_Examples’ we describe the gold-standard examples in the ‘Lodscore/Gold’ subdirectory of the main MORGAN source directory. However, much work has been done on lm_twoqtl, and its marker-based MCMC is now the same as for lm_linkage. Four gold-standard examples of lm_twoqtl parameter files and output may be found in that directory. Examples 1 and 3 are for a single trait locus, and 2 and 4 use two QTL. Examples 1 and 2 use exact computation for the trait lod score contributions on these very small examples. Examples 3 and 4 use Monte Carlo. We will use example 4 in this tutorial description, since this is the most general and novel. To create other examples, copy one of these files and replace the parameters in the file with those that you want to specify.

The various trait-model options for lm_twoqtl are summarized in the following table:

                   Additive genetic variance
Number of QTL      Zero           Positive
1                  one locus      one locus plus polygene
2                  two loci       two loci plus polygene

Trait models can be any of the above four entries. However, for a one-locus trait model with no polygenic component, the program lm_linkage will provide more accurate results more quickly.

The lod score is estimated on a one-dimensional grid of points for one QTL, and a two-dimensional grid of points for two QTL. In the future the new parameter statement

map [chromosome  I]  test tlocs  L1 L2  jointly at markers J11 J12 ...

will allow two-locus lod score programs to provide lod scores at arbitrary pairs of marker positions.

The content of file ‘twoqtl4.par’ (reordered slightly for clarity) is:

use single meiosis sampler           # Select the MCMC sampler to be used.

set printlevel 5                     # Include everything in the output file.

set sampler seeds  0x53f78285 0xdfbca001
set trait   seeds  0x53f78285 0xdfbca001

input marker data file './twoqtl.markers'
input pedigree file './twoqtl.ped'
output extra file './twoqtl_batch4'

select all markers
select trait 1

set trait 1 multiple tlocs 1 2
set tloc 1  allele freqs 0.1 0.9
set tloc 2  allele freqs 0.3 0.7

# requests for grid of tloc positions for lod scores
map test tloc 1 all interval proportions 0.5
map test tloc 1 external recomb fracts   0.3
map test tloc 2 all interval proportions 0.5
map test tloc 2 no default external positions

# standard MCMC requests
use sequential imputation for setup
use 100 sequential imputation realizations for setup
sample by scan
set L-sampler probability 0.2
set burn-in iterations 10
set MC iterations 60
compute scores every 10 iterations

# lodscore scoring requests
set 3 batches MC variance estimation
check progress 20 MC iterations
# For clarity, the wording `MC' has been removed from the following three  parameter statements:
#  MC in parameter statements now refers only to MCMC.
use Monte Carlo summation for trait
simulate 5 IBD realizations for trait
use multiplier 1 realization for null

# quantitative trait model specification
set trait 1 data quantitative
set trait 1 for tloc 1 genotype  mean  -2.0  0.0  2.0
set trait 1 for tloc 2 genotype  mean  -3.0  0.0  3.0
set trait 1 residual  variance  1.0
set trait 1 additive  variance  1.0

Note that the number of MCMC scans (60) is very small, as is the number of Monte Carlo realizations used in evaluating the trait likelihood contributions (5). Additionally, only every 10th MCMC scan is used for computing lod score contributions. This is reasonable, since lod score computation in lm_twoqtl is computationally intensive, so the standard procedure of scoring every scan is not efficient. However, with only 60 total scans, the lod scores are based on only 6 realizations of inheritance conditional on the marker data. The example is for illustrative purposes only; real analyses would require much more Monte Carlo, both in the marker-based MCMC and in estimating trait contributions to each score.

Most statements are as for earlier lod-score programs and can be found in the Statement Index. The statements included in this example that require additional comment are

use single meiosis sampler

The old single-meiosis sampler is specified for consistency with earlier results; in practice the multiple meiosis sampler is preferred.

set trait 1 multiple tlocs 1 2

This statement specifies that trait 1 is contributed to by both tloc 1 and tloc 2.

set trait 1 for tloc 1 genotype mean -2.0 0.0 2.0

set trait 1 for tloc 2 genotype mean -3.0 0.0 3.0

The genotypic means for tlocs 1 and 2 are set separately, which implies that they contribute additively to trait 1. If the tlocs are not to contribute additively, the user should instead use the statement

set trait 1 for tlocs 1 2 joint genotype means ...

where the statement is followed by 9 genotype means, one for each combination of genotypes at the two tlocs.
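To see what additivity implies, the following Python sketch builds the 3 x 3 table of joint genotype means from the two sets of separate means in ‘twoqtl4.par’. It assumes the three means at each tloc refer to genotypes 11, 12 and 22 in that order; the order in which the joint statement expects its 9 values is not stated here, so the layout below is for illustration only.

```python
tloc1_means = [-2.0, 0.0, 2.0]  # genotypes 11, 12, 22 at tloc 1 (assumed order)
tloc2_means = [-3.0, 0.0, 3.0]  # genotypes 11, 12, 22 at tloc 2 (assumed order)

# Additive model: the joint mean is the sum of the two single-locus means.
joint_means = [[m1 + m2 for m2 in tloc2_means] for m1 in tloc1_means]

for row in joint_means:  # one row per tloc 1 genotype
    print(row)
# [-5.0, -2.0, 1.0]
# [-3.0, 0.0, 3.0]
# [-1.0, 2.0, 5.0]
```

Any table of 9 means that cannot be written as such a sum of row and column effects represents a non-additive (interacting) two-locus model, which is the case the joint statement is intended for.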

output extra file './twoqtl_batch4'

If an ‘extra file’ is specified, it is used by lm_twoqtl for output of batched means used in variance estimation. Most users will not require this file, although it can be used in MCMC diagnostics.

set 3 batches MC variance estimation

In this minimal example, the 6 scored realizations are divided into 3 batches, each of size 2. Again, real examples would use a much larger number of realizations, and likely the default number of batches (20).
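The batching can be illustrated with a generic batch-means calculation in Python. This is the textbook estimator for the standard error of a mean from a correlated sequence, offered as a sketch only; the exact formula used by lm_twoqtl (in particular how the error is carried to the log10 scale) is not given in this section, and the six values are hypothetical.

```python
import math


def batch_means_se(values, n_batches):
    """Batch-means standard error of the mean of a correlated sequence."""
    n = len(values)
    if n_batches < 2 or n % n_batches != 0:
        raise ValueError("need at least 2 batches that divide the run evenly")
    size = n // n_batches
    # Mean of each consecutive batch of scored realizations.
    means = [sum(values[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    grand = sum(means) / n_batches
    # Sample variance between batch means, then the variance of their average.
    var_between = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_between / n_batches)


scored = [1.0, 3.0, 2.0, 4.0, 6.0, 2.0]  # hypothetical contributions from 6 scored scans
se = batch_means_se(scored, 3)           # 3 batches of size 2, as in this example
print(round(se, 5))                      # 0.57735
```

With only 3 batches the variance between batch means is itself poorly estimated, which is one reason larger runs with more batches are preferable in practice.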

use Monte Carlo summation for trait

simulate 5 IBD realizations for trait

In this example, Monte Carlo summation is used to evaluate each trait-locus likelihood contribution conditional on the marker-based realizations of inheritance, and for each such realization there will be 5 realizations of trait allele descent.

use multiplier 1 realization for null

In this example the same number of realizations will be used to evaluate the marginal probability of trait data as are used for each lodscore grid point. In real examples, it may be advisable to increase this ratio, to obtain an accurate base level for the lodscore estimate.

The default procedure is used: lod scores are estimated separately on each of the two components. These are then summed, giving the concluding output for this example:

# Lod score estimates and MC sd for entire pedigree:

# Index     TLoc1       TLoc2     LodScore    StdErr

    1     -45.815     -45.815       1.4058    0.7605
    2     -45.815       0.000       0.3872    0.6212
    3     -45.815      10.000       0.4379    0.6232
    4     -45.815      20.000       0.6616    0.6856
    5     -45.815      65.815       0.4661    0.5916
    6       0.000     -45.815       0.3503    0.6529
    7       0.000       0.000       0.1343    0.6034
    8       0.000      10.000      -0.0129    0.5600
    9       0.000      20.000       1.0364    0.5772
   10       0.000      65.815       1.1077    0.7724
   11      10.000     -45.815       0.0983    0.6902
   12      10.000       0.000       0.0461    0.5605
   13      10.000      10.000       0.8585    0.6088
   14      10.000      20.000       1.2174    0.5866
   15      10.000      65.815       0.3512    0.7539
   16      20.000     -45.815       1.5180    0.6479
   17      20.000       0.000       0.7387    0.6567
   18      20.000      10.000       0.9330    0.7020
   19      20.000      20.000       1.2473    0.6162
   20      20.000      65.815       1.3465    0.7664
   21      65.815     -45.815      -0.3066    0.7764
   22      65.815       0.000      -0.1259    0.6895
   23      65.815      10.000       0.5714    0.7124
   24      65.815      20.000       1.3537    0.6550
   25      65.815      65.815       0.5343    0.7140

These results consist of base-10 lodscore estimates with MCMC standard deviations, estimated at the requested grid of test positions.

See Concept Index for: running lm_twoqtl examples, lm_twoqtl sample output, map test tlocs jointly at markers.

11.6 Parameter files for the `gl_lods` program

See References for details of the cited papers.

The program gl_lods computes lod score contributions for a discrete or continuous trait given a set of ibd_graphs across the chromosome, produced by gl_auto: See Introduction to lm_auto gl_auto and lm_pval. If the gl_auto run uses the ‘set MCMC markers only’ option, then the overall lod score computed by gl_lods is identical to that produced by lm_linkage when the same MCMC options are used in gl_auto and in lm_linkage. gl_lods uses many of the same trait definition and mapping request parameter statements as lm_linkage (Location lod scores statements). However, its input consists of ibd_graphs and an individuals file; there are no marker data or pedigree data.

The goal of using gl_auto and gl_lods is to separate the lod score computation from the marker-based MCMC that produces realizations of the inheritance vectors at loci across the chromosome. The input to gl_lods consists of these realizations, in the format of the compressed ibd_graphs output by gl_auto, a specification of a trait model, and a list of individuals with their trait data. gl_lods computes lod score contributions for a specified trait model and data on specified individuals, directly using the ibd graphs on these individuals. See also [Tho11]. As of MORGAN 3.2, the gl_lods program is no longer a beta-test version; parameter statements, capabilities, and examples have been significantly updated.

The separation of lod-score computation and marker-based MCMC has several advantages:

Lod score contributions are computed for each ibd graph, providing a distribution ("fuzzy lod") over the ibd-graph realizations.
Use of the IBDgraph software to identify equivalent ibd graphs across realizations and across markers allows lod score contributions to be computed only once for each distinct ibd graph.
Lod scores for many different genetic models and for different traits may be computed on the same set of ibd graphs.
The pedigree and marker information may be separated from the trait data. This increases data confidentiality in multi-center genetic studies.

Examples for gl_lods are still under development, but we describe here two examples based on the gl_lods gold standards in the ‘Lodscore/Gold’ subdirectory of MORGAN. In the case of the first ped47_... example the files are only in that directory and can be run there as ‘make gold.6’. The other example simCmps_fgl_... is both included as a gold standard, and has also been added to the ‘MORGAN_Examples’ files.

The ped47 example: The ped47 example is a single pedigree component. The following shows the parameter file ‘ped47_gl_lods.par’. The trait model specifications for the two cases of a discrete and a continuous trait are given in supplementary parameter files below.

# ped47 main parameter file for gl_lods program.

set printlevel 5             # Include everything in the output file.

input individuals file        "./ped47.ind"
input extra file              "./ped47_fgl.oscor"
output overwrite extra file  "./ped47_fgl.reduce"  
# Use the overwrite option to avoid appending multiple outputs.

# The number of ibdgraphs provides the number of replicates in gl_auto file.
set maximum 1000 ibdgraphs per component

# This map test tloc statement indicates at which markers lodscores are to be
# calculated.
map test tloc 11 at markers 2 4 5 8 10

select trait 1
set trait 1 tloc 11
set tloc 11 allele freqs 0.5 0.5

Note that the program uses an individuals file, not a pedigree file. The format is similar, but mother and father identifiers are replaced by a component number (in this case ‘1’ for all individuals). See Creating an individuals file. Note that the component number is referred to as a "name", so that the columns of integers and reals in the file correspond to those in an analogous pedigree file. The individuals file ‘ped47.ind’ is as follows:

# Lines or parts of lines preceded by a hash sign, such as this one,
#    are treated as comments by the software.
# This individuals file contains a subset of the 47 individuals in the gl_auto
#   output file; the subset must include all those whose trait data are
#   to be analyzed -- it may include additional individuals.

input pedigree size 42
#  pedigree size must be the number of individuals in the fgl-graphs of the
#    gl_auto output file.
#  The number read here may be only a subset; if so make sure file ends
#    with the last individual.

input individuals record names 2 integers 7 reals 1

#  The first item is individual's "name" which is a character string.
#     The second of these is individual's component.
#
#  There follow 7 integers
#     The first of these (third item) is dummy gender (0)
#     The next is an "observed" indicator used only by genedrop.
#     The next is a trait genotype:
#          0 is unobserved. 1 is genotype (1,1); 3 is (1,2), 4 is (2,2)
#     The next is a trait phenotype:
#          0 is unobserved. 1 is unaffected, 2 is affected
#     The next 2 code trait-locus inheritance patterns, used by 1 version of
#          markerdrop.
#     The final is a dummy trait indicating data for reduced fgl file
#         (but it is not required/used by the gl_lods program)
#
#  Finally there is one real (read as double); this is a quantitative trait.
#     Numbers with integer part 999 code for unobserved.
#
#  In fact, many individuals with no data are dropped; 
#  This individuals  file has only 35 individuals. 
#  Since 35<42 (the requested pedigree size), all will be read).
***************************************************
302   1  1  1 3 2  1  1 1  105.945
306   1  2  1 3 1  1  1 1  99.822
307   1  0  1 4 2  1  1 1  111.696
308   1  0  0 0 0  1  1 0  999.5
<lines omitted here>
5080  1  0  0 0 0 -1 -1 0  999.5
601   1  0  1 4 2  1  1 1  112.285
5160  1  0  1 3 1 -1 -1 0  97.043
5150  1  0  1 3 1 -1 -1 1  97.043
602   1  0  1 4 2  0  1 1  105.991
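
The record layout described in the file comments (2 names, 7 integers, 1 real) can be read with a few lines of Python. This is a minimal sketch and not part of MORGAN: the names `Individual` and `parse_individual` are hypothetical, and it assumes whitespace-separated fields and that a real with integer part 999 marks an unobserved quantitative trait, as the comments above state.

```python
from typing import NamedTuple, Optional, Tuple

class Individual(NamedTuple):
    name: str                      # first "name": the individual's identifier
    component: str                 # second "name": the component number
    integers: Tuple[int, ...]      # the integer fields, in file order
    quant_trait: Optional[float]   # None when coded as unobserved (999.x)

def parse_individual(line: str, n_int: int = 7) -> Individual:
    """Parse one record laid out as: 2 names, n_int integers, 1 real."""
    fields = line.split("#", 1)[0].split()   # drop any trailing comment
    if len(fields) != 2 + n_int + 1:
        raise ValueError(f"expected {2 + n_int + 1} fields, got {len(fields)}")
    integers = tuple(int(x) for x in fields[2:2 + n_int])
    value = float(fields[2 + n_int])
    # Numbers with integer part 999 code for "unobserved".
    quant = None if int(value) == 999 else value
    return Individual(fields[0], fields[1], integers, quant)

rec = parse_individual("302   1  1  1 3 2  1  1 1  105.945")
missing = parse_individual("308   1  0  0 0 0  1  1 0  999.5")
print(rec.name, rec.integers[3], rec.quant_trait)
print(missing.quant_trait)
```

Here `rec.integers[3]` is the fourth integer, the discrete trait phenotype that a statement such as ‘input pedigree record trait 1 integer 4’ refers to.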

Note that the input/output extra file is used to process the file of IBD graphs that were produced (normally by gl_auto) in the analysis of marker data. In this example the file is a gl_auto output scores file, input here as an extra file: ‘ped47_fgl.oscor’. This file contains 100 ibd graphs on 47 individuals:

  101, 102, 201, 202, 2010, 301, 302, 304, 2020, 305, 306, 307, 308,
 3010, 404, 3040, 405, 406, 407, 408, 3050, 410, 411, 412, 3080, 414,
  415, 416, 4040, 505, 506, 507, 508, 4050, 509, 510, 511, 512, 4080,
  513, 514, 515, 516, 5080, 601, 5150, 602.

One reason for including this example is to show how the program deals with discrepancies between the list of individuals in the individuals file and those in the ibd graphs file. Comparing with the 35-member individuals file:

5160 is in the individuals file but not in the output scores file; it will be dropped.
About 11 individuals are in the above list but not in the individuals file; these will be assumed unobserved, and dropped.
Another 7 individuals are in both the individuals file and the above list, but the trait data indicate that they are unobserved; they will be dropped.

There will remain 27 individuals who will be in the reduced ibd graph file created by gl_lods.
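
The bookkeeping above amounts to a few set operations. The following sketch only illustrates that logic: the function `reconcile` is hypothetical, the identifiers are toy data rather than the ped47 files, and gl_lods itself may order or report these steps differently.

```python
def reconcile(file_ids, observed_ids, graph_ids):
    """Classify individuals by where they appear and whether trait data exist."""
    file_ids, observed_ids, graph_ids = set(file_ids), set(observed_ids), set(graph_ids)
    in_both = file_ids & graph_ids
    not_in_graphs = file_ids - graph_ids   # in the file only: trait data ignored
    not_in_file = graph_ids - file_ids     # in the graphs only: assumed unobserved
    unobserved = in_both - observed_ids    # in both, but with no trait data
    retained = in_both & observed_ids      # kept in the reduced ibd graphs
    return not_in_graphs, not_in_file, unobserved, retained

# Toy data (hypothetical).
graph_ids = ["101", "102", "201", "302", "306", "308", "5150"]
file_ids = ["302", "306", "308", "5150", "5160"]
observed_ids = ["302", "306", "5150", "5160"]

not_in_graphs, not_in_file, unobserved, retained = reconcile(
    file_ids, observed_ids, graph_ids
)
print(sorted(not_in_graphs), sorted(not_in_file), sorted(unobserved), sorted(retained))
```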

Finally we require a trait-model specification; lod scores are computed under this model. Note that the input pedigree record statement is used for the individuals file just as it would be for a pedigree file. The example file ‘ped47_D.par’ provides the model for a discrete trait:

     output overwrite scores file  "./ped47_gl_D.scores"
     set trait 1 data discrete
     input pedigree record trait 1 integer 4
     set trait 1 for tloc 11 incomplete penetrances 0.1 0.6 0.9

The file also specifies the output scores file ‘./ped47_gl_D.scores’: the log-likelihood contributions from each IBD graph will be saved to this file.

Alternatively, the example file ‘ped47_Q.par’ provides the model for a quantitative trait:

     output overwrite scores file  "./ped47_gl_Q.scores"
     set trait 1 data quantitative
     input pedigree record trait 1 real 1
     set trait 1 for tloc 11 genotype means 90.0 100.0 110.0
     set trait 1 residual variance 25.0
     set number of permutations 5

This example differs from the previous one, in that a permutation-based scheme is used to provide a locus-specific base lod score conditional on the IBD at that locus: see [GT15].

The simCmps example: The program gl_lods will also run on data sets containing multiple components. Provided the collection of ibd graphs is generated by component, the gl_lods program will compute a lod score for each component, as does lm_linkage. This example is based on a pedigree file and data originally used as an lm_linkage gold standard. It consists of three almost identical pedigree components with very similar data. The data in ‘simCmps.ind’ used as a gold standard in the MORGAN ‘Lodscore/Gold’ directory differ slightly from those in the file of the same name under ‘MORGAN_Examples/Lodscores’, and so therefore do the output lod scores, but functionally the two are identical. The parameter file for gl_lods is as follows:

set printlevel 5

input individuals file               "./simCmps.ind"

# files for processing IBD graphs
input extra file                     "./simCmps_fgl.oscor"
output overwrite extra file         "./simCmps_fgl.reduce"

# File for outputting the log-likelihood contributions
output overwrite scores file         "./simCmps_gl.scores"

# The number of ibdgraphs provides the number of replicates in the gl_auto
# output scores file.
set maximum 100 ibdgraphs per component

# This map test tloc statement indicates at which markers lodscores are to be
# calculated.
map test tloc 11 at markers 1 3 5 6 9 10

select trait 1
set trait 1 data quantitative
input pedigree record trait 1 reals 1

set trait 1 tloc 11
set tloc 11 allele freqs 0.3 0.7

set trait 1 for tloc 11 genotype  means 485.0 500.0 515.0
set trait 1 residual  variance  10.0
set trait 1 additive  variance  10.0

# Provide externally computed base log likelihoods for each component.
set base log likelihoods -24.56078 -26.30071 -24.85028

The parameter statements are mostly standard specifications familiar from lm_linkage. There are two main differences: (1) a file of ibd graphs and an individuals file are specified rather than a pedigree file and marker data, and (2) in order for gl_lods to convert its log-likelihood contributions to an estimated lod score, a "base log likelihood" must be provided for each component. This should be the log of the probability of the trait data under the model; this probability forms the denominator of the likelihood ratio underlying the lod score. It may be computed using the base_trait_lods program: See Running the base_trait_lods program.

As in the previous example, the simCmps.ind file contains the (potential) trait data on 156 individuals (3 sets of 52); in fact only a total of 55 are observed for the quantitative trait. The file simCmps_fgl.oscor contains 100 ibd graphs on the 159 individuals of the original lm_linkage simCmps example; this includes the 156 individuals of the current example. Note that the ordering of the IBD graphs is by component: first 100 graphs for the first component simL_.., then 100 for simJ_.., and finally 100 for simK_...

See Concept Index for: ibd graph, base log likelihood, individuals file.

11.7 Running `gl_lods` examples and sample output

ped47 example: We show first some features of the ped47 example, which exists only in the gold standards files. The gl_lods gold standards may be run as make gold.6 in ‘MORGAN/Lodscore/Gold’ or the output ped47_gl_lods_[D/Q].gold files may be consulted. We describe here the output file ‘ped47_gl_lods_D.gold’.

The program first summarizes the input information in the usual way:

 Trait data are discrete

 Discrete data at trait locus are to be read from integer 4
    in each individual's record

 Incomplete penetrances for the trait are:
        genotype  code  penetrance
        --------  ----  ----------
           (1 1)     0       0.100
    (1 2), (2 1)     1       0.600
           (2 2)     2       0.900


 Number of IBD graphs requested is 1000

 Number of batches for MC variance estimation is 20

 === Checking of parameter statements completed ===

Note that the number of batches for MC variance estimation is the default, as no value was provided in the parameter file; this setting is not used by gl_lods.

Next the information from the individuals file is checked, analogous to pedigree file checking. Since there is no pedigree, only the input genders, if any, will be scored. Also note that we specified (up to) 42 individuals, but there are only 35 in the file.

 Opened input individuals file "./ped47.ind"

 35 individuals read

 42 individuals expected  (W)

 Component information:

    size  females    males  unsexed    first

      35        1        1       33      302

 === Checking of individuals file completed ===

Then follows the usual MORGAN summary of the phenotypes of individuals:

 Opened input individuals file "./ped47.ind"

 Pedigree records contain:  names, 7 integers, 1 real number

 Trait data:

 Component 1:

    phenotype 2:
    302 307 406 407 408 411
    414 416 505 507 508 511
    512 513 514 516 601 602

    phenotype 1:
    306 404 410 412 415 506
    509 510 5160 5150

Finally it reports on the processing of the ibd graphs, including the result of the IBDgraph determination of equivalence classes. It also finds the maximum number of FGL it will need: 26 in this example. It also provides ‘NLoci’, which is one more than the number of locations for which log-likelihoods will be computed.

 Number of individuals in IBD graphs file is 47
 1000 graphs were requested, but only 100 were given (W)

 Observed individual 5160 is not in the input IBD graphs file:
   The trait data on 5160 will be ignored (W)

 Number of individuals in nghd structure is 35
 Have alloced gen_pen nFGL=26, NLoci=6

 Opened output scores file "./ped47_gl_D.scores"

 ====== Computing LodScore for component 1 ======

 Number of observed individuals in each IBD graph for component 1 is 27
 Grouped 1292 locations into 988 equivalence classes

 Log likelihoods for 100 ibd graphs at markers
   2  4  5  8  10
 will be printed to the output scores file

Then the program produces the log-likelihood contributions at each of the 5 marker locations for each of the 100 reduced ibd graphs. Note the final warning:

gl_lods is not able to provide lod scores unless the user provides a base log
likelihood using the "set base log likelihood X1 ..." parameter statement (W)

In order to produce lod scores, rather than only log-likelihoods a base value is needed – corresponding to the marginal probability of trait data in the normal lod score computation. Since gl_lods does not have pedigree information, it is unable to compute this value, and it must be supplied via a parameter statement. It can be computed by running base_trait_lods. Note also that the log-likelihood contributions are base-e. The base log-likelihood to be input, and the final lod scores produced, are base-10.
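
The change of base and the role of the base value can be illustrated with a short sketch. It is not the gl_lods code: the function `lod_from_contributions` is hypothetical, the input values are invented, and the sketch assumes that the estimate is the base-10 log of the average likelihood over ibd graphs minus the base-10 base log-likelihood.

```python
import math

def lod_from_contributions(loglik_e, base_log10):
    """Estimate a lod score from base-e log-likelihood contributions.

    loglik_e   : one natural-log likelihood per ibd graph at a given position
    base_log10 : base-10 log probability of the trait data under the model
    """
    m = max(loglik_e)
    # log-mean-exp: subtract the maximum before exponentiating to avoid underflow
    log_mean_e = m + math.log(sum(math.exp(x - m) for x in loglik_e) / len(loglik_e))
    # convert the averaged log-likelihood from base e to base 10, then normalize
    return log_mean_e / math.log(10) - base_log10

contribs = [-60.2, -58.9, -61.5, -59.4]   # hypothetical base-e contributions
print(lod_from_contributions(contribs, base_log10=-26.0))
```

Averaging on the likelihood scale rather than the log scale is the assumption that matters here; the maximum is subtracted only for numerical stability.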

For the quantitative trait data and model, the gold-standard output file is ‘ped47_gl_lods_Q.gold’. The format of this file is identical to that for the discrete trait, except that now the data and trait model are for a quantitative trait. Additionally this example uses a permutation-based approach to produce the analogue of a lod score; see [GT15]. The gold standard uses only 5 permutations; in practice many more would be used.

 Permutation lod scores will be computed

   Version 1 of permutation lods

 Component 1 lod scores found by gl_lods:

 Marker number      LodScore    StdErr

      marker-2        2.0274
      marker-4        2.7382
      marker-5        3.8031
      marker-8        1.5405
     marker-10        2.4196

   Version 2 of permutation lods

 Component 1 lod scores found by gl_lods:

 Marker number      LodScore    StdErr

      marker-2        1.3984
      marker-4        1.9594
      marker-5        3.3204
      marker-8        1.2037
     marker-10        2.0212

This permutation approach is still being tested: two alternative estimates are presented. Note that although this approach produces a “LodScore”, the interpretation is not as for a classical lod score; see [GT15].

simCmps example: This example is also part of the gold.6 gold standard, but may also be run in the Lodscore subdirectory of ‘MORGAN_Examples’. (Note that the ‘MORGAN_Examples’ version is from MORGAN version 3.3; it is slightly modified in the gold standards for version 3.3.2.) The command is:

./gl_lods simCmps_gl_lods.par > simCmps_gl.out

The results in the output file ‘simCmps_gl.out’ are very similar to those for the previous example. The section summarizing the ibd graphs is

 Opened input extra file "./simCmps_fgl.oscor"

 Number of individuals in IBD graphs file is 159

 Number of individuals in nghd structure is 156
 Have alloced gen_pen nFGL=96, NLoci=7

 Opened output scores file "./simCmps_gl.scores"

Then for each component the log-likelihoods are computed, but this time base log-likelihoods were provided in the parameter file. The log-likelihoods are therefore converted to estimated lod scores at the 6 requested marker positions. For component 1:

====== Computing LodScore for component 1 ======

 Number of observed individuals in each IBD graph for component 1 is 19
 Grouped 1260 locations into 544 equivalence classes

 Lod scores for 100 ibd graphs at markers:
.......

 ====== Computing LodScore for component 1 ======

 Number of observed individuals in each IBD graph for component 1 is 18
 Grouped 1260 locations into 536 equivalence classes

 Log likelihoods for 100 ibd graphs at markers
   1  3  5  6  9  10
 will be printed to the output scores file


 Component 1 lod scores found by gl_lods:

 Marker number      LodScore    StdErr

      marker-1       -6.9215
      marker-3       -7.2357
      marker-5        3.1253
      marker-6        3.4153
      marker-9       -4.8130
     marker-10       -4.7849

This is then repeated for components 2 and 3. There is no Monte Carlo within this non-permutation version of gl_lods. However, even if the three pedigree components were identical, there is Monte Carlo variation in the generation of the ibd graphs by the gl_auto program.

See Concept Index for: base log likelihood.

11.8 Running the `base_trait_lods` program

The base_trait_lods program provides the probability of trait data on a pedigree. (Previously, this information had to be extracted from a dummy run of the lm_linkage program.) The output base log-likelihoods may be input to gl_lods, enabling it to compute a classical normalized lod score; See Parameter files for the gl_lods program.

Typically the computation will be an exact single-locus computation by pedigree peeling. However, in the event the pedigree is too large or complex for exact computation a Monte Carlo alternative is provided. An example of each is provided in the ‘Lodscore/Gold’ directory, and they may be run as make gold.7.

Exact computation

The input required for exact computation is a pedigree, trait data, and a specification of the trait model:

 input pedigree file  "./xact.ped"
 output scores file   "./bt_lods_pp.scores"

 input pedigree record trait 1 integer 8
 set trait 1 data discrete

 select trait 1
 set trait 1 tloc 15
 set tloc 15 allele freqs 0.5 0.5

 set trait 1 for tloc 15 incomplete penetrances 0.05 0.6 0.95

The output is self-explanatory: a base log likelihood is computed for each pedigree component. In this example there are two components, and the base log-likelihoods are also printed to the output scores file:

 1 202   -3.326957
 2 408   -1.563501

Note that the name of an index individual is also given, so that components can be correctly identified. The base log-likelihoods can now be read directly from this file into the gl_lods program.

Monte Carlo computation

Descent is simulated on the pedigree, and the realized IBD is used to compute a Monte Carlo estimate of the base log-likelihood using the gen_pen routine. To avoid problems with underflow on large pedigrees, some preliminary “pseudo-prior” iterations are used to provide a crude base estimate. This is then used in each term, and factored out at the end of the computation. The only difference in the parameter file is that sampler seeds are specified, and there are two Monte Carlo requests:

 set sampler seeds  0x53f78285 0xdfbca001 # Replace by seed file for real runs.
 # input seed file "sampler.seed"
 # output overwrite seed file "sampler.seed"
 # Monte Carlo setup and requests

 set MC realizations 3000
 set pseudo_prior iterations 150

The output is again self-explanatory. Note that, in addition to a Monte Carlo standard error, the 5th and 95th percentiles of the log-likelihood contributions are provided. Since the distribution of contributions is often highly skewed, the quantiles often provide a better measure of precision than does the standard error.

====== Base trait LodScore for component 1 estimated using Monte Carlo ======

 Now starting 150 preliminary realizations
 Crude log-likelihood from preliminary realizations = -8.047451e+00
 Beginning 3000 realizations of Monte Carlo
 Monte Carlo realizations completed

 Number of      Monte Carlo    Monte Carlo     5th %       95th %
 realizations   lod score      StdErr          quantile    quantile

       3000         -3.3298         0.0075      -4.0353     -2.8788

The estimated base log-likelihoods are printed to the output scores file as before. Note that in this very small example, there is almost no difference between the exact base log-likelihoods and the Monte Carlo estimates. However, in general this will not be so; where exact computation is feasible it is to be preferred.

 1 202   -3.329781
 2 408   -1.565421
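
The use of a crude preliminary estimate to avoid underflow, described above for the Monte Carlo computation, can be sketched as follows. This is an illustration under stated assumptions and not the base_trait_lods implementation: `mc_log_likelihood`, `log_mean_exp` and `toy_loglik` are hypothetical names, and the toy sampler merely stands in for the likelihood of the trait data given one simulated IBD realization.

```python
import math
import random

def log_mean_exp(values):
    """Log of the mean of exp(values), computed without underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def mc_log_likelihood(sample_loglik, n_prelim, n_main, rng):
    """Monte Carlo base-e log-likelihood with a crude offset factored out."""
    # Preliminary realizations give a crude estimate of the scale of the terms.
    crude = log_mean_exp([sample_loglik(rng) for _ in range(n_prelim)])
    total = 0.0
    for _ in range(n_main):
        # Each term is divided by exp(crude), so it stays in a representable range.
        total += math.exp(sample_loglik(rng) - crude)
    # The crude estimate is factored back in at the end.
    return crude + math.log(total / n_main)

def toy_loglik(rng):
    # Hypothetical stand-in; exp(-800) alone would underflow to 0.0.
    return -800.0 + rng.gauss(0.0, 1.0)

rng = random.Random(12345)
estimate = mc_log_likelihood(toy_loglik, n_prelim=150, n_main=3000, rng=rng)
print(estimate / math.log(10))   # the same estimate expressed in base 10
```

For this toy sampler the exact value is -799.5 in base e, so the printed value should lie close to -799.5 / ln(10).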

11.9 Location lod scores statements

New statements for these programs include maps for test positions, and parameters for some additional MCMC algorithms.

See Concept Index for: location lod scores statements, gl_lods statements, lm_linkage statements, lm_bayes statements.

11.9.1 Location lod scores computing requests

For the ‘select’ statement for your MCMC simulation, see Autozyg computing requests. Select all or some of the markers and ‘trait 1’, and map this trait to a ‘tloc 1’ (this is the trait locus to be assigned varying test positions).

See Concept Index for: location lod scores computing requests.

11.9.2 Location lod scores file identification statements

All Lodscore programs use the general MORGAN file identification statements (see File identification statements) and the Autozyg rescue file statements (see Autozyg file identification statements).

One additional statement is optional for lm_bayes:

output Rao-Blackwellized estimates file: If this file is specified, the set of Rao-Blackwellized lod score estimates at each trait position is written at the frequency specified in the ‘compute scores’ statement.

The same standard output file specifications are available for lm_twoqtl. In particular:

output extra file: This statement is used by lm_twoqtl to output batch mean estimates used in computing the estimated Monte Carlo standard error.

See Concept Index for: location lod scores file identification statements, Rao-Blackwellized estimates.

11.9.3 Location lod scores pedigree file description

Most Lodscore programs use the general MORGAN pedigree file description statements (see Pedigree file description statements). However the gl_lods program uses an individuals file rather than a pedigree file (see Individuals file).

The following statement used by gl_lods is the individuals-file analogue of the corresponding pedigree file statement:

input individuals record names 2 [integers I] [reals J]: The two "names" consist of the name and component number of the individual.

In addition, there is a statement optional for lm_linkage and gl_lods:

input pedigree record traits K1 K2 … reals X1 X2 …: This statement is analogous to ‘input pedigree record traits K1 K2 … integers I1 I2 …’ (see Pedigree file description statements) when the trait is quantitative, rather than discrete. It allows the file locations of multiple traits to be defined, although only one can be selected for any analysis.

See Concept Index for: location lod scores pedigree file description, quantitative trait.

11.9.4 Location lod scores output file description

All Lodscore programs use the Autozyg output file description statements; See Autozyg output file description.

See Concept Index for: location lod scores output file description.

11.9.5 Location lod scores mapping model parameters

See genedrop mapping model parameters, for statements specifying the genetic map for the markers.

The following statements describe the hypothesized trait locus (tloc) positions which are to be ‘tested’. That is, these are the positions at which lod scores will be computed.

map [chromosome I] [gender (M | F)] test tloc L1 all interval proportions X1 X2 …: Interval proportions specify the proportional genetic distance between markers for the trial positions for the test trait locus. The same ratios are used between each marker pair, regardless of the genetic distance (in cM) between the markers.
map [chromosome I] [gender (M | F)] test tloc L1 intervals J1 … proportions X1 X2 …: This statement specifies interval proportions, but between specific pairs of markers. Interval 1 is between markers 1 and 2, interval 2 is between markers 2 and 3, etc.
map [chromosome I] [gender (M | F)] test tloc L1 (beginning | ending | external) ([Kosambi] distances | recombination fractions) X1 X2 …: This statement specifies trial trait positions on the chromosome before the first marker and/or after the last marker.
map test tlocs L1 ... no default [interval proportions| external positions]: This pair of statements is used to eliminate computation of lod scores at default interval and/or external positions on the active chromosome.
map [chromosome I] test tloc L at markers J1 ...: This statement (new with MORGAN 3.0) is increasingly used with denser SNP marker data. If used, lod scores will be computed only at the positions of the specific markers. Note the marker indexing is by the count in the marker data file, not by selected marker.
map [chromosome I] test tlocs L1 L2 jointly at markers J11 J12 ...: This statement (not yet implemented) will allow two-locus lod score programs such as lm_twoqtl to compute lod scores only at any specified combination of marker positions rather than, as currently, on a grid.
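
As an illustration of the interval-proportions statements above, the following sketch places test positions at fixed proportions of each inter-marker interval. The function `interval_test_positions` and the marker map are hypothetical, and the sketch assumes that proportions are applied linearly in genetic distance (cM); it does not model external positions or gender-specific maps.

```python
def interval_test_positions(marker_cm, proportions):
    """Test positions (cM) at the given proportions of each inter-marker interval."""
    positions = []
    for left, right in zip(marker_cm, marker_cm[1:]):
        for p in proportions:
            # the same proportion is used in every interval, whatever its length
            positions.append(left + p * (right - left))
    return positions

markers = [0.0, 10.0, 20.0]   # hypothetical marker positions in cM
print(interval_test_positions(markers, [0.25, 0.5, 0.75]))
```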

See Concept Index for: location lod scores mapping model parameters, trait test positions, map function.

11.9.6 Location lod scores population model parameters

See genedrop population model parameters, for statements specifying the allele frequencies for the markers and trait loci, and see Autozyg population model parameters, for statements specifying marker names.

See Concept Index for: location lod scores population model parameters.

11.9.7 Location lod scores computational parameters

See ibddrop statements, for setting the sampler seeds.
See Autozyg computational parameters, for specifying the marker data.
See Autozyg computational parameters, for specifying the trait data as genotypic, quantitative or discrete and for specifying penetrances when trait data are discrete.
See genedrop computational parameters, for setting genotype means for each tloc in the case of a quantitative trait.

The following additional statements are specific to lod score computations:

set maximum I ibdgraphs per component

This statement is used by gl_lods which needs to know the upper limit of the number of ibd graphs per pedigree component provided as input.

set base log likelihood X1 X2 ...

This statement is used by the program gl_lods. The log of the marginal probability of the trait data on each pedigree component must be given to enable gl_lods to normalize its lod scores. These values may be computed using the base_trait_lods program: See Running the base_trait_lods program.

set number of permutations N

This statement may be used by the program gl_lods. At each location at which a log-likelihood is computed given the realized IBD, trait-data permutation provides a normalizing null term conditional on that same IBD structure. This permutation approach is still being studied; see [GT15].

simulate N ibd realizations

This statement may be used by the program base_trait_lods, which provides a probability of trait data on a defined pedigree, under a specified trait model. If the pedigree is too complex for single-locus exact pedigree peeling, a Monte Carlo estimate may be obtained by realizing IBD on the pedigree. This statement provides the number of realizations to be used.

set pseudo-priors X1 X2 ...

This statement is optional for lm_bayes. The number of pseudo-priors is the number of test trait locus positions plus one. The first pseudo-prior is for the unlinked position; this should be assigned a positive value. All other pseudo-priors must be positive or zero. The set of pseudo-priors need not be normalized.

This statement is also optional for base_trait_lods. If the base_trait_lods base log-likelihood is to be estimated by Monte Carlo, then this statement gives the number of realizations on which a crude estimate is made. This estimate is then used in the main iterations to lessen the chance of overflow or underflow.

set I batches MC variance estimation

This statement is optional for lm_linkage, lm_twoqtl and gl_lods. These programs batch scored realizations in order to provide a Monte Carlo estimate of the standard deviation in estimating the lod score. This statement determines the batch size, and hence the number of batches. By default it is determined such that there are 20 batches.
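
The batch-means idea can be sketched generically. The function `batch_means_se` below is hypothetical and operates on any sequence of scored realizations; how the MORGAN programs carry the batch means through to the lod scale is not shown here.

```python
import math

def batch_means_se(values, n_batches=20):
    """Batch-means standard error of the mean of a (possibly correlated) sequence."""
    batch_size = len(values) // n_batches
    if batch_size < 1:
        raise ValueError("need at least one value per batch")
    # mean of each consecutive batch; any leftover values at the end are ignored
    means = [
        sum(values[b * batch_size:(b + 1) * batch_size]) / batch_size
        for b in range(n_batches)
    ]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)

values = ([0.0] * 5 + [2.0] * 5) * 10   # hypothetical sequence of 100 scores
print(batch_means_se(values))
```

Batching consecutive realizations lets the spread of the batch means absorb the serial correlation of the MCMC output, which a naive standard error would ignore.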

The following additional statements are specific to tloc specification and likelihood computation for the program lm_twoqtl.

set traits K1 ... multiple tlocs L1 ...: This statement is used by the lm_twoqtl program to specify the tlocs L1... that contribute to each trait K1... A statement may be provided for each separate trait. However, the lm_twoqtl program expects selection of one trait with either one or two contributing tlocs.
set trait K1 for tlocs L1 L2 joint genotype means X11 X12 X13 X21 X22 X23 X31 X32 X33: This statement specifies the 9 genotypic means (3x3 matrix) for tlocs L1 and L2 in contributing to trait K1. The first index on X refers to the L1 genotype and the second to L2.
use [exact|MC] summation for trait: This statement specifies whether exact or Monte Carlo (MC) will be used by lm_twoqtl for computation of the trait contribution to the lod score. Exact summation can be used only on pedigrees with six or fewer founders.
simulate I IBD realizations for trait: If Monte Carlo summation is to be used, this statement specifies the number of realizations of tloc inheritance to be used. If Monte Carlo summation is not to be used, this statement is ignored.
use multiplier I realizations for null: This statement specifies how many times as many realizations are to be used in estimating the base-line unlinked lod-score. To obtain accurate lod-score estimates it is important that this value is accurate, and it may therefore be advisable to use more realizations.
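
The ordering of the nine joint genotype means can be made concrete with a small sketch. The function `joint_means_matrix` and the numeric values are hypothetical; the only thing taken from the statement description is that the first index refers to the L1 genotype and the second to the L2 genotype.

```python
def joint_means_matrix(values):
    """Arrange X11 X12 X13 X21 ... X33 as means[g1][g2] (zero-based indices)."""
    if len(values) != 9:
        raise ValueError("expected 9 joint genotype means")
    return [list(values[3 * i:3 * i + 3]) for i in range(3)]

# Hypothetical means, listed in statement order X11 X12 X13 X21 ... X33.
means = joint_means_matrix([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
print(means[0][2])   # X13: first L1 genotype, third L2 genotype
print(means[2][0])   # X31: third L1 genotype, first L2 genotype
```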

See Concept Index for: location lod scores computational parameters, pseudo-prior iterations.

11.9.8 Location lod scores MCMC parameters and options

All the statements described in MCMC parameter statements for specifying the MCMC parameters are used by the location lod scores programs.

Please see that section for details regarding:

use (locus-by-locus sampling | sequential imputation) for setup
use I sequential imputation realizations for setup

set MC iterations I
set burn-in iterations I
sample by (scan | step)
set L-sampler probability X

check progress I MC iterations

As with the Autozyg programs, the number of desired MC iterations must be specified, as there is no default value.

set MC iterations I: This statement sets the total number of ‘main’ L- and M-sampler iterations. For lm_linkage, the total MCMC run length is the sum of the number of burn-in iterations and main iterations. For lm_bayes, the total MCMC run length is the sum of the number of burn-in, pseudo-prior (see below) and main iterations.
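
The run-length arithmetic can be sketched in a few lines of Python. The function `total_run_length` is a hypothetical helper, not part of MORGAN. It assumes the lm_bayes default of 50% of the main iterations for the pseudo-prior phase (as described for the pseudo-prior statement in this section) and, as a further assumption, rounds that default down for odd iteration counts.

```python
def total_run_length(program, burn_in, main, pseudo_prior=None):
    """Total number of MCMC iterations implied by the parameter statements.

    program      -- "lm_linkage" or "lm_bayes"
    burn_in      -- value of 'set burn-in iterations'
    main         -- value of 'set MC iterations'
    pseudo_prior -- value of 'set pseudo-prior iterations' (lm_bayes only);
                    None means the documented default of 50% of main
    """
    if program == "lm_linkage":
        return burn_in + main
    if program == "lm_bayes":
        if pseudo_prior is None:
            pseudo_prior = main // 2  # assumption: default rounds down
        return burn_in + pseudo_prior + main
    raise ValueError("unknown program: %r" % program)
```

For example, with 150 burn-in and 3000 main iterations (illustrative numbers), this gives 3150 iterations for lm_linkage and 4650 for lm_bayes under the default pseudo-prior setting.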

Additional statements for lm_bayes include the following:

set pseudo-prior iterations I: Following burn-in, lm_bayes performs iterations to calculate the pseudo-priors. These pseudo-priors are used to encourage the MCMC sampler to visit test positions of low posterior probability. The default number of iterations to compute pseudo-priors is 50% of the number of main iterations specified in the ‘set MC iterations’ statement.
set sequential imputation proposals every I iterations: This option applies to lm_bayes’s pseudo-prior and main MCMC iterations. It allows the MCMC chain to “restart” every Ith iteration. Sequential imputation is used to propose potential restart configurations which are accepted/rejected with Metropolis-Hastings probability.
set test position window I: This lm_bayes statement specifies the window size for the proposed tloc position update in the Metropolis-Hastings algorithm. I is the number of hypothesized trait positions on either side of the current position, with equal weight given to each of the 2*I + 1 trait positions. The default window size is 6.
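
The test position window can be pictured with a minimal Python sketch. The names below are hypothetical, and test positions are represented simply as integer indices into the list of hypothesized trait positions. How lm_bayes treats a window that extends past the ends of that list is not stated here, so the sketch deliberately does not clip or wrap the window.

```python
import random


def window_positions(current, window=6):
    """Indices of the 2*window + 1 candidate positions around `current`."""
    return list(range(current - window, current + window + 1))


def propose_position(current, window=6, rng=random):
    """Draw a proposed test position, each candidate with equal weight."""
    return rng.choice(window_positions(current, window))


candidates = window_positions(10)        # default window size of 6
proposal_weight = 1.0 / len(candidates)  # equal weight per candidate
```

With the default window size of 6, there are 13 candidate positions, including the current one.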

See Concept Index for: location lod scores MCMC parameters and options, MC iterations, burn-in, sequential imputation proposals.

This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.

11.1 Introduction to `lm_linkage`, `lm_bayes`, `lm_twoqtl`, `gl_lods` and `base_trait_lods`
11.2 Sample parameter files for `lm_linkage` and `lm_bayes`
11.3 Running `lm_linkage` examples and sample output
11.4 Running `lm_bayes` examples and sample output
11.5 Running `lm_twoqtl` examples and sample output
11.6 Parameter files for the `gl_lods` program
11.7 Running `gl_lods` examples and sample output
11.8 Running the `base_trait_lods` program
11.9 Location lod scores statements

11.9.1 Location lod scores computing requests

11.9.2 Location lod scores file identification statements

11.9.3 Location lod scores pedigree file description

11.9.4 Location lod scores output file description

11.9.5 Location lod scores mapping model parameters

11.9.6 Location lod scores population model parameters

11.9.7 Location lod scores computational parameters

11.9.8 Location lod scores MCMC parameters and options
