[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
13.1 Introduction to PolyEM programs | ||
13.2 Sample multivar parameter file | ||
13.3 Running multivar example and sample output | ||
13.4 multivar statements |
See Concept Index for: polygenic model, PolyEM, EM algorithm, quantitative trait.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
See References, for details of the cited papers.
PolyEM is a set of programs to evaluate the likelihood and compute MLEs for polygenic models of quantitative traits by EM algorithms. The original versions of these programs were based on the work described in [TS90] and [TS92].
There are four main programs whose features are summarized below:
univar
: This program fits a univariate trait model. It is
primarily for test purposes. The likelihood is computed by three methods
including classical pedigree
polygenic peeling [ES71],
which does not extend to looped pedigrees, by direct inversion of the
covariance matrix, which
does not extend to large pedigrees,
and by a general
matrix elimination peeling method which is the method used by the
other ‘PolyEM’ programs.
unibig
: An extension of univar
to big pedigrees that
implements more efficient methods to compute the polygenic likelihood on
large looped pedigrees.
bivar
: An extension of unibig
for bivariate traits.
multivar
: An extension of bivar
for multivariate traits.
All programs can work with looped pedigrees. The exception is that looped
pedigrees cannot be used for the polygenic peeling algorithm in univar
.
The other programs do not use polygenic peeling to evaluate the likelihood.
Only examples and statement references for multivar
are given since it
has the most complete features. The statements for other programs are similar
with some exceptions. For example, any statements with between-trait
covariances do not apply to univar
or unibig
, since these
deal only with a single trait.
See Concept Index for:
PolyEM introduction,
univar
,
unibig
,
bivar
,
multivar
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
parameter fileThe example pedigree file ‘polyem.ped’ for the PolyEM programs is a 90-member pedigree consisting of two 45-member components. The format is similar to ‘ped73.ped’, which was used in most of the previous examples.
The first three entries in each line consist of the individual’s name, father’s name and mother’s name. Integers starting with the fourth column (usually gender) can be fixed effects (gender, age class, etc.) or discrete phenotypes.
For quantitative traits, real numbers follow the names and integers. These real numbers represent trait measurements. Missing values are coded with integer part ‘999’, such as 999.5 in the following example.
Here is part of the pedigree file ‘polyem.ped’. This file can be found in the ‘PolyEM’ subdirectory of ‘MORGAN_Examples’.
input pedigree size 90 input pedigree record names 3 integers 3 reals 2 **************************************** 1 0 0 1 1 0 0.0246 -1.0125 2 0 0 2 1 0 -0.5978 1.5963 3 0 0 1 1 0 -0.8124 0.5662 4 0 0 2 1 0 0.4334 1.7721 5 1 2 1 1 0 0.1802 -1.4672 6 1 2 1 1 0 -1.7557 0.8091 7 3 4 2 1 0 999.5 999.5 8 3 4 2 1 0 1.9128 0.9780 9 0 0 2 1 0 0.9530 2.3473 ... |
Below is the example multivar
parameter file, ‘polyem.par’.
input pedigree file ‘polyem.ped’ select traits 1 2 set trait 2 effects 1 2 start residual covariance -0.09 start additive covariance -0.0017 start residual variance 1.10 0.65 start additive variance 0.037 0.0288 fit residual covariance 1 fit additive covariance 1 fit environmental model output spacing 20 EM iterations limit EM iterations 200 |
multivar
can fit a polygenic model with one to five traits, which
can be modeled as dependent and/or independent.
One ‘select traits’ statement must be given. The number of integer
values entered as arguments to the statement must be the number of
quantitative traits expected by the program being run. For example, for
programs univar
and unibig
, the ‘select traits’
statement must have
a single integer argument. When running bivar
, two integer arguments
must be given. For multivar
, one to five integer arguments must be
given in the ‘select traits’ statement, for the one to five quantitative
traits selected. Unlike other MORGAN programs, the trait
numbers correspond to the column number of the reals in the pedigree file.
In the example, the statement ‘select traits 1 2’ indicates that
the first two column of real numbers contain the trait data to be analyzed.
The statement ‘set trait 2 effects 1 2’ indicates that the second column of real numbers is to be modeled with two fixed effects (also called covariates). The integers give the location of the fixed effects (covariates) starting with column 4 in the pedigree file. In this example, the fixed effects are to be found in columns 4 and 5. Important: a fixed effect location of ‘1’ indicates that the effect value will be found in column 4 (after the 3 name columns). The most commonly modeled fixed effect is gender, which, if present, resides in column 4 of a MORGAN pedigree file.
The statement ‘start trait mean’ allows the user to specify the starting trait mean for a selected trait. Since no ‘start trait mean’ statement is included after either of the ‘select trait’ statements, both of the initial means are computed by the PolyEm program. Similarly, one may specify the initial values for each effect with the statement ‘start trait I effect M X1 X2 ...’, where ‘I’ is the trait number, ‘M’ is the effect number and ‘Xi’ is the starting value of the ith level (i = 1, 2, 3, ...). These starting values represent deviations from the global mean. The starting values are normalized so that their weighted sum is zero (weighted by the number of individuals in that level). When using the ‘start trait I effect M X1 X2 ...’ it is important to know that if more levels are present in the column of numbers corresponding to the trait ‘I’ in the pedigree file than are specified in the ‘start trait’ statement, PolyEm programs will compute the starting value(s) for these additional levels. Since the program will not issue a warning or error message, it is important to always check the output to confirm that the number of levels present in the file was as intended. Since the ‘start trait I effect M X1 X2 ...’ statement is not included in this example, the PolyEm program will compute the initial values of the effect.
Initial values for additive and residual variances and covariances are specified in the next four statements. These statements are required. With the variance statements, the number of arguments must be the same as the number of traits selected and must be in order of increasing trait number. With the covariance statements, the number of arguments must be the same as the number of pairs of traits selected. See multivar segregation model parameters, for discussion of the ordering of these arguments.
multivar
can also fit a purely environmental model with no genetic
component. The ‘fit environmental model’ statement tells multivar
to
fit a purely environmental model, with no genetic variance. This null
hypothesis model is produced in addition to the genetic/environmental model.
The final two statements specify the number of EM iterations and how often the EM estimates are to be printed out.
Note that one has the opportunity to provide predetermined eigenvalues of the G-matrix of observed individuals. The ‘set eigenvalues’ statement is used to specify the eigenvalues, with the number of values equal to the number of observed individuals. If desired, the eigenvalues can be provided through an input file accessed with a ‘input eigenvalue file’ statement in the parameter file, or through the command line (see multivar computational parameters).
See Concept Index for:
multivar
sample parameter file,
missing quantitative trait data.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The command to run multivar
(unibig
and bivar
have the same
set of options) is:
./multivar parfile [ped pedfile] [eigen eigenfile] |
where parfile is the name of the parameter file and is required. pedfile overrides the ‘input pedigree file’ statement, and eigenfile overrides the ‘input eigenvalue file’ statement in the parameter file.
Under the subdirectory ‘PolyEM/’, run the example by typing:
./multivar polyem.par |
Toward the end of the multivar
output, are the parameter estimates and
the log-likelihood from the last iteration of the EM algorithm. If you chose to
fit a null (purely environmental) model, using the ‘fit environmental
model’ statement, those parameter estimates and log-likelihood are also given. A
likelihood ratio test can then be performed, with test statistic equal to the
absolute value of 2 times the difference between the log-likelihoods of the two
models. A conservative test is provided by comparing the test statistic to a
chi-squared distribution, with the degrees of freedom being the difference in
the numbers of estimated parameters between these two models. Note that in
these ‘Polyem’ programs, model log-likelihoods are in base e
rather than the usual lod score base-10 convention.
iteration #201: additive variance estimates (traits 1, 2) 0.816 0.037 covariances 0.138 residual variance estimates (traits 1, 2) 0.223 0.610 covariances -0.239 trait 1 overall mean -0.063 trait 2 overall mean 1.780 fixed effect 1 -0.717 0.546 fixed effect 2 -1.008 -0.552 1.167 current log-likelihood = -183.098 estimates of environmental model residual variance estimates (traits 1, 2) 1.136 0.642 covariances -0.102 trait 1 overall mean 0.062 trait 2 overall mean 1.801 fixed effect 1 -0.773 0.589 fixed effect 2 -1.010 -0.553 1.170 environmental model log-likelihood = -197.799 |
In this example, we see that the fitted genetic model has a very significantly larger log-likelihood: 2(-183.098 + 197.799) = 29.4, with 3 extra genetic variance parameters fitted. The estimates of the fixed effects are little altered by fitting the genetic model but for trait 1 the additive genetic variance is large relative to the residual variance, indicating a strong genetic component. Note that, for each trait, the sum of the additive and residual variances from the genetic model is close to the residual variance in the environmental model.
See Concept Index for:
running multivar
example,
multivar
sample output.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
statements13.4.1 multivar computing requests | ||
13.4.2 multivar segregation model parameters | ||
13.4.3 multivar computational parameters | ||
13.4.4 multivar computational options | ||
13.4.5 multivar output options |
See Concept Index for:
PolyEM statements,
multivar
statements.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
computing requestsselect traits I1...
One ‘select trait’ statement is required, and must list the
traits to be
modeled. Up to five traits are allowed for multivar
.
The trait number, I, corresponds to the column of real numbers in
the pedigree file, with the first column of real numbers being
trait 1, the second column trait 2 and so on.
set trait I effects M1...
M1... are the fixed effects to be modeled for a specified trait I. They are the integer columns in which they appear in the pedigree file. That is the columns after the three names, so that fixed effect ‘1’ is in the 4th column (usually gender), effect ‘2’ is in column 5, and so on.
See Concept Index for:
multivar
computing requests.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
segregation model parametersstart trait I mean X
There is one statement per trait, specifying the starting value for the mean trait values. The PolyEM programs will compute the initial values if not given.
start trait I effect M X1 X2 …
Starting values for the fixed effect levels for the traits are computed, unless specified in this statement.
start additive variances X1 X2 …
The starting values for the variances of the traits are required. The number of values must be the same as the number of traits selected, in the order of increasing trait number.
start residual variances X1 X2 …
Starting values for residual variances are also required.
start additive (covariances | correlations) X12 …
Starting values for the covariances (or correlations) between the traits are required. They are given the order: X12, X13, …, X1n, X23, …, X2n, …, where Xij is the covariance for the ith and jth selected traits from the pedigree file.
start residual (covariances | correlations) X12 …
See the ‘start additive...’ statements.
See Concept Index for:
multivar
segregation model parameters.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
computational parametersfit additive (covariances | correlations) X12 …
This statement specifies which covariances to be estimated and which to be fixed at 0. The order of values is the same as the ‘start additive covariances’ statement with ’1’ indicating a covariance to be fit and ’0’ a covariance to be fixed.
Note that if trait 1 is correlated with trait 3, and so is trait 2 with trait 3, the correlation between 1 and 2 cannot be zero. So we have to be a bit careful in specifying the correlation structure.
fit residual (covariances | correlations) X12 …
Similar statement for residual covariances.
set eigenvalues X1 X2 …
Optional. This statement is used to provide predetermined eigenvalues of the G-matrix of observed individuals, with the number of values the same as the number of observed individuals.
input eigenvalue file eigenfile
Optional. If present, it overrides the ‘set eigenvalues’ statements.
limit breeding iterations I
This statement specifies the maximum number of breeding–values iterations. The default number currently is 20.
set breeding convergence X
This statement specifies the convergence criterion for breeding–values iterations. The default number is currently is 1.0e-8.
limit EM iterations I
This statement specifies the number of EM iterations. The default number presently is 200. There is no option to specify convergence criterion. If convergence has not been achieved, the final estimates can be used as starting values to rerun the program.
See Concept Index for:
multivar
computational parameters,
observed individuals.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
computational optionscompute eigenvalues
If this statement is present, the values given in either the eigenfile or the statement ‘set eigenvalues’ are ignored and the eigenvalues are computed by the program. This is the default action if no eigenvalues are given.
use (full | partitioned) EM
Use this statement to choose between two iterative procedures in maximum likelihood estimates. With the ‘full EM’ option, the fixed effects, additive and residual variance and covariance are simultaneously updated. This is the default action.
With the ‘partitioned EM’ option, the maximization step is partitioned into two parts. The first part is to maximize the likelihood over additive and residual variances/covariances; the second part over residual variances/covariances and fixed effects. The expectation step is run after each part. Partitioned EM takes more computer time.
fit environmental model
This statement asks a purely environmental model with no genetic variances to be fit, in additional to the genetic/environmental model.
See Concept Index for:
multivar
computational options.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
multivar
output optionsoutput statistics (covariances | correlations)
By default, covariances are printed out.
output final adjusted phenotypes
If this option is specified, trait values adjusted for all fixed effects are computed and output.
output spacing I EM iterations
This statement requests a print out of the EM estimates every Ith iteration. The default number is defined in the program header file.
check gmatrix
This statement requests a print out of the G matrix for observed individuals and quit without doing the likelihood computation.
check ginverse
This statement requests a print out of the G inverse matrix and the program quits unless ‘check eigenvalues’ has also been specified.
check eigenvalues
This statement requests the program to print out of the eigenvalues, whether computed or input, and then to quit. These eigenvalues can then be used as input in subsequent runs.
check eigenvalue computation
This statements causes some comments to be printed by the function that computes the eigenvalues.
check trace
This statements requests the trace of the G–inverse matrix to be printed.
See Concept Index for:
multivar
output options.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.