7.1 Introduction to ibddrop
7.2 Sample ibddrop parameter file
7.3 Running ibddrop example and sample output
7.4 ibddrop statements
See Concept Index for: a priori ibd probabilities, identity by descent, ibd.
7.1 Introduction to ibddrop
ibddrop estimates probabilities of gene identity by descent (ibd), such as
kinship, inbreeding, or multi-gene identities, by Monte Carlo in the
absence of data. Given the pedigree and a genetic map, ibddrop simulates
meiosis indicators and scores them to estimate the ibd probabilities among a
set of gametes. As originally written, the parameter format of ibddrop
was set up to parallel that of the program lm_auto
(see Estimating Conditional IBD Probabilities by MCMC).
The lm_auto program also estimates
ibd probabilities, but does so conditionally on marker and (if requested)
trait data. This format has been retained in the current ibddrop, but it
is important to recognize that in ibddrop ‘markers’ and ‘tloc’
(trait locus) refer only to locations on the chromosome, and not to any
allelic or phenotypic entities.
The simplest example of estimation of ibd probabilities among a set of gametes is the computation of an individual’s inbreeding coefficient. In this example, the set of gametes in question are the maternal and paternal gametes that make up the individual. A set of two gametes can be either ibd or not-ibd. To keep track of ibd status among the gametes, we can label the paternal allele ‘1’. If the two alleles are ibd, the maternal allele would also be labeled ‘1’, and the resulting ibd pattern would be ‘1 1’. If the two alleles are not ibd, the maternal allele would be labeled ‘2’ and the resulting pattern would be ‘1 2’. The individual’s inbreeding coefficient is the probability that the two alleles follow the ‘1 1’ pattern.
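The following Python sketch illustrates the idea of estimating an inbreeding coefficient by simulating descent at a single locus. It is not ibddrop's implementation: the pedigree, the function names and the data layout are all hypothetical, and the meiosis indicator convention (0 copies the parent's maternal gene, 1 the paternal gene) is assumed here only for illustration.

```python
import random

# Hypothetical toy pedigree (not one of the MORGAN examples):
# name -> (mother, father), or None for a founder. Parents precede children.
PEDIGREE = {
    "grandma": None,
    "grandpa": None,
    "sis": ("grandma", "grandpa"),
    "bro": ("grandma", "grandpa"),
    "kid": ("sis", "bro"),
}


def drop_genes(pedigree, rng):
    """Simulate one single-locus realization of descent.

    Returns {name: (maternal_gene, paternal_gene)}, where each gene is the
    label of the founder gene it descends from.
    """
    genes = {}
    next_label = 1
    for name, parents in pedigree.items():
        if parents is None:
            # Each founder carries two distinct, unique genes.
            genes[name] = (next_label, next_label + 1)
            next_label += 2
        else:
            mother, father = parents
            # Assumed meiosis indicator: 0 copies the parent's maternal gene,
            # 1 copies the parent's paternal gene.
            from_mother = genes[mother][rng.randint(0, 1)]
            from_father = genes[father][rng.randint(0, 1)]
            genes[name] = (from_mother, from_father)
    return genes


def estimate_inbreeding(pedigree, individual, n_realizations, seed=1):
    """Fraction of realizations in which the two gametes show pattern '1 1'."""
    rng = random.Random(seed)
    ibd_count = 0
    for _ in range(n_realizations):
        maternal, paternal = drop_genes(pedigree, rng)[individual]
        ibd_count += (maternal == paternal)
    return ibd_count / n_realizations


if __name__ == "__main__":
    print(estimate_inbreeding(PEDIGREE, "kid", 20000))
```

For the offspring of a full-sib mating the exact inbreeding coefficient is 1/4, so the printed estimate should be close to 0.25.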
If there are three gametes in the set, there are five potential ibd
patterns: ‘1 1 1’ (all three gametes are ibd), ‘1 1 2’ (the first
two are ibd and the third is not), ‘1 2 1’ (the first and third are
ibd), ‘1 2 2’ (the last two are ibd), and ‘1 2 3’ (none are
ibd). ibddrop can estimate probabilities of ibd patterns among
up to 10 gametes in a set. ibddrop outputs a probability for each
ibd pattern at each marker.
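The patterns described above can be enumerated mechanically: each gamete either reuses a label already seen or takes the next unused one. The sketch below is only an illustration of that labeling rule (the function name is hypothetical and this is not ibddrop code).

```python
def ibd_patterns(n_gametes):
    """List all ibd patterns for n_gametes gametes, in lexicographic order.

    The first gamete is always labeled 1; every later gamete is either given
    a label that has already been used (ibd with those gametes) or the next
    new label (not ibd with any earlier gamete).
    """
    patterns = []

    def extend(prefix, max_label):
        if len(prefix) == n_gametes:
            patterns.append(tuple(prefix))
            return
        for label in range(1, max_label + 2):
            extend(prefix + [label], max(max_label, label))

    extend([], 0)
    return patterns


if __name__ == "__main__":
    for pattern in ibd_patterns(3):
        print(" ".join(str(label) for label in pattern))
    print(len(ibd_patterns(4)), "patterns for four gametes")
```

The number of patterns grows quickly with the size of the set: 2 for two gametes, 5 for three, 15 for four, and 115975 for ten.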
Gene identity can be scored either for each locus separately, in which case
patterns of identity among up to ten gametes can be scored, or jointly
over a moving window of several loci. If the moving window option is
selected, ibddrop estimates the probabilities of each ibd/non-ibd
pattern at loci across the window, for the specified pair of
gametes.
For MORGAN V3.4, the ibddrop program has been extensively
revised. ibddrop has been changed to
‘simulate ibd realizations’.
See Concept Index for: ibddrop introduction, ibd pattern, meiosis indicators, simulation of descent in a pedigree.
7.2 Sample ibddrop parameter file
See the Concept Index for: ibddrop sample parameter file.
7.2.1 Classic ibddrop parameter file
7.2.2 Using dense marker simulation
7.2.3 Selection against autozygosity
7.2.1 Classic ibddrop parameter file
The example parameter file ‘jv_ped_ibd.par’ in the
‘IBD’ subdirectory of
‘MORGAN_Examples’ has been updated so that it will run under
MORGAN V3.4, as have the example parameter files
‘ibddrop1.par’ and ‘ibddrop2.par’. Details may be found
in ‘README_ibd’ in that same subdirectory.
However, for convenience, we describe here examples in the ‘Gold’
subdirectory of the ‘Genedrop’ program directory.
One such example is the parameter file ‘parIBD_LL’:
    set printlevel 5
    input pedigree file "ped45"
    simulate markers
    simulate tloc 11
    map markers recomb fract .18 .1 .1 .1
    map tloc 11 marker 2 recomb fract .06
    set component 1 scoreset 1 proband gametes 331 0 333 1
    set component 1 scoreset 2 proband gametes 531 0 531 1 331 0 333 1
    set component 2 scoreset 1 proband gametes 3v1 0 3v3 1
    set component 2 scoreset 2 proband gametes 5v1 0 5v1 1 3v1 0 3v3 1
    set component 3 scoreset 2 proband gametes 5w1 0 5w1 1 3w1 0 3w3 1
    set component 3 scoreset 1 proband gametes 3w1 0 3w3 1
    set sampler seeds 0x8a226a51 0xd2978c71
    simulate 20000 ibd realizations
The parameter file specifies the pedigree file name ‘ped45’ and then
requests simulation at markers and at one trait locus. The number of markers
is determined by the map statement: since there are 4 recombination fractions
provided, there will be 5 marker locations in addition to the trait location.
Note that, since there are no data, this is simply a way to specify 6
locations, one of which (the tloc) may play a special role and/or
may be unlinked.
‘ped45’ contains 45 individuals, who are 3 replicates of the JV pedigree.
The file also includes gender and a ‘trait’, but trait data are ignored by
ibddrop. (Other programs which use the trait data,
such as lm_auto, may also use this same pedigree file.
See Estimating Conditional IBD Probabilities by MCMC.)
The two ‘map’ statements specify the genetic map. From the first statement, the recombination fractions .18, .1, .1 and .1 correspond, under the Haldane map function, to genetic distances between the markers of 22.3, 11.2, 11.2 and 11.2 centiMorgans. From the second statement, the trait locus lies between markers 2 and 3, at 6.4 centiMorgans from marker 2 (recombination fraction .06).
The ‘set proband gametes’ statements tell ibddrop which gametes to
score: that is, the gametes among which the ibd probabilities will be
estimated. In this example, the first statement selects, from component 1
(the first family in the data set), the maternal (0) gamete of ‘331’ and
the paternal (1) gamete of ‘333’. The next statement selects a second score
set of four gametes, again from component 1; the remaining statements do the
same for components 2 and 3.
Note that alphabetic characters are allowed in the names of individuals.
In this example the seeds are set directly using the ‘set sampler seeds’ statement (see ibddrop statements). Alternatively, an ‘input seed file’ statement enables the program to use the seeds from a file such as ‘sampler.seed’. An ‘output overwrite seed file’ statement allows the program to replace the contents of the seed file with the newly generated seeds. If this option were omitted, new seeds would be appended to the end of the file when the program finished running.
The number of Monte Carlo realizations is set to be 20,000 by the ‘simulate ibd realizations’ statement.
Note that to compute a multilocus ibd probability, the statement
‘set locus window’ can be used to specify the number of loci to score
jointly. ibddrop has limited functionality for computing
multilocus probabilities: it can only examine two gametes to determine whether
or not the two are ibd.
For an example see the parameter file ‘parIBD_WL’,
where the statement ‘set locus window 3’ is included, and each proband
gamete set consists of two gametes only.
For additional options, including scoring of a specific multi-gamete ibd
pattern over two or more loci, see Sample lm_auto parameter file:
lm_auto has this more general option of scoring multi-gamete ibd
patterns.
See Concept Index for: map function, proband gametes, seeds for sampler, seed file.
7.2.2 Using dense marker simulation
An example of the ’dense’ simulation is given in the ‘Gold’ file dense_markers.par:
    set printlevel 5
    map chromosome 2 markers recomb fract .1 .05 .15 .2
    map chromosome 3 markers recomb fract .04 .12 .1 .1
    map chromosome 4 markers recomb fract .18 .1 .1 .1
    set component 1 scoreset 1 proband gametes 331 0 333 1
    set component 1 scoreset 2 proband gametes 531 0 531 1 331 0 333 1
    set component 2 scoreset 1 proband gametes 3v1 0 3v3 1
    set component 2 scoreset 2 proband gametes 5v1 0 5v1 1 3v1 0 3v3 1
    set component 3 scoreset 1 proband gametes 3w1 0 3w3 1
    set component 3 scoreset 2 proband gametes 5w1 0 5w1 1 3w1 0 3w3 1
    simulate chromosome 4 dense markers
    simulate 100 ibd realizations
    # Tell ibddrop where to read the pedigree from.
    input pedigree file "./ped45"
    # Provide a file name for the ibddrop ibdgraphs results file.
    output overwrite scores file "./dense_markers.ibdgraphs"
    # Set the sampler seeds.
    set sampler seeds 0x8a226a51 0xd2978c71
    # The following select markers parameter statement tells ibddrop
    # where to score and output IBD probabilities at.
    select chromosome 4 markers 1 3 5
Here the pedigree ‘ped45’ and proband gametes are as before. Locations on 3 different chromosomes are given, but just one (here chromosome 4) must be chosen for simulation of ibd. The ‘simulate dense markers’ option specifies that simulation will be by generating recombination breakpoints in a continuum, rather than using marker-to-marker computations, and again a number of ‘ibd realizations’ is requested. An ‘output scores file’ is optionally provided. If it is, the simulated ‘ibdgraphs’ will be output in compact format (see The ibd_class utility). The scoring of ibd patterns over loci and the output of ibd probabilities remain the same as before. Optionally, the ‘select markers’ statement can be used to specify a subset of the original ‘marker’ locations at which to score ibd. If this statement is not present, ibd will be scored at all the mapped locations for that chromosome. The length of the simulated chromosome is determined by the distance between the first and the last selected markers.
Although only a few locations are specified in this small example, the ‘dense’ option is intended for cases where there are many marker locations, so that marker-to-marker simulation is inefficient. If the ‘ibdgraph’ is output, subsets of marker locations may later be selected for scoring without re-simulation. See Sample parameter files for ibd_create; fgl2ibd and fgl2haplo and fgl2dgl. Since scoring is done at each specified location, this part of the computation will be linear in the number of markers selected. With very large numbers of selected markers, ‘printlevel 3’ is suggested as a better option than ‘printlevel 5’, to avoid attempts to print the marker map. The capability of scoring jointly over a moving window is not currently available for the ‘dense’ option.
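As an illustration of what simulating breakpoints in a continuum can look like, the Python sketch below places crossovers along a chromosome as a Poisson process and then reads off the meiosis indicator at any requested positions. This is a sketch under stated assumptions, not ibddrop's algorithm: it assumes no interference (the Haldane model), positions measured in Morgans from the left end, and hypothetical function names.

```python
import math
import random


def simulate_meiosis(length_morgans, positions, rng):
    """Simulate one meiosis on a chromosome of the given genetic length.

    Crossover breakpoints are generated as a Poisson process with rate one
    per Morgan (assumption: no interference). Returns the breakpoints and the
    0/1 meiosis indicator at each requested position (0 <= position <= length).
    """
    breakpoints = []
    x = rng.expovariate(1.0)
    while x < length_morgans:
        breakpoints.append(x)
        x += rng.expovariate(1.0)
    start = rng.randint(0, 1)  # grandparental origin at the left end
    indicators = [
        (start + sum(1 for b in breakpoints if b < pos)) % 2
        for pos in positions
    ]
    return breakpoints, indicators


def haldane_morgans(recomb_fraction):
    """Genetic distance in Morgans for a recombination fraction (Haldane)."""
    return -0.5 * math.log(1.0 - 2.0 * recomb_fraction)


def empirical_recombination(distance_morgans, n_realizations, seed=1):
    """Fraction of simulated meioses recombinant between the two end points."""
    rng = random.Random(seed)
    recombinants = 0
    for _ in range(n_realizations):
        _, (left, right) = simulate_meiosis(
            distance_morgans, [0.0, distance_morgans], rng
        )
        recombinants += (left != right)
    return recombinants / n_realizations


if __name__ == "__main__":
    d = haldane_morgans(0.1)
    print(empirical_recombination(d, 20000))
```

Under these assumptions the empirical recombination fraction between two positions agrees, up to Monte Carlo error, with the recombination fraction that a marker-to-marker simulation would use for the same interval. Once the breakpoints of a meiosis are stored, indicators at any further positions can be read off without re-simulating.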
See Concept Index for: dense markers
7.2.3 Selection against autozygosity
Examples in which selection is imposed at the first location (the ‘tloc’)
are provided in the two ‘Gold’ files
‘ibd_cleo_sparse.par’ and ‘ibd_cleo_dense.par’.
An additional example,
‘cleo_segments.par’, is included in the ‘IBD’ subdirectory
of ‘MORGAN_Examples’.
These examples are so named because they use the very highly inbred
Cleopatra pedigree, ‘cleopatra.ped’. The first of these files is:
    # for running "ibddrop" with selection against autozygosity and against
    # mom-identity, at target locus 0, on Cleopatra pedigree
    # Aug 6, 2017 -- new version for Gold standard
    # Note this version prints the FGL statistics both to file and to output
    # It also prints autozygosity to the extra output file
    #
    input pedigree file 'cleopatra.ped'
    # Include everything in the output file.
    set printlevel 5
    simulate markers
    simulate tloc 11
    map markers distances 1 1 2 3 5 8 13 21 34 55
    map tloc 11 marker 0 distance 1
    set component 1 scoreset 1 proband gametes Ptolemy-VIII 0 Ptolemy-VIII 1 Cleopatra-III 0 Cleopatra-III 1
    set component 1 scoreset 2 proband gametes Berenice-III 0 Berenice-III 1 Cleopatra-VII 0 Cleopatra-VII 1
    set component 1 scoreset 4 proband gametes Cleopatra-V 0 Cleopatra-V 1 Cleopatra-VII 0 Cleopatra-VII 1
    set sampler seeds 0x8a226a51 0xd2978c71
    simulate 100000 ibd realizations
    output overwrite extra file "ibd_cleo_sparse.scor"
    set dummy reals 0.9, 0.5, 0.9
    # selection at 0.9 against autozygosity
    # selection of 0,5 against mom-identity
    # and 0.9 for both
Much of the parameter file is as before, but note that the tloc
at which selection is to be imposed should be at the left end of the set of
markers at which ibd is to be scored.
As before, ibd probabilities among the proband gametes are printed to
the standard output. Since in ibddrop the ‘output scores file’
(if requested) is used to save the realized ibd graphs, the
‘output extra file’ is used to print the autozygosity statistics to file
for possible later analyses.
Selection is imposed at the target tloc by means of three selection
indices input via the ‘set dummy reals’ statement
(see Input extra variables). The three numbers (0.9, 0.5 and 0.9 in
the above example) are viability weights relative to a norm of 1.
The first is for a potentially autozygous offspring (ibd between the
two offspring gametes), the second for a
potential offspring being identical (ibd at both gametes) to the mother,
and the third is for a potential offspring with both characteristics
(so that all four gametes of offspring and mother are ibd). Selection
at the ‘tloc’ is imposed by modifying the segregation at this location.
Descent at the linked markers is then simulated conditional on descent
at the ‘tloc’. Note that selection does not change the pedigree
structure, but only the segregation of DNA within the pedigree. The
Cleopatra pedigree is chosen for the example because its extreme
inbreeding results in major effects of both types of selection.
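The text above does not spell out how the viability weights modify segregation. One plausible reading, sketched below in Python purely for illustration, is that the four possible offspring at the tloc (each parental gene paired with each other) are sampled with probability proportional to their viability weight instead of uniformly. The function names, the data layout and the sampling scheme itself are assumptions, not ibddrop's documented behaviour.

```python
import random


def viability(child, mother, w_auto, w_mom, w_both):
    """Viability weight of a potential offspring at the tloc.

    child and mother are (maternal_gene, paternal_gene) pairs of founder
    gene labels; equal labels mean the genes are ibd.
    """
    autozygous = child[0] == child[1]
    mom_identical = sorted(child) == sorted(mother)
    if autozygous and mom_identical:
        return w_both  # all four genes of offspring and mother are ibd
    if autozygous:
        return w_auto
    if mom_identical:
        return w_mom
    return 1.0  # the norm


def segregate_with_selection(mother, father, weights, rng):
    """Sample an offspring (maternal_gene, paternal_gene) at the tloc.

    Assumption: the four Mendelian outcomes, equally likely without
    selection, are reweighted by their viability.
    """
    outcomes = [(m, f) for m in mother for f in father]
    w = [viability(child, mother, *weights) for child in outcomes]
    return rng.choices(outcomes, weights=w, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    weights = (0.9, 0.5, 0.9)  # values from 'set dummy reals' in the example
    mother, father = (1, 3), (1, 2)
    print([segregate_with_selection(mother, father, weights, rng) for _ in range(5)])
```

With the mother carrying genes (1, 3) and the father (1, 2), the four potential offspring (1, 1), (1, 2), (3, 1) and (3, 2) receive weights 0.9, 1, 0.5 and 1 in this sketch. The pedigree itself is unchanged; only the choice of which genes are transmitted is affected.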
The parameter file for dense marker simulation in this example is
‘ibd_cleo_dense.par’. It differs only in the statements:
    simulate dense markers
    output overwrite extra file "ibd_cleo_dense.scor"
Apart from variation due to randomness, the results for these two versions are identical. Only the method of simulating meiosis across the chromosome differs, with the ‘sparse’ realizations being done marker to marker using recombination fractions, and the ‘dense’ version generating recombination breakpoints.
An alternative is provided in the parameter file
‘cleo_segments.par’. This file was not included in the
MORGAN V3.4 release, but instead is now added to
‘MORGAN_Examples/IBD’. It is very similar to the
‘ibd_cleo_dense.par’ file,
but differs in that the FGL graphs are additionally printed.
The ‘output scores file’ is used for these FGL graphs.
From the FGL graphs,
segments of ibd among any set of gametes can be reconstructed
and scored, rather than scoring marker-by-marker.
See Concept Index for: Cleopatra pedigree, selection against autozygosity, selection against maternal identity.
7.3 Running ibddrop example and sample output
The syntax for running this MORGAN program is:
    <./program> <parameter file> [ > <output file name> ]
where, optionally, ‘>’ redirects the standard output (<stdout>) to an output file instead of to the screen.
For example, the ‘parIBD_LL’ example can be run in the ‘Gold’ subdirectory of ‘Genedrop’ with the following command:
    ../ibddrop parIBD_LL > ibddrop.out
The genetic map specified by the statements ‘map markers recomb fract’ and ‘map tlocs 11 marker 2 recomb fract’ is below. Note the position of the trait locus (T11) with respect to the marker loci.
    Chromosome map
    ..............
    Inter-locus distances in cM, using Haldane map function:

                 T11
    --------------+---------------------
      22.3   6.4    4.8    11.2   11.2
    +------+-------------+------+------+
    M1     M2            M3     M4     M5
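The inter-locus distances in the map above can be reproduced from the recombination fractions in ‘parIBD_LL’ with the Haldane map function, d = -50 ln(1 - 2r) centiMorgans. The short Python sketch below is only a check of that arithmetic; the function and variable names are hypothetical.

```python
import math


def haldane_cm(recomb_fraction):
    """Haldane map distance in centiMorgans for a recombination fraction."""
    return -50.0 * math.log(1.0 - 2.0 * recomb_fraction)


# 'map markers recomb fract .18 .1 .1 .1'
marker_gaps = [round(haldane_cm(r), 1) for r in (0.18, 0.1, 0.1, 0.1)]

# 'map tloc 11 marker 2 recomb fract .06': T11 lies between M2 and M3.
m2_to_tloc = haldane_cm(0.06)
tloc_to_m3 = haldane_cm(0.1) - m2_to_tloc

if __name__ == "__main__":
    print(marker_gaps)                                  # [22.3, 11.2, 11.2, 11.2]
    print(round(m2_to_tloc, 1), round(tloc_to_m3, 1))   # 6.4 4.8
```

The M2 to M3 interval of 11.2 cM is thus split by the trait locus into 6.4 cM and 4.8 cM, as shown in the printed map.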
Since the parameter file contains six ‘set proband gametes’ statements,
ibddrop will produce six sets of results in the output file (here
‘ibddrop.out’).
The exact probability estimates will, of course, depend on the random seed used. Some example results for the second component are detailed below.
    Summary for component 2:

    Probabilities of IBD patterns
    Proband gamete set 1:  3v1 0  3v3 1

    pattern   marker-1  marker-2  tloc-11  marker-3  marker-4  marker-5  label
    1 1        .2526     .2497     .2517    .2509     .2495     .2500      0
    1 2        .7473     .7503     .7483    .7491     .7506     .7500      1

    Probabilities of IBD patterns
    Proband gamete set 2:  5v1 0  5v1 1  3v1 0  3v3 1

    pattern   marker-1  marker-2  tloc-11  marker-3  marker-4  marker-5  label
    1 1 1 1    .0314     .0300     .0309    .0303     .0290     .0294      0
    1 1 1 2    .0284     .0272     .0278    .0288     .0290     .0289      1
    1 1 2 1    .0149     .0144     .0135    .0129     .0137     .0143      3
    1 1 2 2    .0100     .0087     .0092    .0092     .0092     .0092      4
    1 1 2 3    .0263     .0301     .0283    .0277     .0271     .0263      5
    1 2 1 1    .0658     .0649     .0637    .0627     .0631     .0625      6
    1 2 1 2    .0056     .0051     .0060    .0060     .0056     .0063      7
    1 2 1 3    .0589     .0593     .0583    .0590     .0588     .0594      8
    1 2 2 1    .0648     .0678     .0697    .0701     .0714     .0698      9
    1 2 2 2    .0494     .0505     .0503    .0507     .0486     .0493     10
    1 2 2 3    .1366     .1416     .1389    .1386     .1393     .1387     11
    1 2 3 1    .1366     .1348     .1349    .1346     .1330     .1349     12
    1 2 3 2    .0296     .0279     .0265    .0251     .0260     .0280     13
    1 2 3 3    .0961     .0955     .0975    .0979     .0995     .0995     14
    1 2 3 4    .2455     .2420     .2444    .2464     .2469     .2434     15
The probabilities are summarized by the ibd pattern. Each integer in the
pattern represents one of the gametes that ibddrop was asked to score.
Equal numbers indicate gametes that are ibd. For instance, ‘1 1 1 1’
means all four gametes are ibd; ‘1 2 1 1’ means gametes 1, 3, and 4 are
ibd, while gamete 2 is not ibd with the others; ‘1 2 3 4’ means that no
two of the four gametes are ibd.
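Because each pattern records which gametes share a label, the probability that any two gametes of the set are ibd can be recovered by summing over the patterns in which their labels agree. The Python sketch below does this for the marker-1 column of proband gamete set 2 in the sample output; the dictionary and function names are hypothetical and the code is only an aid to reading the table.

```python
# marker-1 column of "Proband gamete set 2: 5v1 0 5v1 1 3v1 0 3v3 1"
MARKER1 = {
    (1, 1, 1, 1): .0314, (1, 1, 1, 2): .0284, (1, 1, 2, 1): .0149,
    (1, 1, 2, 2): .0100, (1, 1, 2, 3): .0263, (1, 2, 1, 1): .0658,
    (1, 2, 1, 2): .0056, (1, 2, 1, 3): .0589, (1, 2, 2, 1): .0648,
    (1, 2, 2, 2): .0494, (1, 2, 2, 3): .1366, (1, 2, 3, 1): .1366,
    (1, 2, 3, 2): .0296, (1, 2, 3, 3): .0961, (1, 2, 3, 4): .2455,
}


def pair_ibd_probability(pattern_probs, i, j):
    """Probability that gametes i and j (0-based positions in the set) are ibd."""
    return sum(p for pattern, p in pattern_probs.items() if pattern[i] == pattern[j])


if __name__ == "__main__":
    # Gametes 3 and 4 of set 2 are the two gametes of set 1 (3v1 0, 3v3 1).
    print(round(pair_ibd_probability(MARKER1, 2, 3), 4))
    # Gametes 1 and 2 are the maternal and paternal gametes of 5v1.
    print(round(pair_ibd_probability(MARKER1, 0, 1), 4))
```

For gametes 3 and 4 the sum is .2527, which agrees to within rounding with the value .2526 reported for the ‘1 1’ pattern of proband gamete set 1 at marker-1.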
The ibd patterns are scored for each locus separately; there is a column for
each of the five markers and one for the trait locus. The final column
‘label’ is a label for the state that can be easily inverted to obtain
the ibd pattern; its main use is internal to the program. However,
in the lm_auto program one may request scoring of specific
ibd patterns by specifying the desired state labels
(see Sample lm_auto parameter file).
To compute multilocus ibd probabilities, say for 3 loci, use the parameter file ‘parIBD_WL’ which contains the line ‘set locus window 3’. The interesting part of the output for component 2 is:
    Probabilities of IBD patterns for windows of 3 loci
    Proband gamete set 1:  5v1 0  5v1 1

    IBD      wndw 1   wndw 2   wndw 3   wndw 4
    0 0 0    .7826    .7680    .7902    .7919
    0 0 1    .0255    .0459    .0473    .0471
    0 1 0    .0712    .0377    .0264    .0249
    0 1 1    .0109    .0374    .0257    .0274
    1 0 0    .0312    .0694    .0488    .0484
    1 0 1    .0496    .0062    .0050    .0047
    1 1 0    .0045    .0161    .0267    .0268
    1 1 1    .0244    .0192    .0300    .0288
This time, ibddrop was asked to compute ibd probabilities in windows
of three loci at a time. The four windows can be seen from the marker map:
(M1, M2, T11), (M2, T11, M3), (T11, M3, M4), (M3, M4, M5).
If the trait locus were unlinked to the marker loci, it would be
placed to the left of the five marker loci on the map. Thus the
first window, ‘wndw 1’, would include the trait locus and the
first two marker loci.
The values in the ‘IBD’ column at the left of the table represent ibd
patterns across the loci of a window. The
pattern ‘0 0 0’ means that the selected gametes are not ibd at any of the
three loci in the window. The pattern ‘0 0 1’ means that the selected
gametes are not ibd at the first two loci in the window, but are ibd at
the third. The values in the columns give the probability of the ibd
pattern at the left for each of the four windows. For example, the probability
that the maternal and paternal gametes of individual 5v1 are ibd at marker
loci 3 and 5, but not at marker locus 4, is 0.0047 (pattern ‘1 0 1’ in ‘wndw 4’).
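The relation between windows, loci and patterns can be made concrete with a small Python sketch that lists the windows implied by the locus order and looks up a pattern in the sample table. The names used here are hypothetical and the code is only an aid to reading the output.

```python
LOCI = ["M1", "M2", "T11", "M3", "M4", "M5"]  # map order in the example

# Sample output: pattern -> probabilities for wndw 1 .. wndw 4
WINDOW_PROBS = {
    (0, 0, 0): (.7826, .7680, .7902, .7919),
    (0, 0, 1): (.0255, .0459, .0473, .0471),
    (0, 1, 0): (.0712, .0377, .0264, .0249),
    (0, 1, 1): (.0109, .0374, .0257, .0274),
    (1, 0, 0): (.0312, .0694, .0488, .0484),
    (1, 0, 1): (.0496, .0062, .0050, .0047),
    (1, 1, 0): (.0045, .0161, .0267, .0268),
    (1, 1, 1): (.0244, .0192, .0300, .0288),
}


def sliding_windows(loci, size):
    """All runs of `size` consecutive loci, in map order."""
    return [tuple(loci[i:i + size]) for i in range(len(loci) - size + 1)]


def window_probability(pattern, window_number):
    """Probability of a 0/1 pattern in window 1, 2, 3 or 4."""
    return WINDOW_PROBS[pattern][window_number - 1]


def marginal_ibd(window_number, position):
    """Probability of ibd at one locus (0-based position) within a window."""
    return sum(p[window_number - 1]
               for pattern, p in WINDOW_PROBS.items() if pattern[position] == 1)


if __name__ == "__main__":
    for number, window in enumerate(sliding_windows(LOCI, 3), start=1):
        print("wndw", number, window)
    print(window_probability((1, 0, 1), 4))
    print(round(marginal_ibd(4, 0), 4))
```

Summing the window-4 patterns that begin with 1 gives the single-locus probability that the two gametes of 5v1 are ibd at M3, about 0.109 in this run.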
The format of output for the dense marker options is effectively identical to
the above: only the simulation of meioses and recombination breakpoints differs.
Example outputs for the beta-test version of ibddrop with
selection may be found in the ‘Gold’ subdirectory of ‘Genedrop’.
The relevant files all contain the string ‘cleo’, since the examples
use the Cleopatra pedigree. The output for this beta-test program will
not be further discussed here.
The alternative of outputting FGL graphs and scoring ibd directly by segments, as mentioned in Selection against autozygosity, is provided in the parameter file ‘cleo_segments.par’. For reference this is included in the ‘IBD’ subdirectory of ‘MORGAN_Examples’. Note that, for compatibility with MORGAN V3.4, it is also necessary to output minimal regular scoring output. However, the intention would be to use the simulated FGL graphs in any downstream analyses.
See Concept Index for: running ibddrop example, ibddrop sample output, ibd pattern.
7.4 ibddrop statements
Note that ibddrop does not simulate or use marker or trait data. The
statements are used only to specify the map of the loci at which descent is
to be simulated and ibd scored. The locations of loci are specified in this
way so that direct comparisons can be made between output of ibddrop and
of lm_auto (see Running lm_auto example and sample output), where
simulation is conditional on marker and trait data.
The additional ibddrop statements are:
simulate [chromosome I] [dense] markers
This statement requests that markers are to be simulated. For ‘ibddrop’ the name ’markers’ refers only to chromosome locations, not to actual alleles or individual genotypes. The number of markers (i.e. locations) is inferred from the marker map. If the option to use dense markers is selected, descent is simulated by creating the recombination breakpoints in a meiosis, rather than simulating inheritance from marker to marker using recombination fractions.
simulate tloc L
This statement, which typically follows the ‘simulate markers’ statement,
establishes the trait locus to be simulated. Note that this trait locus must
be mapped onto the chromosome selected for marker simulation.
map tlocs L1 … unlinked
This statement specifies a trait to be simulated that is not linked to markers. Only one trait can be simulated and this trait will be placed to the left of all markers.
set [component M] proband gametes N1 K1 N2 K2...
In this statement, the user specifies which gametes ibddrop is to score.
Each statement must contain gametes from a single component, as the components
are assumed to be independent, i.e. the probability of ibd between gametes
from different components is zero. Pairs consisting of an individual’s name and
a meiosis indicator are listed, with ‘0’ indicating the individual’s
maternal gamete and ‘1’ indicating their paternal gamete.
In the current version of MORGAN, the number of proband gametes in a set is limited to 10.
set [chromosome I] locus window K
This statement gives the window size (number of loci) for which the multilocus ibd probabilities are scored. If no size is given, each locus is scored separately.
set sampler seeds H1 H2
This statement initializes a pair of seeds for the random number generator. The seeds must be positive and no greater than ‘0xFFFFFFFF’, with the first seed (congruential seed) odd, and the second seed (Tausworthe seed) nonzero. If no seeds are specified, default seeds are used.
simulate K ibd realizations
This statement is required for ibddrop. It sets the number K of Monte Carlo realizations to simulate.
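The constraints stated above for ‘set proband gametes’ and ‘set sampler seeds’ can be checked before a run. The Python sketch below is a hypothetical pre-flight check, not part of MORGAN: the function names are invented, and it does not reproduce how ibddrop itself parses parameter files.

```python
MAX_PROBAND_GAMETES = 10  # limit stated for the current version of MORGAN


def parse_proband_gametes(statement):
    """Return [(name, indicator), ...] from a 'proband gametes' statement."""
    tokens = statement.split()
    items = tokens[tokens.index("gametes") + 1:]
    if not items or len(items) % 2 != 0:
        raise ValueError("expected pairs of individual name and 0/1 indicator")
    pairs = [(items[k], int(items[k + 1])) for k in range(0, len(items), 2)]
    if any(indicator not in (0, 1) for _, indicator in pairs):
        raise ValueError("indicator must be 0 (maternal) or 1 (paternal)")
    if len(pairs) > MAX_PROBAND_GAMETES:
        raise ValueError("at most 10 proband gametes are allowed in a set")
    return pairs


def check_sampler_seeds(seed1, seed2):
    """Raise ValueError if the seed pair violates the stated constraints."""
    for seed in (seed1, seed2):
        if not 0 < seed <= 0xFFFFFFFF:
            raise ValueError("seeds must be positive and at most 0xFFFFFFFF")
    if seed1 % 2 == 0:
        raise ValueError("the first (congruential) seed must be odd")
    # The second (Tausworthe) seed is nonzero because it passed the range check.


if __name__ == "__main__":
    line = "set component 2 scoreset 2 proband gametes 5v1 0 5v1 1 3v1 0 3v3 1"
    print(parse_proband_gametes(line))
    check_sampler_seeds(0x8a226a51, 0xd2978c71)
    print("seeds accepted")
```

The seed pair used throughout the examples, 0x8a226a51 and 0xd2978c71, satisfies these constraints: both are within range and the first is odd.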
See Concept Index for: ibddrop statements, proband gametes, meiosis indicators, seeds for sampler.
This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.