Version: r1181
Email: browning@uw.edu
The SEQERR program estimates the rate at which homozygous major-allele genotypes are mis-called as heterozygous genotypes at low-frequency markers. This error rate is estimated from observed allele and genotype frequencies at low-frequency variants within identity-by-descent (IBD) segments. IBD segments in sequence data can be detected with the IBDseq program.
If you publish genotype error rate estimates obtained from the SEQERR program, please cite the following reference:
B L Browning and S R Browning (2013) Detecting identity by descent and estimating genotype error rates in sequence data. The American Journal of Human Genetics 93(5):840-851. doi:10.1016/j.ajhg.2013.09.014
To use SEQERR, enter the following command at the command line prompt:
java -jar seqerr.jar arguments
where arguments is a space-separated list of parameters, each expressed as parameter=value.
A chromosome interval is given as chrom or chrom:start-end, where chrom is the chromosome identifier in the VCF files, and start and end are the interval start and end positions. The start or the end value may be omitted if it corresponds to a chromosome end.
The default minimum IBD segment length of 2 Mb (ibdlength=2e6) assumes no genetic map is specified and the data are from an outbred human population. If a map file with cM distances is used with data from an outbred human population, a minimum IBD segment length of 2 cM (ibdlength=2) would be appropriate.
The ibdtrim parameter should be large enough that over-estimation of IBD segment length is negligible. Length units are either base-pair distances or genetic distances, depending on whether a genetic map is specified with the map parameter. The default value of 500 kb (ibdtrim=5e5) assumes no genetic map is specified and the data are from an outbred human population. If a map file with cM distances is used with data from an outbred human population, an ibdtrim value of 0.5 cM (ibdtrim=0.5) would be appropriate.
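To make the role of ibdtrim concrete, here is a minimal sketch that assumes the trim length is removed from each end of a detected IBD segment, so that uncertain segment boundaries are excluded. The trim_segment function is hypothetical and is not SEQERR's implementation; it simply works in whatever units its inputs use (base pairs without a map, cM with a map).

```shell
# Hypothetical illustration (not SEQERR's code): shrink an IBD segment by
# `trim` length units at each end, ASSUMING ibdtrim is applied to both ends.
# Units follow the inputs: base pairs without a map, cM with a map.
# Usage: trim_segment START END TRIM
# Prints "start<TAB>end" of the trimmed segment, or nothing if it vanishes.
trim_segment() {
  awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN {
    ts = s + t                     # move the left end inward
    te = e - t                     # move the right end inward
    if (ts < te) printf "%.10g\t%.10g\n", ts, te
  }'
}
```

Under this assumption and the default ibdtrim=5e5, a segment from 1,000,000 to 5,000,000 bp would keep 1,500,000 to 4,500,000 bp, while a segment shorter than 1 Mb would contribute nothing.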
If the reference or target data contain samples with ancestry from more than one population, you should use the excludesamples parameter to restrict the analysis to samples from a single population.
Two output files are produced:
an error file (.err) containing error rate estimates broken down by minor allele count, for minor allele counts up to the maximum determined by the maxmaf parameter.
Four tab-delimited columns are printed to the error file: 1) the allele count, 2) the numerator of the estimator, 3) the denominator of the estimator, and 4) the error rate estimate.
If you split your data by chromosome and analyze each chromosome separately, you can obtain genome-wide error rate estimates for each allele count by summing the numerator values (column 2) across chromosomes, summing the denominator values (column 3) across chromosomes, and then taking the ratio of these sums.
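The pooling step described above can be sketched with awk. This is a hypothetical helper, not part of SEQERR: combine_err assumes each .err line holds the four tab-delimited columns listed above, and it skips any line whose first column is not an integer (for example, a header line, if one is present). It sums columns 2 and 3 per allele count and reports their ratio.

```shell
# Hypothetical helper (not part of SEQERR): pool per-chromosome .err files.
# Each input line is ASSUMED to hold four tab-delimited columns:
#   allele count, numerator, denominator, error rate estimate.
# Lines whose first column is not an integer are skipped.
combine_err() {
  awk -F'\t' -v OFS='\t' '
    $1 ~ /^[0-9]+$/ {
      num[$1] += $2          # sum numerators (column 2) per allele count
      den[$1] += $3          # sum denominators (column 3) per allele count
    }
    END {
      for (ac in num) {
        # genome-wide estimate = summed numerator / summed denominator
        rate = (den[ac] > 0) ? sprintf("%g", num[ac] / den[ac]) : "NA"
        print ac, num[ac], den[ac], rate
      }
    }' "$@" | sort -k1,1n
}
```

For example, with per-chromosome output files named chr1.err through chr22.err (hypothetical names), running combine_err chr{1..22}.err > genome.err in bash would write one pooled row per allele count. Pooling the sums rather than averaging the per-chromosome rates weights each chromosome by its denominator, as the text above prescribes.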
In this example, the sequence error rate is estimated for simulated sequence data with random errors.
$ # Download example VCF and IBD files:
$ wget https://bochet.gcc.biostat.washington.edu/beagle/seqerr.test.vcf.gz
$ wget https://bochet.gcc.biostat.washington.edu/beagle/seqerr.test.ibd
$ # Download the seqerr program
$ wget https://bochet.gcc.biostat.washington.edu/beagle/seqerr.jar
$ # Run seqerr to estimate the error rate
$ java -jar seqerr.jar gt=seqerr.test.vcf.gz ibd=seqerr.test.ibd out=seqerr.test
The SEQERR program is licensed under the Apache License, Version 2.0 (the License). You may not use the SEQERR program except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Some source files in the net.sf.samtools package are licensed under the MIT License. See the source files for additional license information.
The SEQERR program is distributed on an "AS IS" BASIS WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
seqerr.r1181.jar | Java executable file
seqerr.r1181.zip | source code
Copyright: 2013 Brian L. Browning
Last updated: 7 Nov 2013.