SEQERR

Version: r1181
Email: browning@uw.edu

Introduction

The SEQERR program estimates the rate at which homozygous major allele genotypes are mis-called as heterozygote genotypes at low frequency markers. This error rate is estimated using observed allele and genotype frequencies at low frequency variants in identity by descent segments. Identity by descent segments in sequence data can be detected using the IBDseq program.

If you publish genotype error rate estimates obtained from the SEQERR program, please cite the following reference:

B L Browning and S R Browning (2013) Detecting identity by descent and estimating genotype error rates in sequence data. The American Journal of Human Genetics 93(5):840-851. doi:10.1016/j.ajhg.2013.09.014


Running SEQERR

To use SEQERR, enter the following command at the command line prompt:

java -jar seqerr.jar arguments

where arguments is a space-separated list of parameters, each expressed as parameter=value.

Required Parameters

gt=file
specifies the VCF file with genotype data.
ibd=file
specifies the Beagle-format IBD file. The IBDseq program can be used to generate the IBD file.
out=prefix
specifies the prefix for output filenames.

Optional Parameters

chrom=interval
specifies the chromosome interval in the format chrom or chrom:start-end, where chrom is the chromosome identifier in the VCF file, and start and end are the interval start and end positions. The start or the end value may be omitted if it corresponds to a chromosome end.
excludesamples=file
specifies a file containing samples to be excluded from the analysis (one sample identifier per line).
map=file
specifies a PLINK-format genetic map file. HapMap GRCh36 and GRCh37 genetic maps in PLINK format can be downloaded from here
ibdlength=non-negative number
specifies the minimum IBD segment length used in the analysis. The minimum IBD segment length should be sufficiently large so that the rate of false-positive IBD segments is negligible. Length units are either base-pair distances or genetic distances, depending on whether a genetic map is specified with the map parameter. The default value of 2 Mb (ibdlength=2e6) assumes no genetic map is specified and the data are from an outbred human population. If a map file with cM distances is used with data from an outbred human population, a minimum IBD segment length of 2 cM (ibdlength=2) would be appropriate.
ibdtrim=non-negative number
specifies the length to be trimmed from each end of each IBD segment used in the analysis. The ibdtrim parameter should be large enough so that over-estimation of IBD segment length is negligible. Length units are either base-pair distances or genetic distances, depending on whether a genetic map is specified with the map parameter. The default value of 500 kb (ibdtrim=5e5) assumes no genetic map is specified and the data are from an outbred human population. If a map file with cM distances is used with data from an outbred human population, a trim length of 0.5 cM (ibdtrim=0.5) would be appropriate.
maxmaf=positive number less than 0.5
specifies the maximum minor allele frequency for variants used in the analysis (default: 0.02).
maxmissing=non-negative number less than 0.5
specifies the maximum missing genotype rate for variants used in the analysis (default: 0.05).
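
The effect of the default ibdlength and ibdtrim values on a single segment can be worked through with simple arithmetic. The sketch below uses a hypothetical 3 Mb segment on a base-pair scale and assumes that the length filter is applied to the untrimmed segment before trimming; the exact order and endpoint handling inside SEQERR are not stated here, so treat this as an illustration only.

```shell
# Hypothetical IBD segment on a base-pair scale (no genetic map specified).
seg_start=10000000
seg_end=13000000

# Default values, written out as integers (2e6 and 5e5).
ibdlength=2000000   # minimum IBD segment length: 2 Mb
ibdtrim=500000      # length trimmed from each end: 500 kb

seg_len=$((seg_end - seg_start))

if [ "$seg_len" -ge "$ibdlength" ]; then
    # Assumption: the segment is filtered on its untrimmed length, then trimmed.
    trimmed_start=$((seg_start + ibdtrim))
    trimmed_end=$((seg_end - ibdtrim))
    echo "segment kept; trimmed interval: $trimmed_start-$trimmed_end"
else
    echo "segment shorter than ibdlength; not used"
fi
```

Under these assumptions, only the central 2 Mb of the 3 Mb segment would contribute to the estimate.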

If the genotype data contain samples with ancestry from more than one population, you should use the excludesamples parameter to limit the analysis to samples from a single population.
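
One way to prepare the exclusion file is to derive it from a table of population labels. The sketch below assumes a hypothetical two-column file, samples.pop.txt, with a sample identifier and a population label on each line; the file names, sample identifiers, and labels are invented for illustration and are not part of SEQERR.

```shell
# Hypothetical table: sample identifier, population label (whitespace-separated).
cat > samples.pop.txt <<'EOF'
S001 POP_A
S002 POP_B
S003 POP_A
S004 POP_C
EOF

# Keep POP_A in the analysis: write every other sample identifier,
# one per line, as expected by the excludesamples parameter.
awk '$2 != "POP_A" { print $1 }' samples.pop.txt > exclude.txt

cat exclude.txt
```

The resulting file could then be passed to SEQERR as excludesamples=exclude.txt.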


Output files

Two output files are produced:


Example

In this example, the sequence error rate is estimated for simulated sequence data with random errors.

$ # Download example VCF and IBD files:
$ wget https://bochet.gcc.biostat.washington.edu/beagle/seqerr.test.vcf.gz
$ wget https://bochet.gcc.biostat.washington.edu/beagle/seqerr.test.ibd

$ # Download the seqerr program
$ wget https://bochet.gcc.biostat.washington.edu/beagle/seqerr.jar

$ # Run seqerr to estimate the error rate
$ java -jar seqerr.jar gt=seqerr.test.vcf.gz ibd=seqerr.test.ibd out=seqerr.test
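
For your own data, a run with a PLINK-format genetic map might be assembled as in the sketch below. All file names are hypothetical placeholders, and the 2 cM and 0.5 cM values are the ones suggested above for outbred human data. The sketch only builds and prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical input and output names; substitute your own files.
gt=study.chr20.vcf.gz
ibd=study.chr20.ibd
map=plink.chr20.map
out=study.chr20.seqerr

# With a genetic map, ibdlength and ibdtrim are in cM rather than base pairs.
cmd="java -jar seqerr.jar gt=$gt ibd=$ibd out=$out map=$map ibdlength=2 ibdtrim=0.5"

# Print the command for review; nothing is executed here.
echo "$cmd"
```

Once the printed command looks right, it can be pasted at the prompt or run with eval "$cmd".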


Download SEQERR

The SEQERR program is licensed under the Apache License, Version 2.0 (the License). You may not use the SEQERR program except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Some source files in the net.sf.samtools package are licensed under the MIT License. See the source files for additional license information.

The SEQERR program is distributed on an "AS IS" BASIS WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

seqerr.r1181.jar java executable file
seqerr.r1181.zip source code

Copyright: 2013 Brian L. Browning
Last updated: 7 Nov 2013.