Copyright (c) 2007-2013 Brian L. Browning
Email: browning@uw.edu
This page was last updated on 05 Feb 2013.
This page includes simple utility programs for manipulating text files. If you are performing analyses using BEAGLECALL or Beagle, you may find some of these programs useful for preparing input files and for working with output files. The Beagle utilities are written in Java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).
All the utility programs on this web page are licensed under the Apache version 2.0 open source license. You may obtain a copy of the License from http://www.apache.org/licenses/LICENSE-2.0
The gtstats utility calculates genotype statistics for each marker in a VCF file with GT field data.
The following usage instructions can be obtained by entering "java -jar gtstats.jar help" at the command line prompt:
usage: cat [vcf] | java -jar gtstats.jar > [out]
where
[vcf] = input VCF file.
[out] = output file with per-marker allele statistics.
One line with 15 tab-delimited fields is written per marker:
1-8) VCF fixed fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO)
9) Missing genotype count
10) Missing genotype frequency
11) Non-REF allele count
12) Non-REF allele frequency
13) Minor allele count (REF allele vs Non-REF alleles)
14) Minor allele frequency (REF allele vs Non-REF alleles)
15) HWE P-value (REF allele vs Non-REF alleles)
A genotype is considered missing if either allele is missing. Missing genotypes are not included in allele count, allele frequency, or HWE statistics.
Download gtstats.jar.
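The counting rules described for gtstats can be illustrated with a short Python sketch. This is not the gtstats source code: the function name gt_stats, the dictionary keys, and the choice of denominators (all genotypes for the missing frequency, non-missing alleles for the allele frequencies) are assumptions made for illustration, and the HWE P-value is left out.

```python
def gt_stats(genotypes):
    """Summarize one marker from VCF GT strings such as '0/1' or '1|.'.

    Hypothetical helper: field names and denominators are assumptions.
    """
    missing = 0
    ref = 0
    non_ref = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        # A genotype is missing if either allele is missing.
        if "." in alleles:
            missing += 1
            continue
        for allele in alleles:
            if allele == "0":
                ref += 1
            else:
                non_ref += 1
    total = ref + non_ref
    minor = min(ref, non_ref)  # REF allele vs all non-REF alleles pooled
    return {
        "missing_count": missing,
        "missing_freq": missing / len(genotypes) if genotypes else 0.0,
        "non_ref_count": non_ref,
        "non_ref_freq": non_ref / total if total else 0.0,
        "minor_count": minor,
        "minor_freq": minor / total if total else 0.0,
    }
```

For example, gt_stats(["0/0", "0/1", "0|2", "./.", "0/."]) counts two missing genotypes and two non-REF alleles among six non-missing alleles.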
The splitvcf utility splits a single VCF file into multiple VCF files corresponding to overlapping chromosome intervals.
The following usage instructions can be obtained by entering "java -jar splitvcf.jar" at the command line prompt:
usage: cat [vcf] | java -jar splitvcf.jar [chrom] [records] [overlap] [prefix]
where
[vcf] = input VCF file.
[chrom] = a chromosome or chromosome interval (e.g. "20" or "20:1-800").
[records] = number of VCF records per output file.
[overlap] = number of VCF records shared between consecutive output files.
[prefix] = output VCF file prefix.
Output files are GZIP compressed and are named: [prefix].1.vcf.gz, [prefix].2.vcf.gz, [prefix].3.vcf.gz, ....
Download splitvcf.jar.
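The relationship between [records] and [overlap] can be made concrete with a small Python sketch. It assumes each output file starts [records] minus [overlap] records after the previous one and that the last file may be shorter. The function name split_windows is hypothetical, and how splitvcf itself treats the final file is not stated on this page.

```python
def split_windows(n_records, records, overlap):
    """Return (start, end) record index pairs, end exclusive, one per output file.

    Hypothetical sketch of overlapping windows; not the splitvcf source.
    """
    if records <= overlap:
        raise ValueError("records must be greater than overlap")
    if n_records <= 0:
        return []
    windows = []
    start = 0
    step = records - overlap  # new records contributed by each later file
    while True:
        end = min(start + records, n_records)
        windows.append((start, end))
        if end == n_records:
            break
        start += step
    return windows
```

With 10 records, 4 records per file, and an overlap of 1, this gives the windows (0, 4), (3, 7), and (6, 10), so consecutive windows share exactly one record.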
The mergevcf utility merges multiple VCF files corresponding to overlapping chromosome intervals into a single VCF file.
The following usage instructions can be obtained by entering "java -jar mergevcf.jar" at the command line prompt:
usage: java -jar mergevcf.jar [chrom] [vcf 1] [vcf 2] ... > [out vcf]
where
[chrom] = a chromosome or chromosome interval (e.g. "20" or "20:1-800").
[vcf #] = VCF files to be merged.
[out vcf] = the merged VCF file.
All input VCF files must contain data for the same set of samples. Input VCF files can be listed in any order and may contain data for overlapping chromosome intervals such that the last markers in one VCF file are identical to the first markers in another VCF file. The ends of overlapping VCF files are trimmed to remove the overlap. Phased genotypes in overlapping VCF files are aligned using the heterozygote genotype nearest the middle of the overlap.
Download mergevcf.jar.
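The phase alignment step can be pictured with the following Python sketch for a single sample. It is only one plausible reading of the description above, not the mergevcf implementation: the function name needs_swap, the tuple representation of phased genotypes, and the tie-breaking when two heterozygotes are equally close to the middle are all assumptions, as is the premise that both files carry the same unphased genotype at the chosen marker.

```python
def needs_swap(overlap_a, overlap_b):
    """Decide whether the second file's two haplotypes should be swapped.

    overlap_a, overlap_b: lists of (allele1, allele2) phased genotypes for one
    sample at the shared markers, in the same marker order.
    Hypothetical sketch; not the mergevcf source.
    """
    middle = (len(overlap_a) - 1) / 2
    hets = [i for i, (x, y) in enumerate(overlap_a) if x != y]
    if not hets:
        # Without a heterozygote the two orientations are indistinguishable.
        return False
    # Heterozygote nearest the middle of the overlap (first one wins ties).
    i = min(hets, key=lambda k: abs(k - middle))
    return overlap_a[i] != overlap_b[i]
```

If the first file has 0|1 at the chosen marker and the second has 1|0, the function returns True, meaning the second file's haplotypes would be exchanged before the files are joined.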
The consensusvcf utility creates a VCF file with a consensus phasing from a set of VCF files with phased GT field data for the same samples and markers.
The following usage instructions can be obtained by entering "java -jar consensusvcf.jar" at the command line prompt:
usage: java -jar consensusvcf.jar [vcf 1] [vcf 2] ... > [consensus]
Download consensusvcf.jar.
The base2genetic utility converts NCBI base positions to genetic map positions.
The following usage instructions can be obtained by entering "java -jar base2genetic.jar" at the command line prompt:
usage: cat [input file] | java -jar base2genetic.jar [column] [map file] > [output file]
where
[column] = column with position to be transformed (1 = first column).
[map file] = file with two columns: base position and genetic map position. The positions must be in chromosomal order.
Lines of white-space delimited fields are read from standard input, and numeric data in the specified column is transformed from base position to genetic map position by interpolating between data points in the map file.
Download base2genetic.jar.
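The interpolation described for base2genetic can be sketched in Python as follows. The sketch assumes linear interpolation between the two neighbouring map points and raises an error for positions outside the map. The page does not say how base2genetic handles positions beyond the first or last map point, and the function name base_to_genetic is hypothetical.

```python
from bisect import bisect_left


def base_to_genetic(pos, map_points):
    """Linearly interpolate a genetic map position for base position `pos`.

    map_points: list of (base_position, genetic_position) pairs in
    chromosomal (increasing) order. Hypothetical sketch only.
    """
    bases = [b for b, _ in map_points]
    i = bisect_left(bases, pos)
    if i < len(bases) and bases[i] == pos:
        return map_points[i][1]  # exact hit on a map point
    if i == 0 or i == len(bases):
        raise ValueError("position lies outside the map")
    (b0, g0), (b1, g1) = map_points[i - 1], map_points[i]
    return g0 + (g1 - g0) * (pos - b0) / (b1 - b0)
```

With map points (100, 0.0), (200, 1.0), and (400, 2.0), base position 150 maps to 0.5 and base position 300 maps to 1.5.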
The beagle2gprobs utility converts a Beagle v3 genotypes file to a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar beagle2gprobs.jar" at the command line prompt:
usage: java -jar beagle2gprobs.jar [markers] [bgl] [missing] > [out]
Download beagle2gprobs.jar.
The beagle2linkage utility converts Beagle v3 genotypes files to linkage files.
The following usage instructions can be obtained by entering "java -jar beagle2linkage.jar" at the command line prompt:
usage: cat [bgl] | java -jar beagle2linkage.jar [prefix]
Download beagle2linkage.jar.
The beagle2vcf utility converts a Beagle v3 genotypes file to VCF format.
The following usage instructions can be obtained by entering "java -jar beagle2vcf.jar" at the command line prompt:
usage: java -jar beagle2vcf.jar [chrom] [markers] [bgl] [missing] > [vcf]
Download beagle2vcf.jar.
The gprobs2beagle utility converts a Beagle v3 genotype probabilities file to a Beagle v3 genotypes file.
The following usage instructions can be obtained by entering "java -jar gprobs2beagle.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobs2beagle.jar [threshold] [missing] > [bgl]
Download gprobs2beagle.jar.
The linkage2beagle utility converts a linkage file to a Beagle v3 genotypes file.
The following usage instructions can be obtained by entering "java -jar linkage2beagle.jar" at the command line prompt:
usage: java -jar linkage2beagle.jar [data] [ped] > [bgl]
Download linkage2beagle.jar.
The vcf2beagle utility converts a VCF file with GT field data to a Beagle v3 genotypes file.
The following usage instructions can be obtained by entering "java -jar vcf2beagle.jar" at the command line prompt:
usage: cat [vcf file] | java -jar vcf2beagle.jar [missing] [prefix]
Download vcf2beagle.jar.
The vcf2gprobs utility converts a VCF file with GP field data to a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar vcf2gprobs.jar help" at the command line prompt:
usage: cat [vcf] | java -jar vcf2gprobs.jar > [gprobs]
where
[vcf] = input file in VCF 4.1 format.
[gprobs] = Beagle version 3 genotype probabilities file.
VCF records that do not have a GP format field or that do not have exactly one ALT allele are omitted from the output "gprobs" file.
Download vcf2gprobs.jar.
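The record filter described for vcf2gprobs can be expressed as a small Python predicate. This is an illustrative sketch, not the utility's code: the function name keep_record is hypothetical, and treating an ALT value of "." as zero ALT alleles is an assumption. The column positions follow the VCF layout (ALT is the fifth column and FORMAT is the ninth).

```python
def keep_record(vcf_line):
    """Return True if a VCF data line has exactly one ALT allele and a GP field.

    Hypothetical sketch of the filter described above.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False  # no FORMAT column, so no GP data
    alt = fields[4]
    format_keys = fields[8].split(":")
    # Assumption: "." means no ALT allele; a comma separates multiple ALTs.
    one_alt = alt != "." and "," not in alt
    return one_alt and "GP" in format_keys
```

A record with ALT "T" and FORMAT "GT:GP" passes, while a record with ALT "T,G" or with FORMAT "GT" alone does not.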
The gprobshwe utility calculates exact Hardy-Weinberg equilibrium p-values for each marker in a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobshwe.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobshwe.jar [threshold] > [out]
Download gprobshwe.jar.
The gprobsmetrics utility calculates per-marker statistics from a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobsmetrics.jar help" at the command line prompt:
usage: cat [gprobs] | java -jar gprobsmetrics.jar > [out]
Download gprobsmetrics.jar.
The gprobsmissing utility calculates the missing genotypes proportion for each marker in a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobsmissing.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobsmissing.jar [threshold] > [out]
Download gprobsmissing.jar.
The gprobssamplemissing utility calculates the missing genotypes proportion for each sample in a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobssamplemissing.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobssamplemissing.jar [threshold] > [out]
Download gprobssamplemissing.jar.
The changecolumn utility replaces values in a column of a file.
The following usage instructions can be obtained by entering "java -jar changecolumn.jar" at the command line prompt:
usage: cat [input file] | java -jar changecolumn.jar [column] [change file] > [output file]
where
[input file] = file with white-space delimited data. Lines of [input file] may have fewer than [column] fields.
[column] = column to be changed (1 = first column).
[change file] = file with two white-space delimited fields per line: The first field is the value to be replaced. The second field is the new value.
[output file] = a space-delimited output file with the specified column changed. Any field in column [column] of [input file] that is not found in the first column of [change file] is left unchanged.
Download changecolumn.jar.
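The replacement rule for changecolumn can be sketched in Python as below. The function name change_column is hypothetical, and the change file is represented as a dictionary mapping old values to new values. Passing short lines through unchanged is an assumption based on the note that lines may have fewer than [column] fields.

```python
def change_column(lines, column, changes):
    """Replace values in a 1-based column using the `changes` mapping.

    Hypothetical sketch; output is space-delimited as described above.
    """
    out = []
    for line in lines:
        fields = line.split()
        # Assumption: lines with fewer than `column` fields are left as they are.
        if len(fields) >= column:
            value = fields[column - 1]
            # Values not listed in the change file are left unchanged.
            fields[column - 1] = changes.get(value, value)
        out.append(" ".join(fields))
    return out
```

For instance, with the mapping {"1": "one"} and column 2, the line "a 1 x" becomes "a one x" while "b 2" is left as it is.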
The changeline utility replaces values in a line of a file.
The following usage instructions can be obtained by entering "java -jar changeline.jar" at the command line prompt:
usage: cat [input file] | java -jar changeline.jar [line] [change file] > [output file]
where
[input file] = file with white-space delimited data.
[line] = line to be changed (1 = first line).
[change file] = file with two white-space delimited fields per line: The first field is the value to be replaced. The second field is the new value.
[output file] = a space-delimited output file with the specified line changed. Any field in line [line] of [input file] that is not found in the first column of [change file] is left unchanged.
Download changeline.jar.
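The changeline rule applies the same kind of lookup to every field of a single line. A minimal Python sketch, assuming the change file is supplied as a dictionary and using the hypothetical function name change_line, might look like this:

```python
def change_line(lines, line_no, changes):
    """Replace every field on 1-based line `line_no` that appears in `changes`.

    Hypothetical sketch; all other lines are only re-joined with single spaces.
    """
    out = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if n == line_no:
            # Fields not listed in the change file are left unchanged.
            fields = [changes.get(f, f) for f in fields]
        out.append(" ".join(fields))
    return out
```

One possible use, offered only as an illustration, is renaming identifiers in a header line: change_line(["id s1 s2", "m1 A C"], 1, {"s1": "NA001"}) returns ["id NA001 s2", "m1 A C"].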
The cut utility extracts columns from a file.
The following usage instructions can be obtained by entering "java -jar cut.jar" at the command line prompt:
usage: cat [input file] | java -jar cut.jar a1:b1 a2:b2 ... > [output file]
where
[input file] = file with white-space delimited columns.
[a#:b#] = first (a#) and last (b#) indices (inclusive) of a set of consecutive columns to extract. The first column has index 1.
[output file] = space-delimited file with extracted columns.
Download cut.jar.
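The a#:b# range convention (1-based, inclusive at both ends) can be illustrated with a short Python sketch. The function name cut_columns is hypothetical, and the sketch does not attempt to reproduce how cut.jar reports malformed ranges.

```python
def cut_columns(line, ranges):
    """Extract 1-based inclusive column ranges given as 'a:b' strings.

    Hypothetical sketch; returns the selected fields joined by single spaces.
    """
    fields = line.split()
    picked = []
    for r in ranges:
        first, last = (int(x) for x in r.split(":"))
        # Convert 1-based inclusive bounds to a Python slice.
        picked.extend(fields[first - 1:last])
    return " ".join(picked)
```

For the line "c1 c2 c3 c4 c5", the ranges 1:2 and 5:5 select "c1 c2 c5".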
The filtercolumns utility filters columns of input data with white-space delimited fields according to the values on a specified line.
The following usage instructions can be obtained by entering "java -jar filtercolumns.jar" at the command line prompt:
usage:
cat [input file] | java -jar filtercolumns.jar [line] [min] [max] > [output file]
or
cat [input file] | java -jar filtercolumns.jar [line] [word file] > [output file]
where
[input file] = file with space-delimited fields.
[line] = signed line number (first line is 1 or -1).
[line] > 0 if criteria is for INCLUSION.
[line] < 0 if criteria is for EXCLUSION.
[min] = min field value (a number).
[max] = max field value (a number).
[word file] = text file with one word per line.
[output file] = file with columns that pass filter.
Lines with white-space delimited fields are read from standard input. If [line] > 0, columns whose value on the specified [line] is between [min] and [max] inclusive or is equal to a word in [word file] are written to standard output. If [line] < 0, columns whose value on the specified [line] is NOT between [min] and [max] inclusive or is NOT equal to a word in [word file] are written to standard output. An error is thrown if any two lines have a differing number of white-space delimited fields.
Download filtercolumns.jar.
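The signed [line] convention for filtercolumns can be sketched in Python for the [min] [max] form. The function name filter_columns is hypothetical, and treating a non-numeric value as "not in range" is an assumption. The [word file] form is not covered by this sketch.

```python
def filter_columns(rows, line, lo, hi):
    """Keep or drop columns by the numeric value found on a signed, 1-based line.

    line > 0 keeps columns whose value is within [lo, hi];
    line < 0 keeps columns whose value is NOT within [lo, hi].
    Hypothetical sketch of the min/max form only.
    """
    table = [r.split() for r in rows]
    if len({len(r) for r in table}) > 1:
        raise ValueError("lines have differing numbers of fields")
    include = line > 0
    key_row = table[abs(line) - 1]

    def in_range(value):
        try:
            return lo <= float(value) <= hi
        except ValueError:
            return False  # assumption: non-numeric values are out of range

    keep = [j for j, v in enumerate(key_row) if in_range(v) == include]
    return [" ".join(r[j] for j in keep) for r in table]
```

Given the rows "s1 s2 s3" and "5 15 25", filtering on line 2 with the range 10 to 30 keeps the columns for s2 and s3, while filtering on line -2 with the same range keeps only the column for s1.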
The filterlines utility filters lines of input data with white-space delimited fields according to the value of a specified field.
The following usage instructions can be obtained by entering "java -jar filterlines.jar" at the command line prompt:
usage:
cat [input file] | java -jar filterlines.jar [field] [min] [max] > [output file]
or
cat [input file] | java -jar filterlines.jar [field] [word file] > [output file]
where
[input file] = file with space-delimited fields.
[field] = signed field number (first column is 1 or -1).
[field] > 0 if criteria is for INCLUSION.
[field] < 0 if criteria is for EXCLUSION.
[min] = min field value (a number).
[max] = max field value (a number).
[word file] = text file with one word per line.
[output file] = file with lines that pass filter.
Lines are read from standard input. If [field] > 0, lines for which the specified field is between [min] and [max] inclusive or is equal to a word in [word file] are written to standard output. If [field] < 0, lines for which the specified field is NOT between [min] and [max] inclusive or is NOT equal to a word in [word file] are written to standard output. Lines that have fewer fields than the specified number of fields are printed if [field] < 0.
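The [word file] form of filterlines, including the rule for short lines, can be sketched in Python as follows. The function name filter_lines_by_words is hypothetical, and the word file is represented as any iterable of strings. The [min] [max] form is not covered by this sketch.

```python
def filter_lines_by_words(lines, field, words):
    """Keep or drop lines by whether a signed, 1-based field is in `words`.

    field > 0 keeps lines whose field matches a word;
    field < 0 keeps lines whose field matches no word.
    Hypothetical sketch of the word-file form only.
    """
    include = field > 0
    idx = abs(field) - 1
    wanted = set(words)
    out = []
    for line in lines:
        fields = line.split()
        if idx >= len(fields):
            # Lines with too few fields are printed only in exclusion mode.
            if not include:
                out.append(line)
            continue
        if (fields[idx] in wanted) == include:
            out.append(line)
    return out
```

With the lines "rs1 A", "rs2 C", and "rs3" and the single word "rs1", field 1 keeps only "rs1 A", while field -1 keeps "rs2 C" and "rs3".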
The paste utility pastes together files that have shared initial columns followed by data columns.
The following usage instructions can be obtained by entering "java -jar paste.jar" at the command line prompt:
usage: java -jar paste.jar [shared columns] [file 1] [file 2] ... > [out]
Download paste.jar.
The transpose utility transposes the rows and columns of a file.
The following usage instructions can be obtained by entering "java -jar transpose.jar help" at the command line prompt:
usage: cat [input file] | java -jar transpose.jar > [output file]
where
[input file] = file with rectangular array of white-space delimited data.
[output file] = output file with space-delimited (" ") transposed data.
Download transpose.jar.
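The transposition of a rectangular, white-space delimited file can be sketched in a few lines of Python. The function name transpose is chosen for illustration, and raising an error on non-rectangular input is an assumption, since the page only states that the input is a rectangular array.

```python
def transpose(lines):
    """Transpose rows and columns of white-space delimited lines.

    Hypothetical sketch; output lines are space-delimited.
    """
    rows = [line.split() for line in lines]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("input is not a rectangular array")
    # zip(*rows) yields the columns of the input as tuples.
    return [" ".join(col) for col in zip(*rows)]
```

For example, the two input lines "a b c" and "1 2 3" become the three output lines "a 1", "b 2", and "c 3".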
The updategprobs utility updates a Beagle version 3 genotype probabilities file with data from another Beagle version 3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar updategprobs.jar" at the command line prompt:
usage: cat [gprobs] | java -jar updategprobs.jar [markers] [replace] > [out]
Download updategprobs.jar.
The ibdmerge utility merges Beagle version 4 IBD files.
The following usage instructions can be obtained by entering "java -jar ibdmerge.jar" at the command line prompt:
usage: cat [ibd files] | java -jar ibdmerge.jar > [out]
Download ibdmerge.jar.
The Beagle utilities are open source utilities that are licensed under the Apache License, Version 2.0. You may not use the Beagle utilities except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
The Beagle utilities are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Download source code for the Beagle Utilities: beagle_utilities_05Feb13.src.zip.