BEAGLE Utilities

Copyright (c) 2007-2010 Brian L. Browning
Email: browning@uw.edu
This page was last updated on 13 Sep 2010

Contents

Introduction

File conversion utilities

Data QC utilities

File manipulation utilities

Genotype imputation utilities

Java Source Code


Introduction

This page includes simple utility programs for manipulating text files.  If you are performing analyses using BEAGLECALL or BEAGLE, you may find some of these programs useful for preparing input files and for working with output files. The BEAGLE utilities are written in Java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).

All the utility programs on this web page are licensed under the Apache version 2.0 open source license. You may obtain a copy of the License from http://www.apache.org/licenses/LICENSE-2.0

back to contents


base2genetic.jar

Description:

base2genetic.jar converts NCBI base positions to genetic map positions.

Usage:

The following usage instructions can be obtained by entering "java -jar base2genetic.jar" at the command line prompt:

usage: cat [input file] | java -jar base2genetic.jar [column] [map file] > [output file]

where
  [column]   = column with position to be transformed (1 = first column).
  [map file] = file with two columns: base position and genetic map position.
               The positions must be in chromosomal order.

Lines of white-space delimited fields are read from standard input, and numeric data
in the specified column is transformed from base position to genetic map position by
interpolating between data points in the map file.

Notes:

  1. A map file for NCBI Build 36 can be constructed from the HapMap recombination map (http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/).
  2. The base2genetic.jar utility is useful when preparing BEAGLE markers files for homozygosity-by-descent and identity-by-descent analysis.
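
The interpolation step described above can be sketched as follows. This is only an illustration, not the utility's source code: the class name MapInterpolator is hypothetical, the interpolation is assumed to be linear, the map positions are assumed to be strictly increasing, and the handling of positions outside the map (extending the nearest interval) is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper: linear interpolation on a two-column genetic map. */
final class MapInterpolator {
    private final double[] basePos;  // base positions, strictly increasing
    private final double[] genPos;   // genetic map positions at those bases

    MapInterpolator(double[] basePos, double[] genPos) {
        if (basePos.length != genPos.length || basePos.length < 2) {
            throw new IllegalArgumentException("need at least two map points");
        }
        this.basePos = basePos.clone();
        this.genPos = genPos.clone();
    }

    /** Returns the interpolated genetic position for a base position. */
    double toGenetic(double base) {
        int i = Arrays.binarySearch(basePos, base);
        if (i >= 0) {
            return genPos[i];  // exact match with a map point
        }
        int hi = -i - 1;  // index of the first map point beyond 'base'
        // Assumption: positions outside the map reuse the slope of the nearest interval.
        hi = Math.max(1, Math.min(hi, basePos.length - 1));
        int lo = hi - 1;
        double frac = (base - basePos[lo]) / (basePos[hi] - basePos[lo]);
        return genPos[lo] + frac * (genPos[hi] - genPos[lo]);
    }

    public static void main(String[] args) {
        MapInterpolator map = new MapInterpolator(
                new double[] {1000, 2000, 4000}, new double[] {0.0, 0.5, 0.7});
        System.out.println(map.toGenetic(1500));  // halfway between the first two points
    }
}
```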

Download base2genetic.jar.

back to contents


beagle2gprobs.jar

Description:

The beagle2gprobs.jar utility converts a BEAGLE genotypes file to a BEAGLE genotype probabilities file.

Usage:

The following usage instructions can be obtained by entering "java -jar beagle2gprobs.jar" at the command line prompt:

usage: java -jar beagle2gprobs.jar [ID file] [BEAGLE file] [missing] > [output file]

where
  [ID file]     = file with one line per marker and 3 white-space delimited identifiers
                  per line: 1) marker, 2) allele A, and 3) allele B.  The markers in
                  [ID file] and [BEAGLE file] must be in the same order.
  [BEAGLE file] = BEAGLE genotypes file.
  [missing]     = missing allele code used in the BEAGLE genotypes file.
  [output file] = BEAGLE genotype probabilities file.  Non-missing genotypes in
                  [BEAGLE file] have probability 1 for the called genotype.  Missing
                  genotypes have probability 0.333 for each possible genotype.

Notes:

  1. The beagle2gprobs.jar utility writes data to standard output.
  2. The beagle2gprobs.jar utility can be used to create initial genotype probabilities files for BEAGLECALL.
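
The conversion rule in the usage text (probability 1 for a called genotype, 0.333 for each genotype when missing) might look like the sketch below. The class and method names are hypothetical, and treating a genotype as missing when either allele equals the missing code is an assumption.

```java
/** Hypothetical helper: one genotype to (AA, AB, BB) probabilities. */
final class GenotypeToProbs {

    /** Returns the probabilities of genotypes AA, AB and BB, in that order. */
    static double[] probs(String allele1, String allele2,
                          String alleleA, String alleleB, String missing) {
        // Assumption: the genotype is missing if either allele is the missing code.
        if (allele1.equals(missing) || allele2.equals(missing)) {
            return new double[] {0.333, 0.333, 0.333};
        }
        int bCount = countB(allele1, alleleA, alleleB) + countB(allele2, alleleA, alleleB);
        double[] p = new double[3];
        p[bCount] = 1.0;  // 0 B alleles = AA, 1 = AB, 2 = BB
        return p;
    }

    private static int countB(String allele, String alleleA, String alleleB) {
        if (allele.equals(alleleA)) {
            return 0;
        }
        if (allele.equals(alleleB)) {
            return 1;
        }
        throw new IllegalArgumentException("allele not listed in ID file: " + allele);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(probs("A", "G", "A", "G", "0")));
    }
}
```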

Download beagle2gprobs.jar.

back to contents


beagle2linkage.jar

Description:

The beagle2linkage.jar utility converts BEAGLE genotypes files to linkage files. 

Usage:

The following usage instructions can be obtained by entering "java -jar beagle2linkage.jar" at the command line prompt:

usage: cat [input file] | java -jar beagle2linkage.jar [prefix]

where
  [input file] = BEAGLE genotypes file.
  [prefix]     = prefix for output linkage file name (.ped) and data file name (.dat).

The first two columns of the BEAGLE file are printed to [prefix].dat.  The remaining
columns of the BEAGLE file are printed in linkage format.  Linkage format has one row
per sample, one column per non-marker variable, and two columns per marker variable.

Notes:

  1. The beagle2linkage.jar utility reads data from standard input.
  2. Linkage format is described at http://www.sph.umich.edu/csg/abecasis/QTDT/docs/pedigree.html.
  3. When processing large files, computational time may be reduced if you use a machine with several gigabytes of memory and add the "-Xmx<MB>m" Java argument between "java" and "-jar", where "<MB>" is the number of megabytes of available memory.
  4. The beagle2linkage.jar utility will create one or more temporary data files in your system's default temporary file directory.  You can specify a different directory for the temporary files by adding the "-Djava.io.tmpdir=<directory>" argument between "java" and "-jar", where "<directory>" is the name of an alternate directory for storing temporary data files.

Download beagle2linkage.jar.

back to contents


gprobs2beagle.jar

Description:

The gprobs2beagle.jar utility converts a BEAGLE genotype probabilities file to a BEAGLE genotypes file.

Usage:

The following usage instructions can be obtained by entering "java -jar gprobs2beagle.jar" at the command line prompt:

usage: cat [input file] | java -jar gprobs2beagle.jar [threshold] [missing] > [output file]

where
  [input file]   = BEAGLE genotype probabilities file.
  [threshold]    = minimum posterior probability required to call a genotype.
  [missing]      = missing allele code used in output file.
  [output file]  = BEAGLE genotypes file with genotype calls.

If a genotype's most likely call has probability less than the specified threshold,
the genotype is set to missing in the output BEAGLE file.

Notes:

  1. The gprobs2beagle.jar utility reads data from standard input and writes data to standard output.
  2. The gprobs2beagle.jar utility uses the probabilities "as is" in the BEAGLE genotype probabilities file, and does not normalize the probabilities to sum to 1.
  3. The gprobs2beagle.jar utility provides a simple way to "call" genotypes.  The called genotypes can be converted to linkage format with beagle2linkage.jar.
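
The threshold rule can be illustrated with a short sketch. The class name GenotypeCaller is hypothetical, and resolving ties in the order AA, AB, BB is an assumption. As in the notes above, the probabilities are used "as is" without normalization.

```java
/** Hypothetical helper: threshold-based genotype calling. */
final class GenotypeCaller {

    /** Returns the called genotype as two space-separated alleles. */
    static String call(double pAA, double pAB, double pBB, double threshold,
                       String alleleA, String alleleB, String missing) {
        double max = Math.max(pAA, Math.max(pAB, pBB));
        if (max < threshold) {
            return missing + " " + missing;  // most likely call is below the threshold
        }
        // Assumption: ties are resolved in the order AA, AB, BB.
        if (max == pAA) {
            return alleleA + " " + alleleA;
        }
        if (max == pAB) {
            return alleleA + " " + alleleB;
        }
        return alleleB + " " + alleleB;
    }

    public static void main(String[] args) {
        System.out.println(call(0.01, 0.04, 0.95, 0.9, "A", "G", "?"));
    }
}
```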

Download gprobs2beagle.jar.

back to contents


linkage2beagle.jar

Description:

The linkage2beagle.jar utility converts a linkage file to a BEAGLE genotypes file. 

Usage:

The following usage instructions can be obtained by entering "java -jar linkage2beagle.jar" at the command line prompt:

usage: java -jar linkage2beagle.jar [data file] [ped file] > [output file]

where
  [data file]   = file with first two columns of the output BEAGLE genotypes file.
  [ped file]    = pedigree file with data for variables listed in [data file].
  [output file] = output BEAGLE genotypes file.

The pedigree file is in linkage format with one row per sample, one column per
non-marker variable, and two columns per marker variable.  Marker variables
in [data file] must have an "M" in the first column.

Notes:

  1. The linkage2beagle.jar utility writes data to standard output.
  2. The format of the pedigree file is described at http://www.sph.umich.edu/csg/abecasis/QTDT/docs/pedigree.html.
  3. When processing large files, computational time may be reduced if you use a machine with several gigabytes of memory and add the "-Xmx<MB>m" Java argument between "java" and "-jar", where "<MB>" is the number of megabytes of available memory.
  4. The linkage2beagle.jar utility will create one or more temporary data files in your system's default temporary file directory.  You can specify a different directory for the temporary files by adding the "-Djava.io.tmpdir=<directory>" argument between "java" and "-jar", where "<directory>" is the name of an alternate directory for storing temporary data files.

Download linkage2beagle.jar.

back to contents


gprobshwe.jar

Description:

The gprobshwe.jar utility calculates exact Hardy-Weinberg equilibrium p-values for each marker in a BEAGLE genotype probabilities file.

Usage:

The following usage instructions can be obtained by entering "java -jar gprobshwe.jar" at the command line prompt:

usage: cat [input file] | java -jar gprobshwe.jar [threshold] > [output file]

where
  [input file]  = BEAGLE genotype probabilities file.
  [threshold]   = minimum probability required to call a genotype.
  [output file] = file with six tab-delimited columns and one line per marker:
                     column 1: marker identifier.
                     column 2: proportion of uncalled genotypes.
                     column 3: number of AA genotype calls.
                     column 4: number of AB genotype calls.
                     column 5: number of BB genotype calls.
                     column 6: P-value from exact test of Hardy-Weinberg equilibrium.

Notes:

  1. The gprobshwe.jar utility reads data from standard input and writes data to standard output.
  2. The gprobshwe.jar utility uses the probabilities "as is" in the BEAGLE genotype probabilities file, and does not normalize the genotype probabilities to sum to 1.
  3. The gprobshwe.jar utility output can be processed with filterlines.jar.
  4. The algorithm for the exact test of HWE P-value is described in the article: JE Wigginton, DJ Cutler, GR Abecasis (2005) A Note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet 76:887-893.  This utility uses a java port of C++ code written by Jan Wigginton that is available from Goncalo Abecasis' web site: http://www.sph.umich.edu/csg/abecasis/Exact/
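
For readers who want to see the shape of the calculation, the following is an independent sketch of an exact test in the style of Wigginton et al. (2005). It is not the code shipped in gprobshwe.jar; the class name ExactHwe is hypothetical, and returning 1.0 when there are no called genotypes is an assumption.

```java
/** Sketch of an exact HWE test in the style of Wigginton et al. (2005). */
final class ExactHwe {

    /** Returns the exact test P-value for the given genotype counts. */
    static double pValue(int nAA, int nAB, int nBB) {
        if (nAA < 0 || nAB < 0 || nBB < 0) {
            throw new IllegalArgumentException("negative genotype count");
        }
        int n = nAA + nAB + nBB;
        if (n == 0) {
            return 1.0;  // Assumption: no called genotypes, no evidence against HWE.
        }
        int rareCopies = 2 * Math.min(nAA, nBB) + nAB;
        double[] hetProbs = new double[rareCopies + 1];  // unnormalized weights

        // Start near the expected heterozygote count, with the same parity as rareCopies.
        int mid = (int) ((long) rareCopies * (2L * n - rareCopies) / (2L * n));
        if (((rareCopies ^ mid) & 1) != 0) {
            mid++;
        }
        hetProbs[mid] = 1.0;
        double sum = 1.0;

        // Walk down: each step trades two heterozygotes for one homozygote of each kind.
        int homR = (rareCopies - mid) / 2;
        int homC = n - mid - homR;
        for (int hets = mid; hets > 1; hets -= 2) {
            hetProbs[hets - 2] = hetProbs[hets] * hets * (hets - 1.0)
                    / (4.0 * (homR + 1.0) * (homC + 1.0));
            sum += hetProbs[hets - 2];
            homR++;
            homC++;
        }
        // Walk up: each step trades one homozygote of each kind for two heterozygotes.
        homR = (rareCopies - mid) / 2;
        homC = n - mid - homR;
        for (int hets = mid; hets <= rareCopies - 2; hets += 2) {
            hetProbs[hets + 2] = hetProbs[hets] * 4.0 * homR * homC
                    / ((hets + 2.0) * (hets + 1.0));
            sum += hetProbs[hets + 2];
            homR--;
            homC--;
        }
        // P-value: total probability of outcomes no more likely than the observed one.
        double p = 0.0;
        for (double prob : hetProbs) {
            if (prob <= hetProbs[nAB]) {
                p += prob;
            }
        }
        return Math.min(1.0, p / sum);
    }

    public static void main(String[] args) {
        System.out.println(pValue(10, 30, 60));
    }
}
```

The three counts correspond to columns 3 to 5 of the gprobshwe.jar output.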

Download gprobshwe.jar.

back to contents


gprobsmissing.jar

Description:

The gprobsmissing.jar utility calculates the missing genotypes proportion for each marker in a BEAGLE genotype probabilities file. 

Usage:

The following usage instructions can be obtained by entering "java -jar gprobsmissing.jar" at the command line prompt:

usage: cat [input file] | java -jar gprobsmissing.jar [threshold] > [output file]

where
  [input file]  = BEAGLE genotype probabilities file.
  [threshold]   = minimum probability required to call a genotype.
  [output file] = file with two tab-delimited columns and one line per marker:
                     column 1: marker identifier.
                     column 2: proportion of uncalled genotypes.

Notes:

  1. The gprobsmissing.jar utility reads data from standard input and writes data to standard output.
  2. The gprobsmissing.jar utility uses the probabilities "as is" in the BEAGLE genotype probabilities file, and does not normalize the probabilities to sum to 1.
  3. The gprobsmissing.jar utility output can be processed with filterlines.jar. 
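
A minimal sketch of the per-marker calculation is given below. The class name MarkerMissingness is hypothetical, and the input is assumed to be the three genotype probabilities (AA, AB, BB) for each sample on one marker line, already parsed into numbers.

```java
/** Hypothetical helper: proportion of uncalled genotypes for one marker. */
final class MarkerMissingness {

    /**
     * @param probs three probabilities (AA, AB, BB) per sample, in sample order
     * @param threshold minimum probability required to call a genotype
     */
    static double uncalledProportion(double[] probs, double threshold) {
        if (probs.length == 0 || probs.length % 3 != 0) {
            throw new IllegalArgumentException("expected three probabilities per sample");
        }
        int nSamples = probs.length / 3;
        int uncalled = 0;
        for (int s = 0; s < nSamples; s++) {
            double max = Math.max(probs[3 * s], Math.max(probs[3 * s + 1], probs[3 * s + 2]));
            if (max < threshold) {
                uncalled++;  // the most likely genotype is below the threshold
            }
        }
        return (double) uncalled / nSamples;
    }

    public static void main(String[] args) {
        double[] marker = {0.99, 0.01, 0.0, 0.5, 0.5, 0.0};
        System.out.println(uncalledProportion(marker, 0.9));
    }
}
```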

Download gprobsmissing.jar.

back to contents


gprobssamplemissing.jar

Description:

The gprobssamplemissing.jar utility calculates the missing genotypes proportion for each sample in a BEAGLE genotype probabilities file. 

Usage:

The following usage instructions can be obtained by entering "java -jar gprobssamplemissing.jar" at the command line prompt:

usage: cat [input file] | java -jar gprobssamplemissing.jar [threshold] > [output file]

where
  [input file]  = BEAGLE genotype probabilities file.
  [threshold]   = minimum probability required to call a genotype.
  [output file] = file with two tab-delimited columns and one line per sample:
                     column 1: sample identifier.
                     column 2: proportion of uncalled genotypes.

Notes:

  1. The gprobssamplemissing.jar utility reads data from standard input and writes data to standard output.
  2. The gprobssamplemissing.jar utility uses the probabilities "as is" in the BEAGLE genotype probabilities file, and does not normalize the probabilities to sum to 1.
  3. The gprobssamplemissing.jar utility output can be processed with filterlines.jar. 

Download gprobssamplemissing.jar.

back to contents


changecolumn.jar

Description:

The changecolumn.jar utility replaces values in a column of an input file.

Usage:

The following usage instructions can be obtained by entering "java -jar changecolumn.jar" at the command line prompt:

usage: cat [input file] | java -jar changecolumn.jar [column] [change file] > [output file]

where
  [input file]  = file with white-space delimited data.  Lines of [input file] may have
                  fewer than [column] fields.
  [column]      = column to be changed (1 = first column).
  [change file] = file with two white-space delimited fields per line:  The first field
                  is the value to be replaced. The second field is the new value.
  [output file] = a space-delimited output file with the specified column changed.  Any
                  field in column [column] of [input file] that is not found in the first
                  column of [change file] is left unchanged.

Notes:

  1. The changecolumn.jar utility reads data from standard input and writes data to standard output.
  2. The changecolumn.jar utility is useful when one or more marker identifiers must be changed.
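
The replacement rule can be sketched for a single input line as follows. The class name ColumnChanger is hypothetical, and the change file is assumed to have been loaded into a map from old value to new value.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper: replace the value in one column of a line. */
final class ColumnChanger {

    /** Returns the line, space-delimited, with column 'column' (1 = first) replaced. */
    static String changeLine(String line, int column, Map<String, String> changes) {
        if (column < 1) {
            throw new IllegalArgumentException("column must be 1 or greater");
        }
        String[] fields = line.trim().split("\\s+");
        if (column <= fields.length) {  // lines with fewer fields pass through unchanged
            int i = column - 1;
            // Values not found in the change file are left as they are.
            fields[i] = changes.getOrDefault(fields[i], fields[i]);
        }
        return String.join(" ", fields);
    }

    public static void main(String[] args) {
        Map<String, String> changes = Collections.singletonMap("rs1", "rs100");
        System.out.println(changeLine("M rs1 A G", 2, changes));
    }
}
```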

Download changecolumn.jar.

back to contents


changeline.jar

Description:

The changeline.jar utility replaces values in a line of a file.

Usage:

The following usage instructions can be obtained by entering "java -jar changeline.jar" at the command line prompt:

usage: cat [input file] | java -jar changeline.jar [line] [change file] > [output file]

where
  [input file]  = file with white-space delimited data.
  [line]        = line to be changed (1 = first line).
  [change file] = file with two white-space delimited fields per line:  The first field
                  is the value to be replaced. The second field is the new value.
  [output file] = a space-delimited output file with the specified line changed.  Any
                  field in line [line] of [input file] that is not found in the first
                  column of [change file] is left unchanged.

Notes:

  1. The changeline.jar utility reads data from standard input and writes data to standard output.
  2. The changeline.jar utility is useful when one or more sample identifiers must be changed.

Download changeline.jar.

back to contents


cut.jar

Description:

The cut.jar utility extracts columns from a file.

Usage:

The following usage instructions can be obtained by entering "java -jar cut.jar" at the command line prompt:

usage: cat [input file] | java -jar cut.jar a1:b1 a2:b2 ... > [output file]

where
  [input file]  = file with white-space delimited columns.
  [a#:b#]       = first (a#) and last (b#) indices (inclusive) of a set of
                  consecutive columns to extract.  The first column has index 1.
  [output file] = white-space delimited file with extracted columns.

Notes:

  1. The cut.jar utility reads data from standard input and writes data to standard output.
  2. An alternative to the cut.jar utility is the unix "cut" utility.
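
The a#:b# range arguments can be illustrated with the sketch below, which processes one line. The class name ColumnCutter is hypothetical, and silently skipping columns beyond the end of a line is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extract inclusive column ranges from one line. */
final class ColumnCutter {

    /** Each range has the form "a:b" with 1-based inclusive indices. */
    static String cut(String line, String... ranges) {
        String[] fields = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (String range : ranges) {
            String[] ab = range.split(":");
            int first = Integer.parseInt(ab[0]);
            int last = Integer.parseInt(ab[1]);
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("bad range: " + range);
            }
            // Assumption: columns past the end of the line are skipped.
            for (int col = first; col <= last && col <= fields.length; col++) {
                out.add(fields[col - 1]);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(cut("M rs1 A A G G", "1:2", "5:6"));
    }
}
```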

Download cut.jar.

back to contents


filtercolumns.jar

Description:

The filtercolumns.jar utility filters columns of input data with white-space delimited fields according to the values on a specified line.

Usage:

The following usage instructions can be obtained by entering "java -jar filtercolumns.jar" at the command line prompt:

usage:
  cat [input file] | java -jar filtercolumns.jar [line] [min] [max] > [output file]
or
  cat [input file] | java -jar filtercolumns.jar [line] [word file] > [output file]

where
  [input file]  = file with space-delimited fields.
  [line]        = signed line number (first line is 1 or -1).
                     [line] > 0 if criteria is for INCLUSION.
                     [line] < 0 if criteria is for EXCLUSION.
  [min]         = min field value (a number).
  [max]         = max field value (a number).
  [word file]   = text file with one word per line.
  [output file] = file with columns that pass filter.

Lines with white-space delimited fields are read from standard input.
If [line] > 0, columns whose value on the specified [line] is between
[min] and [max] inclusive or is equal to a word in [word file] are written
to standard output.  If [line] < 0, columns whose value on the specified
[line] is NOT between [min] and [max] inclusive or is NOT equal to a word in
[word file] are written to standard output.  An error is thrown if any two
lines have a differing number of white-space delimited fields.

Notes:

  1. The filtercolumns.jar utility reads data from standard input and writes data to standard output.
  2. The filtercolumns.jar utility takes two arguments when filtering on string values and takes three arguments when filtering on numerical values.
  3. If you want to include or exclude lines instead of columns, use the filterlines.jar utility.

Download filtercolumns.jar.

back to contents


filterlines.jar

Description:

The filterlines.jar utility filters lines of input data with white-space delimited fields according to the value of a specified field.

Usage:

The following usage instructions can be obtained by entering "java -jar filterlines.jar" at the command line prompt:

usage:
  cat [input file] | java -jar filterlines.jar [field] [min] [max] > [output file]
or
  cat [input file] | java -jar filterlines.jar [field] [word file] > [output file]

where
  [input file]  = file with space-delimited fields.
  [field]       = signed field number (first column is 1 or -1).
                     [field] > 0 if criteria is for INCLUSION.
                     [field] < 0 if criteria is for EXCLUSION.
  [min]         = min field value (a number).
  [max]         = max field value (a number).
  [word file]   = text file with one word per line.
  [output file] = file with lines that pass filter.

Lines are read from standard input.  If [field] > 0, lines for
which the specified field is between [min] and [max] inclusive or is
equal to a word in [word file] are written to standard output.
If [field] < 0, lines for which the specified field is NOT between
[min] and [max] inclusive or is NOT equal to a word in [word file]
are written to standard output.  Lines that have fewer fields than
the specified number of fields are not printed.
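
The inclusion and exclusion rules can be sketched as two predicates on a single line. The class name LineFilter is hypothetical, and the sketch assumes that a non-numeric value in a numerically filtered field is an error.

```java
import java.util.Set;

/** Hypothetical helper: decide whether a line passes the filter. */
final class LineFilter {

    /** Numeric filter: 'field' is signed, positive to include and negative to exclude. */
    static boolean passes(String line, int field, double min, double max) {
        String value = fieldValue(line, field);
        if (value == null) {
            return false;  // lines with too few fields are never printed
        }
        double v = Double.parseDouble(value);  // assumption: non-numeric values are an error
        boolean inRange = v >= min && v <= max;
        return field > 0 ? inRange : !inRange;
    }

    /** Word filter: the field must (or must not) equal a word from the word file. */
    static boolean passes(String line, int field, Set<String> words) {
        String value = fieldValue(line, field);
        if (value == null) {
            return false;
        }
        boolean found = words.contains(value);
        return field > 0 ? found : !found;
    }

    private static String fieldValue(String line, int field) {
        if (field == 0) {
            throw new IllegalArgumentException("field must be nonzero");
        }
        String[] fields = line.trim().split("\\s+");
        int index = Math.abs(field) - 1;
        return index < fields.length ? fields[index] : null;
    }

    public static void main(String[] args) {
        System.out.println(passes("rs1 0.02 10 20 5 0.3", 6, 0.05, 1.0));
    }
}
```

For example, a numeric filter on field 6 of the gprobshwe.jar output selects markers by Hardy-Weinberg P-value.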

Notes:

  1. The filterlines.jar utility reads data from standard input and writes data to standard output.
  2. The filterlines.jar utility takes two arguments when filtering on string values and takes three arguments when filtering on numerical values.
  3. If you want to include or exclude columns instead of lines, use the filtercolumns.jar utility.
  4. The filterlines.jar utility is extremely useful.  I use filterlines.jar more than any other BEAGLE utility.

Download filterlines.jar.

back to contents


paste.jar

Description:

The paste.jar utility pastes together files that have shared initial columns followed by data columns.

Usage:

The following usage instructions can be obtained by entering "java -jar paste.jar" at the command line prompt:

usage: java -jar paste.jar [shared columns] [file 1] [file 2] ... > [output file]

  [shared columns] = number of shared initial columns that are identical in all input files.
  [file #]         = an input file.
  [output file]    = space-delimited file with the shared initial columns followed
                     by the non-initial columns of [file 1], [file 2], ....

Notes:

  1. The paste.jar utility writes data to standard output.
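
The pasting rule can be sketched for one output line, given the corresponding line from each input file. The class name SharedColumnPaste is hypothetical, and rejecting input whose shared columns differ is an assumption about how mismatches are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: paste corresponding lines that share initial columns. */
final class SharedColumnPaste {

    /** 'lines' holds the same-numbered line from each input file, in file order. */
    static String pasteLine(int shared, List<String> lines) {
        List<String> out = new ArrayList<>();
        String[] first = null;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < shared) {
                throw new IllegalArgumentException("too few columns: " + line);
            }
            if (first == null) {
                first = fields;
                out.addAll(Arrays.asList(fields));  // shared columns plus data of file 1
                continue;
            }
            for (int c = 0; c < shared; c++) {
                // Assumption: a mismatch in the shared columns is treated as an error.
                if (!fields[c].equals(first[c])) {
                    throw new IllegalArgumentException("shared columns differ: " + line);
                }
            }
            for (int c = shared; c < fields.length; c++) {
                out.add(fields[c]);  // data columns of later files
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(pasteLine(2, Arrays.asList("M rs1 A A", "M rs1 G G")));
    }
}
```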

Download paste.jar.

back to contents


transpose.jar

Description:

The transpose.jar utility transposes the rows and columns of a rectangular array of white-space delimited data. 

Usage:

The following usage instructions can be obtained by entering "java -jar transpose.jar help" at the command line prompt:

usage: cat [input file] | java -jar transpose.jar > [output file]

where
  [input file]  = file with rectangular array of white-space delimited data.
  [output file] = output file with space-delimited (" ") transposed data.

Notes:

  1. The transpose.jar utility reads data from standard input and writes data to standard output.
  2. The transpose.jar utility can transpose large, gigabyte size files.
  3. When processing large files, computational time may be reduced if you use a machine with several gigabytes of memory and add the "-Xmx<MB>m" Java argument between "java" and "-jar", where "<MB>" is the number of megabytes of available memory.
  4. The transpose.jar utility will create one or more temporary data files in your system's default temporary file directory.  You can specify a different directory for the temporary files by adding the "-Djava.io.tmpdir=<directory>" argument between "java" and "-jar", where "<directory>" is the name of an alternate directory for storing temporary data files.
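
The operation itself can be shown with a small in-memory sketch. This is not how transpose.jar works internally: the utility uses temporary files so that it can handle gigabyte size input, while the hypothetical class below holds the whole array in memory and is only suitable for small data.

```java
import java.util.Arrays;

/** Hypothetical helper: in-memory transpose of a rectangular array of fields. */
final class InMemoryTranspose {

    /** Returns an array whose rows are the columns of 'rows'. */
    static String[][] transpose(String[][] rows) {
        if (rows.length == 0) {
            return new String[0][0];
        }
        int nCols = rows[0].length;
        String[][] out = new String[nCols][rows.length];
        for (int r = 0; r < rows.length; r++) {
            if (rows[r].length != nCols) {
                throw new IllegalArgumentException("input is not rectangular at row " + (r + 1));
            }
            for (int c = 0; c < nCols; c++) {
                out[c][r] = rows[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] data = {{"I", "id", "s1", "s2"}, {"M", "rs1", "A", "G"}};
        for (String[] row : transpose(data)) {
            System.out.println(String.join(" ", row));  // space-delimited output
        }
    }
}
```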

Download transpose.jar.

back to contents


updategprobs.jar

Description:

The updategprobs.jar utility updates a BEAGLE genotype probabilities file with data from another BEAGLE genotype probabilities file. 

Usage:

The following usage instructions can be obtained by entering "java -jar updategprobs.jar" at the command line prompt:

usage: cat [in file] | java -jar updategprobs.jar [markers] [replace] > [out file]

where
  [in file]  = input BEAGLE genotype probabilities file.
  [markers]  = a white-space delimited file whose first column contains all
               marker identifiers in [in file] and [replace] in chromosomal order.
  [replace]  = BEAGLE genotype probabilities file with replacement or additional
               genotype probabilities.  The sample identifier lines in [in file]
               and [replace] must be identical.  If a marker is present in both
               [in file] and [replace], the marker's A-allele and B-allele identifiers
               must be the same in both files.
  [out file] = output BEAGLE genotype probabilities file containing all the markers in
               [in file] and [replace].  If a marker is present in both [in file] and
               [replace] then the data in [in file] will be replaced with the data in
               [replace].

Notes:

  1. The updategprobs.jar utility reads data from standard input and writes data to standard output.
  2. The updategprobs.jar utility is useful if you have multiple reference samples of different sizes genotyped on different marker sets.  You can run BEAGLE separately with each reference panel and then combine the imputed data from each run using the updategprobs.jar utility.
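
The merge rule (markers written in the order given by [markers], with data from [replace] taking precedence) can be sketched as follows. The class name GprobsUpdater is hypothetical, the sketch assumes each file's marker lines have been loaded into a map keyed by marker identifier, and it ignores the sample identifier line and the allele consistency checks described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: merge marker lines from two genotype probabilities files. */
final class GprobsUpdater {

    /** Returns the merged marker lines in the order given by 'markerOrder'. */
    static List<String> merge(List<String> markerOrder,
                              Map<String, String> inLines,
                              Map<String, String> replaceLines) {
        List<String> out = new ArrayList<>();
        for (String marker : markerOrder) {
            // Data in the replacement file wins when a marker is in both files.
            String line = replaceLines.containsKey(marker)
                    ? replaceLines.get(marker)
                    : inLines.get(marker);
            if (line != null) {  // assumption: markers found in neither file are skipped
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> in = new HashMap<>();
        in.put("rs1", "rs1 A G 1 0 0");
        Map<String, String> replace = new HashMap<>();
        replace.put("rs2", "rs2 C T 0 1 0");
        System.out.println(merge(List.of("rs1", "rs2"), in, replace));
    }
}
```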

Download updategprobs.jar.

back to contents


Java Source Code

The BEAGLE utilities are open source utilities that are licensed under the Apache License, Version 2.0.  You may not use the BEAGLE utilities except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

The BEAGLE utilities are distributed on an "AS IS" BASIS,  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   See the License for the specific language governing permissions and limitations under the License.

Download source code for the BEAGLE Utilities:  beagle_utilities_11Sep10.src.zip.

back to contents