BEAGLE Utilities

Copyright (c) 2007-2013 Brian L. Browning
Email: browning@uw.edu
This page was last updated on 05 Feb 2013

Contents

Introduction

VCF file utilities

File conversion utilities

Data QC utilities

File manipulation utilities

Genotype imputation utilities

IBD utilities (Beagle v4 IBD files)

Java Source Code


Introduction

This page provides simple utility programs for manipulating text files. If you are performing analyses using BEAGLECALL or Beagle, you may find some of these programs useful for preparing input files and for working with output files. The Beagle utilities are written in Java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).

All the utility programs on this web page are licensed under the Apache version 2.0 open source license. You may obtain a copy of the License from http://www.apache.org/licenses/LICENSE-2.0



gtstats.jar

Description:

The gtstats utility calculates genotype statistics for each marker in a VCF file with GT field data.

Usage:

The following usage instructions can be obtained by entering "java -jar gtstats.jar help" at the command line prompt:

usage: cat [vcf] | java -jar gtstats.jar > [out]

where
  [vcf] = input VCF file.
  [out] = output file with per-marker allele statistics.

One line with 15 tab-delimited fields is written per marker:

 1-8)  VCF fixed fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO)
   9)  Missing genotype count
  10)  Missing genotype frequency
  11)  Non-REF allele count
  12)  Non-REF allele frequency
  13)  Minor allele count             (REF allele vs Non-REF alleles)
  14)  Minor allele frequency         (REF allele vs Non-REF alleles)
  15)  HWE P-value                    (REF allele vs Non-REF alleles)

A genotype is considered missing if either allele is missing.  Missing genotypes
are not included in allele count, allele frequency, or HWE statistics.
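
As a rough illustration of the counting rules above, the following Python sketch computes the missing-genotype and allele statistics for the GT values of one marker. It is not the gtstats source code: the function name `gt_stats` and the dictionary keys are hypothetical, diploid GT strings are assumed, and the HWE P-value is omitted.

```python
import re

def gt_stats(genotypes):
    """Summarize diploid GT strings (e.g. "0/1", "1|1", "./.") for one marker."""
    missing = 0
    ref = 0
    non_ref = 0
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        # A genotype is missing if either allele is missing.
        if "." in alleles:
            missing += 1
            continue
        for allele in alleles:
            if allele == "0":
                ref += 1
            else:
                non_ref += 1
    total = ref + non_ref          # non-missing alleles only
    minor = min(ref, non_ref)      # REF allele vs. all non-REF alleles
    return {
        "missing_count": missing,
        "missing_freq": missing / len(genotypes) if genotypes else 0.0,
        "nonref_count": non_ref,
        "nonref_freq": non_ref / total if total else 0.0,
        "minor_count": minor,
        "minor_freq": minor / total if total else 0.0,
    }
```

For example, `gt_stats(["0/0", "0/1", "1|1", "./.", "0/."])` treats the last two genotypes as missing and counts three REF and three non-REF alleles among the rest.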

Notes:

  1. The gtstats utility writes to standard output.
  2. The gtstats utility can be used to identify markers to exclude from the analysis based on missing rate, minor allele frequency, or Hardy-Weinberg equilibrium P-value.
  3. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.

Download gtstats.jar.



splitvcf.jar

Description:

The splitvcf utility splits a single VCF file into multiple VCF files corresponding to overlapping chromosome intervals.

Usage:

The following usage instructions can be obtained by entering "java -jar splitvcf.jar" at the command line prompt:

usage: cat [vcf] | java -jar splitvcf.jar [chrom] [records] [overlap] [prefix]

where
  [vcf]     = input VCF file.
  [chrom]   = a chromosome or chromosome interval (e.g. "20" or "20:1-800").
  [records] = number of VCF records per output file.
  [overlap] = number of VCF records shared between consecutive output files.
  [prefix]  = output VCF file prefix.

Output files are GZIP compressed and are named: [prefix].1.vcf.gz,
[prefix].2.vcf.gz, [prefix].3.vcf.gz, ....
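
The relationship between [records] and [overlap] can be sketched in Python as follows. This is only an illustration under the assumption that each output file starts [overlap] records before the end of the previous file and that the last file may be shorter; the function name `split_windows` is hypothetical, and the utility's handling of the final records may differ.

```python
def split_windows(n_records, records, overlap):
    """Return (start, end) record index pairs, 0-based with end exclusive."""
    if overlap < 0 or overlap >= records:
        raise ValueError("overlap must be >= 0 and smaller than records")
    windows = []
    if n_records == 0:
        return windows
    start = 0
    while True:
        end = min(start + records, n_records)
        windows.append((start, end))
        if end == n_records:
            break
        # The next window re-uses the last `overlap` records of this one.
        start = end - overlap
    return windows
```

With 10 records, 4 records per file, and an overlap of 1, the sketch yields the windows (0, 4), (3, 7), and (6, 10).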

Notes:

  1. The splitvcf utility reads from standard input and writes its output to files named with the specified prefix.
  2. The splitvcf and mergevcf utilities can be used to parallelize an analysis by chromosome segment.
  3. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.

Download splitvcf.jar.



mergevcf.jar

Description:

The mergevcf utility merges multiple VCF files corresponding to overlapping chromosome intervals into a single VCF file.

Usage:

The following usage instructions can be obtained by entering "java -jar mergevcf.jar" at the command line prompt:

usage: java -jar mergevcf.jar [chrom] [vcf 1] [vcf 2] ... > [out vcf]

where
  [chrom]   = a chromosome or chromosome interval (e.g. "20" or "20:1-800").
  [vcf #]   = VCF files to be merged.
  [out vcf] = the merged VCF file.

All input VCF files must contain data for the same set of samples.
Input VCF files can be listed in any order and may contain data
for overlapping chromosome intervals such that the last markers in
one VCF file are identical to the first markers in another VCF file.
The ends of overlapping VCF files are trimmed to remove the overlap.
Phased genotypes in overlapping VCF files are aligned using the
heterozygote genotype nearest the middle of the overlap.
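
One way to picture the phase alignment step is the Python sketch below. It is a guess at the idea rather than the mergevcf implementation: the function names are hypothetical, it handles one sample at a time, and it assumes the heterozygote must be phased in both files to be usable.

```python
def is_het(gt):
    """True if a phased GT string such as "0|1" is heterozygous."""
    a, b = gt.split("|")
    return a != b

def flip(gt):
    """Swap the two haplotypes of a phased GT string."""
    a, b = gt.split("|")
    return b + "|" + a

def needs_flip(overlap_a, overlap_b):
    """Decide whether the haplotypes in file B must be swapped to match file A.

    overlap_a, overlap_b: phased GT strings for one sample at the shared
    overlap markers, listed in the same marker order.
    """
    middle = (len(overlap_a) - 1) / 2.0
    hets = [i for i, gt in enumerate(overlap_a)
            if is_het(gt) and is_het(overlap_b[i])]
    if not hets:
        return False  # no heterozygote in the overlap: nothing to compare
    nearest = min(hets, key=lambda i: abs(i - middle))
    return overlap_a[nearest] != overlap_b[nearest]
```

If `needs_flip` returns True, applying `flip` to every genotype of that sample in the second file makes its haplotype order consistent with the first file.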

Notes:

  1. The mergevcf utility writes to standard output.
  2. The splitvcf and mergevcf utilities can be used to parallelize an analysis by chromosome segment.
  3. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.

Download mergevcf.jar.



consensusvcf.jar

Description:

The consensusvcf utility creates a VCF file with a consensus phasing from a set of VCF files with phased GT field data for the same samples and markers.

Usage:

The following usage instructions can be obtained by entering "java -jar consensusvcf.jar" at the command line prompt:

usage: java -jar consensusvcf.jar [vcf 1] [vcf 2] ... > [consensus]

where
  [vcf #]     = VCF files with phased GT fields.
  [consensus] = VCF file with consensus phased genotypes.

Each input VCF file must contain identical samples and markers.
The consensus phased genotypes are determined by majority vote
of the phased input files.

Notes:

  1. The consensusvcf utility writes to standard output.
  2. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.

Download consensusvcf.jar.



base2genetic.jar

Description:

The base2genetic utility converts NCBI base positions to genetic map positions.

Usage:

The following usage instructions can be obtained by entering "java -jar base2genetic.jar" at the command line prompt:

usage: cat [input file] | java -jar base2genetic.jar [column] [map file] > [output file]

where
  [column]   = column with position to be transformed (1 = first column).
  [map file] = file with two columns: base position and genetic map position.
               The positions must be in chromosomal order.

Lines of white-space delimited fields are read from standard input, and numeric data
in the specified column is transformed from base position to genetic map position by
interpolating between data points in the map file.
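
The interpolation can be sketched in Python as shown below. This is a minimal linear-interpolation example, not the utility's code: the function name `base_to_genetic` is hypothetical, and clamping positions that fall outside the map to the first or last map value is an assumption, since the text above does not say how such positions are handled.

```python
from bisect import bisect_right

def base_to_genetic(pos, base_positions, genetic_positions):
    """Linearly interpolate a genetic map position for base position `pos`.

    base_positions must be sorted in increasing (chromosomal) order.
    """
    # Assumption: positions outside the map are clamped to its ends.
    if pos <= base_positions[0]:
        return genetic_positions[0]
    if pos >= base_positions[-1]:
        return genetic_positions[-1]
    # Find j such that base_positions[j-1] <= pos < base_positions[j].
    j = bisect_right(base_positions, pos)
    x0, x1 = base_positions[j - 1], base_positions[j]
    y0, y1 = genetic_positions[j - 1], genetic_positions[j]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)
```

For a map with base positions 100, 200, 400 and genetic positions 0.0, 1.0, 2.0, base position 150 maps to 0.5 and base position 300 maps to 1.5.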

Notes:

  1. A map file for NCBI Build 36 can be constructed from the HapMap recombination map (http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/).

Download base2genetic.jar.



beagle2gprobs.jar

Description:

The beagle2gprobs utility converts a Beagle v3 genotypes file to a Beagle v3 genotype probabilities file.

Usage:

The following usage instructions can be obtained by entering "java -jar beagle2gprobs.jar" at the command line prompt:

usage: java -jar beagle2gprobs.jar [markers] [bgl] [missing] > [out]

where
  [markers] = file with one line per marker and 3 white-space delimited
              identifiers per line: 1) marker, 2) allele A, and 3) allele B.
              The markers in [markers] and [bgl] must be in the same order.
  [bgl]     = Beagle version 3 genotypes file.
  [missing] = missing allele code used in the Beagle version 3 genotypes file.
  [out]     = output Beagle version 3 genotype probabilities file.  Non-missing
              genotypes have probability 1 for the called genotype.  Missing
              genotypes have probability 0.333 for each possible genotype.
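
The mapping from a called genotype to a probability triple can be sketched in Python as follows. The function name `genotype_to_probs` is hypothetical, and treating a genotype as missing when either allele carries the missing code is an assumption.

```python
def genotype_to_probs(a1, a2, allele_a, allele_b, missing):
    """Return (P(AA), P(AB), P(BB)) for one called genotype."""
    # Assumption: a genotype is missing if either allele is missing.
    if a1 == missing or a2 == missing:
        return (0.333, 0.333, 0.333)
    if {a1, a2} - {allele_a, allele_b}:
        raise ValueError("allele not listed for this marker: %s %s" % (a1, a2))
    n_b = [a1, a2].count(allele_b)   # number of B alleles: 0, 1, or 2
    return ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))[n_b]
```

For a marker with alleles A and G and missing code "?", the genotype "G A" yields (0.0, 1.0, 0.0) and the genotype "? ?" yields (0.333, 0.333, 0.333).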

Notes:

  1. The beagle2gprobs utility writes data to standard output.
  2. The beagle2gprobs utility can be used to create initial genotype probabilities files for BEAGLECALL.

Download beagle2gprobs.jar.



beagle2linkage.jar

Description:

The beagle2linkage utility converts Beagle v3 genotypes files to linkage files. 

Usage:

The following usage instructions can be obtained by entering "java -jar beagle2linkage.jar" at the command line prompt:

usage: cat [bgl] | java -jar beagle2linkage.jar [prefix]

where
  [bgl]    = BEAGLE version 3 genotypes file.
  [prefix] = prefix for output linkage file name (.ped) and data file name (.dat).

The first two columns of the Beagle file are printed to [prefix].dat.  The remaining
columns of the BEAGLE file are printed in linkage format.  Linkage format has one row
per sample, one column per non-marker variable, and two columns per marker variable.

Notes:

  1. The beagle2linkage utility reads data from standard input.
  2. Linkage format is described at http://www.sph.umich.edu/csg/abecasis/QTDT/docs/pedigree.html.
  3. When processing large files, computational time may be reduced if you use a machine with several gigabytes of memory and you use the "-Xmx<MB>m" Java argument between "java" and "-jar", where "<MB>" is the number of megabytes of available memory.
  4. The beagle2linkage utility will create one or more temporary data files in your system's default temporary file directory.  You can specify a different directory for the temporary files by adding the "-Djava.io.tmpdir=<directory>" argument between "java" and "-jar", where "<directory>" is the name of an alternate directory for storing temporary data files.

Download beagle2linkage.jar.



beagle2vcf.jar

Description:

The beagle2vcf utility converts a Beagle v3 genotypes file to VCF format.

Usage:

The following usage instructions can be obtained by entering "java -jar beagle2vcf.jar" at the command line prompt:

usage: java -jar beagle2vcf.jar [chrom] [markers] [bgl] [missing] > [vcf]

where
  [chrom]   = chromosome identifier in output VCF file.
  [markers] = Beagle version 3 markers file.
  [bgl]     = Beagle version 3 genotypes file.
  [missing] = missing allele code in Beagle genotypes file.
  [vcf]     = output VCF file with a GT FORMAT field for each marker.

Markers in the markers file and Beagle genotypes file must be identical
and sorted in order of increasing position.  The first allele for a marker
in the markers file will be the REF allele in the output VCF file.  Alleles
in the markers file must contain only 'A', 'C', 'G', and 'T' characters.

Notes:

  1. The beagle2vcf utility writes data to standard output.
  2. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.

Download beagle2vcf.jar.



gprobs2beagle.jar

Description:

The gprobs2beagle utility converts a Beagle v3 genotype probabilities file to a Beagle v3 genotypes file.

Usage:

The following usage instructions can be obtained by entering "java -jar gprobs2beagle.jar" at the command line prompt:

usage: cat [gprobs] | java -jar gprobs2beagle.jar [threshold] [missing] > [bgl]

where
  [gprobs]    = Beagle version 3 genotype probabilities file.
  [threshold] = minimum posterior probability required to call a genotype.
  [missing]   = missing allele code used in output file.
  [bgl]       = BEAGLE version 3 genotypes file.

If a genotype's most likely call has probability less than the specified
threshold, the genotype is set to missing in the output Beagle file.
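
The calling rule can be sketched in Python as below. The function name `call_genotype` is hypothetical; as in the utility, the probabilities are used as given and are not normalized.

```python
def call_genotype(probs, allele_a, allele_b, threshold, missing):
    """Return the two called alleles for one (P(AA), P(AB), P(BB)) triple."""
    calls = ((allele_a, allele_a), (allele_a, allele_b), (allele_b, allele_b))
    best = max(range(3), key=lambda i: probs[i])   # most likely genotype
    if probs[best] < threshold:
        return (missing, missing)
    return calls[best]
```

With a threshold of 0.8, the triple (0.1, 0.85, 0.05) is called as the heterozygote, while (0.5, 0.4, 0.1) is set to missing.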

Notes:

  1. The gprobs2beagle utility reads data from standard input and writes data to standard output.
  2. The gprobs2beagle utility uses the probabilities "as is" in the Beagle genotype probabilities file, and does not normalize the probabilities to sum to 1.
  3. The gprobs2beagle utility provides a simple way to "call" genotypes.  The called genotypes can be converted to linkage format with beagle2linkage.jar.

Download gprobs2beagle.jar.



linkage2beagle.jar

Description:

The linkage2beagle utility converts a linkage file to a Beagle v3 genotypes file.

Usage:

The following usage instructions can be obtained by entering "java -jar linkage2beagle.jar" at the command line prompt:

usage: java -jar linkage2beagle.jar [data] [ped] > [bgl]

where
  [data] = file with first two columns of the output BEAGLE version 3
           genotypes file.
  [ped]  = pedigree file with data for variables listed in [data].
  [bgl]  = output BEAGLE version 3 genotypes file.

The pedigree file is in linkage format with one row per sample, one column per
non-marker variable, and two columns per marker variable.  Marker variables
in [data] must have an "M" in the first column.

Notes:

  1. The linkage2beagle utility writes data to standard output.
  2. The format of the pedigree file is described at http://www.sph.umich.edu/csg/abecasis/QTDT/docs/pedigree.html.
  3. When processing large files, computational time may be reduced if you use a machine with several gigabytes of memory and use the "-Xmx<MB>m" Java argument between "java" and "-jar", where "<MB>" is the number of megabytes of available memory.
  4. The linkage2beagle utility will create one or more temporary data files in your system's default temporary file directory.  You can specify a different directory for the temporary files by adding the "-Djava.io.tmpdir=<directory>" argument between "java" and "-jar", where "<directory>" is the name of an alternate directory for storing temporary data files.

Download linkage2beagle.jar.



vcf2beagle.jar

Description:

The vcf2beagle utility converts a VCF file with GT field data to a Beagle v3 genotypes file.

Usage:

The following usage instructions can be obtained by entering "java -jar vcf2beagle.jar" at the command line prompt:

usage: cat [vcf file] | java -jar vcf2beagle.jar [missing] [prefix]

where
  [vcf file] = input file in VCF 4.1 format.
  [missing]  = missing allele code in output Beagle version 3 genotypes file.
  [prefix]   = prefix for Beagle version 3 output files.

vcf2beagle converts a VCF file into a genotypes file and markers file
in Beagle version 3 format.  Three files will be created with extensions:
".markers", ".bgl.gz", and ".int".  Markers with a REF or ALT allele longer
than one character will retain the integer allele codes found in the VCF file
and will have the first five VCF record fields (CHROM, POS, ID, REF, ALT)
written to the ".int" output file.  If a VCF record does not have a
GT FORMAT field or if a VCF record does not have exactly one ALT allele,
the VCF record is omitted from the output files.
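
The record-handling rules above can be summarized with a small Python sketch. The function name `classify_vcf_record` and its return labels are hypothetical, and treating an ALT value of "." as "no ALT allele" is an assumption.

```python
def classify_vcf_record(line):
    """Return "omit", "int", or "base" for one tab-delimited VCF data line."""
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    fmt = fields[8].split(":") if len(fields) > 8 else []
    if "GT" not in fmt:
        return "omit"      # no GT FORMAT field
    if alt == "." or "," in alt:
        return "omit"      # zero or several ALT alleles
    if len(ref) > 1 or len(alt) > 1:
        return "int"       # keep integer allele codes; listed in ".int" file
    return "base"          # single-character REF and ALT alleles
```

For example, a record with REF "AT" and ALT "A" is classified as "int", while a record with ALT "G,T" is omitted.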

Notes:

  1. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.
  2. The vcf2beagle utility requires diploid genotypes. Male X-chromosome genotypes with one allele must be converted to homozygous diploid genotypes.

Download vcf2beagle.jar.



vcf2gprobs.jar

Description:

The vcf2gprobs utility converts a VCF file with GP field data to a Beagle v3 genotype probabilities file.

Usage:

The following usage instructions can be obtained by entering "java -jar vcf2gprobs.jar help" at the command line prompt:

usage: cat [vcf] | java -jar vcf2gprobs.jar > [gprobs]

where
  [vcf]    = input file in VCF 4.1 format.
  [gprobs] = Beagle version 3 genotype probabilities file.

VCF records that do not have a GP format field or that do not have
exactly one ALT allele are omitted from the output "gprobs" file.

Notes:

  1. The vcf2gprobs utility writes data to standard output.
  2. VCF file format is described at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41.

Download vcf2gprobs.jar.



gprobshwe.jar

Description:

The gprobshwe utility calculates exact Hardy-Weinberg equilibrium p-values for each marker in a Beagle v3 genotype probabilities file.

Usage:

The following usage instructions can be obtained by entering "java -jar gprobshwe.jar" at the command line prompt:

usage: cat [gprobs] | java -jar gprobshwe.jar [threshold] > [out]

where
  [gprobs]    = Beagle version 3 genotype probabilities file.
  [threshold] = minimum probability required to call a genotype.
  [out]       = file with six tab-delimited columns and one line per marker:
                   column 1: marker identifier.
                   column 2: proportion of uncalled genotypes.
                   column 3: number of AA genotype calls.
                   column 4: number of AB genotype calls.
                   column 5: number of BB genotype calls.
                   column 6: exact test of Hardy-Weinberg equilibrium P-value.
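
For readers who want to see how the P-value in column 6 can be obtained from the genotype counts in columns 3-5, here is a Python sketch of the exact test using the recurrence described by Wigginton et al. (2005). It is an independent illustration, not the Java port used by gprobshwe, and the function name `hwe_exact_pvalue` is hypothetical.

```python
def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE test P-value from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    hom_rare = min(n_aa, n_bb)
    rare = 2 * hom_rare + n_ab        # copies of the rarer allele
    probs = [0.0] * (rare + 1)        # unnormalized P(het count = i)

    # Start near the expected het count, matching the parity of `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Recurrence toward fewer heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = (probs[hets] * hets * (hets - 1.0)
                           / (4.0 * (homr + 1.0) * (homc + 1.0)))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Recurrence toward more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = (probs[hets] * 4.0 * homr * homc
                           / ((hets + 2.0) * (hets + 1.0)))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # P-value: total probability of het counts no more likely than observed.
    target = probs[n_ab]
    p = sum(x for x in probs if x <= target) / total
    return min(1.0, p)
```

Counts that match the Hardy-Weinberg expectation, such as 5 AA, 10 AB, and 5 BB, give a P-value of 1, while 10 AA, 0 AB, and 10 BB give a very small P-value.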

Notes:

  1. The gprobshwe utility reads data from standard input and writes data to standard output.
  2. The gprobshwe utility uses the probabilities "as is" in the Beagle genotype probabilities file, and does not normalize the genotype probabilities to sum to 1.
  3. The gprobshwe utility output can be processed with filterlines.jar.
  4. The algorithm for the exact test of HWE P-value is described in the article: JE Wigginton, DJ Cutler, GR Abecasis (2005) A Note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet 76:887-893.  This utility uses a java port of the C++ code written by Jan Wigginton that is available from Goncalo Abecasis' web site: http://www.sph.umich.edu/csg/abecasis/Exact/

Download gprobshwe.jar.



gprobsmetrics.jar

Description:

The gprobsmetrics utility calculates per-marker statistics from a Beagle v3 genotype probabilities file.

Usage:

The following usage instructions can be obtained by entering "java -jar gprobsmetrics.jar help" at the command line prompt:

usage: cat [gprobs] | java -jar gprobsmetrics.jar > [out]

where
  [gprobs] = Beagle version 3 genotype probabilities file
  [out]    = file with one line per marker and eight fields per line:
                    1) marker identifier
                    2) minor allele
                    3) minor allele frequency
                    4) allelic r-squared
                    5) dosage r-squared
                    6) HWE dosage r-squared
                    7) accuracy
                    8) missing score


Notes:
  a) Allelic r-squared is the estimated squared correlation between the most likely
     allele dosage and the true allele dosage.
  b) Dosage r-squared is the estimated squared correlation between the estimated
     allele dosage (0*P(AA) + 1*P(AB) + 2*P(BB)) and the true allele dosage.
  c) HWE dosage r-squared is the estimated squared correlation between the
     estimated allele dosage and the true allele dosage when the variance
     of the true allele dosage is calculated from the estimated allele frequency.
  d) Accuracy is the mean posterior probability of the most likely genotypes.
  e) Missing score is the infimum of the C > 0 such that the proportion of genotypes
     with probability < C is greater than or equal to (1.0 - C).
  f) Monomorphic markers are assigned r-squared values of 0, and accuracy and missing score
     values of 1.
  g) The squared correlation metrics can be derived using arguments found in Appendix 1
     of "Browning BL and Browning SR, Am J Hum Genet 2009;84(2):210-23".
  h) The allelic r-squared metric is reported by Beagle version 3, and the dosage
     r-squared metric is reported by MACH.
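
Three of the quantities defined above can be computed directly from the probability triples, as in the Python sketch below. The function names are hypothetical, and the sketch assumes that "probability" in note (e) means the probability of the most likely genotype. The r-squared metrics are not shown because their formulas are given in the cited article rather than on this page.

```python
def estimated_dosage(p_aa, p_ab, p_bb):
    """Estimated allele dosage: 0*P(AA) + 1*P(AB) + 2*P(BB)."""
    return p_ab + 2.0 * p_bb

def accuracy(triples):
    """Mean posterior probability of the most likely genotypes."""
    return sum(max(t) for t in triples) / len(triples)

def missing_score(triples):
    """Infimum of C > 0 with (proportion of genotypes with prob < C) >= 1 - C."""
    best = sorted(max(t) for t in triples)
    n = len(best)
    for k in range(n + 1):
        # For C in (lo, hi], exactly k genotypes have probability < C
        # (the interval is empty when lo == hi).
        lo = best[k - 1] if k > 0 else 0.0
        hi = best[k] if k < n else float("inf")
        need = 1.0 - k / n      # smallest C satisfying k/n >= 1 - C
        if need <= hi:
            return max(lo, need)
```

When every genotype has probability 1, `accuracy` and `missing_score` both return 1, which agrees with the values assigned to monomorphic markers in note (f).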

Notes:

  1. The gprobsmetrics utility reads data from standard input and writes data to standard output.
  2. The gprobsmetrics utility output can be processed with filterlines.jar. 

Download gprobsmetrics.jar.



gprobsmissing.jar

Description:

The gprobsmissing utility calculates the missing genotypes proportion for each marker in a Beagle v3 genotype probabilities file. 

Usage:

The following usage instructions can be obtained by entering "java -jar gprobsmissing.jar" at the command line prompt:

usage: cat [gprobs] | java -jar gprobsmissing.jar [threshold] > [out]

where
  [gprobs]    = Beagle version 3 genotype probabilities file.
  [threshold] = minimum probability required to call a genotype.
  [out]       = file with two tab-delimited columns and one line per marker:
                     column 1: marker identifier.
                     column 2: proportion of uncalled genotypes.

Notes:

  1. The gprobsmissing utility reads data from standard input and writes data to standard output.
  2. The gprobsmissing utility uses the probabilities "as is" in the Beagle genotype probabilities file, and does not normalize the probabilities to sum to 1.
  3. The gprobsmissing utility output can be processed with filterlines.jar. 

Download gprobsmissing.jar.



gprobssamplemissing.jar

Description:

The gprobssamplemissing utility calculates the missing genotypes proportion for each sample in a Beagle v3 genotype probabilities file. 

Usage:

The following usage instructions can be obtained by entering "java -jar gprobssamplemissing.jar" at the command line prompt:

usage: cat [gprobs] | java -jar gprobssamplemissing.jar [threshold] > [out]

where
  [gprobs]    = Beagle version 3 genotype probabilities file.
  [threshold] = minimum probability required to call a genotype.
  [out]       = file with two tab-delimited columns and one line per sample:
                  column 1: sample identifier.
                  column 2: proportion of uncalled genotypes.

Notes:

  1. The gprobssamplemissing utility reads data from standard input and writes data to standard output.
  2. The gprobssamplemissing utility uses the probabilities "as is" in the Beagle genotype probabilities file, and does not normalize the probabilities to sum to 1.
  3. The gprobssamplemissing utility output can be processed with filterlines.jar. 

Download gprobssamplemissing.jar.



changecolumn.jar

Description:

The changecolumn utility replaces values in a column of a file.

Usage:

The following usage instructions can be obtained by entering "java -jar changecolumn.jar" at the command line prompt:

usage: cat [input file] | java -jar changecolumn.jar [column] [change file] > [output file]

where
  [input file]  = file with white-space delimited data.  Lines of [input file] may have
                  fewer than [column] fields.
  [column]      = column to be changed (1 = first column).
  [change file] = file with two white-space delimited fields per line:  The first field
                  is the value to be replaced. The second field is the new value.
  [output file] = a space-delimited output file with the specified column changed.  Any
                  field in column [column] of [input file] that is not found in the first
                  column of [change file] is left unchanged.
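
The replacement rule can be sketched in Python as follows. The function name `change_column` is hypothetical, and the change file is represented as a dictionary from old value to new value.

```python
def change_column(lines, column, changes):
    """Replace values in the 1-based `column` using the `changes` dict."""
    out = []
    for line in lines:
        fields = line.split()
        # Lines with fewer than `column` fields are passed through.
        if len(fields) >= column:
            i = column - 1
            fields[i] = changes.get(fields[i], fields[i])
        out.append(" ".join(fields))
    return out
```

For example, `change_column(["rs1 A G", "rs2 C T"], 1, {"rs1": "snp1"})` renames the first marker and leaves the second line unchanged.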

Notes:

  1. The changecolumn utility reads data from standard input and writes data to standard output.
  2. The changecolumn utility is useful when one or more marker identifiers must be changed.

Download changecolumn.jar.



changeline.jar

Description:

The changeline utility replaces values in a line of a file.

Usage:

The following usage instructions can be obtained by entering "java -jar changeline.jar" at the command line prompt:

usage: cat [input file] | java -jar changeline.jar [line] [change file] > [output file]

where
  [input file]  = file with white-space delimited data.
  [line]        = line to be changed (1 = first line).
  [change file] = file with two white-space delimited fields per line:  The first field
                  is the value to be replaced. The second field is the new value.
  [output file] = a space-delimited output file with the specified line changed.  Any
                  field in line [line] of [input file] that is not found in the first
                  column of [change file] is left unchanged.

Notes:

  1. The changeline utility reads data from standard input and writes data to standard output.
  2. The changeline utility is useful when one or more sample identifiers must be changed.

Download changeline.jar.



cut.jar

Description:

The cut utility extracts columns from a file.

Usage:

The following usage instructions can be obtained by entering "java -jar cut.jar" at the command line prompt.

usage: cat [input file] | java -jar cut.jar a1:b1 a2:b2 ... > [output file]

where
  [input file]  = file with white-space delimited columns.
  [a#:b#]       = first (a#) and last (b#) indices (inclusive) of a set of
                  consecutive columns to extract.  The first column has index 1.
  [output file] = space-delimited file with extracted columns.
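
The a#:b# arguments can be illustrated with a short Python sketch; the function name `cut_columns` is hypothetical.

```python
def cut_columns(line, ranges):
    """Extract 1-based inclusive column ranges given as "a:b" strings."""
    fields = line.split()
    out = []
    for r in ranges:
        a, b = (int(x) for x in r.split(":"))
        out.extend(fields[a - 1:b])   # columns a..b inclusive
    return " ".join(out)
```

Because the ranges are processed in the order given, `cut_columns("c1 c2 c3 c4", ["3:4", "1:1"])` reorders the columns and repeating a range duplicates columns.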

Notes:

  1. The cut utility reads data from standard input and writes data to standard output.
  2. The cut utility can be used to duplicate columns or change the order of columns.
  3. If you only want to cut columns from a file, you can also use the Unix "cut" utility.

Download cut.jar.



filtercolumns.jar

Description:

The filtercolumns utility filters columns of input data with white-space delimited fields according to the value of each column on a specified line.

Usage:

The following usage instructions can be obtained by entering "java -jar filtercolumns.jar" at the command line prompt:

usage:
  cat [input file] | java -jar filtercolumns.jar [line] [min] [max] > [output file]
or
  cat [input file] | java -jar filtercolumns.jar [line] [word file] > [output file]

where
  [input file]  = file with space-delimited fields.
  [line]        = signed line number (first line is 1 or -1).
                     [line] > 0 if criteria is for INCLUSION.
                     [line] < 0 if criteria is for EXCLUSION.
  [min]         = min field value (a number).
  [max]         = max field value (a number).
  [word file]   = text file with one word per line.
  [output file] = file with columns that pass filter.

Lines with white-space delimited fields are read from standard input.
If [line] > 0, columns whose value on the specified [line] is between
[min] and [max] inclusive or is equal to a word in [word file] are written
to standard output.  If [line] < 0, columns whose value on the specified
[line] is NOT between [min] and [max] inclusive or is NOT equal to a word in
[word file] are written to standard output.  An error is thrown if any two
lines have a differing number of white-space delimited fields.

Notes:

  1. The filtercolumns utility reads data from standard input and writes data to standard output.
  2. The filtercolumns utility takes two arguments when filtering on string values and takes three arguments when filtering on numerical values.
  3. If you want to include or exclude lines instead of columns, use the filterlines utility.

Download filtercolumns.jar.



filterlines.jar

Description:

The filterlines utility filters lines of input data with white-space delimited fields according to the value of a specified field.

Usage:

The following usage instructions can be obtained by entering "java -jar filterlines.jar" at the command line prompt.

usage:
  cat [input file] | java -jar filterlines.jar [field] [min] [max] > [output file]
or
  cat [input file] | java -jar filterlines.jar [field] [word file] > [output file]

where
  [input file]  = file with space-delimited fields.
  [field]       = signed field number (first column is 1 or -1).
                     [field] > 0 if criteria is for INCLUSION.
                     [field] < 0 if criteria is for EXCLUSION.
  [min]         = min field value (a number).
  [max]         = max field value (a number).
  [word file]   = text file with one word per line.
  [output file] = file with lines that pass filter.

Lines are read from standard input.  If [field] > 0, lines for
which the specified field is between [min] and [max] inclusive or is
equal to a word in [word file] are written to standard output.
If [field] < 0, lines for which the specified field is NOT between
[min] and [max] inclusive or is NOT equal to a word in [word file]
are written to standard output.  Lines that have fewer fields than
the specified number of fields are printed if [field] < 0.
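
The inclusion and exclusion rules can be sketched in Python as shown below. The function name `filter_lines` and its keyword arguments are hypothetical, and treating a non-numeric value as "not in range" during numeric filtering is an assumption.

```python
def filter_lines(lines, field, lo=None, hi=None, words=None):
    """Keep lines by numeric range [lo, hi] or by membership in `words`.

    field > 0 keeps matching lines; field < 0 keeps non-matching lines.
    """
    include = field > 0
    idx = abs(field) - 1
    kept = []
    for line in lines:
        fields = line.split()
        if idx >= len(fields):
            # Too few fields: printed only when filtering for exclusion.
            if not include:
                kept.append(line)
            continue
        value = fields[idx]
        if words is not None:
            match = value in words
        else:
            try:
                match = lo <= float(value) <= hi
            except ValueError:
                match = False   # assumption: non-numeric values never match
        if match == include:
            kept.append(line)
    return kept
```

For example, `filter_lines(lines, 2, 0.05, 0.5)` keeps lines whose second field lies between 0.05 and 0.5, and `filter_lines(lines, -2, 0.05, 0.5)` keeps all the other lines.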

Notes:

  1. The filterlines utility reads data from standard input and writes data to standard output.
  2. The filterlines utility takes two arguments when filtering on string values and takes three arguments when filtering on numerical values.
  3. If you want to include or exclude columns instead of lines, use the filtercolumns utility.

Download filterlines.jar.



paste.jar

Description:

The paste utility pastes together files that have shared initial columns followed by data columns.

Usage:

The following usage instructions can be obtained by entering "java -jar paste.jar" at the command line prompt:

usage: java -jar paste.jar [shared columns] [file 1] [file 2] ... > [out]

  [shared columns] = number of shared initial columns that are identical
                     in all input files.
  [file #]         = an input file.
  [out]            = space-delimited file with shared initial columns followed
                     by the non-initial columns of [file 1], [file 2], ....
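
The pasting rule can be sketched in Python as follows. The function name `paste_rows` is hypothetical, each input file is represented as a list of lines, and all files are assumed to have the same number of lines. Raising an error when the shared columns differ is also an assumption.

```python
def paste_rows(shared, files):
    """Paste files that share their first `shared` columns.

    files: list of files, each a list of white-space delimited lines.
    """
    out = []
    for rows in zip(*files):
        split = [r.split() for r in rows]
        first = split[0][:shared]
        for s in split[1:]:
            if s[:shared] != first:
                raise ValueError("shared columns differ: %r vs %r"
                                 % (first, s[:shared]))
        # Shared columns once, then the data columns of each file in order.
        data = [x for s in split for x in s[shared:]]
        out.append(" ".join(first + data))
    return out
```

With three shared columns, the lines "m1 A G 1 2" and "m1 A G 5" from two files are pasted into "m1 A G 1 2 5".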

Notes:

  1. The paste utility writes data to standard output.

Download paste.jar.



transpose.jar

Description:

The transpose utility transposes the rows and columns of a file. 

Usage:

The following usage instructions can be obtained by entering "java -jar transpose.jar help" at the command line prompt:

usage: cat [input file] | java -jar transpose.jar > [output file]

where
  [input file]  = file with rectangular array of white-space delimited data.
  [output file] = output file with space-delimited (" ") transposed data.

Notes:

  1. The transpose utility reads data from standard input and writes data to standard output.
  2. The transpose utility can transpose large, gigabyte size files.  
  3. When processing large files, computational time may be reduced if you use a machine with several gigabytes of memory and use the "-Xmx<MB>m" Java argument between "java" and "-jar", where "<MB>" is the number of megabytes of available memory.
  4. The transpose utility will create one or more temporary data files in your system's default temporary file directory.  You can specify a different directory for the temporary files by adding the "-Djava.io.tmpdir=<directory>" argument between "java" and "-jar", where "<directory>" is the name of an alternate directory for storing temporary data files.

Download transpose.jar.



updategprobs.jar

Description:

The updategprobs utility updates a Beagle version 3 genotype probabilities file with data from another Beagle version 3 genotype probabilities file. 

Usage:

The following usage instructions can be obtained by entering "java -jar updategprobs.jar" at the command line prompt:

usage: cat [gprobs] | java -jar updategprobs.jar [markers] [replace] > [out]

where
  [gprobs]  = input Beagle version 3 genotype probabilities file.
  [markers] = a white-space delimited file whose first column contains all
              marker identifiers in [gprobs] and [replace] in chromosomal order.
  [replace] = Beagle version 3 genotype probabilities file with replacement or
              additional genotype probabilities.  The sample identifier lines in
              [gprobs] and [replace] must be identical.  If a marker is present
              in both [gprobs] and [replace], the marker's A and B alleles must
              be the same in both files.
  [out]     = output Beagle version 3 genotype probabilities file containing all
              the markers in [gprobs] and [replace].  If a marker is present in
              both [gprobs] and [replace] then the data in [gprobs] will be
              replaced with the data in [replace].
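
The update rule can be sketched in Python as below. The function name `update_gprobs` is hypothetical, the two input files are represented as dictionaries from marker identifier to data line (header lines excluded), and the allele check assumes that the A and B alleles are the second and third fields of each data line.

```python
def update_gprobs(marker_order, gprobs, replace):
    """Combine two sets of marker lines, preferring the data in `replace`.

    marker_order: marker identifiers in chromosomal order.
    gprobs, replace: dicts mapping marker identifier -> data line.
    """
    out = []
    for marker in marker_order:
        if marker in replace:
            if marker in gprobs:
                # Assumption: fields 2 and 3 hold the A and B alleles.
                if replace[marker].split()[1:3] != gprobs[marker].split()[1:3]:
                    raise ValueError("alleles differ for marker " + marker)
            out.append(replace[marker])   # replacement data wins
        elif marker in gprobs:
            out.append(gprobs[marker])
    return out
```

The output follows the order of the markers file, so markers found only in [replace] are interleaved with the markers from [gprobs].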

Notes:

  1. The updategprobs utility reads data from standard input and writes data to standard output.
  2. The updategprobs utility is useful if you have multiple reference panels of different sizes genotyped on different marker sets.  You can run Beagle version 3 separately with each reference panel and then combine the imputed data from each run using the updategprobs.jar utility.

Download updategprobs.jar.



ibdmerge.jar

Description:

The ibdmerge utility merges Beagle version 4 IBD files.

Usage:

The following usage instructions can be obtained by entering "java -jar ibdmerge.jar" at the command line prompt:

usage: cat [ibd files] | java -jar ibdmerge.jar > [out]

where
  [ibd files] = space-delimited list of Beagle version 4 IBD files to merge.
  [out]       = Beagle version 4 IBD file with merged IBD segments.

If IBD segments overlap, the merged segment score is the maximum segment
score, and the merged segment haplotype indices are 0.
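
The merging rule can be sketched in Python as follows. This is an illustration, not the ibdmerge implementation: the function name `merge_ibd` is hypothetical, and the tuple layout (sample 1, haplotype 1, sample 2, haplotype 2, chromosome, start, end, score) is an assumption about the columns of a Beagle version 4 IBD file. The sketch also assumes that each pair of samples is always listed in the same order.

```python
def merge_ibd(segments):
    """Merge overlapping IBD segments for each sample pair and chromosome.

    segments: tuples (sample1, hap1, sample2, hap2, chrom, start, end, score).
    """
    groups = {}
    for s1, h1, s2, h2, chrom, start, end, score in segments:
        groups.setdefault((s1, s2, chrom), []).append((start, end, h1, h2, score))
    merged = []
    for (s1, s2, chrom), segs in sorted(groups.items()):
        segs.sort()
        cur_start, cur_end, cur_h1, cur_h2, cur_score = segs[0]
        for start, end, h1, h2, score in segs[1:]:
            if start <= cur_end:
                # Overlap: extend the segment, keep the maximum score,
                # and set both haplotype indices to 0.
                cur_end = max(cur_end, end)
                cur_score = max(cur_score, score)
                cur_h1 = cur_h2 = 0
            else:
                merged.append((s1, cur_h1, s2, cur_h2, chrom,
                               cur_start, cur_end, cur_score))
                cur_start, cur_end, cur_h1, cur_h2, cur_score = (
                    start, end, h1, h2, score)
        merged.append((s1, cur_h1, s2, cur_h2, chrom,
                       cur_start, cur_end, cur_score))
    return merged
```

Two overlapping segments with scores 3.5 and 5.0 are merged into one segment with score 5.0 and haplotype indices 0, while a non-overlapping segment is passed through unchanged.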

Notes:

  1. The ibdmerge utility reads data from standard input and writes data to standard output.
  2. This utility will not merge Beagle version 3 IBD files.

Download ibdmerge.jar.



Java Source Code

The Beagle utilities are open source utilities that are licensed under the Apache License, Version 2.0.  You may not use the Beagle utilities except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

The Beagle utilities are distributed on an "AS IS" BASIS,  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   See the License for the specific language governing permissions and limitations under the License.

Download source code for the Beagle Utilities:  beagle_utilities_05Feb13.src.zip.
