Copyright (c) 2007-2013 Brian L. Browning
Email: browning@uw.edu
This page was last updated on 05 Feb 2013
This page includes simple utility programs for manipulating text files. If you are performing analyses using BEAGLECALL or Beagle, you may find some of these programs useful for preparing input files and for working with output files. The Beagle utilities are written in Java and run on all common computing platforms that have a Java runtime (e.g. Windows, Unix, Linux, Solaris, Mac).
All the utility programs on this web page are licensed under the Apache License, Version 2.0. You may obtain a copy of the License from http://www.apache.org/licenses/LICENSE-2.0
The gtstats utility calculates genotype statistics for each marker in a VCF file with GT field data.
The following usage instructions can be obtained by entering "java -jar gtstats.jar help" at the command line prompt:
usage: cat [vcf] | java -jar gtstats.jar > [out]
where
            [vcf] = input VCF file.
            [out] = output file with per-marker allele statistics.
One line with 15 tab-delimited fields is written per marker:
      1-8)  VCF fixed fields (CHROM, POS, ID, REF, ALT, QUAL, FILT, INFO)
        9)  Missing genotype count
       10)  Missing genotype frequency
       11)  Non-REF allele count
       12)  Non-REF allele frequency
       13)  Minor allele count      (REF allele vs Non-REF alleles)
       14)  Minor allele frequency  (REF allele vs Non-REF alleles)
       15)  HWE P-value             (REF allele vs Non-REF alleles)
A genotype is considered missing if either allele is missing. Missing genotypes are not included in allele count, allele frequency, or HWE statistics.
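The per-marker counts in columns 9 to 14 can be illustrated with a short Java sketch. It makes three assumptions: GT strings look like "0|1", "1/1" or "./.", a genotype is missing if any allele is ".", and every ALT allele counts as non-REF. The class GtStatsSketch and its method are hypothetical. This is not gtstats source code, and it leaves out VCF parsing and the HWE p-value in column 15.

```java
import java.util.List;

/**
 * Hypothetical sketch of gtstats columns 9-14 for one marker.
 * Assumes GT strings such as "0|1", "1/1" or "./." (any ploidy).
 */
public class GtStatsSketch {

    /** Returns {missing genotype count, non-REF allele count, non-missing allele count}. */
    public static int[] counts(List<String> gts) {
        int missing = 0;
        int nonRef = 0;
        int alleles = 0;
        for (String gt : gts) {
            String[] a = gt.split("[|/]");
            boolean isMissing = false;
            for (String allele : a) {
                if (allele.equals(".")) {
                    isMissing = true;      // either allele missing => genotype missing
                }
            }
            if (isMissing) {
                missing++;                 // excluded from allele counts and frequencies
                continue;
            }
            for (String allele : a) {
                alleles++;
                if (!allele.equals("0")) {
                    nonRef++;              // any ALT allele counts as non-REF
                }
            }
        }
        return new int[] {missing, nonRef, alleles};
    }

    public static void main(String[] args) {
        List<String> gts = List.of("0|0", "0|1", "1/1", "./.", "0|.");
        int[] c = counts(gts);
        double missingFreq = (double) c[0] / gts.size();
        double nonRefFreq = (double) c[1] / c[2];
        int minorCount = Math.min(c[1], c[2] - c[1]);   // REF vs pooled non-REF alleles
        double minorFreq = (double) minorCount / c[2];
        // prints: 2 0.4 3 0.5 3 0.5
        System.out.println(c[0] + " " + missingFreq + " " + c[1] + " "
                + nonRefFreq + " " + minorCount + " " + minorFreq);
    }
}
```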
Download gtstats.jar.
The splitvcf utility splits a single VCF file into multiple VCF files corresponding to overlapping chromosome intervals.
The following usage instructions can be obtained by entering "java -jar splitvcf.jar" at the command line prompt:
usage: cat [vcf] | java -jar splitvcf.jar [chrom] [records] [overlap] [prefix]
where
          [vcf]     = input VCF file.
          [chrom]   = a chromosome or chromosome interval (e.g. "20" or "20:1-800").
          [records] = number of VCF records per output file.
          [overlap] = number of VCF records shared between consecutive output files.
          [prefix]  = output VCF file prefix.
Output files are GZIP compressed and are named: [prefix].1.vcf.gz, [prefix].2.vcf.gz, [prefix].3.vcf.gz, ...
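The record ranges covered by successive splitvcf output files can be sketched in Java. The sketch assumes that output file k starts (k-1) * ([records] - [overlap]) records into the selected interval and that the last file simply ends at the last record. This page does not say how splitvcf handles the final file. The class name SplitVcfSketch and the prefix "chr20" are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitVcfSketch {

    /** Returns [start, end) record-index ranges; consecutive ranges share `overlap` records. */
    public static List<int[]> chunks(int nRecords, int records, int overlap) {
        if (overlap < 0 || overlap >= records) {
            throw new IllegalArgumentException("require 0 <= overlap < records");
        }
        List<int[]> ranges = new ArrayList<>();
        int step = records - overlap;          // new records contributed by each file
        for (int start = 0; start < nRecords; start += step) {
            int end = Math.min(start + records, nRecords);
            ranges.add(new int[] {start, end});
            if (end == nRecords) {
                break;                         // this chunk reaches the last record
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        String prefix = "chr20";               // hypothetical [prefix]
        List<int[]> ranges = chunks(25, 10, 3);
        for (int i = 0; i < ranges.size(); i++) {
            int[] r = ranges.get(i);
            System.out.println(prefix + "." + (i + 1) + ".vcf.gz: records "
                    + r[0] + "-" + (r[1] - 1));
        }
    }
}
```

With 25 records, [records] = 10 and [overlap] = 3, this yields four files. Each file shares its first three records with the end of the previous file.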
Download splitvcf.jar.
The mergevcf utility merges multiple VCF files corresponding to overlapping chromosome intervals into a single VCF file.
The following usage instructions can be obtained by entering "java -jar mergevcf.jar" at the command line prompt:
usage: java -jar mergevcf.jar [chrom] [vcf 1] [vcf 2] ... > [out vcf]
where
          [chrom]   = a chromosome or chromosome interval (e.g. "20" or "20:1-800").
          [vcf #]   = VCF files to be merged.
          [out vcf] = the merged VCF file.
All input VCF files must contain data for the same set of samples. Input VCF files can be listed in any order and may contain data for overlapping chromosome intervals, such that the last markers in one VCF file are identical to the first markers in another VCF file. The ends of overlapping VCF files are trimmed to remove the overlap. Phased genotypes in overlapping VCF files are aligned using the heterozygous genotype nearest the middle of the overlap.
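The phase-alignment step described for mergevcf can be sketched for one sample and two overlapping files. The sketch finds the shared heterozygous genotype nearest the middle of the overlap and swaps the second file's haplotypes if its phase disagrees there. Several details are assumptions, not taken from this page:
- the merged output switches from the first file to the second at the middle of the overlap;
- if the overlap contains no heterozygous genotype, the second file's phase is left unchanged;
- alleles are integer indices.

The class MergeVcfSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeVcfSketch {

    /**
     * Merges one sample's phased genotypes from two files whose last `overlap`
     * markers (file a) are the same markers as the first `overlap` markers (file b).
     * Each genotype is {first allele, second allele}.
     */
    public static List<int[]> merge(List<int[]> a, List<int[]> b, int overlap) {
        if (overlap < 0 || overlap > a.size() || overlap > b.size()) {
            throw new IllegalArgumentException("invalid overlap");
        }
        int aStart = a.size() - overlap;   // index in a of the first shared marker
        int mid = overlap / 2;             // middle of the overlap (index into b)

        // heterozygous shared marker closest to the middle of the overlap
        int best = -1;
        for (int i = 0; i < overlap; i++) {
            int[] g = a.get(aStart + i);
            if (g[0] != g[1] && (best < 0 || Math.abs(i - mid) < Math.abs(best - mid))) {
                best = i;
            }
        }

        // swap b's haplotypes if its phase at that marker disagrees with a's
        boolean swap = best >= 0 && a.get(aStart + best)[0] != b.get(best)[0];

        // assumption: keep a before the middle of the overlap, b from the middle on
        List<int[]> merged = new ArrayList<>(a.subList(0, aStart + mid));
        for (int j = mid; j < b.size(); j++) {
            int[] g = b.get(j);
            merged.add(swap ? new int[] {g[1], g[0]} : g);
        }
        return merged;
    }
}
```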
Download mergevcf.jar.
The consensusvcf utility creates a VCF file with a consensus phasing from a set of VCF files with phased GT field data for the same samples and markers.
The following usage instructions can be obtained by entering "java -jar consensusvcf.jar" at the command line prompt:
usage: java -jar consensusvcf.jar [vcf 1] [vcf 2] ... > [consensus]
Download consensusvcf.jar.
The base2genetic utility converts NCBI base positions to genetic map positions.
The following usage instructions can be obtained by entering "java -jar base2genetic.jar" at the command line prompt:
usage: cat [input file] | java -jar base2genetic.jar [column] [map file] > [output file]
where
            [column]   = column with position to be transformed (1 = first column).
            [map file] = file with two columns: base position and genetic map position.
                         The positions must be in chromosomal order.
Lines of white-space delimited fields are read from standard input, and numeric data in the specified column is transformed from base position to genetic map position by interpolating between data points in the map file.
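The interpolation performed by base2genetic can be sketched as follows. The sketch assumes linear interpolation between adjacent map points. It also assumes linear extrapolation from the nearest end segment for positions outside the map, which this page does not specify. Map positions must be strictly increasing, because a repeated base position would cause a division by zero here. The class Base2GeneticSketch is hypothetical, and its space-delimited output is an assumption.

```java
import java.util.Arrays;

public class Base2GeneticSketch {
    private final double[] bp;   // base positions, increasing (chromosomal order)
    private final double[] cm;   // genetic map positions

    public Base2GeneticSketch(double[] bp, double[] cm) {
        if (bp.length < 2 || bp.length != cm.length) {
            throw new IllegalArgumentException("need >= 2 map points of equal length");
        }
        this.bp = bp.clone();
        this.cm = cm.clone();
    }

    /** Linear interpolation; outside the map, the nearest end segment is extended (assumption). */
    public double toGenetic(double pos) {
        int i = Arrays.binarySearch(bp, pos);
        if (i >= 0) {
            return cm[i];                                  // exact map point
        }
        int ins = -i - 1;                                  // index of first map point > pos
        int lo = Math.max(0, Math.min(ins - 1, bp.length - 2));
        int hi = lo + 1;
        double t = (pos - bp[lo]) / (bp[hi] - bp[lo]);
        return cm[lo] + t * (cm[hi] - cm[lo]);
    }

    /** Transforms field `column` (1-based) of a white-space delimited line. */
    public String transformLine(String line, int column) {
        String[] f = line.trim().split("\\s+");
        f[column - 1] = Double.toString(toGenetic(Double.parseDouble(f[column - 1])));
        return String.join(" ", f);
    }
}
```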
Download base2genetic.jar.
The beagle2gprobs utility converts a Beagle v3 genotypes file to a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar beagle2gprobs.jar" at the command line prompt:
usage: java -jar beagle2gprobs.jar [markers] [bgl] [missing] > [out]
Download beagle2gprobs.jar.
The beagle2linkage utility converts Beagle v3 genotypes files to linkage files.
The following usage instructions can be obtained by entering "java -jar beagle2linkage.jar" at the command line prompt:
usage: cat [bgl] | java -jar beagle2linkage.jar [prefix]
Download beagle2linkage.jar.
The beagle2vcf utility converts a Beagle v3 genotypes file to VCF format.
The following usage instructions can be obtained by entering "java -jar beagle2vcf.jar" at the command line prompt:
usage: java -jar beagle2vcf.jar [chrom] [markers] [bgl] [missing] > [vcf]
Download beagle2vcf.jar.
The gprobs2beagle utility converts a Beagle v3 genotype probabilities file to a Beagle v3 genotypes file.
The following usage instructions can be obtained by entering "java -jar gprobs2beagle.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobs2beagle.jar [threshold] [missing] > [bgl]
Download gprobs2beagle.jar.
The linkage2beagle utility converts a linkage file to a Beagle v3 genotypes file.
The following usage instructions can be obtained by entering "java -jar linkage2beagle.jar" at the command line prompt:
usage: java -jar linkage2beagle.jar [data] [ped] > [bgl]
Download linkage2beagle.jar.
The vcf2beagle utility converts a VCF file with GT field data to a Beagle v3 genotypes file.
The following usage instructions can be obtained by entering "java -jar vcf2beagle.jar" at the command line prompt:
usage: cat [vcf file] | java -jar vcf2beagle.jar [missing] [prefix]
Download vcf2beagle.jar.
The vcf2gprobs utility converts a VCF file with GP field data to a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar vcf2gprobs.jar help" at the command line prompt:
usage: cat [vcf] | java -jar vcf2gprobs.jar > [gprobs]
where
          [vcf]    = input file in VCF 4.1 format.
          [gprobs] = Beagle version 3 genotype probabilities file.
VCF records that do not have a GP format field or that do not have exactly one ALT allele are omitted from the output "gprobs" file.
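The record filter described for vcf2gprobs can be written as a small predicate on one tab-delimited VCF data line. A record is kept if its FORMAT column lists a GP key and its ALT column holds exactly one allele. The sketch treats an ALT value of "." (no alternate allele) as zero ALT alleles. The class Vcf2GprobsFilterSketch is hypothetical, and the sketch does not convert GP values.

```java
import java.util.Arrays;

public class Vcf2GprobsFilterSketch {

    /** True if a tab-delimited VCF data line has a GP FORMAT key and exactly one ALT allele. */
    public static boolean keep(String vcfLine) {
        String[] f = vcfLine.split("\t");
        if (f.length < 9) {
            return false;                                   // no FORMAT column
        }
        String alt = f[4];                                  // 5th column: ALT
        boolean oneAlt = !alt.equals(".") && !alt.contains(",");
        boolean hasGp = Arrays.asList(f[8].split(":")).contains("GP");   // 9th column: FORMAT
        return oneAlt && hasGp;
    }
}
```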
Download vcf2gprobs.jar.
The gprobshwe utility calculates exact Hardy-Weinberg equilibrium p-values for each marker in a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobshwe.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobshwe.jar [threshold] > [out]
Download gprobshwe.jar.
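One standard way to compute an exact Hardy-Weinberg equilibrium p-value is the recurrence of Wigginton, Cutler and Abecasis (2005). It enumerates the conditional probabilities of every possible heterozygote count given the allele counts. The sketch below implements that recurrence from genotype counts at a bi-allelic marker. Whether gprobshwe uses this algorithm is not stated on this page. The page also does not describe how gprobshwe derives genotype counts from probabilities or how it uses [threshold], so both are left out. The class HweExactSketch is hypothetical.

```java
/** Exact HWE test sketch (Wigginton, Cutler & Abecasis 2005 recurrence). */
public class HweExactSketch {

    public static double pValue(int nAA, int nAB, int nBB) {
        int n = nAA + nAB + nBB;                       // genotyped samples
        if (n == 0) {
            return 1.0;
        }
        int rare = 2 * Math.min(nAA, nBB) + nAB;       // copies of the minor allele
        double[] p = new double[rare + 1];             // unnormalized P(het count = i)

        // start near the expected heterozygote count, with the same parity as `rare`
        int mid = (int) ((long) rare * (2L * n - rare) / (2L * n));
        if ((mid & 1) != (rare & 1)) {
            mid++;
        }
        p[mid] = 1.0;
        double sum = 1.0;

        // walk down: each step removes two hets and adds one of each homozygote
        int homR = (rare - mid) / 2;
        int homC = n - mid - homR;
        for (int het = mid; het > 1; het -= 2) {
            p[het - 2] = p[het] * het * (het - 1.0) / (4.0 * (homR + 1.0) * (homC + 1.0));
            sum += p[het - 2];
            homR++;
            homC++;
        }

        // walk up from mid
        homR = (rare - mid) / 2;
        homC = n - mid - homR;
        for (int het = mid; het <= rare - 2; het += 2) {
            p[het + 2] = p[het] * 4.0 * homR * homC / ((het + 2.0) * (het + 1.0));
            sum += p[het + 2];
            homR--;
            homC--;
        }

        // p-value: total probability of outcomes no more likely than the observed one
        double observed = p[nAB];
        double tail = 0.0;
        for (double x : p) {
            if (x <= observed) {
                tail += x;                             // wrong-parity entries are 0
            }
        }
        return Math.min(1.0, tail / sum);
    }
}
```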
The gprobsmetrics utility calculates per-marker statistics from a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobsmetrics.jar help" at the command line prompt:
usage: cat [gprobs] | java -jar gprobsmetrics.jar > [out]
Download gprobsmetrics.jar.
The gprobsmissing utility calculates the proportion of missing genotypes for each marker in a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobsmissing.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobsmissing.jar [threshold] > [out]
Download gprobsmissing.jar.
The gprobssamplemissing utility calculates the proportion of missing genotypes for each sample in a Beagle v3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobssamplemissing.jar" at the command line prompt:
usage: cat [gprobs] | java -jar gprobssamplemissing.jar [threshold] > [out]
Download gprobssamplemissing.jar.
The changecolumn utility replaces values in a column of a file.
The following usage instructions can be obtained by entering "java -jar changecolumn.jar" at the command line prompt:
usage: cat [input file] | java -jar changecolumn.jar [column] [change file] > [output file]
where
          [input file]  = file with white-space delimited data. Lines of [input file] may have
                          fewer than [column] fields.
          [column]      = column to be changed (1 = first column).
          [change file] = file with two white-space delimited fields per line: The first field
                          is the value to be replaced. The second field is the new value.
          [output file] = a space-delimited output file with the specified column changed. Any
                          field in column [column] of [input file] that is not found in the first
                          column of [change file] is left unchanged.
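The changecolumn replacement rule can be sketched as a lookup table built from [change file] and applied to one field per line. Lines with fewer than [column] fields pass through with only their delimiters normalized to single spaces, which is an assumption. The sketch also assumes that [change file] lines with fewer than two fields are ignored. The class ChangeColumnSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangeColumnSketch {

    /** Builds the old-value -> new-value map from [change file] lines. */
    public static Map<String, String> readChanges(List<String> changeLines) {
        Map<String, String> map = new HashMap<>();
        for (String line : changeLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 2) {
                map.put(f[0], f[1]);
            }
        }
        return map;
    }

    /** Replaces field `column` (1-based) if it appears in the map; output is space-delimited. */
    public static String changeLine(String line, int column, Map<String, String> changes) {
        String[] f = line.trim().split("\\s+");
        if (column <= f.length) {
            // values not found in the first column of [change file] are left unchanged
            f[column - 1] = changes.getOrDefault(f[column - 1], f[column - 1]);
        }
        return String.join(" ", f);
    }
}
```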
Download changecolumn.jar.
The changeline utility replaces values in a line of a file.
The following usage instructions can be obtained by entering "java -jar changeline.jar" at the command line prompt:
usage: cat [input file] | java -jar changeline.jar [line] [change file] > [output file]
where
          [input file]  = file with white-space delimited data.
          [line]        = line to be changed (1 = first line).
          [change file] = file with two white-space delimited fields per line: The first field
                          is the value to be replaced. The second field is the new value.
          [output file] = a space-delimited output file with the specified line changed. Any
                          field in line [line] of [input file] that is not found in the first
                          column of [change file] is left unchanged.
Download changeline.jar.
The cut utility extracts columns from a file.
The following usage instructions can be obtained by entering "java -jar cut.jar" at the command line prompt:
usage: cat [input file] | java -jar cut.jar a1:b1 a2:b2 ... > [output file]
where
          [input file]  = file with white-space delimited columns.
          [a#:b#]       = first (a#) and last (b#) indices (inclusive) of a set of
                          consecutive columns to extract. The first column has index 1.
          [output file] = space-delimited file with extracted columns.
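The column-range extraction done by cut can be sketched for a single line. Each a#:b# argument selects an inclusive, 1-based run of columns, and the runs are written in the order given. Treating a reversed or out-of-range a#:b# as an error is an assumption, since this page does not say what cut does in that case. The class CutSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CutSketch {

    /** Extracts inclusive, 1-based column ranges such as "1:3" from a white-space delimited line. */
    public static String cut(String line, String... ranges) {
        String[] f = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (String r : ranges) {
            String[] ab = r.split(":");
            int a = Integer.parseInt(ab[0]);
            int b = Integer.parseInt(ab[1]);
            if (a < 1 || b < a || b > f.length) {
                // assumption: a bad range is an error
                throw new IllegalArgumentException("bad range " + r);
            }
            for (int i = a; i <= b; i++) {
                out.add(f[i - 1]);
            }
        }
        return String.join(" ", out);   // space-delimited output
    }
}
```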
Download cut.jar.
The filtercolumns utility filters the columns of input data with white-space delimited fields according to the value of each column's field on a specified line.
The following usage instructions can be obtained by entering "java -jar filtercolumns.jar" at the command line prompt:
usage:
            cat [input file] | java -jar filtercolumns.jar [line] [min] [max] > [output file]
          or
            cat [input file] | java -jar filtercolumns.jar [line] [word file] > [output file]
where
            [input file]  = file with space-delimited fields.
            [line]        = signed line number (first line is 1 or -1).
                            [line] > 0 if the criterion is for INCLUSION.
                            [line] < 0 if the criterion is for EXCLUSION.
            [min]         = min field value (a number).
            [max]         = max field value (a number).
            [word file]   = text file with one word per line.
            [output file] = file with columns that pass filter.
      
Lines with white-space delimited fields are read from standard input. If [line] > 0, columns whose value on the specified [line] is between [min] and [max] inclusive or is equal to a word in [word file] are written to standard output. If [line] < 0, columns whose value on the specified [line] is NOT between [min] and [max] inclusive or is NOT equal to a word in [word file] are written to standard output. An error is thrown if any two lines have a differing number of white-space delimited fields.
Download filtercolumns.jar.
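The filtercolumns rule can be sketched as follows. Each column is tested on line |[line]|, the test is inverted when [line] < 0, and the surviving columns of every line are written. Two things are assumptions: non-numeric values fail the [min]/[max] test, and the whole input is held in memory. A [word file] criterion can be passed as a set's contains method, as in the test below. The class FilterColumnsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FilterColumnsSketch {

    /** Keeps columns whose field on line |line| (1-based) passes `test`; line < 0 inverts it. */
    public static List<String> filter(List<String> lines, int line, Predicate<String> test) {
        List<String[]> rows = new ArrayList<>();
        for (String s : lines) {
            rows.add(s.trim().split("\\s+"));
        }
        int nCols = rows.get(0).length;
        for (String[] r : rows) {
            if (r.length != nCols) {
                throw new IllegalArgumentException("lines differ in number of fields");
            }
        }
        String[] key = rows.get(Math.abs(line) - 1);
        List<Integer> keep = new ArrayList<>();
        for (int j = 0; j < nCols; j++) {
            boolean pass = test.test(key[j]);
            if (line > 0 ? pass : !pass) {     // inclusion vs exclusion
                keep.add(j);
            }
        }
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            List<String> fields = new ArrayList<>();
            for (int j : keep) {
                fields.add(r[j]);
            }
            out.add(String.join(" ", fields));
        }
        return out;
    }

    /** [min] [max] criterion; non-numeric fields fail the test (assumption). */
    public static Predicate<String> between(double min, double max) {
        return s -> {
            try {
                double v = Double.parseDouble(s);
                return v >= min && v <= max;
            } catch (NumberFormatException e) {
                return false;
            }
        };
    }
}
```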
The filterlines utility filters lines of input data with white-space delimited fields according to the value of a specified field.
The following usage instructions can be obtained by entering "java -jar filterlines.jar" at the command line prompt:
usage:
          cat [input file] | java -jar filterlines.jar [field] [min] [max] > [output file]
        or
          cat [input file] | java -jar filterlines.jar [field] [word file] > [output file]
where
          [input file]  = file with space-delimited fields.
          [field]       = signed field number (first column is 1 or -1).
                          [field] > 0 if the criterion is for INCLUSION.
                          [field] < 0 if the criterion is for EXCLUSION.
          [min]         = min field value (a number).
          [max]         = max field value (a number).
          [word file]   = text file with one word per line.
          [output file] = file with lines that pass filter.
Lines are read from standard input. If [field] > 0, lines for which the specified field is between [min] and [max] inclusive or is equal to a word in [word file] are written to standard output. If [field] < 0, lines for which the specified field is NOT between [min] and [max] inclusive or is NOT equal to a word in [word file] are written to standard output. Lines that have fewer than |[field]| fields are printed if [field] < 0.
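The per-line decision made by filterlines can be sketched as a single predicate. It includes the rule that a line with too few fields is printed only when [field] < 0. The [min]/[max] or [word file] criterion is supplied as a Java Predicate, and rejecting [field] = 0 is an assumption. The class FilterLinesSketch is hypothetical.

```java
import java.util.function.Predicate;

public class FilterLinesSketch {

    /** True if the line is written to standard output under the filterlines rules. */
    public static boolean passes(String line, int field, Predicate<String> test) {
        if (field == 0) {
            throw new IllegalArgumentException("field must be nonzero");
        }
        String[] f = line.trim().split("\\s+");
        int idx = Math.abs(field) - 1;           // 1-based field number -> array index
        if (idx >= f.length) {
            return field < 0;                    // too few fields: printed only when excluding
        }
        boolean pass = test.test(f[idx]);
        return field > 0 ? pass : !pass;         // inclusion vs exclusion
    }
}
```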
The paste utility pastes together files that have shared initial columns followed by data columns.
The following usage instructions can be obtained by entering "java -jar paste.jar" at the command line prompt:
usage: java -jar paste.jar [shared columns] [file 1] [file 2] ... > [out]
Download paste.jar.
The transpose utility transposes the rows and columns of a file.
The following usage instructions can be obtained by entering "java -jar transpose.jar help" at the command line prompt:
usage: cat [input file] | java -jar transpose.jar > [output file]
where
              [input file]  = file with rectangular array of white-space delimited data.
              [output file] = output file with space-delimited (" ") transposed data.
Download transpose.jar.
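The transpose operation can be sketched by reading all rows, checking that the array is rectangular, and writing each column as a space-delimited output line. Holding the whole input in memory is a property of this sketch, not a statement about transpose.jar. The class TransposeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TransposeSketch {

    /** Transposes a rectangular array of white-space delimited fields. */
    public static List<String> transpose(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String s : lines) {
            rows.add(s.trim().split("\\s+"));
        }
        int nCols = rows.isEmpty() ? 0 : rows.get(0).length;
        for (String[] r : rows) {
            if (r.length != nCols) {
                throw new IllegalArgumentException("input is not a rectangular array");
            }
        }
        List<String> out = new ArrayList<>();
        for (int j = 0; j < nCols; j++) {
            List<String> col = new ArrayList<>();
            for (String[] r : rows) {
                col.add(r[j]);
            }
            out.add(String.join(" ", col));   // column j becomes output line j
        }
        return out;
    }
}
```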
The updategprobs utility updates a Beagle version 3 genotype probabilities file with data from another Beagle version 3 genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar updategprobs.jar" at the command line prompt:
usage: cat [gprobs] | java -jar updategprobs.jar [markers] [replace] > [out]
Download updategprobs.jar.
The ibdmerge utility merges Beagle version 4 IBD files.
The following usage instructions can be obtained by entering "java -jar ibdmerge.jar" at the command line prompt:
usage: cat [ibd files] | java -jar ibdmerge.jar > [out]
Download ibdmerge.jar.
The Beagle utilities are open source utilities that are licensed under the Apache License, Version 2.0. You may not use the Beagle utilities except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
The Beagle utilities are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Download source code for the Beagle Utilities: beagle_utilities_05Feb13.src.zip.