Copyright (c) 2007-2010 Brian L. Browning
Email: browning@uw.edu
This page was last updated on 13 Sep 2010
This page provides simple utility programs for manipulating text files. If you are performing analyses using BEAGLECALL or BEAGLE, you may find some of these programs useful for preparing input files and for working with output files. The BEAGLE utilities are written in Java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).
All the utility programs on this web page are licensed under the Apache version 2.0 open source license. You may obtain a copy of the License from http://www.apache.org/licenses/LICENSE-2.0
The base2genetic.jar utility converts NCBI base positions to genetic map positions.
The following usage instructions can be obtained by entering "java -jar base2genetic.jar" at the command line prompt:
usage: cat [input file] | java -jar base2genetic.jar [column] [map file] > [output file]
where
  [column] = column with position to be transformed (1 = first column).
  [map file] = file with two columns: base position and genetic map position.
               The positions must be in chromosomal order.
Lines of white-space delimited fields are read from standard input, and numeric
data in the specified column is transformed from base position to genetic map
position by interpolating between data points in the map file.
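The interpolation step described above can be pictured with a minimal Java sketch. This is not the base2genetic.jar source: the class and method names are hypothetical, linear interpolation is assumed (the usage text says only "interpolating"), and rejecting positions outside the map range is an assumption, since the page does not say how such positions are handled.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the base2genetic.jar source. */
final class MapInterpolationSketch {

    /**
     * Linearly interpolates a genetic map position for a base position.
     * basePos must be strictly increasing (chromosomal order) and
     * genPos[i] is the genetic map position at basePos[i].
     */
    static double interpolate(long[] basePos, double[] genPos, long query) {
        if (basePos.length != genPos.length || basePos.length < 2) {
            throw new IllegalArgumentException("map needs at least two matched points");
        }
        // Assumption: positions outside the map are rejected rather than extrapolated.
        if (query < basePos[0] || query > basePos[basePos.length - 1]) {
            throw new IllegalArgumentException("position outside map: " + query);
        }
        int idx = Arrays.binarySearch(basePos, query);
        if (idx >= 0) {
            return genPos[idx];            // exact match with a map point
        }
        int hi = -idx - 1;                 // first map point greater than query
        int lo = hi - 1;                   // last map point less than query
        double fraction = (double) (query - basePos[lo]) / (basePos[hi] - basePos[lo]);
        return genPos[lo] + fraction * (genPos[hi] - genPos[lo]);
    }

    public static void main(String[] args) {
        long[] basePos = {1000L, 2000L, 4000L};
        double[] genPos = {0.0, 1.0, 2.0};
        System.out.println(interpolate(basePos, genPos, 1500L)); // prints 0.5
        System.out.println(interpolate(basePos, genPos, 3000L)); // prints 1.5
    }
}
```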
Download base2genetic.jar.
The beagle2gprobs.jar utility converts a BEAGLE genotypes file to a BEAGLE genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar beagle2gprobs.jar" at the command line prompt:
usage: java -jar beagle2gprobs.jar [ID file] [BEAGLE file] [missing] > [output file]
where
  [ID file] = file with one line per marker and 3 white-space delimited
              identifiers per line: 1) marker, 2) allele A, and 3) allele B.
              The markers in [ID file] and [BEAGLE file] must be in the same order.
  [BEAGLE file] = BEAGLE genotypes file.
  [missing] = missing allele code used in the BEAGLE genotypes file.
  [output file] = BEAGLE genotype probabilities file. Non-missing genotypes in
                  [BEAGLE file] have probability 1 for the called genotype.
                  Missing genotypes have probability 0.333 for each possible genotype.
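The probability rule above can be sketched for a single genotype as follows. The names are hypothetical and this is not the beagle2gprobs.jar source. Two assumptions are made: the three probabilities are ordered AA, AB, BB, and a genotype is treated as missing if either allele equals the missing code.

```java
/** Illustrative sketch only; not the beagle2gprobs.jar source. */
final class GenotypeToProbsSketch {

    /** Returns {P(AA), P(AB), P(BB)} for one genotype (assumed column order). */
    static double[] toProbs(String allele1, String allele2,
                            String alleleA, String alleleB, String missing) {
        // Assumption: one missing allele makes the whole genotype missing.
        if (allele1.equals(missing) || allele2.equals(missing)) {
            return new double[] {0.333, 0.333, 0.333};
        }
        int bCopies = countB(allele1, alleleA, alleleB) + countB(allele2, alleleA, alleleB);
        double[] probs = new double[3];   // all zero by default
        probs[bCopies] = 1.0;             // 0 copies = AA, 1 = AB, 2 = BB
        return probs;
    }

    private static int countB(String allele, String alleleA, String alleleB) {
        if (allele.equals(alleleA)) return 0;
        if (allele.equals(alleleB)) return 1;
        throw new IllegalArgumentException("unexpected allele: " + allele);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(toProbs("A", "G", "A", "G", "0")));
        System.out.println(java.util.Arrays.toString(toProbs("0", "0", "A", "G", "0")));
    }
}
```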
Download beagle2gprobs.jar.
The beagle2linkage.jar utility converts BEAGLE genotypes files to linkage files.
The following usage instructions can be obtained by entering "java -jar beagle2linkage.jar" at the command line prompt:
usage: cat [input file] | java -jar beagle2linkage.jar [prefix]
where
  [input file] = BEAGLE genotypes file.
  [prefix] = prefix for output linkage file name (.ped) and data file name (.dat).
The first two columns of the BEAGLE file are printed to [prefix].dat. The
remaining columns of the BEAGLE file are printed in linkage format. Linkage
format has one row per sample, one column per non-marker variable, and two
columns per marker variable.
Download beagle2linkage.jar.
The gprobs2beagle.jar utility converts a BEAGLE genotype probabilities file to a BEAGLE genotypes file.
The following usage instructions can be obtained by entering "java -jar gprobs2beagle.jar" at the command line prompt:
usage: cat [input file] | java -jar gprobs2beagle.jar [threshold] [missing] > [output file]
where
  [input file] = BEAGLE genotype probabilities file.
  [threshold] = minimum posterior probability required to call a genotype.
  [missing] = missing allele code used in output file.
  [output file] = BEAGLE genotypes file with genotype calls.
If a genotype's most likely call has probability less than the specified
threshold, the genotype is set to missing in the output BEAGLE file.
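The thresholding rule above might look like the following for one genotype. This is a hedged sketch with hypothetical names, not the gprobs2beagle.jar source. It assumes the probabilities are ordered AA, AB, BB and that ties go to the first genotype in that order.

```java
/** Illustrative sketch only; not the gprobs2beagle.jar source. */
final class GenotypeCallSketch {

    /**
     * Returns the two called alleles, or two missing codes if the most
     * likely genotype has probability below the threshold.
     * probs = {P(AA), P(AB), P(BB)} (assumed order).
     */
    static String[] call(double[] probs, double threshold,
                         String alleleA, String alleleB, String missing) {
        if (probs.length != 3) {
            throw new IllegalArgumentException("expected three probabilities");
        }
        int best = 0;
        for (int i = 1; i < 3; i++) {
            if (probs[i] > probs[best]) {   // assumption: ties keep the earlier genotype
                best = i;
            }
        }
        if (probs[best] < threshold) {
            return new String[] {missing, missing};
        }
        switch (best) {
            case 0:  return new String[] {alleleA, alleleA};
            case 1:  return new String[] {alleleA, alleleB};
            default: return new String[] {alleleB, alleleB};
        }
    }

    public static void main(String[] args) {
        String[] called = call(new double[] {0.02, 0.97, 0.01}, 0.9, "A", "G", "?");
        System.out.println(called[0] + " " + called[1]);   // prints "A G"
        String[] uncalled = call(new double[] {0.5, 0.45, 0.05}, 0.9, "A", "G", "?");
        System.out.println(uncalled[0] + " " + uncalled[1]); // prints "? ?"
    }
}
```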
Download gprobs2beagle.jar.
The linkage2beagle.jar utility converts a linkage file to a BEAGLE genotypes file.
The following usage instructions can be obtained by entering "java -jar linkage2beagle.jar" at the command line prompt:
usage: java -jar linkage2beagle.jar [data file] [ped file] > [output file]
where
  [data file] = file with first two columns of the output BEAGLE genotypes file.
  [ped file] = pedigree file with data for variables listed in [data file].
  [output file] = output BEAGLE genotypes file.
The pedigree file is in linkage format with one row per sample, one column per
non-marker variable, and two columns per marker variable. Marker variables in
[data file] must have an "M" in the first column.
Download linkage2beagle.jar.
The gprobshwe.jar utility calculates exact Hardy-Weinberg equilibrium p-values for each marker in a BEAGLE genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobshwe.jar" at the command line prompt:
usage: cat [input file] | java -jar gprobshwe.jar [threshold] > [output file]
where
  [input file] = BEAGLE genotype probabilities file.
  [threshold] = minimum probability required to call a genotype.
  [output file] = file with six tab-delimited columns and one line per marker:
    column 1: marker identifier.
    column 2: proportion of uncalled genotypes.
    column 3: number of AA genotype calls.
    column 4: number of AB genotype calls.
    column 5: number of BB genotype calls.
    column 6: P-value from exact test of Hardy-Weinberg equilibrium.
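The page does not say which exact test gprobshwe.jar implements. As a rough illustration of how column 6 could be derived from columns 3 to 5, the sketch below uses one common formulation: conditional on the allele counts, it sums the probabilities of all heterozygote counts that are no more likely than the observed one. The class name, the method name, and the choice of this formulation are all assumptions.

```java
/** Illustrative sketch only; the test variant used by gprobshwe.jar is not documented here. */
final class HweExactSketch {

    /** Exact HWE P-value from genotype counts (sum over outcomes no more likely than observed). */
    static double exactPValue(int nAA, int nAB, int nBB) {
        if (nAA < 0 || nAB < 0 || nBB < 0) {
            throw new IllegalArgumentException("negative genotype count");
        }
        int n = nAA + nAB + nBB;
        if (n == 0) {
            return 1.0;                    // assumption: no called genotypes gives P = 1
        }
        int rare = 2 * Math.min(nAA, nBB) + nAB;   // copies of the rarer allele
        int common = 2 * n - rare;
        // weight[het] is proportional to P(het heterozygotes | allele counts).
        double[] weight = new double[rare + 1];
        // Start near the most likely heterozygote count to avoid underflow.
        int mode = (int) ((long) rare * common / (2L * n));
        if ((mode & 1) != (rare & 1)) {
            mode++;                        // het count must have the parity of rare
        }
        weight[mode] = 1.0;
        double total = 1.0;

        int homRare = (rare - mode) / 2;
        int homCommon = n - mode - homRare;
        for (int het = mode; het > 1; het -= 2) {          // fill in lower het counts
            weight[het - 2] = weight[het] * het * (het - 1.0)
                    / (4.0 * (homRare + 1.0) * (homCommon + 1.0));
            total += weight[het - 2];
            homRare++;
            homCommon++;
        }
        homRare = (rare - mode) / 2;
        homCommon = n - mode - homRare;
        for (int het = mode; het <= rare - 2; het += 2) {  // fill in higher het counts
            weight[het + 2] = weight[het] * 4.0 * homRare * homCommon
                    / ((het + 2.0) * (het + 1.0));
            total += weight[het + 2];
            homRare--;
            homCommon--;
        }
        double tail = 0.0;
        for (int het = rare & 1; het <= rare; het += 2) {
            if (weight[het] <= weight[nAB]) {
                tail += weight[het];
            }
        }
        return Math.min(1.0, tail / total);
    }

    public static void main(String[] args) {
        System.out.println(exactPValue(1, 2, 1));
        System.out.println(exactPValue(1, 0, 1));
    }
}
```

For the second call, the two possible outcomes given two copies of each allele are zero heterozygotes (probability 1/3) and two heterozygotes (probability 2/3), so the P-value for the observed zero heterozygotes is 1/3.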
Download gprobshwe.jar.
The gprobsmissing.jar utility calculates the proportion of missing genotypes for each marker in a BEAGLE genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobsmissing.jar" at the command line prompt:
usage: cat [input file] | java -jar gprobsmissing.jar [threshold] > [output file]
where
  [input file] = BEAGLE genotype probabilities file.
  [threshold] = minimum probability required to call a genotype.
  [output file] = file with two tab-delimited columns and one line per marker:
    column 1: marker identifier.
    column 2: proportion of uncalled genotypes.
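As a hedged illustration of column 2, the sketch below counts a genotype as uncalled when its largest probability is below the threshold. The names are hypothetical, the input is assumed to be one marker's probabilities laid out as consecutive triples (one triple per sample), and this is not the gprobsmissing.jar source.

```java
/** Illustrative sketch only; not the gprobsmissing.jar source. */
final class MissingProportionSketch {

    /**
     * probs holds one marker's genotype probabilities as consecutive
     * triples, one triple per sample (assumed layout).
     */
    static double uncalledProportion(double[] probs, double threshold) {
        if (probs.length == 0 || probs.length % 3 != 0) {
            throw new IllegalArgumentException("expected a non-empty multiple of 3 values");
        }
        int nGenotypes = probs.length / 3;
        int uncalled = 0;
        for (int i = 0; i < probs.length; i += 3) {
            double max = Math.max(probs[i], Math.max(probs[i + 1], probs[i + 2]));
            if (max < threshold) {
                uncalled++;               // no genotype reaches the calling threshold
            }
        }
        return (double) uncalled / nGenotypes;
    }

    public static void main(String[] args) {
        double[] marker = {
            0.98, 0.01, 0.01,      // called
            0.50, 0.40, 0.10,      // uncalled at threshold 0.9
            0.00, 0.00, 1.00,      // called
            0.333, 0.333, 0.333    // uncalled
        };
        System.out.println(uncalledProportion(marker, 0.9)); // prints 0.5
    }
}
```

The per-sample proportion reported by gprobssamplemissing.jar could be pictured the same way, with the triples taken across markers for one sample instead of across samples for one marker.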
Download gprobsmissing.jar.
The gprobssamplemissing.jar utility calculates the proportion of missing genotypes for each sample in a BEAGLE genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar gprobssamplemissing.jar" at the command line prompt:
usage: cat [input file] | java -jar gprobssamplemissing.jar [threshold] > [output file]
where
  [input file] = BEAGLE genotype probabilities file.
  [threshold] = minimum probability required to call a genotype.
  [output file] = file with two tab-delimited columns and one line per sample:
    column 1: sample identifier.
    column 2: proportion of uncalled genotypes.
Download gprobssamplemissing.jar.
The changecolumn.jar utility replaces values in a column of an input file.
The following usage instructions can be obtained by entering "java -jar changecolumn.jar" at the command line prompt:
usage: cat [input file] | java -jar changecolumn.jar [column] [change file] > [output file]
where
  [input file] = file with white-space delimited data. Lines of [input file] may
                 have fewer than [column] fields.
  [column] = column to be changed (1 = first column).
  [change file] = file with two white-space delimited fields per line: The first
                  field is the value to be replaced. The second field is the new value.
  [output file] = a space-delimited output file with the specified column changed.
Any field in column [column] of [input file] that is not found in the first
column of [change file] is left unchanged.
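The replacement rule above can be sketched for one input line as follows. The names are hypothetical and this is not the changecolumn.jar source. The change file is represented as a map from old value to new value, and returning an empty string for a blank line is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the changecolumn.jar source. */
final class ChangeColumnSketch {

    /** Replaces the value in the given 1-based column if it appears in the change map. */
    static String changeColumn(String line, int column, Map<String, String> changes) {
        if (column < 1) {
            throw new IllegalArgumentException("column must be >= 1");
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";                     // assumption: blank lines stay blank
        }
        String[] fields = trimmed.split("\\s+");
        // Lines with fewer than [column] fields are passed through unchanged.
        if (column <= fields.length) {
            String old = fields[column - 1];
            fields[column - 1] = changes.getOrDefault(old, old);
        }
        return String.join(" ", fields);   // output is space-delimited
    }

    public static void main(String[] args) {
        Map<String, String> changes = new HashMap<>();
        changes.put("1", "M");
        changes.put("2", "F");
        System.out.println(changeColumn("id1\t2  affected", 2, changes)); // prints "id1 F affected"
        System.out.println(changeColumn("id2 9 affected", 2, changes));   // prints "id2 9 affected"
    }
}
```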
Download changecolumn.jar.
The changeline.jar utility replaces values in a line of a file.
The following usage instructions can be obtained by entering "java -jar changeline.jar" at the command line prompt:
usage: cat [input file] | java -jar changeline.jar [line] [change file] > [output file]
where
  [input file] = file with white-space delimited data.
  [line] = line to be changed (1 = first line).
  [change file] = file with two white-space delimited fields per line: The first
                  field is the value to be replaced. The second field is the new value.
  [output file] = a space-delimited output file with the specified line changed.
Any field in line [line] of [input file] that is not found in the first
column of [change file] is left unchanged.
Download changeline.jar.
The cut.jar utility extracts columns from a file.
The following usage instructions can be obtained by entering "java -jar cut.jar" at the command line prompt:
usage: cat [input file] | java -jar cut.jar a1:b1 a2:b2 ... > [output file]
where
  [input file] = file with white-space delimited columns.
  [a#:b#] = first (a#) and last (b#) indices (inclusive) of a set of
            consecutive columns to extract. The first column has index 1.
  [output file] = white-space delimited file with extracted columns.
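The a#:b# range arguments can be illustrated with a small sketch that extracts columns from one line. The names are hypothetical and this is not the cut.jar source. Joining the output with single spaces and rejecting ranges that extend past the end of the line are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the cut.jar source. */
final class CutSketch {

    /** Extracts the columns named by 1-based inclusive ranges such as "1:2" or "5:5". */
    static String cut(String line, String... ranges) {
        String[] fields = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (String range : ranges) {
            String[] ends = range.split(":");
            if (ends.length != 2) {
                throw new IllegalArgumentException("expected a:b but got " + range);
            }
            int first = Integer.parseInt(ends[0]);
            int last = Integer.parseInt(ends[1]);
            // Assumption: a range outside the line is an error.
            if (first < 1 || last < first || last > fields.length) {
                throw new IllegalArgumentException("bad range for this line: " + range);
            }
            out.addAll(Arrays.asList(fields).subList(first - 1, last));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Keep the marker column and the three probabilities of the first sample.
        System.out.println(cut("m1 A C 0.9 0.1 0.0", "1:1", "4:6")); // prints "m1 0.9 0.1 0.0"
    }
}
```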
Download cut.jar.
The filtercolumns.jar utility filters columns of input data with white-space delimited fields according to the value on a specified line.
The following usage instructions can be obtained by entering "java -jar filtercolumns.jar" at the command line prompt:
usage:
  cat [input file] | java -jar filtercolumns.jar [line] [min] [max] > [output file]
or
  cat [input file] | java -jar filtercolumns.jar [line] [word file] > [output file]
where
  [input file] = file with space-delimited fields.
  [line] = signed line number (first line is 1 or -1).
           [line] > 0 if criteria is for INCLUSION.
           [line] < 0 if criteria is for EXCLUSION.
  [min] = min field value (a number).
  [max] = max field value (a number).
  [word file] = text file with one word per line.
  [output file] = file with columns that pass filter.
Lines with white-space delimited fields are read from standard input.
If [line] > 0, columns whose value on the specified [line] is between
[min] and [max] inclusive or is equal to a word in [word file] are written
to standard output. If [line] < 0, columns whose value on the specified
[line] is NOT between [min] and [max] inclusive or is NOT equal to a word in
[word file] are written to standard output. An error is thrown if any two
lines have a differing number of white-space delimited fields.
Download filtercolumns.jar.
The filterlines.jar utility filters lines of input data with white-space delimited fields according to the value of a specified field.
The following usage instructions can be obtained by entering "java -jar filterlines.jar" at the command line prompt:
usage:
  cat [input file] | java -jar filterlines.jar [field] [min] [max] > [output file]
or
  cat [input file] | java -jar filterlines.jar [field] [word file] > [output file]
where
  [input file] = file with space-delimited fields.
  [field] = signed field number (first column is 1 or -1).
            [field] > 0 if criteria is for INCLUSION.
            [field] < 0 if criteria is for EXCLUSION.
  [min] = min field value (a number).
  [max] = max field value (a number).
  [word file] = text file with one word per line.
  [output file] = file with lines that pass filter.
Lines are read from standard input. If [field] > 0, lines for which the
specified field is between [min] and [max] inclusive or is equal to a word in
[word file] are written to standard output. If [field] < 0, lines for which the
specified field is NOT between [min] and [max] inclusive or is NOT equal to a
word in [word file] are written to standard output. Lines that have fewer
fields than the specified number of fields are not printed.
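The signed-field convention above (positive for inclusion, negative for exclusion) can be sketched as a pair of predicates on one line. The names are hypothetical and this is not the filterlines.jar source. Letting a non-numeric field raise NumberFormatException in the range form is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; not the filterlines.jar source. */
final class FilterLinesSketch {

    /** Range form: true if the line should be written to the output. */
    static boolean passesRange(String line, int signedField, double min, double max) {
        String value = field(line, signedField);
        if (value == null) {
            return false;                  // too few fields: line is not printed
        }
        // Assumption: a non-numeric field throws NumberFormatException.
        double x = Double.parseDouble(value);
        boolean inRange = x >= min && x <= max;
        return signedField > 0 ? inRange : !inRange;
    }

    /** Word-file form: true if the line should be written to the output. */
    static boolean passesWords(String line, int signedField, Set<String> words) {
        String value = field(line, signedField);
        if (value == null) {
            return false;
        }
        boolean found = words.contains(value);
        return signedField > 0 ? found : !found;
    }

    /** Returns the field at |signedField| (1-based), or null if the line is too short. */
    private static String field(String line, int signedField) {
        if (signedField == 0) {
            throw new IllegalArgumentException("field number must be non-zero");
        }
        String trimmed = line.trim();
        String[] fields = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int index = Math.abs(signedField) - 1;
        return index < fields.length ? fields[index] : null;
    }

    public static void main(String[] args) {
        Set<String> exclude = new HashSet<>(Arrays.asList("rs2"));
        System.out.println(passesRange("rs1 0.95", 2, 0.9, 1.0));  // prints true
        System.out.println(passesRange("rs1 0.95", -2, 0.9, 1.0)); // prints false
        System.out.println(passesWords("rs2 0.10", -1, exclude));  // prints false
    }
}
```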
The paste.jar utility pastes together files that have shared initial columns followed by data columns.
The following usage instructions can be obtained by entering "java -jar paste.jar" at the command line prompt:
usage: java -jar paste.jar [shared columns] [file 1] [file 2] ... > [output file]
  [shared columns] = number of shared initial columns that are identical in all input files.
  [file #] = an input file.
  [output file] = space-delimited file with the shared initial columns followed
                  by the non-initial columns of [file 1], [file 2], ....
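The pasting rule above can be sketched with each input file held in memory as a list of lines. The names are hypothetical and this is not the paste.jar source. Treating a mismatch in line counts or in the shared columns as an error is an assumption, since the page says only that the shared columns are identical in all input files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the paste.jar source. */
final class PasteSketch {

    /** Joins files row by row, keeping the shared initial columns once. */
    static List<String> paste(int shared, List<List<String>> files) {
        List<String> out = new ArrayList<>();
        int nLines = files.get(0).size();
        for (List<String> file : files) {
            if (file.size() != nLines) {   // assumption: all files have the same length
                throw new IllegalArgumentException("files differ in number of lines");
            }
        }
        for (int row = 0; row < nLines; row++) {
            String[] first = files.get(0).get(row).trim().split("\\s+");
            if (first.length < shared) {
                throw new IllegalArgumentException("too few columns on line " + (row + 1));
            }
            List<String> merged = new ArrayList<>(Arrays.asList(first));
            for (int f = 1; f < files.size(); f++) {
                String[] fields = files.get(f).get(row).trim().split("\\s+");
                if (fields.length < shared) {
                    throw new IllegalArgumentException("too few columns on line " + (row + 1));
                }
                for (int c = 0; c < shared; c++) {
                    if (!fields[c].equals(first[c])) {   // assumption: mismatch is an error
                        throw new IllegalArgumentException("shared columns differ on line " + (row + 1));
                    }
                }
                // Append only the non-initial columns of this file.
                merged.addAll(Arrays.asList(fields).subList(shared, fields.length));
            }
            out.add(String.join(" ", merged));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("m1 A 1 2", "m2 B 3 4"),
                Arrays.asList("m1 A 5", "m2 B 6"));
        for (String line : paste(2, files)) {
            System.out.println(line);      // prints "m1 A 1 2 5" then "m2 B 3 4 6"
        }
    }
}
```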
Download paste.jar.
The transpose.jar utility transposes the rows and columns of a rectangular array of white-space delimited data.
The following usage instructions can be obtained by entering "java -jar transpose.jar help" at the command line prompt:
usage: cat [input file] | java -jar transpose.jar > [output file]
where
  [input file] = file with rectangular array of white-space delimited data.
  [output file] = output file with space-delimited (" ") transposed data.
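The transposition described above can be sketched on a list of input lines. The names are hypothetical and this is not the transpose.jar source. Rejecting non-rectangular input with an exception is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the transpose.jar source. */
final class TransposeSketch {

    /** Turns rows into columns; output lines are joined with single spaces. */
    static List<String> transpose(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.trim().split("\\s+"));
        }
        List<String> out = new ArrayList<>();
        if (rows.isEmpty()) {
            return out;
        }
        int nCols = rows.get(0).length;
        for (String[] row : rows) {
            if (row.length != nCols) {     // assumption: ragged input is an error
                throw new IllegalArgumentException("input is not rectangular");
            }
        }
        for (int c = 0; c < nCols; c++) {
            StringBuilder sb = new StringBuilder();
            for (int r = 0; r < rows.size(); r++) {
                if (r > 0) {
                    sb.append(' ');
                }
                sb.append(rows.get(r)[c]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : transpose(Arrays.asList("a b c", "d e f"))) {
            System.out.println(line);      // prints "a d", "b e", "c f"
        }
    }
}
```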
Download transpose.jar.
The updategprobs.jar utility updates a BEAGLE genotype probabilities file with data from another BEAGLE genotype probabilities file.
The following usage instructions can be obtained by entering "java -jar updategprobs.jar" at the command line prompt:
usage: cat [in file] | java -jar updategprobs.jar [markers] [replace] > [out file]
where
  [in file] = input BEAGLE genotype probabilities file.
  [markers] = a white-space delimited file whose first column contains all
              marker identifiers in [in file] and [replace] in chromosomal order.
  [replace] = BEAGLE genotype probabilities file with replacement or additional
              genotype probabilities. The sample identifier lines in [in file]
              and [replace] must be identical. If a marker is present in both
              [in file] and [replace], the marker's A-allele and B-allele
              identifiers must be the same in both files.
  [out file] = output BEAGLE genotype probabilities file containing all the
               markers in [in file] and [replace]. If a marker is present in
               both [in file] and [replace] then the data in [in file] will be
               replaced with the data in [replace].
Download updategprobs.jar.
The BEAGLE utilities are open source utilities that are licensed under the Apache License, Version 2.0. You may not use the BEAGLE utilities except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
The BEAGLE utilities are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Download source code for the BEAGLE Utilities: beagle_utilities_11Sep10.src.zip.