Documentation and program files for FLOSS version 1.4.1
Program: FLOSS (flexible ordered subset analysis)
Version: 1.4.1
Author: Brian L. Browning
Email: browning@uw.edu
All rights reserved
License
You have permission to use and develop the FLOSS and COV programs ("the
Program"), provided that the following conditions are met:
- You may copy and distribute verbatim copies of the Program's source
code as you receive it, in any medium, provided that you conspicuously
and appropriately publish on each copy an appropriate copyright notice
and disclaimer of warranty; keep intact all the notices that refer to
this License and to the absence of any warranty; and give any other
recipients of the Program a copy of this License along with the Program.
- If the FLOSS or COV software is used for analyses which will be
reported or published, you specify the version of the software used, and
cite the article noted in the citation section below.
- You acknowledge that Brian Browning, GlaxoSmithKline ("GSK") and
the GSK developers may develop
modifications to the software that may be substantially similar to your
modifications of the software and that Brian Browning, GSK and GSK developers
shall not be constrained in any way by you in Brian Browning's, GSK's and
GSK developers' use or management of such modifications. You acknowledge
the right of Brian Browning, GSK and GSK
developers to prepare and publish modifications to the software that may
be substantially similar or functionally equivalent to your modifications
and improvements, and if you obtain patent protection for any modification
or improvement to the software, you agree not to allege or enjoin
infringement of your patent by Brian Browning, GSK or GSK developers.
- This software is provided "AS IS" and any express or implied
warranties, including, but not limited to, the implied warranties of
merchantability and fitness for a particular purpose are disclaimed.
In no event shall Brian Browning or GlaxoSmithKline be liable for any
direct, indirect, incidental, special, exemplary, or consequential damages
(including, but not limited to, procurement of substitute goods or
services; loss of use, data, or profits; or business interruption) however
caused and on any theory of liability, whether in contract, strict
liability, or tort (including negligence or otherwise) arising in any way
out of the use of this software, even if advised of the possibility of
such damage.
Contents
- Introduction
- Citation
- Creating FLOSS input files
- Running the FLOSS program
- FLOSS output files
- Download FLOSS
- Frequently Asked Questions
- References
Introduction
The FLOSS software package uses input and output files from the
MERLIN linkage analysis package (Abecasis et al., 2002) to perform
an ordered subset analysis using either nonparametric linkage analysis
z-scores or linear allele sharing model LOD scores. The FLOSS program is
written in Java and requires a Java 1.4 (or later) interpreter.
Citation
If you use FLOSS in your published work, please cite
Browning, BL (2006) FLOSS: Flexible ordered subsets analysis for
linkage analysis of complex traits. Bioinformatics 22(4):512-3.
If you are publishing results of an ordered subset analysis, the following
Suggested Reporting Guidelines may be helpful.
Creating FLOSS input files
The FLOSS program requires two input files: a linkage score file and a
covariate file. The linkage score file is a MERLIN ".lod" output file
created using MERLIN with the --perFamily option. The covariate file can
be created using the MERLIN pedigree (.ped) and data (.dat) input files.
All covariates must be identified with a 'C' in the MERLIN data file, and
must be numeric (not categorical) data.
To create the covariate file from the MERLIN pedigree and data files, enter
the command:
java -jar cov.jar [options]
where [options] are combinations of the following flags and arguments:
- -d [.dat file]
- name of MERLIN data file. Required.
- -p [.ped file]
- name of MERLIN pedigree file. Required.
- -o [output file]
- name of output file. It is suggested that the output covariate
filename end in ".cov". Required.
- -f [filter]
- subject filter. Optional: defaults to "-f all".
- -n [number]
- minimum number of subjects required to define a family covariate
value. The argument must be an integer. Optional: defaults to "-n 2".
- -s [statistic]
- statistic used. Optional: defaults to "-s avg".
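As an illustration of how the flags above combine, the following Python sketch assembles one COV command line. It is not part of the FLOSS distribution: the helper name cov_command and the file names study.dat, study.ped, and study.cov are hypothetical, and the sketch only builds the argument list rather than running the program.

```python
import shlex


def cov_command(dat_file, ped_file, out_file,
                subject_filter="all", min_subjects=2, statistic="avg"):
    """Build the argument list for one run of cov.jar (hypothetical helper)."""
    if subject_filter not in ("all", "aff", "FDU"):
        raise ValueError("subject filter must be all, aff, or FDU")
    if statistic not in ("min", "max", "avg"):
        raise ValueError("statistic must be min, max, or avg")
    return [
        "java", "-jar", "cov.jar",
        "-d", dat_file,                  # MERLIN data file (required)
        "-p", ped_file,                  # MERLIN pedigree file (required)
        "-o", out_file,                  # output covariate file (required)
        "-f", subject_filter,            # optional, documented default "all"
        "-n", str(int(min_subjects)),    # optional, documented default 2
        "-s", statistic,                 # optional, documented default "avg"
    ]


# Hypothetical file names, used only for illustration.
args = cov_command("study.dat", "study.ped", "study.cov", subject_filter="aff")
print(shlex.join(args))
# java -jar cov.jar -d study.dat -p study.ped -o study.cov -f aff -n 2 -s avg
```

The resulting list could be passed to subprocess.run(args, check=True) on a machine where Java and cov.jar are available.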
A short suffix is appended to the name of each covariate that identifies
the subject filter, minimum number of subjects, and statistic used to
define the family covariate score. For example, if you defined the family
covariate score using the flags "-f all -n 2 -s avg" for the
"age_of_onset" covariate, then the covariate name in the
covariate file would be "age_of_onset.avg2all".
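The naming rule can be sketched in a few lines of Python. The function name covariate_label is hypothetical, and the ordering of the suffix parts (statistic, then minimum number, then filter) is inferred from the single "age_of_onset.avg2all" example above, so treat it as an assumption for other flag combinations.

```python
def covariate_label(name, subject_filter="all", min_subjects=2, statistic="avg"):
    """Return the covariate name with the suffix described above.

    Assumption: the suffix is statistic + minimum number + filter,
    as in the documented example "age_of_onset.avg2all".
    """
    return f"{name}.{statistic}{min_subjects}{subject_filter}"


print(covariate_label("age_of_onset"))  # age_of_onset.avg2all
```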
Subject filter: -f
The subject filter specifies the subset of family members that will be
used to calculate the family covariate value. Three filter options are
available: all, aff, and FDU.
- -f all
- use all family members who have a covariate value.
- -f aff
- use all affected family members who have a covariate value.
- -f FDU
- First Degree Unaffected: use all family members who are not
affected (i.e., whose affection status is either unaffected or
unknown) and who have a first degree relative (parent, offspring,
or full sibling) who is affected. This filter is useful if you
are concerned that the covariate values of affected members will
be influenced by treatment for the affection.
Note: if there are multiple affection status variables specified in the
MERLIN data file, only the first affection status variable is used to
determine the subject's affection status.
Minimum number of subjects: -n
The minimum number of subjects required in order to define a family
covariate value. First the subject filter specified with the "-f" option
is applied. If there are fewer than the specified number of subjects after
the subject filter is applied, the family is assigned an unknown covariate
value ("NaN").
Statistic: -s
The statistic used to create the covariate value. Three statistics are
available: min, max, and avg.
- -s min
- minimum covariate value for the family members specified with the
subject filter argument.
- -s max
- maximum covariate value for the family members specified with the
subject filter argument.
- -s avg
- mean covariate value for the family members specified with the
subject filter argument.
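The way the -f, -n, and -s options combine into a single family covariate value can be sketched as follows. This is an illustration of the rules described above, not the COV source code. The function name and the member record layout (keys value, affected, and affected_first_degree) are assumptions made for the example.

```python
import math


def family_covariate(members, subject_filter="all", min_subjects=2, statistic="avg"):
    """Compute one family covariate value from per-member records.

    Each member is a dict with hypothetical keys:
      "value"                 covariate value, or None if missing
      "affected"              True if affected, False if unaffected or unknown
      "affected_first_degree" True if a parent, offspring, or full sibling is affected
    """
    if subject_filter not in ("all", "aff", "FDU"):
        raise ValueError("subject filter must be all, aff, or FDU")
    if statistic not in ("min", "max", "avg"):
        raise ValueError("statistic must be min, max, or avg")

    def keep(member):
        if member["value"] is None:
            return False                      # only members with a covariate value count
        if subject_filter == "aff":
            return member["affected"]
        if subject_filter == "FDU":
            return (not member["affected"]) and member["affected_first_degree"]
        return True                           # "all"

    # Step 1: apply the subject filter.
    values = [m["value"] for m in members if keep(m)]
    # Step 2: too few subjects after filtering gives an unknown value.
    if len(values) < min_subjects:
        return math.nan
    # Step 3: summarise the remaining values.
    if statistic == "min":
        return min(values)
    if statistic == "max":
        return max(values)
    return sum(values) / len(values)


family = [
    {"value": 40.0, "affected": True, "affected_first_degree": True},
    {"value": 50.0, "affected": True, "affected_first_degree": True},
    {"value": 60.0, "affected": False, "affected_first_degree": True},
    {"value": None, "affected": False, "affected_first_degree": True},
]
print(family_covariate(family))                        # 50.0
print(family_covariate(family, subject_filter="FDU"))  # nan (only one eligible member)
```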
The covariate file is a white-space delimited matrix of entries.
The first column contains FamilyID followed by the family identifiers.
The first row contains FamilyID followed by the covariate identifiers.
All other entries of the matrix give the family covariate scores for
the family (determined by the row) and the covariate (determined by the
column).
When creating the covariate file using the COV program, an additional
column __asm__ is added. This column gives the maximum value for the
allele sharing parameter for each family when using linear allele
sharing model LOD scores. The __asm__ column is not used when using
nonparametric linkage scores.
Missing Covariate Data
A NaN entry in the covariate file means there were not enough pedigree
members with covariate data to assign a family covariate score.
If the number of individuals with covariate data in the subset of family
members specified by the filter (-f) parameter is less than the minimum
number specified by the -n parameter, then the family covariate score is
reported as missing (i.e., NaN). An ordered subset analysis for a
particular covariate uses only the families with non-missing covariate
scores for that covariate.
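A minimal Python sketch of reading a covariate file laid out as described above, keeping only the non-missing scores for each covariate. The function name read_covariates and the sample family identifiers and values are hypothetical. An __asm__ column, if present, would be read like any other column.

```python
import math


def read_covariates(lines):
    """Parse a white-space delimited covariate matrix.

    Returns {covariate name: {family id: score}}, omitting NaN entries,
    so each covariate maps only to families with non-missing scores.
    """
    rows = [line.split() for line in lines if line.strip()]
    names = rows[0][1:]                       # first row: FamilyID + covariate names
    table = {name: {} for name in names}
    for row in rows[1:]:
        family = row[0]                       # first column: family identifier
        for name, entry in zip(names, row[1:]):
            value = float(entry)              # float("NaN") parses to nan
            if not math.isnan(value):
                table[name][family] = value
    return table


# Hypothetical file contents, used only for illustration.
sample = [
    "FamilyID age_of_onset.avg2all bmi.avg2all",
    "101 42.5 NaN",
    "102 38.0 27.1",
]
table = read_covariates(sample)
print(sorted(table["bmi.avg2all"]))  # ['102']
```

For a file on disk, the same function can be called as read_covariates(open(path)), since iterating over a file yields its lines.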
Choice of Covariates
Ordered subset analysis is well suited to covariates which can
discriminate between the families, but usually is not recommended
when the number of ranks due to the covariate ordering is small or when a
large number of families share the same family covariate score.
For example, one usually would not define a covariate based on the
number of females or males in the family or the number of
affected family members.
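One informal way to check a candidate covariate against this advice is to count how many distinct scores it produces and how large the biggest group of tied families is. The sketch below does only that. The function name is hypothetical, and no cut-off is implied, since this documentation does not define one.

```python
from collections import Counter


def rank_summary(scores):
    """Summarise ties in a {family id: covariate score} mapping."""
    counts = Counter(scores.values())
    return {
        "families": len(scores),
        "distinct_scores": len(counts),
        "largest_tie": max(counts.values(), default=0),
    }


# Hypothetical example: number of affected members per family.
affected_counts = {"101": 2, "102": 2, "103": 3, "104": 2}
print(rank_summary(affected_counts))
# {'families': 4, 'distinct_scores': 2, 'largest_tie': 3}
```

Here four families share only two distinct scores, which is the situation the paragraph above warns about.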
Running the FLOSS program
The ordered subset analysis is run using the "floss.jar" program. Enter
java -jar floss.jar [options]
where [options] are combinations of the following flags and arguments:
- -c [.cov file]
- name of covariate file.
Multiple covariate files can be analyzed in a single run by
including a separate "-c" flag before each covariate filename.
Required (see Creating FLOSS input files).
- -merlin [.lod file]
- name of MERLIN ".lod" file. Created by MERLIN when using
the --perFamily option. Required.
- -o [output prefix]
- prefix of output files. Required.
- -seed [integer]
- seed for random number generator. Optional: defaults to "-seed 0".
- -subsets [type]
- Type of ordered subsets used.
type must be "extreme" or "slice".
Optional: defaults to "-subsets extreme".
- -asm [interval_type]
- Type of allele sharing parameter interval used.
interval_type must be "unequal" or "equal".
Optional: defaults to "-asm unequal".
- -minperm [integer]
- minimum number of permutations for the permutation test.
Optional: defaults to "-minperm 100".
- -maxperm [integer]
- maximum number of permutations for the
permutation test. Optional: defaults to "-maxperm 10000".
- --npl
- Compute a nonparametric linkage (NPL) Z-score statistic for each
subset of families. Note that two hyphens "--" are required.
Optional: linear allele sharing model LOD scores are used
if the "--npl" option is absent.
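As an illustration of a complete invocation, the following Python sketch assembles a FLOSS command line from the options listed above, including one "-c" flag per covariate file. The helper name floss_command and the file names are hypothetical. The defaults mirror the documented defaults, and the sketch builds the argument list without running the program.

```python
import shlex


def floss_command(cov_files, lod_file, out_prefix, seed=0, subsets="extreme",
                  asm="unequal", minperm=100, maxperm=10000, npl=False):
    """Build the argument list for one run of floss.jar (hypothetical helper)."""
    if subsets not in ("extreme", "slice"):
        raise ValueError("subsets must be extreme or slice")
    if asm not in ("unequal", "equal"):
        raise ValueError("asm must be unequal or equal")
    args = ["java", "-jar", "floss.jar"]
    for cov_file in cov_files:
        args += ["-c", cov_file]              # a separate -c flag for each covariate file
    args += [
        "-merlin", lod_file,
        "-o", out_prefix,
        "-seed", str(int(seed)),
        "-subsets", subsets,
        "-asm", asm,
        "-minperm", str(int(minperm)),
        "-maxperm", str(int(maxperm)),
    ]
    if npl:
        args.append("--npl")                  # two hyphens, as noted above
    return args


# Hypothetical file names, used only for illustration.
args = floss_command(["onset.cov", "bmi.cov"], "study.lod", "results", npl=True)
print(shlex.join(args))
```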
Type of ordered subsets: -subsets
- -subsets extreme
- Rank the families in order of increasing family covariate score
and perform linkage on all subsets of families with the k smallest
or k largest covariate scores. The "extreme" option is the recommended
option and the default option.
- -subsets slice
- Rank the families in order of increasing family covariate score
and perform linkage on all subsets of families with consecutive
covariate scores. For example, if there are N families, linkage
analysis is performed using families i through j for 1 ≤ i ≤ j ≤ N.
This option is discouraged since the increased number of subsets makes
it more difficult to detect disease loci associated with unusually low
or high covariate values and requires substantially more computing time.
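The difference in the number of subsets between the two options can be seen by enumerating them for a short ranked list. The sketch below is a simplification: it assumes no two families share a covariate score and that every subset size from 1 to N is considered, neither of which is confirmed here for the FLOSS program itself. The function names are hypothetical.

```python
def extreme_subsets(ordered):
    """Subsets made of the k lowest-ranked or the k highest-ranked families."""
    n = len(ordered)
    lowest = [ordered[:k] for k in range(1, n + 1)]
    # k = n would repeat the full set already produced above, so stop at n - 1.
    highest = [ordered[n - k:] for k in range(1, n)]
    return lowest + highest


def slice_subsets(ordered):
    """Subsets made of families i through j for every 1 <= i <= j <= N."""
    n = len(ordered)
    return [ordered[i:j + 1] for i in range(n) for j in range(i, n)]


# Families already ranked by increasing covariate score (hypothetical ids).
ranked = ["A", "B", "C", "D"]
print(len(extreme_subsets(ranked)))  # 7, i.e. 2N - 1
print(len(slice_subsets(ranked)))    # 10, i.e. N(N + 1)/2
```

Under these assumptions the "extreme" count grows linearly in N while the "slice" count grows quadratically, which is consistent with the longer computing time noted above.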
Type of allele sharing parameter interval: -asm
- -asm unequal
- The allele sharing parameter interval for an ordered subset is
the intersection of the allele sharing parameter intervals for
each family in the ordered subset. The "unequal" option is the
default option.
- -asm equal
- The allele sharing parameter interval for an ordered subset
is the intersection of the allele sharing parameter intervals for all
families. The parameter interval will be the same for all ordered
subsets.
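The two interval rules can be contrasted with a small sketch. It assumes that every family's interval runs from a common lower bound up to a family-specific maximum (such as the value in the __asm__ column of the covariate file), so that an intersection of intervals is determined by the smallest maximum. That assumption, the function name, and the sample values are illustrative only.

```python
def asm_upper_bound(asm_max, subset, mode="unequal"):
    """Upper end of the allele sharing parameter interval for an ordered subset.

    asm_max: {family id: maximum allele sharing parameter for that family}
    subset:  family ids in the ordered subset
    Assumption: all intervals share the same lower bound, so intersecting
    them amounts to taking the smallest maximum.
    """
    if mode == "unequal":
        families = subset                 # intersect over the subset only
    elif mode == "equal":
        families = asm_max.keys()         # intersect over all families
    else:
        raise ValueError("mode must be unequal or equal")
    return min(asm_max[family] for family in families)


# Hypothetical per-family maxima.
asm_max = {"101": 1.0, "102": 0.5, "103": 2.0}
print(asm_upper_bound(asm_max, ["101", "103"], "unequal"))  # 1.0
print(asm_upper_bound(asm_max, ["101", "103"], "equal"))    # 0.5
```

With "equal", the result does not depend on which subset is passed in, matching the statement that the interval is the same for all ordered subsets.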
FLOSS output files
Ordered subset analysis produces four output files. The output filenames
have the format prefix.extension, where the prefix is the filename prefix
specified with the "-o" flag when running FLOSS and the extension is
".out", ".fam", ".plt", or ".log".
Summary file (.out)
The summary file (.out) records the analysis options and gives summary
information for each covariate analyzed. The file reports the change in
linkage score between the entire set of families and the ordered subset
with the highest linkage score, the maximum linkage score for this
ordered subset, the optimal interval of family covariate scores, and the
Monte Carlo p-value with a 95% confidence interval. The summary file is
self-documenting: documentation is included at the end of the ".out" file.
Families file (.fam)
The ".fam" file gives the families ordered by the covariate values. The
".fam" file is arranged in sections corresponding to each covariate listed
in the Covariate file. The sections are separated by a blank line. Each
section contains three columns, and the first row in each section contains
labels for the columns. The first column is labeled "Family" and contains
the identifiers for families with defined covariate values in order of
increasing covariate value. The second column is labeled "Subset" and
contains "x" if the family in the first column is included in the ordered
subset with the highest linkage score (when maximized over all ordered
subsets and all loci). The third column is labeled with the covariate name and
gives the covariate value for the families in the first column.
Plotting file (.plt)
The ".plt" file contains linkage scores for the complete set of families
and for the optimal ordered subset of families for each covariate at the
loci in the MERLIN .lod file. The plotting file has a simple format that
is easily read and plotted using a spreadsheet (e.g., Excel) or a
statistical software package (e.g., R).
The first column is labeled "Position" and contains the position of all loci
used in the ordered subset analysis. All data in each row is computed at
the position specified in the first column. The second column is labeled
"Orig_Score" and lists the linkage scores at the position specified in the
first column obtained using all families. After the first two columns, the
columns correspond to the covariates in the ordered subset analysis and are
labeled by the covariate names. Each covariate column gives the linkage
score at the position specified in the first column for the ordered subset
that maximizes the linkage score for that covariate.
The Download section of this documentation includes an
R script for plotting the linkage curves.
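The column layout described above can also be read with a short Python sketch that finds the position of the highest score in any column. It assumes the ".plt" file is white-space delimited with a single header row, which is not stated explicitly here. The function names and the sample values are hypothetical.

```python
def read_plt(lines):
    """Parse a plotting file into {column label: list of floats}.

    Assumption: white-space delimited columns with one header row
    ("Position", "Orig_Score", then one column per covariate).
    """
    rows = [line.split() for line in lines if line.strip()]
    header = rows[0]
    columns = {label: [] for label in header}
    for row in rows[1:]:
        for label, entry in zip(header, row):
            columns[label].append(float(entry))
    return columns


def peak(columns, label):
    """Return (position, score) at the highest score in the given column."""
    scores = columns[label]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return columns["Position"][best], scores[best]


# Hypothetical file contents, used only for illustration.
sample = [
    "Position Orig_Score age_of_onset.avg2all",
    "0.0 0.8 1.1",
    "5.0 1.2 2.9",
    "10.0 1.0 2.0",
]
columns = read_plt(sample)
print(peak(columns, "Orig_Score"))            # (5.0, 1.2)
print(peak(columns, "age_of_onset.avg2all"))  # (5.0, 2.9)
```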
Log file (.log)
The ".log" file gives details for all ordered subsets considered in the
ordered subset analysis. The ".log" file is arranged in sections
corresponding to each covariate listed in the Covariate file.
The first line in a section contains the name of the covariate. The next
line begins "Ordered Families:", and the following line or lines list the
identifiers for the families used in the ordered subset analysis. The
families are listed in order of increasing covariate values.
An = between two family identifiers means the two families have
the same covariate score.
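A small Python sketch of grouping tied families from such a listing. The exact spacing of the listing is not specified here, so the sketch accepts "=" with or without surrounding spaces and assumes family identifiers never contain "=". The function name and identifiers are hypothetical.

```python
def tie_groups(text):
    """Split an ordered family listing into groups of tied families.

    Families joined by "=" share a covariate score and end up in one group.
    Groups are returned in the order listed (increasing covariate value).
    """
    tokens = text.replace("=", " = ").split()
    groups = []
    join_next = False
    for token in tokens:
        if token == "=":
            join_next = True                  # next family ties with the previous one
        elif join_next and groups:
            groups[-1].append(token)
            join_next = False
        else:
            groups.append([token])
            join_next = False
    return groups


print(tie_groups("101 102 = 103 104"))
# [['101'], ['102', '103'], ['104']]
```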
Following the ordered family identifiers are the results from each ordered
subset considered in the ordered subset analysis. The results are
presented in eight columns. Each line corresponds to a distinct ordered
subset and has the following entries (in order from left to right):
- First Fam gives the family identifier with the
smallest covariate value in the ordered subset.
- Last Fam gives the family identifier with the
largest covariate value in the ordered subset.
- Num Fams gives the number of families in the
ordered subset.
- Peak gives the locus where the highest
linkage score was observed for the ordered subset of families
specified by the first two entries (First Fam and Last Fam).
- Subset Score gives the highest linkage score
observed for the ordered subset of families specified
by the first two entries (First Fam and Last Fam).
- Orig Score gives the linkage score for the
position specified in the fourth entry (Peak) for the set
of all families.
- Subset Params gives the parameter value
that yielded the maximum linkage score for the ordered
subset of families specified by the first two entries
(First Fam and Last Fam). This entry is blank when using
nonparametric linkage analysis z-scores with the --npl option.
- Orig Params gives the parameter value
that yielded the maximum linkage score for the position
specified in the fourth entry (Peak) for the set
of all families. This entry is blank when using
nonparametric linkage analysis z-scores with the --npl option.
Download FLOSS
The executable files for COV and FLOSS can be run using a
Java 1.4 (or later) interpreter with the "-jar" flag. See
Creating FLOSS input files and
Running the FLOSS program for details.
The following files are available for viewing or download:
Frequently Asked Questions
The list of Frequently Asked Questions answers
common questions about FLOSS and gives tips for using FLOSS.
References
- Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) MERLIN--rapid analysis
of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.
- Browning BL (2006) FLOSS: Flexible ordered subsets analysis for linkage analysis
of complex traits. Bioinformatics 22(4):512-3.
- Hauser ER, Watanabe RM, Duren WL, Bass MP, Langefeld CD, Boehnke M (2004)
Ordered subset analysis in genetic linkage mapping of complex traits. Genet
Epidemiol 27:53-63.
- Kong A, Cox NJ (1997) Allele-sharing models: LOD scores and accurate
linkage tests. Am J Hum Genet 61:1179-1188.
- Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and
nonparametric linkage analysis: a unified multipoint approach. Am J Hum
Genet 58:1347-1363.