Documentation and program files for FLOSS version 1.4.1

Program: FLOSS (flexible ordered subset analysis)
Version: 1.4.1
Author: Brian L. Browning
Email: browning@uw.edu

All rights reserved

License

You have permission to use and develop the FLOSS and COV programs ("the Program"), provided that the following conditions are met:

  1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.
  2. If the FLOSS or COV software is used for analyses which will be reported or published, you specify the version of the software used, and cite the article noted in the citation section below.
  3. You acknowledge that Brian Browning, GlaxoSmithKline ("GSK") and the GSK developers may develop modifications to the software that may be substantially similar to your modifications of the software and that Brian Browning, GSK and GSK developers shall not be constrained in any way by you in Brian Browning's, GSK's and GSK developers' use or management of such modifications. You acknowledge the right of Brian Browning, GSK and GSK developers to prepare and publish modifications to the software that may be substantially similar or functionally equivalent to your modifications and improvements, and if you obtain patent protection for any modification or improvement to the software, you agree not to allege or enjoin infringement of your patent by Brian Browning, GSK or GSK developers.
  4. This software is provided ``AS IS'' and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall Brian Browning or GlaxoSmithKline be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Contents

  1. Introduction
  2. Citation
  3. Creating FLOSS input files
  4. Running the FLOSS program
  5. FLOSS output files
  6. Download FLOSS
  7. Frequently Asked Questions
  8. References

Introduction

The FLOSS software package uses input and output files from the MERLIN linkage analysis package (Abecasis et al., 2002) to perform an ordered subset analysis using either nonparametric linkage analysis z-scores or linear allele sharing model LOD scores. The FLOSS program is written in Java and requires a Java 1.4 (or later) interpreter.

Citation

If you use FLOSS in your published work, please cite

Browning, BL (2006) FLOSS: Flexible ordered subsets analysis for linkage analysis of complex traits. Bioinformatics 22(4):512-3.

If you are publishing results of an ordered subset analysis, the following Suggested Reporting Guidelines may be helpful.

Creating FLOSS input files

The FLOSS program requires two input files: a linkage score file and a covariate file. The linkage score file is a MERLIN ".lod" output file created using MERLIN with the --perFamily option. The covariate file can be created using the MERLIN pedigree (.ped) and data (.dat) input files. All covariates must be identified with a 'C' in the MERLIN data file, and must be numeric (not categorical) data.

To create the covariate file from the MERLIN pedigree and data files, enter the command:

java -jar cov.jar [options]

where [options] are combinations of the following flags and arguments:

-d [.dat file]
name of MERLIN data file. Required.
-p [.ped file]
name of MERLIN pedigree file. Required.
-o [output file]
name of output file. It is suggested that the output covariate filename end in ".cov". Required.
-f [filter]
subject filter. Optional: defaults to "-f all".
-n [number]
minimum number of subjects required to define a family covariate value. The argument must be an integer. Optional: defaults to "-n 2".
-s [statistic]
statistic used. Optional: defaults to "-s avg".
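
As a usage illustration, the following Python sketch assembles a COV command line from the flags documented in this section. The helper name build_cov_command and the file names (study.dat, study.ped, study.cov) are hypothetical; the sketch only builds the argument list and does not run the program.

```python
def build_cov_command(dat_file, ped_file, out_file,
                      subject_filter="all", min_subjects=2, statistic="avg"):
    """Return the argument list for one COV run (hypothetical helper)."""
    if subject_filter not in ("all", "aff", "FDU"):
        raise ValueError("subject filter must be all, aff, or FDU")
    if statistic not in ("min", "max", "avg"):
        raise ValueError("statistic must be min, max, or avg")
    return ["java", "-jar", "cov.jar",
            "-d", dat_file,
            "-p", ped_file,
            "-o", out_file,
            "-f", subject_filter,
            "-n", str(int(min_subjects)),
            "-s", statistic]


# Hypothetical file names, used only for illustration.
command = build_cov_command("study.dat", "study.ped", "study.cov",
                            subject_filter="aff")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run if cov.jar is in the working directory.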

A short suffix is appended to the name of each covariate that identifies the subject filter, minimum number of subjects, and statistic used to define the family covariate score. For example, if you defined the family covariate score using the flags "-f all -n 2 -s avg" for the "age_of_onset" covariate, then the covariate name in the covariate file would be "age_of_onset.avg2all".

Subject filter: -f

The subject filter specifies the subset of family members that will be used to calculate the family covariate value. Three filter options are available: all, aff, and FDU.

-f all
use all family members who have a covariate value.
-f aff
use all affected family members who have a covariate value.
-f FDU
First Degree Unaffected: use all family members who are not affected (i.e. whose affection status is either unaffected or unknown) and who have a first degree relative (parent, offspring, or full sibling) who is affected. This filter is useful if you are concerned that the covariate values of affected members will be influenced by treatment for the condition.

Note: if there are multiple affection status variables specified in the MERLIN data file, only the first affection status variable is used to determine the subject's affection status.
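
The three filters can be sketched in Python as follows. This is an illustration of the rules described above, not the COV source code: the Subject record, its field names, and the example family are all hypothetical, and unknown affection status is treated the same as unaffected, as the FDU description states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    ident: str
    father: Optional[str]              # None for founders
    mother: Optional[str]              # None for founders
    affected: bool                     # False covers unaffected and unknown
    covariate: Optional[float] = None  # None when the covariate is missing


def first_degree_relatives(subject, family):
    """Parents, offspring, and full siblings of `subject` within `family`."""
    relatives = []
    for other in family:
        if other.ident == subject.ident:
            continue
        is_parent = other.ident in (subject.father, subject.mother)
        is_offspring = subject.ident in (other.father, other.mother)
        is_full_sibling = (subject.father is not None
                           and subject.mother is not None
                           and other.father == subject.father
                           and other.mother == subject.mother)
        if is_parent or is_offspring or is_full_sibling:
            relatives.append(other)
    return relatives


def apply_filter(family, subject_filter):
    """Return the family members selected by the all, aff, or FDU filter."""
    with_value = [s for s in family if s.covariate is not None]
    if subject_filter == "all":
        return with_value
    if subject_filter == "aff":
        return [s for s in with_value if s.affected]
    if subject_filter == "FDU":
        return [s for s in with_value
                if not s.affected
                and any(r.affected for r in first_degree_relatives(s, family))]
    raise ValueError("subject filter must be all, aff, or FDU")


# Hypothetical three-generation family; G has no covariate value.
family = [
    Subject("F", None, None, False, 50.0),
    Subject("M", None, None, True, 60.0),
    Subject("C1", "F", "M", True, 30.0),
    Subject("C2", "F", "M", False, 35.0),
    Subject("S", None, None, False, 40.0),
    Subject("G", "S", "C2", False, None),
]
fdu_ids = [s.ident for s in apply_filter(family, "FDU")]
print(fdu_ids)
```

In this example the spouse S is excluded by the FDU filter because S has no affected first degree relative, and G is excluded by every filter because G has no covariate value.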

Minimum number of subjects: -n

The "-n" option sets the minimum number of subjects required to define a family covariate value. First, the subject filter specified with the "-f" option is applied. If fewer than the specified number of subjects remain after the subject filter is applied, the family is assigned an unknown covariate value ("NaN").

Statistic: -s

The "-s" option selects the statistic used to create the family covariate value. Three statistics are available: min, max, and avg.

-s min
minimum covariate value for the family members specified with the subject filter argument.
-s max
maximum covariate value for the family members specified with the subject filter argument.
-s avg
mean covariate value for the family members specified with the subject filter argument.
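
The way the "-n" and "-s" options combine can be sketched in Python as follows. The function name family_covariate is hypothetical, and the input is assumed to be the list of covariate values that remain after the subject filter has been applied.

```python
import math


def family_covariate(values, min_subjects=2, statistic="avg"):
    """Family covariate score from the values that passed the subject filter.

    Returns NaN when fewer than `min_subjects` values are available.
    """
    if len(values) < max(min_subjects, 1):
        return math.nan
    if statistic == "min":
        return min(values)
    if statistic == "max":
        return max(values)
    if statistic == "avg":
        return sum(values) / len(values)
    raise ValueError("statistic must be min, max, or avg")


print(family_covariate([30.0, 40.0, 50.0]))            # three subjects, default -n 2
print(family_covariate([30.0], min_subjects=2))        # too few subjects
print(family_covariate([30.0, 50.0], statistic="min"))
```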

Covariate File Format

The covariate file is a white-space delimited matrix of entries. The first column contains FamilyID followed by the family identifiers. The first row contains FamilyID followed by the covariate identifiers. All other entries of the matrix give the family covariate scores for the family (determined by the row) and the covariate (determined by the column).

When creating the covariate file using the COV program, an additional column __asm__ is added. This column gives the maximum value for the allele sharing parameter for each family when using linear allele sharing model LOD scores. The __asm__ column is not used when using nonparametric linkage scores.

Missing Covariate Data

A NaN entry in the covariate file means that there were not enough pedigree members with covariate data to assign a family covariate score. If the number of individuals with covariate data in the subset of family members specified by the filter (-f) parameter is less than the minimum number specified by the -n parameter, then the family covariate score is reported as missing (i.e. NaN). An ordered subset analysis for a particular covariate uses only the families with non-missing covariate scores for that covariate.
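
A minimal Python sketch of reading a covariate file and selecting the families with non-missing scores is given below. It assumes only the layout described above (white-space delimited, first row starting with FamilyID, NaN for missing scores). The function names, family identifiers, and all numeric values, including the __asm__ entries, are made up for illustration.

```python
import math


def read_covariate_file(lines):
    """Parse a covariate file into (covariate names, {family: {name: score}})."""
    rows = [line.split() for line in lines if line.strip()]
    header = rows[0]
    if header[0] != "FamilyID":
        raise ValueError("first entry of the first row must be FamilyID")
    names = header[1:]
    table = {}
    for row in rows[1:]:
        # float("NaN") yields a NaN value, so missing scores need no special case.
        table[row[0]] = {name: float(v) for name, v in zip(names, row[1:])}
    return names, table


def families_with_score(table, covariate):
    """Families whose score for `covariate` is not missing."""
    return [fam for fam, scores in table.items()
            if not math.isnan(scores[covariate])]


# Made-up example content.
example = """FamilyID age_of_onset.avg2all __asm__
101 41.5 0.5
102 NaN 0.5
103 37.0 0.25
""".splitlines()

names, table = read_covariate_file(example)
usable = families_with_score(table, "age_of_onset.avg2all")
print(usable)
```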

Choice of Covariates

Ordered subset analysis is well suited to covariates which can discriminate between the families, but usually is not recommended when the number of ranks due to the covariate ordering is small or when a large number of families share the same family covariate score. For example, one usually would not define a covariate based on the number of females or males in the family or the number of affected family members.

Running the FLOSS program

Ordered subset analysis is run using the "floss.jar" program. Enter the command:

java -jar floss.jar [options]

where [options] are combinations of the following flags and arguments:

-c [.cov file]
name of covariate file. Multiple covariate files can be analyzed in a single run by including a separate "-c" flag before each covariate filename. Required (see Creating FLOSS input files).
-merlin [.lod file]
name of MERLIN ".lod" file. Created by MERLIN when using the --perFamily option. Required.
-o [output prefix]
prefix of output files. Required.
-seed [integer]
seed for random number generator. Optional: defaults to "-seed 0".
-subsets [type]
Type of ordered subsets used. type must be "extreme" or "slice". Optional: defaults to "-subsets extreme".
-asm [interval_type]
Type of allele sharing parameter interval used. interval_type must be "unequal" or "equal". Optional: defaults to "-asm unequal".
-minperm [integer]
minimum number of permutations for the permutation test. Optional: defaults to "-minperm 100".
-maxperm [integer]
maximum number of permutations for the permutation test. Optional: defaults to "-maxperm 10000".
--npl
Compute a nonparametric linkage (NPL) Z-score statistic for each subset of families. Note that two hyphens "--" are required. Optional: linear allele sharing model LOD scores are used if the "--npl" option is absent.
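
As a usage illustration, the following Python sketch assembles a FLOSS command line from the flags listed above, including one "-c" flag per covariate file. The helper name build_floss_command and all file names are hypothetical; the sketch only builds the argument list and does not run the program.

```python
def build_floss_command(cov_files, lod_file, out_prefix,
                        npl=False, subsets="extreme", seed=0):
    """Return the argument list for one FLOSS run (hypothetical helper)."""
    if subsets not in ("extreme", "slice"):
        raise ValueError("subsets must be extreme or slice")
    command = ["java", "-jar", "floss.jar"]
    for cov_file in cov_files:
        command += ["-c", cov_file]       # one -c flag before each covariate file
    command += ["-merlin", lod_file,
                "-o", out_prefix,
                "-subsets", subsets,
                "-seed", str(int(seed))]
    if npl:
        command.append("--npl")           # two hyphens are required
    return command


# Hypothetical file names, used only for illustration.
floss_command = build_floss_command(["onset.cov", "bmi.cov"], "study.lod",
                                    "study_osa", npl=True)
print(" ".join(floss_command))
```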

Type of ordered subsets: -subsets

-subsets extreme
Rank the families in order of increasing family covariate score and perform linkage on all subsets of families with the k smallest or k largest covariate scores. The "extreme" option is the recommended option, and the default option.
-subsets slice
Rank the families in order of increasing family covariate score and perform linkage on all subsets of families with consecutive covariate scores. For example, if there are N families, linkage analysis is performed using families i through j for 1 ≤ i ≤ j ≤ N. This option is discouraged since the increased number of subsets makes it more difficult to detect disease loci associated with unusually low or high covariate values and requires substantially more computing time.
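
The difference between the two subset types can be sketched in Python by enumerating the subsets for a small ranked list. This is an illustration only: it assumes the families are already ranked by increasing covariate score with no tied scores, it counts the complete set of families once, and the function names are hypothetical. How FLOSS itself handles ties is not shown here.

```python
def extreme_subsets(ranked):
    """Subsets made of the k smallest or the k largest ranked families."""
    n = len(ranked)
    lowest = [ranked[:k] for k in range(1, n + 1)]
    # k = n is skipped here because it would repeat the complete set.
    highest = [ranked[n - k:] for k in range(1, n)]
    return lowest + highest


def slice_subsets(ranked):
    """Subsets made of families i through j for every 1 <= i <= j <= N."""
    n = len(ranked)
    return [ranked[i:j + 1] for i in range(n) for j in range(i, n)]


ranked_families = ["A", "B", "C", "D"]   # hypothetical identifiers
n_extreme = len(extreme_subsets(ranked_families))
n_slice = len(slice_subsets(ranked_families))
print(n_extreme, n_slice)
```

Under these assumptions, N families give 2N - 1 extreme subsets and N(N + 1)/2 slice subsets, so the number of slice subsets grows quadratically with the number of families.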

Type of allele sharing parameter interval: -asm

-asm unequal
The allele sharing parameter interval for an ordered subset is the intersection of the allele sharing parameter intervals for each family in the ordered subset. The "unequal" option is the default option.
-asm equal
The allele sharing parameter interval for an ordered subset is the intersection of the allele sharing parameter intervals for all families. The parameter interval will be the same for all ordered subsets.
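
The two interval types can be sketched in Python as follows. The sketch assumes that the family intervals share a common lower bound and differ only in their upper bound (the per-family maximum, as stored in the __asm__ column of the covariate file), so that the intersection is determined by the smallest upper bound. That assumption, the function name, and the values are illustrative and are not taken from the FLOSS source code.

```python
def subset_upper_bound(asm_max, subset, interval_type="unequal"):
    """Upper bound of the allele sharing parameter interval for a subset.

    asm_max maps every family identifier to its maximum parameter value.
    """
    if interval_type == "unequal":
        members = subset            # intersect over the families in the subset
    elif interval_type == "equal":
        members = list(asm_max)     # intersect over all families
    else:
        raise ValueError("interval type must be unequal or equal")
    return min(asm_max[fam] for fam in members)


# Made-up per-family maxima.
asm_max = {"101": 0.5, "102": 0.2, "103": 0.4}
print(subset_upper_bound(asm_max, ["101", "103"], "unequal"))
print(subset_upper_bound(asm_max, ["101", "103"], "equal"))
```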

FLOSS output files

Ordered subset analysis produces four output files. The output filenames have the format prefix.extension, where the prefix is the filename prefix specified with the "-o" flag when running FLOSS and the extension is ".out", ".fam", ".plt", or ".log".

Summary file (.out)

The summary file (.out) records the analysis options and gives summary information for each covariate analyzed. The file reports the change in linkage score between the entire set of families and the ordered subset with the highest linkage score, the maximum linkage score for this ordered subset, the optimal interval of family covariate scores, and the Monte Carlo p-value with a 95% confidence interval. The summary file is self-documenting: documentation is included at the end of the ".out" file.

Families file (.fam)

The ".fam" file gives the families ordered by the covariate values. The ".fam" file is arranged in sections corresponding to each covariate listed in the Covariate file. The sections are separated by a blank line. Each section contains three columns, and the first row in each section contains labels for the columns. The first column is labeled "Family" and contains the identifiers for families with defined covariate values in order of increasing covariate value. The second column is labeled "Subset" and contains "x" if the family in the first column is included in the ordered subset with the highest linkage score (when maximized over all ordered subsets and all loci). The third column is labeled with the covariate name and gives the covariate value for the families in the first column.
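
A minimal Python sketch for reading the sections of a ".fam" file is given below. It assumes the columns are white-space delimited and that the "Subset" column is empty for families outside the optimal subset; both points are assumptions, since the exact delimiter and the non-"x" entry are not specified above. The function name and the example content are made up.

```python
import re


def read_fam_sections(text):
    """Return {covariate: [(family, in_optimal_subset, covariate_value), ...]}."""
    sections = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line for line in block.splitlines() if line.strip()]
        covariate = lines[0].split()[-1]     # third column label
        rows = []
        for line in lines[1:]:
            tokens = line.split()
            # Any token between the family and the value is the Subset entry.
            in_subset = "x" in tokens[1:-1]
            rows.append((tokens[0], in_subset, float(tokens[-1])))
        sections[covariate] = rows
    return sections


# Made-up example content with two covariate sections.
fam_text = """Family Subset age_of_onset.avg2all
103 x 37.0
101 x 41.5
104   55.0

Family Subset bmi.avg2all
101   22.1
104 x 27.9
"""

fam_sections = read_fam_sections(fam_text)
optimal = [fam for fam, flag, _ in fam_sections["age_of_onset.avg2all"] if flag]
print(optimal)
```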

Plotting file (.plt)

The ".plt" file contains linkage scores for the complete set of families and for the optimal ordered subset of families for each covariate at the loci in the MERLIN .lod file. The plotting file has a simple format that is easily read and plotted using a spreadsheet (e.g. Excel) or a statistical software package (e.g. R).

The first column is labeled "Position" and contains the position of all loci used in the ordered subset analysis. All data in each row are computed at the position specified in the first column. The second column is labeled "Orig_Score" and lists the linkage scores at the position specified in the first column obtained using all families. After the first two columns, the columns correspond to the covariates in the ordered subset analysis and are labeled by the covariate names. Each covariate column gives the linkage score at the position specified in the first column for the ordered subset that maximizes the linkage score for that covariate.

The Download section of this documentation includes an R script for plotting the linkage curves.
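
For readers working outside R, the following Python sketch reads a ".plt" file into columns and locates the peak of one linkage curve. It assumes the file is white-space delimited with the column labels described above and with numeric entries throughout; the function names, the covariate name, and every number in the example are made up.

```python
def read_plt(lines):
    """Parse a .plt file into {column label: list of floats}."""
    rows = [line.split() for line in lines if line.strip()]
    header = rows[0]
    columns = {name: [] for name in header}
    for row in rows[1:]:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


def peak(columns, name):
    """Return (position, score) at the maximum of the named score column."""
    scores = columns[name]
    best = max(range(len(scores)), key=scores.__getitem__)
    return columns["Position"][best], scores[best]


# Made-up example content.
plt_lines = """Position Orig_Score age_of_onset.avg2all
0.0 0.41 0.90
5.0 1.20 2.75
10.0 0.85 1.60
""".splitlines()

plt_columns = read_plt(plt_lines)
print(peak(plt_columns, "Orig_Score"))
print(peak(plt_columns, "age_of_onset.avg2all"))
```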

Log file (.log)

The ".log" file gives details for all ordered subsets considered in the ordered subset analysis. The ".log" file is arranged in sections corresponding to each covariate listed in the Covariate file.

The first line in a section contains the name of the covariate. The next line begins "Ordered Families:", and the following line or lines list the identifiers for the families used in the ordered subset analysis. The families are listed in order of increasing covariate values. An "=" between two family identifiers means the two families have the same covariate score.

Following the ordered family identifiers are the results from each ordered subset considered in the ordered subset analysis. The results are presented in eight columns. Each line corresponds to a distinct ordered subset and has the following entries (in order from left to right):

  1. First Fam gives the family identifier with the smallest covariate value in the ordered subset.
  2. Last Fam gives the family identifier with the largest covariate value in the ordered subset.
  3. Num Fams gives the number of families in the ordered subset.
  4. Peak gives the locus where the highest linkage score was observed for the ordered subset of families specified by the first two entries (First Fam and Last Fam).
  5. Subset Score gives the highest linkage score observed for the ordered subset of families specified by the first two entries (First Fam and Last Fam).
  6. Orig Score gives the linkage score for the position specified in the fourth entry (Peak) for the set of all families.
  7. Subset Params gives the parameter value that yielded the maximum linkage score for the ordered subset of families specified by the first two entries (First Fam and Last Fam). This entry is blank when using nonparametric linkage analysis z-scores with the --npl option.
  8. Orig Params gives the parameter value that yielded the maximum linkage score for the position specified in the fourth entry (Peak) for the set of all families. This entry is blank when using nonparametric linkage analysis z-scores with the --npl option.

Download FLOSS

The executable files for COV and FLOSS can be run using a Java 1.4 (or later) interpreter with the "-jar" flag. See Creating FLOSS input files and Running the FLOSS program for details. The following files are available for viewing or download:

Frequently Asked Questions

The list of Frequently Asked Questions answers common questions about FLOSS and gives tips for using FLOSS.

References
