Program: FLOSS (flexible ordered subset analysis)

Version: 1.4.1

Author: Brian L. Browning

Email: browning@uw.edu

**All rights reserved**

You have permission to use and develop the FLOSS and COV programs ("the Program"), provided that the following conditions are met:

- You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.
- If the FLOSS or COV software is used for analyses which will be reported or published, you specify the version of the software used, and cite the article noted in the citation section below.
- You acknowledge that Brian Browning, GlaxoSmithKline ("GSK") and the GSK developers may develop modifications to the software that may be substantially similar to your modifications of the software and that Brian Browning, GSK and GSK developers shall not be constrained in any way by you in Brian Browning's, GSK's and GSK developers' use or management of such modifications. You acknowledge the right of Brian Browning, GSK and GSK developers to prepare and publish modifications to the software that may be substantially similar or functionally equivalent to your modifications and improvements, and if you obtain patent protection for any modification or improvement to the software, you agree not to allege or enjoin infringement of your patent by Brian Browning, GSK or GSK developers.
- This software is provided "AS IS" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall Brian Browning or GlaxoSmithKline be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

- Introduction
- Citation
- Creating FLOSS input files
- Running the FLOSS program
- FLOSS output files
- Download FLOSS
- Frequently Asked Questions
- References

The FLOSS software package uses input and output files from the MERLIN linkage analysis package (Abecasis et al., 2002) to perform an ordered subset analysis using either nonparametric linkage analysis z-scores or linear allele sharing model LOD scores. The FLOSS program is written in Java and requires a Java 1.4 interpreter.

If you use FLOSS in your published work, please cite

Browning BL (2006) FLOSS: Flexible ordered subsets analysis for linkage analysis of complex traits. Bioinformatics 22(4):512-3.

If you are publishing results of an ordered subset analysis, the following Suggested Reporting Guidelines may be helpful.

The FLOSS program requires two input files: a linkage score file and a covariate file. The linkage score file is a MERLIN ".lod" output file created using MERLIN with the --perFamily option. The covariate file can be created using the MERLIN pedigree (.ped) and data (.dat) input files. All covariates must be identified with a 'C' in the MERLIN data file, and must be numeric (not categorical) data.

To create the covariate file from the MERLIN pedigree and data files, enter the command:

`java -jar cov.jar [options]`

where [options] are combinations of the following flags and arguments:

- `-d [.dat file]`: name of the MERLIN data file. Required.
- `-p [.ped file]`: name of the MERLIN pedigree file. Required.
- `-o [output file]`: name of the output file. It is suggested that the output covariate filename end in ".cov". Required.
- `-f [filter]`: subject filter. Optional: defaults to "-f all".
- `-n [number]`: minimum number of subjects required to define a family covariate value. The argument must be an integer. Optional: defaults to "-n 2".
- `-s [statistic]`: statistic used. Optional: defaults to "-s avg".

A short suffix is appended to the name of each covariate that identifies the subject filter, minimum number of subjects, and statistic used to define the family covariate score. For example, if you defined the family covariate score using the flags "-f all -n 2 -s avg" for the "age_of_onset" covariate, then the covariate name in the covariate file would be "age_of_onset.avg2all".
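The naming rule can be sketched as a one-line helper. It is inferred from the single example above and assumes the suffix is always the statistic, the minimum number of subjects, and the filter concatenated in that order, with the filter spelled exactly as given to `-f`. `CovariateNames` and `suffixedName` are hypothetical names, not part of COV.

```java
/** Hypothetical helper illustrating the COV covariate-name suffix (inferred from one example). */
final class CovariateNames {
    private CovariateNames() {}

    /**
     * Builds "name.<statistic><minSubjects><filter>", e.g. "age_of_onset.avg2all".
     * Assumes statistic and filter are passed exactly as given to -s and -f.
     */
    static String suffixedName(String covariate, String statistic, int minSubjects, String filter) {
        return covariate + "." + statistic + minSubjects + filter;
    }
}
```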

The subject filter (`-f`) specifies the subset of family members that will be used to calculate the family covariate value. Three filter options are available: all, aff, and FDU.

- `-f all`: use all family members who have a covariate value.
- `-f aff`: use all affected family members who have a covariate value.
- `-f FDU`: First Degree Unaffected: use all family members who are not affected (i.e., whose affection status is either unaffected or unknown) and who have a first degree relative (parent, offspring, or full sibling) who is affected. This filter is useful if you are concerned that the covariate values of affected members will be influenced by treatment for the affection.

Note: if there are multiple affection status variables specified in the MERLIN data file, only the first affection status variable is used to determine the subject's affection status.
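FLOSS is written in Java, so the sketches in this document use Java. The three subject filters could be expressed as below. `Member` and `SubjectFilter` are hypothetical stand-ins for pedigree data, not COV's own classes; the sketch assumes affection status has already been read from the first affection status variable and that each member's first-degree relatives are known. Because the `-n` rule counts only individuals with covariate data, members with a missing covariate value are dropped here as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pedigree member; not a COV class. */
final class Member {
    enum Status { AFFECTED, UNAFFECTED, UNKNOWN }

    final String id;
    final Status status;                 // from the first affection status variable
    final double covariate;              // Double.NaN when missing
    final List<Member> firstDegree = new ArrayList<>();  // parents, offspring, full sibs

    Member(String id, Status status, double covariate) {
        this.id = id;
        this.status = status;
        this.covariate = covariate;
    }
}

/** Hypothetical sketch of the -f subject filters. */
final class SubjectFilter {
    private SubjectFilter() {}

    /** Applies "all", "aff" or "FDU" and keeps only members with a covariate value. */
    static List<Member> apply(List<Member> family, String filter) {
        List<Member> kept = new ArrayList<>();
        for (Member m : family) {
            if (Double.isNaN(m.covariate)) continue;   // no covariate data
            boolean keep;
            switch (filter) {
                case "all": keep = true; break;
                case "aff": keep = m.status == Member.Status.AFFECTED; break;
                case "FDU": // unaffected or unknown, with an affected first-degree relative
                    keep = m.status != Member.Status.AFFECTED && hasAffectedRelative(m);
                    break;
                default: throw new IllegalArgumentException("unknown filter: " + filter);
            }
            if (keep) kept.add(m);
        }
        return kept;
    }

    private static boolean hasAffectedRelative(Member m) {
        for (Member r : m.firstDegree) {
            if (r.status == Member.Status.AFFECTED) return true;
        }
        return false;
    }
}
```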

The `-n` argument specifies the minimum number of subjects required to define a family covariate value. The subject filter specified with the `-f` option is applied first. If fewer than the specified number of subjects remain after the subject filter is applied, the family is assigned an unknown covariate value ("NaN").

The `-s` argument specifies the statistic used to create the family covariate value. Three statistics are available: min, max, and avg.

- `-s min`: minimum covariate value for the family members specified with the subject filter argument.
- `-s max`: maximum covariate value for the family members specified with the subject filter argument.
- `-s avg`: mean covariate value for the family members specified with the subject filter argument.
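Combining the `-n` rule with the `-s` statistics, a family's covariate score could be computed as in the sketch below. It assumes the input already holds the covariate values of the members selected by `-f`, with `Double.NaN` marking missing data; `FamilyCovariate.score` is a hypothetical name, not COV's implementation.

```java
/** Hypothetical sketch of the COV family covariate rule (-n and -s). */
final class FamilyCovariate {
    private FamilyCovariate() {}

    /**
     * values: covariate values of the members selected by the -f filter (NaN = missing).
     * Returns NaN when fewer than minSubjects members have a covariate value.
     */
    static double score(double[] values, int minSubjects, String statistic) {
        int count = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double v : values) {
            if (Double.isNaN(v)) continue;     // members without covariate data do not count
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        if (count == 0 || count < minSubjects) return Double.NaN;
        switch (statistic) {
            case "min": return min;
            case "max": return max;
            case "avg": return sum / count;
            default: throw new IllegalArgumentException("unknown statistic: " + statistic);
        }
    }
}
```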

In the covariate file, the first column contains `FamilyID` followed by the family identifiers. The first row contains `FamilyID` followed by the covariate identifiers. All other entries of the matrix give the family covariate scores for the family (determined by the row) and the covariate (determined by the column).

When creating the covariate file using the COV program, an additional `__asm__` column is added. This column gives the maximum value for the allele sharing parameter for each family when using linear allele sharing model LOD scores. The `__asm__` column is not used when using nonparametric linkage scores.

A `NaN` entry in the covariate file means there were not enough pedigree members with covariate data to assign a family covariate score. If the number of individuals with covariate data in the subset of family members specified by the filter (`-f`) parameter is less than the minimum number specified by the `-n` parameter, then the family covariate score is reported as missing (i.e., `NaN`).
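A reader for this matrix might look like the following sketch. It assumes whitespace-separated columns and a header row beginning with `FamilyID`; the delimiter is not specified here, so check it against a real COV output file. `CovariateFile.parse` is a hypothetical helper. Java's `Double.parseDouble` accepts the literal `NaN`, so missing family scores need no special handling.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the covariate matrix; assumes whitespace-separated columns. */
final class CovariateFile {
    private CovariateFile() {}

    /** Returns covariate name -> (family ID -> score); NaN marks a missing family score. */
    static Map<String, Map<String, Double>> parse(List<String> lines) {
        String[] header = lines.get(0).trim().split("\\s+");
        if (!header[0].equals("FamilyID")) {
            throw new IllegalArgumentException("first row must start with FamilyID");
        }
        Map<String, Map<String, Double>> byCovariate = new LinkedHashMap<>();
        for (int c = 1; c < header.length; c++) {
            byCovariate.put(header[c], new LinkedHashMap<String, Double>());
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.trim().split("\\s+");
            for (int c = 1; c < header.length; c++) {
                // Double.parseDouble accepts the literal "NaN"
                byCovariate.get(header[c]).put(fields[0], Double.parseDouble(fields[c]));
            }
        }
        return byCovariate;
    }
}
```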

The ordered subset analysis is run using the "floss.jar" program. Enter

`java -jar floss.jar [options]`

where [options] are combinations of the following flags and arguments:

- `-c [.cov file]`: name of the covariate file. Multiple covariate files can be analyzed in a single run by including a separate "-c" flag before each covariate filename. Required (see Creating FLOSS input files).
- `-merlin [.lod file]`: name of the MERLIN ".lod" file, created by MERLIN when using the `--perFamily` option. Required.
- `-o [output prefix]`: prefix of the output files. Required.
- `-seed [integer]`: seed for the random number generator. Optional: defaults to "-seed 0".
- `-subsets [type]`: type of ordered subsets used. type must be "extreme" or "slice". Optional: defaults to "-subsets extreme".
- `-asm [interval_type]`: type of allele sharing parameter interval used. interval_type must be "unequal" or "equal". Optional: defaults to "-asm unequal".
- `-minperm [integer]`: minimum number of permutations for the permutation test. Optional: defaults to "-minperm 100".
- `-maxperm [integer]`: maximum number of permutations for the permutation test. Optional: defaults to "-maxperm 10000".
- `--npl`: compute a nonparametric linkage (NPL) Z-score statistic for each subset of families. Note that two hyphens "--" are required. Optional: linear allele sharing model LOD scores are used if the "--npl" option is absent.

- `-subsets extreme`: rank the families in order of increasing family covariate score and perform linkage analysis on all subsets of families with the `k` smallest or `k` largest covariate scores. The "extreme" option is both the recommended and the default option.
- `-subsets slice`: rank the families in order of increasing family covariate score and perform linkage analysis on all subsets of families with consecutive covariate scores. For example, if there are `N` families, linkage analysis is performed using families `i` through `j` for `1 ≤ i ≤ j ≤ N`. This option is discouraged because the larger number of subsets makes it more difficult to detect disease loci associated with unusually low or high covariate values and requires substantially more computing time.
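The difference in the number of subsets is easy to see in a sketch that lists each ordered subset as a range of ranked families. It assumes families are indexed `0` to `N-1` by increasing covariate score and ignores tied covariate scores; `OrderedSubsets` is a hypothetical class, not FLOSS code. Duplicates are dropped, since the `N` smallest and the `N` largest families are the same complete set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the two -subsets options; families indexed 0..n-1 by covariate rank. */
final class OrderedSubsets {
    private OrderedSubsets() {}

    /** "extreme": the k smallest and the k largest families, k = 1..n, as {first, last}. */
    static List<int[]> extreme(int n) {
        Set<String> seen = new LinkedHashSet<>();
        List<int[]> out = new ArrayList<>();
        for (int k = 1; k <= n; k++) {
            addIfNew(out, seen, 0, k - 1);       // k smallest covariate scores
            addIfNew(out, seen, n - k, n - 1);   // k largest covariate scores
        }
        return out;
    }

    /** "slice": every run of consecutive families i..j with i <= j. */
    static List<int[]> slice(int n) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                out.add(new int[] {i, j});
            }
        }
        return out;
    }

    private static void addIfNew(List<int[]> out, Set<String> seen, int first, int last) {
        if (seen.add(first + ":" + last)) out.add(new int[] {first, last});
    }
}
```

Under these assumptions, "extreme" yields `2N - 1` distinct subsets and "slice" yields `N(N + 1)/2`, so 100 families give 199 versus 5050 subsets.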

- `-asm unequal`: the allele sharing parameter interval for an ordered subset is the intersection of the allele sharing parameter intervals for each family in the ordered subset. The "unequal" option is the default option.
- `-asm equal`: the allele sharing parameter interval for an ordered subset is the intersection of the allele sharing parameter intervals for all families. The parameter interval will be the same for all ordered subsets.
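The two interval options can be illustrated by intersecting per-family intervals. This sketch assumes each family's interval is `[0, asmMax]`, with `asmMax` taken from the `__asm__` column of the covariate file; the lower bound of 0 is an assumption, so only the upper bound is computed. `AsmInterval` is a hypothetical class.

```java
/** Hypothetical sketch of the -asm options, assuming each family's interval is [0, asmMax]. */
final class AsmInterval {
    private AsmInterval() {}

    /** Upper bound of the intersection of [0, asmMax[f]] over the given families. */
    static double upperBound(double[] asmMax, int[] families) {
        double bound = Double.POSITIVE_INFINITY;
        for (int f : families) {
            bound = Math.min(bound, asmMax[f]);
        }
        return bound;
    }

    /** "-asm unequal": intersect only over the families in the ordered subset. */
    static double unequal(double[] asmMax, int[] subset) {
        return upperBound(asmMax, subset);
    }

    /** "-asm equal": intersect over all families, so every subset shares one interval. */
    static double equal(double[] asmMax) {
        int[] all = new int[asmMax.length];
        for (int i = 0; i < all.length; i++) all[i] = i;
        return upperBound(asmMax, all);
    }
}
```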

Ordered subset analysis produces four output files. The output filenames have the format `prefix.extension`, where the prefix is the filename prefix specified with the "-o" flag when running FLOSS and the extension is ".out", ".fam", ".plt", or ".log".

The summary file (.out) records the analysis options and gives summary information for each covariate analyzed. For each covariate, the file reports the change in linkage score between the entire set of families and the ordered subset with the highest linkage score, the maximum linkage score for this ordered subset, the optimal interval of family covariate scores, and the Monte Carlo p-value with a 95% confidence interval. The summary file is self-documenting: an explanation of its contents is included at the end of the ".out" file.
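To help interpret the p-value and its confidence interval, the sketch below shows one common way to compute a Monte Carlo p-value: the fraction of permuted statistics at least as large as the observed one, with a normal-approximation (Wald) 95% interval. This documentation does not describe FLOSS's exact estimator, its interval method, or how `-minperm` and `-maxperm` set the number of permutations, so `MonteCarloP` is a hypothetical illustration, not FLOSS's method.

```java
/** Hypothetical sketch of a Monte Carlo p-value with a Wald 95% interval (assumed method). */
final class MonteCarloP {
    private MonteCarloP() {}

    /** Returns {pHat, lower, upper}; pHat is the fraction of permuted statistics >= observed. */
    static double[] estimate(double observed, double[] permuted) {
        int n = permuted.length;
        int atLeast = 0;
        for (double s : permuted) {
            if (s >= observed) atLeast++;
        }
        double p = (double) atLeast / n;
        double half = 1.96 * Math.sqrt(p * (1.0 - p) / n);   // normal-approximation half-width
        return new double[] {p, Math.max(0.0, p - half), Math.min(1.0, p + half)};
    }
}
```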

The ".fam" file gives the families ordered by the covariate values. The ".fam" file is arranged in sections corresponding to each covariate listed in the Covariate file. The sections are separated by a blank line. Each section contains three columns, and the first row in each section contains labels for the columns. The first column is labeled "Family" and contains the identifiers for families with defined covariate values in order of increasing covariate value. The second column is labeled "Subset" and contains "x" if the family in the first column is included in the ordered subset with the highest linkage score (when maximized over all ordered subsets and all loci). The third column is labeled with the covariate name and gives the covariate value for the families in the first column.
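The families in each optimal ordered subset can be collected from the ".fam" file as sketched below. This assumes whitespace-separated columns and that a family outside the optimal subset simply has no entry in the "Subset" column; check both against a real ".fam" file. `FamFile.subsetFamilies` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: collect the families marked "x" in each section of a .fam file. */
final class FamFile {
    private FamFile() {}

    /** Returns covariate name -> families in the optimal ordered subset, in file order. */
    static Map<String, List<String>> subsetFamilies(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {            // a blank line ends a section
                current = null;
                continue;
            }
            String[] tok = line.split("\\s+");
            if (current == null) {           // header row: Family Subset <covariate>
                current = new ArrayList<>();
                result.put(tok[tok.length - 1], current);
            } else if (tok.length >= 3 && tok[1].equals("x")) {
                current.add(tok[0]);         // assumed: "x" present only for subset members
            }
        }
        return result;
    }
}
```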

The ".plt" file contains linkage scores for the complete set of families and for the optimal ordered subset of families for each covariate at the loci in the MERLIN .lod file. The plotting file has a simple format that is easily read and plotted using a spreadsheet (e.g., Excel) or a statistical software package (e.g., R).

The first column is labeled "Position" and contains the position of all loci used in the ordered subset analysis. All data in each row are computed at the position specified in the first column. The second column is labeled "Orig_Score" and lists the linkage scores at the position specified in the first column obtained using all families. After the first two columns, the columns correspond to the covariates in the ordered subset analysis and are labeled by the covariate names. Each covariate column gives the linkage score at the position specified in the first column for the ordered subset that maximizes the linkage score for that covariate.
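Besides plotting, the ".plt" columns can be scanned for the peak of each curve. The sketch below assumes a whitespace-separated file with one header row and numeric positions; `PltFile.peaks` is a hypothetical helper, not part of the FLOSS download.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: report the peak position for each score column of a .plt file. */
final class PltFile {
    private PltFile() {}

    /** Returns column label -> {position, score} at that column's maximum score. */
    static Map<String, double[]> peaks(List<String> lines) {
        String[] header = lines.get(0).trim().split("\\s+");   // Position Orig_Score cov1 ...
        Map<String, double[]> best = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.trim().split("\\s+");
            double position = Double.parseDouble(f[0]);
            for (int c = 1; c < header.length; c++) {
                double score = Double.parseDouble(f[c]);
                if (Double.isNaN(score)) continue;             // skip undefined scores
                double[] cur = best.get(header[c]);
                if (cur == null || score > cur[1]) {
                    best.put(header[c], new double[] {position, score});
                }
            }
        }
        return best;
    }
}
```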

The Download section of this documentation includes an R script for plotting the linkage curves.

The ".log" file gives details for all ordered subsets considered in the ordered subset analysis. The ".log" file is arranged in sections corresponding to each covariate listed in the Covariate file.

The first line in a section contains the name of the covariate. The next line begins "Ordered Families:", and the following line or lines list the identifiers for the families used in the ordered subset analysis. The families are listed in order of increasing covariate values. An `=` between two family identifiers means the two families have the same covariate score.
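To work with tied families programmatically, the "Ordered Families:" list can be split into groups of tied identifiers. This sketch assumes the identifier lines have already been joined into one string, that family identifiers never contain `=`, and that `=` may appear with or without surrounding spaces; `TieGroups` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split the .log "Ordered Families:" list into groups of tied families. */
final class TieGroups {
    private TieGroups() {}

    static List<List<String>> parse(String orderedFamilies) {
        // Pad "=" with spaces so "F1=F2" and "F1 = F2" are read the same way.
        String[] tok = orderedFamilies.replace("=", " = ").trim().split("\\s+");
        List<List<String>> groups = new ArrayList<>();
        boolean joinNext = false;
        for (String t : tok) {
            if (t.equals("=")) {
                joinNext = true;                      // next family ties with the previous one
            } else if (joinNext && !groups.isEmpty()) {
                groups.get(groups.size() - 1).add(t);
                joinNext = false;
            } else {
                List<String> g = new ArrayList<>();   // start a new covariate-score group
                g.add(t);
                groups.add(g);
                joinNext = false;
            }
        }
        return groups;
    }
}
```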

Following the ordered family identifiers are the results from each ordered subset considered in the ordered subset analysis. The results are presented in eight columns. Each line corresponds to a distinct ordered subset and has the following entries (in order from left to right):

- *First Fam* gives the family identifier with the smallest covariate value in the ordered subset.
- *Last Fam* gives the family identifier with the largest covariate value in the ordered subset.
- *Num Fams* gives the number of families in the ordered subset.
- *Peak* gives the locus where the highest linkage score was observed for the ordered subset of families specified by the first two entries (First Fam and Last Fam).
- *Subset Score* gives the highest linkage score observed for the ordered subset of families specified by the first two entries (First Fam and Last Fam).
- *Orig Score* gives the linkage score for the position specified in the fourth entry (Peak) for the set of all families.
- *Subset Params* gives the parameter value that yielded the maximum linkage score for the ordered subset of families specified by the first two entries (First Fam and Last Fam). This entry is blank when using nonparametric linkage analysis z-scores with the `--npl` option.
- *Orig Params* gives the parameter value that yielded the maximum linkage score for the position specified in the fourth entry (Peak) for the set of all families. This entry is blank when using nonparametric linkage analysis z-scores with the `--npl` option.

- executable files
- sample input and output files
- R script for graphing ".plt" file linkage scores
- source code
- version notes

- Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) MERLIN--rapid analysis of dense gene maps using sparse gene flow trees. Nat Genet 30:97-101.
- Browning BL (2006) FLOSS: Flexible ordered subsets analysis for linkage analysis of complex traits. Bioinformatics 22(4):512-3.
- Hauser ER, Watanabe RM, Duren WL, Bass MP, Langefeld CD, Boehnke M (2004) Ordered subset analysis in genetic linkage mapping of complex traits. Genet Epidemiol 27:53-63.
- Kong A, Cox NJ (1997) Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 61:1179-1188.
- Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.