## Publicly-Available Software

### Convex Regression With Interpretable Sharp Partitions

[CRAN crisp package]

An R software package for fitting an interpretable and non-additive regression model using a convex formulation.

Reference: Petersen A, Simon N, and D Witten. Journal of Machine Learning Research 17(94): 1-31. [pdf]

### Combined Annotation-Dependent Depletion

[CADD Website]

CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome. CADD can quantitatively prioritize functional, deleterious, and disease-causal variants across a wide range of functional categories, effect sizes, and genetic architectures, and can be used to prioritize causal variation in both research and clinical settings.

Reference: Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. [article]

### Modeling Interactions with Strong Heredity

[CRAN FAMILY package]

An R software package for fitting a regression model with interactions, under strong heredity. An interaction is only allowed into the model if both of the corresponding main effects are included. Strong heredity is enforced using convex penalties.
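The strong heredity rule described above can be illustrated with a short Python sketch. It only checks the constraint on a candidate model; it does not fit anything and is not the FAMILY package's interface. The function names and the representation of effects as strings are assumptions made for illustration.

```python
from itertools import combinations


def satisfies_strong_heredity(main_effects, interactions):
    """Return True if every interaction (j, k) has both j and k among the main effects."""
    active = set(main_effects)
    return all(j in active and k in active for j, k in interactions)


def allowed_interactions(main_effects):
    """List every pairwise interaction that strong heredity permits for these main effects."""
    return list(combinations(sorted(set(main_effects)), 2))


if __name__ == "__main__":
    # x1:x2 is allowed because both x1 and x2 are in the model.
    print(satisfies_strong_heredity(["x1", "x2"], [("x1", "x2")]))
    # x1:x3 is not allowed because x3 is missing as a main effect.
    print(satisfies_strong_heredity(["x1", "x2"], [("x1", "x3")]))
```

In the package itself the constraint is imposed through convex penalties during fitting rather than by an after-the-fact check like this one.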

Reference: Haris A, Simon N, and D Witten (2015) Convex modeling of interactions with strong heredity. Journal of Computational and Graphical Statistics 25(4): 981-1004. [arxiv]

### Fused Lasso Additive Model

[CRAN FLAM package] [Shiny app]

An R software package for fitting a fused lasso additive model. Each feature's fit is estimated to be sparse and piecewise constant. This leads to highly interpretable models.
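To make "sparse and piecewise constant" concrete, the Python sketch below summarises one feature's fitted values. It computes the total variation between adjacent fitted values (the quantity a fused-lasso-type penalty shrinks), counts the jumps, and reports whether the fit is identically zero. The function name, the tolerance, and the toy data are assumptions; this is not the FLAM package's API and it does not estimate a model.

```python
def fit_summary(x, fitted, tol=1e-8):
    """Order one feature's fitted values by x and summarise their piecewise-constant structure.

    Returns (total_variation, n_jumps, is_zero).
    """
    # Sort the fitted values by the feature value they belong to.
    ordered = [f for _, f in sorted(zip(x, fitted))]
    # Differences between adjacent fitted values; most are zero for a piecewise-constant fit.
    jumps = [b - a for a, b in zip(ordered, ordered[1:])]
    total_variation = sum(abs(j) for j in jumps)
    n_jumps = sum(1 for j in jumps if abs(j) > tol)
    # A "sparse" feature is one whose fit is zero everywhere.
    is_zero = all(abs(f) <= tol for f in ordered)
    return total_variation, n_jumps, is_zero


if __name__ == "__main__":
    x = [3, 1, 2, 4]
    fitted = [1.0, 0.0, 0.0, 1.0]  # a single step between x = 2 and x = 3
    print(fit_summary(x, fitted))
```

A fit with few jumps can be read as a short list of thresholds on the feature, which is what makes such models easy to interpret.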

Reference: Petersen A, Witten D, and N Simon (2015) Fused lasso additive model. Journal of Computational and Graphical Statistics 25(4): 1005-1025. [arxiv]

### Hub Graphical Lasso

[CRAN hglasso package]

An R software package for fitting a graphical model with highly-connected hub nodes.

Reference: Tan KM, London P, Mohan K, Lee SI, Fazel M, and D Witten (2014) Learning graphical models with hubs. Journal of Machine Learning Research 15: 3297-3331. [arxiv]

### Node-Based Learning of Multiple Gaussian Graphical Models

[Matlab code in zip file]

Matlab code for performing node-based learning of multiple Gaussian graphical models. The perturbed-node and cohub node joint graphical lasso approaches are implemented.

Reference: Mohan K, London P, Fazel M, Witten D, and SI Lee (2014) Node-based learning of multiple Gaussian graphical models. Journal of Machine Learning Research 15: 445-488. [arxiv]

### Sparse Biclustering of Transposable Data

[CRAN sparseBC package]

An R software package for performing sparse biclustering of transposable data. A generalization of k-means clustering is applied to both the rows and columns of a data matrix. L1 penalties are applied in order to encourage bicluster means to equal zero.
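The effect of an L1 penalty on a mean can be sketched with the soft-thresholding operator, which shrinks a value toward zero and sets it exactly to zero when it is small enough. The Python sketch below applies it to a toy table of bicluster means. The function names and the single shared threshold `lam` are assumptions; the exact update used inside sparseBC may scale the threshold differently (for example, by bicluster size).

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within lam of zero become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def shrink_means(means, lam):
    """Apply soft-thresholding to every entry of a rows-by-columns table of bicluster means."""
    return [[soft_threshold(m, lam) for m in row] for row in means]


if __name__ == "__main__":
    # Rows index row clusters, columns index column clusters (toy values).
    means = [[2.5, -0.3], [0.2, -2.0]]
    print(shrink_means(means, lam=1.0))
```

Biclusters whose unpenalized mean is close to zero end up with a mean of exactly zero, so only the biclusters that stand out from the background remain.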

Reference: Tan KM and DM Witten (2014) Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics 23(4): 985-1008. [pdf]

### SpaCE JAM

[CRAN spacejam package]

An R software package for performing sparse conditional graph estimation with joint additive models. Conditional independence relationships are flexibly estimated without assuming that the conditional means are linear.

Reference: Voorman A, Shojaie A, and D Witten (2014) Graph estimation with joint additive models. Biometrika 101(1): 85-101. [arxiv]

### Joint Graphical Lasso

[CRAN JGL package]

An R software package for performing joint estimation of multiple Gaussian graphical models. A group lasso or fused lasso penalty is applied to the Gaussian log likelihood in order to encourage the graphical models to have shared structure across conditions.
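The two penalties named above can be written out for a small example. The Python sketch below evaluates one common form of each: an L1 term on off-diagonal entries plus either a fused term on entrywise differences between every pair of conditions, or a group term that ties the same off-diagonal entry together across conditions. The exact forms, the function names, and the tuning parameters `lam1` and `lam2` are assumptions for illustration. The code only evaluates penalties on given precision matrices; it does not maximize the penalized likelihood and is not the JGL package's API.

```python
import math
from itertools import combinations


def _off_diagonal_l1(thetas):
    """Sum of absolute off-diagonal entries over all precision matrices."""
    p = len(thetas[0])
    return sum(abs(t[i][j]) for t in thetas for i in range(p) for j in range(p) if i != j)


def fused_penalty(thetas, lam1, lam2):
    """Sparsity term plus absolute entrywise differences between every pair of conditions."""
    p = len(thetas[0])
    fusion = sum(
        abs(a[i][j] - b[i][j])
        for a, b in combinations(thetas, 2)
        for i in range(p)
        for j in range(p)
    )
    return lam1 * _off_diagonal_l1(thetas) + lam2 * fusion


def group_penalty(thetas, lam1, lam2):
    """Sparsity term plus an L2 norm across conditions for each off-diagonal entry."""
    p = len(thetas[0])
    group = sum(
        math.sqrt(sum(t[i][j] ** 2 for t in thetas))
        for i in range(p)
        for j in range(p)
        if i != j
    )
    return lam1 * _off_diagonal_l1(thetas) + lam2 * group


if __name__ == "__main__":
    theta_a = [[1.0, 0.5], [0.5, 1.0]]  # toy precision matrix, condition A
    theta_b = [[1.0, 0.0], [0.0, 2.0]]  # toy precision matrix, condition B
    print(fused_penalty([theta_a, theta_b], lam1=1.0, lam2=2.0))
    print(group_penalty([theta_a, theta_b], lam1=1.0, lam2=2.0))
```

The fused form rewards matrices whose entries are identical across conditions, while the group form rewards a shared pattern of zeros without requiring the nonzero values to match.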

Reference: Danaher P, Wang P, and D Witten (2014) The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B 76(2): 373-397. [arxiv]

### Correlate

[Correlate point-and-click Excel plug-in]

A point-and-click Excel interface for sparse canonical correlation analysis, which allows for the integrative analysis of two data sets with measurements taken on a single set of observations.

References: Witten DM, Tibshirani R, and T Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3): 515-534. [pdf]

Witten DM and R Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data. Statistical Applications in Genetics and Molecular Biology 8(1): Article 28. [pdf]

### Penalized Linear Discriminant Analysis

[CRAN penalizedLDA package]

An R software package for performing linear discriminant analysis in the high-dimensional setting. A lasso or fused lasso penalty can be applied in order to obtain penalized discriminant vectors. A diagonal estimate for the within-class covariance matrix is used.
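A diagonal estimate of the within-class covariance keeps only each feature's pooled within-class variance and ignores covariances between features, which keeps the estimate usable when there are far more features than observations. The Python sketch below computes such a diagonal. The function name, the toy data, and the choice to divide by the total number of observations (rather than, say, observations minus classes) are assumptions; this is not the penalizedLDA package's API.

```python
def diagonal_within_class_variance(X, y):
    """Pooled within-class variance of each feature (diagonal of the within-class covariance).

    X is a list of observations (each a list of p feature values); y holds the class labels.
    """
    n, p = len(X), len(X[0])
    # Per-class feature means.
    means = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    # Squared deviations from the observation's own class mean, pooled over all classes.
    # Dividing by n is an assumption of this sketch.
    return [
        sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y)) / n
        for j in range(p)
    ]


if __name__ == "__main__":
    X = [[1.0, 10.0], [3.0, 10.0], [5.0, 20.0], [9.0, 20.0]]
    y = [0, 0, 1, 1]
    print(diagonal_within_class_variance(X, y))
```

The second feature in the toy data separates the classes perfectly and has zero within-class variance, so a real implementation would typically need to guard against dividing by such values.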

Reference: Witten DM and R Tibshirani (2011) Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society, Series B 73(5): 753-772. [pdf]

### Sparse Clustering

[CRAN sparcl package]

An R software package for clustering a set of n observations when p variables are available, where p>>n. It adaptively chooses a set of variables to use in clustering the observations. Sparse K-means clustering and sparse hierarchical clustering are implemented.

Reference: Witten DM and R Tibshirani (2010) A framework for feature selection in clustering. Journal of the American Statistical Association 105(490): 713-726.

### Lassoed Principal Components

[CRAN lpc package]

An R software package for assessing significance of genes in a microarray experiment using the lassoed principal components (LPC) method.

Reference: Witten DM and R Tibshirani (2008) Testing significance of features by lassoed principal components. Annals of Applied Statistics 2(3): 986-1012. [pdf]

### Scout: Covariance-Regularized Regression

[CRAN scout package]

An R software package implementing covariance-regularized regression, a.k.a. the scout.

Reference: Witten DM and R Tibshirani (2009) Covariance-regularized regression and classification for high-dimensional problems. Journal of the Royal Statistical Society, Series B 71(3): 615-636. [pdf]

### Penalized Multivariate Analysis

[CRAN PMA package]

An R software package implementing the penalized matrix decomposition (PMD), as well as sparse canonical correlation analysis (CCA) and sparse principal components analysis (SPC). Sparse CCA can be used to perform an integrative analysis of two assays performed on the same set of samples. For instance, in the case of gene expression and DNA copy number data, one can use it to find sets of genes that are correlated with regions of genomic gain or loss. It can also be used to perform integrative analyses of SNP and gene expression data or SNP and CGH data.

References: Witten DM, Tibshirani R, and T Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3): 515-534. [pdf]

Witten DM and R Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data. Statistical Applications in Genetics and Molecular Biology 8(1): Article 28. [pdf]

### Supervised Multidimensional Scaling

[CRAN superMDS package]

An R software package for performing supervised multidimensional scaling, an extension of least squares multidimensional scaling to the case where a binary outcome is available for each observation. The goal is to obtain configuration points that are consistent with both the dissimilarity matrix and the outcome vector.

Reference: Witten DM and R Tibshirani (2011) Supervised multidimensional scaling for visualization, classification, and bipartite ranking. Computational Statistics and Data Analysis 55(1): 789-801.