Daniela Witten

Publicly-Available Software

Convex Regression With Interpretable Sharp Partitions

[CRAN crisp package ]
An R software package for fitting an interpretable and non-additive regression model using a convex formulation.
Reference: Petersen A, Simon N, and D Witten (2016) Convex regression with interpretable sharp partitions. Journal of Machine Learning Research 17(94): 1-31. [pdf]

Combined Annotation-Dependent Depletion

[CADD Website]
CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome. CADD can quantitatively prioritize functional, deleterious, and disease-causal variants across a wide range of functional categories, effect sizes, and genetic architectures, and can be used to prioritize causal variation in both research and clinical settings.
Reference: Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892. [article]

Modeling Interactions with Strong Heredity

[CRAN FAMILY package ]
An R software package for fitting a regression model with interactions, under strong heredity. An interaction is only allowed into the model if both of the corresponding main effects are included. Strong heredity is enforced using convex penalties.
Reference: Haris A, Simon N, and D Witten (2015) Convex modeling of interactions with strong heredity. Journal of Computational and Graphical Statistics 25(4): 981-1004. [arxiv]
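
The strong heredity rule described above can be checked directly on a model's selected terms. The following is a minimal Python sketch, not code from the FAMILY package: the function names and the representation of a model as a set of main-effect indices plus a list of interaction pairs are assumptions made for illustration.

```python
def satisfies_strong_heredity(main_effects, interactions):
    """Return True if every interaction (j, k) has both main effects j and k selected."""
    selected = set(main_effects)
    return all(j in selected and k in selected for j, k in interactions)


def heredity_violations(main_effects, interactions):
    """List the interactions that are missing at least one of their main effects."""
    selected = set(main_effects)
    return [(j, k) for j, k in interactions if j not in selected or k not in selected]


if __name__ == "__main__":
    # Interaction (1, 3) violates strong heredity because main effect 3 is absent.
    print(satisfies_strong_heredity({1, 2}, [(1, 2), (1, 3)]))
    print(heredity_violations({1, 2}, [(1, 2), (1, 3)]))
```

The package enforces this structure through convex penalties during fitting; the check here only verifies the property after the fact.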

Fused Lasso Additive Model

[CRAN FLAM package ] [Shiny app]
An R software package for fitting a fused lasso additive model. Each feature's fit is estimated to be sparse and piecewise constant. This leads to highly interpretable models.
Reference: Petersen A, Witten D, and N Simon (2015) Fused lasso additive model. Journal of Computational and Graphical Statistics 25(4): 1005-1025. [arxiv]
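
To make "piecewise constant" concrete, the sketch below evaluates a fused-lasso-style penalty on one feature's fitted values: the sum of absolute differences between fits at adjacent values of the feature. This is a hedged illustration in Python, not the FLAM package's code, and the function names are hypothetical. A fit with few jumps has a small penalty, which is why penalizing this quantity favors piecewise constant fits.

```python
def fused_lasso_penalty(x, theta):
    """Sum of |theta_(i+1) - theta_(i)| after ordering the fits by the feature x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ordered = [theta[i] for i in order]
    return sum(abs(b - a) for a, b in zip(ordered, ordered[1:]))


def count_jumps(x, theta, tol=1e-12):
    """Number of places where the ordered fit changes value (the number of 'knots')."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ordered = [theta[i] for i in order]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if abs(b - a) > tol)


if __name__ == "__main__":
    x = [0.3, 0.1, 0.9, 0.5]          # feature values, unsorted
    theta = [0.0, 0.0, 2.0, 2.0]      # fitted values, one per observation
    print(fused_lasso_penalty(x, theta), count_jumps(x, theta))
```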

Hub Graphical Lasso

[CRAN hglasso package ]
An R software package for fitting a graphical model with highly-connected hub nodes.
Reference: Tan KM, London P, Mohan K, Lee SI, Fazel M, and D Witten (2014) Learning graphical models with hubs. Journal of Machine Learning Research 15:3297-3331. [arxiv]

Node-Based Learning of Multiple Gaussian Graphical Models

[Matlab code in zip file ]
Matlab code for performing node-based learning of multiple Gaussian graphical models. The perturbed-node and cohub node joint graphical lasso approaches are implemented.
Reference: Mohan K, London P, Fazel M, Witten D, and SI Lee (2014). Node-based learning of multiple Gaussian graphical models. Journal of Machine Learning Research 15: 445-488. [arxiv]

Sparse Biclustering of Transposable Data

[CRAN sparseBC package ]
An R software package for performing sparse biclustering of transposable data. A generalization of k-means clustering is applied to both the rows and columns of a data matrix. L1 penalties are applied in order to encourage bicluster means to equal zero.
Reference: Tan KM and DM Witten (2014) Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics 23(4): 985-1008. [pdf]
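
The effect of an L1 penalty on a single bicluster mean can be sketched as follows. Assume, for illustration only, that the mean mu of one bicluster is chosen to minimize 0.5 * sum((x - mu)^2) + lam * |mu| over the entries x in that bicluster. The minimizer is then the ordinary mean, soft-thresholded by lam divided by the number of entries. The function names and this exact parameterization are assumptions and may differ from the sparseBC package.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def penalized_bicluster_mean(entries, lam):
    """Minimizer of 0.5 * sum((x - mu)**2) + lam * abs(mu) over mu."""
    n = len(entries)
    mean = sum(entries) / n
    # Completing the square gives 0.5 * n * (mu - mean)**2 + lam * |mu| + const,
    # so the solution is the mean soft-thresholded at lam / n.
    return soft_threshold(mean, lam / n)


if __name__ == "__main__":
    print(penalized_bicluster_mean([0.1, -0.2, 0.3, 0.2], lam=1.0))  # weak signal
    print(penalized_bicluster_mean([2.0, 2.0, 2.0, 2.0], lam=1.0))   # strong signal
```

A bicluster whose average is small relative to lam / n is set exactly to zero, while a strong bicluster is shrunk but kept.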

SpaCE JAM

[CRAN spacejam package ]
An R software package for performing sparse conditional graph estimation with joint additive models. Conditional independence relationships are flexibly estimated without assuming that the conditional means are linear.
Reference: Voorman A, Shojaie A, and D Witten (2014) Graph estimation with joint additive models. Biometrika 101(1): 85-101. [arxiv]

Joint Graphical Lasso

[CRAN JGL package ]
An R software package for performing joint estimation of multiple Gaussian graphical models. A group lasso or fused lasso penalty is applied to the Gaussian log likelihood in order to encourage the graphical models to have shared structure across conditions.
Reference: Danaher P, Wang P, and D Witten (2014) The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B 76(2): 373-397. [arxiv]
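
The two penalties mentioned above can be written out for small matrices. The sketch below is plain Python, not the JGL package's code. It assumes the penalty has a lasso term on the off-diagonal entries of each class's precision matrix, plus either a fused term (absolute differences between classes) or a group term (Euclidean norm across classes of each off-diagonal entry). The function names and the tuning parameters lam1 and lam2 are illustrative; the cited paper gives the exact definitions.

```python
from itertools import combinations
from math import sqrt


def lasso_term(thetas):
    """Sum of absolute off-diagonal entries over all classes."""
    p = len(thetas[0])
    return sum(abs(t[i][j]) for t in thetas
               for i in range(p) for j in range(p) if i != j)


def fused_term(thetas):
    """Sum over pairs of classes of absolute elementwise differences."""
    p = len(thetas[0])
    return sum(abs(a[i][j] - b[i][j]) for a, b in combinations(thetas, 2)
               for i in range(p) for j in range(p))


def group_term(thetas):
    """Sum over off-diagonal positions of the Euclidean norm across classes."""
    p = len(thetas[0])
    return sum(sqrt(sum(t[i][j] ** 2 for t in thetas))
               for i in range(p) for j in range(p) if i != j)


def joint_penalty(thetas, lam1, lam2, kind="fused"):
    """Lasso term plus the chosen similarity term, weighted by lam1 and lam2."""
    similarity = fused_term(thetas) if kind == "fused" else group_term(thetas)
    return lam1 * lasso_term(thetas) + lam2 * similarity


if __name__ == "__main__":
    theta_a = [[1.0, 0.5], [0.5, 1.0]]
    theta_b = [[1.0, 0.0], [0.0, 2.0]]
    print(joint_penalty([theta_a, theta_b], 1.0, 2.0, kind="fused"))
    print(joint_penalty([theta_a, theta_b], 1.0, 2.0, kind="group"))
```

The fused term is zero only when the matrices are identical, so it encourages shared edge values; the group term encourages a shared pattern of zeros without forcing the nonzero values to agree.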

Correlate

[Correlate point-and-click Excel plug-in ]
A point-and-click Excel interface for sparse canonical correlation analysis, which allows for the integrative analysis of two data sets with measurements taken on a single set of observations.
References: Witten DM, Tibshirani R, and T Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3): 515-534. [pdf]
Witten DM and R Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data. Statistical Applications in Genetics and Molecular Biology 8(1): Article 28. [pdf]

Penalized Linear Discriminant Analysis

[CRAN penalizedLDA package ]
An R software package for performing linear discriminant analysis in the high-dimensional setting. A lasso or fused lasso penalty can be applied in order to obtain penalized discriminant vectors. A diagonal estimate for the within-class covariance matrix is used.
Reference: Witten DM and R Tibshirani (2011) Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society, Series B 73(5): 753-772. [pdf]

Sparse Clustering

[CRAN sparcl package ]
An R software package for clustering a set of n observations when p variables are available, where p>>n. It adaptively chooses a set of variables to use in clustering the observations. Sparse K-means clustering and sparse hierarchical clustering are implemented.
Reference: Witten DM and R Tibshirani (2010) A framework for feature selection in clustering. Journal of the American Statistical Association 105(490): 713-726.
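
One way to picture the adaptive variable selection is the feature-weight step of sparse K-means. Given a nonnegative score a[j] for each feature (for example, its between-cluster sum of squares), the weights maximize sum(w[j] * a[j]) subject to an L2 norm of at most 1, an L1 norm of at most s, and nonnegativity. The Python sketch below solves this by soft-thresholding and bisection. It is an illustration under those assumptions, not the sparcl package's code, and the function names and the bound s (assumed to lie between 1 and the square root of the number of features) are hypothetical.

```python
from math import sqrt


def shrink(values, delta):
    """Soft-threshold nonnegative scores: subtract delta and clip at zero."""
    return [max(v - delta, 0.0) for v in values]


def normalize(values):
    """Scale to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def feature_weights(a, s, iterations=100):
    """Weights w = shrink(a, delta) / ||shrink(a, delta)||_2 with sum(w) <= s."""
    w = normalize(a)
    if sum(w) <= s:
        return w  # the L1 bound is not active, so no thresholding is needed
    lo, hi = 0.0, max(a)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(normalize(shrink(a, mid))) > s:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # feasible: try a smaller threshold
    return normalize(shrink(a, hi))


if __name__ == "__main__":
    scores = [4.0, 3.0, 0.5, 0.1]
    print(feature_weights(scores, s=1.2))
```

Features with low scores receive a weight of exactly zero and drop out of the clustering, and a smaller s keeps fewer features.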

Lassoed Principal Components

[CRAN lpc package ]
An R software package for assessing significance of genes in a microarray experiment using the lassoed principal components (LPC) method.
Reference: Witten DM and R Tibshirani (2008) Testing significance of features by lassoed principal components. Annals of Applied Statistics 2(3): 986-1012. [pdf]

Scout: Covariance-Regularized Regression

[CRAN scout package ]
An R software package implementing covariance-regularized regression, a.k.a. the scout.
Reference: Witten DM and R Tibshirani (2009) Covariance-regularized regression and classification for high-dimensional problems. Journal of the Royal Statistical Society, Series B 71(3): 615-636. [pdf]

Penalized Multivariate Analysis

[CRAN PMA package ]
An R software package implementing the penalized matrix decomposition (PMD), as well as sparse canonical correlation analysis (CCA) and sparse principal components analysis (SPC). Sparse CCA can be used to perform an integrative analysis of two assays performed on the same set of samples. For instance, in the case of gene expression and DNA copy number data, one can use it to find sets of genes that are correlated with regions of genomic gain or loss. It can also be used to perform integrative analyses of SNP and gene expression data or SNP and CGH data.
References: Witten DM, Tibshirani R, and T Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3): 515-534. [pdf]
Witten DM and R Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data. Statistical Applications in Genetics and Molecular Biology 8(1): Article 28. [pdf]

Supervised Multidimensional Scaling

[CRAN superMDS package ]
An R software package for performing supervised multidimensional scaling, an extension of least squares multidimensional scaling to the case where a binary outcome is available for each observation. The resulting configuration points are consistent with the dissimilarity matrix as well as with the outcome vector.
Reference: Witten DM and R Tibshirani (2011) Supervised multidimensional scaling for visualization, classification, and bipartite ranking. Computational Statistics and Data Analysis 55(1): 789-801.