Research

Overall, the objective of my research is to develop efficient and robust methodologies that minimize the need for assumptions that are not motivated by available scientific knowledge. I am especially interested in developing methods that allow the integration of flexible learning techniques (e.g., machine learning) while still providing valid inference. My work is motivated by applications in public health and medicine, and has been used to analyze data from complex observational studies as well as clinical trials. With my research, I hope to provide tools that facilitate the conduct of rigorous and replicable science.

Methodological / Theoretical Interests

My statistical research generally lies at the intersection of nonparametric and semiparametric statistics, causal inference, survival analysis, and statistical epidemiology. The topics I have been most interested in are listed below.

1. AUTOMATED NONPARAMETRIC AND SEMIPARAMETRIC INFERENCE

It can be challenging to perform valid, optimal statistical inference on summaries of the data-generating mechanism in nonparametric and semiparametric models. The tools used to achieve such inference are generally based upon asymptotic arguments and require specialized theoretical knowledge. I have been interested in finding novel approaches that replace the difficult theoretical calculations involved with (possibly intensive) numerical computations, possibly to the point of automation. The objective is to make nonparametric and semiparametric inference more accessible to practitioners. I have also been interested in studying non-asymptotic numerical approaches to nonparametric and semiparametric inference, which could potentially improve on the current state of the art in the area.

2. CAUSAL INFERENCE

Many scientific questions involve assessing the existence of a causal association between potential causative agents or inputs and an outcome of interest. The field of causal inference is broadly concerned with formalizing causal estimands of potential interest, and determining how and under what conditions these causal estimands can be identified from data. While such work is particularly important for observational data, it is also relevant in the analysis of clinical trials, as data from trials often incorporate observational characteristics due to missingness or loss to follow-up. I have been motivated to develop techniques for robust causal inference in a variety of contexts.

3. TARGETED LOSS-BASED ESTIMATION

Targeted maximum likelihood estimation (TMLE) -- or more generally, targeted loss-based estimation -- is a general approach for revising a given estimator of a probability distribution such that it (i) retains its original statistical properties, and (ii) additionally solves a collection of user-specified score-like equations. TMLE can be used to tackle a wide range of problems. It has been primarily used to construct efficient plug-in estimators of complex parameters, particularly in the causal inference literature. However, it has also been employed to develop procedures for doubly robust inference, for bootstrap resampling in settings in which the standard bootstrap fails, and for principled dimension reduction, among other uses. I have been interested in studying the theoretical and practical properties of the TMLE framework, and in exploring its versatility in solving outstanding methodological problems.

4. INFERENCE UNDER SHAPE CONSTRAINTS

In many situations, the target of inference is a function known to satisfy certain shape constraints in view of prior scientific knowledge (or due to laws of probability). For example, it may be known that the dose-response curve of interest is monotone on some range of exposure levels; or that the incidence rate of a disease never exceeds some known ceiling in the population. In such cases, it can be advantageous to take into account this knowledge as part of the statistical analysis. I have been interested in developing general-purpose methodology that leverages known shape constraints. In particular, I have been working on developing robust inferential procedures that allow the integration of machine learning tools in shape-constrained problems.

5. STATISTICAL EPIDEMIOLOGY

Questions of an epidemiological nature are central to public health. I have been interested in devising robust methods for addressing epidemiological questions using complex data. I have been particularly interested in inference on measures of disease risk (e.g., incidence rate, lifetime risk) in the context of biased sampling schemes, such as cross-sectional sampling studies.

Substantive Interests

My substantive interests and collaborations have spanned a wide range of areas in public health and medicine. More recently, however, my work has been largely motivated by statistical challenges arising in the analysis of HIV vaccine efficacy studies. I also continue to work closely with collaborators in aging, neurology, and environmental health sciences, particularly on the study of the epidemiology of dementia. Lastly, I have had long-standing collaborations with neuroradiologists, and have an interest in medical education and learning theory.