This page describes selected current projects.

### High-throughput association testing

As the first step in establishing novel biological findings, genome-wide, exome-sequencing and genome-sequencing studies undertake association testing between disease outcomes and thousands or millions of genetic variants. Challenges in making this work include:

- Computing association tests quickly enough to ensure millions of them can be done in reasonable time. As well as fast and sufficiently accurate algorithms, distributed computing resources must also be used efficiently.
- Pulling together information from multiple studies, which is particularly challenging when the studies have widely varying protocols.
- Making sure the statistical properties (e.g. control of Type I error rate) of analyses are appropriate, even with rare variants.
- Making good choices of which test to use, incorporating all available data and other information, to optimize power to detect the association signals that are plausible.
- Developing statistical intuition for analyses that study very large numbers of associations, where most associations cannot be detected with high power.

### Aims and optimality in meta-analysis

Modern epidemiological studies often combine data from multiple sources; doing this requires some form of meta-analysis. However, the typical motivation for fixed- and random-effects meta-analysis (the standard methods) does not fit the scientific goals of these studies. By formalizing what is known about the contributing studies, and what the analysis aims to infer, we are developing new meta-analytic methods and improving understanding of existing methods.

This is joint work with:

### Better motivations for statistical tests

Statistical tests are a fundamental part of statistical inference, but are widely misunderstood and misused. This work aims to establish motivations for statistical tests that are easier to understand, and that better connect scientific questions with the tests that might be performed when we try to answer those questions.

This is joint work with:

### Understanding shrinkage

Shrinkage estimators, which set parts of estimates to zero, have become standard in analyses of "big data". There are various methodological motivations for these estimators, but none of them addresses the question of why one might want a shrunken estimate in the first place. We are developing a general approach to shrinkage, motivated by balancing veracity (getting close to the truth) and simplicity (getting close to zero, typically). While yielding "simple" shrunk estimates, the approach does not require any assumption that the truth is actually full of zeros, an assumption that is often unreasonable.
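
The trade-off between veracity and simplicity described under "Understanding shrinkage" can be illustrated with a small sketch. This is standard soft-thresholding, used here only as a familiar example of the idea; it is not necessarily the approach being developed in this project. The squared-error term, the absolute-value penalty, and the names `soft_threshold`, `objective` and `penalty` are all assumptions made for illustration.

```python
def objective(b, estimate, penalty):
    """Illustrative trade-off: closeness to the raw estimate plus a cost for being non-zero.

    0.5 * (b - estimate)**2 rewards staying close to the data ("veracity");
    penalty * abs(b) rewards being close to zero ("simplicity").
    """
    return 0.5 * (b - estimate) ** 2 + penalty * abs(b)


def soft_threshold(estimate, penalty):
    """Value of b that minimizes objective(b, estimate, penalty)."""
    if estimate > penalty:
        return estimate - penalty
    if estimate < -penalty:
        return estimate + penalty
    # Raw estimates within `penalty` of zero are set exactly to zero.
    return 0.0


# Hypothetical raw estimates, chosen only to show the behavior.
raw_estimates = [2.5, -0.25, 0.75, -1.5, 0.125]
shrunk = [soft_threshold(e, 1.0) for e in raw_estimates]
print(shrunk)  # [1.5, 0.0, 0.0, -0.5, 0.0]
```

Note that nothing in this calculation assumes the true values are zero: small estimates are set to zero only because, under the chosen trade-off, the gain in simplicity outweighs the loss in closeness to the data.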

This is joint work with:

Tyler Bonnett