Thomas Lumley: work page

My useful and productive page

Teaching --- Methodology --- Talks --- Real Statistics --- Statistical Computing

Teaching

Winter 2010: BIOST 518 Applied Biostatistics II (4) Multiple regression for continuous, discrete, and right censored response variables, including dummy variables, transformations, and interactions. Introduction to regression with correlated outcome data. Model and case diagnostics. Computer assignments using real data and standard statistical computer packages. Prerequisite: BIOST 517 or permission of instructor.

Spring 2010: BIOST/STAT/CSSS 529 Sample Survey Techniques. This will be similar to last year's course.

Methodological Research

Sparse correlation. Marginal models for sparsely correlated data. Marginal generalised linear models are often easier to interpret than conditional ones (especially for binary data). In order to use them for time series, spatial data and data with crossed random effects we need valid standard error estimates. You can get code for R, S-PLUS or Stata for some of these estimators.

Greater precision should come from weighted estimation that takes advantage of simple models for the correlation structure, but still uses model-free variance estimators to protect the validity of inference. This is very similar to MQL; the main advance is being able to do it without inverting or storing big matrices.
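To make "model-free variance estimator" concrete, here is a minimal Python sketch of the textbook sandwich estimator in the simplest possible case: estimating a mean from clustered data. It is an illustration under stated assumptions, not the code referred to above. The function name `sandwich_variance_of_mean` and the data are made up, and the clustered setting is much simpler than time series, spatial data or crossed random effects.

```python
def sandwich_variance_of_mean(clusters):
    """Return (mean, variance estimate) for the overall mean of clustered data.

    clusters: list of lists of numbers, one inner list per cluster.
    Residuals are summed within each cluster before squaring, so no model
    for the within-cluster correlation is needed.
    """
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n
    # "Meat": squared cluster totals of the residuals.
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in clusters)
    # "Bread": the estimating equation sum(y - mu) = 0 has derivative -n.
    return mean, meat / n**2


# Hypothetical data: three clusters with strong within-cluster similarity.
clusters = [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [5.0, 5.2, 4.9]]

mean, robust_var = sandwich_variance_of_mean(clusters)

# Treating every observation as its own cluster ignores the correlation.
singletons = [[y] for cluster in clusters for y in cluster]
_, naive_var = sandwich_variance_of_mean(singletons)

print("mean:", mean)
print("variance ignoring correlation:", naive_var)
print("cluster-robust variance:", robust_var)
```

With positively correlated clusters like these, the cluster-robust variance is larger than the one that ignores the correlation, which is why the naive standard errors cannot be trusted.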

I'm also interested in empirical process theory and semiparametrics for dependent data, but as Barbie famously said "Math Is Hard".

Indirect comparisons. Suppose you compare A to B and B to C. What can you conclude about A vs C? I've been looking at two aspects of this. The first is meta-analysis of clinical trials taking into account indirect as well as direct comparisons. The hard part is getting the estimation to fail when it should. A worked example of this network meta-analysis is now available. The second aspect is the question of when two-sample tests are transitive. For the t-test, if the t-statistic comparing A and B and the t-statistic comparing B and C are both positive, then the t-statistic comparing A and C is also positive. For the Wilcoxon test this need not happen. The problem is to characterize the transitive two-sample tests.
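The contrast between the two tests can be checked on a toy example. The Python sketch below uses three made-up samples (not taken from any of the talks or papers) and two hypothetical helper functions. It relies on two facts: the sign of a t-statistic is the sign of the difference in sample means, and the Wilcoxon comparison can be summarised by the Mann-Whitney count of pairs minus its null expectation.

```python
from itertools import product


def mean_difference(x, y):
    """Difference in sample means; its sign is the sign of the t-statistic."""
    return sum(x) / len(x) - sum(y) / len(y)


def mann_whitney_excess(x, y):
    """Mann-Whitney U for x versus y, minus its null expectation n*m/2.

    Positive values mean x tends to exceed y in pairwise comparisons.
    """
    u = sum((a > b) + 0.5 * (a == b) for a, b in product(x, y))
    return u - len(x) * len(y) / 2


# Made-up samples, chosen so that the pairwise comparisons form a cycle.
A = [2, 4, 9]
B = [1, 6, 8]
C = [3, 5, 7]

for label, x, y in [("A vs B", A, B), ("B vs C", B, C), ("A vs C", A, C)]:
    print(label,
          "mean difference:", mean_difference(x, y),
          "Wilcoxon excess:", mann_whitney_excess(x, y))

# Mean differences add up, so two positive ones force a positive third.
additive_gap = mean_difference(A, C) - (mean_difference(A, B) + mean_difference(B, C))
print("additivity gap:", additive_gap)
```

Here A tends to beat B and B tends to beat C in pairwise comparisons, yet C tends to beat A, so the Wilcoxon comparisons are not transitive. The mean differences always satisfy (A - B) + (B - C) = (A - C), which is what makes the t-test transitive.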

Source apportionment. Given multivariate time series of chemical or size composition of air pollution particles, the source apportionment problem is to work out how much of the pollution comes from which source. There are a number of methods, but the statistical properties of all of them are unknown.
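As a rough picture of what is being estimated, the Python sketch below treats the easy special case where the composition profile of each source is assumed known, so the contributions can be recovered by ordinary least squares. Everything in it is hypothetical: the function name `apportion_two_sources`, the two sources, the three species and all the numbers. It is not one of the methods alluded to above, and in real problems the profiles are typically not known, which is a large part of the difficulty.

```python
def apportion_two_sources(observed, profile_1, profile_2):
    """Least-squares contributions (g1, g2) with
    observed ~ g1 * profile_1 + g2 * profile_2.

    Solves the 2x2 normal equations directly; profiles must not be collinear.
    """
    a11 = sum(p * p for p in profile_1)
    a12 = sum(p * q for p, q in zip(profile_1, profile_2))
    a22 = sum(q * q for q in profile_2)
    b1 = sum(p * x for p, x in zip(profile_1, observed))
    b2 = sum(q * x for q, x in zip(profile_2, observed))
    det = a11 * a22 - a12 * a12
    g1 = (a22 * b1 - a12 * b2) / det
    g2 = (a11 * b2 - a12 * b1) / det
    return g1, g2


# Hypothetical profiles: fraction of each of three species emitted by a source.
traffic = [0.6, 0.3, 0.1]
wood_smoke = [0.1, 0.2, 0.7]

# Hypothetical noise-free measurement built from contributions of 10 and 5.
observed = [10 * t + 5 * w for t, w in zip(traffic, wood_smoke)]

g_traffic, g_wood = apportion_two_sources(observed, traffic, wood_smoke)
print("estimated contributions:", g_traffic, g_wood)
```

With noise-free data and correct profiles the contributions are recovered up to rounding error. The sketch says nothing about what happens with measurement error or unknown profiles.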

Slides from my talks

I've given several talks recently about indirect comparisons in clinical trials and about the related issue of non-transitivity of two-sample tests. A representative set of slides is from a seminar at Vanderbilt.
Slides from WNAR 2007 about survey calibration estimators and their relationship to semiparametric models for coarsened data.
Slides about the R survey package (also see its homepage).
A two-day short course in R.
New Zealand/Australia Dec 2009 Biometric Society conference and visits to Auckland and Sydney.

Real Statistics

i.e., stuff with live data. A bit of particulate air pollution stuff as above, and at the Northwest Center for Particulate Matter and Health. I also work at the Cardiovascular Health Study, a big study of the risk factors for cardiovascular disease in older people, and at the Cardiovascular Health Research Unit, which is interested in drug-gene interactions and related topics.

Statistical computing and graphics

This is mostly Free (open-source) software.

XLISP-Stat has nice dynamic graphics and is quite fast. It is allegedly dead, but that doesn't make it any less useful.

I have an implementation of Generalised Estimating Equation models for XLISP-Stat. It includes diagnostic plots and a wide range of link, variance and correlation options. I even have documentation for it. If you think I should be implementing random intercept models instead, then see why I disagree. There's also Cox regression.

R is a free interpreter for a dialect of the S language. The initial system came from Ross Ihaka and Robert Gentleman in Auckland, but R now has a core development team of about a dozen people spread across the world (including me). It is available for Windows and for Unix systems from the Comprehensive R Archive Network.

One of my main projects in R is a package for complex survey analysis. It's fairly general, but of course is a bit slow, especially if you don't have enough memory.

Another project is a system for implementing and evaluating anomaly detection algorithms in syndromic surveillance.

I have a few thoughts on cryptography here.

Other things I have worked on can be deduced from my CV.


Thomas Lumley