Teaching --- Methodology --- Talks --- Real Statistics --- Statistical Computing

- Winter 2010: BIOST 518 Applied Biostatistics II (4) Multiple regression for continuous, discrete, and right censored response variables, including dummy variables, transformations, and interactions. Introduction to regression with correlated outcome data. Model and case diagnostics. Computer assignments using real data and standard statistical computer packages. Prerequisite: BIOST 517 or permission of instructor.
- Spring 2010: BIOST/STAT/CSSS 529 Sample Survey Techniques. This will be similar to last year's course.

**Sparse correlation**
Marginal models for sparsely correlated data. Marginal generalised linear
models are often easier to interpret than conditional ones (especially for
binary data). In order to use them for time series, spatial data and data
with crossed random effects we need valid standard error estimates. You
can get code for R, S-PLUS or Stata for some of
these estimators.

Greater precision should come from weighted estimation that takes advantage of simple models for the correlation structure, but still uses model-free variance estimators to protect the validity of inference. This is very similar to MQL; the main advance is being able to do it without inverting or storing big matrices.
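
As a rough illustration of what a model-free variance estimator can look like under sparse correlation, the sketch below estimates the variance of a sample mean by summing products of residuals over only those pairs of observations that might be correlated. It assumes a crossed design in which two observations can be correlated only if they share a row unit or a column unit. The function names and the toy data are hypothetical, and this is not the R, S-PLUS or Stata code mentioned above.

```python
def sparse_variance_of_mean(obs):
    """Variance estimate for the mean of y under sparse correlation.

    obs is a list of (row_id, col_id, y) tuples. Assumption: two
    observations may be correlated only if they share a row_id or a
    col_id; all other pairs are treated as uncorrelated.
    """
    n = len(obs)
    ybar = sum(y for _, _, y in obs) / n
    resid = [y - ybar for _, _, y in obs]
    total = 0.0
    # Loop over pairs instead of building an n-by-n covariance matrix.
    for i, (ri, ci, _) in enumerate(obs):
        for j, (rj, cj, _) in enumerate(obs):
            if ri == rj or ci == cj:  # also true when i == j
                total += resid[i] * resid[j]
    return total / n ** 2


def naive_variance_of_mean(obs):
    """Same estimator, but treating every observation as independent."""
    n = len(obs)
    ybar = sum(y for _, _, y in obs) / n
    return sum((y - ybar) ** 2 for _, _, y in obs) / n ** 2


# Hypothetical toy data: two row units crossed with two column units.
toy = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 6.0)]
print(sparse_variance_of_mean(toy), naive_variance_of_mean(toy))
```

Two caveats on the sketch: the double loop is quadratic in the number of observations, and an estimator of this form is not guaranteed to be positive, so it is a picture of the idea and not something to use as is.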

I'm also interested in empirical process theory and semiparametrics for dependent data, but as Barbie famously said "Math Is Hard".

**Indirect comparisons**. Suppose you compare A to B and B
to C. What can you conclude about A vs C? I've been looking at two
aspects of this. The first is meta-analysis of clinical trials taking into
account indirect as well as direct comparisons. The hard part is getting
the estimation to fail when it should. A worked
example of this network meta-analysis is now available. The second
aspect is the question
of when two-sample tests are transitive. For the t-test, if the
t-statistic comparing A and B and the t-statistic comparing B and C are
both positive, the t-statistic comparing A and C is also positive. For the
Wilcoxon test this need not happen. The problem is to characterize the
transitive two-sample tests.
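
A small numerical illustration of the non-transitivity, using made-up samples in the style of non-transitive dice: the differences in means (and hence the signs of the t-statistics) are ordered A > B > C, but the Mann-Whitney form of the Wilcoxon statistic puts A above B, B above C, and C above A. The data and function names here are hypothetical, chosen only to show the effect.

```python
from itertools import product
from statistics import mean


def mann_whitney_u(x, y):
    """Number of pairs (xi, yj) with xi > yj, counting ties as one half."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi, yj in product(x, y)
    )


def favours_first(x, y):
    """True when U exceeds its null expectation len(x) * len(y) / 2,
    i.e. the rank-based comparison puts x above y."""
    return mann_whitney_u(x, y) > len(x) * len(y) / 2


# Hypothetical samples; the means are strictly decreasing from a to c.
a = [4, 8, 20]
b = [2, 12, 16]
c = [6, 10, 13]

# Differences in means are transitive by construction.
print(mean(a) - mean(b), mean(b) - mean(c), mean(a) - mean(c))

# The rank-based comparison cycles: a over b, b over c, but c over a.
print(favours_first(a, b), favours_first(b, c), favours_first(c, a))
```

The sign of the t-statistic is the sign of a difference of two real numbers, so it cannot cycle. The Wilcoxon comparison depends on pairwise orderings between the two samples, which is what leaves room for a cycle.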

**Source apportionment** Given multivariate time series of
chemical or size composition of air pollution particles, the source
apportionment problem is to work out how much of the pollution comes from
which source. There are a number of methods, but the statistical
properties of all of them are unknown.

- I've given several talks recently about indirect comparisons in clinical trials and about the related issue of non-transitivity of two-sample tests. A representative set of slides is from a seminar at Vanderbilt.
- Slides from WNAR 2007 about survey calibration estimators and their relationship to semiparametric models for coarsened data.
- Slides about the R survey package (also see its homepage).
- A two-day short course in R.
- New Zealand/Australia Dec 2009 Biometric Society conference and visits to Auckland and Sydney.

Real statistics, i.e., stuff with live data. I do a bit of particulate air pollution work, as above and at the Northwest Center for Particulate Matter and Health. I also work at the Cardiovascular Health Study, a big study of the risk factors for cardiovascular disease in older people, and at the Cardiovascular Health Research Unit, which is interested in drug-gene interactions and related topics.

This is mostly **Free (open-source) software**.

XLISP-Stat has nice dynamic graphics and is quite fast. It is allegedly
dead, but that
doesn't make it any less useful.

I have an implementation of Generalised Estimating Equation models for XLISP-Stat. It includes diagnostic plots and a wide range of link, variance and correlation options. I even have documentation for it. If you think I should be implementing random intercept models instead, then see why I disagree. There's also Cox regression.

R is a free interpreter for a dialect of the S language. The initial system
came from Ross Ihaka and Robert Gentleman in Auckland, but it now has a core
development team of about a dozen people spread across the world (including me). It is available for
Windows and for Unix systems from the Comprehensive R
Archive Network.

One of my main projects in R is a package for complex survey analysis. It's fairly general, but of course is a bit slow, especially if you don't have enough memory.

Another project is a system for implementing and evaluating anomaly detection algorithms in syndromic surveillance.

I have a few thoughts on cryptography here.

Other things I have worked on can be deduced from my CV.


Thomas Lumley