Sparse correlation. Marginal models for sparsely correlated data. Marginal generalised linear models are often easier to interpret than conditional ones (especially for binary data). In order to use them for time series, spatial data and data with crossed random effects we need valid standard error estimates. You can get code for R, S-PLUS or Stata for some of these estimators.
Greater precision should come from weighted estimation that takes advantage of simple models for the correlation structure, but still uses model-free variance estimators to protect the validity of inference. This is very similar to MQL; the main advance is being able to do it without inverting or storing big matrices.
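As a rough illustration of what a model-free variance estimator looks like with crossed effects, here is a minimal sketch for the simplest marginal quantity, a mean. It assumes one observation per (row, column) cell and assumes that two observations can be correlated only if they share a row or a column. The function names and the toy data are hypothetical, and this is the generic "sum residual products over possibly correlated pairs" idea, not the code referred to above.

```python
from collections import defaultdict

def mean_with_sparse_sandwich_var(obs):
    """obs: list of (row_id, col_id, y), at most one observation per cell.

    Returns (mean, variance estimate). The variance estimate sums products
    of residuals over every pair of observations sharing a row or a column.
    """
    n = len(obs)
    mean = sum(y for _, _, y in obs) / n
    row_sum = defaultdict(float)
    col_sum = defaultdict(float)
    own = 0.0
    for r, c, y in obs:
        e = y - mean
        row_sum[r] += e
        col_sum[c] += e
        own += e * e
    # Squared row sums cover pairs sharing a row, squared column sums cover
    # pairs sharing a column. Each observation paired with itself appears in
    # both terms, so it is subtracted once.
    total = (sum(s * s for s in row_sum.values())
             + sum(s * s for s in col_sum.values())
             - own)
    return mean, total / n ** 2

def naive_var_of_mean(obs):
    """Usual s^2 / n, which treats all observations as independent."""
    n = len(obs)
    mean = sum(y for _, _, y in obs) / n
    return sum((y - mean) ** 2 for _, _, y in obs) / (n * (n - 1))

# Toy crossed design (assumed data): additive row and column effects.
row_effect = [-3.0, -1.0, 1.0, 3.0]
col_effect = [-2.0, 0.0, 2.0]
obs = [(r, c, row_effect[r] + col_effect[c])
       for r in range(4) for c in range(3)]

mean, robust_var = mean_with_sparse_sandwich_var(obs)
naive_var = naive_var_of_mean(obs)
print(mean, robust_var, naive_var)
```

In this toy design the estimate that allows for row and column correlation is larger than the naive one, which is the usual direction when shared effects are ignored. Estimators of this form are not guaranteed to be positive in small samples, and nothing here needs an n-by-n matrix: only the row and column sums are stored.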
I'm also interested in empirical process theory and semiparametrics for dependent data, but as Barbie famously said "Math Is Hard".
Indirect comparisons. Suppose you compare A to B and B to C. What can you conclude about A vs C? I've been looking at two aspects of this. The first is meta-analysis of clinical trials taking into account indirect as well as direct comparisons. The hard part is getting the estimation to fail when it should. A worked example of this network meta-analysis is now available. The second aspect is the question of when two-sample tests are transitive. For the t-test, if the t-statistic comparing A and B and the t-statistic comparing B and C are both positive, then the t-statistic comparing A and C is also positive. For the Wilcoxon test this need not happen. The problem is to characterize the transitive two-sample tests.
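The non-transitivity of the Wilcoxon test can be seen in a small example. The sketch below uses three made-up samples of the "non-transitive dice" kind and a hypothetical helper `centred_u`, which is the Mann-Whitney U statistic minus its null expectation, so its sign says which sample tends to be larger. The sign of a t-statistic is the sign of the difference in sample means, and those differences add up, which is why the t-test cannot behave this way.

```python
def mann_whitney_u(x, y):
    """Number of pairs (xi, yj) with xi > yj, counting ties as one half."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

def centred_u(x, y):
    """U minus its expectation under the null; positive if x tends to exceed y."""
    return mann_whitney_u(x, y) - len(x) * len(y) / 2

def mean(x):
    return sum(x) / len(x)

# Assumed illustrative samples; all three have the same mean.
A = [2, 4, 9]
B = [1, 6, 8]
C = [3, 5, 7]

print("A vs B:", centred_u(A, B))
print("B vs C:", centred_u(B, C))
print("A vs C:", centred_u(A, C))

# Mean differences, which fix the sign of the t-statistic, always add up.
print((mean(A) - mean(B)) + (mean(B) - mean(C)) == mean(A) - mean(C))
```

With these samples the Wilcoxon comparison favours A over B and B over C, but C over A.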
Source apportionment. Given multivariate time series of chemical or size composition of air pollution particles, the source apportionment problem is to work out how much of the pollution comes from which source. There are a number of methods, but the statistical properties of all of them are unknown.
This is mostly Free (open-source) software.
XLISP-Stat has nice dynamic graphics and is quite fast. It is allegedly dead, but that doesn't make it any less useful.
I have an implementation of Generalised Estimating Equation models for XLISP-Stat. It includes diagnostic plots and a wide range of link, variance and correlation options. I even have documentation for it. If you think I should be implementing random intercept models instead, then see why I disagree. There's also Cox regression.
R is a free interpreter for a dialect of the S language. The initial system came from Ross Ihaka and Robert Gentleman in Auckland, but it now has a core development team of about a dozen people spread across the world (including me). It is available for Windows and for Unix systems from the Comprehensive R Archive Network.
One of my main projects in R is a package for complex survey analysis. It's fairly general, but of course is a bit slow, especially if you don't have enough memory.
Another project is a system for implementing and evaluating anomaly detection algorithms in syndromic surveillance.
I have a few thoughts on cryptography here.