STAT/BIOST 572, Spring 2013
Advanced Regression Methods


Instructions for Students

The list of papers for Stat/Biost 572 is given below, up to and including Spring 2012.


Instructions for Faculty

In 572, students study a methods paper in depth, writing a report on the paper and its place in the statistical literature, and giving several talks on its content. The papers are suggested by faculty, who may (optionally) also advise students. As this course follows 570 and 571, the methods covered should be regression-based, but this is interpreted broadly.

A list of previous 572 papers is below. Strikethrough indicates that a paper was used by the student named in parentheses; these papers will not be available for use in 572 this year. Please review the list and do one of the following:

  1. If the two papers you listed are available, and you don't want to change them: Do nothing
  2. If you are not listed, or some of your papers were used, or you want to change papers: email the course instructor


Suggested papers
Listed alphabetically by faculty member

Bill Barlow
  1. Barlow WE. Robust variance estimation for the case-cohort design. Biometrics. 1994 Dec;50(4):1064-72.
  2. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005 Mar;61(1):92-105.
Norm Breslow
  1. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M: Using the whole cohort in the analysis of case-cohort data. Am J Epidemiol 169(11):1398-405, 2009
  2. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M: Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Statistics in Biosciences 1(1):32-49, 2009
Elizabeth Brown
  1. Biometrics. 2003 Sep;59(3):521-30 A Bayesian approach for joint modeling of cluster size and subunit-specific outcomes. Dunson DB, Chen Z, Harry J.
  2. (Rong Fu)
  3. Biometrics. 2000 Dec;56(4):1007-15. Bayesian estimators for conditional hazard functions. McKeague IW, Tighiouart M. (Stefan Sharkansky)
Sharon Browning
  1. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42, 565-9 (2010)
  2. Zuk, O., Hechter, E., Sunyaev, S.R. & Lander, E.S. The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences of the United States of America 109, 1193-8 (2012) (Elisa Sheng)
  3. Han L, Abney M (2011) "Identity by descent estimation with dense genome-wide genotype data." Genetic Epidemiology 35: 557-567.
Gary Chan
  1. Qin J., Zhang B and Leung DHY (2009) Empirical likelihood in missing data problems. JASA 104 (488), 1492-1503 (Cheng Zheng)
  2. Copas (1983) Regression, prediction and shrinkage. JRSS-B 45, 311-354
  3. Zhang, M., Tsiatis, A.A., and Davidian, M. (2008) Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64, 707-715 (Rui Zhang)
  4. Koenker R. (2004) Quantile regression for longitudinal data. Journal of Multivariate Analysis 91(1): p.74-89.
  5. Chan KCG, Chen YQ and Di C (2012). Proportional Mean Residual Life Model for Censored Length-biased Data of Prevalent Cohort. Biometrika 99 (4): 995-1000.
  6. Chan KCG (2013). Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation. Biometrika 100 (1): 269-276.
  7. Chan, KCG (2013). Survival analysis without survival data: connecting length-biased and case-control data. Biometrika, in press.
Adrian Dobra
  1. Dobra, A. (2009). Variable selection and dependency networks for genomewide data. Biostatistics, 10, 621-639.
  2. Hans, C., Dobra, A. and West, M. (2007). Shotgun stochastic search for 'large p' regression. Journal of the American Statistical Association, 102, 507-516. (Kirk Le)
Mathias Drton
  1. Bühlmann, Peter. Statistical significance in high-dimensional linear models, to appear in Bernoulli.
  2. Chen, Jiahua; Chen, Zehua. Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95 (2008), no. 3, 759-771.
  3. Ravikumar, Pradeep; Lafferty, John; Liu, Han; Wasserman, Larry. Sparse additive models. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 (2009), no. 5, 1009-1030.
Scott Emerson
  1. Emerson SS, Fleming TR: Parameter estimation following group sequential hypothesis testing. Biometrika 77:875-892, 1990.
Mary Emond
  1. Pan, W and Chappell R. Estimation in the Cox PH Model with Left-Truncated and Interval-Censored Data. Biometrics 58, 64-70. March 2002
  2. Sun J and Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. JRSSB 62:292-302. 2000
Peter Gilbert
  1. Prentice RL, Kalbfleisch JD, Peterson AV, Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics 1978;34:541-554.
  2. Sun Y, Gilbert PB, McKeague IW. Proportional hazards models with continuous marks. Ann Stat. 2009;37(1):394-426.
  3. Jemiai Y, Rotnitzky A, Shepherd BE, Gilbert PB. Semiparametric estimation of treatment effects given base-line covariates on an outcome measured after a post-randomization event occurs. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2007;69:879-901.
Betz Halloran
  1. Chan, ISF, Shu, H, Matthews, C et al. 2002; Use of statistical models for evaluating antibody response as a correlate of protection against varicella. Statistics in Medicine 21:3411-3430
  2. Gilks, WR, Wang, CC, Yvonnet B, and Coursaget, J. Random-effects models for longitudinal data using Gibbs sampling. Biometrics 49:441-454, 1993
Patrick Heagerty
  1. Donnelly CA, Laird NM, Ware JH (1995). Prediction and creation of smooth curves for temporally correlated longitudinal data. JASA 90: 984-989 (Shirley You)
  2. Heagerty PJ, Pepe MS (1999). Semiparametric estimation of regression quantiles with application to standardizing weight for height and age in US children. JRSS-C 48:533-551 (Cheng Zheng)
Peter Hoff
  1. White (1982) Maximum likelihood estimation of misspecified models Econometrica Vol. 50, No. 1 (Jan., 1982), pp. 1-25 (James Harmon)
  2. Box and Cox (1964) An Analysis of Transformations Journal of the Royal Statistical Society. Series B, Vol. 26, No. 2, pp. 211-252 and
    Bickel and Doksum (1981) An Analysis of Transformations Revisited Journal of the American Statistical Association, Vol. 76, No. 374 pp. 296-311 and
    Box and Cox (1982) An Analysis of Transformations Revisited, Rebutted Journal of the American Statistical Association, Vol. 77, No. 377 pp. 209-210
  3. George and McCulloch (1993) Variable selection via Gibbs sampling Journal of the American Statistical Association, Vol. 88, No. 423 pp. 881-889 (Alex Volfovsky)
  4. Hoff PD: Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat., 1(1):265–283, 2007
  5. Owen AB: Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 (1988), no. 2, 237–249.
Jim Hughes
  1. Hughes JP: Mixed effects models with censored data with application to HIV RNA levels. Biometrics, 55:625-629, 1999 (Leigh Fisher)
  2. Moore KL and van der Laan MJ: Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation. Statistics in Medicine, 28:39-64, 2009
Lurdes Inoue
  1. Lawless, J.F. (1987). Regression methods for Poisson process data. JASA, 82 (399), 808-815 (Mark Wheldon)
  2. Kalbfleisch, J.D and Lawless, J.F. (1985). The analysis of panel data under a Markov assumption. JASA, 80(392), 863-871 (Wenying Zheng)
  3. Dror HA, Steinberg DM (2008). Sequential Experimental Designs for Generalized Linear Models. JASA 103:481, 288-298.
  4. Dixon DO, Simon R (1991). Bayesian subset analysis. Biometrics 47(3): 871-881.
Katie Kerr
  1. Smyth, Gordon K. (2004) Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Statistical Applications in Genetics and Molecular Biology: Vol. 3 : Iss. 1, Article 3
  2. Biostatistics. 2005 Jan;6(1):59-75 Improved statistical tests for differential gene expression by shrinking variance components estimates. Cui X, Hwang JT, Qiu J, Blades NJ, Churchill GA
Michael LeBlanc
  1. Logic Regression. Ingo Ruczinski, Charles Kooperberg and Michael LeBlanc Journal of Computational and Graphical Statistics Vol. 12, No. 3 (Sep., 2003), pp. 475-511
  2. Extreme regression. Michael LeBlanc, James Moon and Charles Kooperberg Biostatistics 2006 7(1):71-84
  3. (Jason Liang)
Brian Leroux
  1. Leroux BG, Mancl LA, DeRouen TA. Group sequential testing in clinical trials with longitudinal data on multiple outcome variables. Statist Meth Med Res 14:501-9, 2005 (Clara Dominguez-Islas)
  2. Stoner JA, Leroux BG: Analysis of clustered data: A combined estimating equations approach. Biometrika 89(3):567-578, 2002
Thomas Lumley
  1. The Weighted Residual Technique for Estimating the Variance of the General Regression Estimator of the Finite Population Total; Carl-Erik Sarndal, Bengt Swensson, Jan H. Wretman. Biometrika, Vol. 76, No. 3 (Sep., 1989), pp. 527-537
  2. Weighting for Unequal Selection Probabilities in Multilevel Models; D. Pfeffermann, C. J. Skinner, D. J. Holmes, H. Goldstein, J. Rasbash. Journal of the Royal Statistical Society. Series B, Vol. 60, No. 1 (1998), pp. 23-40
  3. Lipsitz et al (1994) Performance of Generalized Estimating Equations in Practical Situations Biometrics 50(1) 270-278 (Chris Jordan-Squire)
Robyn McClelland
  1. Lee WC (2011): Bounding the bias of unmeasured factors with confounding and effect-modifying potentials. Stat in Med 30:1007-1017.
  2. Nguyen T, Jiang J. (2011) Restricted fence method for covariate selection in longitudinal data analysis. Biostatistics. 13(2): 303-314.
  3. Sjolander A, Vansteelandt S. (2011) Doubly robust estimation of attributable fractions. Biostatistics. 12(1): 112-121.
Barbara McKnight
  1. Song and Nicolae (2009) Restricted Parameter Space Models for Testing Gene-Gene Interaction. Genetic Epidemiology 33: 386-393. (Shizen Wang)
  2. Madsen BE and Browning SR. A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic PLoS Genet 5(2): e1000384 (Phillip Keung)
Diana Miglioretti
  1. Heagerty PJ, Kurland BF: Misspecified maximum likelihood estimates and generalized linear mixed models. Biometrika 88: 973-985, 2001 (Leila Zelnick)
Vladimir Minin
  1. Neal R, Regression and Classification Using Gaussian Process Priors Bayesian Statistics 6
  2. Hsu JSJ and Leonard T Hierarchical Bayesian Semiparametric Procedures for Logistic Regression Biometrika Vol. 84, No. 1, pp. 85-93 (Amanda Allen)
Martina Morris
  1. Goodreau S, Kitts J and Morris M. Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks. Demography 2009 46(1)
Don Percival
  1. Peter Hall; Berwin A. Turlach; Interpolation Methods for Nonlinear Wavelet Regression with Irregularly Spaced Design, The Annals of Statistics Vol. 25, No. 5, Oct., 1997 pp. 1912-1925
  2. Hee-Seok Oh, Douglas W. Nychka, and Thomas C. M. Lee The Role of Pseudo Data for Robust Smoothing with Application to Wavelet Regression, Biometrika, Vol. 94, No. 4, 2007, pp. 893-904
  3. S. J. Koopman and K.M. Lee (2009) Seasonality with Trend and Cycle Interactions in Unobserved Components Models, Journal of the Royal Statistical Society Series C Volume 58, Pages 427-448 (Stefan Sharkansky)
Michael Perlman
  1. M. Drton, M. D. Perlman (2004). Model selection for Gaussian concentration graphs. Biometrika 91:3 591-602 (Theresa Smith)
  2. M. Drton, M. D. Perlman (2008). A SINful Approach to Gaussian graphical model selection. Journal of Statistical Planning and Inference 138, 1179-1200
Li Qin
  1. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, Jianqing Fan and Runze Li, Journal of the American Statistical Association 2001, Vol. 96, No. 456
  2. Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Annals of Statistics, 36, 1509-1566.
Adrian Raftery
  1. Sloughter, J.M., Raftery, A.E. and Gneiting, T. (2007). Probabilistic Quantitative Precipitation Forecasting Using Bayesian Model Averaging. Monthly Weather Review, 135, 3209-3220. (A similar paper is about to come out in JASA)
  2. Fraley, C. and Raftery, A.E. (2007). Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering. Journal of Classification, 24, 155-181 (Ricky Chielecki)
  3. Taylor J and Verbyla A (2008) Joint modelling of location and scale parameters of the t distribution Statistical Modelling July 4(2) 91-112 with Lange KL, Little RJA and Taylor JMG (1989) Robust Statistical Modeling Using the t Distribution JASA 84(408) 881-896 (Nevena Lalic)
Ken Rice
  1. Microarrays, Empirical Bayes and the Two-Groups Model. Bradley Efron. Statist. Sci. Volume 23, Number 1 (2008), 1-22. (Yates Coley)
  2. Control of the Mean Number of False Discoveries, Bonferroni and Stability of Multiple Testing. Gordon et al. Annals of Applied Statistics 2007, Vol. 1, No. 1, 179–190 (Caitlin McHugh)
  3. Brown PJ, Fearn T and Vannucci M (1999) The Choice of Variables in Multivariate Regression: A Non-Conjugate Bayesian Decision Theory Approach. Biometrika 86(3) 635-648 (Rui Zhang)
  4. Zhang, M., Tsiatis, A.A., and Davidian, M. (2008) Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64, 707-715
  5. Spiegelhalter DJ, Best NG, Carlin BP and Van der Linde A, "Bayesian Measures of Model Complexity and Fit (with Discussion)", Journal of the Royal Statistical Society, Series B, 2002 64(4):583-616.
  6. Gelman, A., Meng, X.-L. and Stern, H. (1996). Posterior Predictive Assessment of Model Fitness via Realized Discrepancies. Statistica Sinica 6, 733-807 (with discussion)
Barbra Richardson
  1. Vansteelandt, S., Goetghebeur, E. and Verstraeten, T. (2000) Regression models for disease prevalence with diagnostic tests on pools of serum samples Biometrics, 56, 1126-1133
  2. Beale, E. M. L. and Little, R. J. A. (1975) Missing values in multivariate analysis. Journal of the Royal Statistical Society, Series B, 37, 129-145 (Roddy Theobald)
Thomas Richardson
  1. M.H. Maathuis, M. Kalisch, P. Buehlmann (2009), Estimating high-dimensional intervention effects from observational data. Annals of Statistics 37, 3133-3164 (Chris Glazner)
  2. Hirano K, Imbens G, Rubin D, Zhou X (2000) Assessing the Effect of an Influenza Vaccine in an Encouragement Design with Covariates, Biostatistics 1, 69-88
  3. Richardson, T., Robins, J.M. and Evans, R.J. (2011) Transparent parametrizations of models for potential outcomes (with discussion), Valencia 9, pp 569-610 (WenWei Loh)
Carolyn Rutter
  1. Rutter CM, Miglioretti DL, Savarino JE. Bayesian calibration of microsimulation models, JASA 2009; 104(488):1338-1350
  2. Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations, Statistics in Medicine, 2001; 20(19):2865-2884
Ali Shojaie
  1. Yuan M, Lin Y: Model selection and estimation in regression with grouped variables, JRSS-B 2006; 68(1): 49-67.
  2. Kalisch M, Bühlmann P: Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research (2007); 613-636.
Lianne Sheppard
  1. Paciorek, CJ. The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators. Statistical Science 2010, Vol. 25, No. 1, 107-125
  2. Neuhaus JM, McCulloch CE, Separating between- and within-cluster covariate effects by using conditional and partitioning methods. JRSSB 2006, 68, 859-872.
  3. Hodges JS, Reich BJ. Adding spatially-correlated errors can mess up the fixed effect you love. American Statistician, 2010, 64, 325-334.
Galen Shorack
  1. Efron B (1987) Better bootstrap confidence intervals Journal of the American Statistical Association, Vol. 82, No. 397 pp. 171-185 (Greg Imholte)
  2. Hampel (1974) The influence curve and its role in robust estimation Journal of the American Statistical Association, Vol. 69, No. 346 pp. 383-393
Adam Szpiro
  1. A. Gryparis, C. J. Paciorek, A. Zeka, J. Schwartz, and B. A. Coull. Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics, 10(2):258–274, 2009 (Laina Mercer)
  2. Szpiro AA, Sheppard L, and Lumley T, Efficient measurement error correction with spatially misaligned data, Biostatistics, in press (Silas Bergen)
  3. Peng R, Dominici F, Louis T. Model choice in time series studies of air pollution and mortality. JRSSA (2006) 169, 2:179-203
  4. Hodges JS and Reich BJ. Adding Spatially-Correlated Errors Can Mess Up the Fixed Effect You Love. The American Statistician. November 1, 2010, 64(4): 325-334
  5. Szpiro AA, Rice KM, and Lumley T. Model-robust regression and a Bayesian 'sandwich' estimator. Annals of Applied Statistics, Vol 4(4), 2010 (David Benkeser)
  6. Bayesian effect estimation accounting for adjustment uncertainty. Wang C, Parmigiani G, Dominici F. (2012)
  7. Matern Cross-Covariance Functions for Multivariate Random Fields. Tilmann Gneiting, William Kleiber, and Martin Schlather (2010)
MaryLou Thompson
  1. De la Cruz-Mesía R, Quintana FA, Marshall G. Model-based clustering for longitudinal data. Computational Statistics & Data Analysis 2008; 52:1441-1457.
  2. Marschner IC, Gillett AC. Relative risk regression: reliable and flexible methods for log-binomial models. Biostatistics 2012; 13:179-92
Timothy Thornton
  1. Kang, H. M., Sul, J. H., Service, S. K., Zaitlen, N. A., Kong, S.Y., Freimer, N. B., Sabatti, C., Eskin, E. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42, 348-354.
  2. Thornton, T., McPeek, M.S. (2007) Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet 81, 321-337.
Jon Wakefield
  1. Prentice, R. L. and Sheppard, L. (1995). Aggregate data studies of disease risk factors. Biometrika 82, 113-125.
  2. Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics, 9, 523-539.
Pei Wang
  1. Efron, Bradley; Hastie, Trevor; Johnstone, Iain and Tibshirani, Robert (2004). Least Angle Regression Annals of Statistics 32 (2): pp. 407–499
  2. Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics 34, 1436-1462.
Jon Wellner
A list of 10 papers is available here

Daniela Witten
  1. Zou and Hastie (2005) Regularization and variable selection via the elastic net. JRSSB 67(2) 301-320 (Alison Kosel)
  2. Mazumder, Hastie, and Tibshirani (2010) Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research 11 2287-2322 (Amol Kapila)
  3. Friedman, Hastie and Tibshirani (2008) Sparse inverse covariance estimation with the graphical lasso Biostatistics 9(3) 432-441 with Yuan and Lin, Model selection and estimation in the Gaussian graphical model Biometrika (2007) 94(1):19-35 (Arie Voorman)
  4. Bien J, Tibshirani R: Sparse estimation of a covariance matrix. Biometrika (2011); 98(4): 807-820.
  5. Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K: Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society Series B (2005); 67(1): 91-108.
  6. Rothman AJ, Bickel PJ, Levina E, Zhu J: Sparse permutation invariant covariance estimation. Electron. J. Statist. (2008); Volume 2, 494-515. (David Prince)
  7. P.D. Hoff. Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Bayesian Analysis, 6:179
  8. H Zhou, L Li, and H Zhu. (2013) Tensor regression with applications in neuroimaging data analysis, Journal of the American Statistical Association, accepted.
  9. Richard Lockhart, Jonathan Taylor, Ryan Tibshirani and Robert Tibshirani. A significance test for the lasso (submitted).
David Yanez
  1. Pepe MS, Anderson GL (1994) A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics - Simulation and Computation, 23, 939–51 (Jing Fan)
  2. Crouchley R, Davies RB (1999). A comparison of population average and random effects models for the analysis of longitudinal count data with baseline information. Journal of the Royal Statistical Society, Series A, 162, 331–47 - together with - Thomas Lumley, Margaret S. Pepe, Patrick J. Heagerty, Robert Crouchley, Richard Davies. Analysis and Interpretation of Disease Clusters and Ecological Studies (2001), Journal of the Royal Statistical Society. Series A, Vol. 164, No. 1, pp. 209-212.
Yingye Zheng
  1. Model-Checking Techniques Based on Cumulative Residuals, D. Y. Lin, L. J. Wei and Z. Ying, Biometrics, Vol. 58, No. 1 pp. 1-12
  2. Logistic disease incidence models and case-control studies, R. L. Prentice and R. Pyke, Biometrika 1979 66(3):403-411
Andrew Zhou
  1. Rodenberg CA, Zhou XH. ROC curve estimation when covariates affect the verification process. Biometrics 2000; 56: 1256-62.
  2. Zhou XH, Lin H, and Johnson E. Nonparametric heteroscedastic transformation regression models for skewed data with an application to health care costs. Journal of the Royal Statistical Society Series B 2008; 70: 1029-1047