Some comments on the analysis of birth weight
===================================================

MAJOR ISSUES:

(1) The choice / determination of adjustment variables.

    In the first part of the question the goal is to perform a
    putative confirmatory analysis that evaluates the First Steps
    program.

    * Include those factors that previous research has found to be
      important potential confounders.

    * Consider the alternative roles that a variable may play --
      confounder, precision variable, or intermediate variable.

      + The issue of what is intermediate here is tricky. The
        program works through access to prenatal care. What does
        prenatal care do? A review of the literature finds that
        prenatal care aims to affect: (1) smoking; (2) nutrition;
        and (3) general health. Thus, smoking status and weight
        gain are clear potential intermediates.

      + One student created a table classifying each variable as
        precision, confounder, or intermediate, with a
        justification for each choice. Nice idea!

      + Here is where fitting more than a single model is useful.
        Estimate the treatment comparison first without adjustment
        for the potential intermediate variables, and then with --
        and discuss the impact of the additional adjustment on the
        measure.

(2) Be sure to carefully interpret your adjusted regression
    coefficients. State in clear language what the estimate
    contrasts, and what is "held constant".

(3) Since the data do not contain SES or income, the conclusions
    are limited. This is an important variable, as it relates to
    eligibility for the program. One solution is to include as many
    surrogates for it as possible, but still acknowledge the
    limitation. Another approach is to limit the analysis to the
    subset of women who are likely eligible -- for example, the
    welfare and low education group. The caveat with limiting the
    sample is the small number of cases (i.e., low birth weight
    babies) that is left. One analysis restricted to N=300 women,
    but this left only 22 low birth weight outcomes.
    The use of any regression analysis will be severely limited by
    the number of cases. One "guideline" is to use no more than
    (number of cases)/10 predictors in a regression model (this is
    actually justified by predictive performance).

(4) Know how to interpret a main effect in the presence of an
    interaction! This is no longer a simple comparison of "treated"
    versus "control", but rather a comparison for *a specific value
    of the interaction variable* (zero, or the reference level,
    under the usual coding)!

(5) Should we use some form of statistical variable selection to
    simplify the regression model? First, for a confirmatory
    regression analysis it is often preferable to avoid model
    simplification based on statistical testing, since this will
    affect the real significance level of tests based on the final
    model that is chosen. Second, epidemiologists do consider
    simplifying a model, but judge the impact of dropping (or
    adding) a variable by the change in the estimate for the
    contrast of interest. The goal of including potential
    confounders is to remove bias, and if dropping a
    non-significant confounder has a modest or large impact on the
    effect of interest, then bias may be present.

(6) Always stay focused on the question of interest! Keep your
    analysis and presentation anchored in the scientific question.

OTHER ISSUES:

(1) Goodness of fit for logistic regression is difficult. Typically
    the residual deviance cannot be used, since the number of
    "parameters" in the saturated model used for this likelihood
    ratio comparison increases with the sample size. See Hosmer and
    Lemeshow (2nd edition) pg 147.

(2) In the evaluation of a predictive model for logistic regression
    it is often useful to consider all possible "thresholds" for a
    positive prediction. An ROC curve is a useful device for
    showing all possible combinations of sensitivity and
    specificity. See Hosmer and Lemeshow (2nd edition) pages 156
    through 164.

(3) Analysis/Presentation logic -- it often makes sense for the
    presentation to consider:

    (a) crude summary measures (i.e.,
        a 2x2 table with LBW and First Steps).

    (b) commentary on the substantive issues that pertain to
        potential confounding factors.

    (c) description of the methods used for adjustment.

    (d) interpretation of the results.
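
A small worked example may help with MAJOR ISSUES item (4). The sketch below assumes a logistic model for low birth weight with a treatment indicator, a smoking indicator, and their product. The variable names and every coefficient value are invented for illustration; they are not estimates from the birth weight data. It shows that the treatment "main effect" is the treated-versus-control contrast only where the interacting variable equals zero.

```python
import math

# Hypothetical coefficients on the log-odds scale. These numbers are
# invented for illustration and are NOT estimates from the data.
COEF = {
    "intercept": -2.0,
    "first_steps": -0.40,         # treatment "main effect"
    "smoker": 0.70,
    "first_steps:smoker": 0.30,   # treatment-by-smoking interaction
}


def log_odds(first_steps, smoker):
    """Linear predictor for given treatment and smoking indicators."""
    return (COEF["intercept"]
            + COEF["first_steps"] * first_steps
            + COEF["smoker"] * smoker
            + COEF["first_steps:smoker"] * first_steps * smoker)


def treatment_odds_ratio(smoker):
    """Odds ratio, treated vs. control, at a fixed smoking status."""
    return math.exp(log_odds(1, smoker) - log_odds(0, smoker))


# The main effect alone describes non-smokers only (smoker = 0).
or_nonsmokers = treatment_odds_ratio(0)   # exp(-0.40)
# For smokers the interaction coefficient must be added in.
or_smokers = treatment_odds_ratio(1)      # exp(-0.40 + 0.30)

print(f"OR among non-smokers: {or_nonsmokers:.3f}")
print(f"OR among smokers:     {or_smokers:.3f}")
```

With an interaction in the model, a write-up should therefore report the treatment contrast separately at each value of the interacting variable rather than quoting the main-effect coefficient by itself.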
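
The (number of cases)/10 guideline in MAJOR ISSUES item (3) is simple arithmetic, sketched here for the restricted sample described above. The function name and the default divisor argument are assumptions made for this illustration; the guideline is a rule of thumb, not a hard limit.

```python
def max_predictors(n_cases, cases_per_predictor=10):
    """Rule-of-thumb ceiling on the number of regression predictors."""
    if n_cases < 0 or cases_per_predictor <= 0:
        raise ValueError("n_cases must be >= 0 and the divisor > 0")
    # Integer division: a fractional predictor cannot be fit.
    return n_cases // cases_per_predictor


# The restricted subset had N=300 women but only 22 LBW outcomes;
# it is the 22 cases, not the 300 women, that drive the limit.
restricted_limit = max_predictors(22)
print(restricted_limit)
```

Under this guideline the restricted analysis supports roughly two predictors, which leaves very little room for adjustment once the treatment indicator itself is included.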
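
For OTHER ISSUES item (2), the following is a minimal sketch of how the points on an ROC curve arise from sweeping over every possible threshold. It uses only the Python standard library; the outcomes and predicted probabilities are made up for illustration, and the function names are hypothetical. In practice the predicted probabilities would come from the fitted logistic regression.

```python
def roc_points(y_true, scores):
    """Return (1 - specificity, sensitivity) pairs, one per threshold.

    A subject is called positive when score >= threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    points = [(0.0, 0.0)]  # threshold above every score: nobody positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores)
                 if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores)
                 if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up outcomes (1 = LBW) and predicted probabilities.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]

points = roc_points(y, p)
auc = area_under_curve(points)
for fpr, sens in points:
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {sens:.2f}")
print(f"AUC = {auc:.3f}")
```

Each printed pair corresponds to one choice of threshold, so the curve displays the full trade-off between sensitivity and specificity rather than the result at a single cut-off.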