***********************************************************************
*                                                                     *
*  fram-2-do.txt                                                      *
*                                                                     *
*  PURPOSE:  analysis of the Framingham data with continuous          *
*            covariates                                               *
*              (1) specifying a functional relationship               *
*              (2) evaluating fitted probabilities                    *
*              (3) summarizing accuracy                               *
*                                                                     *
*  DATE:     01/05/03                                                 *
*                                                                     *
***********************************************************************

***
*** input and label the data
***

clear

#delimit ;
infile lexam surv cause cexam chd cva cancer other
       sex age height weight chol1 chol2 dbp sbp mrw smoke
       using fram-dat.txt;
#delimit cr

*** male = 1 for sex==1, 0 for sex==2
generate male = sex
recode male 2=0

***
*** labels
***

label variable chd "death from CHD"
label variable cause "cause of death"
label variable height "height (inches)"
label variable weight "weight (pounds)"
label variable chol1 "serum cholesterol - exam 1"
label variable chol2 "serum cholesterol - exam 2"
label variable dbp "diastolic blood pressure"
label variable sbp "systolic blood pressure"
label variable smoke "cigarettes/day"

***
*** recode missing values (coded -1 in the raw file)
***

mvdecode height weight chol1 chol2 smoke, mv(-1)

*** body-mass index (kg/m^2): 2.2 pounds per kg, and 39 approximates the
*** 39.37 inches per metre. Computed after mvdecode so that a missing
*** height or weight gives a missing bmi instead of a spurious value
*** (e.g. weight = -1 would otherwise give a negative bmi).
generate bmi = (weight/2.2) / ((height/39)*(height/39))

***
*** subset for analysis = males aged 40+ without CHD at entry
***

drop if sex > 1 | age < 40 | cexam == 1

*** drop subjects with missing values (complete-case analysis: not ideal,
*** but it keeps the same sample in every model compared with lrtest below)
drop if missing(chol1, smoke, bmi)

********************
***  smoke quad  ***
********************

*** smoking centered at 10 cigarettes/day: linear and quadratic terms
generate smoke1c = (smoke-10)
generate smoke2c = (smoke-10)*(smoke-10)

*** assess smoking adjusting for the other covariates
*** (typing logit after logistic redisplays the fit as log-odds coefficients;
***  estimates store replaces the older lrtest, saving(#) syntax)

logistic chd age bmi chol1 dbp sbp smoke1c smoke2c
logit
estimates store quad

logistic chd age bmi chol1 dbp sbp smoke1c
logit
estimates store linear

logistic chd age bmi chol1 dbp sbp
logit
estimates store nosmoke

*** any smoking? -- testing X + X^2 versus nothing (2 df)
lrtest quad nosmoke

*** quadratic? -- testing X + X^2 versus X (1 df)
lrtest quad linear

*** linear? -- testing X versus nothing (1 df)
lrtest linear nosmoke

***
*** Based on these tests we might simplify the regression model and
*** exclude smoking, since it does not appear to be predictive. That would
*** be particularly warranted if the measurement were difficult to obtain;
*** it is not, so I will keep a linear term in the predictive model.
***

***
*** Model
***

logistic chd age bmi chol1 dbp sbp smoke

***
*** accuracy summaries
***

*** classification table, sensitivity and specificity at cutoff 0.5
*** (called estat classification in Stata 9 and later)
lstat
*** sensitivity and specificity plotted against the probability cutoff
lsens
*** ROC curve and the area under it
lroc

***
*** obtain fitted probabilities & look at a few subjects
***

predict pfit
list pfit chd age bmi chol1 dbp sbp smoke in 1/10
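
***
*** Cross-check (a sketch added for illustration, not part of the original
*** analysis): reproduce by hand three quantities that the built-in
*** commands report. Assumes the analysis variables created earlier
*** (chd, age, bmi, chol1, dbp, sbp, smoke, smoke1c) are in memory.
*** The names ll_lin, ll_none, lr_lin, xb_chk, p_chk, p_hand and n_cases
*** are hypothetical, chosen only for this example.
***

*** (a) LR test of "X versus nothing": 2*(ll with X - ll without X),
***     referred to a chi-square with 1 df (one dropped term)
quietly logistic chd age bmi chol1 dbp sbp smoke1c
scalar ll_lin = e(ll)
quietly logistic chd age bmi chol1 dbp sbp
scalar ll_none = e(ll)
scalar lr_lin = 2*(ll_lin - ll_none)
display "LR chi2(1) = " lr_lin "   p = " chi2tail(1, lr_lin)

*** (b) fitted probability = exp(xb)/(1 + exp(xb)), where xb is the
***     linear predictor of the final model
quietly logistic chd age bmi chol1 dbp sbp smoke
predict xb_chk, xb
predict p_chk, pr
generate p_hand = exp(xb_chk)/(1 + exp(xb_chk))
summarize p_chk p_hand

*** (c) sensitivity at the 0.5 cutoff used by lstat: the share of
***     estimation-sample CHD deaths (chd nonzero) with fitted Pr >= 0.5
quietly count if e(sample) & chd != 0
scalar n_cases = r(N)
quietly count if e(sample) & chd != 0 & p_chk >= 0.5
display "sensitivity at cutoff 0.5 = " r(N)/n_cases

*** The LR statistic in (a) should match lrtest for the same pair of
*** models, and the sensitivity in (c) should match the lstat table.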