***************************
* Biostatistics 513
* Exercise Set 3, 2002
***************************

1(a) To summarize whether AGE is associated with NEWTOB and with NEWALC
we can construct a 2xC table (or Cx2 table) for each of these:

           |        newtob
 Age Group |  <10g/day  >=10g/day |     Total
-----------+----------------------+----------
     25-34 |        70         46 |       116
           |     60.34      39.66 |    100.00
-----------+----------------------+----------
     35-44 |       109         87 |       196
           |     55.61      44.39 |    100.00
-----------+----------------------+----------
     45-54 |       104        109 |       213
           |     48.83      51.17 |    100.00
-----------+----------------------+----------
     55-64 |       117        125 |       242
           |     48.35      51.65 |    100.00
-----------+----------------------+----------
     65-74 |        99         62 |       161
           |     61.49      38.51 |    100.00
-----------+----------------------+----------
       75+ |        26         18 |        44
           |     59.09      40.91 |    100.00
-----------+----------------------+----------
     Total |       525        447 |       972
           |     54.01      45.99 |    100.00

          Pearson chi2(5) =  11.5898   Pr = 0.041

From this table we see that the fraction of subjects that consume
>=10g/day increases from approximately 40% in the 25-34 age group to 51%
in the 45-54 and 55-64 age groups. However, the trend does not continue
and we see that only 40% (approximately) of the 65+ subjects consume
>=10g/day.
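
The Pearson chi-square reported under the tobacco table can be checked by hand. The following is a minimal Python sketch (not the Stata command used for the output above); the function name `pearson_chi2` is a hypothetical helper, and the counts are copied from the AGE by NEWTOB table.

```python
# Observed counts from the AGE x NEWTOB table in 1(a):
# each row is (<10g/day, >=10g/day) for one age group.
observed = [
    (70, 46),    # 25-34
    (109, 87),   # 35-44
    (104, 109),  # 45-54
    (117, 125),  # 55-64
    (99, 62),    # 65-74
    (26, 18),    # 75+
]


def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for an R x C table of counts.

    Hypothetical helper: expected counts are row total * column total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


chi2_tob, df_tob = pearson_chi2(observed)
print(round(chi2_tob, 4), df_tob)
```

The statistic should agree with the Stata value (11.5898 on 5 degrees of freedom) up to rounding; the p-value is not computed here because that would need a chi-square distribution function.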

           |        newalc
 Age Group |  <80g/day  >=80g/day |     Total
-----------+----------------------+----------
     25-34 |       106         10 |       116
           |     91.38       8.62 |    100.00
-----------+----------------------+----------
     35-44 |       166         30 |       196
           |     84.69      15.31 |    100.00
-----------+----------------------+----------
     45-54 |       159         54 |       213
           |     74.65      25.35 |    100.00
-----------+----------------------+----------
     55-64 |       173         69 |       242
           |     71.49      28.51 |    100.00
-----------+----------------------+----------
     65-74 |       124         37 |       161
           |     77.02      22.98 |    100.00
-----------+----------------------+----------
       75+ |        39          5 |        44
           |     88.64      11.36 |    100.00
-----------+----------------------+----------
     Total |       767        205 |       972
           |     78.91      21.09 |    100.00

          Pearson chi2(5) =  27.9604   Pr = 0.000

Again we find a similar pattern -- the proportion of subjects that
consume 80+ g/day of alcohol increases from a low of 9% in the 25-34 age
group to a high of 29% in the 55-64 age group, then declines to 23% in
the 65-74 age group and 11% in the 75+ age group.

1(b) The following is a cross-tabulation of Y and NEWALC:

                 |        newalc          |             Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+----------------------
           Cases |        96         104  |        200      0.4800
        Controls |       109         663  |        772      0.1412
-----------------+------------------------+----------------------
           Total |       205         767  |        972      0.2109
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |         5.614679       |  3.985032   7.911235  (Cornfield)
 Attr. frac. ex. |         .8218954       |   .749061   .8735975  (Cornfield)
 Attr. frac. pop |         .3945098       |
                 +-----------------------------------------------
                            chi2(1) =   109.57  Pr>chi2 = 0.0000

From this we see that 48% of the cases fall in the 80+ g/day of alcohol
category and only 14% of the controls fall in the 80+ g/day category.
This results in a disease odds ratio of 5.61 (95% CI 3.99, 7.91,
p-value <0.001) comparing 80+ g/day of alcohol consumption to <80 g/day
of alcohol consumption.
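
The crude odds ratio in 1(b) is the cross-product ratio of the 2x2 table. The sketch below recomputes it in Python and adds a Woolf (log-based) 95% interval as an illustration. Note that this is an assumption-laden substitute: the Stata output above reports Cornfield limits, so the interval here is expected to be close to, but not identical with, (3.99, 7.91). The function name `odds_ratio_woolf` is hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Cross-product odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    The Woolf interval is a swapped-in approximation, not the Cornfield
    method used in the Stata output.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Counts from the Y x NEWALC table in 1(b).
or_alc, lo_alc, hi_alc = odds_ratio_woolf(96, 104, 109, 663)
print(round(or_alc, 4), round(lo_alc, 2), round(hi_alc, 2))
```

The same function applied to the counts in a Y by NEWTOB table gives the corresponding tobacco odds ratio.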

1(c) We can now "adjust" for age by stratifying on AGE, calculating an
odds ratio (disease odds comparing alcohol groups) for each stratum, and
then combining these estimates into a common estimate via
Mantel-Haenszel.

       Age Group |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           25-34 |          .            0          .            0  (Cornfield)
           35-44 |   4.953846     1.348094   18.30423     .6632653  (Cornfield)
           45-54 |   5.665025     2.815057   11.40813     2.859155  (Cornfield)
           55-64 |   6.359477      3.45929    11.6933     3.793388  (Cornfield)
           65-74 |   2.580247      1.22547   5.436149     4.024845  (Cornfield)
             75+ |          .     4.388738          .            0  (Cornfield)
-----------------+-------------------------------------------------
           Crude |   5.614679     3.985032   7.911235               (Cornfield)
    M-H combined |   5.152126     3.558542   7.459349
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(3) =     3.77  Pr>chi2 = 0.2870

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     84.89
                                                Pr>chi2 =    0.0000

We find that two strata (25-34 and 75+) have OR estimates of +infinity
due to zero cells. The resulting age adjusted odds ratio is 5.15 (95% CI
3.56, 7.46, p-value<0.001).

The age adjusted odds ratio compares the odds of disease among 80+ g/day
alcohol consumption subjects to the odds of disease among <80 g/day
alcohol consumption subjects for individuals from the *same age
category*. That is, we compare exposed and unexposed that are comparable
in age (same age group). The crude odds ratio simply compares the
disease odds among the 80+ g/day subjects versus the disease odds among
the <80 g/day subjects. This comparison does not control for age, so the
groups being compared may differ in their age distribution in addition
to their alcohol consumption.

We find that the adjustment for age has a modest impact on the odds
ratio estimate (a 100*(5.15-5.61)/5.61 = -8.2% change).
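
The Mantel-Haenszel combination used in 1(c) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Stata implementation: the function name `mantel_haenszel_or` is hypothetical, and the per-stratum weight is taken to be b*c/n, which matches the "M-H Weight" column (zero whenever a stratum has no unexposed cases or no exposed controls). The age-specific 2x2 counts are not listed in this solution set, so the only check made here is that a single stratum returns the crude odds ratio from 1(b).

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of 2x2 strata.

    Each stratum is (a, b, c, d) with
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n  # assumed to be the "M-H Weight" column
    return numerator / denominator


# With one stratum the estimator reduces to the crude cross-product ratio.
single_stratum_or = mantel_haenszel_or([(96, 104, 109, 663)])

# Percent change from the crude to the age adjusted estimate in 1(c).
pct_change = 100 * (5.152126 - 5.614679) / 5.614679
print(round(single_stratum_or, 4), round(pct_change, 1))
```

Strata with zero weight still add a*d/n to the numerator, which is why the 25-34 and 75+ groups can influence the combined estimate even though their own odds ratios are undefined.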

The test of homogeneity tests whether

   H0: OR(1)=OR(2)=OR(3)=OR(4)=OR(5)=OR(6)
   H1: at least one stratum has a different odds ratio

where OR(j) is the odds ratio obtained from stratum j. The resulting
chi-square statistic is 3.77 on 3 degrees of freedom (two strata
dropped), yielding a non-significant p-value of 0.287. Therefore we
would not reject the homogeneity hypothesis. We can also inspect the
stratum-specific odds ratios and see that there is no clear pattern in
these estimates (e.g. steadily increasing or decreasing odds ratios,
which would indicate some systematic effect modification).

1(d) The following is a cross-tabulation of Y and NEWTOB:

                 |        newtob          |             Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+----------------------
           Cases |       122          78  |        200      0.6100
        Controls |       325         447  |        772      0.4210
-----------------+------------------------+----------------------
           Total |       447         525  |        972      0.4599
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |         2.151243       |  1.566262   2.954605  (Cornfield)
 Attr. frac. ex. |         .5351524       |  .3615372   .6615453  (Cornfield)
 Attr. frac. pop |          .326443       |
                 +-----------------------------------------------
                            chi2(1) =    22.85  Pr>chi2 = 0.0000

Here we see that 61% of the cases report 10+ g/day of tobacco
consumption while only 42% of the controls report 10+ g/day of tobacco
consumption. This results in an estimated disease odds ratio of 2.15
(95% CI 1.57, 2.95, p-value<0.001).

1(e) We can now "adjust" for age by stratifying on AGE, calculating an
odds ratio (disease odds comparing tobacco groups) for each stratum, and
then combining these estimates into a common estimate via
Mantel-Haenszel.

       Age Group |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           25-34 |          .            0          .            0  (Cornfield)
           35-44 |    4.68125      1.06965          .     .8163265  (Cornfield)
           45-54 |   2.671614     1.339354   5.321494     5.061033  (Cornfield)
           55-64 |   2.536216     1.441307   4.460941     7.644628  (Cornfield)
           65-74 |   1.385399     .7159378   2.682346      7.31677  (Cornfield)
             75+ |   2.121212     .5892296   7.650982          1.5  (Cornfield)
-----------------+-------------------------------------------------
           Crude |   2.151243     1.566262   2.954605               (Cornfield)
    M-H combined |   2.267489     1.613024   3.187496
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(4) =     3.27  Pr>chi2 = 0.5131

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     22.65
                                                Pr>chi2 =    0.0000

The age adjusted odds ratio is estimated as 2.27 (95% CI 1.61, 3.19,
p-value<0.001). This is a comparison of the odds of disease among
subjects who consume 10+ g/day of tobacco relative to the disease odds
for subjects who consume <10 g/day of tobacco *where the subjects are of
equivalent ages* (here equivalent is in terms of the age group).

We find that the adjustment for age does not make a large impact on the
odds ratio estimate (a 100*(2.27-2.15)/2.15 = 5.6% change). The crude
odds ratio estimate, 2.15, does not make the exposure comparison within
each age group; it does not consider age at all in the comparison of the
exposed and unexposed groups.

The test of homogeneity tests whether

   H0: OR(1)=OR(2)=OR(3)=OR(4)=OR(5)=OR(6)
   H1: at least one stratum has a different odds ratio

where OR(j) is the odds ratio obtained from stratum j. The resulting
chi-square statistic is 3.27 on 4 degrees of freedom (one stratum
dropped), yielding a non-significant p-value of 0.513. Therefore we
would not reject the homogeneity hypothesis. We can also inspect the
stratum-specific odds ratios and find that there is a general pattern of
decreasing odds ratios with increasing age groups. This is suggestive of
effect modification but we have not formally tested this trend (we can
using logistic regression!).

1(f) Consider the association between alcohol and tobacco.
For this we can create a simple 2x2 table of NEWALC versus NEWTOB and
the RxC table of ALC versus TOB:

           |        newalc
    newtob |  <80g/day  >=80g/day |     Total
-----------+----------------------+----------
  <10g/day |       440         85 |       525
           |     83.81      16.19 |    100.00
-----------+----------------------+----------
 >=10g/day |       327        120 |       447
           |     73.15      26.85 |    100.00
-----------+----------------------+----------
     Total |       767        205 |       972
           |     78.91      21.09 |    100.00

          Pearson chi2(1) =  16.4704   Pr = 0.000

           |                  Alcohol
   Tobacco |  <40g/day  40-79g/da  80-119g/d  120+g/day |     Total
-----------+--------------------------------------------+----------
  0-9g/day |       261        179         61         24 |       525
           |     49.71      34.10      11.62       4.57 |    100.00
-----------+--------------------------------------------+----------
10-19g/day |        81         85         49         18 |       233
           |     34.76      36.48      21.03       7.73 |    100.00
-----------+--------------------------------------------+----------
20-29g/day |        42         62         16         12 |       132
           |     31.82      46.97      12.12       9.09 |    100.00
-----------+--------------------------------------------+----------
  30+g/day |        28         29         12         13 |        82
           |     34.15      35.37      14.63      15.85 |    100.00
-----------+--------------------------------------------+----------
     Total |       412        355        138         67 |       972
           |     42.39      36.52      14.20       6.89 |    100.00

          Pearson chi2(9) =  44.8060   Pr = 0.000

For each of these tables we find that the Pearson chi-square rejects the
null hypothesis of homogeneity. From the 2x2 table we see that only 16%
of the <10 g/day tobacco consumers report 80+ g/day of alcohol
consumption while 27% of the 10+ g/day tobacco consumers report 80+
g/day of alcohol consumption. A similar moderate pattern of increased
tobacco use associated with increased alcohol use is seen in the RxC
table.

The statistical significance of the association can be addressed by the
chi-square tests of homogeneity. Each rejects the null hypothesis of no
association. The magnitude of the association is nicely summarized by
the odds ratio estimate for NEWALC versus NEWTOB: 1.90 (95% CI 1.39,
2.59, p-value<0.001).
This implies that the odds of 80+ g/day of alcohol consumption among the
10+ g/day tobacco consumers is 1.90 times the odds of 80+ g/day alcohol
consumption among the <10 g/day tobacco consumers.

1(g) We can compute disease odds ratios for the exposure = NEWALC while
controlling for both AGE and TOB using Mantel-Haenszel after stratifying
on the combined levels of (AGE,TOB):

[FIRST REVISIT THE AGE ADJUSTED OR FROM 1(c)]:

Mantel-Haenszel estimate of the odds ratio
Comparing newalc==1 vs newalc==0, controlling for age

    ----------------------------------------------------------------
     Odds ratio    chi2(1)        P>chi2        [95% Conf. Interval]
    ----------------------------------------------------------------
       5.152126      84.89        0.0000         3.491326   7.602959
    ----------------------------------------------------------------

[NOW LOOK AT AGE AND TOB ADJUSTED OR]:

Mantel-Haenszel estimate of the odds ratio
Comparing newalc==1 vs newalc==0, controlling for age
by tob

Note: only 18 of the 24 strata formed in this analysis contribute
      information about the effect of the explanatory variable

----------+--------------------------------------------------------------------
      tob | Odds ratio    chi2(1)        P>chi2        [95% Conf. Interval]
----------+--------------------------------------------------------------------
 0-9g/day |   6.052225      43.33        0.0000         3.283540   11.15547
 10-19g/d |   4.031735      18.93        0.0000         2.042358   7.958883
 20-29g/d |   3.786274       7.72        0.0055         1.378896   10.39663
 30+g/day |   5.849858       6.31        0.0120         1.223661   27.96595
----------+--------------------------------------------------------------------

Mantel-Haenszel estimate controlling for age and tob

    ----------------------------------------------------------------
     Odds ratio    chi2(1)        P>chi2        [95% Conf. Interval]
    ----------------------------------------------------------------
       4.852866      73.45        0.0000         3.253018   7.239526
    ----------------------------------------------------------------

Test of homogeneity of ORs (approx): chi2(3)   =    1.08
                                     Pr>chi2   =  0.7825

The resulting AGE and TOB adjusted disease odds ratio comparing 80+
g/day alcohol use to <80 g/day alcohol use is estimated as 4.85 (95% CI
3.25, 7.24, p-value<0.001). This now compares the disease odds among
subjects who consume 80+ g/day of alcohol to the disease odds among
subjects who consume <80 g/day of alcohol where the comparison is for
subjects that otherwise have the same age and the same tobacco
consumption. The crude OR estimate didn't "fix" either age or tobacco
use when making the alcohol group comparison, while the age adjusted OR
in 1(c) only controlled for age. We see a modest change in the age and
tobacco adjusted OR estimate relative to the crude estimate (a
100*(4.85-5.61)/5.61 = -13.5% change).

Inspection of the odds ratios in the above table suggests that there is
no systematic variation in the age adjusted odds ratios within tobacco
strata. The homogeneity test above fails to reject the null hypothesis
that OR(1)=OR(2)=OR(3)=OR(4). (This is a subtle test -- by pooling over
the age groups within each level of tobacco we are assuming that the
exposure odds ratio may depend on the level of tobacco but not on the
age group within tobacco. The homogeneity test then asks whether these
tobacco specific exposure odds ratios are equal. This is analogous to
using a regression model with

   NEWALC + TOB(j)'s + AGE(k)'s + NEWALC*TOB(j)'s + TOB(j)*AGE(k)'s

and then testing the interaction coefficients NEWALC*TOB(j). Tricky!).

1(h) We can compute disease odds ratios for the exposure = NEWTOB while
controlling for both AGE and ALC using Mantel-Haenszel after stratifying
on the combined levels of (AGE,ALC):

[FIRST REVISIT THE AGE ADJUSTED OR FROM 1(e)]:

Mantel-Haenszel estimate of the odds ratio
Comparing newtob==1 vs newtob==0, controlling for age

    ----------------------------------------------------------------
     Odds ratio    chi2(1)        P>chi2        [95% Conf. Interval]
    ----------------------------------------------------------------
       2.267489      22.65        0.0000         1.603275   3.206878
    ----------------------------------------------------------------

[NOW LOOK AT AGE AND ALC ADJUSTED OR]:

Mantel-Haenszel estimate of the odds ratio
Comparing newtob==1 vs newtob==0, controlling for age
by alc

Note: only 18 of the 24 strata formed in this analysis contribute
      information about the effect of the explanatory variable

----------+--------------------------------------------------------------------
      alc | Odds ratio    chi2(1)        P>chi2        [95% Conf. Interval]
----------+--------------------------------------------------------------------
 <40g/day |   4.593489      13.72        0.0002         1.891396   11.15586
 40-79g/d |   1.540453       2.23        0.1352         0.870096   2.727282
 80-119g/ |   1.555876       1.20        0.2741         0.699999   3.458217
 120+g/da |   1.317771       0.22        0.6414         0.411093   4.224156
----------+--------------------------------------------------------------------

Mantel-Haenszel estimate controlling for age and alc

    ----------------------------------------------------------------
     Odds ratio    chi2(1)        P>chi2        [95% Conf. Interval]
    ----------------------------------------------------------------
       1.901541      11.45        0.0007         1.302184   2.776766
    ----------------------------------------------------------------

Test of homogeneity of ORs (approx): chi2(3)   =    5.20
                                     Pr>chi2   =  0.1577

The resulting AGE and ALC adjusted disease odds ratio comparing 10+
g/day tobacco use to <10 g/day tobacco use is estimated as 1.90 (95% CI
1.30, 2.78, p-value<0.001).
This now compares the disease odds among subjects who consume 10+ g/day
of tobacco to the disease odds among subjects who consume <10 g/day of
tobacco where the comparison is for subjects that otherwise have the
same age and the same alcohol consumption. The crude OR estimate didn't
"fix" either age or alcohol use when making the tobacco group
comparison, while the age adjusted OR in 1(e) only controlled for age.
We see a modest change in the age and alcohol adjusted OR estimate
relative to the crude estimate (a 100*(1.90-2.15)/2.15 = -11.6% change).

Inspection of the odds ratios in the above table suggests that there is
no systematic variation in the age adjusted odds ratios within alcohol
strata, although the lowest alcohol category (<40 g/day) has an OR
estimate that is much larger than the remaining estimates. The
homogeneity test above fails to reject the null hypothesis that
OR(1)=OR(2)=OR(3)=OR(4).

2(a) The dependent variable is LBW, or low birth weight (1=weight <=
2500g, 0 otherwise). pi(X) is the probability of low birth weight as a
function of covariates, X.
The hypothesized models are:

*Model 1:

   pi(X) = ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2) + b4*RACE(3)
                  + b5*HYPER + b6*URIRR + b7*AGE20 ) )
           / ( 1 + ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2)
                          + b4*RACE(3) + b5*HYPER + b6*URIRR
                          + b7*AGE20 ) ) )

*Model 2:

   pi(X) = ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2) + b4*RACE(3)
                  + b5*HYPER + b6*URIRR + b7*AGE20 + b8*SMOKE*AGE20 ) )
           / ( 1 + ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2)
                          + b4*RACE(3) + b5*HYPER + b6*URIRR
                          + b7*AGE20 + b8*SMOKE*AGE20 ) ) )

The fitted models are:

*Model 1:

   pi(X) = ( exp( -2.003 + .896*SMOKE + 1.317*AnyPTL + .962*RACE(2)
                  + .951*RACE(3) + 1.364*HYPER + .769*URIRR
                  - 0.051*AGE20 ) )
           / ( 1 + ( exp( -2.003 + .896*SMOKE + 1.317*AnyPTL
                          + .962*RACE(2) + .951*RACE(3) + 1.364*HYPER
                          + .769*URIRR - 0.051*AGE20 ) ) )

*Model 2:

   pi(X) = ( exp( -1.905 + .713*SMOKE + 1.295*AnyPTL + .870*RACE(2)
                  + .906*RACE(3) + 1.396*HYPER + .838*URIRR
                  - .082*AGE20 + .066*SMOKE*AGE20 ) )
           / ( 1 + ( exp( -1.905 + .713*SMOKE + 1.295*AnyPTL
                          + .870*RACE(2) + .906*RACE(3) + 1.396*HYPER
                          + .838*URIRR - .082*AGE20
                          + .066*SMOKE*AGE20 ) ) )

2(b) Model 1:

   logit[pi(X)] = -2.003 + .896*SMOKE + 1.317*AnyPTL + .962*RACE(2)
                  + .951*RACE(3) + 1.364*HYPER + .769*URIRR
                  - 0.051*AGE20

Model 2:

   logit[pi(X)] = -1.905 + .713*SMOKE + 1.295*AnyPTL + .870*RACE(2)
                  + .906*RACE(3) + 1.396*HYPER + .838*URIRR
                  - .082*AGE20 + .066*SMOKE*AGE20

2(c) Smoker:

   X1 = ( 1, 0, 0, 0, 1, 0, 30-20 )

   X1*beta = -2.003 + .896*(1) + 1.317*(0) + .962*(0) + .951*(0)
             + 1.364*(1) + .769*(0) - 0.051*(10)
           = -0.253

   pi(X1) = ( exp( X1*beta ) ) / ( 1 + exp( X1*beta ) )
          = exp(-0.253) / (1 + exp(-0.253))
          = 0.437

Non-smoker:

   X0 = ( 0, 0, 0, 0, 1, 0, 30-20 )

   X0*beta = -2.003 + .896*(0) + 1.317*(0) + .962*(0) + .951*(0)
             + 1.364*(1) + .769*(0) - 0.051*(10)
           = -1.149

   pi(X0) = ( exp( X0*beta ) ) / ( 1 + exp( X0*beta ) )
          = exp(-1.149) / (1 + exp(-1.149))
          = 0.241

   RR = .437 / .241 = 1.816

Smokers with hypertension, white, without history of PTL, without
uterine irritability, age 30, are 1.816 times as likely to have a low
birth weight child compared to non-smokers with hypertension, white,
without history of PTL, without uterine irritability, age 30.

2(d) Smoker:

   X1 = ( 1, 0, 0, 0, 1, 0, 30-20, 1*(30-20) )

   X1*beta = -1.905 + .713*(1) + 1.295*(0) + .870*(0) + .906*(0)
             + 1.396*(1) + .838*(0) - .082*(10) + .066*(1)*(10)
           = 0.044

   pi(X1) = ( exp( X1*beta ) ) / ( 1 + exp( X1*beta ) )
          = exp(0.044) / (1 + exp(0.044))
          = 0.511

Non-smoker:

   X0 = ( 0, 0, 0, 0, 1, 0, 30-20, 0*(30-20) )

   X0*beta = -1.905 + .713*(0) + 1.295*(0) + .870*(0) + .906*(0)
             + 1.396*(1) + .838*(0) - .082*(10) + .066*(0)*(10)
           = -1.329

   pi(X0) = ( exp( X0*beta ) ) / ( 1 + exp( X0*beta ) )
          = exp(-1.329) / (1 + exp(-1.329))
          = 0.209

   RR = .511 / .209 = 2.441

Smokers with hypertension, white, without history of PTL, without
uterine irritability, age 30, are 2.441 times as likely to have a low
birth weight child compared to non-smokers with hypertension, white,
without history of PTL, without uterine irritability, age 30.

The estimates are different due to the interaction term in model 2.

2(e) AGE = 20

   Model 1: OR = exp( 0.896 ) = 2.450
   Model 2: OR = exp( 0.713 + 0.066(20-20) ) = exp( 0.713 ) = 2.040

2(f) AGE = 30

   Model 1: OR = exp( 0.896 ) = 2.450
   Model 2: OR = exp( 0.713 + 0.066(30-20) ) = exp( 0.713 + 0.660 )
               = 3.947

3(a) Dependent variable: Incident esophageal cancer (case=1, control=0).
pi(X) is the probability of esophageal cancer.

3(b) logit[pi(X)] = b0 + b1*NewAlc

3(c) OR = exp(b1) = exp(1.668543) = 5.304
     95% CI = [3.658, 7.691]

This OR estimate is very similar to the estimate obtained in 1(c) and
these odds ratio estimates have the same interpretation. After adjusting
for age, the odds of esophageal cancer among those who drink 80+ g of
alcohol per day is 5.3 times the odds for those who drink <80 g of
alcohol per day.

In the stratified analysis we explicitly "controlled" for age by
separating the data into age groups, calculating ORs for each stratum,
and then combining them into a common estimate. In the regression
setting, we "adjust" for age by including age dummy variables, and then
obtain a common odds ratio estimate from the main effect for NEWALC. If
we had put in NEWALC*AGE(j) interactions then the NEWALC odds ratio
would be different for the levels of AGE -- i.e. stratum specific. Using
regression we can structure the associations (common, stratum specific)
and we obtain summaries of the influence of the potential confounder via
the coefficient estimates for that variable.

3(d) OR = exp(b1) = exp(1.6241) = 5.074
     95% CI = [3.464, 7.432]

This OR estimate is very similar to the estimate obtained in 1(g) and
these odds ratio estimates have the same interpretation. After adjusting
for age and tobacco consumption, the odds of esophageal cancer among
subjects who drink 80+ g of alcohol per day is 5.1 times the odds of
disease for subjects who drink <80 g of alcohol per day.
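
The conversions from fitted logit coefficients to probabilities and odds ratios used in 2(c)-2(f), 3(c) and 3(d) can be checked with a short Python sketch. The coefficients are the rounded values quoted in this solution set; the helper name `inv_logit` and the variable names are hypothetical, introduced only for illustration.

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a probability: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# 2(c), Model 1: white, hypertension, no PTL, no uterine irritability, age 30.
eta_smoker = -2.003 + 0.896 * 1 + 1.364 * 1 - 0.051 * (30 - 20)
eta_nonsmoker = -2.003 + 0.896 * 0 + 1.364 * 1 - 0.051 * (30 - 20)
rr_model1 = inv_logit(eta_smoker) / inv_logit(eta_nonsmoker)

# 2(e)/2(f), Model 2: the smoking OR depends on age through the interaction.
or_model2_age20 = math.exp(0.713 + 0.066 * (20 - 20))
or_model2_age30 = math.exp(0.713 + 0.066 * (30 - 20))

# 3(c) and 3(d): OR for NEWALC is exp of its coefficient.
or_alc_3c = math.exp(1.668543)
or_alc_3d = math.exp(1.6241)

print(round(rr_model1, 3), round(or_model2_age20, 3),
      round(or_model2_age30, 3), round(or_alc_3c, 3), round(or_alc_3d, 3))
```

Because the inputs are rounded coefficients, the results should match the values in the text only to about three significant figures.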