***************************
* Biostatistics 513 * Exercise Set 3, 2002
***************************
1(a) To summarize whether AGE is associated with NEWTOB and with NEWALC
we can construct a 2xC table (or Cx2 table) for each of these:
| newtob
Age Group | <10g/day >=10g/day | Total
-----------+----------------------+----------
25-34 | 70 46 | 116
| 60.34 39.66 | 100.00
-----------+----------------------+----------
35-44 | 109 87 | 196
| 55.61 44.39 | 100.00
-----------+----------------------+----------
45-54 | 104 109 | 213
| 48.83 51.17 | 100.00
-----------+----------------------+----------
55-64 | 117 125 | 242
| 48.35 51.65 | 100.00
-----------+----------------------+----------
65-74 | 99 62 | 161
| 61.49 38.51 | 100.00
-----------+----------------------+----------
75+ | 26 18 | 44
| 59.09 40.91 | 100.00
-----------+----------------------+----------
Total | 525 447 | 972
| 54.01 45.99 | 100.00
Pearson chi2(5) = 11.5898 Pr = 0.041
From this table we see that the fraction of subjects who consume >=10g/day
of tobacco increases from approximately 40% in the 25-34 age group to 51% in
the 45-54 and 55-64 age groups. However, the trend does not continue: only
about 40% of the subjects aged 65 and over consume >=10g/day.
| newalc
Age Group | <80g/day >=80g/day | Total
-----------+----------------------+----------
25-34 | 106 10 | 116
| 91.38 8.62 | 100.00
-----------+----------------------+----------
35-44 | 166 30 | 196
| 84.69 15.31 | 100.00
-----------+----------------------+----------
45-54 | 159 54 | 213
| 74.65 25.35 | 100.00
-----------+----------------------+----------
55-64 | 173 69 | 242
| 71.49 28.51 | 100.00
-----------+----------------------+----------
65-74 | 124 37 | 161
| 77.02 22.98 | 100.00
-----------+----------------------+----------
75+ | 39 5 | 44
| 88.64 11.36 | 100.00
-----------+----------------------+----------
Total | 767 205 | 972
| 78.91 21.09 | 100.00
Pearson chi2(5) = 27.9604 Pr = 0.000
Again we find a similar pattern -- the proportion of subjects that
consume 80+ g/day of alcohol increases from a low of 9% in the 25-34
age group to a high of 29% in the 55-64 age group then declines
to 23% in the 65-74 age group and 11% in the 75+ age group.
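The two Pearson chi-square statistics above can be checked directly from the
observed counts. The following is a minimal sketch in Python (the function name
pearson_chi2 is ours, not part of the Stata output); it uses the usual
expected count = row total * column total / N.

```python
# Observed counts copied from the two age-group tables above.
# Rows: 25-34, 35-44, 45-54, 55-64, 65-74, 75+.
age_by_newtob = [(70, 46), (109, 87), (104, 109), (117, 125), (99, 62), (26, 18)]
age_by_newalc = [(106, 10), (166, 30), (159, 54), (173, 69), (124, 37), (39, 5)]


def pearson_chi2(table):
    """Return (chi-square statistic, degrees of freedom) for an RxC table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2_tob, df_tob = pearson_chi2(age_by_newtob)
chi2_alc, df_alc = pearson_chi2(age_by_newalc)
print(f"age x newtob: chi2({df_tob}) = {chi2_tob:.4f}")
print(f"age x newalc: chi2({df_alc}) = {chi2_alc:.4f}")
```

Both statistics should agree with the Stata values (11.5898 and 27.9604) up
to rounding.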
1(b) The following is a cross-tabulation of Y and NEWALC:
| newalc | Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 96 104 | 200 0.4800
Controls | 109 663 | 772 0.1412
-----------------+------------------------+----------------------
Total | 205 767 | 972 0.2109
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 5.614679 | 3.985032 7.911235 (Cornfield)
Attr. frac. ex. | .8218954 | .749061 .8735975 (Cornfield)
Attr. frac. pop | .3945098 |
+-----------------------------------------------
chi2(1) = 109.57 Pr>chi2 = 0.0000
From this we see that 48% of the cases fall in the 80+ g/day of alcohol
category while only 14% of the controls fall in the 80+ g/day category.
This results in a disease odds ratio of 5.61 (95% CI 3.99, 7.91, p-value
<0.001) comparing 80+ g/day of alcohol consumption to <80 g/day of alcohol
consumption.
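As a check on the point estimates in the table, here is a short Python sketch
that recomputes the odds ratio and the two attributable fractions from the four
cell counts. The variable names are ours; the attributable fraction formulas
used are (OR-1)/OR among the exposed and that quantity times the proportion of
cases exposed for the population.

```python
# Cell counts from the cross-tabulation of Y and NEWALC above.
exposed_cases, unexposed_cases = 96, 104
exposed_controls, unexposed_controls = 109, 663

# Cross-product odds ratio: (a*d) / (b*c).
odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Proportion of cases that are exposed (0.48 in the table).
prop_cases_exposed = exposed_cases / (exposed_cases + unexposed_cases)

# Attributable fraction among the exposed and in the population.
attr_frac_exposed = (odds_ratio - 1) / odds_ratio
attr_frac_population = attr_frac_exposed * prop_cases_exposed

print(f"OR              = {odds_ratio:.6f}")
print(f"Attr. frac. ex. = {attr_frac_exposed:.7f}")
print(f"Attr. frac. pop = {attr_frac_population:.7f}")
```

The Cornfield confidence limits are not reproduced here since they require an
iterative calculation.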
1(c) We can now "adjust" for age by stratifying on AGE, calculating an
odds ratio (disease odds comparing alcohol groups) for each stratum, and then
combining these estimates into a common estimate via Mantel-Haenszel.
       Age Group |          OR    [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           25-34 |           .           0           .            0 (Cornfield)
           35-44 |    4.953846    1.348094    18.30423     .6632653 (Cornfield)
           45-54 |    5.665025    2.815057    11.40813     2.859155 (Cornfield)
           55-64 |    6.359477     3.45929     11.6933     3.793388 (Cornfield)
           65-74 |    2.580247     1.22547    5.436149     4.024845 (Cornfield)
             75+ |           .    4.388738           .            0 (Cornfield)
-----------------+-------------------------------------------------
           Crude |    5.614679    3.985032    7.911235              (Cornfield)
    M-H combined |    5.152126    3.558542    7.459349
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(3) =     3.77  Pr>chi2 = 0.2870
Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     84.89
                                                Pr>chi2 =    0.0000
We find that two strata (25-34 and 75+) have OR estimates of +infinity due
to zero cells. The resulting age adjusted odds ratio is 5.15 (95% CI 3.56,
7.46, p-value<0.001).
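The role of the zero-cell strata can be illustrated with a small Python
sketch. It assumes the usual Mantel-Haenszel form, sum(a*d/n) / sum(b*c/n),
with the M-H weight for a stratum equal to b*c/n. The stratum-level 2x2 counts
are not listed in this solution, so the counts in toy_strata are HYPOTHETICAL
and chosen only so that one stratum has a zero cell; the reported ORs and
weights are copied from the table above.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio.

    strata: list of (a, b, c, d) = (exposed cases, unexposed cases,
    exposed controls, unexposed controls) for each stratum.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# HYPOTHETICAL counts (not the study data). The second stratum has b = 0,
# so its own OR is infinite and its M-H weight b*c/n is 0.
toy_strata = [(4, 2, 2, 4), (3, 0, 1, 6)]
toy_mh = mantel_haenszel_or(toy_strata)
print(f"toy M-H OR = {toy_mh:.2f} (first-stratum OR alone is 4.00)")

# Reported (OR, M-H weight) pairs for the four strata with nonzero weight.
reported = [(4.953846, 0.6632653), (5.665025, 2.859155),
            (6.359477, 3.793388), (2.580247, 4.024845)]
weighted_avg = sum(or_j * w for or_j, w in reported) / sum(w for _, w in reported)
print(f"weighted average of the four finite ORs = {weighted_avg:.3f}")
```

The weighted average of the four finite stratum ORs is about 4.76, which is
smaller than the reported M-H estimate of 5.15. This is consistent with the
zero-weight strata still contributing their a*d/n terms to the numerator
rather than being ignored in the combined estimate.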
The age adjusted odds ratio compares the odds of disease among subjects who
consume 80+ g/day of alcohol to the odds of disease among subjects who consume
<80 g/day of alcohol for individuals from the *same age category*. That is, we
compare exposed and unexposed that are comparable in age (same age group). The
crude odds ratio simply compares the disease odds among the 80+ g/day subjects
versus the disease odds among the <80 g/day subjects. This comparison
does not control for age, so the groups being compared may differ
in their age distribution in addition to their alcohol consumption.
We find that the adjustment for age makes a modest impact on the
odds ratio estimate (a 100*(5.15-5.61)/5.61 = -8.2% change).
The test of homogeneity tests
H0: OR(1)=OR(2)=OR(3)=OR(4)=OR(5)=OR(6)
H1: at least one stratum has a different odds ratio
where OR(j) is the odds ratio obtained from stratum j. The resulting chi-square
statistic is 3.77 on 3 degrees of freedom (two strata dropped) yielding a
non-significant p-value of 0.287. Therefore we would not reject the
homogeneity hypothesis. We can also inspect the stratum-specific odds ratios
and see that there is no clear pattern in these estimates (e.g. steadily
increasing or decreasing odds ratios across age groups, which would indicate
systematic effect modification).
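The p-value quoted for the homogeneity test can be recovered from the
chi-square statistic and its degrees of freedom. Below is a minimal Python
sketch using only the standard library; the helper chi2_sf is our own name and
uses the closed-form upper tail for integer degrees of freedom. Because the
statistic is entered as the rounded value 3.77, the result agrees with Stata
only to about three decimals.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    tail = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


# Homogeneity test from the stratified alcohol analysis: chi2(3) = 3.77.
p_homogeneity = chi2_sf(3.77, 3)
print(f"Pr>chi2 = {p_homogeneity:.3f}")
```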
1(d) The following is a cross-tabulation of Y and NEWTOB:
| newtob | Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 122 78 | 200 0.6100
Controls | 325 447 | 772 0.4210
-----------------+------------------------+----------------------
Total | 447 525 | 972 0.4599
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 2.151243 | 1.566262 2.954605 (Cornfield)
Attr. frac. ex. | .5351524 | .3615372 .6615453 (Cornfield)
Attr. frac. pop | .326443 |
+-----------------------------------------------
chi2(1) = 22.85 Pr>chi2 = 0.0000
Here we see that 61% of the cases report 10+ g/day of tobacco consumption
while only 42% of the controls report 10+ g/day of tobacco consumption. This
results in an estimated disease odds ratio of 2.15 (95% CI 1.57, 2.95,
p-value<0.001).
1(e) We can now "adjust" for age by stratifying on AGE, calculating an
odds ratio (disease odds comparing tobacco groups) for each stratum, and then
combining these estimates into a common estimate via Mantel-Haenszel.
       Age Group |          OR    [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           25-34 |           .           0           .            0 (Cornfield)
           35-44 |     4.68125     1.06965           .     .8163265 (Cornfield)
           45-54 |    2.671614    1.339354    5.321494     5.061033 (Cornfield)
           55-64 |    2.536216    1.441307    4.460941     7.644628 (Cornfield)
           65-74 |    1.385399    .7159378    2.682346      7.31677 (Cornfield)
             75+ |    2.121212    .5892296    7.650982          1.5 (Cornfield)
-----------------+-------------------------------------------------
           Crude |    2.151243    1.566262    2.954605              (Cornfield)
    M-H combined |    2.267489    1.613024    3.187496
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(4) =     3.27  Pr>chi2 = 0.5131
Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     22.65
                                                Pr>chi2 =    0.0000
The age adjusted odds ratio is estimated as 2.27 (95% CI 1.61, 3.19,
p-value<0.001). This is a comparison of the odds of disease among
subjects who consume 10+ g/day of tobacco relative to the disease
odds for subjects who consume <10 g/day of tobacco *where the subjects
are of equivalent ages* (here equivalent is in terms of the age group).
We find that the adjustment for age does not make a large impact on the
odds ratio estimate (a 100*(2.27-2.15)/2.15 = 5.6% change).
The crude odds ratio estimate, 2.15, does not make the exposure comparison
within each age group; it does not consider age at all in the
comparison of the exposed and unexposed groups.
The test of homogeneity tests
H0: OR(1)=OR(2)=OR(3)=OR(4)=OR(5)=OR(6)
H1: at least one stratum has a different odds ratio
where OR(j) is the odds ratio obtained from stratum j. The resulting chi-square
statistic is 3.27 on 4 degrees of freedom (one stratum dropped) yielding a
non-significant p-value of 0.513. Therefore we would not reject the
homogeneity hypothesis. We can also inspect the stratum-specific odds ratios
and find that there is a general pattern of decreasing odds ratios with
increasing age group. This is suggestive of effect modification but we have
not formally tested this trend (we can do so using logistic regression!).
1(f) Consider the association between alcohol and tobacco. For this we
can create a simple 2x2 table of NEWALC versus NEWTOB and the RxC table
of ALC versus TOB:
| newalc
newtob | <80g/day >=80g/day | Total
-----------+----------------------+----------
<10g/day | 440 85 | 525
| 83.81 16.19 | 100.00
-----------+----------------------+----------
>=10g/day | 327 120 | 447
| 73.15 26.85 | 100.00
-----------+----------------------+----------
Total | 767 205 | 972
| 78.91 21.09 | 100.00
Pearson chi2(1) = 16.4704 Pr = 0.000
| Alcohol
Tobacco | <40g/day 40-79g/da 80-119g/d 120+g/day | Total
-----------+--------------------------------------------+----------
0-9g/day | 261 179 61 24 | 525
| 49.71 34.10 11.62 4.57 | 100.00
-----------+--------------------------------------------+----------
10-19g/day | 81 85 49 18 | 233
| 34.76 36.48 21.03 7.73 | 100.00
-----------+--------------------------------------------+----------
20-29g/day | 42 62 16 12 | 132
| 31.82 46.97 12.12 9.09 | 100.00
-----------+--------------------------------------------+----------
30+g/day | 28 29 12 13 | 82
| 34.15 35.37 14.63 15.85 | 100.00
-----------+--------------------------------------------+----------
Total | 412 355 138 67 | 972
| 42.39 36.52 14.20 6.89 | 100.00
Pearson chi2(9) = 44.8060 Pr = 0.000
For each of these tables we find that the Pearson chi-square rejects
the null hypothesis of homogeneity. From the 2x2 table we see that
only 16% of the <10 g/day tobacco consumers report 80+ g/day of alcohol
consumption while 27% of the 10+ g/day tobacco consumers report 80+ g/day
of alcohol consumption. A similar moderate pattern of increased tobacco use
associated with increased alcohol use is seen in the RxC table.
The statistical significance of the association can be addressed by
the chi-square tests of homogeneity. Each rejects the null hypothesis
of no association. The magnitude of the association is nicely summarized
by the odds ratio estimate for NEWALC versus NEWTOB: 1.90 (95% CI 1.39,
2.59, p-value<0.001). This implies that the odds of 80+ g/day of alcohol
consumption among the 10+ g/day tobacco consumers is 1.90 times the odds
of 80+ g/day alcohol consumption among the <10 g/day tobacco consumers.
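The odds ratio of 1.90 can be recomputed from the 2x2 table of NEWTOB by
NEWALC. The Python sketch below also computes a log-based (Woolf) 95%
interval. This is an ASSUMPTION on our part: the solution does not say which
interval method produced (1.39, 2.59), and the other tables report Cornfield
limits, so the Woolf limits are expected to be close but not identical.

```python
import math

# Counts from the 2x2 table of NEWTOB (rows) by NEWALC (columns) above.
low_tob_low_alc, low_tob_high_alc = 440, 85
high_tob_low_alc, high_tob_high_alc = 327, 120

# Odds of 80+ g/day alcohol among 10+ g/day tobacco users, relative to
# the same odds among <10 g/day tobacco users.
or_alc_tob = (high_tob_high_alc * low_tob_low_alc) / (
    high_tob_low_alc * low_tob_high_alc
)

# ASSUMPTION: Woolf standard error of log(OR), sqrt(sum of 1/cell count).
se_log_or = math.sqrt(
    1 / low_tob_low_alc
    + 1 / low_tob_high_alc
    + 1 / high_tob_low_alc
    + 1 / high_tob_high_alc
)
z = 1.96
ci_low = math.exp(math.log(or_alc_tob) - z * se_log_or)
ci_high = math.exp(math.log(or_alc_tob) + z * se_log_or)

print(f"OR = {or_alc_tob:.2f}, Woolf 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The Woolf interval comes out at roughly (1.39, 2.60), in line with the
interval quoted above.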
1(g) We can compute disease odds ratios for the exposure = NEWALC while
controlling for both AGE and TOB using Mantel-Haenszel after stratifying
in the combined levels of (AGE,TOB):
[FIRST REVISIT THE AGE ADJUSTED OR FROM 1(c)]:
Mantel-Haenszel estimate of the odds ratio
Comparing newalc==1 vs newalc==0, controlling for age
----------------------------------------------------------------
Odds ratio chi2(1) P>chi2 [95% Conf. Interval]
----------------------------------------------------------------
5.152126 84.89 0.0000 3.491326 7.602959
----------------------------------------------------------------
[NOW LOOK AT AGE AND TOB ADJUSTED OR]:
Mantel-Haenszel estimate of the odds ratio
Comparing newalc==1 vs newalc==0, controlling for age
by tob
Note: only 18 of the 24 strata formed in this analysis
contribute information about the effect of the explanatory variable
----------+--------------------------------------------------------------------
tob | Odds ratio chi2(1) P>chi2 [95% Conf. Interval]
----------+--------------------------------------------------------------------
0-9g/day | 6.052225 43.33 0.0000 3.283540 11.15547
10-19g/d | 4.031735 18.93 0.0000 2.042358 7.958883
20-29g/d | 3.786274 7.72 0.0055 1.378896 10.39663
30+g/day | 5.849858 6.31 0.0120 1.223661 27.96595
----------+--------------------------------------------------------------------
Mantel-Haenszel estimate controlling for age and tob
----------------------------------------------------------------
Odds ratio chi2(1) P>chi2 [95% Conf. Interval]
----------------------------------------------------------------
4.852866 73.45 0.0000 3.253018 7.239526
----------------------------------------------------------------
Test of homogeneity of ORs (approx): chi2(3) = 1.08
Pr>chi2 = 0.7825
The resulting AGE and TOB adjusted disease odds ratio comparing 80+ g/day
alcohol use to <80 g/day alcohol use is estimated as 4.85 (95% CI 3.25,
7.24, p-value<0.001). This now compares the disease odds among subjects
who consume 80+ g/day of alcohol to the disease odds among subjects who
consume <80 g/day of alcohol where the comparison is for subjects that
otherwise have the same age and the same tobacco consumption. The crude
OR estimate didn't "fix" either age or tobacco use when making the alcohol
group comparison, while the age adjusted OR in 1(c) only controlled for
age.
We see a modest change in the age and tobacco adjusted OR estimate
relative to the crude estimate (a 100*(4.85-5.61)/5.61 = -13.5% change).
Inspection of the odds ratios in the above table suggests that there is no
systematic variation in the age adjusted odds ratios within tobacco strata.
The homogeneity test above fails to reject the null hypothesis that OR(1)=
OR(2)=OR(3)=OR(4). (This is a subtle test -- by pooling over the age
groups within each level of tobacco we are assuming that the exposure odds
ratio may depend on the level of tobacco but not on the age group within
tobacco. The homogeneity test then asks whether these tobacco-specific exposure
odds ratios are equal. This is analogous to using a regression model with
NEWALC + TOB(j)'s + AGE(k)'s + NEWALC*TOB(j)'s + TOB(j)*AGE(k)'s and then
testing the interaction coefficients NEWALC*TOB(j). Tricky!).
1(h) We can compute disease odds ratios for the exposure = NEWTOB while
controlling for both AGE and ALC using Mantel-Haenszel after stratifying
in the combined levels of (AGE,ALC):
[FIRST REVISIT THE AGE ADJUSTED OR FROM 1(e)]:
Mantel-Haenszel estimate of the odds ratio
Comparing newtob==1 vs newtob==0, controlling for age
----------------------------------------------------------------
Odds ratio chi2(1) P>chi2 [95% Conf. Interval]
----------------------------------------------------------------
2.267489 22.65 0.0000 1.603275 3.206878
----------------------------------------------------------------
[NOW LOOK AT AGE AND ALC ADJUSTED OR]:
Mantel-Haenszel estimate of the odds ratio
Comparing newtob==1 vs newtob==0, controlling for age
by alc
Note: only 18 of the 24 strata formed in this analysis
contribute information about the effect of the explanatory variable
----------+--------------------------------------------------------------------
alc | Odds ratio chi2(1) P>chi2 [95% Conf. Interval]
----------+--------------------------------------------------------------------
<40g/day | 4.593489 13.72 0.0002 1.891396 11.15586
40-79g/d | 1.540453 2.23 0.1352 0.870096 2.727282
80-119g/ | 1.555876 1.20 0.2741 0.699999 3.458217
120+g/da | 1.317771 0.22 0.6414 0.411093 4.224156
----------+--------------------------------------------------------------------
Mantel-Haenszel estimate controlling for age and alc
----------------------------------------------------------------
Odds ratio chi2(1) P>chi2 [95% Conf. Interval]
----------------------------------------------------------------
1.901541 11.45 0.0007 1.302184 2.776766
----------------------------------------------------------------
Test of homogeneity of ORs (approx): chi2(3) = 5.20
Pr>chi2 = 0.1577
The resulting AGE and ALC adjusted disease odds ratio comparing 10+ g/day
tobacco use to <10 g/day tobacco use is estimated as 1.90 (95% CI 1.30,
2.78, p-value<0.001). This now compares the disease odds among subjects
who consume 10+ g/day of tobacco to the disease odds among subjects who
consume <10 g/day of tobacco where the comparison is for subjects that
otherwise have the same age and the same alcohol consumption. The crude
OR estimate didn't "fix" either age or alcohol use when making the tobacco
group comparison, while the age adjusted OR in 1(e) only controlled for
age.
We see a modest change in the age and alcohol adjusted OR estimate
relative to the crude estimate (a 100*(1.90-2.15)/2.15 = -11.6% change).
Inspection of the odds ratios in the above table suggests that there is no
systematic variation in the age adjusted odds ratios within alcohol
strata, although the lowest alcohol category (<40 g/day) has an OR estimate
that is much larger than the remaining estimates. The homogeneity test above
fails to reject the null hypothesis that OR(1)=OR(2)=OR(3)=OR(4).
2(a)
The dependent variable is LBW, or low birth weight (1=weight <= 2500g,
0 otherwise).
pi(X) is the probability of low birth weight as a function of covariates, X.
The hypothesized models are:
*Model 1:
pi(X) = ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2) + b4*RACE(3) +
b5*HYPER + b6*URIRR + b7*AGE20 ) )
/ ( 1 + ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2) + b4*RACE(3) +
b5*HYPER + b6*URIRR + b7*AGE20 ) ) )
*Model 2:
pi(X) = ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2) + b4*RACE(3) +
b5*HYPER + b6*URIRR + b7*AGE20 + b8*SMOKE*AGE20 ) )
/ ( 1 + ( exp( b0 + b1*SMOKE + b2*AnyPTL + b3*RACE(2) + b4*RACE(3) +
b5*HYPER + b6*URIRR + b7*AGE20 + b8*SMOKE*AGE20 ) ) )
The fitted models are:
*Model 1:
pi(X) = ( exp( -2.003 + .896*SMOKE + 1.317*AnyPTL + .962*RACE(2) +
.951*RACE(3) + 1.364*HYPER + .769*URIRR - 0.051*AGE20 ) )
/ ( 1 + ( exp( -2.003 + .896*SMOKE + 1.317*AnyPTL + .962*RACE(2) +
.951*RACE(3) + 1.364*HYPER + .769*URIRR - 0.051*AGE20 ) ) )
*Model 2:
pi(X) = ( exp( -1.905 + .713*SMOKE + 1.295*AnyPTL + .870*RACE(2) +
.906*RACE(3) + 1.396*HYPER + .838*URIRR - .082*AGE20 + .066*SMOKE*AGE20))
/ ( 1 + ( exp( -1.905 + .713*SMOKE + 1.295*AnyPTL + .870*RACE(2) +
.906*RACE(3) + 1.396*HYPER + .838*URIRR - .082*AGE20 + .066*SMOKE*AGE20)))
2(b)
Model 1:
logit[pi(X)] = -2.003 + .896*SMOKE + 1.317*AnyPTL + .962*RACE(2) +
.951*RACE(3) + 1.364*HYPER + .769*URIRR - 0.051*AGE20
Model 2:
logit[pi(X)] = -1.905 + .713*SMOKE + 1.295*AnyPTL + .870*RACE(2) +
.906*RACE(3) + 1.396*HYPER + .838*URIRR - .082*AGE20 + .066*SMOKE*AGE20
2(c)
Smoker:
X1 = ( 1, 0, 0, 0, 1, 0, 30-20 )
X1*beta = -2.003 + .896*(1) + 1.317*(0) + .962*(0) +
.951*(0) + 1.364*(1) + .769*(0) - 0.051*(10)
= -0.253
pi(X1) = ( exp( X1*beta ) ) / ( 1 + exp( X1*beta ) )
= exp(-0.253) / (1 + exp(-0.253))
= 0.437
Non-smoker:
X0 = ( 0, 0, 0, 0, 1, 0, 30-20 )
X0*beta = -2.003 + .896*(0) + 1.317*(0) + .962*(0) +
.951*(0) + 1.364*(1) + .769*(0) - 0.051*(10)
= -1.149
pi(X0) = ( exp( X0*beta ) ) / ( 1 + exp( X0*beta ) )
= exp(-1.149) / (1 + exp(-1.149))
= 0.241
RR = .437 / .241 = 1.816
Smokers with hypertension, white, without history of PTL, without uterine
irritability, age 30, are 1.816 times as likely to have a low birth weight
child compared to non-smokers with hypertension, white, without history
of PTL, without uterine irritability, age 30.
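The calculation in 2(c) can be sketched in Python as follows. The dictionary
keys and the helper name predicted_probability are ours; the coefficients are
the fitted Model 1 values from 2(a), and any covariate not listed in a profile
is taken to be 0.

```python
import math

# Fitted Model 1 coefficients from 2(a); AGE20 is age minus 20.
model1 = {
    "const": -2.003, "SMOKE": 0.896, "AnyPTL": 1.317, "RACE2": 0.962,
    "RACE3": 0.951, "HYPER": 1.364, "URIRR": 0.769, "AGE20": -0.051,
}


def predicted_probability(coefs, covariates):
    """Inverse logit of the linear predictor; omitted covariates count as 0."""
    eta = coefs["const"] + sum(
        coefs[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-eta))


# White, hypertension, no PTL history, no uterine irritability, age 30.
profile = {"HYPER": 1, "AGE20": 30 - 20}
p_smoker = predicted_probability(model1, {**profile, "SMOKE": 1})
p_nonsmoker = predicted_probability(model1, {**profile, "SMOKE": 0})
risk_ratio = p_smoker / p_nonsmoker

print(f"pi(X1) = {p_smoker:.3f}, pi(X0) = {p_nonsmoker:.3f}, RR = {risk_ratio:.3f}")
```

Note that the ratio of the unrounded probabilities is 1.816, while dividing
the rounded values .437 / .241 gives 1.813.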
2(d)
Smoker:
X1 = ( 1, 0, 0, 0, 1, 0, 30-20, 1*(30-20) )
X1*beta = -1.905 + .713*(1) + 1.295*(0) + .870*(0) + .906*(0) +
1.396*(1) + .838*(0) - .082*(10) + .066*(1)*(10)
= 0.044
pi(X1) = ( exp( X1*beta ) ) / ( 1 + exp( X1*beta ) )
= exp(0.044) / (1 + exp(0.044))
= 0.511
Non-smoker:
X0 = ( 0, 0, 0, 0, 1, 0, 30-20, 0*(30-20) )
X0*beta = -1.905 + .713*(0) + 1.295*(0) + .870*(0) + .906*(0) +
1.396*(1) + .838*(0) - .082*(10) + .066*(0)*(10)
= -1.329
pi(X0) = ( exp( X0*beta ) ) / ( 1 + exp( X0*beta ) )
= exp(-1.329) / (1 + exp(-1.329))
= 0.209
RR = .511 / .209 = 2.441
Smokers with hypertension, white, without history of PTL, without uterine
irritability, age 30, are 2.441 times as likely to have a low birth weight
child compared to non-smokers with hypertension, white, without history
of PTL, without uterine irritability, age 30.
The estimates from 2(c) and 2(d) differ because of the SMOKE*AGE20
interaction term in model 2.
2(e) AGE = 20
Model 1:
OR = exp( 0.896 ) = 2.450
Model 2:
OR = exp( 0.713 + 0.066(20-20) ) = exp( 0.713 ) = 2.040
2(f) AGE = 30
Model 1:
OR = exp( 0.896 ) = 2.450
Model 2:
OR = exp( 0.713 + 0.066(30-20) ) = exp( 0.713 + 0.660 ) = 3.947
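The contrast between 2(e) and 2(f) is that the smoking odds ratio is constant
in age under Model 1 but varies with age under Model 2. A small Python sketch,
using the fitted coefficients from 2(a) (the constant and function names are
ours):

```python
import math

# Fitted coefficients from 2(a).
B_SMOKE_MODEL1 = 0.896
B_SMOKE_MODEL2 = 0.713
B_SMOKE_AGE20_MODEL2 = 0.066  # SMOKE*AGE20 interaction, AGE20 = age - 20


def smoking_or_model2(age):
    """Smoking odds ratio at a given age under Model 2."""
    return math.exp(B_SMOKE_MODEL2 + B_SMOKE_AGE20_MODEL2 * (age - 20))


# Under Model 1 there is no interaction, so the OR is the same at every age.
or_model1 = math.exp(B_SMOKE_MODEL1)

for age in (20, 30):
    print(f"age {age}: Model 1 OR = {or_model1:.3f}, "
          f"Model 2 OR = {smoking_or_model2(age):.3f}")
```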
3(a)
Dependent variable: Incident esophageal cancer (case=1, control=0).
pi(X) is the probability of esophageal cancer.
3(b)
logit[pi(X)] = b0 + b1*NewAlc
3(c)
OR = exp(b1) = exp(1.668543) = 5.304
95% CI = [3.658, 7.691]
This OR estimate is very similar to the estimate obtained in 1(c) and these
odds ratio estimates have the same interpretation. After adjusting for
age, the odds of esophageal cancer among those who drink 80g or more of
alcohol per day is 5.3 times the odds for those who drink
less than 80g of alcohol per day.
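The odds ratio in 3(c) is simply the exponentiated NEWALC coefficient. The
Python sketch below also backs out the standard error implied by the reported
interval. This rests on an ASSUMPTION that the interval is a Wald interval,
exp(b1 +/- 1.96*SE); the standard error itself is not reported in this
solution, so implied_se is a derived quantity, not a value from the output.

```python
import math

b1 = 1.668543                 # NEWALC coefficient reported in 3(c)
reported_ci = (3.658, 7.691)  # 95% CI reported in 3(c)

odds_ratio = math.exp(b1)

# ASSUMPTION: Wald interval, symmetric about b1 on the log scale.
implied_se = (math.log(reported_ci[1]) - math.log(reported_ci[0])) / (2 * 1.96)
wald_ci = (
    math.exp(b1 - 1.96 * implied_se),
    math.exp(b1 + 1.96 * implied_se),
)

print(f"OR = {odds_ratio:.3f}")
print(f"implied SE of b1 = {implied_se:.4f}")
print(f"Wald 95% CI = ({wald_ci[0]:.3f}, {wald_ci[1]:.3f})")
```

If the rebuilt interval matches the reported one, the reported limits are
symmetric about b1 on the log scale, as a Wald interval would be.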
In the stratified analysis we explicitly "controlled" for age by separating
the data into age groups, calculating an OR for each stratum, and then combining
them into a common estimate. In the regression setting, we "adjust" for age
by including age dummy variables, and then obtain a common odds ratio estimate
from the main effect for NEWALC. If we had put in NEWALC*AGE(j) interactions
then the NEWALC odds ratio would be different for the levels of AGE, i.e.
stratum specific.
Using regression we can structure the associations (common, strata specific)
and we obtain summaries of the influence of the potential confounder via
the coefficient estimates for that variable.
3(d)
OR = exp(b1) = exp(1.6241) = 5.074
95% CI = [3.464, 7.432]
This OR estimate is very similar to the estimate obtained in 1(g) and these
odds ratio estimates have the same interpretation. After adjusting for
age and tobacco consumption, the odds of esophageal cancer among
subjects who drink 80g or more of alcohol per day is 5.1 times
the odds of disease for subjects who drink less than
80g of alcohol per day.