***************************
*  Biostatistics 513      *
*  Exercise Set 6, 2002   *
***************************

1(a) In the PBC study t=0 corresponds to the time that patients
     enrolled in the study -- "...was the date of determination of
     eligibility for the trials, and all of the clinical, biochemical
     and demographic risk factors were assessed on that date." (page 2).

1(b) Death from any cause was treated as the event (failure).

1(c) Censoring was due to:

     1) end of study follow-up (July 1986) (n=160 alive and censored)
     2) lost to follow-up (n=8 patients)
     3) had undergone liver transplantation (n=19 patients)

1(d) Justify independent censoring:

     1) End of study is a fixed, predetermined time and censoring
        due to this is always independent of the event time.

     2) Hard to justify without more information. We would want
        to know if there was any cause to believe that these patients
        were not representative. In the end, this is a minor issue
        since it pertains to a small number of patients (particularly
        in light of the relatively large number of observed failures,
        n=125).

     3) Hard to justify without more information. This censoring
        would be the most suspect in terms of relationship to event
        time -- are patients that are particularly well or particularly
        ill selected for transplantation? If so, this censoring would
        be related to the failure time.

1(e) The approximate S( 5 years ) for each of the groups in Figure 3 is:

     High:    20%
     Medium:  50%
     Low:     85%

2.
The Kaplan-Meier Table is:

          At                             Conditional   Survivor
 Time    Risk   Died   Lost   Survived   Probability   Function
----------------------------------------------------------------
   42     12      1      0       11        0.9167       0.9167
   53     11      1      0       10        0.9091       0.8333
   57     10      1      0        9        0.9000       0.7500
   63      9      1      0        8        0.8889       0.6667
   81      8      1      0        7        0.8750       0.5833
  140      7      1      0        6        0.8571       0.5000
  176      6      1      0        5        0.8333       0.4167
  210      5      0      1        4        1.0000       0.4167
  252      4      1      0        3        0.7500       0.3125
  476      3      0      1        2        1.0000       0.3125
  524      2      1      0        1        0.5000       0.1562
 1037      1      0      1        0        1.0000       0.1562
----------------------------------------------------------------

3(a) Univariate summaries:

**********
*  Age   *
**********

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     age |      90    64.61111   10.79606         41         86

Variable |     Obs  Percentile     Centile     [95% Conf. Interval]
---------+-------------------------------------------------------------
     age |      90         10          49          47          52
         |                 25       56.75          52          61
         |                 50          65          63          68
         |                 75       72.25    69.69499          76
         |                 90        78.9          76    81.72632

**********
*  Year  *
**********

----------+-----------
  year of |
    entry |      Freq.
----------+-----------
       70 |          2
       71 |         12
       72 |          9
       73 |         11
       74 |         14
       75 |          8
       76 |         19
       77 |         11
       78 |          4
----------+-----------

**********
* Stage  *
**********

----------+-----------
 stage at |
diagnosis |      Freq.
----------+-----------
        1 |         33
        2 |         17
        3 |         27
        4 |         13
----------+-----------

3(b) The fraction of patients observed to die is clearly higher at
     the later stages at enrollment: 63% at stage 3 and 85% at
     stage 4, compared with 45% at stage 1 and 41% at stage 2.

  stage at |        status
 diagnosis |         0          1 |     Total
-----------+----------------------+----------
         1 |        18         15 |        33
           |     54.55      45.45 |    100.00
-----------+----------------------+----------
         2 |        10          7 |        17
           |     58.82      41.18 |    100.00
-----------+----------------------+----------
         3 |        10         17 |        27
           |     37.04      62.96 |    100.00
-----------+----------------------+----------
         4 |         2         11 |        13
           |     15.38      84.62 |    100.00
-----------+----------------------+----------
     Total |        40         50 |        90
           |     44.44      55.56 |    100.00

3(c) We do not find any trends in the stage of diagnosis over time.
     A Pearson's chi-square test yields X2=24.73 on 24 degrees of
     freedom with a p-value of 0.420.

   year of |             stage at diagnosis
     entry |         1          2          3          4 |     Total
-----------+--------------------------------------------+----------
        70 |         2          0          0          0 |         2
           |    100.00       0.00       0.00       0.00 |    100.00
-----------+--------------------------------------------+----------
        71 |         6          2          3          1 |        12
           |     50.00      16.67      25.00       8.33 |    100.00
-----------+--------------------------------------------+----------
        72 |         2          1          5          1 |         9
           |     22.22      11.11      55.56      11.11 |    100.00
-----------+--------------------------------------------+----------
        73 |         5          3          2          1 |        11
           |     45.45      27.27      18.18       9.09 |    100.00
-----------+--------------------------------------------+----------
        74 |         6          1          6          1 |        14
           |     42.86       7.14      42.86       7.14 |    100.00
-----------+--------------------------------------------+----------
        75 |         5          1          1          1 |         8
           |     62.50      12.50      12.50      12.50 |    100.00
-----------+--------------------------------------------+----------
        76 |         4          3          8          4 |        19
           |     21.05      15.79      42.11      21.05 |    100.00
-----------+--------------------------------------------+----------
        77 |         2          4          2          3 |        11
           |     18.18      36.36      18.18      27.27 |    100.00
-----------+--------------------------------------------+----------
        78 |         1          2          0          1 |         4
           |     25.00      50.00       0.00      25.00 |    100.00
-----------+--------------------------------------------+----------
     Total |        33         17         27         13 |        90
           |     36.67      18.89      30.00      14.44 |    100.00

     We find a similar summary of the age distribution for each
     level of stage.

-> stage= 1

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     age |      33    64.18182   11.25959         43         86

-> stage= 2

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     age |      17    64.82353   10.64328         47         86

-> stage= 3

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     age |      27    63.81481   10.29203         49         82

-> stage= 4

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     age |      13    67.07692    11.7151         41         84

3(d) SEE web page for postscript files with the Kaplan-Meier curves.

     The point estimate at t=5 is 0.5313. This estimates the
     proportion of the laryngeal cancer patients that would survive
     beyond 5 years after enrollment.

3(e) SEE web page for postscript files.

3(f) Log rank test:

     H0: survival is the same in each group
     H1: survival is not the same in each group

     The results of the test are given below and indicate that the
     survival curves are indeed significantly different. This is in
     agreement with what the Kaplan-Meier curves suggest.

Log-rank test for equality of survivor functions
------------------------------------------------

        |   Events
stage34 |  observed       expected
--------+-------------------------
0       |        22          32.58
1       |        28          17.42
--------+-------------------------
Total   |        50          50.00

              chi2(1) =      10.13
              Pr>chi2 =     0.0015

3(g) SEE web page for postscript files.

3(h) We can also use the log-rank test to test whether the survival
     functions differ by stage. This becomes a chi-square with
     4-1=3 degrees of freedom. Below the results of the test
     indicate a significant difference. Comparing the observed and
     the expected numbers of events we see that more events occurred
     in the higher stage categories (most markedly stage 4) than
     would be expected if there was no association with stage.

Log-rank test for equality of survivor functions
------------------------------------------------

      |   Events
stage |  observed       expected
------+-------------------------
1     |        15          22.57
2     |         7          10.01
3     |        17          14.08
4     |        11           3.34
------+-------------------------
Total |        50          50.00

            chi2(3) =      22.76
            Pr>chi2 =     0.0000

(optional)

     Defining a binary variable pre75=1 if year<75 and 0 otherwise
     allows us to create separate KM curves and log rank tests based
     on the value of pre75. These graphs and confirmatory tests do
     not suggest any change in the comparison of stage (3,4 versus
     1,2) over calendar time.

     SEE web page for postscript plots.

************
* pre75==1 *
************

Log-rank test for equality of survivor functions
------------------------------------------------

        |   Events
stage34 |  observed       expected
--------+-------------------------
0       |        17          22.28
1       |        16          10.72
--------+-------------------------
Total   |        33          33.00

              chi2(1) =       3.94
              Pr>chi2 =     0.0472

************
* pre75==0 *
************

Log-rank test for equality of survivor functions
------------------------------------------------

        |   Events
stage34 |  observed       expected
--------+-------------------------
0       |         5          10.16
1       |        12           6.84
--------+-------------------------
Total   |        17          17.00

              chi2(1) =       6.69
              Pr>chi2 =     0.0097
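
(supplementary sketch for Problem 2)

     The Kaplan-Meier table in Problem 2 can be checked with a short
     product-limit calculation. The Python below is only an
     illustrative sketch, not the code used to produce these
     solutions: the function name kaplan_meier and the variable
     names times, died and rows are hypothetical, and the event
     indicators are read off the Died and Lost columns of the table.

```python
from itertools import groupby


def kaplan_meier(times, died):
    """Product-limit estimate.

    times: follow-up time for each subject
    died:  1 if the subject died at that time, 0 if lost (censored)

    Returns one row per distinct time:
    (time, at_risk, deaths, lost, conditional_prob, survivor_function)
    """
    data = sorted(zip(times, died))
    at_risk = len(data)
    surv = 1.0
    rows = []
    for t, group in groupby(data, key=lambda r: r[0]):
        group = list(group)
        deaths = sum(1 for _, event in group if event)
        lost = len(group) - deaths
        # Conditional probability of surviving past t given at risk at t
        cond = (at_risk - deaths) / at_risk
        surv *= cond
        rows.append((t, at_risk, deaths, lost, cond, surv))
        # Everyone observed at t leaves the risk set afterwards
        at_risk -= deaths + lost
    return rows


# Times and event indicators transcribed from the Problem 2 table
# (assumption: Died=1 means an event, Lost=1 means censored).
times = [42, 53, 57, 63, 81, 140, 176, 210, 252, 476, 524, 1037]
died = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0]

rows = kaplan_meier(times, died)
for t, n, d, c, cond, surv in rows:
    print(f"{t:5d} {n:4d} {d:4d} {c:4d} {cond:8.4f} {surv:8.4f}")
```

     Each row multiplies the running survivor function by the
     conditional probability (at risk - died) / (at risk), so a
     censored time leaves the estimate unchanged and only shrinks
     the risk set for later times.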