Set up/Background
Our goal here is to validate the accuracy of the survivalROC
package. Specifically, given a "true" ROC curve at a time of
interest, we want to see how close the ROC curve produced by the
survivalROC package from simulated data comes to it. We describe
the background, how we create the "true" ROC curve, and how we
compare the "true" and simulated ROC curves. For reference, we
include the simulation file, the output file and a pdf file
containing the two ROC curves. We also provide a link to the
documentation file for the survivalROC package.
We start with the setup for simulation and how to create the
"true" ROC curve.
We simulate survival times and marker values from a bivariate
normal distribution with mean μ = (0, 0)^T, correlation
ρ = -0.70, and unit variances. Censoring times are generated
independently from a normal distribution, chosen so that about
40% of the observations are censored. For ease of
understanding, let the unit of time be one month. From these data
we compute the False Positive (FP) and True Positive (TP) rates
at 1 month. Let n denote the sample size and Nsim the number of
repetitions. In each of the Nsim repetitions we generate data for
n individuals, which gives Nsim sets of (FP, TP) values. Note
that the marker cutoffs that produce the (FP, TP) values change
from one simulation to the next, and hence so do the sets of
(FP, TP) values. We want to compare the
"true" ROC curve with the simulated one using two sets of points
on the curves, and raw (FP, TP) pairs are difficult to summarize
without some standardization. To create the "true" ROC curve,
we therefore fix the FP rates at 0.01, 0.02, …, 0.99 and estimate
the TP rate at each of them. To do so, within each simulation we
first estimate the (FP, TP) pairs and then interpolate the TP
rate at each fixed FP rate. We then average the interpolated TP
rates over the Nsim simulations to estimate the true TP rate at
each FP rate from 0.01 to 0.99. In short, to compare the two ROC
curves we fix the FP rates and, at each one, compare the true TP
rate with the simulated TP rate.
Note that the true (FP, TP) pair can be obtained easily for a
given cutoff value of the marker. What we need, however, is the
TP rate at a given FP rate, and some further calculation is
necessary to obtain it.
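The interpolate-then-average step can be sketched as follows. This is a minimal Python illustration written for this page, not the contents of survivalROCsim.R (which is an R file). It assumes the empirical (FP, TP) points are computed from uncensored draws using the definitions TP(c) = P(marker > c | T <= t) and FP(c) = P(marker > c | T > t). The function names, the seed and the sample sizes are hypothetical.

```python
import random
from bisect import bisect_right
from math import sqrt

RHO = -0.70                                 # marker/survival correlation from the text
T_STAR = 1.0                                # time of interest (1 month)
FP_GRID = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99


def simulate_pairs(n, rng):
    """Draw n (marker, survival time) pairs from a standard bivariate normal."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = RHO * x + sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((x, t))
    return pairs


def empirical_roc(pairs, t_star):
    """Return (FP, TP) points from (0, 0) to (1, 1), one per marker cutoff."""
    cases = sum(1 for _, t in pairs if t <= t_star)
    controls = len(pairs) - cases
    if cases == 0 or controls == 0:
        raise ValueError("need at least one case and one control at t_star")
    tp_count = fp_count = 0
    points = [(0.0, 0.0)]
    # Lower the cutoff one observation at a time, largest marker first.
    for _, t in sorted(pairs, reverse=True):
        if t <= t_star:
            tp_count += 1
        else:
            fp_count += 1
        points.append((fp_count / controls, tp_count / cases))
    return points


def interpolate_tp(points, fp_grid):
    """Linearly interpolate TP at each FP in fp_grid (each must lie in [0, 1))."""
    fps = [fp for fp, _ in points]
    out = []
    for target in fp_grid:
        i = bisect_right(fps, target)  # points[i - 1] is the last point with FP <= target
        fp0, tp0 = points[i - 1]
        fp1, tp1 = points[i]
        out.append(tp0 + (tp1 - tp0) * (target - fp0) / (fp1 - fp0))
    return out


def averaged_tp(n, nsim, seed=1):
    """Average the interpolated TP values over nsim simulated data sets."""
    rng = random.Random(seed)
    totals = [0.0] * len(FP_GRID)
    for _ in range(nsim):
        points = empirical_roc(simulate_pairs(n, rng), T_STAR)
        for k, tp in enumerate(interpolate_tp(points, FP_GRID)):
            totals[k] += tp
    return [total / nsim for total in totals]
```

Calling averaged_tp(n=500, nsim=20) returns 99 averaged TP values, one per FP rate on the grid. Because every simulation is evaluated on the same FP grid, the values can be averaged position by position even though the marker cutoffs differ between simulations.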
The simulation file (survivalROCsim.R) is
discussed next. Part 1 of the file obtains the TP values
corresponding to the FP values 0.01, 0.02, …, 0.99 using the
setup described above. Part 2 of the file runs the simulations
for sample sizes 200, 400, 800, 1600 and 3200. For each sample
size, the simulated TP values are based on 500 simulations. Be
warned that a single simulation with a large sample size (> 500)
takes a considerable amount of time, so we advise replacing Nsim
in the file with a much smaller number. Even a summary based on
only 10 simulations shows reasonable agreement between the true
and estimated TP rates. Once the survivalROC package has been
downloaded and installed, the file can be run as is.
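To experiment with the data-generating step before committing to a long run, the censored data described in the setup can be sketched as below. This is a Python illustration under stated assumptions, not code taken from survivalROCsim.R. The text only says that censoring times are normal and give about 40% censoring, so the unit censoring variance and the mean MU_C derived from it are assumptions made here, and the function name is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

RHO = -0.70              # marker/survival correlation from the text
TARGET_CENSORING = 0.40  # desired fraction of censored observations

# Assumption: censoring time C ~ N(MU_C, 1), independent of marker and T.
# Then T - C ~ N(-MU_C, 2), so P(C < T) = Phi(-MU_C / sqrt(2)).
# Setting this equal to the target fraction and solving gives MU_C.
MU_C = -sqrt(2.0) * NormalDist().inv_cdf(TARGET_CENSORING)


def simulate_censored(n, rng):
    """Return n rows (marker, observed_time, status); status 1 = event, 0 = censored."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = RHO * x + sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0)
        c = rng.gauss(MU_C, 1.0)
        rows.append((x, min(t, c), 1 if t <= c else 0))
    return rows
```

The three columns correspond to the marker, the observed time and the event status, which are the inputs that survivalROC takes as its marker, Stime and status arguments.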
The true and estimated TP values for the different sample sizes
(200, 400, 800, 1600, 3200) are given in the out.txt file, and
the pdf file shows the corresponding ROC curves. Please see the
documentation file for further details of the functions and more
background.
mail to: Paramita Saha
Last modified: Thu Jun 7 14:13:54 PDT 2007