Validation Files for survivalROC Package

Setup/Background

Our goal is to validate the accuracy of the survivalROC package. Specifically, given a "true" ROC curve at a time of interest, we ask how close the ROC curve estimated by survivalROC from simulated data comes to it. This document describes the background, how we construct the "true" ROC curve, and how we compare the "true" and simulated ROC curves. For reference, we include the simulation file, the output file, and a pdf file containing the two ROC curves. We also provide a link to the documentation file for the survivalROC package.

We start with the simulation setup and the construction of the "true" ROC curve. We simulate survival times and marker values from a bivariate normal distribution with mean m = (0, 0)^T, correlation r = -0.70, and unit variances. Censoring times are generated independently from a normal distribution chosen so that about 40% of the observations are censored. For ease of understanding, let the unit of time be one month. From these data we compute the False Positive (FP) and True Positive (TP) values at 1 month. Let n denote the sample size and Nsim the number of repetitions. In each of the Nsim repetitions we generate data for n individuals, which gives Nsim sets of (FP, TP) values.

The marker cutoffs that produce the (FP, TP) values change from one simulation to the next, and so do the resulting sets of (FP, TP) values. Because we want to compare the "true" ROC curve with the simulated one using two sets of points on the curves, raw (FP, TP) pairs are difficult to summarize without some standardization. To create the "true" ROC curve, we therefore fix the FP values at 0.01, 0.02, …, 0.99 and estimate the TP values at these false positive rates. Within each simulation, we first estimate the (FP, TP) pairs and then interpolate the TP rate at each fixed FP rate. We then average the TP values over the Nsim simulations to obtain an estimate of the true TP value at each FP rate between 0.01 and 0.99. In short, to compare the two ROC curves we fix the FP rates and, at each FP rate, compare the true TP value with the simulated TP rate.
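The standardization step (fixed FP grid, interpolated TP, average over repetitions) can be sketched as follows. This is only an illustration in Python, not the code in survivalROCsim.R: all function names are hypothetical, the survivalROC estimator is not called, and the sketch assumes uncensored data, treats an event by time t as a case, treats a marker value above the cutoff as a positive, and uses linear interpolation. The actual file may differ on each of these points.

```python
import random
from bisect import bisect_right

FP_GRID = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99


def simulate(n, rho=-0.70, rng=random):
    """Draw n (time, marker) pairs: bivariate normal, zero means, unit variances."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        marker = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # correlation rho with z1
        pairs.append((z1, marker))
    return pairs


def empirical_roc(pairs, t=1.0):
    """Raw (FP, TP) points at time t from uncensored data (an assumption)."""
    n_case = sum(time <= t for time, _ in pairs)
    n_ctrl = len(pairs) - n_case
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the cutoff one observation at a time, from the largest marker down.
    for time, _ in sorted(pairs, key=lambda p: p[1], reverse=True):
        if time <= t:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_ctrl, tp / n_case))
    return points


def interpolate_tp(points, fp_grid):
    """Linearly interpolate TP at each fixed FP value."""
    fps = [fp for fp, _ in points]
    out = []
    for x in fp_grid:
        j = bisect_right(fps, x)  # first point whose FP exceeds x
        if j == len(fps):         # x at or beyond the last FP value
            out.append(points[-1][1])
            continue
        (x0, y0), (x1, y1) = points[j - 1], points[j]
        out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out


def average_tp(n, nsim, seed=1):
    """Average the interpolated TP values over nsim simulated data sets."""
    rng = random.Random(seed)
    totals = [0.0] * len(FP_GRID)
    for _ in range(nsim):
        tp = interpolate_tp(empirical_roc(simulate(n, rng=rng)), FP_GRID)
        totals = [a + b for a, b in zip(totals, tp)]
    return [a / nsim for a in totals]


true_tp = average_tp(n=200, nsim=10)  # one averaged TP value per FP grid point
```

Because every repetition is reduced to TP values on the same FP grid, the curves can be averaged and compared point by point, whatever marker cutoffs happened to occur in each simulated data set.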

The simulation file (survivalROCsim.R) is discussed next. Part 1 of the file obtains the TP values corresponding to the FP values 0.01, 0.02, …, 0.99, following the setup described above. Part 2 of the file runs the simulations for sample sizes 200, 400, 800, 1600 and 3200. For each sample size, the simulated TP values are based on 500 simulations. Be warned that a single simulation for a large sample size (>500) takes a considerable amount of time, so we advise replacing Nsim in the file with a much smaller number. Even a summary based on only 10 simulations shows reasonable agreement between the true and estimated TP rates. Once the survivalROC package has been downloaded and installed, the file can be run as is.
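One simple way to put a number on the agreement between the true and estimated TP rates is the largest and the average absolute difference over the fixed FP grid. The sketch below is in Python rather than R, the function name agreement is hypothetical, and the two curves are placeholders built from a closed-form expression; they are not values from out.txt or from any run of survivalROCsim.R.

```python
def agreement(true_tp, est_tp):
    """Return (max, mean) absolute difference between two TP curves on a shared FP grid."""
    if len(true_tp) != len(est_tp):
        raise ValueError("curves must share the same FP grid")
    diffs = [abs(a - b) for a, b in zip(true_tp, est_tp)]
    return max(diffs), sum(diffs) / len(diffs)


fp_grid = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99

# Placeholder curves only (NOT simulation results): a smooth concave curve
# and a copy shifted up by 0.01 and capped at 1.
true_tp = [fp ** 0.25 for fp in fp_grid]
est_tp = [min(1.0, tp + 0.01) for tp in true_tp]

max_diff, mean_diff = agreement(true_tp, est_tp)
print(f"max |diff| = {max_diff:.4f}, mean |diff| = {mean_diff:.4f}")
```

A summary of this kind makes it easy to check whether a run with a reduced Nsim still tracks the true curve closely enough for a given purpose.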

The true and estimated TP values for the different sample sizes (200, 400, 800, 1600, 3200) are given in the out.txt file, and the pdf file shows the corresponding ROC curves. Please see the documentation file for further details on the functions and more background.