# README file for the cmprskPHContinMark package
# This code was written by Yanqing Sun, Peter Gilbert, and Ted Holzman
# (completed in April 2014).
# This program implements the methods of Sun and Gilbert (2012, Scandinavian
# Journal of Statistics) and Gilbert and Sun (2014, Journal of the Royal
# Statistical Society, Series C) to analyze an input data-set.

# Description:
# These papers provide estimation and testing methods for the mark-specific
# proportional hazards model. The methods accommodate failures that have a
# missing mark, and they allow separate baseline mark-specific hazard functions
# for different baseline subgroups. Missing marks are handled by inverse
# probability weighting of complete cases (IPW) or by augmented IPW.
# NOTE: As supplied here, the code assumes a single baseline stratum. The user
# would need to modify the code slightly to accommodate multiple baseline strata.
# The R function that performs the analysis has been tested on a Macintosh,
# a Linux workstation, and a PC.

# Step 1: Place the file cmprskPHContinMark_0.99.tar.gz in the directory where
# you wish to run this R package.

# Step 2: To install the package from Linux, execute these commands within
# that directory:
mkdir Rlibs
R CMD INSTALL cmprskPHContinMark_0.99.tar.gz --library=Rlibs

# Step 3: Start R and type the command
> library("cmprskPHContinMark",lib.loc="./Rlibs")
# The package depends on another package called "stringr". If the
# cmprskPHContinMark installation fails because stringr is missing,
# install stringr from CRAN like this:
> install.packages("stringr")

# Step 4: Having successfully installed cmprskPHContinMark, use it like this
# (lib.loc can be omitted if the package was installed into a library that R
# already searches):
R
> library("cmprskPHContinMark",lib.loc="./Rlibs")
> cmprskPHContinMark(inputf,outpre,nboot,missmark,nauxiliary)

# inputf is the name of the data file. If you don't supply it, it defaults to
# a practice data-set, practicedataRV144.dat, based on the RV144 data used in
# Gilbert and Sun (2014).
# A practice data-set is used both because it is much smaller (greatly
# shortening run time) and because the real data cannot be shared in this
# forum at this time. Running getPracticeData() in R generates the practice
# data-set practicedataRV144.dat.

# outpre is the prefix of the output file names. There are four output files,
# named <outpre>_VE, <outpre>_Power, <outpre>_Plot1, and <outpre>_Plot2.
# The default value of outpre is the value of inputf. The files contain:
#   <outpre>_VE:    the estimated mark-specific vaccine efficacies (VEs) with
#                   standard errors
#   <outpre>_Power: the results of the hypothesis testing procedures
#   <outpre>_Plot1: the data needed for graphical diagnostics of H_10
#   <outpre>_Plot2: the data needed for graphical diagnostics of H_20

# nboot is an integer, the number of bootstrap iterations. The default
# (and the maximum) is 500.

# missmark and nauxiliary are option flags: 1 turns an option on and 0 turns
# it off. Importantly, the program works for data-sets with no missing marks
# (specify missmark=0) and for data-sets with no auxiliary covariates
# (specify nauxiliary=0).

# Detailed help on the meaning of the input variables and on how to assemble
# the input data file inputf is available by typing the following within R:
?cmprskPHContinMark

# The cmprskPHContinMark function itself doesn't return anything useful; its
# return value is just a list of the parameter values you passed in. All
# results of the calculations are contained in the four output files and in
# some of the material written to the screen.
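# Because a full run can take a long time, it may help to check the arguments
# before starting. The sketch below is a hypothetical helper,
# check_cmprsk_args(), which is NOT part of the package: it enforces the
# documented nboot limit (at most 500) and the 0/1 flags, and returns the four
# output file names, assuming (as described above) that outpre defaults to
# inputf and that the suffixes are appended directly to outpre.

```r
# Hypothetical pre-flight helper, not part of cmprskPHContinMark.
check_cmprsk_args <- function(inputf = "practicedataRV144.dat", outpre = inputf,
                              nboot = 500, missmark = 1, nauxiliary = 1) {
  # nboot: a single whole number between 1 and the documented maximum of 500
  if (!is.numeric(nboot) || length(nboot) != 1 || is.na(nboot) ||
      nboot != round(nboot) || nboot < 1 || nboot > 500) {
    stop("nboot must be an integer between 1 and 500")
  }
  # missmark and nauxiliary are on/off flags: 1 = on, 0 = off
  flags <- c(missmark = missmark, nauxiliary = nauxiliary)
  bad <- names(flags)[!(flags %in% c(0, 1))]
  if (length(bad) > 0) {
    stop("these flags must be 0 or 1: ", paste(bad, collapse = ", "))
  }
  # The four output files are named by appending these suffixes to outpre
  paste0(outpre, c("_VE", "_Power", "_Plot1", "_Plot2"))
}

# Example: expected output files for the first illustration data-set
out_files <- check_cmprsk_args("practicedataRV144_92.dat", nboot = 100)
# out_files is c("practicedataRV144_92.dat_VE", "practicedataRV144_92.dat_Power",
#                "practicedataRV144_92.dat_Plot1", "practicedataRV144_92.dat_Plot2")
```

# After a run, file.exists(out_files) is a quick way to confirm that all four
# files were written.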
# Any of the parameters can be defaulted. For example, to run the program on
# the data-set practicedataRV144.dat, type:
cmprskPHContinMark()
# The following also works, defaulting a subset of the parameters:
cmprskPHContinMark("practicedataRV144.dat",,100,0,)

# In practice, the user's most important task is to assemble the input
# data-set inputf. Given its importance, the meaning of each column is
# described here. The input data file must contain eight space-separated
# columns of data:
#   1. subject identifier 1 (can be arbitrary integers)
#   2. subject identifier 2 (can be arbitrary integers)
#   3. binary (1 or 0) auxiliary covariate used for predicting the mark V
#   4. treatment assignment (1 or 0 for vaccine or placebo, respectively)
#   5. infection status (1 or 0 for infected or right-censored, respectively)
#   6. time (minimum of the failure time and the right-censoring time)
#   7. R, the indicator of observing the mark (1 = infected and mark observed,
#      0 = infected and mark unobserved, 2 = uninfected)
#   8. mark V (8888 = infected and mark missing; 99 = uninfected, so the mark
#      is necessarily missing)
# If nauxiliary=0, the variable in column 3 is not used, but some (arbitrary)
# binary variable must still be included in that column.
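# The column coding above has several cross-column rules (for example, R = 0
# must go with infection status 1 and V = 8888). The sketch below is a
# hypothetical checker, check_cmprsk_input(), which is NOT part of the package,
# followed by a way to write a data frame in the eight-column layout.
# Assumptions not stated in this README: the file has no header row and no
# quoting, times are non-negative, and observed marks never equal the codes
# 8888 or 99. The toy values are made up purely for illustration.

```r
# Hypothetical helper, not part of cmprskPHContinMark: returns the row numbers
# of a data frame that break the eight-column coding rules.
check_cmprsk_input <- function(d) {
  stopifnot(ncol(d) == 8)
  names(d) <- c("id1", "id2", "aux", "trt", "status", "time", "R", "V")
  ok <- with(d,
    aux %in% c(0, 1) & trt %in% c(0, 1) & status %in% c(0, 1) &
    !is.na(time) & time >= 0 &                        # assumed non-negative
    ((R == 1 & status == 1 & !(V %in% c(8888, 99))) | # infected, mark observed
     (R == 0 & status == 1 & V == 8888) |             # infected, mark missing
     (R == 2 & status == 0 & V == 99)))               # uninfected
  ok[is.na(ok)] <- FALSE   # treat any NA comparison as a violation
  which(!ok)
}

# Made-up toy data in the required column order
toy <- data.frame(
  id1 = 1:4, id2 = 1:4,
  aux = c(1, 0, 1, 0), trt = c(1, 1, 0, 0),
  status = c(1, 1, 1, 0), time = c(1.5, 2.0, 0.5, 3.0),
  R = c(1, 0, 1, 2), V = c(0.12, 8888, 0.30, 99))

problems <- check_cmprsk_input(toy)   # integer(0): no rows break the rules

# Write eight space-separated columns (assumed: no header, no quotes)
write.table(toy, "analysisdata.dat", sep = " ",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
```

# The resulting analysisdata.dat can then be passed to the program as shown
# in the usage example for a file of that name.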
# If the input data are placed in a file named "analysisdata.dat", run the
# program in R with:
cmprskPHContinMark("analysisdata.dat",,,,)

# After the program has run, run the cmprskPHContinMark_makeplots() function
# to report the results.
# cmprskPHContinMark_makeplots() takes eight arguments, all of which can be
# defaulted:
> cmprskPHContinMark_makeplots(
    datafile_92="practicedataRV144_92.dat",
    datafile_cm="practicedataRV144_cm.dat",
    vefile_92="practicedataRV144_92.dat_VE",
    vefile_cm="practicedataRV144_cm.dat_VE",
    plot1file_92="practicedataRV144_92.dat_Plot1",
    plot1file_cm="practicedataRV144_cm.dat_Plot1",
    plot2file_92="practicedataRV144_92.dat_Plot2",
    plot2file_cm="practicedataRV144_cm.dat_Plot2")
# The default data files are included in the package and are created with
# the command getPracticeData().
# The output files are written to the directory from which the
# cmprskPHContinMark_makeplots routine was run.

###############################################################
# ILLUSTRATION
# To illustrate the implementation above, suppose two different analyses are
# done, one for each of two different distances.
# The practice data-sets practicedataRV144_92.dat and practicedataRV144_cm.dat
# are provided as defaults to allow this analysis.
# The following is the entire sequence of R commands that creates the output.
# As a prerequisite, obtain the practice data-sets. This function creates the
# data files practicedataRV144.dat, practicedataRV144_92.dat, and
# practicedataRV144_cm.dat and places them in the directory where the code
# files live:
library("cmprskPHContinMark",lib.loc="./Rlibs")
getPracticeData()

# First set of distances:
cmprskPHContinMark("practicedataRV144_92.dat",,500,1,1)
# The results on point estimates and standard error estimates are written to
# the file practicedataRV144_92.dat_VE.
# The results of the testing procedures are written to the file
# practicedataRV144_92.dat_Power.

# Second set of distances:
cmprskPHContinMark("practicedataRV144_cm.dat",,500,1,1)
# The results on point estimates and standard error estimates are written to
# the file practicedataRV144_cm.dat_VE.
# The results of the testing procedures are written to the file
# practicedataRV144_cm.dat_Power.

# Plot the output:
cmprskPHContinMark_makeplots(datafile_92 ="practicedataRV144_92.dat",
                             datafile_cm ="practicedataRV144_cm.dat",
                             vefile_92 ="practicedataRV144_92.dat_VE",
                             vefile_cm ="practicedataRV144_cm.dat_VE",
                             plot1file_92="practicedataRV144_92.dat_Plot1",
                             plot1file_cm="practicedataRV144_cm.dat_Plot1",
                             plot2file_92="practicedataRV144_92.dat_Plot2",
                             plot2file_cm="practicedataRV144_cm.dat_Plot2")
# The plots of interest are written to the files:
#   Figure1markmissingtestingmindistkwong.92cm_boxplots.eps
#   RV144_TestProcs_A1h3P_kwong_92cm.eps
#   Figure1markmissingtestingmindistkwong.92cm_scatterplots.eps
#   RV144_VECI_A1h3P_mindist_kwong.92cm.eps
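# The eight makeplots arguments follow a fixed pattern once outpre is left at
# its default (the input file name). The sketch below uses a hypothetical
# helper, makeplots_args(), which is NOT part of the package, to derive them
# from the two data file names, then repeats the illustration with do.call().
# The analysis part runs only if the package can be loaded from ./Rlibs, and
# it performs the full 500-iteration bootstrap runs shown above.

```r
# Hypothetical helper, not part of cmprskPHContinMark: builds the eight file
# arguments of cmprskPHContinMark_makeplots(), assuming outpre was defaulted
# so that each output file is named <datafile>_VE, _Plot1, or _Plot2.
makeplots_args <- function(datafile_92, datafile_cm) {
  list(datafile_92  = datafile_92,
       datafile_cm  = datafile_cm,
       vefile_92    = paste0(datafile_92, "_VE"),
       vefile_cm    = paste0(datafile_cm, "_VE"),
       plot1file_92 = paste0(datafile_92, "_Plot1"),
       plot1file_cm = paste0(datafile_cm, "_Plot1"),
       plot2file_92 = paste0(datafile_92, "_Plot2"),
       plot2file_cm = paste0(datafile_cm, "_Plot2"))
}

args <- makeplots_args("practicedataRV144_92.dat", "practicedataRV144_cm.dat")

# Repeat the illustration only when the package is installed in ./Rlibs
if (requireNamespace("cmprskPHContinMark", lib.loc = "./Rlibs", quietly = TRUE)) {
  library("cmprskPHContinMark", lib.loc = "./Rlibs")
  getPracticeData()
  # One analysis per data file, with nboot = 500, missmark = 1, nauxiliary = 1
  for (f in c(args$datafile_92, args$datafile_cm)) {
    cmprskPHContinMark(f, , 500, 1, 1)
  }
  do.call(cmprskPHContinMark_makeplots, args)
}
```

# Deriving the names from the data files keeps the _92 and _cm arguments from
# being mixed up when the call is typed by hand.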