Here you can find more details about how to use the SCRSELECT package, particularly the main function ReturnModel(). I will post a more advanced tutorial for users that want to manually do SCRSELECT() or DICTAUG() later here.
The link to the package documentation can be found on CRAN here. This code is related to a recent paper: “Bayesian variable selection for a semi-competing risks model with three hazards” which is joint work with Dr. Marina Vannucci, Dr. Peter Thall and Dr. Steven Lin.
This function performs Bayesian variable selection on a semi-competing risks model which gives the posterior probabilities of inclusion for each variable in each hazard model and determines the optimal model based on thresholds determined by the DIC statistic. More details on this can be found in the paper, which will appear in print in August.
The package has 4 main functions:
- SCRSELECT – Performs Stochastic Search Variable Selection (SVSS) on the semi-competing risks model and saves posterior samples to a given location and summaries of the variable selection parameters and MCMC acceptance rates.
- DICTAUG – takes posterior means from SCRSELECT and performs a grid search to find the optimal thresholds for each hazard variable inclusion.
- ReturnModel- Uses SCRSELECT and DICTAUG to determine the best final model, then this performs the final MCMC to obtain inference.
- SCRSELECTRUN – Used in the ReturnModel function
The most useful function here is ReturnModel() as it performs SVSS, finds the optimal final model (i.e. what variables are included) and then returns important summaries of the posterior along with writing the posterior to a desired folder location. I’ll briefly go over the inputs prior to looking at the output and explaining it.
- Y1: Vector of effusion or death time.
- I1: Vector of indicators for a non-terminal event.
- Y2: Vector of death or censoring times.
- I2: Vector of death indicators.
- X: Covariates ~ the last inc columns will not be selected and will be a part of all three hazards for any possible model.
- Hyper: Large vector of hyperparameters ~ see documentation.
- Inc: How many covariates should be excluded from selection.
- c: shrinkage hyperparameter, values between 10-50 are suggested.
- BSVSS: how many iterations to run the SVSS (suggested = 100,000)
- BDIC: how many iterations to run the DIC search (suggested = 10,000)
- Path = where to save posterior samples of final model
I will show what the output from the application in the paper looks like in the program. I used the example code given on CRAN, but ran these chains out for a more reasonable amount of time. I let BSVSS = 10,000 and BDIC = 1,000 indicating that the variable selection MCMC should be carried out for 10,000 iterations and the DIC-tau_g procedure should be run on 1,000 iterations. I will set inc=2, meaning that the SVSS will always include the last two variables in each hazard.
First this program runs two different chains through the SVSS procedure and save the relevant results. Progress of each of the two chains is printed every 5000 iterations. After each chain is finished, the important posterior quantities of the combined sample are saved and the marginal posterior probabilities of inclusion are printed for each hazard. Given that this example data is just simulated noise, it’s not surprising here that we don’t see some variables being more important than others. After these values are printed, the DIC-\tau_g proceeds by first searching over (.05,\tau_2,\tau_3) for the optimal DIC. This proceeds by increasing \tau_1 by .05.
Once the grid search finishes, the program outputs the DIC value for the optimal model(s). It prints more than 1 value if two tau vectors produced DIC values that were within 1 of each other. Next the final model variable inclusion is printed where a TRUE indicates that that variable is important in that specific hazard. We can see, for example, that only variable 4 is included in the first hazard.
Next the program takes this final model and performs MCMC to obtain important posterior quantities for inference. The first thing that is printed is the posterior probability that a covariate increases a hazard, which for example, we see that variable 7 (which was not allowed to be selected out) had a posterior probability of a non-terminal event of .73. Likewise, Variable 2 in the hazard of death after a non-terminal event was shown to be prognostic for death (P=.83). The last bit of output from the function is shown below (excuse the output format, I’ll fix that up!):
We are left with the posterior median hazard ratios in the first line of each hazard, followed by the .025 % and .975 % quantiles, respectively. Once again, notice that variables not included in the final model have NA listed in their posterior summaries.
I hope this helps clear up how to use the package! As a note of practical usage, I would suggest that the user pick a value of c and see what those marginal posterior probabilities of inclusion are after the first step. You want to encourage separation so a small value of c that leads to very large marginal posterior probabilities of inclusion will be as unhelpful as large values of c which will lead to very small marginal posterior probabilities of inclusion. Ideally, you want to see some variables with high inclusion probabilities and some with low in each of the three hazards.
I also want to note that while I do not have guides here for the functions SCRSELECT() and DICTAUG(), these are pieces of the ReturnModel() function. An experienced user of ReturnModel() might want to further investigate the posterior distribution of the SVSS procedure, which you would use SCRSELECT() to do as this saves all posterior samples. Then they might save the important quantities for manually using DICTAUG(), however this is not suggested to those who are not already experienced with using ReturnModel().
If you have any questions don’t hesitate to ask! Also if you enjoyed this post, check out some of my other blog posts.
-Andrew G Chapple