Adding common legends to multiple R plots

Recently I submitted a paper to a journal that required compound figures with multiple plots to be uploaded as one combined .eps file. This turned out to be daunting, as I was trying to combine 4 figures, each requiring a legend. Doing this in R using par(mfrow=c(2,2)) produced plots that were unreadable due to the shrunken size of each plot and of the legend on any one of the plots. Likewise, there were not many options online for combining the plots' .eps files into a single .eps file.

The solution: xpd=NA in the legend options.

The following code was used to add a common legend to the combined figure:

> legend(-.35,1.5, legend = c("0", "1","2","3"),
+ lty = c(2,3,4,5),
+ title = "Subgroup", xpd = NA)

Note that xpd here should be the logical NA, not the string "NA" (passing the string is why R complains with an error or warning); with xpd = NA, clipping is disabled entirely and the code does what we want, producing the following graphic.


So we have the desired output: one common legend in the middle of four plots. The coordinates for the legend were found by trial and error. For some reason, different sets of graphics have different coordinates corresponding to (0,0), so you need to move the legend around until you get what you want. The following two graphics show some of the trial and error it took to place the legend correctly.

It usually doesn’t take too long to get it placed right.
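Putting the pieces together, here is a minimal self-contained sketch of the whole approach. The data are made up, and the legend coordinates are assumptions you would tune by trial and error as described above.

```r
# Four panels plus one shared legend drawn between them.
x <- seq(0, 10, length.out = 50)
par(mfrow = c(2, 2))
for (k in 1:4) {
  plot(x, sin(x) + k, type = "l", lty = 2, main = paste("Plot", k))
  for (j in 1:3) lines(x, sin(x) + k + j / 4, lty = j + 2)
}
# xpd = NA turns off clipping, so the legend (positioned in the user
# coordinates of the last panel drawn) can sit between the four panels.
legend(-4, 6, legend = c("0", "1", "2", "3"),
       lty = c(2, 3, 4, 5), title = "Subgroup", xpd = NA)
```

Because the legend is placed in the coordinate system of the last panel, the (-4, 6) here is specific to that panel's axis ranges; with different data you will need to hunt for the right values, just as described above.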

I hope this is useful! And if not, it is at least here as a reference for me.

If you liked this post, check out some of my other blog posts.

-Andrew G. Chapple


The SubTite Package Tutorial

This is a tutorial for the four main functions of the subgroup-specific time-to-event continual reassessment method (Sub-TITE), presented in the paper: Subgroup-specific dose finding in phase I clinical trials based on time to toxicity allowing adaptive subgroup combination (currently in press).

Code for this paper is in the SubTite package, available on CRAN.

I’m going to briefly discuss the four main functions:

  1. GetPriorMeans(): Calibrates the prior means of the non-linear logistic regression parameters to match clinician elicited prior toxicity probabilities for each dose and subgroup considered in the trial.
  2. GetESS(): Takes output from GetPriorMeans() and, for different values of $\sigma_\alpha^2$ and $\sigma_\beta^2$, approximates the prior effective sample size by matching the mean and variance of the prior toxicity probabilities to a Beta distribution.
  3. GetSubTite(): For a set of patient data in a dose-finding trial, given hyperparameters and the matrix of doses tried for each subgroup, this function determines what dose each subgroup should be enrolled at when the next patient arrives.
  4. SimTrial(): Simulates a Sub-TITE trial based on some true time to event distribution and returns the operating characteristics of the trial under that scenario.

Two C++ functions are included in the package and run in the background, but in general these four are the important functions for designing and running a Sub-TITE trial.


For a given reference time T, where we want to optimize dose based on the probability of toxicity by time T, we first need the prior means of the regression parameters used to make dose-assignment decisions. The function GetPriorMeans() obtains these from the vector of dose levels considered and a matrix of clinician-elicited prior probabilities of toxicity by time T for each subgroup and dose level in the trial. Below is an example of such a prior matrix for a Sub-TITE trial with 3 subgroups and 5 doses.

Clinician = matrix(c(.2,.3,.4,.5,.6,.1,.2,.3,.4,.5,.05,.1,.15,.2,.3),byrow=TRUE,nrow=3)
> Clinician
     [,1] [,2] [,3] [,4] [,5]
[1,] 0.20  0.3 0.40  0.5  0.6
[2,] 0.10  0.2 0.30  0.4  0.5
[3,] 0.05  0.1 0.15  0.2  0.3

Each row of this matrix represents a subgroup, and each entry in a row is the elicited toxicity probability by time T for a given dose level. It is important that these toxicity probabilities increase monotonically with dose (moving right in the matrix), which satisfies the assumption that increasing the dose increases the risk of toxicity. The rows do not need to be ordered in any particular way within a dose level. We input this matrix and the standardized dose levels into GetPriorMeans() below to obtain the baseline and subgroup-specific prior means for the intercept and dose slope.
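The monotonicity requirement on each row is easy to check programmatically before calling GetPriorMeans(). The helper below is my own sanity check, not part of SubTite.

```r
# Verify that elicited toxicity probabilities strictly increase with dose
# within every subgroup (every row of the elicited matrix).
Clinician <- matrix(c(.20, .30, .40, .50, .60,
                      .10, .20, .30, .40, .50,
                      .05, .10, .15, .20, .30), byrow = TRUE, nrow = 3)
RowsMonotone <- function(M) all(apply(M, 1, function(r) all(diff(r) > 0)))
RowsMonotone(Clinician)   # TRUE, so the matrix is safe to use
```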

> Dose=c(-1,-.5,0,.5,1)
> GetPriorMeans(Clinician,Dose)

This gives the prior means for each logistic regression parameter below, with alpha and beta being the baseline intercept and slope, and alpha1, alpha2, beta1, beta2 being the subgroup-specific intercepts and slopes.


Likewise, as an additional example, if we only wanted a trial with two subgroups and used only the first two rows of the matrix Clinician for the trial, we would obtain the hypermeans:


These prior means are next used in the function GetESS() to obtain the prior variances on the intercept and slope parameters.


This function takes the standardized dose values and prior means for each parameter, along with fixed values for \sigma_\alpha^2 and \sigma_\beta^2. These two values should be calibrated so that the effective sample size returned by the program is approximately 1, while maintaining \sigma_\alpha^2 > \sigma_\beta^2.
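The Beta-matching idea behind this effective sample size can be sketched in a few lines. Given the prior mean m and variance v of a toxicity probability, the Beta(a, b) distribution with that mean and variance has a + b = m(1 - m)/v - 1, and a + b is its effective sample size. This is my own illustration of the principle, not the package's internal code.

```r
# Effective sample size of the Beta(a, b) prior matching mean m, variance v.
# From Var = m(1 - m)/(a + b + 1), we get a + b = m(1 - m)/v - 1.
BetaESS <- function(m, v) m * (1 - m) / v - 1
BetaESS(0.3, 0.04)   # a prior worth about 4.25 patients
```

Larger \sigma_\alpha^2 and \sigma_\beta^2 inflate v and therefore shrink the effective sample size, which is why calibrating them toward an ESS near 1 keeps the prior weakly informative.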





Prior to using the function GetSubTite() within a trial, a responsibly planned Sub-TITE clinical trial should be simulated many times under various scenarios and distribution families. This can be done with the function SimTrial(), which performs the MCMC, and indeed the entire trial, in C++, vastly reducing the time needed to simulate. Depending on the sample size, the number of simulation repetitions (>= 1000 is recommended), and the number of groups and doses considered, the function takes 1-5 hours to run and return the operating characteristics of the design.

Let’s break down the function arguments with an example where we simulate toxicity times from the uniform distribution. We will simulate the first scenario shown in the paper by Chapple and Thall (2017) by imposing that up to 60 patients are enrolled with 2 subgroups and 5 doses to try. We also assume an accrual of 2 patients per month and use hyperparameters that were calibrated to the elicited toxicity probabilities seen in supplemental table 1. We will run 1000 simulation repetitions of the trial.
nSims = 1000                     ##Number of trial simulations to run
nGroups = 2                      ##Number of subgroups to consider in the trial
nDose = 5                        ##Number of dose levels
x = 1:nDose                      ##Raw dose levels considered
Dose = (x - mean(x))/sd(x)       ##Vector of standardized doses
DoseStart = 1                    ##Starting dose (or vector of starting doses)
Nmax = 60                        ##Maximum number of patients to enroll
TGroup = 2                       ##Suspends a group for TGroup after 3 enrolled
Upper = c(.85, .95)              ##Cutoffs for stopping accrual in a subgroup
Accrue = 2                       ##Expected patient accrual per month
T1 = 6                           ##Toxicity reference time (in months)
Target = .3                      ##Targeted toxicity probability

##Next we collect the hyperparameters and vector of hypermeans into a list to be fed into the function. Here I am using values from the paper by Chapple and Thall (2017).

MeanInts = Means[[3]]

##Now let’s make our TRUE scenario to simulate the trial under.
GroupProb = matrix(rep(NA,nDose*nGroups),ncol=nDose)

GroupProb[1,]=c(.18,.25,.45,.66,.74)       ## These are the true toxicity probabilities
GroupProb[2,]=c(.10,.15,.30,.50,.60)       ## for each dose for subgroups 1 and 2

groupprob=c(.5,.5)                                    ##True proportion of patients in each group
Family = "Uniform"                   ##We will use the uniform distribution here

##Note that the Family argument supports the Exponential, Weibull, Gamma, Uniform and Lognormal distributions (all capitalized).

Param1 = GroupProb                   ##The parameter matrix for the uniform distribution is just the true toxicity probabilities


After the simulations are finished, the program returns the following summary of operating characteristics (for example):


Let’s walk through some of the details to explain what is being displayed.

Subgroup Specific Selection Probability of the Optimal Dose    0.565    0.762

This is the proportion of simulated trials in which the optimal doses for groups 1 and 2 were selected. Recall that the optimal doses in this scenario are dose 2 for subgroup 1 and dose 3 for subgroup 2. The next part of the output shows the proportion of times each dose was selected as optimal for each subgroup.

Frequencies of Optimal Dose Selected for each Subgroup
Group 1

1              2             3           4
0.126      0.565      0.307   0.002

Group 2

1            2             3              4
0.008     0.116     0.762      0.114

Notice that for both subgroups, dose 5 was never selected as optimal. The next operating characteristic displayed is the absolute difference between the true toxicity probability for a chosen dose in each subgroup and the truly optimal dose for that subgroup. This is displayed below:

Average true toxicity probability difference between chosen dose and the optimal dose

[1] 0.0712965     0.0418000
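This summary can be reproduced from first principles: compare the true toxicity probability of each chosen dose against that of the truly optimal dose. The chosen-dose vector below is hypothetical; SimTrial() computes this over its own simulated trials.

```r
# Average |true tox prob of chosen dose - true tox prob of optimal dose|.
TrueProb <- c(.18, .25, .45, .66, .74)       # subgroup 1 truth from the scenario
Target  <- 0.3
OptDose <- which.min(abs(TrueProb - Target)) # dose 2 is closest to the target
Chosen  <- c(2, 2, 3, 2, 1)                  # chosen doses in 5 hypothetical trials
mean(abs(TrueProb[Chosen] - TrueProb[OptDose]))
```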

Finally, we have the average number of toxicities per trial and the average overall trial duration in months.

Average number of Toxicities for each Subgroup

[1] 9.558     9.729

Average Trial Time

[1] 37.44127

These operating characteristics should be examined under a myriad of different scenarios and families of distributions before running a trial, to justify the design.


After the principal investigator uses SimTrial() under a myriad of different scenarios to justify conducting the trial, we use the function GetSubTite() during the trial to determine what dose to assign to the next patient in any subgroup, and whether any subgroup should have accrual temporarily suspended. To further illustrate the methodology, we will use the function in an example with 3 subgroups and 5 doses to test. Again we will generate toxicity times (for those who have toxicity prior to the reference time) from a uniform distribution, using the hyperparameter values determined in the GetPriorMeans() example. The following is an example of using this function for dose assignment after 30 patients have been enrolled. NOTE: THE GROUPS MUST BE LABELED 0,1,2,….

n = 30                                      ##Number of patients enrolled so far
T1 = 6                                      ##Toxicity reference time
Target = .3                                 ##Target reference-time toxicity probability

Upper = c(.9, .95, .95)                     ##Cutoffs for subgroup suspension
Groups = sample(1:3, n, replace=TRUE) - 1   ##GROUPS MUST BE LABELED 0,1,2
Doses = sample(1:5, n, replace=TRUE)        ##Dose number assignment for each patient
x = 1:5                                     ##Raw dose levels
Dose = (x - mean(x))/sd(x)                  ##Standardized vector of doses
##Generate toxicity times (or censoring) based on group and dose assignment
GroupProb = matrix(rep(NA, length(Dose)*3), nrow=3)
GroupProb[1,] = c(.18,.25,.45,.66,.74)      ##These are the true toxicity probabilities
GroupProb[2,] = c(.10,.15,.30,.50,.60)      ##for each dose and subgroup
GroupProb[3,] = c(.10,.15,.30,.50,.60) + .1

##Next let's fill in the current trial data based on our true toxicity probabilities

I = rep(NA, n)                              ##Toxicity indicator for each patient
for(b in 1:n){
I[b] = rbinom(1, 1, GroupProb[(Groups[b]+1), Doses[b]])
}

##Doses Tried so far in trial

##This gives the following matrix of doses tried. Only doses 1 and 2 have been tried so far for subgroups 1 and 3.
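The doses-tried object can be sketched as an indicator matrix with one row per subgroup and one column per dose level. The exact format GetSubTite() expects is described in the package documentation on CRAN; the construction below, with made-up enrollment data, is just an illustration.

```r
# Rows = subgroups, columns = dose levels; an entry becomes 1 once that
# dose has been given to a patient in that subgroup. Data here are made up.
Groups <- c(0, 0, 1, 2, 2)   # subgroup labels (must be 0,1,2)
Doses  <- c(1, 2, 1, 1, 2)   # dose number given to each patient
DoseTried <- matrix(0, nrow = 3, ncol = 5)
for (b in seq_along(Groups)) DoseTried[Groups[b] + 1, Doses[b]] <- 1
DoseTried
```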

##Next we fill in our hypermeans for the linear terms
meanslope= 0.8861634
MeanInts = c(-0.5205379, -1.3752529)
MeanSlopes = c(0.1888923, 0.1148791)


Doses=Dose[Doses]                                          ##Standardizes the patient doses

##Fill in Hyperparameter list for MCMC

##When we read this data into the function GetSubTite(), we get the following output:


The entries in [[2]] indicate which groups should suspend accrual; here none of the three subgroups is recommended for suspension. The entries in [[1]] show what dose each subgroup should be assigned if the next patient is a member of that subgroup. If the newly arrived patient is in subgroup 1, they will be enrolled at dose 2; otherwise they will be enrolled at dose 3.

Now let's look at an example of the output of this function at the end of the trial, when all doses have been tried for each subgroup and 90 patients (the maximum sample size) have been enrolled.


This tells us that the optimal dose for subgroups 1 and 3 is dose 2 and the optimal dose for subgroup 2 is dose 3 based on the simulated data. These optimal doses are correctly chosen for this scenario.

I hope that this tutorial helps explain how to use the SubTite package to design subgroup-specific clinical trials and how to use the function GetSubTite() to actually run the program. If you are interested in conducting a subgroup-specific time-to-event CRM trial (Sub-TITE), please feel free to contact me for help on your trial design at

If you enjoyed this tutorial, check out some of the tutorials for my other packages or one of my data driven blog posts.

Happy coding!

-Andrew G Chapple

Who should win NBA MVP?

By now I'm sure you've seen a case for the four big candidates (Russell Westbrook, James Harden, LeBron James and Kawhi Leonard) or maybe just a breakdown of who should win. With the voting process ending today at 3PM Eastern, I'm going to throw my own hat into the who-should-win sweepstakes.

Even though I reside in Houston, I’m going to try my best to be unbiased here and just look at the numbers.

First I'm going to look at something most analysts haven't: statistics against playoff teams. How well do these candidates play against the top competition? I downloaded the data from and looked at the average statistics in those games against playoff teams; each player played at least 37 of these games.


This table should immediately disqualify James Harden based on his negative Real +/- in these games. Real +/- measures how the point margin changes over 100 possessions when the player is in the game versus when he isn't, and incorporates the team's defensive efficiency while that player is on the floor. When Harden was on the floor against playoff teams, the average margin was .3 points worse than when he was off it. All the other MVP candidates have a positive +/- against playoff teams. Since these are means, they are heavily influenced by especially poor performances, but the medians tell the same story: Harden = 0, Leonard = 1, LeBron = 2, Westbrook = 3. Harden's defensive liability hurts him here. Westbrook had 6 poor games with a +/- worse than -19.

Now that Harden is out, Lebron also gets the boot after having a losing record 12-15 since the All-Star break. You cannot collapse down the stretch like that.

Now we’re down to Westbrook versus Leonard.

As far as playoff teams, Leonard has a higher win percentage, a higher +/- mean (but not median), a higher average FG % and a lower turnover margin. Westbrook leads Leonard in the three major statistical categories by a pretty large margin in these games.

Obviously we know that Westbrook had the better season in terms of raw statistics throughout the season, averaging a triple double, leading the NBA in scoring and breaking the record for the most triple doubles in a season. It has been mentioned that his teammates helped Westbrook get these records. Leonard is the league’s best two way player on the second best team in the NBA while Westbrook’s Thunder finished 10th overall. But who’s more valuable to their team?

In games against playoff teams, when Westbrook had a +/- below 0, the Thunder won only 5.26% of those games, compared to Leonard's Spurs winning 31.25% of theirs. When Westbrook played awful and had a +/- below -10 (which happened 9 times), the Thunder did not beat any playoff team, while the Spurs won once in the three times Leonard played that poorly. Looking at the entire season, the Spurs won 34.78% of their games when Leonard had a negative +/- while the Thunder won 13%.
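These conditional win rates are simple to compute from a game log; here is a sketch with a small hypothetical data set (the real numbers above came from full-season game data).

```r
# Win percentage in games where the star's +/- was negative.
games <- data.frame(
  win       = c(1, 0, 0, 1, 1, 0, 0, 1),        # made-up game results
  plusminus = c(12, -5, -14, 3, 8, -2, -11, -4)  # made-up +/- values
)
bad <- games$plusminus < 0
win_rate_when_bad <- mean(games$win[bad])
win_rate_when_bad   # 0.2 for this toy log
```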

This indicates that Westbrook's play is much more important to the Thunder's success: when he plays poorly, the Thunder win less often than the Spurs do when Leonard plays poorly. It's also worth mentioning that the Spurs won 5/6 games they played without Leonard this year. I'd venture to say that the Spurs are still a playoff team without Leonard (you know they'd be better than the Trail Blazers), but the Thunder would be bottom feeders if Westbrook had left with Durant.

So for me the MVP is Russell Westbrook.

If you enjoyed this post, check out some of my other blogs!

-Andrew G Chapple






Package SCRSELECT: Using ReturnModel()

Here you can find more details about how to use the SCRSELECT package, particularly its main function ReturnModel(). I will post a more advanced tutorial here later for users who want to run SCRSELECT() or DICTAUG() manually.

The package documentation can be found on CRAN. This code is related to a recent paper, "Bayesian variable selection for a semi-competing risks model with three hazards", which is joint work with Dr. Marina Vannucci, Dr. Peter Thall and Dr. Steven Lin.

This function performs Bayesian variable selection on a semi-competing risks model which gives the posterior probabilities of inclusion for each variable in each hazard model and determines the optimal model based on thresholds determined by the DIC statistic. More details on this can be found in the paper, which will appear in print in August.

The package has 4 main functions:

  • SCRSELECT() – Performs Stochastic Search Variable Selection (SVSS) on the semi-competing risks model, saving posterior samples to a given location along with summaries of the variable selection parameters and MCMC acceptance rates.
  • DICTAUG() – Takes posterior means from SCRSELECT() and performs a grid search to find the optimal variable-inclusion thresholds for each hazard.
  • ReturnModel() – Uses SCRSELECT() and DICTAUG() to determine the best final model, then performs the final MCMC to obtain inference.
  • SCRSELECTRUN() – Used internally by ReturnModel().

The most useful function here is ReturnModel() as it performs SVSS, finds the optimal final model (i.e. what variables are included) and then returns important summaries of the posterior along with writing the posterior to a desired folder location. I’ll briefly go over the inputs prior to looking at the output and explaining it.

  • ReturnModel(Y1,I1,Y2,I2,X,hyper,inc,c,BSVSS,BDIC,Path)
    • Y1: Vector of effusion or death times.
    • I1: Vector of indicators for the non-terminal event.
    • Y2: Vector of death or censoring times.
    • I2: Vector of death indicators.
    • X: Covariate matrix ~ the last inc columns are not subject to selection and appear in all three hazards for every candidate model.
    • hyper: Large vector of hyperparameters ~ see documentation.
    • inc: How many covariates should be excluded from selection.
    • c: Shrinkage hyperparameter; values between 10 and 50 are suggested.
    • BSVSS: How many iterations to run the SVSS (suggested: 100,000).
    • BDIC: How many iterations to run the DIC search (suggested: 10,000).
    • Path: Where to save posterior samples of the final model.
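To make the data arguments concrete, here is a sketch of what Y1, I1, Y2, I2 and X look like for simulated semi-competing risks data. Everything here is hypothetical and illustrative, not the paper's application; note how the semi-competing structure forces Y1 <= Y2.

```r
# Simulate n patients: a non-terminal event time, a death time, censoring.
set.seed(1)
n  <- 100
E  <- rexp(n, 0.5)            # latent non-terminal (e.g. effusion) time
D  <- rexp(n, 0.3)            # latent death time
C  <- runif(n, 0, 4)          # censoring time
Y2 <- pmin(D, C)              # observed death/censoring time
I2 <- as.numeric(D <= C)      # death indicator
I1 <- as.numeric(E < Y2)      # non-terminal event observed before Y2?
Y1 <- ifelse(I1 == 1, E, Y2)  # otherwise Y1 is set equal to Y2
X  <- cbind(rnorm(n), rbinom(n, 1, .5))  # toy covariate matrix
```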

I will show what the output from the application in the paper looks like in the program. I used the example code given on CRAN, but ran these chains out for a more reasonable amount of time: I let BSVSS = 10,000 and BDIC = 1,000, indicating that the variable selection MCMC should be carried out for 10,000 iterations and the DIC-\tau_g procedure run on 1,000 iterations. I set inc = 2, meaning that the SVSS will always include the last two variables in each hazard.


First the program runs two different chains through the SVSS procedure and saves the relevant results, printing the progress of each chain every 5000 iterations. After each chain finishes, the important posterior quantities of the combined sample are saved and the marginal posterior probabilities of inclusion are printed for each hazard. Given that this example data is just simulated noise, it's not surprising that we don't see some variables being more important than others. After these values are printed, the DIC-\tau_g procedure begins by searching over (.05, \tau_2, \tau_3) for the optimal DIC, then increases \tau_1 by .05 and repeats.


Once the grid search finishes, the program outputs the DIC value for the optimal model(s); it prints more than one value if two tau vectors produced DIC values within 1 of each other. Next the final model's variable inclusion is printed, where TRUE indicates that the variable is important in that specific hazard. We can see, for example, that only variable 4 is included in the first hazard.

Next the program takes this final model and performs MCMC to obtain important posterior quantities for inference. The first thing printed is the posterior probability that each covariate increases a hazard; for example, variable 7 (which was not allowed to be selected out) had a posterior probability of .73 of increasing the hazard of a non-terminal event. Likewise, variable 2 in the hazard of death after a non-terminal event was shown to be prognostic for death (P = .83). The last bit of output from the function is shown below (excuse the output format, I'll fix that up!):


We are left with the posterior median hazard ratios in the first line of each hazard, followed by the .025 and .975 quantiles, respectively. Once again, notice that variables not included in the final model have NA listed in their posterior summaries.

I hope this helps clear up how to use the package! As a note of practical usage, I would suggest picking a value of c and seeing what the marginal posterior probabilities of inclusion are after the first step. You want to encourage separation: a small value of c that leads to uniformly large marginal posterior probabilities of inclusion is as unhelpful as a large value of c that leads to uniformly small ones. Ideally, you want to see some variables with high inclusion probabilities and some with low in each of the three hazards.

I also want to note that while I do not have guides here for SCRSELECT() and DICTAUG(), these are pieces of the ReturnModel() function. An experienced user of ReturnModel() might want to further investigate the posterior distribution of the SVSS procedure; SCRSELECT() saves all posterior samples for this purpose, and its important quantities can then be used manually with DICTAUG(). However, this is not suggested for those who are not already experienced with ReturnModel().

If you have any questions don’t hesitate to ask! Also if you enjoyed this post, check out some of my other blog posts.

Happy Coding!

-Andrew G Chapple

Audio Slides: Bayesian Variable Selection for a semi-competing risks model with three hazard functions

This is the link for a 5-minute video summary of my latest paper: Bayesian Variable Selection for a semi-competing risks model with three hazard functions.

You can find the link to the paper below (once it’s uploaded, it will be sometime soon).

If you enjoyed this check out some of my other posts!

-Andrew G Chapple

Interesting Statistics Papers: Hidden Markov Models in Tennis

I've been taking a computational biology class where we use Hidden Markov Models to predict whether a part of a DNA sequence is a promoter, coding region, polyA, or non-coding (intergenic) region. You can read a little more about HMMs here. They also have big applications in finance, but the paper I'm discussing here is more related to the biological models.

For DNA, the basic idea is that each of the 4 states (promoter, coding region, polyA, intergenic) can emit the nucleotides A, T, G, C. So the HMM takes a DNA sequence (for example x = AACGTAGCC…) and, for each letter, tries to predict whether it belongs to a promoter, coding, polyA, or intergenic region.
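This decoding step is usually done with the Viterbi algorithm. Below is a toy version in R with just two states (coding vs. intergenic, GC-rich vs. AT-rich) and made-up transition and emission probabilities, purely to illustrate the mechanics.

```r
# Toy Viterbi decoder: find the most likely state path for a DNA string.
viterbi <- function(obs, A, E, init) {
  S <- nrow(A); n <- length(obs)
  V <- matrix(-Inf, S, n); ptr <- matrix(0L, S, n)
  V[, 1] <- log(init) + log(E[, obs[1]])
  for (t in 2:n) for (s in 1:S) {
    scores <- V[, t - 1] + log(A[, s])   # best way to arrive in state s
    ptr[s, t] <- which.max(scores)
    V[s, t] <- max(scores) + log(E[s, obs[t]])
  }
  path <- integer(n); path[n] <- which.max(V[, n])
  for (t in (n - 1):1) path[t] <- ptr[path[t + 1], t + 1]
  path
}
A <- matrix(c(.9, .1,                    # transitions: states tend to persist
              .2, .8), 2, byrow = TRUE)
E <- matrix(c(.15, .35, .35, .15,        # state 1 "coding": GC-rich
              .30, .20, .20, .30),       # state 2 "intergenic": AT-rich
            2, byrow = TRUE)
colnames(E) <- c("A", "C", "G", "T")
obs <- match(strsplit("GGCCGCATATAT", "")[[1]], colnames(E))
p <- viterbi(obs, A, E, init = c(.5, .5))
p   # GC-rich prefix decodes to state 1, AT-rich suffix to state 2
```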

I've been thinking about how these types of models could be applied to sports. For example, can we tell if a team is on a hot streak or in a slump based on in-game or game-to-game play?

This paper, Tracking the evolution of a tennis match using Hidden Markov Models, by Kolonias et al., uses hidden Markov models to predict who will win a tennis point based on the events that happen within the point. The big idea: what moves can a tennis player make, in serving or returning, that increase their probability of scoring (or decrease their opponent's)?

They do this by defining 5 major states within their HMM: 1st serve, 2nd serve, ace, rally and point scored, where the first 4 each have sub Markov models.

The main transitions are determined by the rules of tennis as follows. After a serve (1st or 2nd), you can either fault and serve again, score on the serve (an ace), have the serve returned to start a rally, or double fault and concede the point. From the first serve state you can transition to any of the other states (2nd serve, Ace, Rally, Point), while from the second serve you can't transition back to the first serve. Ace and Rally states always end with a point scored.
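Those rules pin down the zero pattern of the top-level transition matrix; the actual probabilities below are made up (the paper estimates them from match data).

```r
# States: 1 = 1st serve, 2 = 2nd serve, 3 = Ace, 4 = Rally, 5 = Point scored.
P <- matrix(0, 5, 5)
P[1, ] <- c(0, .35, .05, .55, .05)  # 1st serve: fault, ace, rally, or point against
P[2, ] <- c(0, 0, .02, .88, .10)    # 2nd serve: no way back to a 1st serve
P[3, ] <- c(0, 0, 0, 0, 1)          # an ace always ends in a point
P[4, ] <- c(0, 0, 0, 0, 1)          # a rally always ends in a point
P[5, ] <- c(0, 0, 0, 0, 1)          # absorbing: the point is over
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))
# Simulate one point: start at the 1st serve, walk until a point is scored.
set.seed(7)
s <- 1; path <- s
while (s != 5) { s <- sample(1:5, 1, prob = P[s, ]); path <- c(path, s) }
path
```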

The real wow factor of this model comes in the sub markov models.

Let’s take a look at the graphic for the Rally sub model shown in the paper:

The ball can rally back and forth until an unforced error is made or a point is forced. Surely this could be enriched by adding the direction of each shot or how hard it is hit back.

I'm interested in how this type of model would work in predicting whether or not a point is scored in basketball. For example, how many passes are too many, and how many lead to a high scoring probability? How effective are ball screens followed by a pass? How much do different amounts of spacing affect scoring in several common plays (give and go, hero ball, etc.)?

These are questions that could be answered using this type of model.

Anyways, I thought this was a neat idea, and it was nice to see Hidden Markov Models outside of the human genome.

If you liked this check out some of my other posts!

-Andrew G Chapple

Quick Tutorial for a Reversible Jump MCMC sampler

The following slides are a presentation that I made for a class on constructing a reversible jump Markov chain Monte Carlo sampler, which everyone seems to be terrified of for some reason. All we have to do is be careful about our storage matrix: format it to hold the largest model and keep track of the model dimension at each iteration. Storing this dimension as an additional variable is recommended.
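The storage scheme the slides recommend can be sketched directly: pad the sample matrix to the largest model, leave unused slots as NA, and record each iteration's dimension in its own vector. The sampler below is deliberately fake (a random grow/shrink move, not a real reversible jump kernel) so the example stays self-contained; only the bookkeeping is the point.

```r
set.seed(42)
B <- 1000; kmax <- 5
Samples <- matrix(NA, nrow = B, ncol = kmax)  # columns beyond dim stay NA
Dim <- rep(NA, B)                             # model dimension at iteration b
theta <- rnorm(2)                             # start in a 2-parameter model
FakeRJStep <- function(theta, kmax) {         # stand-in for a real RJ move
  k <- length(theta)
  if (k < kmax && runif(1) < .5) c(theta, rnorm(1))
  else if (k > 1) theta[-k]
  else theta
}
for (b in 1:B) {
  theta <- FakeRJStep(theta, kmax)
  Dim[b] <- length(theta)
  Samples[b, 1:Dim[b]] <- theta
}
table(Dim)   # how often each dimension was visited
```

With real RJMCMC the move would be accepted with the usual Green (1995) ratio, but the storage pattern (NA padding plus a dimension vector) is exactly the same.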
If you would like this presentation for your class please email me!

If you liked this, check out some of my other posts!

-Andrew G. Chapple