Historically, equal randomization of patients to two or more treatment arms has been the gold standard in clinical trials because it eliminates potential bias created by patient heterogeneity. However, if one treatment (say treatment A) is believed to be better than another, randomizing patients to the less effective treatment raises ethical concerns. For randomization to be justified, clinical equipoise must hold: both treatments must genuinely be believed to be equivalent. On the other hand, assigning all patients to treatment A may be better for those patients' health, but it cannot produce an unbiased comparison of treatments A and B; the data would be observational. Outcome adaptive randomization offers a compromise between these two extremes: as the trial progresses, patients are assigned with higher probability to whichever treatment has performed better on previous patients' outcomes. These designs have been used much more frequently in recent years, particularly at M.D. Anderson, in high-profile trials such as the BATTLE trial (which adapts to patient biomarker heterogeneity) and the I-SPY trials (which continuously update the list of treatments under study).
A simple example of an outcome adaptive design for two treatment arms is the BAR(c) design described by Thall and Wathen (2007). The trial is for a binary efficacy response with true response probabilities \theta_A and \theta_B for treatments A and B. They assign non-informative beta prior distributions to \theta_A and \theta_B and base the treatment allocation probabilities on
p_B = Pr(\theta_B > \theta_A | data), the posterior probability that treatment B is better than treatment A, and p_A = Pr(\theta_A > \theta_B | data) = 1 - p_B, the posterior probability that treatment A is better than treatment B. They suggest randomizing patients to treatment B with probability

\pi_B = p_B^c / (p_B^c + p_A^c)

for some c > 0. If we let c = 1, the design randomizes based solely on the posterior probability, as introduced by Thompson (1933), which produces highly variable allocations. They instead recommend c = n/(2N), where n is the number of patients enrolled at the interim analysis and N is the maximum sample size; this constrains the allocation probability to be near 1/2 when only a small fraction of the total patients has enrolled, which keeps the treatment allocation from becoming too extreme early on. In this paper and in Thall and Wathen (2015), they ran a simulation study with \theta_B = .4 and \theta_A = .2 to see how the trial allocated patients. The results are quite concerning: over 14% of the simulations treated at least 20 more patients on the inferior arm (treatment A), even though the mean number of patients treated on B was higher than on A. Studies of these designs typically cite that mean as a reason for their usage, but doing so ignores the substantial probability of actually treating more patients on the inferior arm, which defeats the purpose of outcome adaptive randomization. Korn and Freidlin (2011) were among the first to voice skepticism about this design, noting that compared with K:1 fixed randomization, outcome adaptive randomization treats more patients in total and more non-responders, even though it does treat more patients on the better arm.
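As a concrete illustration, here is a minimal sketch of the BAR(c) rule, assuming independent Beta(1, 1) priors on \theta_A and \theta_B and estimating the posterior comparison probability by Monte Carlo. The function names and the example success/failure counts are my own choices, not from Thall and Wathen's software.

```python
# Sketch of the BAR(c) allocation rule (Thall and Wathen, 2007),
# assuming independent Beta(1, 1) priors on theta_A and theta_B.
import random

def prob_B_better(succ_A, fail_A, succ_B, fail_B, draws=100_000, seed=1):
    """Monte Carlo estimate of Pr(theta_B > theta_A | data)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        a = rng.betavariate(1 + succ_A, 1 + fail_A)  # posterior draw for A
        b = rng.betavariate(1 + succ_B, 1 + fail_B)  # posterior draw for B
        hits += b > a
    return hits / draws

def allocation_prob_B(succ_A, fail_A, succ_B, fail_B, n, N):
    """Probability of assigning the next patient to B, with c = n/(2N)."""
    p = prob_B_better(succ_A, fail_A, succ_B, fail_B)
    c = n / (2 * N)
    return p**c / (p**c + (1 - p)**c)

# Early in the trial (n small, so c is near 0) the allocation stays near
# 1/2 even when B looks better; the same observed response rates at full
# enrollment (c = 1/2) push the allocation much further from 1/2.
early = allocation_prob_B(2, 8, 4, 6, n=20, N=200)
late = allocation_prob_B(20, 80, 40, 60, n=200, N=200)
print(round(early, 3), round(late, 3))
```

This makes the role of c visible: it dampens the raw posterior probability into a tempered allocation probability, so early noise cannot swing the randomization ratio very far.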
While there are other outcome adaptive randomization schemes, Yuan and Yin responded to Korn and Freidlin (2011) by examining the maximum percent reduction in non-responding patients under the optimal outcome adaptive rule. This removes the specificity of the BAR(c) design and covers all possible outcome adaptive randomization schemes. Their results showed that, for the scenario above with \theta_A = .2 and \theta_B = .4, there was only a 3-4% reduction in non-responders, a small gain in exchange for the chance of imbalancing patients in favor of the inferior treatment.
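To put that figure in concrete patient counts, here is a quick back-of-the-envelope calculation; the 200-patient trial size is my own illustrative choice, not from Yuan and Yin.

```python
# Expected non-responders in a 200-patient, 1:1 randomized trial with
# theta_A = 0.2 and theta_B = 0.4, and what a 3-4% reduction amounts to.
n_per_arm = 100                          # illustrative trial size
theta_A, theta_B = 0.2, 0.4
equal_nr = n_per_arm * (1 - theta_A) + n_per_arm * (1 - theta_B)
print(equal_nr)                          # expected non-responders, 1:1
print(0.03 * equal_nr, 0.04 * equal_nr)  # patients spared at best
```

Under equal randomization we expect 140 non-responders, so even the optimal adaptive scheme spares only about 4 to 6 patients in a trial of this size.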
So the benefits of outcome adaptive randomization are small, while the potential mishaps are serious. Any outcome adaptive scheme will shift more patients to the arm with the better observed response rate during the trial. Consider how this plays out in the following scenario: suppose a new treatment B is vastly superior to the standard treatment A, but the first 10 patients randomized to B all have poor prognoses while the first 10 randomized to A all have good ones. The inferior treatment A could then show the higher observed response rate, entirely because of which patients happened to be assigned to each arm. Whatever adaptive scheme is used, the next cohort will then be randomized more heavily to treatment A. With fewer patients going to treatment B, it is conceivable that B again receives an unrepresentative sample of the patient population, and while A's observed response rate drifts down toward its true value, B's may stagnate. This feedback loop could persist over several patient cohorts before the initial chance imbalance is erased.
This is one of the primary reasons most outcome adaptive trials have a long “burn-in” period during which patients are equally randomized. For example, the famous BATTLE trial equally randomized its first 97 patients among the treatments before the randomization scheme was allowed to adapt. A long burn-in erases some of the potential effects of early chance imbalances toward the inferior treatment, but it also reduces the potential patient benefit if, for example, one treatment already looks superior after 50 patients have enrolled. Nor does a burn-in guarantee the effect is erased: a quick simulation with \theta_A = .2 and \theta_B = .4 shows that in about 2% of simulations treatment A still has the higher observed response rate after 100 patients have enrolled, which would then push more patients toward A.
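That burn-in claim is easy to check directly. The sketch below is my own simulation, assuming a 50/50 split of the first 100 patients and counting only strictly higher observed rates for A (counting ties as well would give a somewhat larger figure).

```python
# How often does the inferior arm look better after a 100-patient,
# equally randomized burn-in with theta_A = 0.2 and theta_B = 0.4?
# The 10,000-replicate setting is an illustrative choice.
import random

def observed_rate(theta, n, rng):
    """Observed response proportion among n Bernoulli(theta) patients."""
    return sum(rng.random() < theta for _ in range(n)) / n

rng = random.Random(42)
n_sims = 10_000
a_looks_better = sum(
    observed_rate(0.2, 50, rng) > observed_rate(0.4, 50, rng)
    for _ in range(n_sims))
print(f"{a_looks_better / n_sims:.1%} of simulated burn-ins favor arm A")
```

A percent or two sounds small, but every one of those trials would begin its adaptive phase by steering patients toward the inferior arm.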
Finally, it is worth mentioning some of the ethical challenges of outcome adaptive clinical trials, discussed in Hey and Kimmelman (2015), for both patients and clinicians. Patients who enroll in the trial may have the false perception that they will automatically get the better treatment, which is not the case. Even after one treatment appears better, there is still a probability of being randomized to the seemingly inferior arm, something that may be devastating to patients who become aware of it. For the same reason, clinicians may choose to stop enrolling patients once one treatment appears superior. Patients, meanwhile, are more willing to enroll in outcome adaptive trials because they know there is a higher chance of “playing the winner” over time, but they could exploit this by waiting to enroll until late in the trial, when there is an apparently superior treatment. This could introduce substantial bias, and even apart from this issue, outcome adaptive randomization tends to bias the estimated treatment effect much more than K:1 fixed randomization.
It was disheartening to read about these issues, because my initial thought was that outcome adaptive randomization made more sense, ethically and statistically, than fixed randomization. Many statisticians (over 50% in some polls!) still believe in its usage, and the debate remains a hot topic, particularly for clinicians who want to protect patients while advancing human knowledge. I see more advantages to this approach in trial designs like the highly controversial BATTLE design, but it is not clear how well these methods perform in those cases. Hopefully these concerns will either be mitigated or reinforced in the coming decade, eliminating the discrepancies in opinion on the matter.