# Two-stage Adaptive Sample Size Decisions

Recently I went to ENAR in Austin and saw a very interesting tutorial by Franz Konig called “Adaptive Designs for Confirmatory Clinical Trials”. Since adaptive designs are one of the hot topics in clinical trials, the session was packed. He discussed fully adaptive designs, where you can change the trial entirely at some interim analysis. This generalizes the usual interim decision rules like dropping or adding a treatment, switching up the allocation ratio, and changing the maximum sample size. Essentially, it allows you to observe as many patients as you want before changing the trial, in whatever way you want, without losing statistical power. Needless to say, the room was skeptical about this, as is the FDA, which hasn't approved one of these designs in the U.S. He specifically discussed the interim decision of increasing or decreasing the sample size for the second cohort if the first cohort didn't end the trial. This could allow trials where you observe a small number of patients, say 20, and then decide whether it's worth enrolling many more patients to resemble a Phase III trial, enrolling a few more patients to resemble a Phase II trial, or ending the trial entirely. This can be extremely efficient for drug companies who have many molecules/compounds waiting in the wings for clinical trials, allowing them to prioritize the molecules that appear to be winners based on a smaller cohort.

This presentation discussed two methods for a single interim analysis, each of which stops the trial and rejects H_0 at the first interim analysis if p < \alpha_1, where p is the p-value of the first-stage data under H_0, and ends the trial due to futility if p > \alpha_2. The first method is the combination test, which computes a p-value for a test at each stage, denoted p and q, and rejects H_0 at the second analysis if C(p,q) \leq c for some combination function C(). A common choice is C(p,q)=pq or an inverse normal weighted combination function. The critical value c is specified in advance based on the first-stage decision boundaries for rejecting H_0 (\alpha_1) and accepting H_0 (\alpha_2), and the desired overall type I error of the test. Formally, we set c such that \alpha_1 + P(\alpha_1 < p < \alpha_2, C(p,q) \leq c | H_0) = \alpha, which gives us a level \alpha test. For example, this produces c = (\alpha-\alpha_1)/(\ln(\alpha_2)-\ln(\alpha_1)) for the product test.
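Solving that constraint for the product test is a one-liner, since p and q are independent Uniform(0,1) under H_0. A minimal sketch in Python, using \alpha = .05, \alpha_1 = .02, \alpha_2 = .98 as the boundaries:

```python
from math import log

# Design parameters: overall level, early-rejection and futility boundaries
alpha, alpha1, alpha2 = 0.05, 0.02, 0.98

# Critical value for the product combination test C(p, q) = p*q, solving
#   alpha_1 + P(alpha_1 < p < alpha_2, p*q <= c | H_0) = alpha
# where p, q are independent Uniform(0, 1) under H_0
c = (alpha - alpha1) / (log(alpha2) - log(alpha1))
print(c)  # ~0.007708475
```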

As a numerical example, say we let \alpha=.05 to control the overall type I error and we set \alpha_1=.02 and \alpha_2=.98. Then we will reject H_0 at the second analysis if pq \leq c = 0.007708475 (I know, steep!), and this trial will be done overall at level \alpha. If we observe p=.01 after the first n_1 patients have been observed, we stop the trial and reject our null hypothesis. If we observe p=.99, we stop the trial due to futility; that is, there is not enough evidence against H_0 to warrant continuing the trial. If we observe p=.1, we enroll n_2 more patients in the trial, where we can pick n_2 however we want at the interim. Then say we observe q=.01; we have C(p,q)=pq=.001 \leq c, so we reject H_0. Maybe you pick n_2 to be especially large if you are very confident the trial will lead to treatment improvements based on the first n_1 patients. The flexibility of this trial makes you wonder about its properties and validity as a statistical design, as many in the room were skeptical. I'm certainly convinced about the overall type I error, since the type I error for the first test is \alpha_1 and the type I error of the second, combined test depends on the combination function we choose, the critical value, and the conditional event that we continue from the first stage to the second. Because we can specify c to be whatever we like (along with \alpha_1 and \alpha_2), if we have a good combination function we can always construct an overall level \alpha test. However, my skepticism comes in with the non-restrictiveness of changing the sample size, and even the treatment allocation ratio, at the interim analysis. This adaptive two-stage design reminds me of the Simon optimal design (1989), which leaves a little bit of a bad taste in my mouth; it just seems too simple for the complexity of the testing structure.
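The type I error claim is easy to check empirically. Below is a small simulation sketch of my own (not from the talk): a one-sided z-test of H_0: \mu = 0 with n_1 = 20 first-stage patients, where n_2 is chosen adaptively from the interim p-value (the 200-vs-50 rule is an arbitrary illustration). Because q is still Uniform(0,1) under H_0 regardless of how n_2 was picked, the rejection rate stays at \alpha:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
Phi = NormalDist().cdf

alpha, alpha1, alpha2 = 0.05, 0.02, 0.98
c = (alpha - alpha1) / (np.log(alpha2) - np.log(alpha1))

def pval(x):
    # One-sided z-test p-value for H_0: mu = 0 (known sigma = 1)
    return 1 - Phi(np.sqrt(len(x)) * x.mean())

def one_trial(n1=20):
    x1 = rng.standard_normal(n1)   # first cohort, generated under H_0
    p = pval(x1)
    if p < alpha1:
        return True                # early rejection
    if p > alpha2:
        return False               # futility stop
    # Adaptive second stage: n2 depends on the interim data
    # (this 200/50 split is an arbitrary illustrative rule, not from the talk)
    n2 = 200 if p < 0.10 else 50
    q = pval(rng.standard_normal(n2))
    return p * q <= c              # product combination test

n_sim = 100_000
rate = sum(one_trial() for _ in range(n_sim)) / n_sim
print(rate)  # hovers around alpha = 0.05
```

Swapping in a different (data-dependent) rule for n_2 doesn't change the result, which is exactly the point of the combination test.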

This talk only gave this brief example, so I wonder how other adaptive decisions, like dropping/adding treatments or multiple interim decisions, could be implemented in this setting (I'll have to read up!), because in the usual Bayesian multi-arm adaptive trials the sample sizes at each interim analysis are set in advance, either by an information criterion or fixed. It will be interesting to see whether these types of tests gain prevalence, if they can improve on existing methodologies or show optimal operating characteristics. I might take a look into their operating characteristics via a few simulations myself and follow up this post with the results. These methods have gained popularity in Europe over the past 20 years, and many different trials there are now implemented using this design.

-A.G.C.