Background: Planning an external control arm (ECA) for a treated arm in a clinical trial may require determining a minimum size of the ECA to have adequate statistical power in comparative analyses. Sample size calculation in this case may be constrained if the size of the treated arm is already fixed. Furthermore, comparative analyses must account for confounding with methods like propensity score (PS) weighting, which can lead to loss of statistical information (i.e., power and precision).
Objectives: We describe a simulation-aided approach to determine the sample size of an ECA for a fixed treated group size, accounting for the impact of confounding adjustment with PS weighting (specifically, weighting for the average treatment effect among the treated (ATT)).
Methods: A fixed treated group size imposes a limit on the minimum detectable effect size (MDES) in comparative analyses. The first step is to determine this limit and assess whether the effect size targeted by the study is detectable. If it is, the number of control patients needed for a balanced comparison (n[b]) is derived. PS weighting leads to information loss, however, which can be quantified by the effective sample size (ESS). To approximate the ESS, information on the expected distribution of confounders in the two sources can be used to simulate virtual patient profiles from statistical distributions. A PS analysis can then be carried out on the simulated profiles to estimate the ESS. The target sample size for the ECA is then obtained by inflating n[b] by a factor of n[b]/ESS. A proof of concept is provided using simulation of a simplified scenario.
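The simulation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two confounders are drawn as independent unit-variance normals, the PS model is a main-effects logistic regression fitted by Newton-Raphson, and the ESS is computed with the common Kish approximation (sum of weights squared divided by sum of squared weights), none of which the abstract specifies. All function names are hypothetical, and outcome simulation is omitted because the ESS depends only on the weights.

```python
import math
import random


def simulate_profiles(n_treated, n_control, mean_shift=1.0, rng=random):
    """Draw two normal confounders per patient (independence is an assumption).

    Treated means are shifted by `mean_shift` SDs relative to controls.
    Returns rows [1, x1, x2] and a treatment indicator list.
    """
    X, z = [], []
    for treated, n in ((1, n_treated), (0, n_control)):
        mu = mean_shift if treated else 0.0
        for _ in range(n):
            X.append([1.0, rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
            z.append(treated)
    return X, z


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, z, n_iter=10):
    """Fit a logistic PS model by Newton-Raphson and return its coefficients."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, zi in zip(X, z):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (zi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def att_ess(X, z, beta):
    """Kish ESS of the controls under ATT weights w = ps / (1 - ps)."""
    # For a logistic model, ps / (1 - ps) equals exp(linear predictor).
    w = [math.exp(sum(b * v for b, v in zip(beta, xi)))
         for xi, zi in zip(X, z) if zi == 0]
    return sum(w) ** 2 / sum(v * v for v in w)


def mean_ess(n_treated, n_control, n_reps=50, seed=1):
    """Average the control-arm ESS over simulated replications."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        X, z = simulate_profiles(n_treated, n_control, rng=rng)
        total += att_ess(X, z, fit_logistic(X, z))
    return total / n_reps


if __name__ == "__main__":
    n_b = 317                      # controls needed before weighting
    ess = mean_ess(200, n_b)       # expected ESS after ATT weighting
    print(round(ess), math.ceil(n_b * n_b / ess))  # ESS, inflated target size
```

Because the confounder correlation, PS model form, and ESS definition used in the study are not given here, the values printed by this sketch need not match the reported figures; the point is the sequence of steps (simulate, fit, weight, compute ESS, inflate).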
Results: We considered a comparison for a treated arm of 200 patients with an expected event rate of 0.5. The MDES in a balanced comparison with this group is an odds ratio (OR) of 0.67. We set a target effect size of OR = 0.6 for illustration, which requires 317 controls for sufficient power (80%) in a balanced comparison. Two confounders were generated from normal distributions with means differing by one standard deviation between the treated and control groups. Data sets including confounder values and outcome status were simulated, and a PS model was fitted in each replication to obtain ATT weights. Re-weighting the ECA patients reduced the original n of 317 to an average ESS of 78 (or 25% of the original n) and yielded 47% power in tests of significance of the PS-adjusted OR estimate. To counter the loss of information, the original sample size was increased in each replication by the ratio n[b]/ESS (e.g., 317/78 = 4.06), which yielded an average ESS of 322 and power of 75%.
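The unweighted figures above can be checked with a short calculation. The sketch below assumes a two-sided test at alpha = 0.05, a normal approximation for the difference in proportions with unpooled variances, an event rate of 0.5 in the treated arm, and an OR defined as treated odds over control odds; the abstract does not state which power formula was used, so these choices and the function names are assumptions. The MDES is taken as the limit reached when the control arm grows without bound.

```python
import math
from statistics import NormalDist


def z_total(alpha=0.05, power=0.80):
    """Sum of the critical value and the power quantile (two-sided test)."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def control_rate(p_treated, odds_ratio):
    """Control event rate implied by OR = odds_treated / odds_control."""
    odds_c = (p_treated / (1 - p_treated)) / odds_ratio
    return odds_c / (1 + odds_c)


def mdes_odds_ratio(n_treated, p_treated, alpha=0.05, power=0.80):
    """OR closest to 1 that is detectable with an unlimited control arm."""
    # With infinitely many controls, only the treated arm contributes variance.
    diff = z_total(alpha, power) * math.sqrt(p_treated * (1 - p_treated) / n_treated)
    p_c = p_treated + diff
    return (p_treated / (1 - p_treated)) / (p_c / (1 - p_c))


def controls_needed(n_treated, p_treated, odds_ratio, alpha=0.05, power=0.80):
    """Number of controls needed to detect `odds_ratio` before any weighting."""
    p_c = control_rate(p_treated, odds_ratio)
    target_var = ((p_c - p_treated) / z_total(alpha, power)) ** 2
    var_c = target_var - p_treated * (1 - p_treated) / n_treated
    if var_c <= 0:
        raise ValueError("effect not detectable with this treated arm size")
    return math.ceil(p_c * (1 - p_c) / var_c)


if __name__ == "__main__":
    print(round(mdes_odds_ratio(200, 0.5), 2))   # 0.67
    print(controls_needed(200, 0.5, 0.6))        # 317
    print(round(317 / 78, 2))                    # 4.06, the inflation ratio
```

Under these assumptions the sketch gives the same MDES and control count as reported, and a target OR closer to 1 than the MDES (for example 0.7) raises an error because no number of controls would suffice.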
Conclusions: The proposed approach can provide useful insights to guide sample size planning for ECAs when background data are available to inform simulations. Uncertainty in assumptions should be factored in when interpreting obtained results.