Background: In external control arm (ECA) analyses of single-arm clinical trials, real-world control patients may have multiple eligible lines of treatment (LOTs) at which they can be indexed. In these situations, the index date can be set at a randomly selected eligible LOT, at the first or last eligible LOT, or at all eligible LOTs. Depending on the index selection method used, confounding and selection bias may arise. There is limited evidence on which index selection method is least biased and on how features of the data-generating process, such as the baseline hazard, the effect size, or the association between the length of LOTs and the time to the endpoint, affect bias under each approach.
Objectives: To assess, using simulation, the impact of index selection methods on the estimation of log hazard ratios when LOTs are associated with overall survival (OS).
Methods: We simulated LOT and OS data informed by an ECA study of a single-arm trial in multiple myeloma. Data were simulated from a shared frailty model that jointly models LOT transitions and OS. The ECA was constructed by selecting the first, the last, a random, or all eligible LOTs. We varied alpha, the parameter linking the intensity of LOT transitions to OS, from -1 to 1. When alpha is negative, the baseline OS hazard decreases with increasing frailty; when alpha is positive, it increases. For each index selection method, the log hazard ratio (HR) was estimated from a Cox proportional hazards model weighted by inverse odds weights derived from a propensity score model. Performance was assessed by the bias of the estimated log HR. A second simulation with larger samples was conducted to explore the sensitivity of the results to sample size.
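The four index selection rules and the weighting step can be illustrated with a minimal sketch. It is not the study code: the function names `select_index_lines` and `inverse_odds_weight` are hypothetical, and the weight formula assumes the common convention in which trial patients keep weight 1 and ECA records are weighted by the odds of trial membership, p / (1 - p), where p is the estimated probability of being in the trial given covariates.

```python
import random


def select_index_lines(eligible_lots, method, rng=None):
    """Return the LOT numbers used as index lines for one ECA patient.

    eligible_lots: LOT numbers at which the patient meets eligibility.
    method: "first", "last", "random", or "all".
    rng: optional random.Random instance, used only by "random".
    """
    lots = sorted(eligible_lots)
    if not lots:
        return []
    if method == "first":
        return [lots[0]]
    if method == "last":
        return [lots[-1]]
    if method == "random":
        rng = rng or random.Random()
        return [rng.choice(lots)]
    if method == "all":
        # The patient contributes one record per eligible LOT.
        return lots
    raise ValueError(f"unknown method: {method!r}")


def inverse_odds_weight(p_trial, in_trial):
    """Weight for one record (assumed convention, see lead-in).

    p_trial: estimated P(trial | covariates) from a propensity score model.
    in_trial: True for trial patients, False for ECA records.
    """
    if in_trial:
        return 1.0
    if not 0.0 < p_trial < 1.0:
        raise ValueError("p_trial must lie strictly between 0 and 1")
    return p_trial / (1.0 - p_trial)


if __name__ == "__main__":
    # Hypothetical patients with their eligible LOT numbers.
    patient_lots = {"A": [2, 3, 4], "B": [3]}
    for method in ("first", "last", "random", "all"):
        chosen = {
            pid: select_index_lines(lots, method, rng=random.Random(0))
            for pid, lots in patient_lots.items()
        }
        print(method, chosen)
```

Under the all method a patient can contribute several records, one per eligible LOT, whereas the other three rules yield exactly one record per patient.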
Results: The main simulation included 300 simulated patients in a 2:1 ECA-to-trial ratio. The all and first index methods performed best at every value of alpha. For these methods, little to no bias was observed when the treatment effect was null and alpha was negative (-0.023; -0.016), equal to 0 (-0.020; -0.004), or positive (-0.012; 0.031). Substantial bias was observed with a large non-null treatment effect, but this bias was small or absent in large samples (n=3000). The last and random methods showed substantial bias in all scenarios.
Conclusions: In our simulation of a single-arm trial with an ECA, the first and all index methods showed low bias under a null treatment effect whether alpha was negative (-1) or positive (1). Bias increased when a treatment effect was present and always favored a stronger treatment effect. The all or first index methods are recommended. The random and last index methods cannot be recommended because they lead to substantial bias under the null.