(B48) Causal forests vs inverse probability weighting for addressing cluster-level confounding in medical device and surgical epidemiology: A simulation study
Student, University of Oxford, Oxford, United Kingdom
Background: Causal forest (CF) is a machine-learning-inspired method that has previously shown promise as a data-driven option for estimating treatment effects. However, little research has been done on its application and performance in the presence of multilevel confounding, e.g., in the study of procedures or implantable devices.
Objectives: To estimate the accuracy and precision of CF compared with inverse probability weighting (IPW) across a range of clustered data structures and cluster-level confounding scenarios.
Methods: Monte Carlo simulations (1,000 iterations) with 10,000 patients were conducted. Five patient-level confounders, one instrumental variable and one risk factor were generated, with a fixed true treatment effect odds ratio (OR) of 1.5. Additionally, two surgeon-level confounders were generated, one binary and one continuous, with OR = 1.5 for the effect on the outcome and OR of 1.01, 1.25, 1.5 or 2.5 for the effect on treatment allocation. Furthermore, for each scenario, five different cluster data structures were generated by varying the average ratio of patients per surgeon (20:1, 50:1, 100:1, 500:1 and 1000:1). Three CF strategies were tested: (i) CF without modification; (ii) CF allowing the cluster label to be used as a covariate for splitting; and (iii) CF with propensity scores (PS) as the tuning parameter for splitting. These were compared with IPW estimated as the inverse of the PS. The PS for IPW and CF were estimated using a random effects model. The average relative bias, 95% CI coverage and empirical standard error (EmpSE) of the treatment effect estimates were calculated for all methods by comparing them with the true simulated treatment effect.
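The data-generating step and the three performance measures described above can be sketched roughly as follows. This is a minimal illustration and not the study code: the function names (simulate_clustered, performance), the intercepts, the patient-level coefficient of 0.2, the standard normal and Bernoulli(0.5) covariate distributions, and the choice of scale for the estimates are all assumptions made for the example. Only the sample size, the number and type of covariates, the OR values and the patients-per-surgeon ratio come from the abstract.

```python
import numpy as np


def _expit(v):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-v))


def simulate_clustered(n_patients=10_000, patients_per_surgeon=50,
                       or_cluster_treat=1.5, or_effect=1.5, seed=0):
    """Simulate one data set with surgeon-level confounding (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_surgeons = max(1, n_patients // patients_per_surgeon)
    # Random assignment gives the requested patients-per-surgeon ratio on average.
    surgeon = rng.integers(0, n_surgeons, size=n_patients)

    # Two surgeon-level confounders: one binary, one continuous.
    z_bin = rng.binomial(1, 0.5, size=n_surgeons)[surgeon]
    z_cont = rng.normal(size=n_surgeons)[surgeon]

    # Five patient-level confounders, one instrument, one pure risk factor.
    x = rng.normal(size=(n_patients, 5))
    iv = rng.normal(size=n_patients)
    risk = rng.normal(size=n_patients)

    # Treatment model: the instrument enters here but not in the outcome model.
    # Intercepts and the 0.2 coefficients are assumed values.
    lin_t = (-0.5 + 0.2 * x.sum(axis=1) + 0.2 * iv
             + np.log(or_cluster_treat) * (z_bin + z_cont))
    treat = rng.binomial(1, _expit(lin_t))

    # Outcome model: the risk factor enters here but not in the treatment model.
    # Surgeon-level confounders have OR = 1.5 on the outcome, as in the abstract.
    lin_y = (-1.0 + 0.2 * x.sum(axis=1) + 0.2 * risk
             + np.log(1.5) * (z_bin + z_cont)
             + np.log(or_effect) * treat)
    y = rng.binomial(1, _expit(lin_y))

    return {"surgeon": surgeon, "x": x, "iv": iv, "risk": risk,
            "z_bin": z_bin, "z_cont": z_cont, "treat": treat, "y": y}


def performance(estimates, ci_lower, ci_upper, truth):
    """Relative bias (%), CI coverage (%) and empirical SE across iterations."""
    estimates = np.asarray(estimates, dtype=float)
    ci_lower = np.asarray(ci_lower, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)
    rel_bias = 100.0 * (estimates.mean() - truth) / truth
    coverage = 100.0 * np.mean((ci_lower <= truth) & (truth <= ci_upper))
    emp_se = estimates.std(ddof=1)  # SD of the estimates over iterations
    return rel_bias, coverage, emp_se
```

In a full simulation, simulate_clustered would be called once per iteration with a different seed, each method would be fitted to the resulting data, and the collected estimates and confidence limits would be passed to performance. The abstract does not state the scale on which the estimates were compared with the truth, so the value passed as truth here is left to the user.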
Results: The accuracy and precision of the treatment effect estimates for CF were slightly worse than those for IPW in all scenarios except when the cluster-level confounding was strong (OR = 2.5). For example, for strategy (ii), the relative bias, model coverage and EmpSE were 11.5%, 84% and 0.055 for OR = 1.5 with a cluster ratio of 50:1, compared to 19.8%, 95% and 0.0464 for IPW, respectively. However, when OR = 2.5, the relative bias, model coverage and EmpSE were 11.6%, 84% and 0.049, compared to 19.78%, 65% and 0.0446 for IPW.
Conclusions: In our simulation of clustered data with cluster-level confounding typically seen in surgical or medical device epidemiology, CF performed worse than an optimal and previously tested PS-based IPW strategy in most scenarios, except when the cluster-level confounding was strong. More research is needed to study the pitfalls of CF for treatment effect estimation in surgical epidemiology and medical device safety research.