(070) Application of machine learning methods to assess the risk of fracture associated with antihypertensives in older people in the presence of a large number of confounders
PhD student, University of Oxford, Oxford, United Kingdom
Background: Machine learning (ML) methods are considered promising alternatives for data-driven propensity score (PS) estimation, which can be used to estimate the effect of one or more exposures on one or more outcomes. We assessed the association between the use of antihypertensives and the risk of fracture, an adverse event, in older patients.
Objectives: To apply four ML methods, LASSO, eXtreme Gradient Boosting (XgBoost), Multilayer Perceptron (MLP) and a stacked ensemble method (SuperLearner), together with one reference method, to estimate a “large-scale” PS, and to use each PS to evaluate the adverse drug effects of antihypertensive therapies, with negative control outcomes used to assess residual bias.
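As an illustration only, the following Python sketch shows how one of the listed methods, LASSO, can produce a propensity score: an L1-penalised logistic regression of treatment on covariates, fitted by proximal gradient descent, with the penalty chosen by k-fold cross-validation on held-out log-loss. The abstract does not report the software, tuning grid or number of folds, so the data, the penalty grid (`lambdas`), the step size, the fold count and all function names here are hypothetical. XgBoost, MLP or SuperLearner could be swapped in as the treatment model while keeping the same cross-validation loop.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrinks v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def linear_predictor(xi, b0, w):
    return b0 + sum(wj * xj for wj, xj in zip(w, xi))


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=200):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b0 is not penalised (hypothetical step size and iterations).
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(linear_predictor(xi, b0, w)) - yi  # residual p_i - y_i
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        w = [soft_threshold(w[j] - lr * g[j] / n, lr * lam) for j in range(p)]
    return b0, w


def log_loss(X, y, b0, w, eps=1e-12):
    total = 0.0
    for xi, yi in zip(X, y):
        q = min(max(sigmoid(linear_predictor(xi, b0, w)), eps), 1 - eps)
        total -= yi * math.log(q) + (1 - yi) * math.log(1 - q)
    return total / len(y)


def cv_select_lambda(X, y, lambdas, k=3, seed=0):
    """Pick the penalty with the lowest mean held-out log-loss over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_loss = None, float("inf")
    for lam in lambdas:
        loss = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, w = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
            loss += log_loss([X[i] for i in fold], [y[i] for i in fold], b0, w)
        if loss / k < best_loss:
            best_lam, best_loss = lam, loss / k
    return best_lam


# Hypothetical data: 120 patients, 6 covariates; only the first two drive treatment
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(120)]
y = [1 if rng.random() < sigmoid(0.8 * x[0] - 0.6 * x[1]) else 0 for x in X]

lam = cv_select_lambda(X, y, lambdas=[0.001, 0.01, 0.1])
b0, w = fit_lasso_logistic(X, y, lam)
ps = [sigmoid(linear_predictor(xi, b0, w)) for xi in X]  # estimated propensity scores
print("selected lambda:", lam)
print("coefficients:", [round(wj, 3) for wj in w])
```

The L1 penalty can shrink the coefficients of uninformative covariates to exactly zero, which is what makes LASSO attractive when hundreds of candidate confounders (637 in this study) are considered.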
Methods: The study cohort comprised patients aged 65 years or older with at least 1 year of registration who had not been exposed to antihypertensives in the year before study start. Fracture was the outcome event. We considered 637 covariates (mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model) as potential confounders and estimated the PS using a reference method (i.e. with a pre-selected subset of the covariates considered to be confounders) and four ML methods with cross-validated hyperparameter tuning. Patients were matched on each estimated PS, and treatment effects were estimated in each matched cohort. A set of 70 negative control outcomes was analysed to estimate residual bias after matching. Covariate balance after matching was evaluated using the absolute standardised mean difference (ASMD). The performance of each model in treatment effect estimation was evaluated by coverage of the null effect, i.e. the proportion of negative control outcomes whose confidence interval included the null.
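The abstract does not state the matching ratio, algorithm or caliper. The Python sketch below assumes 1:1 greedy nearest-neighbour matching without replacement on the logit of the PS, with a caliper of 0.2 standard deviations of the logit (a commonly used choice), and then computes each covariate's ASMD before and after matching using the pooled standard deviation of the two groups. The cohort, the PS model (taken as known here rather than estimated) and the function names are hypothetical.

```python
import math
import random
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def greedy_match(ps, treated, caliper_sd=0.2):
    """1:1 greedy nearest-neighbour matching without replacement on logit(PS).

    The caliper is caliper_sd times the SD of logit(PS); treated patients
    with no available control inside the caliper are left unmatched.
    """
    lp = [logit(q) for q in ps]
    caliper = caliper_sd * statistics.stdev(lp)
    controls = [j for j, t in enumerate(treated) if t == 0]
    used = set()
    pairs = []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        best, best_d = None, caliper
        for j in controls:
            d = abs(lp[i] - lp[j])
            if j not in used and d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs


def asmd(x_t, x_c):
    """Absolute standardised mean difference using the pooled SD of both groups."""
    pooled_sd = math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2.0)
    if pooled_sd == 0:
        return 0.0
    return abs(statistics.mean(x_t) - statistics.mean(x_c)) / pooled_sd


def covariate_asmds(rows_t, rows_c):
    """ASMD for every covariate column."""
    return [asmd([r[k] for r in rows_t], [r[k] for r in rows_c])
            for k in range(len(rows_t[0]))]


# Hypothetical cohort: 3 covariates, treatment depends on the first two
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(400)]
ps = [1.0 / (1.0 + math.exp(-(-0.5 + 0.9 * x[0] - 0.5 * x[1]))) for x in X]
treated = [1 if rng.random() < q else 0 for q in ps]

pairs = greedy_match(ps, treated)
before = covariate_asmds([x for x, t in zip(X, treated) if t == 1],
                         [x for x, t in zip(X, treated) if t == 0])
after = covariate_asmds([X[i] for i, _ in pairs], [X[j] for _, j in pairs])

print(f"{len(pairs)} matched pairs")
print("mean ASMD before:", round(statistics.mean(before), 4))
print("mean ASMD after: ", round(statistics.mean(after), 4))
print("all covariates below 0.1 after matching:", max(after) < 0.1)
```

Greedy matching depends on the order in which treated patients are processed; optimal matching avoids this at a higher computational cost. The last line applies the same 0.1 balance threshold that the Results use.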
Results: After matching on the PS estimated by XgBoost or LASSO, every covariate had an ASMD below 0.1. The average ASMD across all covariates was 0.0394 for the reference method, 0.0167 for LASSO, 0.0480 for MLP, 0.0150 for XgBoost and 0.0168 for SuperLearner. Coverage of the null effect was highest in the cohorts matched on the SuperLearner and XgBoost PS (65.2% and 63.8%, respectively). The reference method and LASSO achieved 62.3% and 57.1% coverage, respectively, and MLP had the lowest coverage (53.3%).
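Coverage of the null effect can be computed as the share of negative control outcomes whose confidence interval includes a ratio of 1. The Python sketch below assumes 95% confidence intervals built on the log scale from a log hazard ratio and its standard error; the effect measure, the CI level and the five estimates are hypothetical and are not taken from the study.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile (assumed CI level)


def null_covered(log_hr, se, z=Z_95):
    """True if the CI for the (assumed) hazard ratio includes the null value 1."""
    lower, upper = math.exp(log_hr - z * se), math.exp(log_hr + z * se)
    return lower <= 1.0 <= upper


def null_coverage(estimates):
    """Proportion of negative control outcomes whose CI covers the null."""
    return sum(null_covered(b, se) for b, se in estimates) / len(estimates)


# Hypothetical (log hazard ratio, standard error) pairs for five negative controls
neg_controls = [(0.05, 0.10), (0.40, 0.10), (-0.10, 0.08), (-0.35, 0.12), (0.15, 0.20)]
print(f"Coverage of the null: {null_coverage(neg_controls):.1%}")  # 60.0%
```

Under the 95% CI assumption, unbiased estimates would be expected to cover the null for about 95% of negative controls, so coverage between 53.3% and 65.2% points to residual bias in every method.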
Conclusions: PS estimated with XgBoost and SuperLearner outperformed the reference approach, which used covariates pre-selected by experts, achieving higher negative control outcome coverage. All methods tested showed evidence of residual confounding in the data.