Opportunities and challenges surrounding the incorporation of laboratory test-result information within high-dimensional confounder adjustment procedures
Background: The increasing availability of laboratory test results in electronic health records (EHRs) creates opportunities to use these data for confounder adjustment, e.g., by further characterising the severity of conditions. The data typically reach researchers in two forms: a record that a test was requested, and continuous values representing the test results. However, integrating these data into data-driven confounder adjustment methods (e.g., the high-dimensional propensity score (HDPS)) is challenging. Data quality is a concern: the completeness of continuous values and their pre-processing before inclusion must be considered. How best to automate variable selection and the choice of functional forms is also unclear, and neither is automatically accommodated by the HDPS.
Objectives: To apply and compare methods for integrating test-result data within data-driven confounder adjustment procedures.
Methods: We reanalysed a recent cohort study conducted using primary care records from the UK Clinical Practice Research Datalink, in which no difference in short-term chronic obstructive pulmonary disease (COPD)-specific mortality was expected between users of proton pump inhibitors (PPIs) and H2-receptor antagonists (H2RAs). Analyses estimated hazard ratios (HRs) via weighted Cox models, varying the extent of adjustment for test-result information in the HDPS. Pre-processed and cleaned continuous values from 35 blood tests were incorporated in two ways. First, we categorised values and extended the HDPS frequency assessment to account for biologically plausible cut-offs. Second, we adjusted for the values continuously, using the missing-indicator approach to account for missing data. Analyses incorporating test data were compared with HDPS analyses incorporating only clinical, referral and prescription information.
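The two encodings described in the Methods can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the fill value of 0.0, the category labels and the example cut-offs (130 and 170) are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def missing_indicator(
    values: List[Optional[float]], fill: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Missing-indicator encoding of one continuous test result.

    Returns the values with missing entries replaced by `fill` (an
    arbitrary constant, assumed here) and a 0/1 flag marking which
    entries were missing. Both columns would enter the model together.
    """
    filled = [fill if v is None else float(v) for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing


def categorise(value: Optional[float], low: float, high: float) -> str:
    """Categorise a test result against cut-offs `low` and `high`.

    The cut-offs are supplied by the caller; the labels are hypothetical.
    """
    if value is None:
        return "not_recorded"
    if value < low:
        return "below_range"
    if value > high:
        return "above_range"
    return "within_range"


# Illustrative values for a single blood test across four patients.
results = [120.0, None, 150.0, 180.0]

filled, missing = missing_indicator(results)
categories = [categorise(v, low=130.0, high=170.0) for v in results]
```

In a sketch like this, the categorical labels could be expanded into binary covariates for the HDPS prioritisation step, while the filled value and its missingness flag would be entered jointly for the continuous adjustment.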
Results: We identified 733,885 new users of PPIs and 124,410 new users of H2RAs. After adjustment for the investigator-specified covariates, the HR for COPD-specific mortality comparing PPI users with H2RA users was 1.37 (95% CI: 1.14-1.66). The final model incorporating all types of test information gave a result closer to the expected null association (HR 1.24; 95% CI: 1.00-1.54). Furthermore, 46% of the top 500 HDPS covariates selected from the test, referral and prescription domains were derived from test-related data dimensions. We discuss the assumptions under which our approaches are valid and compare alternative approaches for handling missing data and variable selection.
Conclusions: The inclusion of test results in the HDPS has the potential to improve confounder adjustment in UK EHRs. Future work could consider how to determine the functional forms of continuous covariates in these procedures.