Research Specialist, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Chelmsford, United States
Background: Understanding how electronic health record (EHR) data continuity affects misclassification of patient characteristics in different populations is critical for evaluating the validity of study findings. Prior work has shown that such misclassification can be mitigated by restricting EHR-based analyses to patients with high predicted EHR continuity, identified with a simple algorithm.

Objective: To compare EHR continuity in populations covered by Medicare, Medicaid, or commercial insurance using a previously developed EHR-continuity algorithm.

Methods: This study used EHRs from a multi-center care delivery network in Massachusetts, US, linked to Medicare claims (2007 to 2017) and Medicaid claims (2000 to 2014), as well as TriNetX claims-linked EHR data from 11 healthcare organizations across the US (2010 to 2022). Patients were aged ≥65 years (Medicare) or ≥18 years (Medicaid and TriNetX) and had ≥365 days of continuous insurance enrollment overlapping with an EHR encounter. Three continuity metrics based on prior literature were assessed: (1) EHR continuity, quantified as the proportion of encounters captured by the EHR system (capture proportion, CP); (2) the area under the receiver operating characteristic curve (AUC) of the previously validated prediction model for identifying patients with high EHR continuity (CP > 0.6); and (3) misclassification of 40 patient characteristics commonly used in comparative effectiveness research (CER), quantified by the average standardized absolute mean difference (ASAMD).

Results: The Medicare cohort included 319,740 patients (mean age 74 years; 59.2% female; 2.9% Black), the Medicaid cohort 95,113 patients (mean age 39.2 years; 65.7% female; 15.9% Black), and the TriNetX cohort 1,319,218 patients (mean age 44.7 years; 58.4% female; 23.4% Black). Mean CP was 0.30, 0.18, and 0.19, and the AUC of the prediction model for identifying patients with high EHR continuity was 0.92, 0.89, and 0.77 in the Medicare, Medicaid, and TriNetX cohorts, respectively.
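The three continuity metrics can be illustrated with a minimal Python sketch. It is not the study's code and rests on stated assumptions: CP is taken per patient as the number of claims-recorded encounters also found in the EHR divided by all claims-recorded encounters; the AUC is the rank-based (Mann-Whitney) estimate; and ASAMD averages the absolute standardized mean differences of binary characteristics between a claims-plus-EHR reference and EHR-only data, using the pooled Bernoulli standard deviation. The function names (`capture_proportion`, `auc`, `smd_binary`, `asamd`) are hypothetical.

```python
import math
from typing import Sequence

# Threshold for "high EHR continuity" stated in the abstract (CP > 0.6).
HIGH_CONTINUITY_CP = 0.6


def capture_proportion(ehr_encounters: int, claims_encounters: int) -> float:
    """Assumed CP: share of a patient's claims-recorded encounters also captured in the EHR."""
    if claims_encounters <= 0:
        return 0.0
    return ehr_encounters / claims_encounters


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based (Mann-Whitney) AUC: chance a random positive outscores a random negative; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def smd_binary(p_ref: float, p_ehr: float) -> float:
    """Standardized difference of two prevalences using the pooled Bernoulli SD (assumed formula)."""
    pooled_sd = math.sqrt((p_ref * (1 - p_ref) + p_ehr * (1 - p_ehr)) / 2)
    if pooled_sd == 0:
        # Both prevalences are 0 or 1: identical means no difference, otherwise unbounded.
        return 0.0 if p_ref == p_ehr else math.inf
    return (p_ehr - p_ref) / pooled_sd


def asamd(ref_prevalences: Sequence[float], ehr_prevalences: Sequence[float]) -> float:
    """Average absolute SMD across characteristics (reference vs EHR-only prevalence)."""
    diffs = [abs(smd_binary(r, e)) for r, e in zip(ref_prevalences, ehr_prevalences)]
    return sum(diffs) / len(diffs)


# Toy usage with made-up counts (not study data): (EHR encounters, claims encounters).
cps = [capture_proportion(e, c) for e, c in [(8, 10), (2, 10), (7, 10), (1, 10)]]
high = [cp > HIGH_CONTINUITY_CP for cp in cps]   # "true" high-continuity labels
predicted = [0.9, 0.6, 0.5, 0.2]                 # hypothetical model scores
model_auc = auc(predicted, high)                 # 0.75: one of four pos/neg pairs is misordered
```

Computing the AUC directly from pairwise comparisons keeps the sketch dependency-free; on cohorts of the size reported here, a sort-based or library implementation would be the practical choice.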
Restricting each cohort to patients in the top 20% (Medicare), top 50% (Medicaid), and top 50% (TriNetX) of predicted EHR continuity reduced misclassification by 59% to 73% and yielded cohorts with satisfactory levels of misclassification (ASAMD < 0.1).

Conclusions: EHR continuity differs substantially between the Medicare population and the Medicaid and TriNetX populations. When researchers have access only to EHR data, using a prediction model to identify patients with high EHR continuity can substantially reduce misclassification of key variables. However, the cut-off needed to achieve this goal varies by population.
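The restriction step (keeping only patients in the top fraction of predicted EHR continuity and re-measuring ASAMD) can be sketched as below. This is a hypothetical illustration, not the study's code: it assumes each patient has a predicted continuity score and 0/1 indicators for each characteristic measured in both a claims-plus-EHR reference and EHR-only data, and it uses the pooled-SD standardized difference for binary variables. Names such as `restrict_top_fraction` and `misclassification_reduction` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[int]  # one patient's 0/1 indicators, one per characteristic


def smd_binary(p_ref: float, p_ehr: float) -> float:
    """Standardized difference of two prevalences using the pooled Bernoulli SD (assumed formula)."""
    pooled_sd = math.sqrt((p_ref * (1 - p_ref) + p_ehr * (1 - p_ehr)) / 2)
    if pooled_sd == 0:
        return 0.0 if p_ref == p_ehr else math.inf
    return (p_ehr - p_ref) / pooled_sd


def asamd(ref_rows: Sequence[Row], ehr_rows: Sequence[Row]) -> float:
    """Average |SMD| over characteristics, comparing reference vs EHR-only prevalence in one cohort."""
    n, k = len(ref_rows), len(ref_rows[0])
    total = 0.0
    for j in range(k):
        p_ref = sum(row[j] for row in ref_rows) / n
        p_ehr = sum(row[j] for row in ehr_rows) / n
        total += abs(smd_binary(p_ref, p_ehr))
    return total / k


def restrict_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of patients whose predicted continuity is in the top `fraction` (ties keep input order)."""
    n_keep = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_keep]


def misclassification_reduction(
    scores: Sequence[float], ref_rows: Sequence[Row], ehr_rows: Sequence[Row], fraction: float
) -> Tuple[float, float, float]:
    """Return (ASAMD in full cohort, ASAMD after restriction, relative reduction)."""
    full = asamd(ref_rows, ehr_rows)
    keep = restrict_top_fraction(scores, fraction)
    restricted = asamd([ref_rows[i] for i in keep], [ehr_rows[i] for i in keep])
    reduction = 1 - restricted / full if full > 0 else 0.0
    return full, restricted, reduction


# Toy cohort (made-up): all four patients truly have the condition,
# but the EHR misses it for the two with the lowest predicted continuity.
scores = [0.9, 0.8, 0.2, 0.1]
ref = [[1], [1], [1], [1]]
ehr = [[1], [1], [0], [0]]
full, restricted, reduction = misclassification_reduction(scores, ref, ehr, 0.5)
# full is sqrt(2) (about 1.414), restricted is 0.0, reduction is 1.0
```

Applied to real data, `fraction` would be set per cohort (for example 0.2 for Medicare and 0.5 for Medicaid and TriNetX, as in the Results), and the restricted cohort's ASAMD would be checked against the 0.1 threshold.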