Background: We developed an iterative causal forest (iCF) method to identify subgroups with heterogeneous treatment effects using predefined covariates. However, such predefined covariates may lack granularity or miss important features, leading to inaccurate subgrouping.
Objectives: To develop a new subgrouping algorithm, hdiCF, using high-dimensional (HD) covariates in claims data.
Methods: The hdiCF algorithm has 2 steps: 1) identify candidate variables from the most prevalent (e.g., top 100) claim codes in each data dimension (inpatient/outpatient diagnoses, procedures, prescriptions) during a 1-year baseline period, using ICD-10 (3 digits), CPT (5 digits), and ATC (4th level) codes, then create ordinal variables by code frequency (zero, at least once, ≥ median number of times, ≥ 75th percentile number of times), similar to the HD propensity score (hdPS); 2) identify subgroups by iCF using only the important variables, defined as those with variable importance above the 95th percentile. We implemented hdiCF in an active-comparator, new-user study of 8075 initiators of sodium-glucose cotransporter-2 inhibitors (SGLT2i) and 7313 initiators of glucagon-like peptide-1 receptor agonists, using a 20% random sample of fee-for-service Medicare beneficiaries (2015-2019) with Parts A, B, and D coverage for ≥ 1 year and without end-stage renal disease or chronic kidney disease stage 4 or 5. We ran hdiCF (1000 trees, 500 iterations) to identify subgroups. In each subgroup, we estimated the PS using 80 predefined covariates, including demographics, comorbidities, and comedications, then assessed conditional average treatment effects (CATEs) as the adjusted risk difference (aRD) for hospitalized heart failure (HHF) over 2 years of follow-up, using inverse probability of treatment weighting in an initial treatment analysis.
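The variable-construction and variable-selection steps above can be sketched as follows. This is a minimal illustration, not the hdiCF implementation: the function names (`top_codes`, `ordinal_levels`, `select_important`) and input layouts are hypothetical, prevalence is assumed to mean the number of distinct patients carrying a truncated code, and the median and 75th percentile are assumed to be computed among patients with at least one claim, as is common in hdPS-style coding. Variable importance values are taken as given (e.g., from a fitted causal forest).

```python
from collections import Counter


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def top_codes(claims, prefix_len, n_top):
    """Truncate raw codes to prefix_len characters and return the n_top
    prefixes carried by the most distinct patients (assumed definition of
    prevalence). claims: iterable of (patient_id, raw_code) pairs from one
    data dimension."""
    patients_per_code = Counter()
    seen = set()
    for pid, code in claims:
        key = (pid, code[:prefix_len])
        if key not in seen:  # count each patient once per truncated code
            seen.add(key)
            patients_per_code[key[1]] += 1
    return [code for code, _ in patients_per_code.most_common(n_top)]


def ordinal_levels(counts):
    """Map per-patient claim counts for one code to ordinal levels:
    0 = never, 1 = at least once, 2 = >= median, 3 = >= 75th percentile.
    Assumption: cut points are computed among patients with >= 1 claim."""
    positive = [c for c in counts.values() if c > 0]
    if not positive:
        return {pid: 0 for pid in counts}
    median = percentile(positive, 50)
    p75 = percentile(positive, 75)
    levels = {}
    for pid, c in counts.items():
        if c == 0:
            levels[pid] = 0
        elif c >= p75:
            levels[pid] = 3
        elif c >= median:
            levels[pid] = 2
        else:
            levels[pid] = 1
    return levels


def select_important(importance, q=95):
    """Keep variables whose importance is strictly above the q-th percentile.
    importance: dict mapping variable name to an importance value."""
    cutoff = percentile(list(importance.values()), q)
    return [name for name, value in importance.items() if value > cutoff]
```

In this sketch, `top_codes` would be called once per data dimension with the truncation length for that coding system (e.g., 3 characters for ICD-10), and `select_important` would then restrict the covariates passed to iCF.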
Results: The aRD% in the overall population was 0.4 (1.1 to 0.2) (crude risk 3.4% vs 5.1%). hdiCF selected 30 of the 500 HD covariates for iCF (homogeneity P-value = 0.06) and identified subgroups defined by Brain Natriuretic Peptide (BNP) test (CPT 83880), diuretics (ATC C03C), and other anemias (ICD-10 D64). The BNP test, often used to diagnose heart failure (HF), is a proxy for HF. The largest CATE (aRD% 3.0, 5.3 to 0.8) was observed in the subpopulation without a BNP test and on diuretics (94% on loop diuretics, as identified by a predefined covariate). This is consistent with findings from iCF, which identified the subpopulation with edema (a proxy for HF) as having the largest CATE.
Conclusions: Our findings are consistent with previous studies and show that the hdiCF algorithm can identify heterogeneous subgroups in claims data. Further studies are needed to test more granular codes for variable identification (e.g., the 5th level of the ATC code may identify loop diuretics, a key treatment for HF).