Graduate Student, Department of Pharmaceutical Outcomes and Policy, University of Florida, Gainesville, United States
Background: TreeScan is a statistical data mining tool to identify drug exposures preceding specific adverse outcomes. However, the decision of which tree levels to include in hypothesis testing is not straightforward, and the trade-off between the gain from hypothesis testing at more aggregated tree levels and the possible cost of missing an alert at lower levels is poorly understood.
Objectives: To compare TreeScan alerts of cardiac malformations at different levels of clustering of prenatal drug exposure and different levels of hypothesis testing.
Methods: We used 2005-2020 MarketScan data to identify a cohort of linked mother-infant pairs among mothers aged 12-55 years with continuous enrollment from 90 days before conception through 90 days after live birth. In a case-control design, mother-infant pairs with cardiac malformations (cases) were matched to live births without cardiac malformations (controls) on length of gestation, maternal age, and birth year using 1:3 nearest neighbor matching. We scanned drug exposures during pregnancy using a hierarchical Anatomical Therapeutic Chemical (ATC) code drug tree. We conducted 3 separate analyses to identify “alerts,” setting the incidence level to the ATC 2nd level (therapeutic subgroup; e.g., diuretics), ATC 3rd level (drug class; e.g., high-ceiling diuretics), and ATC 4th level (chemical subgroup; e.g., sulfonamides), respectively. That is, hypothesis testing was not conducted at levels more aggregated than each pre-specified incidence level, and only the first drug dispensing within a given incidence-level node was counted. The threshold for alerts was set to p≤0.05.
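The incidence-level rule described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout of (mother_id, dispensing_date, atc_code) and relies only on the standard ATC code structure, in which levels 1 through 5 correspond to code prefixes of 1, 3, 4, 5, and 7 characters. It is not the TreeScan software or the study's actual data pipeline; the function names and example records are illustrative.

```python
from datetime import date

# Standard ATC code structure: number of leading characters that identify each level.
ATC_PREFIX_LEN = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_prefix(code, level):
    """Truncate a full (5th-level) ATC code to the node at the given level."""
    return code[:ATC_PREFIX_LEN[level]]


def first_dispensings(dispensings, incidence_level):
    """Keep each mother's earliest dispensing within every incidence-level node.

    dispensings: iterable of (mother_id, dispensing_date, atc_code) tuples
    (a hypothetical layout assumed for this sketch).
    """
    seen = set()
    kept = []
    # Process records in chronological order within each mother.
    for mother_id, disp_date, code in sorted(dispensings, key=lambda d: (d[0], d[1])):
        node = (mother_id, atc_prefix(code, incidence_level))
        if node not in seen:
            seen.add(node)
            kept.append((mother_id, disp_date, code))
    return kept


# Illustrative records for one mother, all within ATC 2nd-level node C03 (diuretics).
example = [
    ("m1", date(2015, 3, 1), "C03CA01"),  # furosemide
    ("m1", date(2015, 4, 1), "C03CC01"),  # etacrynic acid: same 3rd-level node (C03C)
    ("m1", date(2015, 5, 1), "C03AA03"),  # hydrochlorothiazide: same 2nd-level node (C03)
]

for level in (2, 3, 4):
    counted = [code for _, _, code in first_dispensings(example, level)]
    print(level, counted)
```

With incidence defined at the ATC 2nd level, only the furosemide dispensing is counted in this example; at the 3rd level the hydrochlorothiazide dispensing is also counted, and at the 4th level all three are counted. This shows how a less aggregated incidence level retains more exposures for testing at lower nodes.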
Results: We identified 93,789 mother-infant pairs with a fetal cardiac malformation and 275,747 matched controls. We found 61, 71, and 85 alerts out of 1,093 nodes at the ATC 5th level (chemical substance, i.e., individual drug) when incidence was defined at the ATC 2nd, 3rd, and 4th level, respectively. Similarly, we identified 58, 68, and 72 alerts out of 685 nodes at the ATC 4th level (chemical subgroup), respectively. Alerts included drugs with known teratogenic risk for fetal cardiac malformations. The ATC 3rd and 4th level incidence analyses identified 11 drugs with known teratogenic risk, 4 of which did not reach statistical significance in the ATC 2nd level incidence analysis.
Conclusions: TreeScan successfully produced alerts for prenatal exposure to drugs with known teratogenic risk for fetal cardiac malformations. Our results suggest that pre-specifying the incidence level at the ATC 3rd or 4th level may best balance the gain from hypothesis testing at more aggregated tree levels against the risk of missing alerts at less aggregated levels.