Triple challenges – Small sample sizes in both exposure and control groups when scanning rare maternal outcomes in signal identification: A simulation study
Background: TreeScan™ is a signal identification approach that scans thousands of outcomes simultaneously while adjusting for multiple hypothesis testing to identify potential safety signals. The Poisson model used in TreeScan assumes that expected incidence proportions are known without error.
Objectives: Our simulation study aimed to evaluate how expected incidence proportions estimated from small control groups may affect TreeScan’s ability to identify signals for maternal adverse outcomes.
Methods: We used the Merative™ MarketScan Research Database (2015-2020) to identify live births and generate incidence proportions of maternal complications in ICD-10-CM Chapter 15 (Pregnancy, childbirth, and the puerperium, O00-O9A); we treated these proportions as known without error. We drew control-group samples of varying size (1000, 2500, 5000, 10000, 50000) and calculated incidence proportions from each sample, reflecting uncertainty due to sampling error. We also varied other parameters that might affect statistical power: exposure group sample size (1000, 2500, 5000, 10000), the incidence proportion of the outcomes selected for investigator-injected risk (0.02, 0.01, 0.005), and the magnitude of injected risk (RR=1, 1.5, 2.0). We conducted unconditional Poisson TreeScan analyses for all combinations of these parameters and set the alert threshold at alpha=0.05. We also conducted base case analyses using the expected incidence proportions without error.
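The simulation design can be illustrated with a minimal Python sketch. The parameter values come from the Methods text; everything else is an assumption made for illustration: the function names (`scenario_grid`, `draw_count`, `simulate_node`), the use of independent binomial draws for both the control sample and the exposed group, and the treatment of a single outcome node in isolation. It does not reproduce the TreeScan software or the full tree of outcomes.

```python
import itertools
import random

# Parameter values as listed in the Methods.
CONTROL_SIZES = (1000, 2500, 5000, 10000, 50000)
EXPOSURE_SIZES = (1000, 2500, 5000, 10000)
INCIDENCE_PROPORTIONS = (0.02, 0.01, 0.005)
RELATIVE_RISKS = (1.0, 1.5, 2.0)


def scenario_grid():
    """Return every (control size, exposure size, incidence, RR) combination."""
    return list(itertools.product(
        CONTROL_SIZES, EXPOSURE_SIZES, INCIDENCE_PROPORTIONS, RELATIVE_RISKS))


def draw_count(rng, n, p):
    """Binomial draw: number of events among n people, each with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_node(rng, n_control, n_exposed, p_true, rr):
    """Simulate one outcome node (assumption: binomial sampling on both sides)."""
    # Incidence proportion estimated from the sampled control group.
    control_events = draw_count(rng, n_control, p_true)
    p_hat = control_events / n_control
    return {
        "p_hat": p_hat,
        # Expected count in the exposed group under the estimated proportion.
        "expected": n_exposed * p_hat,
        # Expected count when the proportion is taken as known without error.
        "expected_base_case": n_exposed * p_true,
        # Observed count in the exposed group with the injected relative risk.
        "observed": draw_count(rng, n_exposed, min(1.0, p_true * rr)),
    }


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed, for reproducibility only
    print(len(scenario_grid()), "scenarios")
    print(simulate_node(rng, n_control=1000, n_exposed=10000, p_true=0.005, rr=1.0))
```

Comparing `expected` with `expected_base_case` across repeated draws shows how much sampling error in a small control group shifts the expected count that the Poisson model then treats as fixed.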
Results: In scenarios with an injected elevated relative risk, statistical power and observed relative risk were less accurate than in the base case when the exposure-to-control group size ratio was higher, the outcome was very rare, or the magnitude of injected risk was smaller. Under null scenarios (RR=1), a higher exposure-to-control group size ratio produced more false alerts and inflated type I error. We found that, for Poisson-based TreeScan, the control sample size should be at least equal to the exposure sample size. Finally, small control groups resulted in more nodes with observed zero counts, all of which occurred among outcomes with expected incidence proportions ≤0.005. These zero-count nodes could not be evaluated in TreeScan.
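The link between control group size and zero-count nodes can be sketched with a short calculation. Assuming the control sample behaves like independent binomial draws (an assumption for illustration, not a description of the study's sampling procedure), the probability that an outcome with incidence proportion p has no events among n controls is (1 - p)^n. The function name and the incidence values below 0.005 are hypothetical, chosen only to represent rarer outcomes.

```python
def prob_zero_control_events(p, n_control):
    """P(no events among n_control people) when each has event probability p."""
    return (1.0 - p) ** n_control


if __name__ == "__main__":
    # Control sizes from the Methods; 0.001 and 0.0001 are illustrative values
    # for outcomes rarer than the 0.005 threshold reported in the Results.
    for n_control in (1000, 2500, 5000, 10000, 50000):
        for p in (0.005, 0.001, 0.0001):
            prob = prob_zero_control_events(p, n_control)
            print(f"n={n_control:>6}  p={p:<7}  P(zero count)={prob:.4f}")
```

Under this assumption the probability of a zero count falls quickly as the control group grows, and rises as the outcome becomes rarer.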
Conclusions: Pregnancy research commonly faces small sample sizes in both exposure and control groups, together with rare outcomes, even when using large automated healthcare databases. Our simulation indicated that, when screening for rare maternal outcomes with Poisson-based TreeScan, the control group should not be smaller than the exposure group; this could potentially be maintained by choosing a control group with an appropriate sample size or by trimming the exposure group.