PhD student, University of Florence, Florence, Italy
Background: The IMI-ConcePTION project aims to build an ecosystem for generating real-world evidence to close the information gap on medication safety in pregnancy. In such studies, the true value of the outcome of interest is measured with error by case-finding algorithms. The usual recommendation is to choose a highly specific algorithm, and study results then rely on the assumption that sensitivity (SE) is non-differential across exposure strata. Estimating SE within exposure strata is usually impossible, so non-differentiality is typically not tested. In safety studies, where the outcome is a suspected adverse effect of the exposure, it is realistic to expect SE to be differential, and this may severely bias measures of association in either direction.
Objectives: To introduce a statistical test for differential sensitivity across exposure groups.
Methods: The test is based on the positive predictive values (PPVs) of two algorithms: the algorithm of interest, A, and an auxiliary algorithm, B, chosen so that the SE of their union (A OR B) can be assumed to be non-differential. This assumption is realistic when B is highly sensitive. The test statistic is based on the observed occurrence (p) of A and B and on the PPVs obtained by validating cases in both exposure strata. We simulated validation sample sizes (SS) of 200, 400 and 600 cases and calculated the power of the test under multiple scenarios for p (.01, .05, .1), relative risk (RR = 1.2, 2) and sensitivity ratio (SR = .6, .8, 1.2, 1.4). In all scenarios, the prevalence of exposure was 5%, the SE of A OR B was 80% in both exposure groups, the SE of A in the unexposed was 50%, and the false-positive rates of A and B were p/10 and p, respectively. PPVs ranged from 78% to 95% for A and from 11% to 33% for B, depending on the scenario. The simulation is available on GitHub.
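As an illustration of how the scenario parameters translate into PPVs, the following minimal sketch computes the stratum-specific PPV of algorithm A over the scenario grid. It is not the simulation code referred to above. It rests on two assumptions that the abstract does not state explicitly: that p is the outcome risk in the unexposed (so the risk in the exposed is RR × p), and that the false-positive rate is the probability that a non-case is flagged. All function and variable names are hypothetical. Under this reading, the PPV of A falls between roughly 78% and 95%.

```python
from itertools import product

# Scenario grid taken from the Methods; the interpretation of p as the
# outcome risk in the unexposed is an assumption of this sketch.
SE_A_UNEXPOSED = 0.5
P_VALUES = (0.01, 0.05, 0.1)
RR_VALUES = (1.2, 2.0)
SR_VALUES = (0.6, 0.8, 1.2, 1.4)


def ppv(se, risk, fp_rate):
    """PPV = true positives / all positives within one exposure stratum.

    se      : sensitivity of the algorithm in the stratum
    risk    : true outcome risk in the stratum
    fp_rate : probability that a non-case is flagged (assumed definition)
    """
    true_pos = se * risk
    false_pos = fp_rate * (1.0 - risk)
    return true_pos / (true_pos + false_pos)


def scenario_ppvs():
    """Return the PPV of A in both exposure strata for every scenario."""
    rows = []
    for p, rr, sr in product(P_VALUES, RR_VALUES, SR_VALUES):
        fp_a = p / 10  # false-positive rate of A, as given in the Methods
        rows.append({
            "p": p,
            "rr": rr,
            "sr": sr,
            "ppv_unexposed": ppv(SE_A_UNEXPOSED, p, fp_a),
            # SE of A in the exposed is SR times the SE in the unexposed
            "ppv_exposed": ppv(SE_A_UNEXPOSED * sr, rr * p, fp_a),
        })
    return rows


rows = scenario_ppvs()
all_ppvs = [r[k] for r in rows for k in ("ppv_unexposed", "ppv_exposed")]
print(f"PPV of A ranges from {min(all_ppvs):.3f} to {max(all_ppvs):.3f}")
```

The PPVs of B are not computed here, because the abstract does not specify whether they refer to all cases flagged by B or only to those flagged by B and not by A.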
Results: The power of the test depends mainly on the SR and the SS. For SS = 200 or 400, power was >80% for the extreme values of SR (.6 and 1.4), irrespective of RR and p. For SS = 600, however, power was >80% even for values of SR closer to 1 (.8 and 1.2), but only if RR = 2 or p ≥ .05.
Conclusions: Validating 600 cases made it possible to detect, with reasonable confidence, a relatively small departure from non-differentiality, while larger departures could be detected with smaller samples. To increase the validity of a study based on a highly specific algorithm, its SE can be tested for non-differentiality by validating a sample that includes cases detected by a highly sensitive algorithm.
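The reasoning behind such a test can be sketched as follows. If the SE of A OR B is the same in both exposure strata, then the share of true union cases that A captures in each stratum is proportional to the SE of A in that stratum, so the ratio of these shares estimates the SR. The snippet below is a hedged illustration of that point estimate only. It is not the test statistic of the abstract, which is not spelled out here. The function names and the example counts are hypothetical, and B is taken to mean cases flagged by B but not by A.

```python
def share_captured_by_a(n_a, ppv_a, n_b_only, ppv_b_only):
    """Share of true cases found by A OR B that are found by A.

    n_a        : number of cases flagged by A in the stratum
    ppv_a      : validated PPV of A in the stratum
    n_b_only   : number of cases flagged by B but not by A
    ppv_b_only : validated PPV of the B-but-not-A cases
    """
    tp_a = n_a * ppv_a                  # estimated true positives from A
    tp_b_only = n_b_only * ppv_b_only   # estimated true positives added by B
    return tp_a / (tp_a + tp_b_only)


def sensitivity_ratio(exposed, unexposed):
    """Estimate SR = SE_A(exposed) / SE_A(unexposed).

    Valid only under the assumption that the SE of A OR B is
    non-differential, so that it cancels in the ratio.
    """
    return share_captured_by_a(**exposed) / share_captured_by_a(**unexposed)


# Hypothetical counts and PPVs, for illustration only.
exposed = {"n_a": 60, "ppv_a": 0.8, "n_b_only": 200, "ppv_b_only": 0.16}
unexposed = {"n_a": 500, "ppv_a": 0.8, "n_b_only": 1000, "ppv_b_only": 0.1}

sr_hat = sensitivity_ratio(exposed, unexposed)
print(f"Estimated sensitivity ratio: {sr_hat:.2f}")
```

An estimate far from 1 would suggest differential SE. A formal test would also need the sampling variability of the validated PPVs, which this sketch does not model.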