DPhil student, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, United Kingdom
Background: Real-world studies using routinely collected health data are usually conducted with bespoke programming, in which analytic code is built from the ground up to serve a specific study. While developing code for an individual database is simpler, this approach limits study reproducibility and transparency. Off-the-shelf software, on the other hand, is a ready-to-use tool that lets the user specify a set of parameters and run the analysis accordingly. In a previous study (DOI:10.1136/bmj.n1435), we used bespoke code to estimate incidence rates of COVID-19 vaccine adverse events of special interest (AESIs) across multiple databases mapped to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Within DARWIN-EU®, we developed a well-tested and user-friendly R package, "IncidencePrevalence" (darwin-eu.github.io/IncidencePrevalence), to compute incidence rates in data mapped to the OMOP CDM.
Objectives: To emulate the previous study's estimation of AESI incidence rates using "IncidencePrevalence" and to compare the resulting estimates with the published ones.
Methods: With the "IncidencePrevalence" package, we specified the same settings as described in the previous paper, including age groups, study period, prior observation period, and washout window. We included 10 events: deep vein thrombosis, pulmonary embolism, anaphylaxis, Bell’s palsy, myocarditis or pericarditis, narcolepsy, appendicitis, immune thrombocytopenia, disseminated intravascular coagulation, and transverse myelitis. Age-sex specific incidence rates of the AESIs were calculated using a database from the previous study (CPRD GOLD, primary care records from the UK). The age-sex specific rates calculated with the new package were then compared with those of the published study, and agreement was quantified using Pearson’s correlation coefficient and the intra-class correlation coefficient (ICC).
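The two agreement measures named above can be sketched in a few lines of Python. This is only an illustration: the abstract does not state which form of the ICC was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1); the function names are hypothetical and the rates are made-up numbers, not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_2_1(x, y):
    """ICC(2,1) for two 'raters' (here: two estimation methods).

    Assumption: two-way random effects, absolute agreement, single measure.
    Each (x[i], y[i]) pair is one age-sex stratum estimated by both methods.
    """
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]

    # Mean squares for strata (rows), methods (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = 0.0
    for a, b, rm in zip(x, y, row_means):
        ss_err += (a - rm - col_means[0] + grand) ** 2
        ss_err += (b - rm - col_means[1] + grand) ** 2
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical incidence rates per 100,000 person-years for six strata.
published = [12.1, 35.4, 80.2, 150.9, 9.8, 41.3]
package = [12.3, 35.0, 80.9, 151.4, 9.6, 41.8]

print("Pearson r:", round(pearson_r(published, package), 4))
print("ICC(2,1): ", round(icc_2_1(published, package), 4))
```

Reporting both measures is informative because Pearson's coefficient captures only linear association, whereas an absolute-agreement ICC also falls when one set of estimates is systematically shifted relative to the other.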
Results: Across all AESIs combined, Pearson’s correlation coefficient for the age-sex specific estimates was 0.9999 and the ICC was 1. We also calculated the coefficients for each outcome separately: Pearson’s correlation coefficient ranged from 0.724 for transverse myelitis to 1 for pulmonary embolism, with a median of 0.988, and the ICC ranged from 0.648 for transverse myelitis to 1 for pulmonary embolism, with a median of 0.974.
Conclusions: The incidence rates of AESIs estimated with the "IncidencePrevalence" package were highly concordant with those of the published study and required fewer than one hundred lines of code. The package was designed in close collaboration with epidemiologists; its functions are easy to understand, and it enables standardised and reproducible analyses of large-scale datasets while remaining flexible enough to support different design choices.