Background: The demand for high-quality, rapidly delivered real-world evidence (RWE) studies for regulatory purposes is well known. The RECORD-PE and REPEAT-IT initiatives advocate for more transparency in documenting RWE studies. Analogue descriptions of the desired outputs in the Statistical Analysis Plan (SAP) must be extracted and translated into an analytical pipeline (AP), a process that can be slow and expensive and that requires communication between the epidemiologist and the programmer.
Objectives: To optimise the transparency and implementation of the SAP created by epidemiologists (EP), using metadata created by scientific programmers (SP) to implement federated data transformation and analysis.
Methods: We developed the RWE-BRIDGE methodology and applied it in several studies. It captures the EP's knowledge in a machine-readable format in order to create study-specific variables, standardise AP development, reduce case-specific hard coding, and prevent duplication of information. The methodology requires completing a series of core metadata documents based on the SAP and on the characteristics of the data source. Its central element is the Study Variables Table, which lists the unique study variable names and their respective roles (outcome, covariate, exposure, …); the components of each variable are selected from the data using Codelists and the Data Access Partner Specific Concept Map (DAP-SCM). Codelists relate study variables to diagnosis or prescription codes from different coding systems (e.g. ICD-10), and the DAP-SCM relates them to specific combinations of categorical variables in the source data. Two further documents are the Study Variables Algorithms Table (SVAT), which logs the rules for deriving new study variables from others already collected, and the Dictionary, which standardises the categorical study variables across different Data Access Partners (DAPs). Furthermore, generic programming functions that use the RWE-BRIDGE metadata were developed in R and reused across studies.
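How the metadata documents could drive variable creation is sketched below. This is a minimal illustration in Python, not the authors' R implementation. All variable names, codes, field names, the DAP identifier `dap_a`, the prefix-matching rule, and the `all_of` rule format are assumptions made for the example. The Dictionary is omitted for brevity.

```python
# Hypothetical Study Variables Table: unique variable name -> role and
# the metadata document that defines it.
STUDY_VARIABLES = {
    "diabetes": {"role": "covariate", "source": "codelist"},
    "pregnant": {"role": "exposure", "source": "dap_scm"},
    "diabetic_pregnancy": {"role": "outcome", "source": "svat"},
}

# Hypothetical Codelists: variable -> coding system -> code prefixes.
CODELISTS = {"diabetes": {"ICD10": ["E10", "E11"]}}

# Hypothetical DAP-SCM: DAP -> variable -> combinations of categorical
# column values in that DAP's source data.
DAP_SCM = {"dap_a": {"pregnant": [{"table": "visits", "visit_type": "ANC"}]}}

# Hypothetical SVAT: derived variable -> rule over other study variables.
SVAT = {"diabetic_pregnancy": {"all_of": ["diabetes", "pregnant"]}}


def matches_codelist(record, variable):
    """True if the record's code starts with a prefix listed for its coding system."""
    prefixes = CODELISTS.get(variable, {}).get(record.get("coding_system"), [])
    code = record.get("code", "")
    return any(code.startswith(prefix) for prefix in prefixes)


def matches_dap_scm(record, variable, dap):
    """True if the record matches any column/value combination mapped for this DAP."""
    combos = DAP_SCM.get(dap, {}).get(variable, [])
    return any(
        all(record.get(col) == val for col, val in combo.items())
        for combo in combos
    )


def derive_variables(records, dap):
    """Return {study variable: bool} for one person's records at one DAP."""
    flags = {}
    # First pass: variables selected directly from the data.
    for name, meta in STUDY_VARIABLES.items():
        if meta["source"] == "codelist":
            flags[name] = any(matches_codelist(r, name) for r in records)
        elif meta["source"] == "dap_scm":
            flags[name] = any(matches_dap_scm(r, name, dap) for r in records)
    # Second pass: variables derived from others through SVAT rules.
    for name, meta in STUDY_VARIABLES.items():
        if meta["source"] == "svat":
            flags[name] = all(flags[dep] for dep in SVAT[name]["all_of"])
    return flags


records = [
    {"coding_system": "ICD10", "code": "E11.9"},
    {"table": "visits", "visit_type": "ANC"},
]
result = derive_variables(records, "dap_a")
```

In this sketch no variable definition is hard-coded in the pipeline functions. Changing a codelist or a DAP mapping changes the output without touching the code, which is the kind of separation between metadata and analytical pipeline that the methodology describes.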
Results: Across several studies, the RWE-BRIDGE supported the automatic generation of 679 study variables. Codelists defined 65.05% of the study variables, the DAP-SCM 11.69%, and the SVAT 15.99%, whereas the remaining 7.27% required DAP-specific definitions within the AP.
Conclusions: The RWE-BRIDGE methodology achieved its objectives. It standardised and facilitated communication between epidemiologists and scientific programmers and enabled optimisation of the analytical pipeline, and it was successfully implemented in several studies. Moreover, by externalising the metadata files, the RWE-BRIDGE made the process transparent, allowing for reproducibility and reusability.