Head of Unit at ARS Toscana, Florence, Italy
Background: Tools to improve transparency in reporting study design and variable definitions have been shared in the scientific community (e.g., STaRT-RWE). Tools to improve transparency on implementation in the study script are less common. Complex study protocols for multidatabase distributed studies require study-tailored scripts, but timeliness is needed to support regulatory decisions, so efficiency is required. VAC4EU is an international association of institutions in Europe that supports robust and timely evidence generation on the effects of vaccines.
Objectives: To illustrate a methodology that supports transparent documentation of the programming implementation in study scripts.
Methods: We report on the application of the methodology in a study on the safety of COVID-19 vaccines funded by the European Medicines Agency (ROC 20-readiness). A sequence of intermediate datasets (IDs) was designed to go from the common data model (CDM) to the final tables with aggregated results to be shared centrally. Each ID was documented with a) the unit of observation (UoO); b) the number of observations for each UoO (NxUoO), classified as 1, >=1, or >=0; and c) a codebook, including variable names, format, vocabulary and rules for calculation. A directed acyclic graph (DAG) was drawn to represent the program tree: steps were represented as circles, datasets as boxes. Circles had incoming arrows from input datasets and outgoing arrows to output dataset(s). Based on the specifications, synthetic versions of some IDs were generated before development started. Scientific programmers (SP) and statisticians (STAT) started programming in parallel from multiple points of the DAG using the synthetic IDs. When all the steps were ready, the program was released to data partners (DPs) for local execution.
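As an illustration only, the documentation scheme described in the Methods could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the study's actual implementation: the class names (`DatasetSpec`, `Step`), the dataset and step names, and the reading of NxUoO as a count of rows per unit in a reference population are all hypothetical, and the language of the original study scripts is not stated in the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class DatasetSpec:
    """Documentation of one intermediate dataset (hypothetical structure)."""
    name: str
    unit_of_observation: str  # key column identifying the UoO, e.g. "person_id"
    n_per_unit: str           # NxUoO class: "1", ">=1", or ">=0"
    codebook: dict = field(default_factory=dict)  # variable name -> rule/description


@dataclass
class Step:
    """One program step (a circle in the DAG) with its input and output datasets."""
    name: str
    inputs: list
    outputs: list


def execution_order(steps):
    """Return step names ordered so each step runs after the producers of its inputs."""
    producer = {out: s.name for s in steps for out in s.outputs}
    # Map each step to the set of steps it depends on (datasets read from the
    # CDM have no producer and therefore add no dependency).
    graph = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}
    return list(TopologicalSorter(graph).static_order())


def check_n_per_unit(spec, rows, all_units):
    """Check that rows respect the declared NxUoO class.

    Assumption: counts are evaluated against a reference set of units
    (all_units), so ">=0" allows units with no rows at all.
    """
    counts = Counter(row[spec.unit_of_observation] for row in rows)
    if spec.n_per_unit == "1":
        return all(counts[u] == 1 for u in all_units)
    if spec.n_per_unit == ">=1":
        return all(counts[u] >= 1 for u in all_units)
    if spec.n_per_unit == ">=0":
        return True
    raise ValueError(f"unknown NxUoO class: {spec.n_per_unit}")


# Hypothetical four-step program tree.
steps = [
    Step("select_vaccines", ["CDM_VACCINES"], ["D1_vaccines"]),
    Step("select_events", ["CDM_EVENTS"], ["D1_events"]),
    Step("build_cohort", ["D1_vaccines", "D1_events"], ["D2_cohort"]),
    Step("aggregate", ["D2_cohort"], ["D3_counts"]),
]
order = execution_order(steps)

# Synthetic rows, in the spirit of the synthetic IDs used before development.
cohort_spec = DatasetSpec("D2_cohort", "person_id", "1",
                          {"person_id": "person identifier"})
cohort_rows = [{"person_id": 1}, {"person_id": 2}]
dose_rows = [{"person_id": 1}, {"person_id": 1}, {"person_id": 2}]

print(order)
print(check_n_per_unit(cohort_spec, cohort_rows, {1, 2}))
print(check_n_per_unit(cohort_spec, dose_rows, {1, 2}))
```

In this sketch, steps with no dependency on one another (here `select_vaccines` and `select_events`) can be developed in parallel, and the NxUoO check can be run on synthetic data as well as on locally generated datasets during debugging.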
Results: 231 IDs were designed. Most of them (225, 97%) were selections from the instance of the CDM, based on lists of codes or strings: the UoO was the original record. Out of the other 25 IDs, the UoO was a person for 13 (52%), an event for 5 (20%), and a stratum of categorical variables for 7 (28%). In IDs where the UoO was an event or a stratum, NxUoO was 1. Among the 13 IDs having a person as UoO, NxUoO was 1 for 7 (54%), >=0 for 4 (31%), and >=1 for 2 (15%). During the specification phase, investigators and SP/STAT could identify cases where the protocol was underspecified and add clarifications, possibly with the support of DPs. During execution, bugs were traced back to steps, and DPs could access the locally generated IDs and support SP/STAT in debugging.
Conclusions: We introduced and tested a tool to improve transparency and increase efficiency in the development and testing of study scripts.