(136) Specifying and mapping design choices from the protocol to the study script in a complex multi-database study to improve transparency and efficiency: the experience of a demonstration project in IMI-ConcePTION
Head of Unit at ARS Toscana, Florence, Italy; ARS Toscana, Florence, Italy; Florence, Italy
Background: The IMI-ConcePTION project aims to build an ecosystem for generating Real World Evidence to close the information gap on medication safety in pregnancy. Five Demonstration Projects (DPs) were designed to test the tools developed in the project. DP3 aims to provide data on the prevalence of multiple sclerosis (MS) or systemic lupus erythematosus, and on the utilisation and safety of medications used by women during pregnancy, using new analytical methods to synthesise information from disparate data sources.
Objectives: To illustrate a methodology, applied to DP3, that supports transparency and efficiency in the development of study scripts.
Methods: During two in-person meetings, a sequence of intermediate datasets was designed, leading from the ConcePTION common data model (CDM) to the analytic datasets of the first part of the study (the algorithm identifying MS and the prevalence of MS). Each intermediate dataset was documented with: a) its unit of observation (UoO); b) the number of observations to be generated for each UoO; c) a codebook, including variable names, formats, vocabularies and calculation rules. Based on these specifications, synthetic versions of the analytic datasets were generated before script development started.
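The three-part specification above (UoO, observations per UoO, codebook) can be held as a machine-readable object that both generates synthetic data and checks real output. The Python sketch below is a hypothetical illustration, not the DP3 tooling: the class and function names, the format labels ("id", "cat", "int", "date"), the convention that a single "id" variable identifies the UoO, and the example dataset are all assumptions.

```python
import datetime
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One codebook entry: name, format, vocabulary and calculation rule."""
    name: str
    fmt: str                               # assumed labels: "id", "cat", "int", "date"
    vocabulary: Optional[list] = None      # allowed values for "cat" variables
    rule: str = ""                         # calculation rule, as free text


@dataclass
class DatasetSpec:
    """Specification of one intermediate or analytic dataset."""
    name: str
    unit_of_observation: str               # e.g. "person", "event", "study subject", "stratum"
    multiple_obs_per_uoo: bool             # False means exactly one row per UoO
    codebook: list = field(default_factory=list)

    @property
    def key(self) -> str:
        # Assumption: the single "id" variable identifies the UoO.
        return next(v.name for v in self.codebook if v.fmt == "id")


def synthetic_value(var: Variable, rng: random.Random):
    """Draw a random value consistent with the variable's format and vocabulary."""
    if var.fmt == "cat":
        return rng.choice(var.vocabulary)
    if var.fmt == "int":
        return rng.randint(0, 100)
    if var.fmt == "date":
        return datetime.date(2010, 1, 1) + datetime.timedelta(days=rng.randint(0, 3652))
    raise ValueError(f"unsupported format: {var.fmt}")


def make_synthetic(spec: DatasetSpec, n_units: int, seed: int = 0) -> list:
    """Generate synthetic rows respecting the UoO and observations-per-UoO rules."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        n_obs = rng.randint(1, 3) if spec.multiple_obs_per_uoo else 1
        for _ in range(n_obs):
            row = {spec.key: f"U{i:04d}"}
            for var in spec.codebook:
                if var.fmt != "id":
                    row[var.name] = synthetic_value(var, rng)
            rows.append(row)
    return rows


def validate(spec: DatasetSpec, rows: list) -> list:
    """Return the list of violations of the specification (empty if conformant)."""
    problems = []
    names = {v.name for v in spec.codebook}
    for r in rows:
        if set(r) != names:
            problems.append(f"columns {sorted(r)} differ from codebook")
    for var in spec.codebook:
        if var.vocabulary is not None:
            bad = {r[var.name] for r in rows if var.name in r} - set(var.vocabulary)
            if bad:
                problems.append(f"{var.name}: values outside vocabulary {bad}")
    if not spec.multiple_obs_per_uoo:
        dup = [k for k, c in Counter(r.get(spec.key) for r in rows).items() if c > 1]
        if dup:
            problems.append(f"{len(dup)} UoO with more than one row")
    return problems


# Hypothetical example: a person-level dataset with one row per person.
persons = DatasetSpec(
    name="D3_persons",
    unit_of_observation="person",
    multiple_obs_per_uoo=False,
    codebook=[
        Variable("person_id", "id"),
        Variable("sex", "cat", ["F", "M"]),
        Variable("birth_date", "date"),
    ],
)
rows = make_synthetic(persons, n_units=5)
print(len(rows), validate(persons, rows))  # 5 []
```

Keeping the specification as data rather than prose lets one object drive both synthetic data generation and a conformance check of whatever the real script eventually produces.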
Results: Three analytic datasets were sufficient to populate the shell tables envisioned in the study protocol, and 15 intermediate datasets were designed between the CDM and the analytic datasets. Across these 18 datasets, the UoO was a person in the data source (N = 2, 11%), an event recorded in the data source, such as a diagnostic code or a span of observation period (N = 5, 28%), a study subject (N = 5, 28%), or a stratum of covariates (N = 6, 33%). The number of observations per UoO was 1 (N = 16, 89%) or more than one (N = 2, 11%). During the specification phase, investigators and programmers were able to identify where the protocol and/or statistical analysis plan (SAP) were underspecified and to add clarifications, specifically in the operationalisation of the criteria defining the study population and of the components of the algorithms retrieving MS cases. While data managers were developing the script to populate the analytic datasets, investigators could already start developing the statistical script to populate the shell tables, using the synthetic analytic datasets.
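Because the analytic datasets had an agreed structure, the statistical script could be written and tested against synthetic data before the real data existed. The Python sketch below illustrates the idea under stated assumptions: the variable names (`datasource`, `year`, `ms_algorithm_1`), the data source labels and the synthetic case rate are hypothetical, and prevalence is simply taken as cases per 1,000 study subjects, which may differ from the definition in the DP3 protocol.

```python
import random
from collections import defaultdict


def synthetic_analytic(n: int, seed: int = 1) -> list:
    """Hypothetical synthetic analytic dataset: one row per study subject."""
    rng = random.Random(seed)
    return [
        {
            "subject_id": f"S{i:05d}",
            "datasource": rng.choice(["DS_A", "DS_B"]),   # placeholder data source labels
            "year": rng.randint(2005, 2019),               # placeholder calendar years
            "ms_algorithm_1": int(rng.random() < 0.002),   # arbitrary synthetic case rate
        }
        for i in range(n)
    ]


def prevalence_table(rows: list, case_var: str, by: tuple) -> dict:
    """Populate a shell table: cases, denominator and prevalence per 1,000 by stratum."""
    counts = defaultdict(lambda: [0, 0])                   # stratum -> [cases, subjects]
    for r in rows:
        stratum = tuple(r[v] for v in by)
        counts[stratum][0] += r[case_var]
        counts[stratum][1] += 1
    return {
        s: {"cases": c, "n": n, "per_1000": 1000 * c / n}
        for s, (c, n) in sorted(counts.items())
    }


# Develop against synthetic data now; swap in the real analytic dataset later.
table = prevalence_table(synthetic_analytic(50_000), "ms_algorithm_1", by=("datasource",))
for stratum, cells in table.items():
    print(stratum, cells)
```

If the real analytic dataset conforms to the same codebook, the statistical script should run on it unchanged; only the input is replaced.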
Conclusions: Specifying the structure (data model) of the intermediate datasets improved communication within the study team and helped to better specify the study protocol and SAP. This ultimately improved transparency and made script development more efficient.