European Medicines Agency HS Amsterdam, Netherlands
Background: The timely and transparent delivery of valid and reliable Real-World Evidence (RWE) will be key to the further adoption of Real-World Data (RWD) in support of regulatory decision making. For most use cases, pooling RWD sources is considered useful for comparing results across diverse populations and healthcare settings, or for increasing statistical power through a larger sample size.
Objectives: The objective of this review was to provide an easily accessible checklist of data source characteristics to support analysts in understanding, and deciding on, the methodological validity of pooling RWD sources.
Methods: This review followed an iterative process. First, through literature review and expert consultation, we identified domains of heterogeneity that might arise when pooling RWD sources. Second, for each of these domains, we prepared a checklist of the relevant data source characteristics. Finally, the findings were piloted and revisited using available internal electronic health databases and use cases.
Results: We considered three domains: heterogeneity of data sources, which might arise from differences in the design and operation of healthcare settings and in the collection of data; heterogeneity in health outcomes, which might arise from true differences in exposure or vulnerability between populations; and heterogeneity in data quality, which might arise from missing data, erroneous data, or a preponderance of data from a limited number of data sources. For each domain we then identified a list of data source characteristics. Heterogeneity in healthcare systems and practices could be triggered by geographical coverage, temporal coverage, provenance of data source, source population, case definition, case ascertainment, timelines of information flows, legal status of reporting, or population attrition. Heterogeneity in health outcomes could be related to intervention effectiveness, population characteristics, time periods (e.g., seasonality), or environment. Heterogeneity in data quality could be attributed to missing time periods, missing covariate data, underreporting of cases, or missing data from a total population of interest.
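As an illustration only, the three domains and their characteristics could be encoded as a simple lookup that flags where two data sources differ or lack documentation. This is a minimal sketch, not the authors' tool: the dictionary layout, the `CHECKLIST` name, the `differing_items` function, and the example source descriptions are all hypothetical; only the domain and item labels come from the text above.

```python
# Hypothetical encoding of the checklist. Item labels follow the text;
# the structure and all identifiers are illustrative assumptions.
CHECKLIST = {
    "healthcare systems and practices": [
        "geographical coverage",
        "temporal coverage",
        "provenance of data source",
        "source population",
        "case definition",
        "case ascertainment",
        "timelines of information flows",
        "legal status of reporting",
        "population attrition",
    ],
    "health outcomes": [
        "intervention effectiveness",
        "population characteristics",
        "time periods",
        "environment",
    ],
    "data quality": [
        "missing time periods",
        "missing covariate data",
        "underreporting of cases",
        "missing data from a total population of interest",
    ],
}


def differing_items(source_a, source_b):
    """Return, per domain, the items on which two sources differ.

    An item is flagged when it is undocumented for either source
    or when the two documented values are not equal.
    """
    flagged = {}
    for domain, items in CHECKLIST.items():
        diffs = [
            item
            for item in items
            if item not in source_a
            or item not in source_b
            or source_a[item] != source_b[item]
        ]
        if diffs:
            flagged[domain] = diffs
    return flagged


# Hypothetical descriptions of two data sources (values are made up).
source_a = {"geographical coverage": "national", "case definition": "coded diagnosis"}
source_b = {"geographical coverage": "regional", "case definition": "coded diagnosis"}

for domain, items in differing_items(source_a, source_b).items():
    print(domain, "->", items)
```

Flagged items are prompts for discussion before pooling rather than an automatic verdict; treating undocumented items as differences is a deliberately conservative assumption of this sketch.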
Conclusions: Understanding the differences among RWD sources and the preconditions for pooling would facilitate selection of the most suitable methodological approach and communication of the obtained estimates. The checklist can also help in interpreting the results of a meta-analysis and the variability in outcomes between data sources.