Director, Global Epidemiology, IQVIA, San Francisco, United States
Background: With the digitization of healthcare and the development of digital monitoring technologies, large volumes of data on the health and wellbeing of populations are generated every day. Extracting actionable insights from these raw data rapidly can be challenging, and there is limited guidance on how to curate multiple data sources in real time to a standard suitable for research.
Objectives: To provide a framework for developing evidence hubs that process multiple raw data sources into robust analytical datasets, and to describe associated best practices for ensuring that data are fit for purpose.
Methods: Our five-phase framework describes the evidence hub development process from conception through the generation and dissemination of insights: 1) Define near-term research objectives for all stakeholders and conduct feasibility assessments to understand the data landscape. Aligning appropriate data with research needs ensures a fit-for-purpose evidence hub design. 2) Create a data management plan that covers ingestion, harmonization, linkage, and transformation. 3) Apply data quality standards that make data provenance transparent. Ensure linkage accuracy through real-time audits and the generation of data queries to resolve discrepancies. Continuous data quality assessment keeps analytical datasets robust while retaining enough flexibility to respond to changing requirements, such as the integration of additional data sources. 4) Generate insights and disseminate findings to support real-time intervention. 5) Hold interdisciplinary discussions to interpret findings and gather feedback, which is incorporated into the next iteration of the evidence hub for continuous learning and improvement.
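The linkage audit in step 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the framework's actual implementation: records are assumed to be dicts sharing a pseudonymized `patient_id`, linkage is an exact match on that key, and `DataQuery`, `audit_linkage`, and the cross-checked fields (`birth_year`, `sex`) are hypothetical names. Rather than silently correcting disagreements, the audit emits a query for every unlinked record, ambiguous match, or conflicting attribute so that discrepancies can be resolved at the source.

```python
from dataclasses import dataclass


@dataclass
class DataQuery:
    """A discrepancy flagged for manual review (hypothetical structure)."""
    patient_id: str
    source: str
    issue: str


def audit_linkage(registry, claims, fields=("birth_year", "sex")):
    """Link registry records to claims records on a shared patient_id.

    Assumes each record is a dict with a pseudonymized 'patient_id' key
    (hypothetical schema). Returns (linked_pairs, queries): a registry
    record with exactly one claims match is linked, and any disagreement
    on the checked `fields` raises a query without breaking the link.
    """
    # Index claims by identifier so duplicate candidates can be detected.
    claims_by_id = {}
    for rec in claims:
        claims_by_id.setdefault(rec["patient_id"], []).append(rec)

    linked, queries = [], []
    for rec in registry:
        pid = rec["patient_id"]
        matches = claims_by_id.get(pid, [])
        if not matches:
            queries.append(DataQuery(pid, "registry", "no matching claims record"))
            continue
        if len(matches) > 1:
            # Ambiguous linkage: do not guess, request resolution instead.
            queries.append(
                DataQuery(pid, "claims", f"{len(matches)} candidate claims records")
            )
            continue
        match = matches[0]
        linked.append((rec, match))
        # Cross-check shared attributes; mismatches are flagged, not overwritten.
        for field in fields:
            if rec.get(field) != match.get(field):
                queries.append(
                    DataQuery(
                        pid,
                        "both",
                        f"{field} mismatch: registry={rec.get(field)!r}, "
                        f"claims={match.get(field)!r}",
                    )
                )
    return linked, queries


if __name__ == "__main__":
    # Made-up demonstration records, not real patient data.
    registry = [
        {"patient_id": "A1", "birth_year": 1980, "sex": "F"},
        {"patient_id": "B2", "birth_year": 1975, "sex": "M"},
    ]
    claims = [
        {"patient_id": "A1", "birth_year": 1980, "sex": "F"},
        {"patient_id": "B2", "birth_year": 1976, "sex": "M"},
    ]
    linked, queries = audit_linkage(registry, claims)
    print(f"{len(linked)} linked")
    for q in queries:
        print(q)
```

Keeping a mismatched pair linked but flagged preserves the link while leaving the decision about which source is correct to a reviewer. Production pipelines often replace the exact key with privacy-preserving tokens or probabilistic matching.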
Results: This framework has been applied in 6 case studies spanning a range of research areas, from COVID-19 to injury epidemiology to orthopedics. It has guided the integration of multiple data types, including genomic data, wearable-device data, and electronic health records (EHRs), covering more than 5,000 individuals and millions of data points. In the COVID-19 Active Research Experience (CARE) registry, data collected directly from patients were validated through linkage to medical claims and prescription datasets. The three COVID-19 evidence hubs each generated 6-7 publications within 2 years.
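One way to quantify the kind of validation described for CARE is to compare a patient-reported yes/no item (for example, "took medication X") with the same item derived from linked prescription records. The abstract does not state which statistics were used, so this is a hedged sketch: the input format, the hypothetical `concordance` function, treating prescription records as the reference standard, and the choice of percent agreement, Cohen's kappa, sensitivity, and specificity are all assumptions for illustration.

```python
def concordance(pairs):
    """Agreement between a self-reported binary item and linked records.

    `pairs` is an iterable of (self_reported, in_prescriptions) booleans,
    one per linked patient. Prescription records are treated as the
    reference standard here, which is an assumption for illustration.
    """
    # 2x2 cells: a = both yes, b = self-report only, c = records only, d = both no
    a = b = c = d = 0
    for reported, dispensed in pairs:
        if reported and dispensed:
            a += 1
        elif reported:
            b += 1
        elif dispensed:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    if n == 0:
        raise ValueError("no linked records to compare")

    p_observed = (a + d) / n
    # Chance agreement from each source's marginal "yes" and "no" rates.
    p_expected = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    # Kappa is undefined when both sources place everyone in one category.
    kappa = None if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)
    return {
        "n": n,
        "percent_agreement": 100 * p_observed,
        "cohen_kappa": kappa,
        "sensitivity": a / (a + c) if a + c else None,
        "specificity": d / (b + d) if b + d else None,
    }
```

As a made-up illustration (not CARE data), 40 patients with concordant "yes", 45 with concordant "no", 5 reporting use without a prescription record, and 10 with a record but no self-report give 85% agreement, a kappa of about 0.70, a sensitivity of 0.80, and a specificity of 0.90.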
Conclusions: Continual curation of new data from multiple sources can present an operational challenge for teams building an evidence hub. Our framework offers guidance that can enable the timely translation of data into evidence and implementable actions, and ultimately have an impact on public health.