DPhil student, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, United Kingdom
Background: During the COVID-19 pandemic, the UK Health Security Agency (UKHSA) maintained a COVID-19 dashboard (coronavirus.data.gov.uk) that provided an up-to-date, authoritative summary of COVID-19 information. Cases were defined by a positive test, with test type information available. In parallel, real-world data (RWD), including electronic health records, were used to research many aspects of the pandemic, including vaccine effectiveness. Although researchers have assumed that RWD underestimate COVID-19 rates, the extent of this incompleteness still needs to be quantified.
Objectives: To compare and quantify the agreement between COVID-19 incidence rates estimated from UK RWD and official UKHSA figures.
Methods: We used the Clinical Practice Research Datalink Aurum dataset, which contains primary care records from England. The study period was from December 2020 to April 2021. The event of interest was incident COVID-19, defined as a positive test or a recorded diagnosis of COVID-19, with a 42-day clean window between events. Monthly incidence rates and 95% confidence intervals were estimated overall and stratified by age and sex. UKHSA official rates were obtained from the COVID-19 data dashboard, and the age-sex-specific denominators from the UK Office for National Statistics. We used Pearson’s correlation coefficients to study the correlation between RWD and UKHSA rates. Intra-class correlation coefficients (ICC) were also calculated to examine agreement. In addition, we reported the percentage of age-sex-specific rates whose confidence intervals overlapped. All analyses were conducted in R.
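The comparison described above can be sketched in a few lines. This is an illustration only, not the study's code: the original analyses were run in R, and this sketch uses Python with the standard library. The function names, the log-normal approximation for the rate confidence interval, and the one-way ICC(1,1) form are all assumptions, since the abstract does not state which interval method or ICC variant was used.

```python
import math

def incidence_rate_ci(events, person_time, per=100_000, z=1.96):
    """Incidence rate per `per` person-time units with an approximate 95% CI.

    Assumption: log-normal approximation, SE(log rate) = 1 / sqrt(events).
    """
    rate = events / person_time
    se_log = 1 / math.sqrt(events)
    lower = rate * math.exp(-z * se_log)
    upper = rate * math.exp(z * se_log)
    return rate * per, lower * per, upper * per

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) for two sources (k = 2).

    Assumption: this ICC form is chosen for illustration only.
    """
    n, k = len(x), 2
    means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = sum(means) / n
    # Between-month and within-month mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(x, y, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical monthly rates (not study data): one source reports twice the other.
rwd_rates = [1.0, 2.0, 3.0, 4.0]
official_rates = [2.0, 4.0, 6.0, 8.0]
r = pearson(rwd_rates, official_rates)
icc = icc_oneway(rwd_rates, official_rates)
```

In this made-up example the two series are perfectly correlated, yet the ICC is much lower because one source is systematically higher than the other. This is why an agreement measure and a confidence interval overlap check are informative alongside Pearson's coefficient.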
Results: Pearson’s correlation coefficient comparing UKHSA and RWD-based overall monthly rates was 0.987 (95% confidence interval, 0.963 to 0.995), and the ICC was 0.957 (0.883 to 0.985). For age-sex-stratified rates, Pearson’s correlation coefficient ranged from 0.906 to 0.995, with a median of 0.978; the ICC ranged from 0.866 to 0.979, with a median of 0.945. The rates from the dashboard were in general higher than the RWD-based ones, especially during the last quarter of 2020. In total, 25.4% of the age-sex-specific estimates had overlapping 95% confidence intervals.
Conclusions: Overall, Pearson’s and intra-class correlation coefficients indicated high agreement between results from the electronic health records and the public data. Real-world data can provide reliable information for capturing COVID-19 cases and estimating incidence rates in related studies.