Background: Early detection of symptoms predictive of a pandemic illness such as COVID-19 is critical for identifying infection, guiding policies to contain the pandemic, and reducing the public health burden. Recently, International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes have been used to estimate the incidence of post-acute sequelae of COVID (PASC) in electronic health record (EHR) and claims datasets in the US. The accuracy of ICD-10-CM codes for identifying symptoms, however, is unclear.
Objectives: Quantify the accuracy of COVID-related ICD-10-CM symptom codes in EHRs.
Methods: We identified a set of 16 COVID- and PASC-related symptoms that were reported in the literature and had available validation data. For each symptom, we constructed a concept set of ICD-10-CM diagnosis codes drawn from the symptom chapter and from related disease codes. The concept sets were evaluated against an annotated registry of consecutive Massachusetts General Hospital COVID-19 hospital admissions during the early pandemic, between March and July 2020. Clinical reviewers reviewed charts from the emergency room and hospitalization at presentation to care (PTC) and documented the presence of symptoms. We extracted ICD-10-CM symptom codes recorded within 2 days of the PTC date from the Research Patient Data Registry (RPDR) and compared them against the manually annotated symptoms. For each symptom, we computed the specificity (Spec), sensitivity (Sens), and positive predictive value (PPV).
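The comparison described in the Methods can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function names, the per-patient data layout, and the reading of "within 2 days" as a symmetric window around the PTC date are all assumptions made for the example.

```python
from datetime import date


def has_code_in_window(code_dates, ptc_date, window_days=2):
    """Return True if any code date lies within window_days of ptc_date.

    Assumption: the window is symmetric (before or after PTC); the source
    only says "within 2 days of the PTC date".
    """
    return any(abs((d - ptc_date).days) <= window_days for d in code_dates)


def diagnostic_metrics(annotated, coded):
    """Compare chart-review labels (reference) with ICD-10-CM code flags.

    annotated and coded are equal-length sequences of booleans, one entry
    per patient. A metric is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for truth, flag in zip(annotated, coded):
        if truth and flag:
            tp += 1
        elif truth and not flag:
            fn += 1
        elif not truth and flag:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
    }


# Hypothetical toy data for one symptom across five patients.
annotated = [True, True, True, False, False]
coded = [True, False, False, False, True]
metrics = diagnostic_metrics(annotated, coded)
```

In this toy input, one of three annotated patients carries a code, so sensitivity is 1/3, while specificity and PPV are both 0.5.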
Results: The validation population of 1,248 patients had a mean age of 60 years (IQR 46-73), was 58% male, and was 38% non-Hispanic white. The most common symptoms at PTC were fever (66%), cough (65%), and dyspnea (55%). The least common symptoms were dysgeusia (4.1%), rhinorrhea (8.8%), and altered mental status (9.9%). Specificity and sensitivity of ICD-10-CM codes varied widely by symptom type. Congestion (Spec: 0.99, Sens: 0.03, PPV: 0.50), sore throat (Spec: 0.98, Sens: 0.17, PPV: 0.64), and fatigue (Spec: 0.95, Sens: 0.16, PPV: 0.58) had high specificity but low sensitivity. Conversely, dyspnea (Spec: 0.63, Sens: 0.64, PPV: 0.74), fever (Spec: 0.83, Sens: 0.49, PPV: 0.85), and cough (Spec: 0.80, Sens: 0.49, PPV: 0.82) exhibited lower specificity but higher sensitivity. Across all 16 symptoms, ICD-10-CM codes tended to exhibit low sensitivity with variable specificity and PPV.
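PPV depends on symptom prevalence as well as on sensitivity and specificity, which helps explain why fever and cough retain a high PPV despite modest sensitivity. The sketch below applies Bayes' rule to the rounded fever and cough values reported above; the function name is illustrative, and because the inputs are rounded, small discrepancies from the reported PPVs are possible for other symptoms.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule.

    PPV = Sens * Prev / (Sens * Prev + (1 - Spec) * (1 - Prev))
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rounded values taken from the Results: (Sens, Spec, prevalence at PTC).
fever_ppv = ppv_from_rates(0.49, 0.83, 0.66)
cough_ppv = ppv_from_rates(0.49, 0.80, 0.65)

print(round(fever_ppv, 2))  # 0.85
print(round(cough_ppv, 2))  # 0.82
```

Both values match the PPVs reported for fever and cough to two decimal places.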
Conclusions: Our results demonstrate low sensitivity of COVID-related ICD-10-CM symptom codes in EHRs. Undercoding of symptoms with ICD-10-CM codes may bias estimates of symptom incidence in COVID and PASC observational studies. Researchers should consider enhancing symptom classification with computable phenotypes that include vital signs, medications, patient self-report, or natural language processing of clinical notes.