Associate Director, RWE, Kite, Santa Monica, United States
Background: Post-acute sequelae of SARS-CoV-2 infection (Long COVID) involves a wide range of heterogeneous symptoms and lacks a straightforward definition. It is imperative to understand Long COVID and to identify the risk factors for developing it.
Objectives: To identify predictors associated with Long COVID and to predict the patient-level risk of developing Long COVID at 3 months, 6 months, and 12 months after SARS-CoV-2 infection.
Methods: This was a retrospective cohort study using US Optum COVID-19 EHR data. The study included adult patients with a first COVID-19 infection (positive SARS-CoV-2 PCR or rapid antigen test, or presence of ICD-10-CM code U07.1) between 1/1/2020 and 7/20/2022. Long COVID was defined as the presence of at least one of 3 symptom clusters (fatigue, cognitive problems, and respiratory symptoms) at 3 months or later after the first COVID-19 infection (index date). Patients were followed from the index date to Long COVID, loss to follow-up, 365 days, or COVID-19 reinfection, whichever came first. Patients with less than 3 months of follow-up, or with Long COVID symptoms in the 183 to 15 days before the index date, were excluded. Extreme gradient boosting (XGBoost) models were used to identify important predictors associated with Long COVID for each year (2020-2022). An ensemble selection method then calculated the weighted average ranking of key predictors across years. The resulting top predictors were reviewed for clinical relevance, and the final 25 variables were used for patient-level prediction of Long COVID with an XGBoost model. The resulting model was validated on an independent testing set. The C-index and dynamic AUCs were used to evaluate model performance.
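The weighted average ranking step can be illustrated with a minimal sketch. The abstract does not state how the yearly weights were chosen or how predictors missing from a given year's list were handled, so the weights, the feature names, and the "worst rank plus one" rule below are all assumptions made for illustration only.

```python
def weighted_average_ranks(rankings_by_year, weights_by_year):
    """Combine per-year importance rankings into one weighted average rank.

    rankings_by_year: {year: [feature names, most important first]}
    weights_by_year:  {year: non-negative weight} (hypothetical; e.g. cohort size)
    Returns a list of (feature, average_rank), best (lowest) rank first.
    """
    total_weight = sum(weights_by_year[year] for year in rankings_by_year)
    all_features = {f for ranked in rankings_by_year.values() for f in ranked}
    scores = {f: 0.0 for f in all_features}

    for year, ranked in rankings_by_year.items():
        weight = weights_by_year[year] / total_weight
        position = {f: i + 1 for i, f in enumerate(ranked)}
        # Assumption: a feature absent from a year's list gets the worst rank + 1.
        worst = len(ranked) + 1
        for f in all_features:
            scores[f] += weight * position.get(f, worst)

    # Sort by average rank, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical per-year rankings (not the study's actual results).
rankings = {
    2020: ["age", "cci", "albuterol"],
    2021: ["cci", "age", "dorsalgia"],
    2022: ["age", "dorsalgia", "cci"],
}
weights = {2020: 1, 2021: 2, 2022: 1}  # hypothetical weights

ensemble = weighted_average_ranks(rankings, weights)
for feature, avg_rank in ensemble:
    print(f"{feature}: {avg_rank:.2f}")
```

In this toy example a predictor that ranks consistently high across years ends up first, while one that appears in a single year is pushed down, which is the general behavior a rank-averaging ensemble is meant to provide.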
Results: A total of 873,376 COVID-19 patients were included, of whom 82,310 (9.4%) were identified with Long COVID. The top 10 predictors from the ensemble selection method included age, Charlson comorbidity score, acute COVID-19 symptoms, healthcare utilization, history of dorsalgia and albuterol use. The XGBoost prediction model achieved a C-index of 0.69 in the training set and 0.68 in the test set. The dynamic AUC for patient-level risk prediction was 0.76 at 3 months, 0.70 at 6 months, and 0.68 at 12 months. Individual patient risk of developing Long COVID over time was assessed using the final ensemble predictors.
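For readers unfamiliar with the C-index, the following minimal sketch shows how a concordance index can be computed from follow-up times, event indicators, and predicted risk scores. It assumes Harrell's definition, which the abstract does not confirm, ignores tied follow-up times for simplicity, and uses made-up data rather than study data.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index for right-censored data (simplified sketch).

    A pair (i, j) is comparable when subject i had the event and subject j
    was still under follow-up at a strictly later time. The pair is
    concordant when the model gave subject i the higher risk score.
    Tied risk scores count as half concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical patients: days to Long COVID or censoring, event flag, risk.
times = [100, 200, 300, 365]
events = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risk)
print(round(c_index, 2))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives context for the reported values of 0.69 and 0.68.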
Conclusions: This study identified a set of important predictors associated with Long COVID and assessed individual risk of Long COVID using machine learning. These results are intended to help healthcare providers better evaluate and manage Long COVID.