(A40) Identifying Generalized Pustular Psoriasis Flares Using Natural Language Processing of Unstructured Clinical Notes and Structured Procedure Codes
Head, Biostatistics & Data Science, OMNY Health, Atlanta, United States
Background: Generalized pustular psoriasis (GPP) is a rare, severe, multisystemic autoinflammatory skin disease. Patients with GPP experience flares: episodes of widespread eruption of painful, sterile pustules, often accompanied by systemic symptoms. GPP flares are unpredictable and can be life-threatening, yet they are typically not documented explicitly in structured electronic health records (EHRs). Traditional methods for identifying GPP flares in EHR data, such as Current Procedural Terminology (CPT) codes and string-based searches of clinician notes, often perform poorly.
Objectives: To develop an algorithm that identifies GPP flares from structured EHR data and unstructured clinical notes using a transformer-based natural language processing (NLP) model.
Methods: Outpatient EHR data (2017-2022) from 5 specialty dermatology networks on the OMNY Health platform were accessed. Patients were selected if they had a GPP diagnosis code, clinical notes with non-templated text, and a documented diagnosis status. Encounters with a diagnosis status indicating a worsening condition were labeled as ground-truth flares, and these labels served as the prediction target. Clinical notes were processed to remove templated text (prespecified, auto-populated text). Encounters were split into training (70%), validation (10%), and test (20%) sets. A transformer-based NLP model pretrained on biomedical text was applied to encounter-level clinical notes to predict flare labels. As comparators, the following traditional algorithms were applied to the test set to indicate GPP flares: (1) presence of string combinations (e.g., flare, pustule, rash) or their synonyms in the notes, and (2) presence of structured CPT codes consistent with a moderate/high level of disease management (99204, 99214, 99205, 99215). Logistic regression models combining the NLP model and the traditional algorithms were fitted on the test set.
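The two reference algorithms can be sketched as simple encounter-level rules. This is a minimal Python illustration under stated assumptions, not the study's implementation: the keyword tuple holds only the three example terms named in the Methods (the full set of string combinations and synonyms is not reported), and a single term match stands in for the unreported combination rules. The CPT set is the four codes listed in the Methods. The names `FLARE_TERMS`, `FLARE_PATTERN`, `string_flag`, and `cpt_flag` are hypothetical.

```python
import re
from typing import Iterable

# Algorithm 2: the four CPT evaluation-and-management codes listed in the
# abstract as consistent with a moderate/high level of disease management.
MOD_HIGH_CPT = {"99204", "99205", "99214", "99215"}

# Algorithm 1 (hypothetical subset): only the example terms named in the
# abstract; the study's full set of string combinations and synonyms is not reported.
FLARE_TERMS = ("flare", "pustule", "rash")

# Case-insensitive match of any term at the start of a word, so inflected
# forms such as "flares", "flared", or "pustules" also match.
FLARE_PATTERN = re.compile(r"\b(?:" + "|".join(FLARE_TERMS) + r")\w*", re.IGNORECASE)


def string_flag(note_text: str) -> bool:
    """Sketch of algorithm 1: flag the encounter if its non-templated note text
    mentions any flare term (a simplification of the unreported combination rules)."""
    return FLARE_PATTERN.search(note_text) is not None


def cpt_flag(cpt_codes: Iterable) -> bool:
    """Sketch of algorithm 2: flag the encounter if any billed CPT code is one of
    the moderate/high-level codes (codes may be given as str or int)."""
    return any(str(code) in MOD_HIGH_CPT for code in cpt_codes)
```

For example, `cpt_flag(["99213", "99215"])` returns `True` because 99215 is in the set, while `string_flag("Stable plaques, no new lesions")` returns `False`.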
Results: Of over 43M encounters, 6,301 had a GPP diagnosis code; of these, 2,981 had clinical notes with non-templated text, and 1,005 had a documented diagnosis status (41% ground-truth flares). The area under the receiver operating characteristic curve (AUC) was 0.67 for the NLP model, 0.52 for algorithm 1 (string-based), and 0.66 for algorithm 2 (CPT-based). The combination of the NLP model and algorithm 2 yielded the best-fitting and most discriminative model on the test set (AUC = 0.73). At the operating point that maximized the F1 score, sensitivity was 91%, positive predictive value (PPV) was 59%, and the F1 score was 0.72.
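The reported metrics can be computed from any vector of encounter-level scores with standard-library Python. The sketch below assumes the scores are, for example, predicted probabilities from the combined logistic regression model, and that a flare is predicted when the score is at or above the cutoff; how the study scanned candidate cutoffs is not reported, and the function names are hypothetical. It also checks the reported operating point: F1 is the harmonic mean of sensitivity and PPV, so 2 × 0.91 × 0.59 / (0.91 + 0.59) ≈ 0.716, which rounds to 0.72.

```python
from typing import Sequence, Tuple


def f1(sensitivity: float, ppv: float) -> float:
    """F1 score: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen flare encounter scores higher than a randomly chosen non-flare one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_f1_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Try every observed score as a cutoff (predict flare if score >= cutoff)
    and return (cutoff, sensitivity, PPV, F1) for the cutoff with the highest F1."""
    n_pos = sum(labels)
    best = (float("nan"), 0.0, 0.0, -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        if tp == 0:
            continue  # no true positives: F1 is zero, skip this cutoff
        sens, ppv = tp / n_pos, tp / (tp + fp)
        score = f1(sens, ppv)
        if score > best[3]:
            best = (cutoff, sens, ppv, score)
    return best


# Worked check against the reported operating point (sensitivity 91%, PPV 59%).
print(round(f1(0.91, 0.59), 2))  # 0.72
```

On illustrative toy data such as scores `[0.9, 0.8, 0.7, 0.4, 0.3, 0.2]` with labels `[1, 1, 0, 1, 0, 0]`, `auc` returns 8/9 ≈ 0.89 and `best_f1_cutoff` selects 0.4 (sensitivity 1.0, PPV 0.75, F1 ≈ 0.86).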
Conclusions: A transformer-based NLP model provided additional predictive ability for identifying GPP flares in EHR data, and combining it with the CPT-based algorithm yielded the best predictive ability overall.