Background: Acromegaly is a rare endocrine disease characterized by a diagnostic delay of 5 to 10 years from symptom onset.
Objectives: To develop and internally validate machine learning algorithms that identify a combination of predictive variables for the early diagnosis of acromegaly from administrative claims databases.
Methods: This retrospective population-based study was conducted between 2011 and 2018 using data from the claims databases of the Sicily Region in Southern Italy. For each identified acromegaly case, the date of the first registration of at least one acromegaly-related claim was taken as the index date. Acromegaly cases were matched with up to 10 controls by date of birth and gender. To identify combinations of candidate predictors associated with a diagnosis of acromegaly, two logistic regression models (i.e., classical and conditional) and three machine learning algorithms [i.e., the random forest (RF), the Recursive PArtitioning and Regression Tree (RPART) and the support vector machine (SVM)] were fitted, and the accuracy of the models was compared.
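The 1:10 case-control matching step can be illustrated with a minimal Python sketch. It assumes exact matching on date of birth and gender, with controls drawn at random and without replacement; the `Person` record, the `match_controls` function and its `ratio`/`seed` parameters are hypothetical and not taken from the study, which may have used a different matching procedure (for example, a tolerance window on birth date).

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str          # hypothetical anonymized identifier
    birth_date: str   # ISO date, e.g. "1960-01-01"
    gender: str       # e.g. "F" or "M"


def match_controls(cases, pool, ratio=10, seed=0):
    """Assumed exact 1:ratio matching on (birth_date, gender), without replacement.

    Returns a dict mapping each case pid to the list of matched control pids.
    A case may receive fewer than `ratio` controls if its stratum runs out.
    """
    rng = random.Random(seed)
    # Group potential controls into strata defined by the matching variables.
    strata = defaultdict(list)
    for person in pool:
        strata[(person.birth_date, person.gender)].append(person)

    matched = {}
    for case in cases:
        candidates = strata[(case.birth_date, case.gender)]
        k = min(ratio, len(candidates))
        chosen = rng.sample(candidates, k)
        # Remove chosen controls so no control is matched to two cases.
        for control in chosen:
            candidates.remove(control)
        matched[case.pid] = [control.pid for control in chosen]
    return matched
```

Under these assumptions, each matched control would then be assigned its case's index date before the predictors (e.g., counts of pharmacy claims) are computed.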
Results: The RF achieved the highest area under the receiver operating characteristic curve (AUC), with an AUC of 0.81 (95%CI: 0.77-0.85), followed by the RPART (AUC=0.75, 95%CI: 0.70-0.80), the classical logistic regression model (AUC=0.68, 95%CI: 0.65-0.72), the conditional logistic regression model (AUC=0.61, 95%CI: 0.56-0.67) and the SVM (AUC=0.59, 95%CI: 0.54-0.64). In the validation set, model sensitivity ranged from 38.1% for the classical logistic regression model to 74.0% for the classical RF. Overall, the predictors selected by all three machine learning algorithms were: (i) the number of pharmacy claims for immunosuppressants, drugs for thyroid therapy, corticosteroids for systemic use and vitamins; (ii) prescriptions for prolactin level measurement and radiotherapy visits; and (iii) a diagnosis of diabetes. When the logistic regression models were also considered, the predictors selected by all five models were the number of pharmacy claims for immunosuppressants and for drugs for thyroid therapy.
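The metrics reported above can be illustrated with a minimal Python sketch that computes the AUC, sensitivity at a fixed score threshold, and a 95% CI from predicted scores. The AUC uses its rank (Mann-Whitney) interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. The percentile bootstrap CI is an assumption, since the abstract does not state how the reported CIs were derived (DeLong's method is another common choice); the function names and parameters are illustrative.

```python
import random


def auc(labels, scores):
    """AUC = P(case score > control score) + 0.5 * P(tie)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """Share of true cases whose score is at or above the threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Assumed percentile bootstrap CI for the AUC (not necessarily the study's method)."""
    if not (0 < sum(labels) < len(labels)):
        raise ValueError("need both cases and controls")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.choices(range(n), k=n)  # resample subjects with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking cases or controls
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, `auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` returns 0.75, because three of the four case-control pairs are correctly ordered.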
Conclusions: The findings show that machine learning models may play a prominent role in predicting the diagnosis of rare diseases. Across the predictive models developed, the two predictors most strongly associated with acromegaly were the number of pharmacy claims for immunosuppressants and for drugs for thyroid therapy, suggesting that systemic inflammation and/or autoimmune diseases may be key predictors of an acromegaly diagnosis.