Background: Incident brain metastasis (BM) in patients with metastatic breast cancer (mBC) is associated with poor prognosis. Methods to identify patients at higher risk of developing BM are needed, and predictive models based on real-world data may aid early detection.
Objectives: To develop a machine learning model that predicts time to BM within three years of mBC diagnosis in patients with mBC.
Methods: This study used the nationwide (U.S.-based) de-identified Flatiron Health-Foundation Medicine mBC clinico-genomic database (FH-FMI CGDB). The de-identified electronic health record (EHR)-derived data originated from approximately 280 U.S. cancer clinics (~800 sites of care) and were linked to genomic data derived from FMI comprehensive genomic profiling (CGP) tests. Adult female patients who were diagnosed with mBC after January 1, 2011, and who were free of BM before and at the time of diagnosis were selected. The index date for the predictive model was the mBC diagnosis date, and the primary outcome was time to BM within three years. The cohort was further restricted to patients whose CGP solid tumor test was performed before the occurrence of BM or before censoring within three years. The model was trained on ~80% of the study sample using a Cox model with a lasso penalty, adjusted for left truncation bias, and tested on the remaining ~20%. Model performance was assessed using the concordance index (C-index) and the hazard ratio for time to BM in the predicted high-risk population relative to the predicted low-risk population.
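As an illustration of the modeling approach described in the Methods, the following is a minimal sketch of a lasso-penalized Cox objective with delayed entry, assuming the usual risk-set adjustment for left truncation (a patient counts as at risk at an event time only if already under observation). The function name, argument names, and the Breslow-style handling of ties are illustrative assumptions, not the study's actual implementation.

```python
import math

def penalized_neg_log_partial_likelihood(beta, X, entry, time, event, lam):
    """Negative Cox log partial likelihood with delayed entry plus an L1 penalty.

    beta  : list of coefficients (hypothetical model parameters)
    X     : list of covariate rows, one per patient
    entry : time at which each patient enters observation (left truncation)
    time  : event or censoring time for each patient
    event : 1 if the event was observed, 0 if censored
    lam   : lasso penalty strength
    """
    # Linear predictor for each patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set at t_i: patients who have entered and are still under follow-up.
        denom = sum(
            math.exp(eta[j])
            for j in range(len(time))
            if entry[j] < t_i <= time[j]
        )
        loglik += eta[i] - math.log(denom)
    # Lasso (L1) penalty shrinks coefficients toward zero.
    penalty = lam * sum(abs(b) for b in beta)
    return -loglik + penalty
```

Minimizing this objective over `beta` on the training split (for example, with a coordinate-descent solver) would yield a sparse set of predictors; the penalty strength `lam` would typically be chosen by cross-validation.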
Results: A total of 2,827 patients were included (305 human epidermal growth factor receptor 2-positive (HER2+), 1,739 hormone receptor–positive/HER2-, and 783 triple-negative breast cancer (TNBC)), of whom 353 had a BM event and 2,474 were censored within three years after mBC diagnosis. Patients were more likely to be White and to have recurrent breast cancer at mBC diagnosis, and the median age was 59 years (interquartile range: 50, 68). The predictive model demonstrated promising prediction accuracy in the testing dataset, with a C-index of 0.76 (95% confidence interval (CI): 0.70, 0.81). Patients predicted to be at high risk of BM had a 5.11-fold higher hazard of BM within three years (hazard ratio: 5.11; 95% CI: 2.98, 8.77) than patients predicted to be at low risk. The most important predictors of a higher risk of BM included the presence of lung metastasis, TP53 alteration, TNBC, and younger age at mBC diagnosis.
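For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a simplified illustration that ignores left truncation and is not the study's evaluation code; the function and argument names are hypothetical.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    and patient j was followed for longer than patient i.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has higher predicted risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by predicted risk.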
Conclusions: The BM model is the first machine learning model to predict time to BM in patients with mBC using nationwide data from routine CGP tests and chart-confirmed EHRs, and it showed promising performance. It may support earlier detection of patients at high risk of BM and inform enrichment strategies for trials investigating BM prevention.