Associate Director of Epidemiology, IQVIA, Frankfurt am Main, Germany
Background: Approximately 1 in 10 adults in the US has type 2 diabetes mellitus (T2DM), costing the health system over $300 billion annually. Early detection of patients at high risk of T2DM provides opportunities for lifestyle and pharmacologic interventions that may prevent progression to T2DM or limit disease severity.
Objectives: To assess the ability of a machine learning model to identify patients at high risk of being diagnosed with T2DM in the next 12 months.
Methods: A case-control study was conducted using de-identified electronic health record data from over 100,000 ambulatory care physicians in the United States (IQVIA Ambulatory EMR). Cases and controls were ascertained between January 2018 and December 2021 from patients who were ≥18 years old and had at least one encounter in the previous 12 months. For cases, the index date was defined as the first qualifying T2DM event (a diagnostic code, hemoglobin A1C ≥ 6.5%, or treatment with metformin, whichever came first). Controls were selected at a 10:1 ratio to cases and chosen randomly from records without a qualifying T2DM event in the ascertainment window. A 1-year lookback period was used to define 2,734 predictors, and an XGBoost model was trained on them to identify T2DM cases; 25 predictors were retained in the final model. Model performance was evaluated on a held-out test set (i.e., data not used in model training) at a 100:1 control-to-case ratio.
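Evaluating a risk score at a fixed sensitivity on a held-out set can be sketched as follows. This is a minimal illustration, not the study's code: the function name `ppv_at_sensitivity`, the synthetic Gaussian scores, and the sample sizes are all assumptions made for the example; only the 100:1 control-to-case ratio and the idea of fixing sensitivity come from the text.

```python
import math
import random


def ppv_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, ppv) when the score cutoff is chosen
    so that roughly `target_sensitivity` of the true cases are flagged."""
    # Rank the scores of true cases from highest to lowest.
    case_scores = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    # Number of cases that must be flagged to reach the target sensitivity.
    n_needed = max(1, math.ceil(target_sensitivity * len(case_scores)))
    threshold = case_scores[n_needed - 1]
    # Flag every record (case or control) scoring at or above the cutoff.
    flagged = [s >= threshold for s in scores]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y)
    n_flagged = sum(flagged)
    return threshold, true_pos / len(case_scores), true_pos / n_flagged


# Synthetic held-out set at a 100:1 control-to-case ratio (illustrative only).
rng = random.Random(0)
n_cases, n_controls = 500, 50_000
scores = [rng.gauss(1.5, 1.0) for _ in range(n_cases)]
scores += [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
labels = [True] * n_cases + [False] * n_controls

threshold, sensitivity, ppv = ppv_at_sensitivity(scores, labels, 0.05)
print(f"threshold={threshold:.3f} sensitivity={sensitivity:.3f} ppv={ppv:.3f}")
```

Because the positive predictive value depends on the control-to-case ratio, a figure computed this way applies only to the ratio of the set it was evaluated on.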
Results: A total of 1.5 million patient records were used for model training and a separate 1.4 million were used for model evaluation. When the threshold for determining ‘high-risk’ patients was set to 5% sensitivity (i.e., the algorithm would identify 5% of cases), 16.6% of the patients identified by the model had a qualifying T2DM event within the next 12 months.
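To put the 16.6% figure in context, the short calculation below derives the lift over the test-set base rate and the implied false-positive rate. It is a back-of-the-envelope sketch that assumes the 16.6% was measured on the 100:1 held-out test set described in the Methods; the abstract does not report these derived quantities, and the variable names are illustrative.

```python
# Reported values from the abstract.
sensitivity = 0.05        # share of cases flagged as high-risk
ppv = 0.166               # share of flagged patients with a T2DM event
controls_per_case = 100   # test-set control-to-case ratio (assumed to apply)

# Base rate of cases in the test set: 1 case per 101 records.
baseline = 1 / (1 + controls_per_case)

# Lift: how much more often flagged patients are cases than at random.
lift = ppv / baseline

# From ppv = TP / (TP + FP): false positives per case in the test set.
false_pos_per_case = sensitivity * (1 - ppv) / ppv

# Share of controls that were flagged (false-positive rate).
false_positive_rate = false_pos_per_case / controls_per_case
specificity = 1 - false_positive_rate

print(f"baseline={baseline:.4f} lift={lift:.1f}")
print(f"false_positive_rate={false_positive_rate:.4%} specificity={specificity:.4%}")
```

Under that assumption, flagged patients are roughly 17 times more likely to be cases than a randomly drawn test-set record, and about 0.25% of controls are flagged.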
Conclusions: Machine learning models can effectively identify patients at high risk of developing T2DM. Similar models can be leveraged at the point of care to provide targeted treatment and prevention, potentially improving patient outcomes and reducing overall disease burden.