Associate Director for RWE at the Office of Surveillance and Epidemiology, CDER, US Food and Drug Administration, Silver Spring, United States
Background: Machine learning (ML) is increasingly used in observational research, including the application of complex algorithms to real-world data (RWD) sources to derive cohorts or groups of patients with a specific disease or phenotype of interest in pharmacoepidemiologic studies. There is a need for a systematic review to assess the contribution, usefulness, and interpretability of these sophisticated algorithms in observational studies.
Objectives: To review the application of ML for patient phenotyping and selection in observational studies.
Methods: We conducted a systematic literature review (SLR) according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Candidate articles published from 12/02/2016 to 12/02/2021 were retrieved from MEDLINE via PubMed using a search query with appropriate ML Medical Subject Headings (MeSH) terms and keywords. The search was open to English-language observational studies involving humans of all populations, ages, and diseases/conditions. Opinion articles, letters to the editor, editorials, reviews, articles published in technical journals, preprints, and papers not published under peer review were excluded. Title and abstract screening and final extraction of data from included full texts were performed by paired independent reviewers. All conflicts were resolved by consensus. A narrative summary of the included studies was compiled. The study protocol was registered on PROSPERO (CRD42022298638).
Results: Following screening, 47 eligible studies were selected in which an ML method or a combination of methods was employed to identify phenotypes. The top three countries where these studies were conducted were the US (9 studies; 19%), France (4 studies; 8%), and Japan (4 studies; 8%). Cluster analysis was the most frequently used ML approach (23 studies; 49%), applied in several variations (e.g., model-based, hierarchical, K-means, density-based, and resampling-based consensus clustering). ML was applied to identify clusters of patients based on asthma severity, circadian rhythm, depressive symptoms, genetic subgroups of diseases, pain levels, dietary behaviors, or step counts, among others. Supervised learning (e.g., classification and regression tree) was used to predict phenotypes. Other algorithms, such as deep learning and natural language processing, were also applied to supplement phenotype identification.
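Illustration: To make concrete what cluster-based phenotyping means, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the method of any reviewed study. The function name `kmeans`, the two features (daily steps and a pain score), and all values are hypothetical and synthetic, chosen only to show how patients are grouped by similarity. In practice, features would typically be standardized first and the number of clusters validated.

```python
# Illustrative sketch only: synthetic values, not data from any reviewed study.
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each patient goes to the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical features per patient: (daily steps in thousands, pain score 0-10).
patients = [(2.1, 7.5), (2.8, 8.0), (3.0, 6.9), (9.5, 2.0), (10.2, 1.5), (8.8, 2.4)]

# Initial centroids are fixed here (assumption) so the run is deterministic.
labels, centroids = kmeans(patients, centroids=[patients[0], patients[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Here the two resulting clusters could be read as candidate phenotypes (low activity with high pain versus high activity with low pain). Whether such groups are clinically meaningful would still require external validation.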
Conclusions: ML algorithms are being used in diverse ways for defining phenotypes in support of patient selection and cohort building. However, there is a need for more transparency in reporting and validation of applied algorithms.