Background: Identification of unknown adverse events (AEs) is a core objective throughout the drug development lifecycle. A challenge facing pharmacovigilance is effectively extracting and integrating heterogeneous information from various data sources, including molecular structure, physicochemical properties, nonclinical data (including in vitro target, metabolic enzyme, and transporter assays), and clinical and postmarketing data, to enhance safety signal detection, prediction, and refinement. Advances in computational science have stimulated research on the use of artificial intelligence, including machine learning, to support safety signal management from bench to bedside.
Objectives: To present a proposed predictive workflow to support drug safety signal management (prediction, detection, and evaluation) based on multimodal graph data and graph machine learning.
Methods: A knowledge graph (KG) will be developed from Open Targets data on drugs/compounds, target proteins, diseases, and AEs. The Open Targets dataset (Version 22.11) contains 12,854 drugs/compounds, 62,678 targets, and 22,274 diseases and phenotypes (nodes), connected by 14,611,717 relationships (links between nodes). The resulting KG will be embedded as machine-readable vectors so that graph machine learning can be applied for node prediction (e.g., predicting properties of drugs or targets) and link prediction (i.e., identifying missing relationships between drugs, targets, and adverse events).
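To make the embedding and link prediction step concrete, the following is a minimal sketch only. It assumes a TransE-style translational embedding (one common choice; the abstract does not name a specific model) and uses toy triples with placeholder entity names that are not Open Targets records. The function names `train_transe` and `rank_adverse_events` are hypothetical.

```python
import numpy as np

# Toy (head, relation, tail) triples. All names are illustrative placeholders,
# not records taken from Open Targets.
TRIPLES = [
    ("drug_A", "targets", "protein_1"),
    ("drug_B", "targets", "protein_1"),
    ("drug_C", "targets", "protein_2"),
    ("protein_1", "associated_with", "disease_X"),
    ("protein_2", "associated_with", "disease_Y"),
    ("drug_A", "has_adverse_event", "ae_hepatotoxicity"),
    ("drug_C", "has_adverse_event", "ae_rash"),
]

entities = sorted({e for h, _, t in TRIPLES for e in (h, t)})
relations = sorted({r for _, r, _ in TRIPLES})
ent_id = {e: i for i, e in enumerate(entities)}
rel_id = {r: i for i, r in enumerate(relations)}


def train_transe(triples, dim=16, epochs=200, lr=0.05, margin=1.0, seed=0):
    """Assumed TransE-style training: push h + r close to t for observed triples."""
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(len(entities), dim))   # entity vectors
    R = rng.normal(scale=0.1, size=(len(relations), dim))  # relation vectors
    idx = [(ent_id[h], rel_id[r], ent_id[t]) for h, r, t in triples]
    for _ in range(epochs):
        for h, r, t in idx:
            t_neg = int(rng.integers(len(entities)))  # corrupt the tail at random
            if t_neg == t:
                continue
            pos = E[h] + R[r] - E[t]
            neg = E[h] + R[r] - E[t_neg]
            # Margin ranking loss on squared L2 distances; update only if violated.
            if margin + pos @ pos - neg @ neg > 0:
                g_pos, g_neg = 2 * pos, 2 * neg
                E[h] -= lr * (g_pos - g_neg)
                R[r] -= lr * (g_pos - g_neg)
                E[t] += lr * g_pos
                E[t_neg] -= lr * g_neg
        # Keep entity norms at most 1 so distances stay comparable.
        E /= np.maximum(1.0, np.linalg.norm(E, axis=1, keepdims=True))
    return E, R


def rank_adverse_events(drug, E, R):
    """Rank AE nodes not yet linked to `drug`; higher score = more plausible link."""
    known = {t for h, r, t in TRIPLES if h == drug and r == "has_adverse_event"}
    candidates = [e for e in entities if e.startswith("ae_") and e not in known]
    query = E[ent_id[drug]] + R[rel_id["has_adverse_event"]]
    scored = [(c, -float(np.linalg.norm(query - E[ent_id[c]]))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


E, R = train_transe(TRIPLES)
ranked = rank_adverse_events("drug_B", E, R)
print(ranked)
```

In this toy setting, `drug_B` has no recorded AE, so every AE node is a candidate missing link. On a graph of the size described above, a dedicated graph learning library and a held-out set of links would be needed in place of this hand-written loop.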
Results: This project is ongoing. By the end of the project, a graph machine learning-based safety signal management model applicable to both drugs in development and registered drugs will have been developed. Preliminary results will be presented at the ICPE meeting. Predictive/classification performance will be evaluated using test datasets from Open Targets, Pfizer internal clinical datasets, and publicly available information, and will be compared with that of existing safety signal detection methods (e.g., disproportionality analysis).
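For context on the comparator, disproportionality analysis is typically computed from a 2x2 table of spontaneous report counts. The sketch below shows two widely used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR) with a 95% confidence interval. The counts are invented for illustration and the function name `disproportionality` is hypothetical; the abstract does not specify which measure will be used.

```python
import math


def disproportionality(a, b, c, d):
    """Compute PRR and ROR from a 2x2 report table.

    a: reports with the drug and the AE of interest
    b: reports with the drug and any other AE
    c: reports with other drugs and the AE of interest
    d: reports with other drugs and any other AE
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR), used for a Wald-type 95% interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return {"prr": prr, "ror": ror, "ror_ci95": (lower, upper)}


# Hypothetical counts, for illustration only.
result = disproportionality(a=20, b=80, c=100, d=9800)
print(result)
```

A graph-based model could be compared against such measures by checking, for a common set of drug-AE pairs, which approach flags known signals earlier or with fewer false positives.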
Conclusions: We anticipate that the graph machine learning-based safety prediction model can improve safety signal prediction, detection, and evaluation relative to legacy methods. Inclusion of target/pathway/phenotype data may generate hypotheses on potential mechanisms, making drug-AE predictions more interpretable and providing cogent biological plausibility arguments for signal refinement within the Bradford Hill causality framework. As a next step, we will refine the prediction model by including additional types of data, such as nonclinical, in vitro, and clinical data, to further enhance safety signal prediction.