Chief Science Officer, Uppsala Monitoring Centre, Uppsala, Sweden
Background: Pharmacovigilance requires identifying adverse event terms that relate to the same clinical condition, and relying solely on the hierarchies in medical terminologies is often not sufficient. Similar challenges exist for the analysis of medicinal products. Rapid advances in artificial intelligence and machine learning have led to language models that can identify semantically related words. In these approaches, words are represented by vectors derived from their context in large text corpora. Adaptations to pharmacovigilance have been proposed that create vector representations of adverse events and drugs based on reporting patterns.
Objectives: To evaluate stability and clinical relatedness of nearest neighbors identified via vector representations for adverse events and drugs derived from global pharmacovigilance reporting patterns.
Methods: vigiVec is an adaptation of the publicly available Word2Vec neural network language model. It creates vector representations of MedDRA preferred terms and WHODrug active ingredients from reporting patterns in VigiBase, the WHO global database of individual case safety reports. Our evaluation focused on nearest neighbors identified by the cosine similarity of the vector representations. Stability was measured as the average overlap in the ten nearest neighbors of each adverse event or drug across repeated fittings of vigiVec. Clinical relatedness was measured through term intruder detection, in which a medical doctor was asked to identify a randomly selected intruder presented together with the four nearest neighbors of a specific adverse event or drug.
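The neighbor search and stability measure described above can be sketched as follows. This is a minimal illustration and not the vigiVec implementation: the function names, the toy random vectors, and the choice to compare exactly two fits pairwise are all assumptions, since the abstract does not state how overlap is aggregated across repeated fits.

```python
import numpy as np

def nearest_neighbors(vectors, k=10):
    """Return, for each row, the indices of its k nearest rows by cosine similarity."""
    # Normalise rows to unit length so the dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude each term from its own neighbor list
    # Sort by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]

def stability(neigh_a, neigh_b):
    """Average fraction of shared neighbors between two fits (rows aligned by term)."""
    k = neigh_a.shape[1]
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(neigh_a, neigh_b)]
    return float(np.mean(overlaps))

# Toy vectors standing in for two repeated fits (hypothetical, not VigiBase data).
rng = np.random.default_rng(0)
fit_1 = rng.normal(size=(50, 8))
fit_2 = fit_1 + rng.normal(scale=0.05, size=fit_1.shape)  # small perturbation

nn_1 = nearest_neighbors(fit_1, k=10)
nn_2 = nearest_neighbors(fit_2, k=10)
print(f"stability: {stability(nn_1, nn_2):.2f}")
```

A stability of 1.0 would mean that every term keeps exactly the same ten neighbors in both fits, and 0.0 that no neighbor is retained.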
Results: Among the ten nearest neighbors, on average 1.8 adverse events belonged to the same MedDRA High Level Term (HLT; e.g. Coagulopathies), and 1.3 drugs belonged to the same ATC level 3 group (ATC3; e.g. Opioids). For neighbors and intruders chosen outside the HLT, the intruder detection rate was 79%; within the HLT, it was 46%. By chance alone, the expected rate is 20%. Corresponding rates for drugs were 64% outside ATC3 and 42% within. The stability of nearest neighbors was 80% for adverse events and 64% for drugs. For illustration, the ten nearest neighbors of Throat tightness all relate to a clinical context of oropharyngeal hypersensitivity, but only two belong to the same HLT (Upper respiratory tract signs and symptoms). However, eight share its Standardised MedDRA Query (SMQ), Oropharyngeal disorders. This highlights vigiVec’s ability to create clinically relevant representations that complement manually curated terminologies.
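Two of the quantities reported above lend themselves to a short worked sketch: the average number of neighbors that share a term's terminology group, and the chance level for intruder detection. The function names and the toy labels below are hypothetical, and reading the 20% chance level as one intruder among five candidates (four neighbors plus the intruder) is an inference from the reported figures.

```python
def mean_same_group(neighbors, groups):
    """Average number of neighbors that share the query term's group label."""
    counts = [
        sum(groups[n] == groups[term] for n in neigh)
        for term, neigh in neighbors.items()
    ]
    return sum(counts) / len(counts)

def chance_detection_rate(n_neighbors, n_intruders=1):
    """Probability of picking an intruder by guessing uniformly among all candidates."""
    return n_intruders / (n_neighbors + n_intruders)

# Hypothetical toy labels: terms A-E assigned to groups g1-g3.
groups = {"A": "g1", "B": "g1", "C": "g2", "D": "g2", "E": "g3"}
# Hypothetical neighbor lists (two neighbors per term, for brevity).
neighbors = {"A": ["B", "C"], "C": ["D", "E"], "E": ["A", "B"]}

print(mean_same_group(neighbors, groups))
print(chance_detection_rate(4))  # four neighbors plus one intruder -> 0.2
```

The same counting applies with HLT labels for adverse events or ATC level 3 labels for drugs, using ten neighbors per term.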
Conclusions: The semantic representations of vigiVec are stable and show a high level of clinical relatedness. Data-driven identification of clinically related adverse events and drugs may complement existing medical hierarchies, supporting domain experts in pharmacovigilance.