PhD Candidate in Social and Pharmacoepidemiology, University of North Carolina Gillings School of Global Public Health, United States
Background: With recent calls to advance transgender (TG) health research methods, computational phenotypes (CPs) have emerged as tools to identify TG individuals within electronic healthcare databases. However, the validity of CP algorithms for identifying TG patients is not well understood, and a review of validated CPs is needed to reduce potential misclassification of TG people.
Objectives: We aim to summarize the current literature on CPs used to identify TG people within electronic healthcare data, to discuss potential gaps in the accuracy and validity of these algorithms, and to provide recommendations for their future use in generating real-world evidence.
Methods: In September 2022, the authors searched the National Library of Medicine’s PubMed, Scopus, and the American Psychological Association’s PsycInfo databases to identify studies conducted in the United States (US) that applied CPs to identify TG people within electronic healthcare data. Search terms, used in multiple combinations, included “transgender,” “electronic health records,” “computational phenotype,” and “electronic medical records.” Our narrative review, managed with Covidence software, focused on original research articles that applied algorithms to electronic healthcare databases to identify TG patients in the US and measured the validity of their CPs.
Results: Eleven studies met the inclusion criteria. These studies validated their CPs or improved their positive predictive value (PPV) through manual chart review (n=5), code hierarchy mechanisms (n=3), key text-strings (n=2), or self-report surveys (n=1). CPs with the highest PPV for identifying TG patients within their study populations contained both diagnosis codes and key text-strings. For example, Roblin and colleagues found that applying key text-strings only, diagnosis codes only, and both diagnosis codes and key text-strings yielded PPVs of 45%, 56%, and 100%, respectively. However, when key text-strings were not available, researchers could still identify TG patients through diagnosis codes alone, although this depended on the electronic healthcare data source used. For example, Wolfe and colleagues found that gender identity disorder diagnosis codes had the highest PPV (83%) among electronic health record (EHR) strategies for identifying TG veterans in Veterans Health Administration (VHA) data.
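The three CP variants compared by Roblin and colleagues (key text-strings only, diagnosis codes only, and both) can be illustrated with a minimal Python sketch, assuming each patient record carries a set of diagnosis codes and a list of free-text notes. The code list (illustrative ICD-9-CM 302.6/302.85 and ICD-10-CM F64.x gender identity disorder codes), the key text-strings, the toy records, and the function names `flag` and `ppv` are all hypothetical; they do not reproduce any reviewed study's algorithm or data. PPV is computed as the share of flagged patients confirmed as TG by a reference standard such as chart review, that is, TP / (TP + FP).

```python
from dataclasses import dataclass, field

# Illustrative gender identity disorder codes (ICD-9-CM and ICD-10-CM).
# ASSUMPTION: not the code set used by any study in this review.
GID_CODES = {"302.6", "302.85", "F64.0", "F64.1", "F64.2", "F64.8", "F64.9"}

# Hypothetical key text-strings searched in free-text clinical notes.
KEY_TEXT_STRINGS = ("transgender", "gender dysphoria", "gender identity")


@dataclass
class Patient:
    patient_id: str
    dx_codes: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


def has_dx(p: Patient) -> bool:
    """True if any recorded diagnosis code is in the illustrative code list."""
    return bool(p.dx_codes & GID_CODES)


def has_text(p: Patient) -> bool:
    """True if any note contains a key text-string (case-insensitive match)."""
    return any(s in note.lower() for note in p.notes for s in KEY_TEXT_STRINGS)


def flag(p: Patient, rule: str) -> bool:
    """Apply one hypothetical CP rule: 'text_only', 'dx_only', or 'dx_and_text'."""
    if rule == "text_only":
        return has_text(p)
    if rule == "dx_only":
        return has_dx(p)
    if rule == "dx_and_text":
        return has_dx(p) and has_text(p)
    raise ValueError(f"unknown rule: {rule}")


def ppv(patients: list[Patient], confirmed: set[str], rule: str) -> float:
    """PPV = true positives / all patients flagged by the CP rule.

    `confirmed` holds patient IDs verified as TG by a reference standard
    (for example, manual chart review). Returns NaN if nobody is flagged.
    """
    flagged = [p.patient_id for p in patients if flag(p, rule)]
    if not flagged:
        return float("nan")
    true_pos = sum(pid in confirmed for pid in flagged)
    return true_pos / len(flagged)


# Toy, made-up records; not data from any reviewed study.
patients = [
    Patient("A", {"F64.0"}, ["Patient identifies as transgender."]),
    Patient("B", {"F64.9"}, ["Follow-up visit."]),
    Patient("C", set(), ["Discussed gender identity with patient."]),
    Patient("D", {"302.85"}, ["History of gender dysphoria."]),
]
confirmed = {"A", "D"}  # IDs a hypothetical chart review confirmed as TG

for rule in ("text_only", "dx_only", "dx_and_text"):
    print(rule, round(ppv(patients, confirmed, rule), 2))
```

In this toy data, each single-source rule flags one false positive (patient C by text, patient B by code), while requiring both sources flags only the confirmed patients. This mirrors the direction of the reported pattern without reproducing its magnitudes. Requiring both criteria typically raises PPV at the cost of missing true TG patients who lack one of the two signals, which a PPV-only evaluation does not capture.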
Conclusions: CPs with the highest PPV for identifying TG patients contained both diagnosis codes and key text-strings. Among CPs relying on diagnosis codes alone, gender identity disorder codes had the highest PPV. These findings highlight the value of EHRs for studying healthcare use among TG people. Future work should explore individual- and provider-level factors associated with barriers to documenting gender identity disorder.