(195) The TAK-medoids, an algorithm to visualize healthcare pathway patterns in a very large cohort of patients: application to 50 000 French patients with COPD
Background: To improve the interpretation and visualization of medical event sequences and healthcare pathways in very large patient cohorts, the TAK-medoids method, derived from Meta-TAK (De Oliveira et al., "Meta-TAK: a scalable double-clustering method for treatment sequence visualization"), is of special interest. For diseases affecting a large number of patients, such as COPD, this method is particularly useful for identifying and comparing patients with similar healthcare pathways.
Objectives: To apply the TAK-medoids algorithm to the medical event sequences of a large cohort of COPD patients in order to visualize their healthcare pathways and to identify and describe groups of patients with similar pathways.
Methods: The TAK-medoids algorithm applies hierarchical agglomerative clustering only to a carefully chosen subsample of patients, referred to as patient-medoids. The method was tested on a cohort of 54 545 patients with COPD identified through the French National Insurance Database (SNDS). Patient-medoids were selected using a K-means algorithm applied to meta-features describing the characteristics of medical events. The challenge was to choose the number K of medoids and the definition of the meta-features so that the Ward distance between each medoid and all the patients it represented was small (i.e., the cluster homogeneity score was high). The TAK algorithm was then applied to the patient-medoids, producing an optimal ordering based on hierarchical clustering. To obtain the final representation of all medical events occurring in the cohort, each patient-medoid, represented as a time vector, was replicated (multiplied) by the number of patients it represented.
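The Methods pipeline can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_medoids` and `tak_medoids` are hypothetical names; K-means is a plain Lloyd iteration initialised on distinct meta-feature vectors; each medoid is assumed to be the patient closest to its cluster centroid; and Ward clustering uses Euclidean distance on integer-coded time vectors, which may differ from the distance TAK actually uses. SciPy's `linkage(..., optimal_ordering=True)` stands in for TAK's optimal ordering, and the search over K by homogeneity score is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def select_medoids(meta, k, n_iter=50, seed=0):
    """Hypothetical patient-medoid selection.

    Lloyd's K-means on the meta-feature matrix `meta` (patients x features),
    then, in each non-empty cluster, keep the patient closest to the centroid.
    Returns the medoid row indices and the number of patients each represents.
    """
    rng = np.random.default_rng(seed)
    distinct = np.unique(meta, axis=0).astype(float)
    k = min(k, len(distinct))
    # Initialise centroids on k distinct meta-feature vectors (an assumption)
    centroids = distinct[rng.choice(len(distinct), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every patient to every centroid
        d2 = ((meta[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([
            meta[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    medoids, weights = [], []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dist = ((meta[members] - centroids[j]) ** 2).sum(axis=1)
        medoids.append(members[dist.argmin()])
        weights.append(members.size)
    return np.array(medoids), np.array(weights)


def tak_medoids(sequences, meta, k, seed=0):
    """Order patient-medoids by Ward clustering, then expand each medoid's
    time vector by the number of patients it represents."""
    medoids, weights = select_medoids(meta, k, seed=seed)
    # Ward HAC on the medoids only (needs at least two medoids); optimal leaf
    # ordering places similar medoids next to each other
    Z = linkage(sequences[medoids].astype(float), method="ward", optimal_ordering=True)
    order = leaves_list(Z)
    image = np.repeat(sequences[medoids][order], weights[order], axis=0)
    return image, medoids[order], weights[order]


# Toy cohort: three pathway types with 10, 15 and 5 patients, 12 time steps,
# event codes 0..3; meta-features here are simply per-event counts
base = np.array([[0] * 12, [1] * 6 + [2] * 6, [3] * 12])
seqs = np.repeat(base, [10, 15, 5], axis=0)
meta = np.stack([(seqs == e).sum(axis=1) for e in range(4)], axis=1)
image, med, w = tak_medoids(seqs, meta, k=3)
print(image.shape)  # (30, 12)
```

Clustering only the medoids keeps the quadratic cost of hierarchical clustering bounded by K rather than by the cohort size, while the final `np.repeat` restores one row per patient for the visualization. In practice one would rerun `select_medoids` over several values of K and keep the one whose within-medoid distances are small enough.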
Results: A total of 50 meta-features were selected, corresponding to the number of observations of each pathway event and the order of occurrence of these events. The optimal number of patient-medoids was 3 000, each representing a median of 6 patients (Q1: 3; Q3: 12). The largest patient-medoid represented 8 314 patients, while 417 medoids represented only themselves. Finally, hierarchical agglomerative clustering of the patient-medoids identified and represented 4 main care pathways; one of them, covering 66% of the cohort, was represented by 458 (15%) patient-medoids.
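The two kinds of meta-feature named in the Results, and the medoid-size summary, can be illustrated with the short sketch below. It is hedged: encoding "order of occurrence" as the rank of each event's first appearance, and the function names `meta_features` and `summarise_medoid_sizes`, are assumptions for illustration only. With E event types this yields 2E features; how the study's 50 meta-features split between counts and order is not detailed here. Quartiles use the default ("exclusive") method of Python's `statistics.quantiles`, which may not match the convention used in the study.

```python
import statistics
from collections import Counter


def meta_features(sequence, events):
    """Hypothetical meta-features for one patient's event sequence.

    For each event type in `events`: its number of observations, then the
    rank of its first occurrence (1 = first distinct event seen, 0 = never seen).
    """
    counts = Counter(sequence)
    rank = {}
    for event in sequence:
        if event not in rank:
            rank[event] = len(rank) + 1  # next rank for a newly seen event
    return [counts[e] for e in events] + [rank.get(e, 0) for e in events]


def summarise_medoid_sizes(weights):
    """Median, quartiles, maximum and number of singleton medoids,
    where weights[i] is the number of patients medoid i represents."""
    q1, median, q3 = statistics.quantiles(weights, n=4)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "max": max(weights),
        "singletons": sum(1 for w in weights if w == 1),
    }


# Example usage on toy data (event labels are illustrative only)
print(meta_features(["A", "A", "B", "C", "B"], ["A", "B", "C", "D"]))
# [2, 2, 1, 0, 1, 2, 3, 0]
print(summarise_medoid_sizes([1, 1, 3, 6, 12, 20]))
```

Counts alone would make two patients with the same events in a different order look identical; adding the first-occurrence rank is one simple way to let K-means separate them.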
Conclusions: By modifying the meta-features originally used in Meta-TAK and simplifying the resulting visualization, we obtained a readable representation of medical events in a cohort of more than 50 000 patients. While Meta-TAK was introduced to address computational complexity, TAK-medoids goes one step further by simplifying the resulting cohort visualization and by performing well on medical event data.