VP, Global Head of Data Science, Odysseus Data Services Inc., New York, United States
Background: Determining the precise date of cancer diagnosis (DtDx) is important for studying disease progression, survival and other treatment outcomes. Patient history, medical imaging and laboratory tests provide strong presumptive evidence. A definitive cancer diagnosis, however, requires histological examination of a tumor tissue sample obtained by biopsy, fine-needle aspiration or surgery, and the pathology laboratory reports the result. Structured longitudinal data from electronic health records (EHR) have great potential for cancer research, but they often lack these synoptic pathology reports, including the DtDx. Records carrying a cancer diagnosis code may start much earlier, at the initial encounter. The correct date therefore must be inferred, and diagnostic procedure dates such as biopsy dates are the most promising candidates. Institutional tumor registries (TR) typically record the correct diagnosis date and could be used to identify the optimal heuristic.
Objectives: To characterize (i) the discrepancy between the DtDx recorded in the tumor registry (TR) and the date of the first encounter with a cancer diagnosis code, and (ii) the utility of the biopsy date in correctly identifying the DtDx.
Methods: Patients with breast, pancreatic or prostate cancer were included from two academic centers whose EHR data are linked to their institutional TR: Northwestern University (NW) and Memorial Sloan Kettering (MSK). The study assessed the distributions of the time intervals from the TR DtDx to the date of the first cancer encounter and to the date of the closest biopsy.
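The interval computation described in the Methods can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the study's code. The `Patient` record and its field names are hypothetical. Offsets are taken as event date minus TR DtDx in days, so a negative value means the event preceded the registry date. The "closest biopsy" is the one with the smallest absolute offset. "Within 30 days" is read as an absolute offset of at most 30. Quartiles use `statistics.quantiles` with `method="inclusive"`. The three toy records are invented for illustration and are not study data.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median, quantiles


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    tr_dtdx: date                  # diagnosis date from the tumor registry
    cancer_encounters: list[date]  # EHR encounters carrying a cancer code
    biopsies: list[date]           # in-house biopsy procedure dates


def first_encounter_offset(p: Patient) -> int:
    """Days from TR DtDx to the first coded encounter (negative = earlier)."""
    return (min(p.cancer_encounters) - p.tr_dtdx).days


def closest_biopsy_offset(p: Patient) -> int:
    """Days from TR DtDx to the biopsy nearest to it (negative = earlier)."""
    nearest = min(p.biopsies, key=lambda d: abs((d - p.tr_dtdx).days))
    return (nearest - p.tr_dtdx).days


def summarize(offsets: list[int], window: int = 30) -> dict:
    """Median, IQR bounds and share of patients within +/- window days."""
    q1, _, q3 = quantiles(offsets, n=4, method="inclusive")
    within = sum(abs(x) <= window for x in offsets) / len(offsets)
    return {"median": median(offsets), "q1": q1, "q3": q3, "within": within}


# Toy cohort (not study data).
cohort = [
    Patient(date(2020, 3, 10), [date(2020, 2, 20), date(2020, 3, 12)], [date(2020, 3, 9)]),
    Patient(date(2021, 6, 1), [date(2021, 6, 1)], [date(2021, 5, 25), date(2021, 7, 1)]),
    Patient(date(2019, 1, 15), [date(2018, 10, 1)], [date(2019, 1, 20)]),
]
first = [first_encounter_offset(p) for p in cohort]
biopsy = [closest_biopsy_offset(p) for p in cohort]
print(summarize(first))
print(summarize(biopsy))
```

In a real cohort, the summary would be computed separately per tumor type and per site, which matches how the results below are reported.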
Results: The study included 14,139 breast cancer, 1,066 pancreatic cancer and 5,976 prostate cancer patients who had data in both the EHR and the TR and at least one biopsy performed at the institution. The date of the first EHR encounter was within 30 days of the TR DtDx in 68-70% of breast, 80% of pancreatic and 90% of prostate cancer patients. Median (IQR) intervals, in days, were 0 (-15, 2) at MSK and 0 (-12, 0) at NW for breast cancer, 0 (-7, 2) and 0 (0, 2) for pancreatic cancer, and 0 (0, 0) and -7 (-15, -7) for prostate cancer. The biopsy date was closer to the DtDx in 31-34% of breast, 43-79% of pancreatic and 13-33% of prostate cancer patients. Using the biopsy date instead of the first EHR encounter would overestimate the TR DtDx in 31-34% of breast, 4-13% of pancreatic and 2-58% of prostate cancer patients.
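The biopsy-versus-encounter comparisons reported in the Results can also be sketched on per-patient offset pairs. The rules below are assumptions, not the authors' definitions. The biopsy counts as closer only when its absolute offset is strictly smaller than that of the first encounter, so ties go to the encounter. "Overestimating" the TR DtDx is read as the biopsy falling after the registry date. The function name `compare_proxies` and the toy pairs are hypothetical.

```python
def compare_proxies(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Share of patients for whom the biopsy date is closer to, or later than, the TR DtDx.

    Hypothetical helper. Each pair is (first_encounter_offset, closest_biopsy_offset),
    both in days relative to the TR DtDx; negative means before the registry date.
    """
    n = len(pairs)
    # Assumption: strictly smaller absolute offset; ties count for the encounter.
    closer = sum(abs(b) < abs(e) for e, b in pairs)
    # Assumption: "overestimate" means the biopsy is dated after the TR DtDx.
    later = sum(b > 0 for _, b in pairs)
    return {"biopsy_closer": closer / n, "biopsy_overestimates": later / n}


# Toy offset pairs (not study data).
pairs = [(-19, -1), (0, -7), (-106, 5)]
result = compare_proxies(pairs)
print(result)
```

Working on precomputed offsets keeps the comparison independent of how encounter and biopsy dates are extracted from a given EHR.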
Conclusions: Diagnostic procedure dates are of limited use for correctly identifying the DtDx. Depending on tumor type, the date of the first EHR encounter can be reliably used as the DtDx.