Background: Measurement error is a threat to the validity of study results. Routinely collected health care data, such as electronic health records, are particularly susceptible to measurement error, especially outcome misclassification. Many measurement error-correction approaches require validation data internal to the study sample, which may be unavailable or costly to collect. Secondary use of existing data as validation is an attractive alternative. However, to use such data, we may need to address systematic differences between the validation data and the study sample.
Objectives: To develop estimators of the marginal risk and the average treatment effect that leverage external validation data to account for outcome misclassification.
Methods: To account for misclassification, validation data are used to estimate misclassification probabilities (i.e., sensitivity and specificity). When validation data are external, these misclassification probabilities need to be transported to the study sample. If misclassification is nondifferential, we can transport marginal misclassification probabilities. If misclassification is differential with respect to treatment and confounders only, we can transport misclassification probabilities conditional on treatment and confounders. However, if misclassification is differential with respect to other variables that differ in distribution between the validation data and the study sample, we need to account for these variables to transport the misclassification probabilities. We introduce two ways to account for these other variables: 1) condition on these variables or 2) weight the validation data (using stabilized odds weights) to match the study sample distribution of these variables. We assessed the performance of the two approaches in simulations under scenarios of nondifferential and differential misclassification. We also implemented these approaches in an example using electronic health record data and validation data from two prospective cohorts.
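As a rough illustration of the weighting approach, the Python sketch below uses hypothetical data frames `validation_df` and `study_df` with hypothetical column names (`Y` for the gold-standard outcome, `Y_star` for the misclassified outcome, and auxiliary covariates `V1`, `V2`). It fits a logistic model for study-sample membership, forms stabilized odds weights for the validation records, estimates weighted sensitivity and specificity, and applies a Rogan-Gladen-type correction, (observed risk + specificity - 1) / (sensitivity + specificity - 1), to the observed risk in the study sample. This is a minimal sketch under those assumptions, not the estimators evaluated in the paper; treatment-specific (confounder-adjusted) risks and the average treatment effect would be corrected analogously.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def stabilized_odds_weights(validation, study, covariates):
    """Weight validation records to match the study sample's distribution of
    the variables (`covariates`) with respect to which misclassification is
    differential."""
    stacked = pd.concat(
        [validation[covariates].assign(S=0), study[covariates].assign(S=1)],
        ignore_index=True,
    )
    model = LogisticRegression(max_iter=1000).fit(stacked[covariates], stacked["S"])
    p_study = model.predict_proba(validation[covariates])[:, 1]
    # Stabilized odds weight: odds of study-sample membership given covariates,
    # scaled by the marginal odds of validation membership.
    marginal_odds = (stacked["S"] == 0).mean() / (stacked["S"] == 1).mean()
    return (p_study / (1 - p_study)) * marginal_odds


def weighted_sens_spec(validation, weights, y_true="Y", y_star="Y_star"):
    """Sensitivity and specificity estimated from the weighted validation data."""
    is_case = (validation[y_true] == 1).to_numpy()
    sens = np.average(validation.loc[is_case, y_star], weights=weights[is_case])
    spec = np.average(1 - validation.loc[~is_case, y_star], weights=weights[~is_case])
    return sens, spec


def corrected_risk(observed_risk, sens, spec):
    """Rogan-Gladen-type correction of a risk computed from the
    misclassified outcome."""
    return (observed_risk + spec - 1) / (sens + spec - 1)


# Hypothetical usage:
# w = stabilized_odds_weights(validation_df, study_df, covariates=["V1", "V2"])
# sens, spec = weighted_sens_spec(validation_df, w)
# marginal_risk = corrected_risk(study_df["Y_star"].mean(), sens, spec)
```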
Results: Both approaches were unbiased when misclassification was nondifferential, differential with respect to treatment and confounders, or differential with respect to other measured variables. In contrast, when misclassification was differential with respect to other measured variables, approaches that ignored the need to transport were biased. Of the two approaches accounting for the other measured variables, conditioning was more precise than weighting.
Conclusions: Quantitative transportability methods can be used to leverage external validation data to address measurement error.