Director, Data Management and Statistics, Target RWE, United States
Background: Performing multiple negative control outcome analyses can be a valuable way to cast a wide net when assessing whether statistical assumptions in causal analyses have been violated, but synthesizing the information those analyses provide can be challenging. This talk proposes a novel Bayesian method for constructing a distribution that characterizes the bias observed across the analyses.
Objectives: This methodology provides a mechanism for making statements about the distribution of negative control outcome risk differences. The observed risk differences are estimates of the bias, since the true risk differences are 0 when the model assumptions are met. With such a distribution, one can identify a threshold that the magnitude of the bias will fall below with some high probability. One application of such a measure is a principled a priori decision rule about whether to proceed with a follow-up study using an outcome of interest: if the bias threshold is below some preselected amount, the follow-up study may proceed; otherwise, it is determined that there is too much bias to continue.
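The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: it assumes that draws from the bias distribution are already available as a list of risk differences, and the function names, the empirical-quantile definition of the threshold, and the default probability of 0.95 are all hypothetical choices made for the example.

```python
import math


def bias_threshold(draws, prob=0.95):
    """Smallest t such that at least a fraction `prob` of draws have |draw| <= t.

    `draws` is assumed to hold samples from the bias distribution
    (hypothetical input; the abstract does not specify a data format).
    """
    if not draws:
        raise ValueError("need at least one draw")
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    magnitudes = sorted(abs(d) for d in draws)
    # Number of draws that must lie at or below the threshold.
    k = math.ceil(prob * len(magnitudes))
    return magnitudes[k - 1]


def proceed_with_follow_up(draws, max_bias, prob=0.95):
    """A priori rule: proceed only if the bias threshold is within `max_bias`."""
    return bias_threshold(draws, prob) <= max_bias
```

Here `max_bias` plays the role of the preselected amount, and it would be fixed before any negative control outcome analysis is run.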
Methods: The negative control outcome bias distribution is modeled using a Bayesian model. At the top level, the true bias values are assumed to follow a hierarchical Normal distribution conditional on a Bernoulli positivity parameter, where the mean of the Normal distribution takes either a positive or a negative value depending on the value of that parameter. The motivation for this distribution is that if a model is biased, the mean of the bias may be either positive or negative depending on the negative control outcome. Indeed, switching the outcome and non-outcome roles for a negative control outcome variable will flip the sign of the true risk difference.
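To make the top level of the model concrete, the following Python sketch simulates latent bias values from one possible reading of it. The abstract does not give the exact parameterization, so several things here are assumptions: the positive and negative means are taken to be symmetric (+mu and -mu), a single standard deviation tau is shared by both components, and the names `sample_true_biases`, `p_positive`, `mu` and `tau` are hypothetical.

```python
import random


def sample_true_biases(n_outcomes, p_positive, mu, tau, rng):
    """Draw latent bias values from a signed two-component Normal mixture.

    Assumed form (not confirmed by the abstract):
        z_j ~ Bernoulli(p_positive)
        b_j | z_j ~ Normal(+mu if z_j == 1 else -mu, tau**2)
    """
    biases = []
    for _ in range(n_outcomes):
        positive = rng.random() < p_positive   # Bernoulli positivity indicator
        mean = mu if positive else -mu         # sign of the mean follows the indicator
        biases.append(rng.gauss(mean, tau))
    return biases


# Example: 15 negative control outcomes with a modest bias magnitude.
example = sample_true_biases(15, p_positive=0.5, mu=0.03, tau=0.01,
                             rng=random.Random(0))
```

Under this reading, relabeling an outcome so that its sign flips simply moves it from one mixture component to the other, which is the symmetry the Methods paragraph appeals to.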
The true bias values are latent variables, and the available data are the estimates from the negative control outcome analyses. The model assumes that the estimated values are normally distributed with mean given by the true bias values and with a covariance that is treated as a hyperparameter. This covariance can be specified using an empirical Bayes approach, in which an estimated covariance matrix is constructed from the results of bootstrap iterations.
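One way the empirical Bayes step could look in practice is sketched below in Python. It assumes the bootstrap results are stored as a list of rows, one row per bootstrap iteration and one column per negative control outcome, and it uses the usual sample covariance with an n - 1 denominator. The layout, the function name `bootstrap_covariance` and the choice of estimator are assumptions for illustration; the abstract does not state how the matrix is built.

```python
def bootstrap_covariance(replicates):
    """Sample covariance matrix of bootstrap risk-difference estimates.

    replicates[b][j] is assumed to be the estimate for negative control
    outcome j obtained in bootstrap iteration b (hypothetical layout).
    """
    n_boot = len(replicates)
    if n_boot < 2:
        raise ValueError("need at least two bootstrap iterations")
    n_out = len(replicates[0])
    if any(len(row) != n_out for row in replicates):
        raise ValueError("every iteration must cover the same outcomes")

    # Column means: the average estimate for each outcome across iterations.
    means = [sum(row[j] for row in replicates) / n_boot for j in range(n_out)]

    # Accumulate outer products of the centered rows.
    cov = [[0.0] * n_out for _ in range(n_out)]
    for row in replicates:
        centered = [row[j] - means[j] for j in range(n_out)]
        for j in range(n_out):
            for k in range(n_out):
                cov[j][k] += centered[j] * centered[k]

    # Unbiased sample covariance (n - 1 denominator).
    return [[value / (n_boot - 1) for value in line] for line in cov]
```

The off-diagonal entries capture dependence between the negative control outcome estimates, which arises because all of the analyses are run on the same resampled study population.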
Results: On simulated data, the posterior predictive distributions constructed for the negative control outcomes under this model fit the true distribution of the data-generating mechanism better than nonparametric kernel density estimates for input sizes of 10 to 20 negative control outcomes.
Conclusions: Characterizing the results from multiple negative control outcome analyses as a distribution provides a valuable way of summarizing the information provided by the analyses.