(042) The Size Paradox: Comprehensive Data and Better Methods Can Yield Significant Results with Smaller Sample Sizes in Real World Data (Size Doesn’t Always Matter)
President Better Health Worldwide, Inc Newfoundland, United States
Background: Patient information in many databases is limited to age, gender, and region, and analyses of such data often rely on unadjusted comparison techniques. Some databases contain comprehensive information that facilitates robust comparisons. Few studies compare the impacts of comprehensive information and regression methods.
Objectives: Understand the sample-size impact of basic vs comprehensive (COMP) employee-patient information and regression vs non-adjusted comparisons.
Methods: Retrospective analysis of WorkPartners' Database for employees (EMP) with Hepatitis-C (HCV) and controls (non-HCV EMP). HCV EMP-patients were randomly sampled along with three times as many controls (all with ≥1 year of continuous eligibility). Unadjusted measures (means, t-tests) were compared with 2-stage (logistic followed by generalized linear models) stepwise regression controlling for basic descriptive regressors (age, gender, region, Charlson Comorbidity Index [CCI] scores) and for COMP descriptive regressors (basic metrics plus self-reported race and job-related data [salary, full-/part-time status, exempt/non-exempt status]). Outcomes included direct (medical, prescription) costs; indirect absence costs (from payroll records) due to sick leave (SL), short-/long-term disability (STD/LTD), and workers' compensation (WC); and lost time (from employer records for SL, STD, LTD, WC). Means, standard errors (SEs), and confidence intervals (CIs) were compared using sensitivity analysis to identify the sample sizes needed for the three methods (unadjusted, basic regression, COMP regression).
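To make the 2-stage structure concrete, the following is a minimal sketch of how a logistic stage and a GLM stage are commonly combined into one expected-cost prediction. The abstract does not state the GLM family or link, so the log link here is an assumption, and the function name, covariate vector, and coefficients are all hypothetical rather than fitted values from the study.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def two_part_expected_cost(x, beta_logit, beta_glm):
    """Expected cost = P(cost > 0) * E[cost | cost > 0].

    x          : covariate values, with a leading 1.0 for the intercept
    beta_logit : stage-1 logistic coefficients (hypothetical)
    beta_glm   : stage-2 GLM coefficients on the log scale (assumed log link)
    """
    # Stage 1: probability that the employee incurs any cost at all.
    p_any = logistic(sum(b * v for b, v in zip(beta_logit, x)))
    # Stage 2: mean cost among employees who incur a cost.
    mean_if_any = math.exp(sum(b * v for b, v in zip(beta_glm, x)))
    return p_any * mean_if_any


# Intercept-only illustration: 50% chance of any cost, mean 1000 when incurred.
example = two_part_expected_cost([1.0], [0.0], [math.log(1000.0)])
```

Separating the two stages keeps the many zero-cost records out of the second model, which is one reason this structure tends to be less sensitive to skewed cost data than a comparison of raw means.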
Results: The two-stage regressions using COMP descriptive components consistently had the smallest SEs, the narrowest CIs, and the highest likelihood of identifying significant between-cohort differences. Significant differences in direct costs were achieved in samples of 50 HCV EMP-patients. STD and WC costs required samples of 200 HCV EMP-patients. WC days required 500 HCV EMP-patients. For HCV EMP-patients, the SE ratios between methods (range [minimum to maximum], average) were: unadjusted / basic (1.42 to 55.40, 6.64); unadjusted / COMP (1.63 to 58.64, 7.84); basic / COMP (0.97 to 2.27, 1.41). For control EMP (with 3X the HCV sample), SE ratios between methods were: unadjusted / basic (1.16 to 8.70, 2.73); unadjusted / COMP (1.27 to 6.45, 2.69); basic / COMP (0.54 to 1.27, 1.08). The stepwise process frequently selected CCI, salary, age, and gender.
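As a rough guide to what these SE ratios mean for sample size, the sketch below converts a ratio into an approximate sample-size multiplier. It relies on the standard approximation that an SE shrinks in proportion to 1/sqrt(n), which the abstract does not state; the function name is hypothetical, and squaring an average ratio is only indicative because the ratios vary widely across outcomes.

```python
def equivalent_sample_factor(se_ratio):
    """Approximate factor by which the less precise method's sample must grow
    to match the more precise method's SE, assuming SE is proportional to
    1/sqrt(n)."""
    return se_ratio ** 2


# Average SE ratios reported for HCV EMP-patients.
avg_se_ratios_hcv = {
    "unadjusted / basic": 6.64,
    "unadjusted / COMP": 7.84,
    "basic / COMP": 1.41,
}

factors = {label: equivalent_sample_factor(r) for label, r in avg_se_ratios_hcv.items()}

for label, factor in factors.items():
    print(f"{label}: roughly {factor:.1f}x the sample for the same SE")
```

Under this approximation, the average basic / COMP ratio of 1.41 corresponds to needing about twice the sample with basic regressors to reach the precision obtained with COMP regressors.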
Conclusions: Regression models using comprehensive (COMP) descriptive information consistently outperformed regression models using basic descriptive information and non-adjusted methods. Two-stage regression better controlled for outliers, reducing SEs by more than 50-fold in the most extreme case. Significant results are achievable with smaller sample sizes using comprehensive descriptive data and two-stage regression techniques.