Quantifying Missingness in Real World Data for Enhanced Data Quality Assessment

Speaker(s)

Marks-Anglin A¹, Kao YH²
¹Merck & Co., Inc., West Point, PA, USA; ²Merck & Co., Inc., Stamford, CT, USA

OBJECTIVES: Missingness in real-world data can lead to biased and less precise estimates in non-interventional studies. The impact of missingness is usually considered at the analytic stage, after the study sample has been formed. However, missingness can also arise at the sample formation stage, when variables used as inclusion/exclusion criteria are incompletely recorded. This research aims to quantify the selection bias associated with missingness at the attrition stage.
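Attrition due to missingness can be made explicit by recording, at each inclusion step, how many patients are excluded because the criterion variable is unobserved rather than because it is observed and fails the criterion. The following is a minimal sketch under assumed conditions: the `Patient` record, the `attrition_step` helper and the six example patients are hypothetical and are not drawn from the Syapse dataset or the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record; None marks an unobserved value.
    pid: int
    stage: Optional[str]        # e.g. "I", "II", "III", "IV"
    brca_status: Optional[str]  # e.g. "positive", "negative"


def attrition_step(
    patients: List[Patient],
    get_value: Callable[[Patient], Optional[str]],
    keep: Callable[[str], bool],
) -> Tuple[List[Patient], List[Patient], List[Patient]]:
    """Split patients at one inclusion step into (kept, excluded as missing, excluded as failed)."""
    kept, missing, failed = [], [], []
    for p in patients:
        value = get_value(p)
        if value is None:
            missing.append(p)  # lost only because the variable is unobserved
        elif keep(value):
            kept.append(p)
        else:
            failed.append(p)   # observed, but does not meet the criterion
    return kept, missing, failed


# Hypothetical example cohort (not real patients).
cohort = [
    Patient(1, "I", "negative"),
    Patient(2, "II", None),
    Patient(3, None, "positive"),
    Patient(4, "IV", "negative"),
    Patient(5, "III", None),
    Patient(6, "II", "positive"),
]

steps = [
    ("Stage I-III", lambda p: p.stage, lambda v: v in {"I", "II", "III"}),
    ("BRCA result available", lambda p: p.brca_status, lambda v: True),
]

remaining = cohort
table = []  # rows: (step, n remaining, n excluded as missing, n excluded as failed)
for name, get_value, keep in steps:
    remaining, missing, failed = attrition_step(remaining, get_value, keep)
    table.append((name, len(remaining), len(missing), len(failed)))

for row in table:
    print(row)
```

Keeping the `missing` and `failed` counts separate at each step shows how much of the attrition reflects absent data rather than true ineligibility, which is the portion an imputation approach could recover.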

METHODS: To study the impact of missingness, we identify a stage I-III breast cancer cohort with an initial diagnosis from 2016 and examine whether BRCA1 and BRCA2 gene testing results are available for patients in Syapse's enriched analytic dataset. We then assess differences in clinical and demographic characteristics between tested and untested patients to better understand how requiring BRCA1/2 test availability and status as cohort selection criteria affects the resulting sample.
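The tested-versus-untested comparison starts from a per-group descriptive summary. Below is a minimal sketch assuming patient-level records that carry a testing flag, age in years and marital status; the tuple layout, the `summarize` helper and the seven records are hypothetical, not Syapse data.

```python
from statistics import mean

# Hypothetical patient-level records: (brca_tested, age_years, married)
records = [
    (True, 45, True),
    (True, 52, False),
    (True, 60, True),
    (False, 70, False),
    (False, 63, True),
    (False, 68, False),
    (False, 59, False),
]


def summarize(rows):
    """Return group size, mean age and percentage married for one group."""
    return {
        "n": len(rows),
        "mean_age": mean(age for _, age, _ in rows),
        "pct_married": 100 * sum(married for _, _, married in rows) / len(rows),
    }


# Split by testing status and summarize each group.
tested = [r for r in records if r[0]]
untested = [r for r in records if not r[0]]
summary = {"tested": summarize(tested), "untested": summarize(untested)}
print(summary)
```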

RESULTS: Among 23,183 patients in the Syapse enriched breast cancer cohort, only 41% had been tested for BRCA variants (71% among patients for whom testing was indicated). Testing was associated with younger age (57 vs. 66 years, p<0.001), being married (59% vs. 50%, p<0.001), having private insurance (56% vs. 51%, p<0.001), higher median household income (p<0.001), negative ER and PR status (p<0.001) and a family history of breast and ovarian cancer (p<0.001).
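As a rough consistency check on the marital-status comparison, a pooled two-proportion z-test can be run with group sizes reconstructed from the reported figures. This is a sketch under stated assumptions: the abstract reports neither the exact group sizes nor which test produced its p-values, so the counts below are approximations derived from the 41% testing rate, and marital status is assumed to be recorded for every patient.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-sided z-test for a difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return z, p_value


# Approximate group sizes (assumption): 41% of 23,183 patients tested, rest untested.
n_total = 23_183
n_tested = round(0.41 * n_total)
n_untested = n_total - n_tested

# Reported percentages married: 59% (tested) vs. 50% (untested).
z, p = two_proportion_z_test(0.59, n_tested, 0.50, n_untested)
print(f"n_tested={n_tested}, n_untested={n_untested}, z={z:.1f}, p={p:.1e}")
```

With groups of this size, even modest differences produce very small p-values, which is one reason standardized differences are often reported alongside p-values when comparing groups in large real-world cohorts.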

CONCLUSIONS: Sample loss due to missing inclusion or exposure variables can reduce statistical power, produce unrepresentative samples and introduce selection bias. Multiple imputation approaches, including random forest, LASSO regression and deep learning, will be used to impute the missing test results. We will then compare attrition, sample characteristics and overall survival between the imputed and non-imputed samples to quantify the bias and power loss due to missingness.
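Whichever model (random forest, LASSO or deep learning) generates the imputed values, analyses of multiply imputed data are commonly combined with Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The abstract does not say that Rubin's rules will be used; the sketch below covers only this pooling step, assumes the m completed datasets have already been analysed, and uses hypothetical log hazard ratios and variances as placeholders, not study results.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances (squared SEs) with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total


# Hypothetical log hazard ratios for overall survival from m = 5 imputed datasets.
log_hrs = [0.10, 0.12, 0.08, 0.11, 0.09]
ses_squared = [0.0004] * 5
estimate, total_var = pool_rubin(log_hrs, ses_squared)
print(estimate, total_var ** 0.5)  # pooled log HR and its standard error
```

Comparing the pooled standard error with the one from a complete-case analysis of the non-imputed sample is one way to express the power loss the conclusions refer to.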

Code

MSR189

Topic

Methodological & Statistical Research, Organizational Practices, Study Approaches

Topic Subcategory

Best Research Practices, Confounding, Selection Bias Correction, Causal Inference, Electronic Medical & Health Records, Missing Data

Disease

Drugs, Oncology