Current Applications of Machine Learning for Causal Inference in Healthcare Research Using Observational Data
Author(s)
Onasanya O, Hoffman S, Harris K, Dixon R, Grabner M
Carelon Research, Wilmington, DE, USA
OBJECTIVES: Machine learning (ML) approaches have the potential to facilitate causal inference using large, multidimensional real-world data (RWD). However, implementing ML-based approaches to answer causal questions can be conceptually and computationally complex. To address this challenge, we used findings from a literature review to visually illustrate the relationships between different types of barriers to causal inference in healthcare research. We further examined the landscape of ML applications that have been developed to address these barriers.
METHODS: We conducted a comprehensive review of published literature to identify RWD studies with ML applications for causal research. We classified the applications into three broad domains (to be presented in an infographic), created a glossary of commonly used terms, and compiled a list of illustrative case studies. Lastly, we identified key assumptions and computational issues that researchers need to be aware of within each domain.
RESULTS: The identified ML applications were classified into three domains based on their potential to strengthen causal inference. The domains are: (1) reduction of exposure/outcome misclassification bias through algorithm development, natural language processing of medical records for outcome validation, and probabilistic bias analysis; (2) determination and reduction of measured confounding from RWD through identification of conditional distributions between variables, propensity score methods, doubly robust effect estimation (e.g., targeted maximum likelihood estimation with ensemble learning), and recurrent neural networks; and (3) reduction of unmeasured confounding using high-dimensional proxy confounder adjustment, data-driven automated negative control estimation, identification of machine-learned instrumental variables, and subset calibration methods.
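To make the idea of doubly robust effect estimation in domain (2) more concrete, the following is a minimal illustrative sketch, not the authors' method (the abstract does not describe an implementation). It uses augmented inverse probability weighting (AIPW), a simpler doubly robust estimator than the targeted maximum likelihood estimation named above. It combines a propensity score model with outcome models so that the average treatment effect stays consistent if either model is correctly specified. The function names (`fit_logistic`, `aipw_ate`), the simulated data, and the use of plain logistic and linear regression as nuisance models are all assumptions for illustration. In ML-based applications these models would typically be replaced by flexible learners (e.g., an ensemble), usually combined with cross-fitting.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def aipw_ate(X, a, y):
    """Hypothetical helper: AIPW estimate of the average treatment effect E[Y(1)] - E[Y(0)]."""
    Xc = np.column_stack([np.ones(len(a)), X])
    # Propensity score model e(X) = P(A = 1 | X); a stand-in for an ML classifier
    ps = 1.0 / (1.0 + np.exp(-Xc @ fit_logistic(Xc, a)))
    ps = np.clip(ps, 0.01, 0.99)  # guard against extreme weights
    # Outcome models fit separately in each treatment arm; stand-ins for ML regressors
    b1, *_ = np.linalg.lstsq(Xc[a == 1], y[a == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Xc[a == 0], y[a == 0], rcond=None)
    m1, m0 = Xc @ b1, Xc @ b0
    # AIPW: outcome-model prediction plus an inverse-probability-weighted residual correction
    psi1 = m1 + a * (y - m1) / ps
    psi0 = m0 + (1 - a) * (y - m0) / (1 - ps)
    return float(np.mean(psi1 - psi0))

# Simulated (not real-world) data with two measured confounders and a true effect of 2.0
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1]))))
y = 2.0 * a + X[:, 0] + X[:, 1] + rng.normal(size=n)
print(aipw_ate(X, a, y))  # should be close to the simulated effect of 2.0
```

Because the residual correction term has mean zero when the outcome models are right, and the weighting recovers unbiasedness when the propensity model is right, misspecifying one of the two models does not bias the estimate in large samples.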
CONCLUSIONS: ML applications can strengthen causal research using real-world healthcare data. However, the range and complexity of these applications limit their wider use. To overcome this limitation, we provide a visual roadmap to relevant ML applications to help researchers quickly identify the appropriate tools for their specific research question.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR36
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Electronic Medical & Health Records
Disease
No Additional Disease & Conditions/Specialized Treatment Areas