Enhancing Healthcare Expenditure Prediction in Diabetes: A Machine Learning Approach
Speaker(s)
Kim HS¹, Fu YH¹, Huang PL¹, Zafari Z²
¹University of Maryland School of Pharmacy, Baltimore, MD, USA; ²The University of Maryland School of Pharmacy and Institute for Health Computing, Baltimore, MD, USA
OBJECTIVES: Diabetes imposes a substantial financial burden on society, with $1 of every $4 in United States (US) healthcare costs allocated to its management. Despite this, few studies have explored whether machine-learning (ML) algorithms can improve predictions of healthcare expenditure. This study aims to develop ML algorithms to predict total healthcare expenditures among diabetics.
METHODS: We identified diabetics from the full-year consolidated data from the 2021 Medical Expenditure Panel Survey, a cross-sectional study on individuals aged 18 years and above. Thirty variables including demographics, socioeconomic status, comorbidities, and diabetes treatments were considered. Total healthcare expenditures were adjusted to 2023 US dollars using the consumer price index and log-transformed. The predictive performances of traditional linear regression, lasso, random forests (RF), tree-based boosting, neural networks (NN), and stacked ensemble methods were compared with 10-fold cross-validation. The data were divided into a 3:7 ratio. Performance metrics included mean squared error (MSE) and the correlation between the models’ predicted outcomes and observed outcomes. Model complexity was assessed through computation time.
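The comparison described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data in place of the survey data, and covers only four of the model families. The variable names, hyperparameters, and simulated outcome are all hypothetical. The 3:7 split is left out because the abstract does not say which portion was held out.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in for the analytic data set (NOT survey data):
# 30 predictors and a log-scale outcome with mostly linear signal.
rng = np.random.default_rng(0)
n_obs, n_features = 600, 30
X = rng.normal(size=(n_obs, n_features))
coef = rng.normal(scale=0.3, size=n_features)
log_cost = 8.0 + X @ coef + rng.normal(scale=0.75, size=n_obs)

# Hypothetical model settings; the abstract does not report hyperparameters.
models = {
    "linear": LinearRegression(),
    "lasso": LassoCV(cv=5, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
}

# 10-fold cross-validation: every observation is predicted once by a model
# that did not see it during fitting.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    start = time.perf_counter()
    pred = cross_val_predict(model, X, log_cost, cv=cv)
    elapsed = time.perf_counter() - start
    results[name] = {
        "mse": float(np.mean((log_cost - pred) ** 2)),
        "corr": float(np.corrcoef(log_cost, pred)[0, 1]),
        "seconds": elapsed,
    }

# Report each model's error, correlation, and time relative to linear regression.
for name, r in results.items():
    ratio = r["seconds"] / results["linear"]["seconds"]
    print(f"{name:14s} MSE={r['mse']:.2f} corr={r['corr']:.2f} time x{ratio:.1f}")
```

Because the simulated outcome is mostly linear in the predictors, the sketch will tend to favor linear regression. The exact figures it prints depend on the simulated data and on the machine, so they should not be compared with the values reported in RESULTS.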
RESULTS: The study included 3,184 individuals. Linear regression yielded an MSE of 0.60 and a correlation of 0.63. Lasso, with a similar computation time, yielded an MSE of 0.58 and a correlation of 0.61. ML algorithms performed similarly to linear regression. Stacked ensemble models did not improve performance but required 10 times the computation time of linear regression. Tree-based boosting, RF, and NN performed best, with MSEs of 0.53 to 0.55 and correlations of 0.65 to 0.68, at the expense of nearly 7 to 20 times the computation time.
CONCLUSIONS: In the context of our study, ML algorithms offered minimal improvements over linear regression at the expense of substantially increased computation time. ML models can effectively predict outcomes but may be more appropriate in scenarios where capturing complex non-linear relationships between variables is paramount.
Code
MSR221
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Diabetes/Endocrine/Metabolic Disorders (including obesity), No Additional Disease & Conditions/Specialized Treatment Areas