IDENTIFYING PREDICTORS OF HIGH-COST MULTIPLE SCLEROSIS PATIENTS: A MACHINE LEARNING APPROACH
Author(s)
Burns SM1, Icten Z1, Menzin J2
1Boston Health Economics, Boston, MA, USA, 2Boston Health Economics, LLC, Boston, MA, USA
OBJECTIVES: Multiple sclerosis (MS) is the leading cause of non-traumatic neurological disability in young adults. High healthcare resource utilization in MS patients is linked to relapses and disease progression. The goal of this study was to identify predictors of future high-cost MS patients using machine learning (ML) methods.
METHODS: Newly diagnosed MS patients were identified using ICD-9/10 codes in Medicare Part A/B claims (5% sample) between 2011 Q1 and 2015 Q1. Total Medicare expenditures during the 12 months after the index diagnosis were summed, and patients in the highest decile of average monthly spending were defined as “high cost”. Data were partitioned using a 60%/25%/15% split to train the models, validate them, and test performance on unseen data. A traditional logistic regression model was estimated as a reference model. Random forest (RF), XGBoost, and support vector machine (SVM) models were developed, and the best model was selected by area under the ROC curve (AUC). Accuracy, recall, precision, and F1 scores were also reported. Features that were consistently important for prediction across all models, based on model-specific importance metrics, were noted.
RESULTS: The study population included 5,863 MS patients (mean age=63.1 years; females=67%). All ML methods (AUC: RF 78.1%, SVM 77.4%, XGBoost 74.6%) outperformed the logistic regression model (AUC: 64.4%). The best-performing ML model was RF, with accuracy: 70.4%; recall: 72.7%; precision: 21.3%; and F1: 0.33. The top 15 features identified as positively associated with high-cost patients across all ML methods were being male, selected medication use (epoetin alfa, heparins, insulin), limitations of mobility, musculoskeletal deformities, indicators of past inpatient service utilization, stroke, and comorbidities including chronic kidney disease, hypertension, and pulmonary diseases.
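As a quick consistency check on the reported RF metrics, the sketch below recomputes F1 from the stated precision and recall, and derives the accuracy those two values would imply. The 10% prevalence is an assumption taken from the top-decile definition of “high cost” (the abstract does not report the test-set prevalence), and the function names are illustrative only.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_accuracy(precision, recall, prevalence):
    """Accuracy implied by precision, recall, and class prevalence.

    All quantities are expressed as fractions of the whole sample.
    """
    tp = recall * prevalence                  # true positives
    fp = tp * (1 - precision) / precision     # false positives
    tn = (1 - prevalence) - fp                # true negatives
    return tp + tn


# Reported RF values from the abstract
precision, recall = 0.213, 0.727
# ASSUMPTION: prevalence of ~10%, implied by the top-decile definition
prevalence = 0.10

print(round(f1_from(precision, recall), 2))                       # 0.33
print(round(implied_accuracy(precision, recall, prevalence), 3))  # 0.704
```

Under that assumed prevalence, both derived values agree with the reported F1 of 0.33 and accuracy of 70.4%. A precision of 21.3% against a base rate near 10% corresponds to roughly a twofold enrichment of high-cost patients among those flagged.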
CONCLUSIONS: In predicting high-cost MS patients, ML methods provided a substantial improvement in AUC over the logistic regression reference model (74.6% to 78.1% versus 64.4%). These findings can guide further research on proactively managing care and reducing healthcare spending.
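The labeling, partitioning, and AUC-based selection steps described in METHODS can be sketched as follows, using only the Python standard library. This is a minimal illustration, not the authors' implementation: the function names, the random seed, and the candidate score vectors are hypothetical, and no RF, XGBoost, or SVM model is actually fitted here.

```python
import random


def label_high_cost(monthly_costs, top_fraction=0.10):
    """Flag patients whose average monthly spending falls in the top decile."""
    n_high = max(1, round(len(monthly_costs) * top_fraction))
    cutoff = sorted(monthly_costs, reverse=True)[n_high - 1]
    # Ties at the cutoff are all flagged, so the share can slightly exceed 10%
    return [1 if cost >= cutoff else 0 for cost in monthly_costs]


def split_indices(n, fractions=(0.60, 0.25, 0.15), seed=0):
    """Shuffle row indices and cut them into train/validation/test partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # hypothetical seed, for reproducibility
    n_train = int(n * fractions[0])
    n_valid = int(n * fractions[1])
    # The test partition receives whatever remains after the first two cuts
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    acceptable for a sketch but slow for large samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def select_best_model(valid_labels, candidate_scores):
    """Return the name of the candidate with the highest validation AUC."""
    return max(candidate_scores,
               key=lambda name: roc_auc(valid_labels, candidate_scores[name]))


# Toy usage with made-up validation scores for two hypothetical models
valid_labels = [0, 0, 1, 1]
candidates = {
    "model_a": [0.10, 0.40, 0.35, 0.80],
    "model_b": [0.10, 0.20, 0.70, 0.80],
}
for name, scores in candidates.items():
    print(name, roc_auc(valid_labels, scores))
print("selected:", select_best_model(valid_labels, candidates))
```

In a real analysis, the score vectors would be predicted probabilities from models fitted on the training partition, selection would use the validation partition, and the test partition would be scored once, after the model is chosen.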
Conference/Value in Health Info
2020-05, ISPOR 2020, Orlando, FL, USA
Value in Health, Volume 23, Issue 5, S1 (May 2020)
Code
PND117
Topic
Economic Evaluation, Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Health & Insurance Records Systems
Disease
Neurological Disorders