CAN WE IMPROVE THE ESTIMATION OF CENSORED COST DATA USING RANDOM FORESTS?

Author(s)

Rueda JD1, Valencia C2, Mullins CD1, Onukwugha E1, Zhan M3, Slejko JF1
1University of Maryland School of Pharmacy, Baltimore, MD, USA, 2Andes University, Bogotá, CUN, Colombia, 3University of Maryland School of Medicine, Baltimore, MD, USA

OBJECTIVES Right censoring is a common problem in the analysis of cost data. Available techniques can produce biased estimates in the presence of informative censoring. Our objective was to compare two popular techniques, the Kaplan-Meier sample average estimator (KMSA) and the inverse probability weighting estimator (IPTW), against a random forest estimator. Random forest is an easily implemented algorithm that has been shown to have high predictive accuracy.

METHODS We identified a cohort of individuals diagnosed with multiple myeloma (MM) from 2007 to 2013 using the Surveillance, Epidemiology, and End Results (SEER)-Medicare database, which consists of cancer registry data linked to Medicare claims. The 5-year total medical costs were estimated as the sum of inpatient and outpatient claims. All individuals in the cohort were uncensored, as they died within the 5-year follow-up. Using a gamma distribution (shape=1, rate=0.7, equivalently scale=1.43) discretized into 6-month intervals, we artificially censored 30% of the cohort, a typical value for the censored proportion. We compared the estimates from the KMSA, IPTW, and random forest against the mean 5-year cost of the uncensored cohort. We used the out-of-bag error to fit and validate the random forest.
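The artificial-censoring step and an inverse probability weighted mean can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the gamma draw is expressed in years, rounds censoring times up to the next 6-month boundary, and uses the simple weighted form in which each complete case is weighted by the inverse of a Kaplan-Meier estimate of the censoring distribution. The function names and the `interval_days` parameter are hypothetical, and the sketch does not calibrate the censored proportion to 30%.

```python
import math
import random


def simulate_censoring(survival_days, rng, shape=1.0, rate=0.7, interval_days=182.5):
    """Apply artificial censoring to fully observed survival times.

    Assumption: censoring times are gamma-distributed in years and are
    rounded up to the next 6-month boundary.
    """
    follow_up, observed = [], []
    for t in survival_days:
        c_years = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale
        n_intervals = max(1, math.ceil(c_years * 365.0 / interval_days))
        c_days = n_intervals * interval_days
        follow_up.append(min(t, c_days))
        observed.append(t <= c_days)  # True when death is seen before censoring
    return follow_up, observed


def censoring_survival_before(t, follow_up, observed):
    """Kaplan-Meier estimate of P(C >= t), treating censoring as the event."""
    cens_times = sorted({u for u, d in zip(follow_up, observed) if not d and u < t})
    k = 1.0
    for u in cens_times:
        at_risk = sum(1 for x in follow_up if x >= u)
        n_cens = sum(1 for x, d in zip(follow_up, observed) if x == u and not d)
        k *= 1.0 - n_cens / at_risk
    return k


def ipw_mean_cost(costs, follow_up, observed):
    """Mean cost using complete cases weighted by 1 / P(C >= T_i)."""
    total = 0.0
    for cost, t, d in zip(costs, follow_up, observed):
        if d:
            total += cost / censoring_survival_before(t, follow_up, observed)
    return total / len(costs)


if __name__ == "__main__":
    rng = random.Random(2019)
    # Synthetic inputs for illustration only; these are not the study data.
    survival = [float(rng.randint(1, 1825)) for _ in range(300)]
    costs = [rng.gammavariate(2.0, 50_000.0) for _ in range(300)]
    fu, obs = simulate_censoring(survival, rng)
    print("censored share:", 1 - sum(obs) / len(obs))
    print("uncensored mean:", sum(costs) / len(costs))
    print("IPW mean:", ipw_mean_cost(costs, fu, obs))
```

When no observation is censored, every weight equals 1 and the weighted estimate reduces to the ordinary sample mean, which is a convenient sanity check before applying the estimator to censored data.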

RESULTS The mean total 5-year cost of the uncensored cohort of 7398 MM patients was $132,977. The median survival time of this cohort was 345 days. After the 30% censoring was applied, the median survival time was 710 days. The estimates for the 5-year costs were $190,246, $135,877, and $114,250 for KMSA, IPTW, and random forest, respectively. The time needed to run the random forest was 267 minutes.
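One way to read these estimates is as relative deviations from the uncensored mean. The short sketch below does that arithmetic using only the figures reported above; the helper name `relative_bias` is hypothetical.

```python
def relative_bias(estimate, reference):
    """Signed deviation of an estimate from the reference, as a fraction."""
    return (estimate - reference) / reference


# Values taken from the reported results (US dollars).
reference = 132_977
estimates = {"KMSA": 190_246, "IPTW": 135_877, "random forest": 114_250}

for name, est in estimates.items():
    print(f"{name}: {100 * relative_bias(est, reference):+.1f}%")
```

A positive value means the estimator overstated the uncensored mean and a negative value means it understated it. On the reported figures, KMSA overstates, the random forest understates, and IPTW lies closest to the reference.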

CONCLUSIONS Censored costs need to be correctly managed in order to avoid bias. Random forests show promise as an alternative with minimal assumptions and effortless application. The biggest shortcoming of this new method is the amount of time required for its estimation.

Conference/Value in Health Info

2019-05, ISPOR 2019, New Orleans, LA, USA

Value in Health, Volume 22, Issue S1 (2019 May)

Code

PNS206

Topic

Economic Evaluation, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Cost/Cost of Illness/Resource Use Studies, Missing Data, Modeling and simulation

Disease

Oncology
