A Comparison of Symbolic Regression Machine Learning Methods for Mapping EQ-5D Utilities

Author(s)

Crott R
Regulatory Scientific and Health Solutions, Birmingham, West Midlands, UK

OBJECTIVES: Many Symbolic Regression (SR) algorithms have been published, using a variety of methods. This research aims to compare different SR approaches for mapping utilities in cancer. SR is an established Machine Learning technique for identifying the mathematical expressions that best describe the relationships within a given data set.
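
To illustrate the idea of searching over expression structure, the following is a minimal toy sketch and not the method used in this study. It assumes a small hand-picked set of candidate forms y = a + b*f(x) (the `CANDIDATES` dictionary and the function names are hypothetical), fits the constants by least squares, and keeps the candidate with the lowest RMSE. Dedicated SR software searches a far larger expression space, typically with evolutionary methods rather than exhaustive enumeration. The data are synthetic and noise-free.

```python
import numpy as np

# Hypothetical candidate building blocks f(x); each yields the expression a + b*f(x).
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "sqrt(x)": np.sqrt,
    "log(1 + x)": np.log1p,
    "exp(-x)": lambda x: np.exp(-x),
}

def fit_candidate(f, x, y):
    """Fit a and b in y = a + b*f(x) by least squares; return (coefficients, RMSE)."""
    A = np.column_stack([np.ones_like(x), f(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rmse = float(np.sqrt(np.mean((y - A @ coef) ** 2)))
    return coef, rmse

def toy_symbolic_regression(x, y):
    """Try every candidate form and return (rmse, name, coefficients) of the best one."""
    results = []
    for name, f in CANDIDATES.items():
        coef, rmse = fit_candidate(f, x, y)
        results.append((rmse, name, coef))
    return min(results, key=lambda r: r[0])

# Synthetic, noise-free data generated from a known expression (illustration only).
x = np.linspace(0.1, 10.0, 100)
y = 1.0 + 2.0 * np.sqrt(x)

best_rmse, best_name, best_coef = toy_symbolic_regression(x, y)
print(f"y = {best_coef[0]:.3f} + {best_coef[1]:.3f} * {best_name}  (RMSE = {best_rmse:.2e})")
```

Because the search space here is only five expressions, the result is fully determined by the candidate list; real SR tools also trade fit against a complexity score.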

METHODS: We retrieved the individual patient data from three data sets previously used for mapping the EORTC QLQ-C30 to the EQ-5D-3L in Non-Small Cell Lung Cancer patients (Jang 2010, Crott 2018, Khan & Morris 2014). The SR analyses were performed using TuringBot, Mathematica and the GPlearn and PySR modules in Python. The best-fitting equation was identified by minimizing the root mean squared error (RMSE) as the loss function, with no limit on the equation complexity score. Goodness-of-fit (GOF) was further assessed by the mean absolute error (MAE) and R², and compared with that of an ordinary least squares (OLS) regression including all QLQ-C30 function scores and items.
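
The goodness-of-fit measures named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names (`rmse`, `mae`, `r_squared`, `fit_ols`, `predict_ols`) are hypothetical, the OLS baseline is fitted with NumPy's least-squares solver, and the data are synthetic stand-ins for three predictor scores on a 0 to 100 scale, not patient data.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Predictions from OLS coefficients (intercept first)."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Synthetic data for illustration only (assumed coefficients and noise level).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 0.2 + 0.004 * X[:, 0] + 0.003 * X[:, 1] - 0.002 * X[:, 2] + rng.normal(0, 0.05, 200)

beta = fit_ols(X, y)
y_ols = predict_ols(beta, X)
gof = {"RMSE": rmse(y, y_ols), "MAE": mae(y, y_ols), "R2": r_squared(y, y_ols)}
print(gof)
```

The same three functions can be applied to the predictions of any candidate SR equation, so that each equation and the OLS baseline are compared on identical metrics.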

RESULTS: The best-fitting equations differed widely in all three data sets, depending on the algorithm and the software used. Increased complexity of the model did not always result in a better fit.

CONCLUSIONS: Symbolic Regression methods can improve the predictive accuracy of mapping algorithms compared with more traditional OLS. However, this depends very much on the software and input parameters. We found that the genetic algorithm in Mathematica provided the best fit; however, the equations can become extremely complex, resulting in poor interpretability. It remains to be seen whether more recent alternative SR methods, such as Bayesian, neural-network, or transformer-based approaches, could improve our results. Further research in other disease areas and with other generic quality-of-life measures is warranted.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR223

Topic

Economic Evaluation, Methodological & Statistical Research, Patient-Centered Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Cost-comparison, Effectiveness, Utility, Benefit Analysis, Health State Utilities, PRO & Related Methods

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Oncology
