A Comparison of Symbolic Regression Machine Learning Methods for Mapping EQ-5D Utilities
Speaker(s)
Crott R
Regulatory Scientific and Health Solutions, Birmingham, West Midlands, UK
OBJECTIVES: Symbolic Regression (SR) is an established Machine Learning technique for identifying the optimal mathematical expressions that can describe relationships within a given data structure. Many SR algorithms have been published using a variety of methods. This research compares different SR approaches for mapping utilities in cancer.
METHODS: We retrieved the individual patient data from three data sets previously used for mapping the EORTC QLQ-C30 to the EQ-5D-3L in Non-Small Cell Lung Cancer patients (Jang 2010, Crott 2018, Khan & Morris 2014). The SR analyses were performed using TuringBot, Mathematica, and the GPlearn and PySR modules in Python. The best-fitting equation was identified by minimizing the RMSE as the loss function, with no limit on the equation complexity score. Goodness-of-fit (GOF) was further assessed by MAE and R² and compared with that of an ordinary least squares (OLS) regression including all QLQ-C30 function scores and items.
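The selection rule described in METHODS (choose the candidate equation with the lowest RMSE, then report MAE and R² alongside it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy utilities, the single predictor named `physical_function`, and the two candidate equations are hypothetical and are not taken from the study data or from the output of any of the SR packages named above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(candidates, x, y):
    """Return RMSE, MAE and R² for each candidate equation."""
    results = {}
    for name, equation in candidates.items():
        predicted = [equation(value) for value in x]
        results[name] = {
            "rmse": rmse(y, predicted),
            "mae": mae(y, predicted),
            "r2": r_squared(y, predicted),
        }
    return results


# Toy values for illustration only (NOT study data).
observed_utility = [0.85, 0.70, 0.52, 0.30, 0.91]
physical_function = [90.0, 70.0, 50.0, 30.0, 100.0]  # hypothetical predictor

# Hypothetical candidate equations standing in for OLS and SR output.
candidates = {
    "linear": lambda pf: 0.05 + 0.0085 * pf,
    "square_root": lambda pf: 0.09 * math.sqrt(pf),
}

results = evaluate(candidates, physical_function, observed_utility)

# Pick the equation with the smallest RMSE, mirroring the loss used above.
best = min(results, key=lambda name: results[name]["rmse"])

for name, metrics in results.items():
    print(name, {k: round(v, 4) for k, v in metrics.items()})
print("lowest RMSE:", best)
```

In a real comparison, the candidate equations would come from the fitted OLS model and from each SR package, and the metrics would be computed on the same patient-level data for every candidate.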
RESULTS: Widely different best-fitting equations were obtained in all three datasets depending on the algorithm used and the software applied. Increased complexity of the model did not always result in a better fit.
CONCLUSIONS: Symbolic Regression methods can improve the predictive accuracy of mapping algorithms compared with more traditional OLS. However, this depends strongly on the software and input parameters. We found that the genetic algorithm in Mathematica provided the best fit; however, the equations can become extremely complex, resulting in poor interpretability. It remains to be seen whether more recent alternative SR methods, such as Bayesian, neural network, or transformer-based ones, could improve our results. Further research in other disease areas and with other generic quality of life measures is warranted.
Code
MSR223
Topic
Economic Evaluation, Methodological & Statistical Research, Patient-Centered Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Cost-comparison, Effectiveness, Utility, Benefit Analysis, Health State Utilities, PRO & Related Methods
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Oncology