Assessing the Generalizability of Automating Adaptation of Excel-Based Cost-Effectiveness Models Using Generative AI
Author(s)
Rawlinson W¹, Teitsson S², Reason T³, Malcolm B⁴, Gimblett A¹, Klijn S⁵
¹Estima Scientific Ltd, London, UK; ²Bristol Myers Squibb, Uxbridge, LON, UK; ³Estima Scientific Ltd, South Ruislip, LON, UK; ⁴Bristol Myers Squibb, Middlesex, LON, UK; ⁵Bristol Myers Squibb, Utrecht, ZH, Netherlands
OBJECTIVES: A previous study (ISPOR 2024, P48) described a method, ‘LLMAdapt’, that uses a large language model (LLM) to automatically adapt an Excel-based cost-effectiveness model (CEM) from the setting of one country to that of another. The authors reported high accuracy (97%) for a single test case. Assessing generalizability is an important step toward the uptake and acceptance of AI-based methods by decision-makers. This study aimed to assess the generalizability of LLMAdapt across two distinct disease areas and countries.
METHODS: LLMAdapt (powered by Generative Pre-trained Transformer 4 [the gpt-4-1106-preview model]) was used to automatically adapt two HTA-ready Excel CEMs from the setting of one country to that of another. To support the adaptations, GPT-4 was given tabular data for each target country in a format that mimicked the output of a targeted literature review. Before the study, each CEM received minor updates to improve its interpretability, such as clarifying vague descriptive text. The two models covered muscle-invasive urothelial carcinoma (MIUC) and myelodysplastic syndrome (MDS) and were adapted to the Czech Republic and the United States. A human health economist manually checked all automated adaptations to assess their accuracy.
RESULTS: Both adaptations ran without human intervention, taking 132 and 207 seconds. LLMAdapt correctly performed 101 of 102 and 198 of 199 required updates, giving accuracy scores of 99.0% and 99.4%. The two errors, one per model, were both missed changes to required parameter values.
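As a quick arithmetic check, the accuracy scores are the share of required updates performed correctly. The sketch below recomputes them from the reported counts and adds a pooled figure across both models, which the abstract does not report. The labels "adaptation 1" and "adaptation 2" are hypothetical, because the abstract does not state which count belongs to which model or country. At two decimals, 198/199 comes to 99.50%, so the reported 99.4% appears to reflect truncation of 99.497% rather than rounding.

```python
# Counts reported in the abstract: (successful updates, required updates).
# Labels are hypothetical: the abstract does not map counts to models.
adaptations = {
    "adaptation 1": (101, 102),
    "adaptation 2": (198, 199),
}


def accuracy_pct(successful: int, required: int) -> float:
    """Share of required updates performed correctly, as a percentage."""
    if required <= 0:
        raise ValueError("required must be positive")
    return 100 * successful / required


# Per-adaptation accuracy and number of missed updates.
for name, (ok, req) in adaptations.items():
    print(f"{name}: {ok}/{req} = {accuracy_pct(ok, req):.2f}% (errors: {req - ok})")

# Pooled accuracy across both models (derived here, not reported in the abstract).
total_ok = sum(ok for ok, _ in adaptations.values())
total_req = sum(req for _, req in adaptations.values())
print(f"pooled: {total_ok}/{total_req} = {accuracy_pct(total_ok, total_req):.2f}%")
```

Pooling gives 299 of 301 required updates, about 99.3%, which is consistent with the per-model figures.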
CONCLUSIONS: We found that the accuracy of LLMAdapt was maintained across two distinct disease areas and countries, demonstrating the generalizability of LLM-based methods to automate the adaptation of Excel-based CEMs. This is an important step towards uptake of these methods.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
HTA156
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Oncology