Can Large Language Models Generate Conceptual Health Economic Models?
Author(s)
Chhatwal J1, Yildirim I2, Balta D2, Ermis T2, Tenkin S2, Samur S3, Ayer T4
1Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA, 2Value Analytics Labs, Boston, MA, USA, 3Value Analytics Labs, Chantilly, VA, USA, 4Georgia Institute of Technology, Atlanta, GA, USA
OBJECTIVES: The widespread adoption of Large Language Models (LLMs) could disrupt health economic modeling. Our objective was to evaluate the feasibility and accuracy of using two publicly available LLM-based tools, Bing Chat and ChatGPT-4, to develop a conceptual model for a health economic analysis of a chronic disease with multiple health states.
METHODS: We designed prompts to aid in developing a cost-effectiveness analysis model for hepatitis C treatment, using the zero-shot prompting method (i.e., instructions without worked examples). Because Bing Chat has internet access, it uses Retrieval-Augmented Generation (RAG) to ground its answers in retrieved sources. The prompts were tested in five separate experiments using Bing Chat and ChatGPT-4 to create a conceptual model. The resulting model structures, including health states and transitions, were evaluated against published sources for data accuracy and against expert opinion for relevance and face validity.
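For illustration, the sketch below shows how a zero-shot prompt of the kind described above might be sent to a chat-completion endpoint to request a conceptual Markov structure for a hepatitis C cost-effectiveness model. This is a minimal sketch assuming the OpenAI Python client (v1+) and the "gpt-4" model name; the prompt wording is hypothetical and is not one of the study's actual prompts.

```python
# Minimal zero-shot prompting sketch (hypothetical prompt; assumes the OpenAI Python client v1+).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Zero-shot: a single instruction with no worked examples or demonstrations.
prompt = (
    "Act as a health economist. Propose a conceptual Markov model for a "
    "cost-effectiveness analysis of hepatitis C treatment. List the health "
    "states, the allowed transitions between them, and the annual transition "
    "probabilities with citations to published sources."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# The returned text is a candidate conceptual model that still requires
# expert review for accuracy and face validity.
print(response.choices[0].message.content)
```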
RESULTS: Both LLMs provided a relevant summary of the background of hepatitis C disease and its treatment. The quality of model conceptualization was highly dependent on the prompts. Bing Chat, benefiting from internet access, generally provided more relevant responses and consistently supplied model parameters; conversely, ChatGPT-4 sometimes produced hallucinated parameters or failed to generate parameters entirely. Both LLMs showed variability in output quality: the number of health states (range: 5-11) and the transitions between states varied across experiments. Bing Chat generally demonstrated high-quality model conceptualization and parameter generation, whereas ChatGPT-4 occasionally generated clinically implausible transitions (e.g., hepatitis C cure after liver cancer).
CONCLUSIONS: LLMs can be invaluable tools for developing conceptual health economic models. However, for users with limited domain knowledge or little experience in writing effective prompts, these tools have the potential to yield misleading information. Our study highlights the importance of expert guidance in using LLMs for HEOR model development to ensure the accuracy and reliability of the outcomes.
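The clinically implausible transitions noted in the results (e.g., a transition from liver cancer to a cured state) could be screened for programmatically before expert review. The sketch below is a hypothetical face-validity check: the state names and the disallowed-transition list are illustrative assumptions, not structures actually produced by the study's LLMs.

```python
# Hypothetical face-validity screen for an LLM-generated model structure.
# State names and the disallowed-transition list are illustrative assumptions.

# Transitions proposed by an LLM, as (from_state, to_state) pairs.
proposed_transitions = [
    ("Chronic HCV", "Compensated cirrhosis"),
    ("Compensated cirrhosis", "Decompensated cirrhosis"),
    ("Decompensated cirrhosis", "Hepatocellular carcinoma"),
    ("Hepatocellular carcinoma", "SVR (cured)"),  # clinically implausible
]

# Transitions that domain experts would flag as clinically implausible.
disallowed = {
    ("Hepatocellular carcinoma", "SVR (cured)"),
    ("Liver-related death", "Chronic HCV"),
}

# Flag any proposed transition that appears in the disallowed set.
flagged = [t for t in proposed_transitions if t in disallowed]
for src, dst in flagged:
    print(f"Implausible transition flagged for expert review: {src} -> {dst}")
```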
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
EE355
Topic
Economic Evaluation, Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Cost-comparison, Effectiveness, Utility, Benefit Analysis, Decision Modeling & Simulation
Disease
No Additional Disease & Conditions/Specialized Treatment Areas