Large Language Models for Outcomes Research: A Targeted Review
Author(s)
Dolin O, Lim V, Hepworth T, Gonçalves-Bradley D, Langford B
Symmetron Limited, London, UK
OBJECTIVES: Interest in using large language models (LLMs) for outcomes research has increased in recent years; however, the feasibility of integrating LLMs within research workflows remains unclear. A targeted review was conducted to identify case studies and guidance on LLM usage in outcomes research (including qualitative/quantitative evidence synthesis and real-world data analysis).
METHODS: Embase, congress abstracts (ISPOR, HTAi, Cochrane Colloquium), Health Technology Assessment (HTA) guidance and ISPOR good practice guidelines were reviewed. Quantitative and qualitative studies (evaluation studies, observational studies, discussion papers and preprints) published after November 2022 were included.
RESULTS: Sixty-nine studies were included. Case studies (N=64) primarily examined LLM use for data extraction from electronic health records (EHRs) and clinical trials (31/64), title and abstract screening (14/64), and risk of bias (RoB) assessment (6/64). Most studies assessed LLMs’ ability to replicate pre-existing findings. Only six studies performed new research using LLMs (with LLM validation, where performed, as a secondary focus); none mentioned informing HTA submissions. Studies identified barriers to practical use (32/69), including inaccuracies in data extraction, particularly with complex data, and challenges in interpreting subjective questions (e.g., RoB assessment). Most studies emphasized LLMs’ potential; few recommended immediate implementation, due to existing limitations. No HTA guidance for LLM usage was identified. An ISPOR good practice guideline noted limited regulation for using LLMs to extract EHR data, indicating that training and updates to their checklist would likely be necessary.
CONCLUSIONS: Research efforts centered on validating LLMs by replicating existing findings; <10% of case studies produced new outcomes research. Concerns about LLM performance and reliability remain a roadblock to implementation. Future research examining approaches to manually validate automated actions, for example by auditing a random sample of outputs, could help mitigate variation in LLM performance and deliver efficiency gains while maintaining research quality. Usage guidance and standardized validation approaches are currently lacking; both would facilitate LLM use for research.
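The sampling-based manual validation mentioned in the conclusions could be sketched as follows. This is a minimal illustration, not a method used by any of the included studies: the study IDs, extracted values, and 5% error rate are invented for demonstration, and the human reviewer is simulated by a lookup function.

```python
import random

# Hypothetical data: LLM-extracted values keyed by study ID.
llm_extractions = {f"study_{i}": f"value_{i}" for i in range(200)}
# Inject hypothetical extraction errors (10 of 200 records, i.e., 5%).
for i in range(0, 200, 20):
    llm_extractions[f"study_{i}"] = "wrong_value"

def audit_sample(extractions, reference_lookup, sample_size, seed=0):
    """Validate a random sample of LLM extractions against manual review.

    `reference_lookup` maps a study ID to the human-verified value.
    Returns the observed agreement rate within the audited sample.
    """
    rng = random.Random(seed)
    sampled_ids = rng.sample(sorted(extractions), sample_size)
    agreements = sum(
        extractions[sid] == reference_lookup(sid) for sid in sampled_ids
    )
    return agreements / sample_size

# Stand-in for a human reviewer re-extracting the sampled records.
def human_review(study_id):
    return f"value_{study_id.split('_')[1]}"

rate = audit_sample(llm_extractions, human_review, sample_size=40)
print(f"Observed agreement in audit sample: {rate:.2%}")
```

In practice, the observed agreement rate from the audited sample would inform whether the full set of automated extractions can be accepted, or whether broader manual review is needed; the sample size would be chosen to give acceptable precision around the estimated error rate.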
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR76
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas