Large Language Models for Outcomes Research: A Targeted Review
Author(s)
Dolin O, Lim V, Hepworth T, Gonçalves-Bradley D, Langford B
Symmetron Limited, London, UK
OBJECTIVES: Interest in using large language models (LLMs) for outcomes research has increased in recent years; however, the feasibility of integrating LLMs within research workflows remains unclear. A targeted review was conducted to identify case studies and guidance on LLM usage in outcomes research (including qualitative/quantitative evidence synthesis and real-world data analysis).
METHODS: Embase, congress abstracts (ISPOR, HTAi, Cochrane Colloquium), Health Technology Assessment (HTA) guidance and ISPOR good practice guidelines were reviewed. Quantitative and qualitative studies (evaluation studies, observational studies, discussion papers and preprints) published after November 2022 were included.
RESULTS: Sixty-nine studies were included. Case studies (N=64) primarily examined LLM use for data extraction from electronic health records (EHRs) and clinical trials (31/64), title and abstract screening (14/64), and risk of bias (RoB) assessment (6/64). Most studies assessed LLMs’ ability to replicate pre-existing findings. Only six studies performed new research using LLMs (with LLM validation, where performed, as a secondary focus); none mentioned informing HTA submissions. Studies identified barriers to practical use (32/69), including inaccuracies in data extraction, particularly with complex data, and challenges in interpreting subjective questions (e.g., RoB assessment). Most studies emphasized LLMs’ potential; few recommended immediate implementation, due to existing limitations. No HTA guidance for LLM usage was identified. An ISPOR good practice guideline noted limited regulation for using LLMs to extract EHR data, indicating that training and updates to their checklist would likely be necessary.
CONCLUSIONS: Research efforts centered on validating LLMs by replicating existing findings; <10% of case studies produced new outcomes research. Concerns about LLM performance and reliability remain a roadblock to implementation. Future research examining approaches to manually validate automated actions, for example by auditing a random sample of outputs, could help mitigate variation in LLM performance and deliver efficiency gains while maintaining research quality. Usage guidance and standardized validation approaches are currently lacking; both would facilitate LLM use for research.
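The sampling-based manual validation mentioned in the conclusions could be sketched as follows. This is a minimal illustration, not a method used by any of the included studies: the study IDs, extracted values, and 5% error rate are invented for demonstration, and the human reviewer is simulated by a lookup function.

```python
import random

# Hypothetical data: LLM-extracted values keyed by study ID.
llm_extractions = {f"study_{i}": f"value_{i}" for i in range(200)}
# Inject hypothetical extraction errors (10 of 200 records, i.e., 5%).
for i in range(0, 200, 20):
    llm_extractions[f"study_{i}"] = "wrong_value"

def audit_sample(extractions, reference_lookup, sample_size, seed=0):
    """Validate a random sample of LLM extractions against manual review.

    `reference_lookup` maps a study ID to the human-verified value.
    Returns the observed agreement rate within the audited sample.
    """
    rng = random.Random(seed)
    sampled_ids = rng.sample(sorted(extractions), sample_size)
    agreements = sum(
        extractions[sid] == reference_lookup(sid) for sid in sampled_ids
    )
    return agreements / sample_size

# Stand-in for a human reviewer re-extracting the sampled records.
def human_review(study_id):
    return f"value_{study_id.split('_')[1]}"

rate = audit_sample(llm_extractions, human_review, sample_size=40)
print(f"Observed agreement in audit sample: {rate:.2%}")
```

In practice, the observed agreement rate from the audited sample would inform whether the full set of automated extractions can be accepted, or whether broader manual review is needed; the sample size would be chosen to give acceptable precision around the estimated error rate.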
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR76
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas