Evaluating a Large Language Model Approach for Full-Text Screening Task in Systematic Literature Reviews With Domain Expert Input

Author(s)

Huang WH, Poojary V, Hofer K, Fazeli MS
Evidinno Outcomes Research Inc., Vancouver, BC, Canada

OBJECTIVES: To assess the performance of a large language model (LLM) in conducting full-text screening for systematic literature reviews (SLRs) with domain expert input.

METHODS: We developed a custom system using the GPT-4o model to automate full-text screening. The system was evaluated on 420 studies across 10 SLRs, each previously screened by two independent human reviewers, whose decisions served as the gold standard. A domain expert translated the PICO criteria into a machine-readable format. The LLM then performed screening using these criteria without further human intervention. Performance metrics included sensitivity, positive predictive value (PPV), and negative predictive value (NPV).
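The abstract does not describe the format the domain expert used or how the criteria were passed to the model. As a purely illustrative sketch, PICO criteria could be held in a small structure and rendered into screening instructions. The names `PicoCriteria` and `build_screening_prompt`, the prompt wording, and the example criteria are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoCriteria:
    """Hypothetical container for expert-prepared eligibility criteria."""
    population: list[str]
    intervention: list[str]
    comparator: list[str]
    outcomes: list[str]


def build_screening_prompt(criteria: PicoCriteria, full_text: str) -> str:
    """Render criteria and an article's full text into one instruction string.

    The wording is an assumption; the abstract does not report the prompt.
    """
    def section(title: str, items: list[str]) -> str:
        # One heading per PICO element, followed by a bulleted list.
        return title + ":\n" + "\n".join(f"- {item}" for item in items)

    parts = [
        "Decide whether the study below should be INCLUDED or EXCLUDED.",
        "Give a reason that refers to the criteria.",
        section("Population", criteria.population),
        section("Intervention", criteria.intervention),
        section("Comparator", criteria.comparator),
        section("Outcomes", criteria.outcomes),
        "Study full text:\n" + full_text,
    ]
    return "\n\n".join(parts)


# Placeholder criteria, invented for illustration only.
example_criteria = PicoCriteria(
    population=["Adults with condition X"],
    intervention=["Drug A"],
    comparator=["Placebo", "Standard of care"],
    outcomes=["Overall survival"],
)
example_prompt = build_screening_prompt(example_criteria, "Full text goes here.")
```

The resulting string would be sent to the model by whatever client the system uses; that call is deliberately left out because the abstract gives no detail about it.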

RESULTS: The LLM demonstrated high NPV (99.4%, 177/178) in correctly identifying studies to be excluded and high sensitivity (99.0%, 96/97) in correctly identifying studies to be included. The PPV was 39.7% (96/242). The system showed a tendency toward under-exclusion, with 146 false positives and only 1 false negative. The LLM provided valid reasons for study exclusion or inclusion based on the given PICO criteria.
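The reported metrics follow from the four confusion-matrix counts implied by the results: 96 true positives, 146 false positives, 1 false negative, and 177 true negatives (420 studies in total). A minimal sketch of the arithmetic is given below. The class name `ConfusionCounts` is an illustrative choice, and specificity is included only as a derived quantity; it is not reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # LLM included, human reviewers included
    fp: int  # LLM included, human reviewers excluded
    fn: int  # LLM excluded, human reviewers included
    tn: int  # LLM excluded, human reviewers excluded

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:
        # Share of truly eligible studies that the LLM included.
        return self.tp / (self.tp + self.fn)

    @property
    def ppv(self) -> float:
        # Share of LLM inclusions that were truly eligible.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Share of LLM exclusions that were truly ineligible.
        return self.tn / (self.tn + self.fn)

    @property
    def specificity(self) -> float:
        # Derived here for context; not a metric reported in the abstract.
        return self.tn / (self.tn + self.fp)


# Counts taken from the RESULTS paragraph.
counts = ConfusionCounts(tp=96, fp=146, fn=1, tn=177)

for name in ("sensitivity", "ppv", "npv", "specificity"):
    print(f"{name}: {getattr(counts, name):.3f}")
```

The low PPV alongside the high NPV reflects the under-exclusion noted above: most errors are studies the LLM kept that human reviewers excluded, so a human pass over the 242 retained studies would catch them.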

CONCLUSIONS: The LLM approach, with domain expert input limited to initial PICO criteria preparation, shows promising capabilities in SLR screening, particularly in correctly identifying studies for exclusion. It achieves high NPV and sensitivity comparable to human reviewers. While the system excels at exclusion decisions, its tendency for under-exclusion suggests that human review of non-excluded studies remains beneficial. This approach has the potential to significantly reduce the manual workload in the SLR process, especially in the initial screening of large volumes of studies.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR217

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
