Evaluating a Large Language Model Approach for Full-Text Screening Task in Systematic Literature Reviews With Domain Expert Input
Author(s)
Huang WH, Poojary V, Hofer K, Fazeli MS
Evidinno Outcomes Research Inc., Vancouver, BC, Canada
OBJECTIVES: To assess the performance of a large language model (LLM) in conducting full-text screening for systematic literature reviews (SLRs) with domain expert input.
METHODS: We developed a custom system using the GPT-4o model to automate full-text screening. The system was evaluated on 420 studies from 10 SLRs that had previously been screened by two independent human reviewers, whose decisions served as the gold standard. A domain expert translated the PICO criteria into a machine-understandable format. The LLM then performed screening using these criteria without further human intervention. Performance metrics included sensitivity, positive predictive value (PPV), and negative predictive value (NPV).
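The abstract does not say how the expert-prepared criteria were encoded or how GPT-4o was prompted. Purely as a hypothetical sketch, assuming each PICO element is stored as a list of plain-language eligibility statements and the model is asked to answer with an INCLUDE or EXCLUDE line followed by its reason, prompt assembly and answer parsing could look like the following. `PicoCriteria`, `build_screening_prompt`, `parse_decision`, the prompt wording and the answer format are all illustrative assumptions, and the call to the model itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PicoCriteria:
    """Hypothetical machine-readable PICO criteria prepared by a domain expert."""
    population: list[str] = field(default_factory=list)
    intervention: list[str] = field(default_factory=list)
    comparator: list[str] = field(default_factory=list)
    outcomes: list[str] = field(default_factory=list)


def build_screening_prompt(criteria: PicoCriteria, full_text: str) -> str:
    """Assemble a screening prompt (wording is an illustrative assumption)."""
    sections = []
    for name, items in [
        ("Population", criteria.population),
        ("Intervention", criteria.intervention),
        ("Comparator", criteria.comparator),
        ("Outcomes", criteria.outcomes),
    ]:
        # An empty element is stated explicitly so the model does not guess.
        bullets = "\n".join(f"- {item}" for item in items) or "- (no restriction)"
        sections.append(f"{name}:\n{bullets}")
    return (
        "You are screening a full-text article for a systematic literature review.\n"
        "Eligibility criteria:\n\n" + "\n\n".join(sections) + "\n\n"
        "Answer on the first line with INCLUDE or EXCLUDE, "
        "then give your reason with reference to the criteria.\n\n"
        f"Article:\n{full_text}"
    )


def parse_decision(model_output: str) -> tuple[str, str]:
    """Split a model answer into (decision, reason), assuming the format requested above."""
    first_line, _, rest = model_output.strip().partition("\n")
    decision = first_line.strip().upper()
    if decision not in {"INCLUDE", "EXCLUDE"}:
        raise ValueError(f"Unexpected decision line: {first_line!r}")
    return decision, rest.strip()
```

Requiring a fixed first-line decision keeps parsing trivial while leaving the free-text reason available for auditing, which is the kind of rationale the results describe; a production system might prefer structured (for example JSON) output instead.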
RESULTS: The LLM demonstrated a high NPV (99.4%, 177/178) in identifying studies to be excluded and a high sensitivity (99.0%, 96/97) in identifying studies to be included. The PPV was 39.7% (96/242). The system showed a tendency toward under-exclusion, with 146 false positives and only 1 false negative. The LLM provided valid reasons for each exclusion or inclusion decision based on the given PICO criteria.
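All of the reported metrics follow from a single 2×2 table with the dual-reviewer decisions as the reference: 96 true positives, 146 false positives, 1 false negative and 177 true negatives, which together account for the 420 studies. The sketch below recomputes them from those counts. The `ConfusionMatrix` class is only an illustration, and specificity is not reported in the abstract; it is derived here from the same four numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """LLM screening decisions against the dual-human-reviewer gold standard."""
    tp: int  # LLM include, gold standard include
    fp: int  # LLM include, gold standard exclude
    fn: int  # LLM exclude, gold standard include
    tn: int  # LLM exclude, gold standard exclude

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def specificity(self) -> float:
        # Not reported in the abstract; derived from the same counts.
        return self.tn / (self.tn + self.fp)


results = ConfusionMatrix(tp=96, fp=146, fn=1, tn=177)
print(f"n = {results.total}")
print(f"Sensitivity {results.sensitivity:.1%}")
print(f"PPV {results.ppv:.1%}")
print(f"NPV {results.npv:.1%}")
print(f"Specificity {results.specificity:.1%}")
```

Running it prints 420 studies, 99.0% sensitivity, 39.7% PPV (96/242 ≈ 0.3967), 99.4% NPV and 54.8% specificity. The low PPV and specificity are two views of the same 146 false positives.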
CONCLUSIONS: The LLM approach, with domain expert input limited to the initial preparation of the PICO criteria, shows promising capabilities in SLR screening, particularly in correctly identifying studies for exclusion. It achieves a high NPV and a sensitivity comparable to those of human reviewers. While the system excels at exclusion decisions, its tendency toward under-exclusion suggests that human review of non-excluded studies remains beneficial. This approach has the potential to substantially reduce the manual workload of the SLR process, especially in the initial screening of large volumes of studies.
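To make the workload point concrete, consider a hypothetical hybrid workflow: studies the LLM excludes are dropped without further review, and human reviewers re-screen only the studies the LLM did not exclude, reproducing the gold-standard decision for each. This workflow and the `hybrid_workload` helper are illustrative assumptions, not something evaluated in the study; the sketch only shows how the reported counts would translate into reviewer burden and residual misses under that assumption.

```python
def hybrid_workload(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Reviewer burden if humans re-screen only LLM-included studies.

    Hypothetical assumption: human re-screening matches the gold standard,
    so every false positive is caught, while every false negative (an LLM
    exclusion of an eligible study) is never seen again and stays missed.
    """
    total = tp + fp + fn + tn
    reviewed = tp + fp       # studies the LLM did not exclude
    auto_excluded = fn + tn  # studies no human looks at
    return {
        "reviewed": reviewed,
        "reviewed_share": reviewed / total,
        "auto_excluded": auto_excluded,
        "auto_excluded_share": auto_excluded / total,
        "missed_includes": fn,
    }


summary = hybrid_workload(tp=96, fp=146, fn=1, tn=177)
for key, value in summary.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions, reviewers would read 242 of the 420 full texts (about 57.6%), 178 (about 42.4%) would be excluded without human review, and the single false negative would be the only eligible study lost.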
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR217
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas