Automating Systematic Literature Review (SLR) Updates: A Comparative Validation Study of Artificial Intelligence (AI) Versus Human Screeners
Author(s)
Cichewicz A1, Pande A1, Borkowska K2, Mittal L3, Wittkopf P4, Slim M5
1Evidera, Waltham, MA, USA, 2Evidera, Cracow, Poland, 3Evidera, Bengaluru, India, 4Evidera, London, UK, 5Evidera, Montreal, QC, Canada
OBJECTIVES: Systematic reviewers are increasingly inundated by the growing body of literature. The time- and resource-intensive nature of conducting SLRs often results in searches being outdated by the time SLRs are completed. With market access strategies relying heavily on the most up-to-date evidence, there is growing interest in living SLRs and in expediting the title/abstract screening process with AI-based algorithms. Therefore, we aimed to validate an AI algorithm against human reviewers for SLR updates.
METHODS: Robot Screener was trained on six SLRs evaluating clinical efficacy and safety (CES) or economic burden (EB). An 80% subset of records from each SLR formed the training set, with the remaining 20% constituting a testing set to simulate new records from an SLR update. AI screening decisions were compared with human dual-screening decisions (AI vs dual human). Differences in mean recall, precision, and overall error rates between AI and human screeners were assessed using the Mann-Whitney U test.
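As an illustration of the comparison described in METHODS, the following minimal sketch computes recall, precision, and overall error rate for one set of screening decisions and compares two small groups of per-SLR values with a Mann-Whitney U test. It is not the authors' code. It assumes that each screener's include/exclude decisions are scored against a reference standard of final decisions, that the overall error rate is the share of all records screened incorrectly, and that an exact permutation p-value is used; the abstract specifies none of these. All function names are hypothetical.

```python
from itertools import combinations


def screening_metrics(decisions, reference):
    """Score include (True) / exclude (False) decisions against an assumed reference standard."""
    if len(decisions) != len(reference):
        raise ValueError("decisions and reference must have the same length")
    pairs = [(bool(d), bool(r)) for d, r in zip(decisions, reference)]
    tp = sum(d and r for d, r in pairs)        # correctly included
    fp = sum(d and not r for d, r in pairs)    # wrongly included
    fn = sum(not d and r for d, r in pairs)    # wrongly excluded
    n = len(pairs)
    nan = float("nan")
    return {
        "recall": tp / (tp + fn) if tp + fn else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "error_rate": (fp + fn) / n if n else nan,  # assumed definition
    }


def u_statistic(x, y):
    """U for sample x: number of (x, y) pairs with x > y, counting ties as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact(x, y):
    """Two-sided exact permutation p-value; practical only for small samples."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2  # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    observed = abs(u_obs - centre)
    extreme = total = 0
    # Enumerate every way of assigning n1 of the pooled values to group x.
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# Invented toy values for illustration only; these are not study data.
toy = screening_metrics([True, True, False, False, True],
                        [True, False, True, False, True])
u, p = mann_whitney_exact([1, 2, 3], [4, 5, 6])
```

With six SLRs per group, exhaustive enumeration covers only 924 splits, so an exact test is feasible; a library routine such as SciPy's `mannwhitneyu` would be the usual choice for larger samples.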
RESULTS: Three CES (3,194 records [testing set=640]) and three EB (8,729 records [testing set=1,729]) SLRs yielded comparable mean [SD] recall rates (AI: 0.82 [0.15] vs dual human: 0.75 [0.23]; p=0.59) and overall error rates (AI: 9.9% [5.9%] vs dual human: 7% [8.4%]; p=0.39). However, AI exhibited significantly lower precision rates (0.50 [0.15] vs 0.85 [0.16]; p=0.008). Similar trends were observed when analyses were stratified by SLR topic.
CONCLUSIONS: There were no significant differences between the AI and dual human screeners in recall and overall error rates. Dual screening by human reviewers alone is error-prone, at a rate comparable to that observed when AI was employed as a reviewer. Our study supports AI’s capability to expedite the screening process for SLR updates while emphasizing the need for continued model refinement to address precision limitations and enhance overall model performance.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR22
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas