Opportunities and Limitations in the Use of AI to Assist With Data Extraction in Systematic Literature Reviews

Author(s)

Roussi K1, Rice H2, King E2, Martin A2
1Crystallise, Basildon, ESS, UK, 2Crystallise, Stanford-le-Hope, UK

OBJECTIVES: Data extraction (DE) is a time-consuming and error-prone component of a systematic literature review (SLR). We aimed to assess technical factors affecting the efficiency of manual DE and to evaluate how far AI tools can increase DE accuracy and speed.

METHODS: Data on study design, size, objective, inclusion/exclusion criteria, key findings and baseline characteristics (age, gender, ethnicity) were manually extracted by an experienced systematic reviewer from 10 conference abstracts, 10 editable full-text PDFs and 6 non-editable full-text PDFs (i.e. scanned or photocopied versions of the original documents). Free versions of the Elicit and Perplexity AI platforms and a subscription version of aiPDF were asked to extract the same data into the same Excel DE template or an equivalent table.

RESULTS: The duration of manual DE increased with the complexity of the study type/format (mean 10.18 minutes per conference abstract, 18.13 minutes per editable PDF and 28.0 minutes per non-editable PDF). Elicit allowed the selection of bespoke outcome categories for DE, and its extraction was generally accurate and completed within seconds. However, outcome selection had to be repeated manually for each study, and export to a CSV file was only possible with a paid subscription. Perplexity required a prompt specifying the data to be extracted. Some data were extracted correctly, but other parts of the output were fabricated. The aiPDF platform was unable to complete the DE because it could not cope with ambiguities in the DE template.

CONCLUSIONS: Despite the rapid evolution of AI tools, limitations in their use still delay their effective incorporation into the SLR process. These results highlight the need to restructure current data extraction sheets into a clearer, more comprehensive format that both online AI tools and human researchers can more easily understand.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

SA63

Topic

Study Approaches

Topic Subcategory

Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
