Frequency and Type of Errors in Data Extraction Within Systematic Literature Reviews
Author(s)
Rice H1, Roussi K2, King E1, Martin A1
1Crystallise, Stanford-le-Hope, UK, 2Crystallise, Basildon, ESS, UK
OBJECTIVES: To examine the frequency and type of errors in data extraction (DE) within systematic literature reviews (SLRs).
METHODS: We analysed DE checking sheets from eight SLRs, varying in topic and size, conducted previously by our organisation between 2022 and 2023. The proportion of papers with errors in each SLR and the total number and type of errors per paper and per project were calculated. A score-based approach was devised to assess the difficulty of extraction, based on the publication type (full text/abstract), whether the file was editable, whether it was highlighted ahead of DE, the number of pages and whether it was a new or updated SLR.
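As an illustration only, the tallies described in the methods (share of data points by outcome, and share of papers with at least one error) could be computed along the following lines. This is a minimal sketch, not the authors' procedure: the `DataPoint` record, the outcome labels, and the rule that any outcome other than "correct" counts as an error are all assumptions, since the abstract does not describe how the checking sheets were coded.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label for a data point the checker left unchanged.
CORRECT = "correct"


@dataclass(frozen=True)
class DataPoint:
    """One checked data point (hypothetical structure, not from the study)."""
    paper_id: str
    outcome: str  # CORRECT, or an error-type label such as "omission"


def summarise(points):
    """Return (percentage of data points per outcome,
    percentage of papers with at least one non-correct data point)."""
    if not points:
        raise ValueError("no data points to summarise")
    total = len(points)
    counts = Counter(p.outcome for p in points)
    pct_by_outcome = {k: 100.0 * v / total for k, v in counts.items()}
    papers = {p.paper_id for p in points}
    # Assumption: any outcome other than CORRECT marks the paper as having an error.
    papers_with_error = {p.paper_id for p in points if p.outcome != CORRECT}
    pct_papers = 100.0 * len(papers_with_error) / len(papers)
    return pct_by_outcome, pct_papers


# Made-up rows for demonstration; these are not data from the study.
sample = [
    DataPoint("paper_A", "correct"),
    DataPoint("paper_A", "omission"),
    DataPoint("paper_B", "correct"),
    DataPoint("paper_B", "correct"),
]
pct_by_outcome, pct_papers = summarise(sample)
print(pct_by_outcome)
print(pct_papers)
```

Keeping one record per data point, rather than one per paper, lets the same input yield both the per-data-point and the per-paper rates.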
RESULTS: In total, 59% of papers included in all SLRs had at least one error at initial DE that was corrected during checking. Data were extracted correctly in 85.52% of 96,675 data points evaluated. The most common error was omission (8.23%), where additional relevant data from the paper were identified by the checker. Incorrect data, where the original value was incorrect, occurred in 2.26% of data points. Other changes were made to the DE by the checker in 3.89% of data points (e.g. inserting comments). Data misidentification (e.g. correct value but in the wrong column) occurred in 0.49% of data points. No obvious pattern was found between the duration of DE or the paper DE difficulty score and the DE error rate.
CONCLUSIONS: Data extraction is an essential part of SLRs; however, it is error-prone. Other studies have identified DE error rates of 0.5% to 15% and at least one error in 66.8% to 99.3% of papers in published SLRs, so the >85% accuracy in our process before pre-publication checking compares favourably. Methods to clarify all outcomes to be extracted before DE starts should be explored to reduce omissions.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
SA83
Topic
Study Approaches
Topic Subcategory
Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas