Quality Appraisal of Randomized Controlled Trials Using Robins 2.0 Tool: A Case Study on Comparing the Performance of ChatGPTv4.0 With a Human Reviewer
Author(s)
Ohra S1, Siddiqui MK2, Siddiqui MT1, C RR1, Gupta J3
1EBM Health Consultants, New Delhi, DL, India, 2EBM Health, New Delhi, DL, India, 3EBM Health, Cleckheaton, West Yorkshire, UK
OBJECTIVES: The integration of artificial intelligence has the potential to simplify the systematic literature review (SLR) process, including risk of bias (RoB) assessment, thereby reducing the time and human effort required. This study aimed to evaluate the performance of ChatGPTv4.0 in assessing RoB in randomized controlled trials (RCTs) versus a trained human reviewer using the ROBINS-2.0 checklist.
METHODS: A set of RCTs was randomly selected from an existing SLR, and each trial was anonymized to ensure an unbiased assessment by ChatGPT and the human reviewer. We developed standardized prompts through an iterative process; the prompts included explicit instructions, the ROBINS-2.0 guidance document, and a response template. The primary outcome was the level of agreement between ChatGPT and the human reviewer in the RoB assessment, measured using the Kappa statistic.
RESULTS: We piloted the prompt using five studies and then expanded the assessment to another 10 studies. After standardization of the prompt, the mean time to complete the RoB assessment was significantly shorter with ChatGPT than with the human reviewer (6.2 mins/study vs 20 mins/study, a 69% reduction). The level of agreement between ChatGPT and the human reviewer across the five domains ranged from 33.33% to 93.33%. A comparison of the overall assessment indicated an agreement level of 70%, with a Kappa statistic of 0.19 (p=0.08). In the randomization domain, ChatGPT did not correctly judge the method of randomization and required additional input to correct its responses. Within the outcome measurement domain, the more subjective questions required further examination by a human reviewer.
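The gap between a 70% raw agreement and a Kappa of 0.19 follows from how Kappa discounts the agreement expected by chance. The sketch below illustrates this with Cohen's Kappa for two raters, assuming that is the Kappa variant meant here (the abstract does not specify it). The judgment labels are invented purely for illustration and are not the study data; the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by
    chance from each rater's marginal label frequencies.
    """
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented overall RoB judgments for 10 studies (NOT the study data).
human = ["low"] * 7 + ["high"] * 3
model = ["low"] * 6 + ["high", "low", "low", "high"]

agreement = percent_agreement(human, model)
kappa = cohen_kappa(human, model)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With these invented labels the raters agree on 7 of 10 studies, yet because both assign "low" most of the time, chance alone would produce 62% agreement, leaving a Kappa of about 0.21. When one category dominates, high raw agreement can therefore coexist with a low Kappa.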
CONCLUSIONS: ChatGPT has the potential to speed up the risk of bias assessment. Nevertheless, it is better used as a complementary reviewer than relied on as the sole judge in the RoB assessment.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
PT24
Topic
Methodological & Statistical Research, Organizational Practices, Study Approaches
Topic Subcategory
Academic & Educational, Artificial Intelligence, Machine Learning, Predictive Analytics, Best Research Practices, Literature Review & Synthesis
Disease
Drugs, Oncology