Quality Appraisal of Randomized Controlled Trials Using Robins 2.0 Tool: A Case Study on Comparing the Performance of ChatGPTv4.0 With a Human Reviewer

Author(s)

Ohra S¹, Siddiqui MK², Siddiqui MT¹, C RR¹, Gupta J³
¹EBM Health Consultants, New Delhi, DL, India, ²EBM Health, New Delhi, DL, India, ³EBM Health, Cleckheaton, West Yorkshire, UK

OBJECTIVES: The integration of artificial intelligence has the potential to simplify the systematic literature review (SLR) process, including the risk of bias (RoB) assessment, thereby reducing the time and human effort required. This study aimed to evaluate the performance of ChatGPTv4.0 in assessing the RoB in randomized controlled trials (RCTs) versus a trained human reviewer using the ROBINS-2.0 checklist.

METHODS: A list of RCTs was randomly chosen from an existing SLR and each trial was anonymized to ensure an unbiased assessment by ChatGPT and the human reviewer. We developed standardized prompts using an iterative process including explicit instructions, the ROBINS-2.0 guidance document, and a response template to capture the responses. The primary outcome was the level of agreement between ChatGPT and the human reviewer in the RoB assessment using agreement statistics (Kappa).
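The agreement statistic named above can be illustrated with a minimal sketch of Cohen's kappa for two raters. The function name `cohen_kappa`, the judgment labels, and the example ratings are hypothetical and are not the study data; the abstract does not state which software or kappa variant the authors used.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must judge the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical overall RoB judgments for five trials (illustration only).
human = ["low", "low", "high", "some concerns", "low"]
chatgpt = ["low", "high", "high", "some concerns", "some concerns"]
print(round(cohen_kappa(human, chatgpt), 2))
```

Because kappa subtracts the agreement expected by chance, a high raw percentage of agreement can still correspond to a low kappa when most judgments fall into a single category.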

RESULTS: We piloted the prompt using five studies and then expanded the assessment to another 10 studies. After standardization of the prompt, the mean time to complete the RoB assessment was significantly shorter with ChatGPT than with the human reviewer (6.2 mins/study vs 20 mins/study, a 69% reduction). The level of agreement between ChatGPT and the human reviewer across the five domains ranged from 33.33% to 93.33%. A comparison of the overall assessment indicated an agreement level of 70%, with a Kappa statistic of 0.19 (p=0.08). In the randomization domain, ChatGPT did not judge the method of randomization correctly and required additional input to rectify its responses. Within the outcome measurement domain, the more subjective questions required further examination by a human reviewer.
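The reported 69% reduction follows directly from the two mean times. A small sketch of the arithmetic, with a hypothetical helper name `percent_reduction`:

```python
def percent_reduction(baseline, new):
    """Relative reduction of `new` against `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100


# Mean minutes per study as reported in the abstract: human 20, ChatGPT 6.2.
reduction = percent_reduction(20, 6.2)
print(f"{reduction:.0f}% reduction")
```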

CONCLUSIONS: ChatGPT has the potential to speed up the risk of bias assessment. Nevertheless, it is better used as a complementary reviewer than relied on as the sole judge in the RoB assessment.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

PT24

Topic

Methodological & Statistical Research, Organizational Practices, Study Approaches

Topic Subcategory

Academic & Educational, Artificial Intelligence, Machine Learning, Predictive Analytics, Best Research Practices, Literature Review & Synthesis

Disease

Drugs, Oncology
