Leveraging Generative Artificial Intelligence for Assessing the Quality of Network Meta-Analysis: Methodological Considerations and Early Findings
Speaker(s)
Nevière A1, Friedrich G2, Papadimitropoulou K3, Le Nouveau P4, Gauthier A5
1Amaris Consulting, Saint Herblain, 44, France, 2Amaris Consulting, Barcelona, Spain, 3Amaris Consulting, Lyon, France, 4Amaris Consulting, Nantes, 44, France, 5Amaris Consulting, London, UK
OBJECTIVES: The emergence of large language models (LLMs) to complement human effort has created new potential for evaluating published scientific literature. In evidence synthesis, a few examples of LLMs automating tasks have been published. In this study, we employed Generative Pre-trained Transformer 4 omni (GPT-4o) to critically assess the quality of network meta-analyses (NMAs).
METHODS: We considered a checklist adapted from the PRISMA-NMA guidelines and Pacou et al. (2016) to evaluate published NMAs. The checklist includes 12 questions appraising methodological rigor, 9 questions on the description of results, and 3 questions on the discussion. Each question was ranked as low, medium, or high according to its importance for the quality of the NMA. We piloted this checklist on an NMA of biologic therapies for Crohn's disease (CD) by generating multiple prompts instructing the LLM to answer the questions and extract the relevant passages verbatim. The authors independently assessed 100 published NMAs in total, and the performance of the LLM was evaluated by its degree of agreement with the human assessment.
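The workflow described above (prompting the LLM per checklist question, then comparing its answers with the human assessment) could be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the question texts, importance weights, answer dictionaries, and helper names are all hypothetical.

```python
# Hypothetical sketch of the checklist-based appraisal workflow.
# All questions, weights, and answers below are illustrative placeholders.

CHECKLIST = [
    # (question_id, question_text, importance)
    ("M1", "Is the NMA model (fixed vs. random effects) reported?", "high"),
    ("M2", "Is a statistical test for heterogeneity described?", "high"),
    ("R1", "Are relative effects reported with uncertainty intervals?", "medium"),
]

def build_prompt(question: str, article_text: str) -> str:
    """Instruct the LLM to answer yes/no and quote supporting text verbatim."""
    return (
        "You are appraising the quality of a network meta-analysis.\n"
        f"Question: {question}\n"
        "Answer strictly 'yes' or 'no', then quote the relevant passage "
        "verbatim from the article. If the information is absent, answer 'no'.\n\n"
        f"Article:\n{article_text}"
    )

def agreement_rate(human: dict, llm: dict) -> float:
    """Proportion of checklist questions where the LLM matches the human rating."""
    shared = human.keys() & llm.keys()
    matches = sum(human[q] == llm[q] for q in shared)
    return matches / len(shared)

# Toy comparison of human vs. LLM answers on the three example questions.
human = {"M1": "yes", "M2": "no", "R1": "yes"}
llm = {"M1": "yes", "M2": "yes", "R1": "yes"}
print(round(agreement_rate(human, llm), 2))  # 0.67
```

In practice each prompt from `build_prompt` would be sent to GPT-4o via an API call, and the parsed yes/no answers collected per publication before computing agreement.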
RESULTS: For the CD NMA, there was full agreement between the human assessment and that of GPT-4o on 13 of the 24 questions and complete disagreement on 4. Across all NMAs, the most challenging component for the LLM was the methodological items, where publications often lacked detail on the statistical methods used (e.g., NMA methods, statistical tests for heterogeneity). Disagreements were observed regarding the validation of data inputs and when the publication referred to supplementary materials that were not provided. The qualitative review of results was largely aligned with the experts' findings.
CONCLUSIONS: LLMs could be used to support the assessment of the quality of published NMAs. However, prompts must be prepared carefully to obtain accurate and detailed results. Owing to copyright restrictions, only freely available publications can be used.
Code
MSR184
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Meta-Analysis & Indirect Comparisons
Disease
Gastrointestinal Disorders, Oncology, Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory)