Using Retrieval Augmented Generation (RAG) and Agents' Workflow for Protocol Authoring
Author(s)
Teixeira B1, Gao C2, Wang C2, Malhi A1, Stein D1
1Bristol Myers Squibb, Uxbridge, London, UK, 2Bristol Myers Squibb Company, Princeton, NJ, USA
OBJECTIVES: Research protocols are required for all non-interventional research (NIR) studies but are laborious to develop despite their standard structure. Given the rapid advancement of generative artificial intelligence (GenAI), we aimed to evaluate the utility of a large language model (LLM)-backed protocol authoring tool for reducing the time spent writing protocols and for stabilizing and/or improving content quality.
METHODS: A protocol authoring tool was developed using OpenAI, RAG, and an agent workflow. The agent workflow was chosen to incorporate multiple LLMs with different roles (drafter, writer, and reviewer) that interact with each other and with the human user, mimicking the protocol development process as conducted solely by humans. A vector embedding library was created from all final NIR protocols generated across the company since 2019. The tool was used to generate the “statistical analysis” and “limitations” sections of NIR protocols, and the resulting time savings and content quality were measured.
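The abstract does not disclose implementation details, so the following is only a minimal sketch of the general pattern described above: retrieve similar passages from a document library, then pass a draft through drafter, writer, and reviewer roles. All names (embed, retrieve, run_workflow, stub_llm) are hypothetical, the bag-of-words similarity stands in for a real embedding model, and the stub function stands in for calls to an actual LLM.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, library: List[str], k: int = 2) -> List[str]:
    """Return the k library passages most similar to the query."""
    q = embed(query)
    ranked = sorted(library, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

# Hypothetical LLM interface: (role, prompt) -> generated text.
LLM = Callable[[str, str], str]

def run_workflow(section: str, library: List[str], llm: LLM,
                 max_rounds: int = 2) -> Tuple[str, List[str]]:
    """Drafter outlines, writer writes, reviewer critiques until approval or max_rounds."""
    context = "\n".join(retrieve(section, library))
    outline = llm("drafter", f"Outline the '{section}' section using:\n{context}")
    text = llm("writer", f"Write the section from this outline:\n{outline}")
    reviews: List[str] = []
    for _ in range(max_rounds):
        review = llm("reviewer", f"Review this text:\n{text}")
        reviews.append(review)
        if review.strip().upper().startswith("APPROVE"):
            break
        text = llm("writer", f"Revise using this feedback:\n{review}\n\nText:\n{text}")
    return text, reviews

def stub_llm(role: str, prompt: str) -> str:
    """Deterministic stand-in for a model call, used only to exercise the loop."""
    if role == "reviewer":
        return "APPROVE" if "revised" in prompt else "Add sensitivity analyses."
    if role == "writer" and prompt.startswith("Revise"):
        return "revised text"
    return f"[{role} output]"

library = [
    "Statistical analysis will use Cox regression.",
    "Limitations include residual confounding.",
    "Study sites are in Europe.",
]
final_text, reviews = run_workflow("statistical analysis", library, stub_llm)
```

In a real tool, a human reviewer's comments could be appended to the reviewer feedback before each revision, which is one way to keep the user in the feedback loop.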
RESULTS: The tool generated the statistical analysis and limitations sections of the protocol more rapidly than traditional development by humans, with time savings estimated at around 80%. Content generated through the interaction of LLMs via the agent workflow was of higher quality than content from single-prompt generation. The agents produced concepts and ideas that the human user had not considered, which enhanced the overall content. However, in some instances the content became overcomplicated, highlighting the importance of human user input to the feedback loop.
CONCLUSIONS: The use of a protocol authoring tool built on OpenAI, RAG, and an agent workflow enhanced the speed and quality of NIR protocol development and could become a standard method by which protocols and other scientific documents are developed. The next steps are to undertake further validation and to generate more accurate estimates of the time and cost savings that companies can realize by implementing such tools.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
SA102
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas