Exploring the Development of Briefing Books for Early Scientific Advice Using Large Language Models: A Proof-of-Concept Study
Author(s)
Thaliffdeen R1, Radford M1, Guerra I2, Leite J3, Pullagurla S4, Shah V2, Falla E2, Shankar R2, Asukai Y5
1Gilead Sciences, Inc, Foster City, CA, USA, 2IQVIA, London, UK, 3IQVIA, Lisbon, Portugal, 4IQVIA, Bangalore, India, 5Gilead Sciences, Ltd., Stockley Park, UK
OBJECTIVES: While large language models (LLMs) have demonstrated efficiencies for various HEOR-related deliverables, early scientific advice (ESA) briefing books (BBs) pose unique challenges for LLMs, as BBs are generated earlier in a product's lifecycle, when evidence is limited. Additionally, BBs require strategic thinking to develop a company's position and justification on questions posed to health technology assessment (HTA) bodies. This proof-of-concept study aimed to assess LLM-based generation of BBs for ESA.
METHODS: Briefing book sections on trial comparator selection, the economic modeling approach, and indirect treatment comparison (ITC) were generated using GPT-4 via the Python API. To supplement the model's pre-trained knowledge, retrieval-augmented generation (RAG) was used for content generation and answer retrieval. The knowledge base included the trial protocol, internal strategic documents, previous HTA appraisals, HTA BB guidance, and published trial results in similar indications. Key evaluation metrics were output quality and the human effort needed for revisions.
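For illustration only, the sketch below shows one way a RAG workflow of the kind described above could be wired together in Python: knowledge-base documents are embedded, the chunks most relevant to a briefing-book question are retrieved by cosine similarity, and GPT-4 drafts the section from that retrieved context. The OpenAI client usage, the embedding model (text-embedding-3-small), the example document snippets, and the prompt wording are assumptions for this sketch, not the authors' actual pipeline.

```python
# Minimal RAG sketch (illustrative assumptions throughout; not the study pipeline).
import numpy as np
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical knowledge-base chunks (trial protocol, HTA appraisals, strategy docs)
knowledge_base = [
    "Trial protocol: pivotal phase 3 study of drug X versus comparator Y...",
    "Previous HTA appraisal: comparator Y assessed in a similar indication...",
    "Internal strategy document: proposed cost-effectiveness model structure...",
]

def embed(texts: list[str]) -> np.ndarray:
    """Return an array of embedding vectors for the given texts."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

def retrieve(query: str, corpus: list[str], corpus_vecs: np.ndarray, k: int = 2) -> list[str]:
    """Return the k corpus chunks most similar to the query (cosine similarity)."""
    q = embed([query])[0]
    scores = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

corpus_vecs = embed(knowledge_base)
question = "Justify the choice of trial comparator for early scientific advice."
context = "\n\n".join(retrieve(question, knowledge_base, corpus_vecs))

# Draft the briefing-book section from the retrieved context
draft = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You draft HTA briefing book sections in a formal, concise tone."},
        {"role": "user", "content": f"Context:\n{context}\n\nTask: {question}"},
    ],
)
print(draft.choices[0].message.content)
```

In such a setup, the retrieval step determines what the model can "see", so the depth of the generated section is bounded by the contents of the knowledge base, which is consistent with the limitations reported in the RESULTS below.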
RESULTS: Overall, the LLM provided responses in the requested tone and format, correctly summarizing top-level information from the knowledge base. However, each section lacked key information and sufficient depth for a BB. The comparator choice section provided some details on the comparator arm's pivotal trial and regional considerations but failed to comment on the evolving treatment landscape. In the modeling section, the LLM recapitulated the model summary it was provided, unable to meaningfully expand on it or supply specifics for parameters such as the discount rate. Finally, the ITC section identified relevant comparators but could not rigorously assess ITC feasibility, including heterogeneity across trials and the assumptions required for a connected evidence network.
CONCLUSIONS: Although the LLM successfully retrieved information from the knowledge base, it could not generate an HTA-grade BB. Enriching the knowledge base with relevant literature and clinical feedback, coupled with expert prompting guidance, could enhance the LLM's performance; however, this improvement would come at the cost of considerable human effort.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
OP18
Topic
Health Technology Assessment, Organizational Practices
Topic Subcategory
Best Research Practices, Decision & Deliberative Processes, Industry, Value Frameworks & Dossier Format
Disease
Drugs, No Additional Disease & Conditions/Specialized Treatment Areas