Artificial Intelligence in HEOR
Gloria Macia, MSc, London School of Economics and Political Science, UK, F. Hoffmann-La Roche Ltd, Switzerland; Joshua Ray, MSc, MBA, F. Hoffmann-La Roche Ltd, Switzerland
Artificial intelligence (AI) was the new buzzword at ISPOR’s Europe Meeting 2023. While real-world evidence (RWE), the last big hype, still dominated the conference, with over 8 sessions, 1 short course on machine learning, and numerous posters on the topic, AI is rapidly catching up. The question is, what comes next for AI in health economics and outcomes research (HEOR)?
In this opinion article, we share our perspectives on what we think is a realistic potential trajectory of AI in HEOR in the coming years.
A particularly promising HEOR application of AI, and more specifically of large language models (LLMs), is automating systematic literature reviews (SLRs) and meta-analyses.3 Several software solutions already offer AI-powered SLR support, but these rely on small machine learning models focused on the screening step: they convert a publication’s text into vectors (feature extraction) that are then used to rank publications by relevance (text classification). Simpler models used for feature extraction include TF-IDF (which weights words by their importance within documents) and Doc2Vec (which creates vector representations of whole documents). The resulting features can then be fed into classifiers such as logistic regression or random forests for tasks like topic classification. In contrast, LLMs are more general-purpose models but much more expensive to train and run due to their size: traditional models usually have up to thousands of parameters, whereas LLMs can have anywhere from hundreds of billions to trillions of parameters. We envision SLR software increasingly integrating LLMs for purposes beyond screening. One of these will most likely be deriving the search strategy query from the PICO (Population, Intervention, Comparison, Outcome) question and then translating it into the syntax of the several available databases.
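To make the screening step concrete, the pipeline described above can be sketched in a few lines: TF-IDF feature extraction feeding a logistic regression classifier whose predicted probability is used to rank the screening queue. This is a minimal illustration with made-up toy abstracts and labels, not a production screening tool.

```python
# Minimal sketch of classic SLR screening: TF-IDF feature extraction
# followed by logistic regression for relevance ranking.
# The abstracts and labels below are toy examples, not real study data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

abstracts = [
    "Cost-effectiveness of drug A versus drug B in type 2 diabetes",
    "A randomized trial of drug A in adults with type 2 diabetes",
    "Annual report of the hospital catering service",
    "Budget impact analysis of biologics in rheumatoid arthritis",
]
labels = [1, 1, 0, 1]  # 1 = relevant to the review question, 0 = not

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)    # feature extraction
clf = LogisticRegression().fit(X, labels)  # text classification

# Score a new abstract; the probability is used to rank the screening queue.
new = vectorizer.transform(["Economic evaluation of drug A in diabetes"])
print(clf.predict_proba(new)[0, 1])
```

In real screening tools the classifier is retrained as reviewers label more records, so the ranking improves over the course of the review.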
"A particularly promising HEOR application of AI, and more specifically large language models, is automating systematic literature reviews and meta-analysis."
Another likely use is data extraction. LLMs can perform optical character recognition (OCR), a task that traditional models often struggle with given their focus on structured data. This is crucial for handling publications in PDF format, where unstructured data, such as tables and complex formatting, are common. Another challenge in data extraction today is the varied presentation of similar information, such as different units or slightly different measurement methods for the same underlying concept. LLMs could address this by actively harmonizing the extracted data, thereby significantly aiding the researcher’s work. We also anticipate that SLR software will let researchers upload their own data extraction templates, with LLMs efficiently populating the template and making it easier for developers to offer the extracted data in a downloadable format for other users. While we foresee LLMs automating much of the process and reducing work time, in our view a human in the loop will remain crucial, primarily for quality control: verifying nuanced information and addressing potential biases and hallucinations. This collaborative approach, combining the efficiency of LLM automation with human expertise, could mitigate biases and enhance the overall quality and relevance of SLRs, ultimately ensuring the AI algorithm’s output is as good as what a human researcher could have produced. Although this is an emerging field, early empirical research on data extraction for evidence synthesis using different LLMs shows promising results. While achieving human-level performance remains a challenge today,1,2 we foresee these early advances triggering amendments to SLR guidelines to achieve greater transparency about the algorithms used.3
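The template-driven extraction workflow described above can be sketched as follows. Note that `call_llm` is a hypothetical stand-in for whichever LLM API an SLR tool actually uses (not a real library function), and the guard against fields the model invents reflects the human-in-the-loop quality control discussed above.

```python
# Sketch of LLM-based data extraction into a researcher-supplied template.
# `call_llm` is a hypothetical stand-in for a real LLM API call.
import json

# The researcher-supplied extraction template (illustrative fields).
TEMPLATE = {"study_id": "", "n_patients": None, "outcome": "",
            "effect_size": None, "unit": ""}

def build_prompt(article_text: str) -> str:
    return (
        "Extract the following fields from the article as JSON, "
        "converting results to a common unit where possible. "
        f"Fields: {json.dumps(TEMPLATE)}\n\nArticle:\n{article_text}"
    )

def extract(article_text: str, call_llm) -> dict:
    raw = call_llm(build_prompt(article_text))
    record = json.loads(raw)
    # Guard against hallucinated fields: keep only keys from the template.
    return {k: record.get(k) for k in TEMPLATE}
    # A human reviewer then verifies each value against the source PDF.

# Toy stand-in so the sketch runs without a real model:
def fake_llm(prompt):
    return json.dumps({"study_id": "S1", "n_patients": 120,
                       "outcome": "HbA1c", "effect_size": -0.4,
                       "unit": "%", "invented_field": "x"})

print(extract("(article text here)", fake_llm))
```

Keeping the template as the single source of truth for field names makes the downstream export step (the downloadable format mentioned above) straightforward.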
Another area where we see the potential of AI is in the development of economic models. Traditionally, health economic models have been constructed using specialized commercial software or spreadsheet tools, such as TreeAge or Microsoft Excel, respectively. However, the limitations of these tools, particularly in handling complex analyses, have raised concerns about the credibility and relevance of the resulting assessments. In contrast, several experts advocate the use of modern programming languages to reduce the errors inherent in spreadsheet models.3,4,5 Although some see the adoption of modern programming languages in the HTA environment as pivotal, we argue that a barrier for many HEOR practitioners is their own programming knowledge. Thanks to LLMs, this barrier has now been lowered, as they possess a remarkable ability to generate human-quality code in various programming languages, including R and Python. We foresee AI pair programmers like GitHub Copilot becoming widely adopted.6 As of today, GitHub Copilot already offers an extension for most code editors supporting Python and is available as an opt-in integration with RStudio.7 Alternatively, for those who prefer to continue building their models in Excel, GitHub Inc’s parent company, Microsoft Corp, has also made Copilot available in Excel.8 Needless to say, AI pair programmers can also help generate code outside the context of health economic modeling, such as preparing data for a network meta-analysis, writing the code of the network meta-analysis itself, or visualizing its results for scientific publication and broader dissemination.
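To illustrate the kind of model code an AI pair programmer can readily draft, here is a minimal three-state Markov cohort model in Python. All transition probabilities, costs, and utilities below are made up for illustration and carry no clinical meaning.

```python
# Illustrative three-state Markov cohort model (Healthy / Sick / Dead).
# All parameters are invented for the sketch.
import numpy as np

P = np.array([[0.85, 0.10, 0.05],   # annual transition probabilities
              [0.00, 0.80, 0.20],   # rows must sum to 1
              [0.00, 0.00, 1.00]])
costs = np.array([500.0, 3000.0, 0.0])    # cost per state per cycle
utilities = np.array([0.85, 0.55, 0.0])   # QALY weight per state

state = np.array([1.0, 0.0, 0.0])   # cohort starts in the Healthy state
discount = 0.035                    # annual discount rate
total_cost = total_qalys = 0.0
for t in range(20):                 # 20 annual cycles
    df = 1.0 / (1.0 + discount) ** t
    total_cost += df * state @ costs
    total_qalys += df * state @ utilities
    state = state @ P               # advance the cohort one cycle

print(round(total_cost, 2), round(total_qalys, 3))
```

A script like this is exactly where a pair programmer helps: the analyst supplies the model structure and parameters, and the assistant drafts the looping, discounting, and later the probabilistic sensitivity analysis around it.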
"Another area where we see the potential of AI is in the development of economic models."
Amidst the somewhat sensationalistic yet valid concerns about AI displacing human jobs, we strongly believe HEOR professionals will not be replaced by AI. From our perspective, the real professional impact they are likely to experience lies in how they adapt their individual skill sets to these technologies. In our view, leveraging AI is somewhat akin to the historical moment when spreadsheet software like Excel emerged as a digital tool that replicated and significantly enhanced the functionality of paper-based accounting systems, widening the opportunity gap between tech-savvy individuals and those resistant to technological integration. While many bookkeepers and accounting clerks were replaced by spreadsheet software, the number of jobs for accountants increased.9 In a similar vein, the integration of LLMs promises, in our belief, a substantial boost in productivity, emphasizing the need for professionals to embrace continuous learning to stay competitive in a rapidly evolving landscape.
HEOR professionals should have a high-level understanding of how large language models work before using them. A recent publication concluded that current AI tools like ChatGPT did not match the quality of standard targeted literature review methods.10,11 According to the authors, ChatGPT failed to identify a great number of publications that should have been included in an SLR and, more worryingly, suggested others that did not exist. These results are skewed because the tool chosen is not fit for purpose: ChatGPT is not meant to replace a database like Embase. The reason ChatGPT can search some databases but not Embase is that the former have enabled a free programmatic means of interaction known as an API (Application Programming Interface). In these cases, a large language model like ChatGPT can act as an agent and search specific databases by transforming the user’s prompt into a correctly formatted API query. Searching Embase with AI is possible, but researchers would first require an API license from Elsevier. The message is that while it is important for HEOR practitioners to embrace these new technologies, we should all do the necessary background research and make the effort to understand which technologies are appropriate for their intended use. HEOR practitioners also need to be aware of the risks of these new tools. A good example is Scite, an AI tool that helps researchers by showing how articles are cited, indicating whether a citation supports or contradicts the claim.12 As we envision such AI tools continuing to gain popularity in HEOR, it is worth pointing out their risks. Scite’s metrics, like the total number of citations, can make already-cited papers in HEOR even more popular, making it increasingly challenging for new ideas to gain attention.
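The "LLM as agent" pattern mentioned above boils down to two steps: the model turns a free-text request into structured query parameters, and the application serializes those parameters into a well-formed API request. In this sketch the database endpoint is hypothetical and the LLM call is replaced by a hard-coded stand-in; real access to a database such as Embase would additionally require a vendor API license.

```python
# Sketch of the "LLM as agent" pattern for database search.
# The endpoint is hypothetical; `prompt_to_query` stands in for an LLM call.
from urllib.parse import urlencode

def prompt_to_query(user_prompt: str) -> dict:
    # In a real agent, an LLM produces these structured parameters from the
    # user's free-text prompt; here we hard-code a plausible output.
    return {"query": "semaglutide AND 'type 2 diabetes'", "limit": 25}

def build_request(params: dict,
                  base: str = "https://api.example-database.org/search") -> str:
    # Serialize the structured parameters into a correctly formatted query URL.
    return f"{base}?{urlencode(params)}"

url = build_request(prompt_to_query("Find recent trials of semaglutide in T2D"))
print(url)
```

The point of the pattern is that the LLM never "contains" the database: it only formats the request, and the authoritative results come back from the database itself, which is why a tool without API access to Embase cannot search it.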
This phenomenon is often referred to as the “echo chamber” effect and it is one of the main risks of AI recommendation algorithms on social media platforms, which tend to show users content similar to what they have previously engaged with or liked. As a result, users may be exposed to a limited range of ideas, reinforcing their existing beliefs and preferences. This can contribute to the amplification of popular or already-circulated ideas, potentially overshadowing new or diverse information.
"The integration of large language models promises a substantial boost in productivity, emphasizing the need for professionals to embrace continuous learning to stay competitive in a rapidly evolving landscape."
Our second point is the need for AI to be integrated into an overall strategy with a clear return-on-investment proposition. While a life sciences company could decide to develop its own AI-enabled SLR solution, venturing into the development of digital products rather than discovering new medicines should be a very conscious choice. Developing software requires time, effort, and specific expertise. Deviations from the core business model should be treated as careful long-term investments, as it may prove more efficient to purchase these technologies directly from software vendors.
While SLRs and writing programming code for cost-effectiveness models are certainly compelling uses of AI, more straightforward uses of LLMs, such as writing assistance or translation, deserve attention as well. While the challenges of adapting a broader evidence package to a local HTA body go beyond translation, translation remains an easily implementable efficiency gain from LLMs that can help speed up submissions.
Because most companies are cautious about sharing their confidential data with the tech companies behind AI tools, involving the legal department early is crucial. The legal team can navigate the complex landscape of data privacy regulations, intellectual property concerns, and liability clauses within contracts. Ensuring contracts clearly define data ownership, usage rights, and liability protections for both parties is vital. If work is partially externalized to a vendor, choosing the right vendor is equally important. Companies should scrutinize potential vendors’ ability to offer robust data security guarantees, including secure data transfer protocols, reputable cloud storage solutions, and regular audits of their security practices.
"While it is important that HEOR practitioners embrace these new technologies, we all should be mindful to do the necessary background research, making efforts to understand which technologies are appropriate for their intended use."
Finally, while we believe that LLMs will soon become a productivity tool widely available and seamlessly integrated into web-based email services, word processors, and spreadsheets, they are unlikely to remain free. In the future, they may become available to paid customers only. The current development of LLMs has been largely supported by venture capital investments. The competing companies behind them are operating at a loss due to the substantial costs associated with training and running these models, which require massive amounts of data and computational resources.13 Hence, the reason tools like ChatGPT offer a wide range of functionalities for free is that such companies are betting on the potential of LLMs to revolutionize a wide range of industries and are willing to take a long-term view of their investments. Consequently, users should expect and be prepared for LLMs and research tools leveraging LLMs like Litmaps or ResearchRabbit to become more expensive.14,15
In conclusion, the integration of AI, and more specifically of LLMs, is likely to find many applications in the field of HEOR. LLMs promise significant advancements, particularly in automating tasks like systematic literature reviews and writing code for economic models. However, realizing this potential necessitates a nuanced approach. A thorough understanding of both AI’s capabilities and limitations is essential if these new technologies are to deliver a more rapid and robust evidence base to inform better healthcare resource allocation decisions. HEOR practitioners must make the effort to understand the underlying functionality of these new technologies, alongside careful consideration of data privacy and intellectual property concerns. Collaboration with legal professionals is crucial to ensure a responsible AI implementation strategy that contributes to an organization’s existing objectives. We encourage HEOR professionals to embrace AI thoughtfully as the field evolves rapidly; we believe it can unlock substantial benefits for their present work, ultimately contributing to enhanced healthcare outcomes.
References
1. Gartlehner G, Kahwati L, Hilscher R, et al. Data extraction for evidence synthesis using a large language model: a proof-of-concept study. medRxiv. Posted online October 3, 2023. Accessed February 7, 2024. https://doi.org/10.1101/2023.10.02.23296415
2. Guerra I, Gallinaro J, Rtveladze K, Lambova A, Asenova E. Can artificial intelligence large language models such as generative pre-trained transformers be used to automate literature reviews? Poster presented at: ISPOR Europe 2023; November 2023; Copenhagen, Denmark. https://www.ispor.org/docs/default-source/euro2023/20231025isporaislrposterv1-0rtveladze-et-al131667-pdf.pdf?sfvrsn=f656268d_0
3. Charrois TL. Systematic reviews: what do you need to know to get started? Can J Hosp Pharm. 2015;68(2):144-148.
4. Smith RA, Schneider PP, Mohammed W. Living HTA: automating health economic evaluation with R. Wellcome Open Res. 2022;7:194.
5. Incerti D, Thom H, Baio G, Jansen JP. R You still using excel? The advantages of modern software tools for health technology assessment. Value Health. 2019;22(5):575-579.
6. GitHub Copilot. Accessed February 7, 2024. https://github.com/features/copilot
7. RStudio User Guide. GitHub Copilot. Accessed February 7, 2024. https://docs.posit.co/ide/user/ide/guide/tools/copilot.html
8. Microsoft. Copilot in Excel help & learning. Accessed February 7, 2024. https://support.microsoft.com/en-us/copilot-excel
9. Goldstein J. How the Electronic Spreadsheet Revolutionized Business. Planet Money. Accessed March 3, 2024. https://www.npr.org/2015/02/27/389585340/how-the-electronic-spreadsheet-revolutionized-business
10. Baisley W, Perriello L, Shoushi G, Nguyen K, Lahue B. Non-systematic literature reviews: can AI enhance current methods? Accessed March 3, 2024. https://www.ispor.org/docs/default-source/euro2023/isporeuaiposteralkemi27oct2023-final133209-pdf.pdf
11. ChatGPT. Accessed March 3, 2024. https://chat.openai.com
12. scite.ai. AI for Research. Accessed March 3, 2024. https://scite.ai
13. Oremus W. AI chatbots lose money every time you use them. That is a problem. Washington Post. June 21, 2023. Accessed February 7, 2024. https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/
14. Litmaps. Your Literature Review Assistant. Accessed February 7, 2024. https://www.litmaps.com/
15. ResearchRabbit. ResearchRabbit. Accessed February 7, 2024. https://www.researchrabbit.ai
Conflicts of Interest
The authors declare no financial or personal relationships with the entities behind the mentioned proprietary AI tools. The choice of these tools is based on independent use and judgement with the purpose of making the discussion more tangible.