Use of a Fine-Tuned, Clinical, Bidirectional Transformer (BERT) Large Language Model (LLM) to Classify the Patient and Caregiver Voice Through Their Social Media Health Posts: An Example in Non-Small Cell Lung Cancer (NSCLC)

Author(s)

Rai AK1, Ikoro V2, Berger A3, Shah M4, Gallie H5
1Evidera, Overland Park, KS, USA, 2Evidera, Hammersmith, London, UK, 3Evidera, Bethesda, MD, USA, 4GSK, Dexter, MI, USA, 5GSK, Brentford, Middlesex, UK


OBJECTIVES: Social media data can provide insights into the experiences and burden of disease among patients and their families and caregivers. It is therefore important to classify these voices accurately. We examined the ability of Clinical BERT, in a novel application of a large language model, to classify social media texts by person-type.

METHODS: The Clinical BERT model was trained and then extensively fine-tuned on a manually annotated, de-identified dataset of social media posts curated to capture the linguistic patterns of three person-types (i.e., “patients”, “caregivers”, “others”), exposing the model to the nuanced language differences that distinguish these groups. Performance of the fine-tuned BERT model was compared with that of traditional classification methods.

RESULTS: A total of 39,686 posts were identified from seven sites: 29,391 from patients, 10,216 from caregivers, and 79 from others. Classifying posts by person-type with the fine-tuned BERT model yielded an F1 score of 0.91, versus F1 scores of 0.74 to 0.83 for traditional methods (p<0.05 for all comparisons).
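The abstract does not state how the F1 score was averaged across the three person-types, which matters when one class ("others", 79 of 39,686 posts) is very rare. The following is a minimal sketch, not the authors' evaluation code, showing how per-class, macro-averaged, and support-weighted F1 can diverge on a three-class problem. The function names and the toy labels are hypothetical; only the post counts come from the abstract.

```python
from collections import Counter

# Post counts per person-type as reported in the abstract.
REPORTED_COUNTS = {"patient": 29391, "caregiver": 10216, "other": 79}
LABELS = tuple(REPORTED_COUNTS)

def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, treating each label one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(scores.values()) / len(scores)

def weighted_f1(scores, y_true):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    support = Counter(y_true)
    total = sum(support.values())
    return sum(scores[label] * support[label] / total for label in scores)

# Hypothetical toy labels (NOT study data): the single "other" post is missed.
y_true = ["patient"] * 6 + ["caregiver"] * 3 + ["other"]
y_pred = ["patient"] * 5 + ["caregiver"] * 3 + ["patient"] * 2

scores = per_class_f1(y_true, y_pred)
macro = macro_f1(scores)
weighted = weighted_f1(scores, y_true)

# Class shares implied by the reported counts.
total_posts = sum(REPORTED_COUNTS.values())
shares = {label: n / total_posts for label, n in REPORTED_COUNTS.items()}
```

In this toy case the weighted average stays well above the macro average because the missed rare class carries little weight, so a single headline F1 is easier to interpret when the averaging scheme is reported alongside it.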

CONCLUSIONS: Our findings indicate that the fine-tuned Clinical BERT large language model significantly outperforms traditional methods in classifying social media posts by stakeholder group, thereby enabling a deeper and more comprehensive understanding of the experiences and healthcare challenges faced by patients, caregivers, and others. This targeted approach to classifying social media texts by stakeholder voice may increase the potential of text analysis to inform healthcare policies and practices.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Code

MSR50

Topic

Methodological & Statistical Research, Patient-Centered Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Patient-reported Outcomes & Quality of Life Outcomes

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Oncology
