Disease and Treatment Characteristics of Hodgkin's Lymphoma from Social Media Data: Application of Name Entity Recognition Natural Language Processing
Author(s)
Siddiqui ZA1, Pathan M2, Nduaguba S1, LeMasters T1, Scott VG3, Sambamoorthi U4, Patel J5
1West Virginia University, Morgantown, WV, USA, 2West Virginia University School of Pharmacy, Morgantown, WV, USA, 3West Virginia University, School of Pharmacy, Morgantown, WV, USA, 4University of North Texas Health Sciences Center, Denton, TX, USA, 5Temple University, Philadelphia, PA, USA
OBJECTIVES: Hodgkin's lymphoma (HL) is a rare malignancy of lymphocytes with an excellent prognosis. The objective of this study was to use social media data to study the disease and treatment characteristics of HL with a natural language processing (NLP) method.
METHODS: We used the X (formerly Twitter) API v2 developer portal to download posts (formerly tweets) from January 2010 to October 2022. Annotation guidelines were developed from a literature review and a manual review of 500 posts. These guidelines were used to create a gold standard dataset of 2,000 posts divided into training (1,200), testing (300), and validation (500) sets. The training set was used to develop the named entity recognition (NER) NLP application, and the testing and validation sets were used to assess its performance.
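The partition described above can be illustrated with a minimal Python sketch. It assumes a simple seeded random shuffle; the abstract does not state how posts were assigned to each set, and the function name `split_gold_standard`, the seed, and the placeholder post strings are hypothetical.

```python
import random


def split_gold_standard(posts, n_train=1200, n_test=300, n_val=500, seed=0):
    """Split annotated posts into training, testing, and validation sets.

    Assumption: a seeded random shuffle. The set sizes come from the
    abstract; the assignment procedure itself is not described there.
    """
    if len(posts) != n_train + n_test + n_val:
        raise ValueError("number of posts must equal the sum of the set sizes")
    shuffled = list(posts)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


# Placeholder identifiers stand in for the 2,000 annotated posts.
posts = [f"post_{i}" for i in range(2000)]
train, test, val = split_gold_standard(posts)
```

Fixing the seed keeps the split reproducible, so the held-out testing and validation posts never leak into training between runs.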
RESULTS: After data cleaning and exclusion of the 500 posts used for the annotation guidelines and the 2,000 posts used for application development, 78,311 posts were analyzed. Model performance was good in terms of precision (87%), recall (86%), and accuracy (86%). Classical HL was the most prominent type, mentioned in 2,339 posts (70%). A total of 17,177 posts discussed HL stages and progression, with 4,422 (25.74%) indicating a complete cure, 2,545 (14.82%) advanced stages, and 1,915 (11.15%) relapsed/refractory lymphoma. Additionally, of the 2,311 posts discussing co-occurring diseases alongside HL, the largest share, 919 (39.77%), concerned secondary cancers. A total of 20,013 posts mentioned treatments, with chemotherapy (6,194) and immunotherapy (5,047) being the most common. Frequently cited treatments included combination chemotherapy (1,180 mentions), PD-1 checkpoint inhibitors (2,134 mentions), and CD30-targeting antibody-drug conjugates (2,114 mentions).
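The reported percentages can be reproduced from the counts with a short Python sketch. The counts are taken from the results above; the helper name `share` and the dictionary labels are illustrative only.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Posts discussing HL stages and progression (counts from the abstract).
stage_total = 17177
stage_counts = {
    "complete cure": 4422,
    "advanced stages": 2545,
    "relapsed/refractory": 1915,
}
stage_shares = {label: share(n, stage_total) for label, n in stage_counts.items()}

# Secondary cancers among posts discussing co-occurring diseases.
secondary_cancer_share = share(919, 2311)
```

Each percentage uses its own category total as the denominator (17,177 for stages, 2,311 for co-occurring diseases), not the 78,311 posts analyzed overall.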
CONCLUSIONS: The study underscores the value of applying NLP methods to social media data to study patient conversations about HL, offering insights into disease characteristics and treatment trends.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
RWD104
Topic
Epidemiology & Public Health, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Public Health
Disease
Oncology