Disease and Treatment Characteristics of Hodgkin's Lymphoma from Social Media Data: Application of Name Entity Recognition Natural Language Processing

Author(s)

Siddiqui ZA1, Pathan M2, Nduaguba S1, LeMasters T1, Scott VG3, Sambamoorthi U4, Patel J5
1West Virginia University, Morgantown, WV, USA, 2West Virginia University School of Pharmacy, Morgantown, WV, USA, 3West Virginia University, School of Pharmacy, Morgantown, WV, USA, 4University of North Texas Health Sciences Center, Denton, TX, USA, 5Temple University, Philadelphia, PA, USA

OBJECTIVES: Hodgkin's lymphoma (HL) is a rare malignancy of lymphocytes with an excellent prognosis. The objective of this study was to leverage social media data to study the disease and treatment characteristics of HL using a natural language processing (NLP) method.

METHODS: We used the X (formerly Twitter) API v2 developer portal to download posts (formerly Tweets) from January 2010 to October 2022. Annotation guidelines were developed from a literature review and a manual review of 500 posts. These guidelines were used to create a gold standard dataset of 2,000 posts divided into training (1,200), testing (300), and validation (500) sets. The training dataset was used to develop the named entity recognition (NER) NLP application, and the testing and validation datasets were used to assess its performance.
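The 1,200/300/500 partition of the gold standard dataset can be illustrated with a minimal Python sketch. The abstract does not say how posts were assigned to each set, so the random shuffle, the fixed seed, the function name `split_gold_standard`, and the placeholder post identifiers are all assumptions made for illustration only.

```python
import random


def split_gold_standard(posts, n_train=1200, n_test=300, n_val=500, seed=0):
    """Shuffle annotated posts and cut them into train/test/validation sets.

    Assumption: a simple seeded random split; the abstract does not
    describe the actual assignment procedure.
    """
    if len(posts) != n_train + n_test + n_val:
        raise ValueError("post count must equal the sum of the split sizes")
    shuffled = list(posts)                  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


# Hypothetical placeholder identifiers standing in for the 2,000 annotated posts.
posts = [f"post_{i}" for i in range(2000)]
train, test, val = split_gold_standard(posts)
```

Fixing the seed keeps the three sets disjoint and reproducible, so performance measured on the testing and validation sets is not contaminated by posts seen during training.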

RESULTS: After data cleaning and excluding 500 posts for annotation guidelines and 2,000 posts for application development, 78,311 posts were analyzed. Model performance was good in terms of precision (87%), recall (86%), and accuracy (86%). Classical HL was the most prominent type, mentioned in 2,339 (70%) posts. A total of 17,177 posts discussed HL stages and progression, with 4,422 (25.74%) indicating a complete cure, 2,545 (14.82%) advanced stages, and 1,915 (11.15%) relapsed/refractory lymphoma. Additionally, of the 2,311 posts discussing co-occurring diseases alongside HL, the largest share, 919 (39.77%), concerned secondary cancers. A total of 20,013 posts mentioned treatments, with chemotherapy (6,194) and immunotherapy (5,047) being the most common. Frequently cited treatments included combination chemotherapy (1,180 mentions), PD-1 checkpoint inhibitors (2,134 mentions), and CD30-targeting antibody-drug conjugates (2,114 mentions).
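The reported percentages follow directly from the post counts. As a quick arithmetic check, the sketch below recomputes them in Python; the helper name `share` and the dictionary layout are illustrative choices, while the counts are taken from the results above.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Posts discussing HL stages and progression (denominator: 17,177).
stage_total = 17177
stage_counts = {
    "complete cure": 4422,
    "advanced stages": 2545,
    "relapsed/refractory": 1915,
}
stage_shares = {label: share(n, stage_total) for label, n in stage_counts.items()}

# Secondary cancers among posts on co-occurring diseases (denominator: 2,311).
secondary_cancer_share = share(919, 2311)

print(stage_shares)
print(secondary_cancer_share)
```

Each percentage uses its own denominator (the posts within that category), not the 78,311 posts analyzed overall.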

CONCLUSIONS: The study underscores the value of applying NLP methods to social media data to study patient conversations about HL, offering insights into disease characteristics and treatment trends.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

RWD104

Topic

Epidemiology & Public Health, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Public Health

Disease

Oncology
