A Methodological Approach Using Sentiment Analysis of Online Medical Platforms As a Real-World Data Source of Patient Experiences

Author(s)

Cimino A1, Culbertson C1, Watkins E2, Li J3, Wangeshi S1
1Rogue Scholar Consulting, Baltimore, MD, USA, 2Organon, Jersey City, NJ, USA, 3Organon, Mason, OH, USA

OBJECTIVES: To describe an innovative methodology that leverages online reviews as a source of real-world data to understand disease state, patient preferences, and comparisons of products to treat medical conditions.

METHODS: Reviews of nine products used to treat bacterial vaginosis (BV) were scraped from Drugs.com, WebMD.com, and Amazon.com. We (1) discuss ways to address pharmacovigilance and ethical concerns; (2) summarize how the data were collected, processed, and cleaned; (3) define the tokenization and determination of narrative segments; (4) describe the five lexicon-based algorithms used for sentiment analysis (i.e., sentimentr, AFINN, Bing, syuzhet, NRC); (5) explain the quantitative analyses conducted with five-star ratings, user attributes, and sentiment data; and (6) illustrate the qualitative analytic approach, including inductive and query-focused coding.
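As a rough illustration of steps (3) and (4), the sketch below splits a review into sentence-level segments, tokenizes each segment, and scores it against a word-valence lexicon. It is a minimal sketch only: the lexicon is a hypothetical toy word list (not the AFINN, Bing, or NRC lexicons), the function names are invented for this example, and averaging sentence scores is an assumed aggregation rule rather than the one used in the study.

```python
import re

# Hypothetical toy lexicon (word -> valence). This is NOT the AFINN, Bing,
# or NRC word list; it exists only to make the example self-contained.
TOY_LEXICON = {"great": 3, "relief": 2, "worked": 2,
               "burning": -2, "awful": -3, "worse": -2}


def split_sentences(text):
    """Split free text into sentence-level segments on '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def score_sentence(sentence, lexicon=TOY_LEXICON):
    """Sum the valence of every token found in the lexicon (others count 0)."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(sentence))


def score_review(text, lexicon=TOY_LEXICON):
    """Average the sentence scores (an assumed aggregation rule)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(score_sentence(s, lexicon) for s in sentences) / len(sentences)


review = "It worked great. Some burning at first!"
print(score_review(review))  # 1.5: sentence scores are 5 and -2
```

Each real lexicon assigns different words and weights, so running several of them on the same text, as the study did with five, gives a check on whether a result depends on one word list.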

RESULTS: Across all products, 3,891 reviews were collected for analysis (245 reviews were ultimately excluded for ineligibility). A relational SQL database was used to store and retrieve the data for analysis in R. Products included five Food and Drug Administration guideline-recommended drugs and four over-the-counter supplements. The scraped information included the product name, formulary, route of administration, user/reviewer attributes, five-star ratings, and free-text review data. All sentiment scores were significantly and positively correlated with five-star ratings. Visualizations, univariate summaries, and bivariate comparisons depicted patient-preferred products. Sentiment analysis scores and scatterplots revealed patient likes and dislikes regarding medication effectiveness, use, adherence, product characteristics, side effects, and value. Qualitative data included themes on the disease state, its impact on relationships, and patient interactions with healthcare providers.
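The storage and correlation steps reported above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the analysis itself was run in R, the abstract does not name the database engine or the correlation coefficient, and the table layout, column names, and rows here are made up. The sketch uses an in-memory SQLite database and a Spearman rank correlation.

```python
import sqlite3


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, stars INTEGER, sentiment REAL)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("product_a", 5, 1.8), ("product_a", 1, -2.1), ("product_b", 4, 0.9),
     ("product_b", 2, -0.4), ("product_b", 3, 0.1)],
)

rows = conn.execute("SELECT stars, sentiment FROM reviews").fetchall()
stars = [r[0] for r in rows]
sentiment = [r[1] for r in rows]
rho = spearman(stars, sentiment)
print(f"Spearman rho between stars and sentiment: {rho:.2f}")
conn.close()
```

A rank-based coefficient is one reasonable choice here because star ratings are ordinal; a significance test would be needed on top of this to support a claim like the one in the abstract.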

CONCLUSIONS: Online reviews of products used to treat medical conditions are a rich source of real-world data. Analyzing these data is a novel alternative to patient interviews and focus groups. The methods described here apply broadly across diseases, to new and emerging therapeutic areas, and to evidence generation in outcomes research.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

RWD119

Topic

Methodological & Statistical Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Data Protection, Integrity, & Quality Assurance

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Reproductive & Sexual Health
