OBJECTIVES:
To describe an innovative methodology that uses online reviews as a source of real-world data for understanding disease states, patient preferences, and how products used to treat medical conditions compare.
METHODS:
Data on nine products used to treat bacterial vaginosis (BV) were scraped from Drugs.com, WebMD.com, and Amazon.com. We (1) discuss ways to address pharmacovigilance and ethical concerns; (2) summarize how the data were collected, processed, and cleaned; (3) define how the text was tokenized and how narrative segments were determined; (4) describe the five lexicon-based algorithms used for sentiment analysis (i.e., sentimentr, AFINN, Bing, syuzhet, NRC); (5) explain the quantitative analyses conducted with five-star ratings, user attributes, and sentiment data; and (6) illustrate the qualitative analytic approach, including inductive and query-focused coding.
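As a rough illustration of steps (3) and (4), the sketch below splits a review into sentences, tokenizes it, and sums per-word weights from a lexicon. It is a minimal sketch in Python rather than the R workflow used in the study. The lexicon, its weights, the example sentences, and the function names `split_sentences`, `tokenize`, and `score_review` are all hypothetical and are not drawn from AFINN, Bing, NRC, or any other published lexicon.

```python
import re

# Toy lexicon for illustration only: these words and weights are invented
# and are NOT entries from AFINN, Bing, NRC, or any published lexicon.
TOY_LEXICON = {
    "great": 2,
    "worked": 1,
    "relief": 2,
    "burning": -2,
    "awful": -3,
    "nausea": -2,
}


def split_sentences(text):
    """Split text into sentences at whitespace that follows ., !, or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_review(text, lexicon=TOY_LEXICON):
    """Sum lexicon weights over all tokens; words not in the lexicon score 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


if __name__ == "__main__":
    review = "It worked great. No nausea!"  # made-up example text
    for sentence in split_sentences(review):
        print(sentence, score_review(sentence))
    print("review total:", score_review(review))
```

A plain word-sum like this ignores negation ("No nausea" still scores as negative), which is one reason to compare several lexicons and scoring approaches rather than rely on a single one.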
RESULTS:
Across all products, 3,891 reviews were collected for analysis (245 reviews were ultimately excluded for ineligibility). A relational SQL database was used to store and retrieve the data for analysis in R. Products included five Food and Drug Administration guideline-recommended drugs and four over-the-counter supplements. The scraped information included the product name, formulary, route of administration, user/reviewer attributes, five-star ratings, and free-text review data. All sentiment scores were significantly positively correlated with five-star ratings. Visualizations, univariate summaries, and bivariate comparisons depicted patient-preferred products. Sentiment analysis scores and scatterplots revealed patient likes and dislikes regarding medication effectiveness, use, adherence, product characteristics, side effects, and value. Qualitative data included themes on the disease state, its impact on relationships, and patient interactions with healthcare providers.
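The relationship between sentiment scores and star ratings can be checked with a rank correlation. The sketch below is illustrative only: the abstract does not state which correlation coefficient was used, so Spearman's rho is an assumption here, the function names are hypothetical, and the numbers are made-up placeholder data rather than study results.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder data for illustration; not taken from the study.
    star_ratings = [1, 2, 2, 4, 5, 5]
    sentiment_scores = [-3.0, -1.0, 0.5, 1.0, 2.5, 2.0]
    print(round(spearman(star_ratings, sentiment_scores), 3))
```

A rank-based coefficient is a natural choice when one variable is an ordinal five-point rating, although a significance test would still be needed to support a claim like the one reported above.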
CONCLUSIONS:
Online reviews of products used to treat medical conditions are a rich source of real-world data. Analyzing these data is a novel alternative to patient interviews and focus groups. The methods described here are broadly applicable across diseases and new and emerging therapeutic areas, and to evidence generation for outcomes research.