NLP and Machine Learning to Automate Identification of Suspected Medication Errors from Real World Unstructured Narratives

Author(s)

Painter J¹, Haguinet F², Cranfield C³, Bate A³
¹GlaxoSmithKline, Raleigh, NC, USA, ²GlaxoSmithKline, Wavre, Belgium, ³GSK, London, UK

Presentation Documents

ISPOR2023_Painter_Poster126837.pdf

OBJECTIVES:

To use machine learning (ML) to determine whether unstructured text (e.g., safety reports) mentions a medication error (MedError) and, if so, to sub-classify the error as an error with stated adverse drug reaction, an error without harm, an intercepted error, or a potential error. This classification is usually rules-based and requires manual review by trained safety scientists. ML models were built to automate this process, first identifying whether narratives contain MedErrors and then assigning the correct sub-classification.

METHODS:

A balanced, random set of case narratives (N = 3,122), labelled by safety scientists as mentioning a MedError or not, was collected for ML training. The narratives were processed using text vectorization, and all product names were masked to reduce potential bias. A Bernoulli naïve Bayes classifier was trained on 75% of the data, and the remaining 25% was held out to test model performance. Next, a multinomial naïve Bayes (MNB) classifier was built on 1,744 case narratives with manually annotated sub-classifications, and its performance was evaluated using cross-validation. Finally, for narratives that had not been sub-classified, concordance between the rules-based labels and the clusters produced by an unsupervised k-means method was assessed.
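The binary detection step can be sketched in plain Python as below. This is a minimal illustration, not the authors' pipeline: the product dictionary `PRODUCT_NAMES`, the placeholder token `_product_`, the regex tokenizer and the toy narratives are all hypothetical, and binary bag-of-words presence features stand in for the unspecified "text vectorization". The classifier is a standard Bernoulli naïve Bayes with Laplace smoothing, trained on a 75% split and scored by F1 on the held-out 25%.

```python
import math
import random
import re

# Hypothetical product dictionary; the abstract does not describe how names were masked.
PRODUCT_NAMES = {"drugalpha", "drugbeta"}


def mask_products(text):
    """Lower-case the text and replace known product names with a placeholder."""
    return re.sub(r"[a-z]+",
                  lambda m: "_product_" if m.group(0) in PRODUCT_NAMES else m.group(0),
                  text.lower())


def tokenize(text):
    """Binary bag-of-words: the set of tokens present after masking."""
    return set(re.findall(r"[a-z_]+", mask_products(text)))


class BernoulliNaiveBayes:
    def fit(self, docs, labels):
        self.vocab = set().union(*docs)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_p, self.log_not_p = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            n = len(class_docs)
            self.log_prior[c] = math.log(n / len(docs))
            self.log_p[c], self.log_not_p[c] = {}, {}
            for w in self.vocab:
                # Laplace-smoothed probability that a class-c narrative contains w
                p = (sum(w in d for d in class_docs) + 1) / (n + 2)
                self.log_p[c][w] = math.log(p)
                self.log_not_p[c][w] = math.log(1 - p)
        return self

    def predict_one(self, doc):
        # Every vocabulary word contributes: presence via log p, absence via log(1 - p).
        # Tokens outside the training vocabulary are ignored.
        scores = {
            c: self.log_prior[c] + sum(
                self.log_p[c][w] if w in doc else self.log_not_p[c][w]
                for w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for the positive (MedError) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy labelled narratives (1 = mentions a MedError); invented for illustration only.
NARRATIVES = [
    ("Patient took double dose of DrugAlpha by mistake", 1),
    ("Wrong drug dispensed, DrugBeta given instead", 1),
    ("Nurse noticed wrong dose before administration", 1),
    ("Expired DrugAlpha administered to patient", 1),
    ("Patient reported headache after DrugAlpha", 0),
    ("Rash developed on day three of DrugBeta therapy", 0),
    ("Nausea resolved after stopping DrugBeta", 0),
    ("Patient experienced dizziness, no other issues", 0),
]

if __name__ == "__main__":
    data = NARRATIVES[:]
    random.Random(0).shuffle(data)
    cut = int(0.75 * len(data))  # 75% train, 25% held out
    train, test = data[:cut], data[cut:]
    model = BernoulliNaiveBayes().fit([tokenize(t) for t, _ in train],
                                      [y for _, y in train])
    preds = [model.predict_one(tokenize(t)) for t, _ in test]
    print("held-out F1:", f1_binary([y for _, y in test], preds))
```

In practice a library implementation such as scikit-learn's `BernoulliNB` would normally be used; the hand-written version simply keeps the likelihood terms visible, including the penalty Bernoulli naïve Bayes applies for vocabulary words that are absent from a narrative.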

RESULTS:

The binary classifier for MedError detection achieved an F1-score of 86%. The MNB classifier for sub-classification achieved an F1-score of 66% averaged across sub-classes. Concordance between the rules-based labels and the unsupervised clusters had a kappa of 0.05, indicating near-chance agreement.
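The two summary metrics above can be computed as in the sketch below. It assumes that "averaged across sub-classes" means an unweighted (macro) mean of per-class F1-scores, and that k-means cluster IDs had already been mapped onto the rules-based sub-class labels before kappa was computed; the abstract states neither. The label strings are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); equals 0 when the class is never correctly hit
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from the product of the two raters' marginal frequencies
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1:  # degenerate case: both raters used one identical category
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    true_sub = ["adr", "adr", "no_harm", "intercepted"]
    pred_sub = ["adr", "no_harm", "no_harm", "intercepted"]
    print(round(macro_f1(true_sub, pred_sub), 3))  # 0.778
    print(cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

A kappa near 0, as in the second usage line, means the two labelings agree about as often as their marginal frequencies alone would predict, which is how a value of 0.05 should be read.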

CONCLUSIONS:

Binary classification for the detection of MedErrors showed good performance, whereas sub-classification performance on this data set was low, except for one sub-class: error with stated adverse drug reaction. Because accurate sub-classification is judged to be important, the value currently added by the sub-classifier is uncertain, and more training data will be required before its use can be considered. The results suggest, however, that binary classification could readily be employed to screen unstructured text for the presence of a MedError.

Conference/Value in Health Info

2023-05, ISPOR 2023, Boston, MA, USA

Value in Health, Volume 26, Issue 6, S2 (June 2023)

Code

MSR20

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
