NLP and Machine Learning to Automate Identification of Suspected Medication Errors from Real World Unstructured Narratives
Author(s)
Painter J¹, Haguinet F², Cranfield C³, Bate A³
¹GlaxoSmithKline, Raleigh, NC, USA; ²GlaxoSmithKline, Wavre, Belgium; ³GSK, London, UK
OBJECTIVES:
To use machine learning (ML) to determine whether unstructured text (e.g. safety reports) mentions a medication error (MedError) and, if so, to sub-classify it as error with stated adverse drug reaction, error without harm, intercepted error, or potential error. This classification is usually rules-based and requires manual review by trained safety scientists. ML models were built to automate this process, first identifying whether narratives contain a MedError, and then identifying the correct sub-classification.
METHODS:
A balanced, random set of case narratives (N = 3,122), labelled by safety scientists as mentioning a MedError or not, was collected for ML training. The narratives were processed using text vectorization, and all product names were masked to reduce potential bias. A Bernoulli naïve Bayes classifier was trained using 75% of the data, and 25% was held out to test model performance. Next, a multinomial naïve Bayes (MNB) classifier was built on 1,744 case narratives containing manually annotated sub-classifications. Performance was evaluated using cross-validation. For narratives not sub-classified, concordance between rules-based labels and unsupervised k-means clustering was then assessed.
RESULTS:
The binary classifier for the detection of MedErrors had an F1-score of 86%. The MNB classifier for sub-classification had an F1-score of 66% averaged across sub-classes. Concordance between rules-based labels and unsupervised clustering had a kappa of 0.05.
CONCLUSIONS:
Binary classification for the detection of MedErrors showed good performance, while sub-categorization performance with this data set was low, except for one sub-class, error with stated adverse drug reaction. Given the adjudged importance of accurate sub-categorization, the current value added is uncertain. More training data will be required before sub-categorization can be considered for use, while results suggest binary classification could be readily employed for screening unstructured text to determine whether a MedError is present.
Conference/Value in Health Info
2023-05, ISPOR 2023, Boston, MA, USA
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Code
MSR20
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas
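
Illustrative sketch
The binary-classification step described under METHODS (masking product names, vectorizing text, training a Bernoulli naïve Bayes classifier on a 75% split, and scoring with F1) might look roughly like the following. This is a minimal, standard-library illustration and not the authors' implementation: the product list, the toy narratives, the word-presence features, the Laplace smoothing, and every function and class name are assumptions made for the example.

```python
import math
import random
import re
from collections import Counter

# Hypothetical product list; the abstract does not say how names were detected.
PRODUCT_NAMES = ["drugA", "drugB"]

def mask_products(text, names=PRODUCT_NAMES, token="PRODUCT"):
    """Replace each known product name with a neutral placeholder token."""
    pattern = re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE)
    return pattern.sub(token, text)

def tokenize(text):
    """Mask product names, lowercase, and return the set of words present."""
    return set(re.findall(r"[a-z]+", mask_products(text).lower()))

def split(data, train_fraction=0.75, seed=0):
    """Shuffle reproducibly and split into (train, held-out) parts."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

class BernoulliNB:
    """Bernoulli naive Bayes over binary word-presence features."""

    def fit(self, docs, labels, alpha=1.0):
        self.vocab = sorted(set().union(*docs))
        self.classes = sorted(set(labels))
        n = Counter(labels)
        self.log_prior = {c: math.log(n[c] / len(labels)) for c in self.classes}
        self.p = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Smoothed probability that word w appears in a document of class c.
            self.p[c] = {
                w: (sum(w in d for d in class_docs) + alpha) / (n[c] + 2 * alpha)
                for w in self.vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            s = self.log_prior[c]
            for w in self.vocab:
                p = self.p[c][w]
                # Absent words also contribute, unlike the multinomial model.
                s += math.log(p if w in doc else 1.0 - p)
            return s
        return max(self.classes, key=score)

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy narratives, invented for illustration only (1 = mentions a MedError).
TRAIN = [
    ("Patient was given the wrong dose of drugA by mistake", 1),
    ("Nurse administered drugB twice in error", 1),
    ("Patient reported headache after taking drugA", 0),
    ("Patient experienced nausea while on drugB", 0),
]
model = BernoulliNB().fit([tokenize(t) for t, _ in TRAIN], [y for _, y in TRAIN])
```

With a real labelled corpus, `split` would produce the 75%/25% partition, the model would be fitted on the first part only, and `f1_score` would be computed on predictions for the held-out part. The toy set here is too small for that, so the model is fitted on all four narratives.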