Identifying Economic Insecurity Among Psoriasis Patients Using Natural Language Processing and Unstructured Clinical Notes
Author(s)
Kumar V, Mummert A, Rasouliyan L, Althoff A, Chang S, Long S
OMNY Health, Atlanta, GA, USA
OBJECTIVES: To train a natural language processing (NLP) model and apply it to clinical notes to identify economic insecurity among psoriasis patients, and to compare the findings with those obtained using structured clinical codes.
METHODS: Deidentified unstructured clinical data from patients in the OMNY Health Database who had International Classification of Diseases, Tenth Revision (ICD-10) codes for economic insecurity (Z59.4–Z59.7, Z59.86, Z59.87, Z59.9) were split into sentences. A random sample of sentences was annotated for the presence of economic insecurity and split into training, validation, and test sets. These data were used to fine-tune an open-source, transformer-based NLP model. Probability thresholds for model predictions were chosen to maximize precision at an acceptable recall level. The model was then applied to non-templated sentences from psoriasis patients in 5 specialty dermatology networks from 2017 to 2019. The number of patients with at least one positively predicted sentence (i.e., a sentence indicating economic insecurity) was calculated and compared with the count of clinical-code-positive patients. A random sample of positively predicted sentences was checked for accuracy.
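The threshold selection and patient-level counting described in the methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (precision_recall, pick_threshold, count_positive_patients), the tie-breaking rule, and the (patient_id, probability) record layout are all assumptions made for illustration, and the abstract does not state what recall level was considered acceptable.

```python
from typing import Iterable, Optional, Sequence, Tuple


def precision_recall(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when sentences with prob >= threshold are called positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    probs: Sequence[float], labels: Sequence[int], min_recall: float
) -> Optional[Tuple[float, float, float]]:
    """Return (precision, recall, threshold) with the highest precision among
    thresholds whose recall is at least min_recall (hypothetical criterion).
    Ties on precision are broken in favor of higher recall.
    Returns None if no threshold reaches min_recall."""
    best = None
    for t in sorted(set(probs)):  # each observed score is a candidate cut-off
        precision, recall = precision_recall(probs, labels, t)
        if recall < min_recall:
            continue
        if best is None or (precision, recall) > best[:2]:
            best = (precision, recall, t)
    return best


def count_positive_patients(
    records: Iterable[Tuple[str, float]], threshold: float
) -> int:
    """Count distinct patients with at least one sentence scored >= threshold.
    Each record is an assumed (patient_id, sentence_probability) pair."""
    return len({pid for pid, prob in records if prob >= threshold})
```

In this sketch the threshold would be chosen on the annotated validation sentences and then held fixed when scoring the psoriasis notes, so that a patient counts as positive as soon as any one of their sentences clears it.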
RESULTS: A total of 2,007 sentences were sampled and annotated. At the chosen probability threshold of 0.91, precision/recall were 0.91/0.66. Application of the model to the psoriasis population (50,969 patients) yielded 686 patients positive for economic insecurity, compared to zero patients having corresponding clinical codes. Of the 100 positive sentences checked for accuracy, 60 were true positives.
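The reported counts can be turned into rates with a few lines of arithmetic. The sketch below uses only the figures stated in the results; the Wilson score interval is an added illustration of the sampling uncertainty in a 100-sentence spot check and is not something the abstract reports, and the function name wilson_interval is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Figures taken from the RESULTS paragraph.
nlp_positive_patients = 686
psoriasis_patients = 50_969
checked_sentences = 100
true_positive_sentences = 60

# Share of psoriasis patients flagged by the model (about 1.3%).
patient_yield = nlp_positive_patients / psoriasis_patients

# Sentence-level precision observed in the manual spot check.
spot_check_precision = true_positive_sentences / checked_sentences

# Illustrative uncertainty range for the spot-check precision (an assumption,
# not a reported result).
low, high = wilson_interval(true_positive_sentences, checked_sentences)

print(f"patient yield: {patient_yield:.2%}")
print(f"spot-check precision: {spot_check_precision:.2f}")
print(f"approx. 95% Wilson interval: {low:.2f} to {high:.2f}")
```

The spot-check precision of 0.60 is lower than the 0.91 measured at the chosen threshold, which is consistent with the conclusion that further training or threshold adjustment is needed. The spot check is at the sentence level, so it should not be read as a direct estimate of how many of the 686 patients are true positives.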
CONCLUSIONS: NLP applied to clinical notes can identify social determinants of health (SDoH) in patients that are not otherwise detectable using ICD-10 codes. Further model training and/or threshold adjustment is needed to improve accuracy. Similar analyses were performed for the SDoH domains of undereducation and housing insecurity, and patient yields were significantly lower (not reported here), suggesting that some SDoH domains may be underreported in specialty clinical notes relative to others. Further research is needed to improve models for additional SDoH domains and to obtain predictions for both primary care and specialty patients.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
MSR117
Topic
Health Policy & Regulatory, Methodological & Statistical Research, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Distributed Data & Research Networks, Electronic Medical & Health Records, Health Disparities & Equity
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Sensory System Disorders (Ear, Eye, Dental, Skin)