Clinical Validation of an N-Gram Model for Detecting Economic Insecurity in Patient Populations Using Unstructured Clinical Notes
Author(s)
Kumar V (1), Darbeloff T (1), Mummert A (2), Rasouliyan L (1)
(1) OMNY Health, Atlanta, GA, USA; (2) OMNY Health, Sacramento, CA, USA
OBJECTIVES: Because structured social determinants of health (SDoH) measures are often unavailable in real-world health data, one potential solution is to run n-gram models on unstructured clinical notes to predict the presence of SDoH risk factors. Our objective was to investigate the credibility of an n-gram model in predicting economic insecurity.
METHODS: Unstructured clinical notes (2017-2022) from encounters across three hospital systems in the OMNY Health Database were examined for the presence of phrases indicative of economic insecurity, as defined in a previously published study. Model performance and clinical validity were measured by assessing the most predictive terms and the geographic distribution of positive encounters. Clinical characteristics such as age, gender, payer type, urban/rural location, and employment status were compared between patient encounters positively identified for economic insecurity and those without.
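The phrase-matching step described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the phrase list contains only the five most frequent phrases reported in RESULTS (the full lexicon from the previously published study is not reproduced here), and the function names `compile_phrases`, `find_phrases`, and `flag_encounter` are hypothetical.

```python
import re

# Assumption: only the five most frequent phrases reported in RESULTS.
# The full phrase lexicon from the previously published study is not shown here.
PHRASES = [
    "Medicaid",
    "homeless",
    "shelter",
    "too expensive",
    "not covered by insurance",
]


def compile_phrases(phrases):
    """Build one case-insensitive, word-boundary-anchored pattern."""
    parts = []
    # Longer phrases first so the alternation prefers the longest match.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = [re.escape(word) for word in phrase.split()]
        # Allow any run of whitespace (including line breaks) between words.
        parts.append(r"\s+".join(words))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def find_phrases(note, pattern):
    """Return the distinct matched phrases, lowercased and whitespace-normalized."""
    hits = {" ".join(m.group(0).lower().split()) for m in pattern.finditer(note)}
    return sorted(hits)


def flag_encounter(note, pattern):
    """True if the note contains at least one phrase from the lexicon."""
    return pattern.search(note) is not None


PATTERN = compile_phrases(PHRASES)

if __name__ == "__main__":
    note = "Pt reports meds are too  expensive; currently staying at a shelter."
    print(find_phrases(note, PATTERN))
    print(flag_encounter("Follow-up for hypertension, no concerns.", PATTERN))
```

Matching on whole words avoids flagging a note for "sheltered" when the lexicon entry is "shelter". A production pipeline would likely also need negation handling ("denies being homeless"), which this sketch does not attempt.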
RESULTS: A total of 126 million patient encounters were assessed, of which 1.47 million (1.2%) were positive for economic insecurity phrases. Notes for encounters associated with economic insecurity had 7.7 times as many characters (mean, 47,392; standard deviation, 244,941) as notes for all encounters (mean, 6,109; standard deviation, 42,774). The five most frequent phrases predicting a positive indication of economic insecurity were “Medicaid,” “homeless,” “shelter,” “too expensive,” and “not covered by insurance.” The five ZIP-3 locations with the most positive encounters were 436 (Toledo, OH); 232 (Richmond, VA); 452 (Cincinnati, OH); 180 (Lehigh Valley, PA); and 458 (Lima, OH). Compared with all encounters, those associated with economic insecurity were more likely to have Medicaid (28.7% vs 5.4%), no payer (1.4% vs 0.4%), or Medicare (23.4% vs 16.7%) as payer type and unemployed as employment status (33.3% vs 18.7%).
CONCLUSIONS: Results suggest that positive identification of economic insecurity by an n-gram model is confounded by note length. Qualitative examination of the most common n-grams and demographic characteristics aligns with intuition. The ZIP-3 locations may reflect the underlying distribution of patients in the hospital systems.
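One way to see why note length can confound a phrase-match flag is a toy calculation: if every character position had a small, independent chance of starting a lexicon phrase, longer notes would be flagged more often even with no difference in the underlying patients. The sketch below assumes exactly that simplification. The per-character rate is a made-up value for illustration, `p_flag` is a hypothetical helper, and none of this is the authors' analysis; only the two mean note lengths come from RESULTS.

```python
def p_flag(n_chars, hit_rate_per_char):
    """Probability of at least one phrase match in a note of n_chars characters,
    assuming independent matches at a constant per-character rate
    (a deliberate simplification)."""
    return 1.0 - (1.0 - hit_rate_per_char) ** n_chars


# Assumption: illustrative rate only, not estimated from any data.
HYPOTHETICAL_RATE = 1e-6

# Mean note lengths reported in RESULTS.
MEAN_CHARS_ALL = 6_109
MEAN_CHARS_POSITIVE = 47_392

if __name__ == "__main__":
    for label, n in [("all encounters", MEAN_CHARS_ALL),
                     ("positive encounters", MEAN_CHARS_POSITIVE)]:
        print(f"{label}: length={n}, P(flag)={p_flag(n, HYPOTHETICAL_RATE):.4f}")
```

Under this toy model the flag probability rises monotonically with note length, which is why comparisons between flagged and unflagged encounters may need to adjust for, or stratify by, note length.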
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR62
Topic
Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Distributed Data & Research Networks, Electronic Medical & Health Records, Patient-reported Outcomes & Quality of Life Outcomes
Disease
No Additional Disease & Conditions/Specialized Treatment Areas