Clinical Validation of an N-Gram Model for Detecting Economic Insecurity in Patient Populations Using Unstructured Clinical Notes

Author(s)

Kumar V1, Darbeloff T1, Mummert A2, Rasouliyan L1
1OMNY Health, Atlanta, GA, USA, 2OMNY Health, Sacramento, CA, USA

OBJECTIVES: Because structured social determinants of health (SDoH) measures are often unavailable in real-world health data, one potential solution is to run n-gram models on unstructured clinical notes to predict the presence of SDoH risk factors. Our objective was to investigate the credibility of an n-gram model in predicting economic insecurity.

METHODS: Unstructured clinical notes (2017-2022) from encounters across three hospital systems in the OMNY Health Database were examined for the presence of phrases indicative of economic insecurity, as defined in a previously published study. Model performance and clinical validity were assessed by examining the most predictive terms and the geographic and demographic distribution of positive encounters. Clinical characteristics such as age, gender, payer type, urban/rural location, and employment status were compared between patient encounters positively identified for economic insecurity and those without.
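The phrase-matching step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon holds only the five most frequent phrases reported in the results (the full phrase list from the previously published study is not reproduced here), and the names `ECONOMIC_INSECURITY_PHRASES`, `flag_encounter`, and `matched_phrases` are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical lexicon: only the five top phrases reported in RESULTS.
# The study's full phrase list is not reproduced here.
ECONOMIC_INSECURITY_PHRASES = [
    "medicaid",
    "homeless",
    "shelter",
    "too expensive",
    "not covered by insurance",
]


def build_pattern(phrases: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive regex matching any phrase on word boundaries."""
    # Multi-word phrases tolerate any run of whitespace between their words.
    alternatives = [r"\s+".join(map(re.escape, p.split())) for p in phrases]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(ECONOMIC_INSECURITY_PHRASES)


def flag_encounter(note_text: str) -> bool:
    """Return True if the note contains at least one lexicon phrase."""
    return PATTERN.search(note_text) is not None


def matched_phrases(note_text: str) -> list:
    """Return each phrase occurrence, lowercased and whitespace-normalized,
    in order of appearance (useful for term-frequency tallies)."""
    return [" ".join(m.group(0).lower().split()) for m in PATTERN.finditer(note_text)]
```

For example, with made-up note text, `flag_encounter("Meds are too expensive; staying at a shelter.")` is `True`, while `"Patient sheltered in place."` is not flagged because the word-boundary anchors reject partial-word hits. Passing `matched_phrases` output to `collections.Counter` gives phrase frequencies of the kind reported in the results.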

RESULTS: A total of 126 million patient encounters were assessed, of which 1.47 million (1.2%) were positive for economic insecurity phrases. Notably, notes for encounters associated with economic insecurity had 7.7 times as many characters (mean 47,392; standard deviation 244,941) as notes for all encounters (mean 6,109; standard deviation 42,774). The five most frequent phrases predicting a positive indication of economic insecurity were “Medicaid,” “homeless,” “shelter,” “too expensive,” and “not covered by insurance.” The five ZIP-3 locations with the most positive encounters were 436 (Toledo, OH), 232 (Richmond, VA), 452 (Cincinnati, OH), 180 (Lehigh Valley, PA), and 458 (Lima, OH). Compared with all encounters, encounters associated with economic insecurity were more likely to have Medicaid (28.7% vs 5.4%), no payer (1.4% vs 0.4%), or Medicare (23.4% vs 16.7%) as payer type, and to have unemployed as employment status (33.3% vs 18.7%).
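The group comparisons above (share of flagged encounters with a given characteristic versus the share among all encounters, and the ratio of mean note lengths) can be sketched as below. The `Encounter` record and its field names are hypothetical, since the database schema is not described. Because METHODS describes a comparison with encounters "without" a positive flag while RESULTS reports comparisons with all encounters, the sketch lets the caller choose either reference group.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Encounter:
    flagged: bool      # hypothetical: output of the phrase matcher
    payer: str         # hypothetical, e.g. "Medicaid", "Medicare", "None"
    employment: str    # hypothetical, e.g. "Unemployed", "Employed"
    note_chars: int    # total characters across the encounter's notes


def percent_with(encounters: Sequence[Encounter], attr: str, value: str) -> float:
    """Percentage of encounters whose field `attr` equals `value` (NaN if empty)."""
    if not encounters:
        return float("nan")
    hits = sum(1 for e in encounters if getattr(e, attr) == value)
    return 100.0 * hits / len(encounters)


def compare_share(encounters, attr, value, reference="all"):
    """Return (percent among flagged, percent among the reference group).

    reference="all" mirrors the comparison reported in RESULTS;
    reference="unflagged" compares against encounters without a positive flag.
    """
    flagged = [e for e in encounters if e.flagged]
    if reference == "all":
        ref = list(encounters)
    elif reference == "unflagged":
        ref = [e for e in encounters if not e.flagged]
    else:
        raise ValueError(f"unknown reference group: {reference!r}")
    return percent_with(flagged, attr, value), percent_with(ref, attr, value)


def mean_chars(encounters: Sequence[Encounter]) -> float:
    """Mean note length in characters."""
    return sum(e.note_chars for e in encounters) / len(encounters)


def length_ratio(encounters: Sequence[Encounter]) -> float:
    """Mean note length of flagged encounters divided by that of all encounters."""
    flagged = [e for e in encounters if e.flagged]
    return mean_chars(flagged) / mean_chars(encounters)
```

Note that comparing against all encounters includes the flagged group in the denominator; with only about 1.2% of encounters flagged, the two reference choices should give similar percentages in this setting.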

CONCLUSIONS: Results show that positive identification of economic insecurity using an n-gram model is confounded by note length. Qualitative examination of the most common n-grams and demographic characteristics aligns with intuition. The ZIP-3 locations may reflect the underlying distribution of patients in the hospital systems.
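Why note length can confound a phrase-based flag, and one simple way to probe it, can be sketched as follows. Both functions are hypothetical illustrations, not part of the study: `p_any_match` is a deliberately simplified model that assumes each token independently contains a lexicon phrase with a fixed small probability, so the chance of at least one hit, 1 - (1 - p)^n, rises with note length n; `flag_rate_by_length` tabulates positive-flag rates within note-length bins whose edges are arbitrary choices.

```python
from bisect import bisect_right
from typing import Iterable, Sequence, Tuple


def p_any_match(p_per_token: float, n_tokens: int) -> float:
    """Toy independence model: probability of at least one lexicon hit
    when each of n_tokens tokens hits independently with probability p_per_token."""
    return 1.0 - (1.0 - p_per_token) ** n_tokens


def flag_rate_by_length(records: Iterable[Tuple[int, bool]], edges: Sequence[int]) -> list:
    """Positive-flag rate within note-length bins.

    records: (note length in characters, flagged) pairs.
    Bin 0 holds notes shorter than edges[0]; bin i holds
    edges[i-1] <= length < edges[i]; the last bin holds notes at least
    edges[-1] characters long. Empty bins return NaN.
    """
    totals = [0] * (len(edges) + 1)
    positives = [0] * (len(edges) + 1)
    for n_chars, flagged in records:
        i = bisect_right(edges, n_chars)  # index of the bin this length falls in
        totals[i] += 1
        positives[i] += int(flagged)
    return [p / t if t else float("nan") for p, t in zip(positives, totals)]
```

Comparing patient characteristics within, rather than across, note-length bins is one basic form of stratification; it reduces but does not by itself eliminate confounding by note length.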

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

MSR62

Topic

Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Distributed Data & Research Networks, Electronic Medical & Health Records, Patient-reported Outcomes & Quality of Life Outcomes

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
