Real-World Evidence Study of Patients With NSCLC in Finland: Use of Machine Learning Algorithm to Extract Smoking Status From Patient Texts and Analysis of Resource Use and Survival by Smoking Status

Author(s)

Ekroos H1, Koistinen V2, Hölsä O3, Mattila R3, Knuuttila A4
1HUS Porvoo Hospital, Porvoo, Finland, 2Wellbeing services county of Kymenlaakso, Kotka, Finland, 3Medaffcon Oy, Espoo, Finland, 4Helsinki University Hospital, Helsinki, Finland

OBJECTIVES: Smoking is a known significant risk factor for NSCLC and shortens survival, yet smoking status is commonly recorded as unstructured data in medical records, which complicates its use in real-world evidence (RWE) studies. As part of a larger data collection, we identified the smoking status of NSCLC patients from patient texts and analyzed overall survival (OS) and healthcare resource utilization (HCRU) by smoking status.

METHODS: We included electronic health records of patients diagnosed with NSCLC between January 2013 and August 2023 at Helsinki University Hospital, Finland. Smoking status was identified from patient texts using a pretrained machine learning (ML) classification algorithm. All-cause specialized care resource use and costs (outpatient contacts, ER visits, and hospital admissions) and OS were analyzed by smoking status during the NSCLC follow-up.
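
As an illustration of how OS might be summarized per smoking group, the sketch below computes a Kaplan-Meier median survival time in plain Python. The abstract does not state which estimator was used, so the Kaplan-Meier choice, the function name `km_median`, and the toy follow-up data are all assumptions made for illustration and are not study data.

```python
def km_median(durations, events):
    """Return the Kaplan-Meier median survival time, or None if not reached.

    durations: follow-up time per patient (for example, months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    # Distinct times at which at least one death was observed, ascending
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    for t in event_times:
        # Patients still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Deaths observed exactly at time t
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        # Median is the first time the survival estimate falls to 0.5 or below
        if survival <= 0.5:
            return t
    return None


# Hypothetical toy cohort (months of follow-up, event indicator); not study data
toy_durations = [2, 5, 5, 8, 12, 20]
toy_events = [1, 1, 0, 1, 1, 0]
toy_median = km_median(toy_durations, toy_events)
```

In a real analysis the function would be applied separately to each smoking-status group, and confidence intervals would come from a dedicated survival analysis library rather than this minimal sketch.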

RESULTS: Of the 6 248 identified NSCLC patients, 93% had a known smoking status. Among these, first-year follow-up costs were 26 146.63 € for smokers (N=2 369, 41%; 95% CI 25 261.66-27 031.60), 25 904.96 € for ex-smokers (N=2 721, 47%; 95% CI 25 066.16-26 743.76), and 25 220.73 € for nonsmokers (N=720, 12%; 95% CI 23 277.51-27 163.95). OS was 11.0 months for smokers (95% CI 10.0-12.2), 12.9 months for ex-smokers (95% CI 11.8-14.2), and 21.7 months for nonsmokers (95% CI 18.8-26.7).
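
The reported percentages can be checked with a few lines of arithmetic. The sketch below uses only the patient counts given in the results; the variable names are illustrative. It suggests that the group percentages are shares of the patients with a known smoking status, not of all identified patients.

```python
# Counts taken from the RESULTS paragraph
total_patients = 6248
groups = {"smokers": 2369, "ex-smokers": 2721, "nonsmokers": 720}

# Patients with a known smoking status
known = sum(groups.values())

# Share of all identified patients with a known status, in whole percent
known_share = round(100 * known / total_patients)

# Share of each group among patients with a known status, in whole percent
group_shares = {name: round(100 * n / known) for name, n in groups.items()}

print(known, known_share, group_shares)
```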

CONCLUSIONS: The ML algorithm successfully identified smoking status from patient texts for the vast majority of patients. First-year all-cause specialized care costs were nearly the same regardless of smoking status, whereas nonsmokers had longer survival than ex-smokers and smokers.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR168

Topic

Clinical Outcomes, Economic Evaluation, Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment, Electronic Medical & Health Records

Disease

Oncology, Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory)
