Real-World Evidence Study of Patients With NSCLC in Finland: Use of Machine Learning Algorithm to Extract Smoking Status From Patient Texts and Analysis of Resource Use and Survival by Smoking Status
Author(s)
Ekroos H1, Koistinen V2, Hölsä O3, Mattila R3, Knuuttila A4
1HUS Porvoo Hospital, Porvoo, Finland, 2Wellbeing services county of Kymenlaakso, Kotka, Finland, 3Medaffcon Oy, Espoo, Finland, 4Helsinki University Hospital, Helsinki, Finland
OBJECTIVES: Smoking is a known significant risk factor in NSCLC and is associated with shorter survival, yet smoking status is commonly recorded as unstructured data in medical records, which complicates its use in real-world evidence (RWE) studies. As part of a larger data collection, we identified the smoking status of NSCLC patients from patient texts and analyzed their overall survival (OS) and healthcare resource utilization (HCRU) by smoking status.
METHODS: The study included electronic health records of patients diagnosed with NSCLC between January 2013 and August 2023 at Helsinki University Hospital, Finland. Smoking status was identified from patient texts using a pretrained machine learning (ML) classification algorithm. All-cause specialized care resource use and costs (outpatient contacts, emergency room [ER] visits, and hospital admissions) and OS were analyzed by smoking status during NSCLC follow-up.
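The abstract reports OS by smoking status but does not name the survival estimator. Assuming a standard Kaplan-Meier analysis (an assumption, not stated by the authors), the per-group median OS could be computed roughly as in the sketch below. The function name `kaplan_meier_median` and the toy data are hypothetical and for illustration only; they are not study data or the authors' code.

```python
from collections import Counter
from fractions import Fraction


def kaplan_meier_median(times, events):
    """Kaplan-Meier median survival: the earliest time at which the
    estimated survival probability S(t) falls to 0.5 or below.

    times  -- follow-up time per patient (e.g. months from diagnosis)
    events -- 1 if death was observed at that time, 0 if censored
    Returns None if S(t) never reaches 0.5 (median not reached).
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths plus censorings at each distinct time
    n_at_risk = len(times)
    survival = Fraction(1)  # exact arithmetic avoids rounding issues at S = 0.5
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Product-limit step; patients censored at t still count as at risk.
            survival *= Fraction(n_at_risk - d, n_at_risk)
            if survival <= Fraction(1, 2):
                return t
        n_at_risk -= exits[t]  # everyone whose time is t leaves the risk set
    return None


# Hypothetical toy data (not study data): (months, death observed) per group.
toy_cohort = {
    "smoker": ([3, 6, 9, 12], [1, 1, 1, 0]),
    "nonsmoker": ([10, 20, 30, 40], [1, 0, 1, 0]),
}
for status, (months, died) in toy_cohort.items():
    print(status, kaplan_meier_median(months, died))
# prints "smoker 6" then "nonsmoker 30"
```

This sketch omits the 95% CIs reported in the abstract; those would additionally require a variance estimate for S(t), such as Greenwood's formula.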
RESULTS: Of the 6 248 identified NSCLC patients, 93% had a known smoking status. Among patients with a known status, first-year follow-up costs were 26 146.63 € (95% confidence interval [CI] 25 261.66-27 031.60) for smokers (N = 2 369, 41%), 25 904.96 € (95% CI 25 066.16-26 743.76) for ex-smokers (N = 2 721, 47%), and 25 220.73 € (95% CI 23 277.51-27 163.95) for nonsmokers (N = 720, 12%). OS was 11.0 months (95% CI 10.0-12.2) for smokers, 12.9 months (95% CI 11.8-14.2) for ex-smokers, and 21.7 months (95% CI 18.8-26.7) for nonsmokers.
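The group percentages use the patients with a known smoking status as the denominator, not all 6 248 patients. The short check below reproduces the reported figures from the group sizes given above; the helper `pct` is a hypothetical name introduced only for this illustration.

```python
TOTAL_PATIENTS = 6248
GROUP_SIZES = {"smokers": 2369, "ex-smokers": 2721, "nonsmokers": 720}


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


known = sum(GROUP_SIZES.values())  # patients with a known smoking status
print("known smoking status:", known, f"({pct(known, TOTAL_PATIENTS)}%)")
print("unknown smoking status:", TOTAL_PATIENTS - known)
for group, n in GROUP_SIZES.items():
    # Share among known-status patients vs. share of the whole cohort.
    print(f"{group}: {pct(n, known)}% of known, {pct(n, TOTAL_PATIENTS)}% of all")
```

The "of known" shares (41%, 47%, 12%) match the abstract, while shares of the whole cohort would be 38%, 44%, and 12%.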
CONCLUSIONS: Using the ML algorithm, smoking status was successfully identified from patient texts for the vast majority (93%) of patients. All-cause HCRU costs were nearly the same regardless of smoking status, whereas nonsmokers had longer survival than ex-smokers and smokers.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR168
Topic
Clinical Outcomes, Economic Evaluation, Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment, Electronic Medical & Health Records
Disease
Oncology, Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory)