ADDRESSING SAMPLE SIZE CHALLENGES IN LINKED DATA THROUGH DATA FUSION USING ARTIFICIAL NEURAL NETWORKS
Author(s)
Arunajadai S (1), Lee L (2), Haskell T (3)
(1) Kantar, New York, NY, USA; (2) Kantar Health, Diamond Bar, CA, USA; (3) Kantar, Havertown, PA, USA
OBJECTIVES: Linking secondary clinical data with patient-reported data at the patient level gives a comprehensive view of the patient, but sample sizes can be a challenge. This study demonstrates the fusion of patient-reported outcomes (PROs) from surveys with clinical data from claims, enabling the study of associations between quality of life and disease-treatment interactions at scale, especially for rare diseases.
METHODS: The PROs SF-36v2 PCS, MCS, SF6D, and EQ5D were available in the National Health and Wellness Survey (N=345K). Clinical information from Komodo Health, a large U.S. database of health insurance claims (N=200M), was obtained using ICD, CPT/HCPCS, and NDC codes. A total of 104K patients could be linked across the two data sets. Fusion was performed using an artificial neural network-based predictive model followed by predictive mean matching. The linked data were used to train, validate, and test the fusion methodology. The method allows simultaneous imputation of the 4 PROs. Results were assessed for the general patient population (GP), type-2 diabetes (T2D), and Myasthenia Gravis (MG), a rare disease, and again after stratifying by age and gender.
RESULTS: Each triplet of numbers corresponds to the 3 cohorts (GP, T2D, MG). The numbers of patients in the test set were N: (5207, 898, 100). The differences between the observed and imputed means were PCS: (0.23, -0.23, -0.22), MCS: (-0.009, 0.14, 1.5), EQ5D: (0.002, -0.005, -0.01), and SF6D: (0.002, -0.002, -0.004). We failed to reject the null hypothesis of no difference in all cases. All differences were smaller than the respective minimal clinically important difference. Similar results were observed when stratified by age and gender. The correlations between the imputed PROs mimic the observed correlations (absolute difference < 0.05).
CONCLUSIONS: This study shows that data fusion is a suitable substitute for linkage when the overlap between data sources is small, making it possible to study the effects of clinical variables on PROs.
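ILLUSTRATION: The abstract does not describe how the predictive mean matching step was implemented, so the following is only a minimal sketch under stated assumptions. It assumes that a predictive model (for example, the neural network) has already produced predicted values of the 4 PROs for both linked patients (donors) and claims-only patients (recipients). The function name `predictive_mean_matching`, the choice of k nearest donors, and the standardized Euclidean distance are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np


def predictive_mean_matching(pred_donor, obs_donor, pred_recipient, k=5, rng=None):
    """Impute outcomes for recipients by borrowing observed donor values.

    pred_donor:     (n_donor, n_outcomes) model predictions for linked patients
    obs_donor:      (n_donor, n_outcomes) observed PROs for the same patients
    pred_recipient: (n_recip, n_outcomes) model predictions for unlinked patients
    k:              number of nearest donors to sample from (assumed value)
    """
    rng = np.random.default_rng(rng)
    pred_donor = np.asarray(pred_donor, dtype=float)
    obs_donor = np.asarray(obs_donor, dtype=float)
    pred_recipient = np.asarray(pred_recipient, dtype=float)

    # Standardize predictions so outcomes on different scales
    # (e.g. PCS vs. EQ5D) contribute comparably to the distance.
    scale = pred_donor.std(axis=0)
    scale[scale == 0] = 1.0
    d = pred_donor / scale
    r = pred_recipient / scale

    # Squared Euclidean distance between every recipient and every donor,
    # computed in the space of predicted values: shape (n_recip, n_donor).
    dist = ((r[:, None, :] - d[None, :, :]) ** 2).sum(axis=2)

    # Indices of the k closest donors for each recipient (unordered).
    k = min(k, d.shape[0])
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]

    # Draw one of the k candidate donors at random for each recipient.
    pick = rng.integers(0, k, size=r.shape[0])
    donor_idx = nearest[np.arange(r.shape[0]), pick]

    # Copy the donor's full observed row, so all outcomes are imputed jointly.
    return obs_donor[donor_idx]


if __name__ == "__main__":
    # Synthetic stand-in data; not the study data.
    demo_rng = np.random.default_rng(0)
    pred_linked = demo_rng.normal(size=(200, 4))
    obs_linked = pred_linked + demo_rng.normal(scale=0.5, size=(200, 4))
    pred_unlinked = demo_rng.normal(size=(50, 4))
    imputed = predictive_mean_matching(pred_linked, obs_linked, pred_unlinked, rng=1)
    print(imputed.shape)
```

Because each recipient receives an entire observed row from a single donor in this sketch, the imputed values are always real observed PRO values, and the 4 PROs are imputed together rather than one at a time.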
Conference/Value in Health Info
2020-05, ISPOR 2020, Orlando, FL, USA
Value in Health, Volume 23, Issue 5, S1 (May 2020)
Code
PNS164
Topic
Methodological & Statistical Research, Patient-Centered Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Patient-reported Outcomes & Quality of Life Outcomes
Disease
No Specific Disease