ADDRESSING SAMPLE SIZE CHALLENGES IN LINKED DATA THROUGH DATA FUSION USING ARTIFICIAL NEURAL NETWORKS

Author(s)

Arunajadai S (1), Lee L (2), Haskell T (3)
(1) Kantar, New York, NY, USA; (2) Kantar Health, Diamond Bar, CA, USA; (3) Kantar, Havertown, PA, USA

OBJECTIVES: Linking secondary clinical data with patient-reported data at the patient level provides a comprehensive view of the patient, but sample sizes can be a challenge. This study demonstrates the fusion of patient-reported outcomes (PROs) from surveys with clinical data from claims, enabling the study of associations between quality of life and disease-treatment interactions at scale, especially for rare diseases.
METHODS: The PROs SF-36v2 PCS, MCS, SF6D, and EQ5D were available in the National Health and Wellness Survey (N=345K). Clinical information from Komodo Health, a large U.S. database of health insurance claims (N=200M), was obtained using ICD, CPT/HCPCS, and NDC codes. A total of 104K patients were linkable across the two data sets. The fusion process used an artificial neural network-based predictive model followed by predictive mean matching. The linked data were used to train, validate, and test the fusion methodology. The method allows for the simultaneous imputation of the 4 PROs. Results were assessed for the general patient population (GP), type-2 diabetes (T2D), and myasthenia gravis (MG), a rare disease. Results were also assessed after stratifying by age and gender.
RESULTS: Each triplet of numbers corresponds to the 3 cohorts (GP, T2D, MG). The numbers of patients in the test set were N: (5207, 898, 100). The differences between the observed and imputed means were PCS: (0.23, -0.23, -0.22), MCS: (-0.009, 0.14, 1.5), EQ5D: (0.002, -0.005, -0.01), and SF6D: (0.002, -0.002, -0.004). We failed to reject the hypothesis of no difference in all cases. All differences were less than the respective minimal clinically important difference. Similar results were observed when stratified by age and gender. The correlations between the imputed PROs mimic the observed correlations (absolute difference < 0.05).
CONCLUSIONS: This study shows that data fusion is a suitable substitute for linkage when studying the effects of clinical variables on PROs in settings where the overlap between data sources is small.
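
ILLUSTRATION: The abstract does not give implementation details for the predictive mean matching step, so the following is only a minimal sketch of how such a step might look, not the authors' method. It assumes that a predictive model (for example, a neural network) has already produced predicted PRO vectors for both the linked patients (donors) and the claims-only patients (recipients). The function name `predictive_mean_match`, the parameter `k`, the Euclidean distance, and the toy values are all hypothetical. Copying a donor's whole observed vector is one plausible way to impute several PROs simultaneously while keeping their observed correlations.

```python
import math
import random


def predictive_mean_match(donor_pred, donor_obs, recipient_pred, k=3, seed=0):
    """Impute outcome vectors for recipients by predictive mean matching.

    donor_pred:     predicted outcome vectors for donors (linked patients)
    donor_obs:      observed outcome vectors for the same donors
    recipient_pred: predicted outcome vectors for recipients (claims only)
    k:              number of nearest donors to draw from (assumed parameter)
    """
    if len(donor_pred) != len(donor_obs):
        raise ValueError("donor_pred and donor_obs must have the same length")
    if not donor_pred:
        raise ValueError("at least one donor is required")

    rng = random.Random(seed)
    k = min(k, len(donor_pred))
    imputed = []
    for pred in recipient_pred:
        # Rank donors by distance between predicted vectors (not observed ones).
        order = sorted(
            range(len(donor_pred)),
            key=lambda i: math.dist(pred, donor_pred[i]),
        )
        # Draw one of the k closest donors and copy its observed vector whole,
        # so all outcomes for a recipient come from the same real patient.
        chosen = rng.choice(order[:k])
        imputed.append(tuple(donor_obs[chosen]))
    return imputed


# Toy values for illustration only; they are not taken from the study.
donor_pred = [(50.0, 48.0), (40.0, 52.0), (30.0, 45.0)]
donor_obs = [(52.1, 47.0), (38.5, 55.2), (29.0, 44.1)]
recipient_pred = [(49.0, 48.5), (31.0, 46.0)]

imputed = predictive_mean_match(donor_pred, donor_obs, recipient_pred, k=1)
print(imputed)
```

With `k=1` each recipient receives the observed values of its single closest donor; a larger `k` adds random variation among near matches, which is the usual reason for drawing from several candidates rather than always taking the nearest.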

Conference/Value in Health Info

2020-05, ISPOR 2020, Orlando, FL, USA

Value in Health, Volume 23, Issue 5, S1 (May 2020)

Code

PNS164

Topic

Methodological & Statistical Research, Patient-Centered Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Patient-reported Outcomes & Quality of Life Outcomes

Disease

No Specific Disease
