September 29, 2022
Open to all ISPOR Members and Non-members
Title: On the Validity of Statistical Analyses with Privacy-Preserving Synthetic Data
11:00AM EDT | 3:00PM UTC | 5:00PM CEST
Description
Synthetic data generation is a contemporary method for preserving patient privacy in real-world health data, which reduces friction when sharing this data internally or externally for secondary analysis. Synthetic data is generated by training a machine learning model (called a generative model) to learn the patterns in an original dataset. That model then generates a dataset that looks and operates like the original dataset, with the intention of preserving the statistical properties of the original. Different machine learning and deep learning techniques can be used to train such a generative model.
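As a toy illustration of the fit-then-sample idea described above, the sketch below uses a multivariate Gaussian as the generative model. This is an assumption made for brevity, not the method presented in the webinar; practical generators use richer machine learning or deep learning models. The function names and the toy dataset are hypothetical.

```python
import numpy as np


def fit_gaussian_generator(original):
    """Learn the column means and covariance matrix of the original data."""
    mean = original.mean(axis=0)
    cov = np.cov(original, rowvar=False)
    return mean, cov


def generate_synthetic(model, n_rows, rng):
    """Sample new rows from the fitted model; no original row is copied."""
    mean, cov = model
    return rng.multivariate_normal(mean, cov, size=n_rows)


rng = np.random.default_rng(0)

# Toy "original" dataset: two correlated numeric columns (made-up values).
original = rng.multivariate_normal(
    [50.0, 120.0], [[100.0, 60.0], [60.0, 225.0]], size=2000
)

model = fit_gaussian_generator(original)
synthetic = generate_synthetic(model, n_rows=2000, rng=rng)

# Compare a statistical property (the correlation) across the two datasets.
print("original correlation: ", np.corrcoef(original, rowvar=False)[0, 1])
print("synthetic correlation:", np.corrcoef(synthetic, rowvar=False)[0, 1])
```

The two printed correlations should be close, which is the sense in which the synthetic dataset is intended to preserve the statistical properties of the original.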
A key question about synthetic data is whether analyses performed on it yield valid statistical results. In this webinar we will present a brief tutorial on synthetic data generation, an overview of its privacy-preserving properties and its advantages over traditional de-identification methods, and then review the results of a simulation study on the validity of inference from synthetic oncology datasets. Using multiple imputation principles, we show that logistic regression parameter estimates on synthetic data have low bias, close to nominal coverage and power, and precision comparable to that of the original data. These results contribute to the growing evidence that inferences from synthetic datasets are valid. The appropriate parameterizations, strengths, and limitations of the approach will be discussed.
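To make the multiple imputation idea concrete, the sketch below pools one regression coefficient estimated separately on m synthetic datasets. The description does not say which combining rules the speakers use; this assumes the rules commonly applied to partially synthetic data, where the total variance is the average within-dataset variance plus the between-dataset variance divided by m. Fully synthetic data uses a different variance formula. The function name and the input numbers are hypothetical and are not results from the webinar.

```python
import numpy as np


def combine_partially_synthetic(estimates, variances):
    """Pool a parameter estimated on m synthetic datasets.

    estimates: point estimate from each synthetic dataset
    variances: squared standard error from each synthetic dataset
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("at least two synthetic datasets are required")

    q_bar = q.mean()        # pooled point estimate
    b = q.var(ddof=1)       # between-dataset variance
    u_bar = u.mean()        # average within-dataset variance
    total_var = u_bar + b / m  # assumed partially synthetic rule
    return q_bar, total_var


# Made-up log-odds estimates and variances from m = 5 synthetic datasets.
estimates = [0.52, 0.47, 0.55, 0.50, 0.46]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]

pooled, total_var = combine_partially_synthetic(estimates, variances)
print("pooled estimate:", pooled)
print("pooled standard error:", np.sqrt(total_var))
```

In practice each estimate and variance would come from fitting the same logistic regression model to each synthetic dataset, and the pooled standard error would feed into confidence intervals and hypothesis tests.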
All materials will be presented with relevant illustrative examples.
Learning Objectives
- Be able to describe basic techniques for synthetic data generation
- Learn how to evaluate the privacy risks and utility of synthetic data
- Understand how to perform statistically valid population inferences from synthetic data
Speakers:
Khaled El Emam, PhD, SVP and General Manager, Replica Analytics & Professor, School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
Lucy Mosquera, MS, Director of Data Science, Replica Analytics, Ottawa, ON, Canada
Please note: On the day of the scheduled webinar, the first 1000 registered participants will be accepted into the webinar. For those who are unable to attend, or would like to review the webinar at a later date, the full-length webinar recording will be made available at the ISPOR Educational Webinar Series webpage approximately 2 days after the scheduled webinar.