Implementation of a Real-World Data Quality Framework in a Nationwide Oncology Electronic Health Record-Derived Database
Author(s)
Castellanos E1, Wittmershaus B2, Chandwani S3
1Flatiron Health Inc., Philadelphia, PA, USA, 2Flatiron Health Inc., New York, NY, USA, 3Flatiron Health Inc., Somerset, NJ, USA
OBJECTIVES: In recent years, multiple real-world data (RWD) quality frameworks have been released identifying key dimensions of quality. Practical considerations in applying these different frameworks to scaled datasets have not been well described. We demonstrate the application of an RWD quality framework, incorporating core published dimensions of data quality, to a scaled, electronic health record (EHR)-based oncology RWD source.
METHODS: We assessed the nationwide Flatiron Health EHR-derived de-identified database, with data originating from ~280 US academic and community cancer clinics, using structured and unstructured sources as well as external linkages to genomic and claims data. We examined quality assessment approaches for generating oncology RWD and mapped them to quality dimensions across published frameworks.
RESULTS: Our RWD quality framework aligns with published frameworks and includes the following dimensions: relevance (including sufficiency and representativeness) and reliability (including accuracy, completeness, provenance, and timeliness). Dataset size and the breadth and depth of data elements, drawn from structured and unstructured EHR-derived data or linked data sources, are selected to optimize relevance to broad or specific sets of use cases. A range of validation approaches are implemented, including direct comparison to an external or internal reference standard, or indirect benchmarking. Verification checks, implemented at patient and cohort level throughout the data lifecycle, assess conformance, consistency, and plausibility. Completeness is assessed according to clinical expectations for documentation at source. Provenance is addressed by recording data transformation, documenting data management procedures, and maintaining auditable metadata. Timeliness is addressed by setting refresh frequency to minimize lags in data capture (e.g., 30-day recency).
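As an illustration only, the patient-level verification checks, cohort-level completeness, and timeliness described in the results could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the record layout, the allowed stage values, the function names, and the simple populated-share definition of completeness are all hypothetical. The abstract assesses completeness against clinical expectations for documentation at source, which this simplification does not capture.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical value set and record layout, chosen only for illustration.
ALLOWED_STAGES = {"I", "II", "III", "IV"}


@dataclass
class PatientRecord:
    patient_id: str
    diagnosis_date: Optional[date]
    death_date: Optional[date]
    stage: Optional[str]


def conformance_ok(rec: PatientRecord) -> bool:
    """Conformance: a recorded stage must come from the allowed value set."""
    return rec.stage is None or rec.stage in ALLOWED_STAGES


def plausibility_ok(rec: PatientRecord) -> bool:
    """Plausibility: a death date must not precede the diagnosis date."""
    if rec.diagnosis_date is None or rec.death_date is None:
        return True  # nothing to compare, so no implausibility detected
    return rec.death_date >= rec.diagnosis_date


def completeness(records: List[PatientRecord], field_name: str) -> float:
    """Cohort-level completeness: share of records with the field populated."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if getattr(r, field_name) is not None)
    return filled / len(records)


def is_timely(last_refresh: date, as_of: date, max_lag_days: int = 30) -> bool:
    """Timeliness: the lag since the last refresh stays within the target."""
    return (as_of - last_refresh).days <= max_lag_days


# Toy cohort (fabricated for the example, not real data).
cohort = [
    PatientRecord("p1", date(2020, 1, 1), date(2021, 1, 1), "II"),
    PatientRecord("p2", date(2020, 1, 1), date(2019, 1, 1), "V"),
    PatientRecord("p3", date(2020, 1, 1), None, None),
]
failed_conformance = [r.patient_id for r in cohort if not conformance_ok(r)]
failed_plausibility = [r.patient_id for r in cohort if not plausibility_ok(r)]
stage_completeness = completeness(cohort, "stage")
```

In this toy cohort, record p2 fails both patient-level checks (an unrecognized stage and a death date before diagnosis), while the missing stage in p3 lowers cohort-level completeness without failing conformance. Keeping the dimensions as separate functions mirrors the abstract's separation of verification, completeness, and timeliness.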
CONCLUSIONS: Our data quality assessments address the common dimensions of reliability and relevance using a range of approaches to balance robustness, scalability, and feasibility. This framework can be flexibly applied across other RWD sources, enables transparency in determining fitness for use, and standardizes language for data quality implementation.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Code
RWD96
Topic
Real World Data & Information Systems
Topic Subcategory
Data Protection, Integrity, & Quality Assurance, Distributed Data & Research Networks, Health & Insurance Records Systems, Reproducibility & Replicability
Disease
No Additional Disease & Conditions/Specialized Treatment Areas