Assessing the Representativeness of Real-World Claims Databases
Author(s)
Stephenson J1, Teng CC2, Harris K3
1Carelon Research, Wilmington, DE, USA, 2Carelon Research, Wilmington, DE, USA, 3Carelon Research, Wilmington, DE, USA
OBJECTIVES: Real-world claims databases are widely used, yet little is known about how well they represent the general population. This study assesses the representativeness of a large US claims database using the 2020 US Census population as a benchmark.
METHODS: The Healthcare Integrated Research Database (HIRD) is a large administrative claims database maintained by Carelon Research for health-related research. We assessed representativeness by comparing the 2020 HIRD researchable population, consisting of individuals enrolled in commercial and Medicare health plans, with self-reported data from the 2020 Census population on a common set of demographic characteristics. The characteristics were sex (2 categories), age (5-year categories), region (4 categories), and race/ethnicity (5 categories).
We compared the probability distributions for each characteristic using two alternative measures of similarity. The standardized mean difference (SMD) assessed the magnitude, or effect size, of the difference, where 0.2 represents a small effect, 0.5 a medium effect, and 0.8 a large effect. The overlap index (η) measured the degree of overlap between the two distributions, where 0% means no overlap and 100% means complete overlap.
RESULTS: Comparing the 2020 US Census (N=331,449,281) and 2020 HIRD commercial and Medicare (N=24,774,264) populations, we determined for sex, SMD=0.02 and η=99.2%; for age, SMD=0.19 and η=92.0%; for region, SMD=0.16 and η=94.8%; and for race/ethnicity, SMD=0.66 and η=86.8%.
CONCLUSIONS: We found the 2020 HIRD population to be highly representative of the 2020 US Census population in terms of sex, age, and region, while race/ethnicity appeared to be less representative. Differing modes of determining race/ethnicity may have affected this comparison: the HIRD race/ethnicity information was determined using multiple methods (e.g., self-report, imputation), whereas the US Census information was self-reported.
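The two similarity measures named in the methods can be illustrated with a minimal Python sketch. The abstract does not state the exact formulas used, so this is an assumption: the overlap index is taken to be the sum of category-wise minima of two discrete distributions, and the SMD is shown only for a two-category characteristic using the common pooled-variance form (a multi-category SMD needs a multivariate generalization that is not shown here). The function names and the input proportions are hypothetical and are not taken from the study.

```python
from math import sqrt


def overlap_index(p, q):
    """Overlap of two discrete distributions (assumed form: sum of minima).

    p and q are sequences of category proportions in the same category order,
    each summing to 1. Returns a value in [0, 1]; multiply by 100 for a percent.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    return sum(min(a, b) for a, b in zip(p, q))


def smd_binary(p1, p2):
    """Standardized mean difference for a two-category characteristic.

    Assumed form: |p1 - p2| divided by the square root of the average of the
    two binomial variances. Only valid for binary characteristics.
    """
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled_var == 0:
        raise ValueError("SMD is undefined when both proportions are 0 or 1")
    return abs(p1 - p2) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical proportions for a four-category characteristic
    # (illustrative only, not study data).
    benchmark = [0.25, 0.25, 0.25, 0.25]
    database = [0.30, 0.20, 0.25, 0.25]
    print(f"overlap: {overlap_index(benchmark, database):.1%}")

    # Hypothetical proportions for one level of a two-category characteristic.
    print(f"SMD: {smd_binary(0.51, 0.49):.3f}")
```

Under these assumptions, identical distributions give an overlap of 100% and an SMD of 0, which matches the interpretation of both measures given in the methods.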
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
RWD100
Topic
Real World Data & Information Systems
Topic Subcategory
Health & Insurance Records Systems, Reproducibility & Replicability
Disease
No Additional Disease & Conditions/Specialized Treatment Areas