ISPOR News

Real-World Evidence: From Frameworks to Practice

Summary of the May 2023 ISPOR/ISPE/Duke-Margolis Summit


Richard J. Willke, PhD,
Chief Science Officer Emeritus, ISPOR, Lawrenceville, NJ, USA





Introduction
Where does the credibility of real-world evidence (RWE) currently stand in regulatory and health technology assessment (HTA) decision making? What is the most recent thinking about real-world data (RWD) quality, fit-for-purpose use, and transparency criteria? What are we learning from ongoing efforts and actual cases of RWE in these areas? These were the key questions addressed during the ISPOR/ISPE/Duke-Margolis RWE Summit, entitled, “Real World Evidence: From Frameworks to Practice,” held on May 7, 2023 in Boston, Massachusetts.



"What is the most recent thinking about real-world data quality, fit-for-purpose use, and transparency criteria?"



This Summit comprised four 1-hour sessions on data quality, fit-for-purpose use, study transparency, and case studies, respectively (see Figure for full session titles and speaker names). Speakers included representatives from regulatory bodies, HTA agencies, academia, industry, consulting/data research companies, and the sponsoring organizations—a healthy balance of stakeholder interest and activity in RWE. An engaged, sold-out audience ensured vigorous question-and-answer periods. Each session’s discussion is summarized below. The slides from the sessions can be found here.

[Figure. Summit agenda: full session titles and speaker names]



From Data Quality to Qualities
The US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have each recently created guidance on RWE data quality, among their other RWE-related guidances. The FDA defined 2 major criteria for quality—relevance and reliability—where relevance includes the availability of key data elements and representative patients for the study, while reliability includes accuracy, completeness, provenance, and traceability. The EMA considered reliability and relevance as well as extensiveness, coherence, and timeliness, as aspects of data quality. While their respective definitions of these criteria have much in common, they are not identical and thus there is potential for differing standards. The International Coalition of Medicines Regulatory Authorities has called for harmonization of RWE terminologies and convergence of guidance and best practices across countries. In general, reliability is seen as an intrinsic aspect of the data’s quality, while relevance refers to the data’s applicability to the particular question at hand and speaks to its being “fit-for-purpose” for that question.

The major threat to the reliability of RWE is its potential for biases (eg, selection, information, and confounding biases). To assess and quantify the potential for bias, the measurement characteristics of the data (such as missingness, sensitivity and specificity of coding, or accuracy of timing of event onset) must be analyzed and documented. The FDA Sentinel Initiative has created a structured “data-adaptive review cycle” for electronic health records to assess whether measurement performance is adequate and to apply corrective measures when needed and possible. Using Sentinel data, researchers demonstrated that high-quality RWD—combined with target trial emulation study designs—can regularly replicate randomized clinical trial (RCT) results.

From the industry sponsor side, it is critical to have clear criteria for, and documentation of, data quality to satisfy regulatory requirements. As part of a larger collaborative effort to enable the use of RWE, the TransCelerate initiative produced its draft, “Real-World Data Overview: RWD Audit Readiness,” in early 2023 for public review. It includes definitions and suggested documentation for data relevance, accrual, provenance, completeness, and accuracy. At the time of the Summit, the draft was under revision based on public comments.


"The major threat to the reliability of real-world evidence is its potential for biases (eg, selection, information, and confounding biases)."



The RWE Alliance is a coalition of RWD and analytics organizations with a common interest in harnessing the power of RWE to inform regulatory decision making to improve patients’ lives. It has been trying to address some challenges created by recent regulatory guidances for RWE. These challenges include access to patient-level source data (eg, privacy concerns), conversion of RWD to a supported standard (which may sacrifice some granularity), data quality assessment and examples thereof (more are needed), and clarification of data collection and analysis expectations. To assess the role of data source variability, a pilot study was conducted where each data organization estimated, using its own data, the treatment effect on survival of a given oncologic product. Estimated treatment effects varied considerably, leading to some recommendations on data quality assessment: (a) develop a template, (b) use a quantitative approach, (c) use careful evaluation by experts with deep knowledge of the data, and (d) use quality indicators for the data.

Audience comments during the session related to who bears responsibility for data quality, the particular importance of high-quality data for exposure and outcomes, the need to establish higher RWD collection standards, and the need for standards for non-Western data.

 

Considering Data Qualities in Determining Fit-for-Purpose: Can We Converge on an Approach?
While some data quality aspects may be largely inherent in the data, judging whether data are fit-for-purpose combines the data’s intrinsic quality, particularly reliability, with the specific question being addressed (ie, its relevance). This session sought to consider whether standards for fit-for-purpose use can be harmonized across agencies and countries. Following brief presentations, several questions were posed to the panel.

FDA considerations when judging RWE, as stated in their 2018 guidance, can be characterized by 3 “swim lanes”: (1) whether the RWD are fit for use; (2) whether the trial or study design used to generate RWE can provide adequate scientific evidence to answer or help answer the regulatory question; and (3) whether the study conduct meets FDA regulatory requirements. To assess fit-for-purpose use, relevance and reliability must be assessed. RWD-based study designs can potentially yield evidence that meets the statutory standard for substantial evidence (there is not a different standard for RCT vs RWE results), although the degree of certainty supporting an FDA conclusion about substantial evidence of effectiveness may differ depending on the clinical circumstances (eg, disease severity, unmet medical need). Representative problems observed by FDA with RWE relate to 3 main factors: (1) real-world data sources, (2) nonrandomized study designs, and (3) conduct of nonrandomized studies.

From the EMA’s perspective, the fit-for-purpose standards they apply align well with FDA’s, although they do explicitly identify coherence, extensiveness, and timeliness in their Data Quality Framework (DQF). More guidance is probably needed in exactly how they will be operationalized; however, some regulatory judgment is needed in what evidence is truly fit-for-purpose. In addition to the DQF, EMA has also produced their Good Practice Guide for the Use of Real World Metadata, which is based on using a metadata catalog developed by ENCePP as one selects data for a given study and provides the needed aspects of the data in the study protocol.

At the Canadian Agency for Drugs and Technologies in Health (CADTH), HTA is done not only initially but also later in the product life cycle and considers not only safety and effectiveness but also resource costs as well as patient, societal, and cultural factors. As such, RWD is an important tool, and transparency in the generation and reporting of RWE is critical for their evaluation. CADTH is launching, after public review, guidance for reporting RWE to ensure that regulators and HTA agencies have sufficient information to evaluate a study for its appropriateness of use for decision making, to provide core reporting standards for RWE studies that align with global standards, and to prioritize transparency in reporting while accounting for practical challenges related to RWD and RWE.

From an industry perspective, components of the multifaceted nature of fit-for-purpose RWD result in many “moving parts” that need to work together to enable use of RWD for regulatory and HTA decision making. These include the regulatory or HTA context, filing strategy, clinical context, data sources, methods, and established data standards. The industry “ask” is for convergence on common principles for characterizing data quality for global drug development.

The first question posed to the panelists was: “What is the relationship between the standards for data quality and fitness-for-purpose? Do they not vary with the nature of the regulatory decision?” Briefly, the panelists all replied “yes”—both the context and the totality of the evidence matter for the regulatory decision. Context includes a judgment about whether the data context (eg, population or clinical setting) is applicable to the decision context. Similarly, the credence given to the RWE may depend on its consistency with, or reason for being different from, other evidence.   

The second question was “While the criteria for fitness-for-purpose may be bespoke for a particular regulatory question, doesn’t one consider what has been learned about the ‘operating characteristics’ of particular data sources based on their track record for producing credible RWE?” The answer was again, essentially, “yes.” Past experience with the data does matter for understanding of intrinsic data quality, although data completeness can vary from variable to variable and thus from study to study. In some cases, fitness-for-purpose can be best evaluated by showing the “unfitness” of the data (ie, how certain data imperfections can affect study results). This “unfitness” can sometimes be fed back to the data collectors to improve future data collection.

The third question was, “Data quality and fit-for-purpose standards will likely vary when there exists RCT data related to the therapeutic in question as opposed to when standard RCTs are not available or feasible. Does this mean that there are different standards for sufficient (substantial) evidence?” As addressed earlier in the session, substantial evidence standards should stay the same, since they are about the fundamentals of the data and research question. Nevertheless, early interaction between sponsors and regulators about RWD study plans can help with specific aspects there. In addition, it may depend on the totality of evidence, since the complementarity of RWE and RCT evidence may affect the decision.

Questions from the audience brought out several additional points:

- Benchmarking of data quality standards would be most useful and could help data curators report on their processes; documentation of dataset quality needs much improvement.

- Experience with some review committees (eg, at NICE) indicates that RWD has rarely been used for comparative effectiveness and also that the opinions of 1 or 2 clinical experts can greatly affect decisions about data quality and credibility.

- Transparency of the research process is becoming quite important to decision makers; conversely, transparency of the decision-making process is important to stakeholders but is often still lacking—more case studies are needed.

- Completeness (lack of missingness) of data is often critical, and standards in this area would be most helpful.

 

Transparency in RWE: Ensuring Credibility and Confidence
In the first 2 sessions, the importance of study transparency was mentioned several times. The third session expanded on efforts to improve transparency via protocol registration, use of protocol templates and master protocols, and considerations for publication.

Preregistration of RWE study protocols improves confidence that the results were not based on a data-mining exercise. In recent years, registration of RWD/observational studies in the best-known protocol registry, clinicaltrials.gov, has increased significantly, although it is still structured better for prospective studies. The EU PAS Register, originally created by EMA for post-authorization studies in Europe, is also an option for registering safety or comparative effectiveness studies using RWD. In the last few years, the RWE Registry was created by the ISPOR/ISPE/Duke-Margolis/NPC Transparency Initiative. Hosted by the Center for Open Science, it is specifically designed for efficient registration (ie, half as many questions as clinicaltrials.gov) of RWD cohort, case-control, or other retrospective study designs, and has a “lockbox” option to maintain confidentiality of ongoing studies. All these registries are searchable, so they can be used for systematic or other types of literature reviews.


"Preregistration of real-world evidence study protocols improves confidence that the results were not based on a data-mining exercise.... Journal editors are starting to seriously consider the need for preregistration of RWD study protocols."

 

The HARPER (HARmonized Protocol Template to Enhance Reproducibility) protocol template, created by a recent ISPE-ISPOR Special Task Force, provides a structure for creating and documenting elements of an RWD study protocol. It is intended to promote transparency and reproducibility of noninterventional study protocols by academics, companies, and regulators. It is compatible with the legal format and content of the GVP Module VIII on PASS and can already be used in PASS protocols without change of structure. Several pilot initiatives using HARPER, as well as training in its use, are in progress.

There is also an ongoing initiative led by Duke-Margolis to create an RWE Master Protocol. An RWE Master Protocol design is meant to align research questions with appropriate methods and data sources to facilitate consistent implementation and replication. A linchpin to implementing RWE master protocols is understanding data requirements for the study, including the translation of study questions into RWD, data requirements and programming specifications, and data quality considerations. A white paper based on the work of this initiative is expected soon.

From a journal’s perspective, editors want both transparency and validity in the manuscripts they review. Of the two, validity is likely more important but good transparency is important for determining validity, and a prespecified protocol helps ensure that the researchers did what they originally intended to do. Journal editors are starting to seriously consider the need for preregistration of RWD study protocols but have not reached consensus yet. More journals are now using “badging” (ie, small icons accompanying article titles) to indicate study characteristics like protocol registration. Incorporation of a registration number into reporting checklists like PRISMA or CHEERS would help reviewers see it as an important study element. Value in Health is moving towards having expedited review for preregistered studies as an incentive for doing so.

Audience Q&A emphasized the need to broaden the discussion about these transparency tools to more audiences, especially those who may use RWD less regularly—in academia, industry, clinical practice, journalism, etc. Incentives and ease of use will be keys to their becoming normal practice.

 

Navigating the RWE Landscape—Successes, Struggles, and the Path Forward
An important complement to guidances and tools is experience with how they are implemented in actual practice and decision making. This session provided details on a number of specific cases involving RWE in regulatory and HTA decisions.

In the United States, while RWE has been used by FDA as supplementary evidence of efficacy in a few cases, the first time RWE was used as primary evidence of efficacy was in 2021, for a new indication of tacrolimus for prevention of organ rejection in lung transplants. It relied on data from a non-interventional (observational) treatment arm, where tacrolimus was used off-label, compared to historical controls, with both arms drawn from the Scientific Registry of Transplant Recipients data on all lung transplants in the United States during 1999–2017. The primary endpoint was graft failure or all-cause mortality at 1 year. There were some issues with the study data and analysis (eg, relating to the choice of index date and some missing data) but they were resolved by discussions between the agency and the sponsor. Lessons learned relate to the topics discussed at this Summit—ensuring data reliability and relevance, prespecification of the study protocol, and robust scientific rationale for study and analysis choices. While the decision shows that RWD/RWE brings opportunities, there have also been some “failures” at FDA—one where it was decided that the RWE for an external control did not match clinical data for inclusion/exclusion criteria and standard of care, and one where RWE was not allowed to be added for effectiveness for lack of relevance and reliability. As an additional point, it was noted that FDA is exploring more use of machine learning and natural language processing for safety surveillance.

At NICE, a current focus is implementing the RWE Framework they published in 2022. One case concerned a review of mobocertinib for EGFR exon 20 insertion-positive non-small cell lung cancer after platinum chemotherapy. Treatment evidence came from phase I and phase II single-arm trials, with external control arms that used US and German RWD and adjusted indirect treatment comparisons. There were several review issues related to data provenance, effects of missing data, use of pooling, and relevance of case-mix adjustments. After a company response with more data provenance information and several scenario/sensitivity analyses, a positive decision was made; however, NICE felt uncertainty could have been reduced further. As at FDA, there have been significant challenges—both for HTA and clinical purposes—with use of RWE, ranging from gaps in the NICE RWE Framework, to data access, to the need for organizational upskilling. In addressing these challenges, NICE is committed to stewarding RWE across the evidence life cycle.

As discussed earlier, an important factor in the success of RWE studies is ensuring that the data are of high quality and fit-for-purpose. To that end, Zorginstituut Nederland (ZIN) is working to provide national guidance on disease-specific patient registries to enable the production of high-quality comparative effectiveness and cost-effectiveness studies. This effort involves establishing minimal data sets with involvement of all stakeholders, using a new tool (REQUEST) to assess the data quality and transparency of the patient registries, as well as piloting the HARPER template to help define the research question. Lessons learned include the value of REQUEST in helping registry owners better document their data characteristics; the value of HARPER tool’s graphical representation of exposure-based cohort entry for establishing index dates; and the difficulty many registries have linking with electronic health records.


"In a growing number of cases across countries, data quality has been judged good enough to support regulatory/HTA decisions about drug efficacy or effectiveness."

 

External control arms (ECAs) are presently the primary use of RWE for regulatory decisions. A recent study by Jaska et al reviewed decisions made by 3 regulators and 5 HTA agencies on 7 drug applications using ECAs. It tracked the positive and negative comments on specific aspects of RWE data analyses, finding variability and sometimes disagreement across agencies. The most prevalent critiques related to generalizability/relevance (eg, inconsistency of the standard of care in the ECA over time) and mitigation of confounding, which are well-known issues but may lessen as data quality and study designs improve; variability across agencies should improve as more of them produce their own RWE guidances.

Looking ahead, there are many opportunities for postlaunch use of RWE to address evidence gaps and uncertainties, but key questions remain. Who is going to review evidence? At what time points? Based on what regulations? How are evidence gaps prioritized? Who bears the burden of evidence generation?  Will data be transferable across countries? Multistakeholder collaborations, such as a current one between Health Canada and Aetion, may be a path forward to help answer some of these questions.

Subsequent discussion highlighted several more points. First, RWD quality is still seen as a major issue by many HTA agencies. Efforts to improve data quality, especially its relevance, are being pursued in several ways, such as collection of new patient endpoints via wearables, linkage of administrative data with community-based data collection from individuals, and longitudinal linkage of data across insurers to collect long-term outcomes data. The importance of transparency from all concerned was also reemphasized—both clear explication of the study design and process by sponsors/researchers, as well as more explicit statements by reviewers about the merits or issues driving decisions about submissions—along with the desirability of a repository of regulatory/HTA decisions with key criteria identified.

 

Summary
Collectively the 4 sessions of the Summit provided many key insights into the questions posed at the beginning of this article. Data quality for RWD has been defined by several agencies and focuses on the general aspects of reliability and relevance. In a growing number of cases across countries, data quality has been judged good enough to support regulatory/HTA decisions about drug efficacy or effectiveness. However, there is considerable room for improvement. For reliability, more consistent attention to the measurement characteristics and curation of the data, as well as structured documentation of both, is needed. Questions about who has responsibility for data quality must be resolved and multistakeholder efforts are needed there. Relevance of the data to the clinical context and the population are critical for their use in decision making for specific indications. To make study results credible, proper study designs to deal with selection bias and confounding, transparency of study process, and addressing uncertainty (to the extent possible) via scenario and sensitivity analysis are all important. In the end, decision making may also be influenced by factors like the totality of the evidence, unmet medical need, and magnitude of treatment effect relative to price. As data reliability is improved and experience with use of RWE accumulates, one can anticipate its increasingly greater use for both initial regulatory decisions and subsequent reevaluations during the product life cycle.

 

Acknowledgments: This article directly incorporates a number of points from the speakers’ slides as well as their remarks during the sessions; to simplify the exposition, quotation marks and attributions are not used, but readers are encouraged to refer to the linked slides and recording as original sources. The assistance of the Program Committee in creating the agenda for this Summit (Marc Berger, William Crown, Nancy Dreyer, Shirley Wang, Rachele Hendricks-Sturrup, Sebastian Schneeweiss, David van Brunt, Gracy Crane, Lucinda Orsini, Massoud Toussi, Adam Aten, Christina Mack) is gratefully acknowledged, as is the work of Kat Bissett, Meredith Kaganovskiy, Paul Wong, and other ISPOR staff members in making all the needed arrangements, as well as Lyn Beamesderfer’s help in reviewing this article.
