Real-World Evidence Sources for Drug Safety: Registries and Claims Data

By Frankie Torok, 7 February 2026

Adverse Event Detection Calculator

The minimum number of patient records needed to detect a rare adverse event can be estimated from two inputs: the expected prevalence (for example, 1 in 10,000) and the data source, registry or claims.

Based on the FDA's real-world evidence standards:

  • Registry data requires approximately 500,000 records to reliably detect 1 in 10,000 adverse events
  • Claims data requires approximately 1,000,000 records to reliably detect 1 in 10,000 adverse events

Why this matters

The minimum sample size needed depends on both the expected frequency of the event and the data source. Registries provide richer, more detailed data but typically cover smaller populations than claims databases. Claims data provides enormous scale but lacks clinical detail that might be critical for verification.
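
The calculator’s arithmetic can be sketched in a few lines of Python. This is a minimal sketch that assumes the required sample size scales linearly from the two reference points above (500,000 registry records and 1,000,000 claims records for a 1-in-10,000 event). That scaling rule and the names `minimum_records` and `REFERENCE_RECORDS` are illustrative assumptions, not an FDA formula.

```python
# Reference thresholds quoted in this article for a 1-in-10,000 event.
REFERENCE_DENOMINATOR = 10_000
REFERENCE_RECORDS = {"registry": 500_000, "claims": 1_000_000}


def minimum_records(denominator: int, source: str) -> int:
    """Estimate records needed to detect a 1-in-`denominator` event.

    ASSUMPTION: the threshold scales linearly with rarity, anchored on the
    article's reference points. This is an illustration, not a regulatory rule.
    """
    if denominator <= 0:
        raise ValueError("denominator must be a positive integer")
    if source not in REFERENCE_RECORDS:
        raise ValueError(f"unknown data source: {source!r}")
    # Integer arithmetic keeps the result exact for whole-number inputs.
    return REFERENCE_RECORDS[source] * denominator // REFERENCE_DENOMINATOR


registry_needed = minimum_records(50_000, "registry")
claims_needed = minimum_records(50_000, "claims")
```

Under that assumption, a 1-in-50,000 event would call for about 2.5 million registry records or 5 million claims records.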

When a new drug hits the market, the clinical trials that got it approved tell only part of the story. Those trials usually involve a few thousand patients over a couple of years. But what happens when tens of thousands of people start taking it for decades? That’s where real-world evidence comes in. It’s not guesswork or speculation; it’s hard data pulled from everyday healthcare systems. Two of the most trusted sources? Disease registries and claims data. Together, they help uncover drug safety issues that clinical trials simply can’t catch.

What Exactly Is Real-World Evidence?

Real-world evidence (RWE) is clinical evidence drawn from real-world data (RWD), meaning information collected outside of tightly controlled trials. Think of it as the messy, complex, but incredibly honest picture of how drugs actually perform in the real world. The U.S. Food and Drug Administration (FDA) officially defined RWE in 2018 as evidence about the benefits or risks of a medical product based on data collected during routine care. It’s not a replacement for clinical trials. It’s a necessary complement. Since 2017, the FDA has used RWE to approve 12 new drug uses, with five of those relying directly on claims data or registry data.

Across the Atlantic, the European Medicines Agency (EMA) launched Darwin EU in 2021 to coordinate safety monitoring across 32 healthcare databases in 15 EU countries. By 2023, that network covered over 120 million patients. This isn’t just academic; it’s regulatory reality. Regulators now expect drug makers to use these data sources to monitor long-term safety.

Disease Registries: The Deep Dive

Disease registries are structured databases that collect detailed, standardized information about patients with a specific condition. These aren’t random surveys. They’re carefully designed systems that track demographics, lab results, imaging reports, treatments, side effects, and even patient-reported outcomes like quality of life. Some are small, like a single hospital’s registry tracking 300 patients with a rare liver disease. Others are massive, like the Surveillance, Epidemiology, and End Results (SEER) program, which covers nearly half of all U.S. cancer patients.

What makes registries powerful is depth. They capture things claims data can’t: actual lab values, detailed treatment protocols, genetic markers, and direct patient input. For example, the Cystic Fibrosis Foundation Patient Registry helped identify a real-world signal for ivacaftor (improvements in lung function and reduced hospitalizations) that wasn’t visible in the original clinical trials because those trials didn’t include patients with rare CFTR mutations. The registry’s high data completeness (up to 92% for key variables) made the signal trustworthy.

Registries also shine for rare conditions. If a side effect affects 1 in 10,000 patients, you need a lot of data to spot it. Registries, with their focused populations and rich detail, can detect these signals with as few as 500,000 records. Compare that to claims data, which might need a million. Registries also allow researchers to track outcomes over time. The Scientific Registry of Transplant Recipients (SRTR) provided the data that supported the 2021 approval of a new dosing regimen for tacrolimus, a key immunosuppressant.
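
To see why rare events demand so many records, it helps to work out how many cases a dataset of a given size should contain. The sketch below assumes cases occur independently at a fixed rate and uses a Poisson approximation. The function names are hypothetical, and real studies also have to account for exposure time, confounding, and miscoding.

```python
import math


def expected_cases(records: int, denominator: int) -> float:
    """Expected number of cases for a 1-in-`denominator` event."""
    return records / denominator


def prob_at_least(k: int, records: int, denominator: int) -> float:
    """Probability of observing at least k cases (Poisson approximation).

    ASSUMPTION: cases are independent and equally likely in every record.
    """
    lam = expected_cases(records, denominator)
    # Sum the Poisson probabilities of seeing 0 .. k-1 cases, then invert.
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


cases_in_half_million = expected_cases(500_000, 10_000)
chance_of_one_case_in_10k = prob_at_least(1, 10_000, 10_000)
```

With 500,000 records and a 1-in-10,000 event, about 50 cases are expected. With only 10,000 records, there is roughly a 63% chance of seeing even a single case, which is far too little to separate a real signal from noise.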

But they’re not perfect. Setting up a national registry can take 18 to 24 months and cost between $1.2 million and $2.5 million. Annual upkeep runs $300,000 to $600,000. And participation isn’t mandatory: voluntary registries often see only 60% to 80% enrollment, which can introduce bias. About 35% of academic registries shut down within five years due to funding gaps.

Claims Data: The Broad View

Claims data comes from billing systems. Every time a patient visits a doctor, fills a prescription, or gets hospitalized, that encounter gets coded and sent to insurers. These codes (ICD-10 for diagnoses, CPT for procedures, NDC for medications) create a massive, longitudinal trail of healthcare activity.
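
A rough picture of what one coded encounter looks like, and how encounters add up to a longitudinal trail, is sketched below. The `ClaimLine` fields and the `build_timelines` helper are hypothetical and do not follow any real payer schema; the NDC value is a placeholder, not a real product code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimLine:
    """One billed encounter. Field names are illustrative assumptions."""
    patient_id: str                         # de-identified key
    service_date: date
    diagnosis_codes: tuple[str, ...] = ()   # ICD-10
    procedure_codes: tuple[str, ...] = ()   # CPT
    drug_codes: tuple[str, ...] = ()        # NDC


def build_timelines(claims):
    """Group claim lines by patient and sort each group by service date."""
    timelines = defaultdict(list)
    for claim in claims:
        timelines[claim.patient_id].append(claim)
    for lines in timelines.values():
        lines.sort(key=lambda c: c.service_date)
    return dict(timelines)


claims = [
    # Pharmacy fill; "00000-0000-00" is a placeholder NDC.
    ClaimLine("P001", date(2024, 3, 9), drug_codes=("00000-0000-00",)),
    # Office visit (CPT 99213) with a hypertension diagnosis (ICD-10 I10).
    ClaimLine("P001", date(2024, 1, 15), diagnosis_codes=("I10",),
              procedure_codes=("99213",)),
    ClaimLine("P002", date(2024, 2, 2), diagnosis_codes=("I10",)),
]
timelines = build_timelines(claims)
```

Each patient’s sorted list is the trail that safety analyses walk through: a drug fill followed, weeks or years later, by a diagnosis code of interest.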

The big advantage? Scale. MarketScan, sold first under the Truven Health name and later under IBM, is cited at 150 million to 200 million covered lives. Optum Clinformatics tracks 100 million. Medicare alone has over 60 million beneficiaries, with claims data stretching back 15+ years. That’s more than enough to spot rare side effects.

The FDA has used claims data to investigate dozens of drugs. In 2015, it analyzed 1.2 million Medicare records over five years to check whether entacapone (used for Parkinson’s) raised heart risks. No signal found. In 2014, it reviewed 850,000 records to assess olmesartan (a blood pressure drug) in diabetics. Again, no clear danger emerged. These weren’t flukes; they were methodical, regulatory-grade studies.

Claims data is 95% to 98% complete for hospitalizations and prescriptions. That’s unmatched. But it’s shallow. Lab values? Only 45% to 60% complete. Patient-reported symptoms? Almost never captured. Diagnosis codes can be wrong: up to 20% of ICD-10 codes have errors, according to the Agency for Healthcare Research and Quality (AHRQ). A patient might be coded as having “hypertension” when they actually have white-coat hypertension. Or a side effect might be lumped under “general malaise” and never traced back to the drug.
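
Completeness figures like these are usually the share of records in which a field is actually populated. A minimal sketch, assuming each record is a dictionary and that a missing key or a `None` value counts as incomplete; the field name `alt_u_per_l` is hypothetical:

```python
def completeness(records, field_name):
    """Fraction of records where `field_name` is present and not None."""
    if not records:
        raise ValueError("cannot measure completeness of an empty dataset")
    populated = sum(1 for r in records if r.get(field_name) is not None)
    return populated / len(records)


# Four illustrative records: two carry a lab value, two do not.
sample = [
    {"patient_id": "P001", "alt_u_per_l": 31.0},
    {"patient_id": "P002", "alt_u_per_l": None},
    {"patient_id": "P003", "alt_u_per_l": 44.5},
    {"patient_id": "P004"},
]
lab_completeness = completeness(sample, "alt_u_per_l")
```

Here half the records carry a lab value, so the field is 50% complete, in the range the article cites for lab values in claims data.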

Still, claims data is indispensable. It was central to the 2019 FDA approval of palbociclib for new patient groups. By combining claims, electronic health records, and safety reports, regulators confirmed the drug’s safety in older adults and those with kidney issues, populations excluded from initial trials.

Registries vs. Claims Data: A Side-by-Side Look

Comparison of Registry Data and Claims Data for Drug Safety Monitoring

| Feature | Registry Data | Claims Data |
| --- | --- | --- |
| Population Size | 1,000-50,000 patients | 100 million-300 million patients |
| Data Completeness (Lab Values) | 87% | 52% |
| Longitudinal Coverage | 5-15 years (varies) | 15+ years (Medicare) |
| Detail on Patient Outcomes | High (includes imaging, symptoms, quality of life) | Low (limited to billing codes) |
| Detection Threshold (for 1:10,000 adverse events) | ~500,000 records | ~1,000,000 records |
| Implementation Cost | $1.2M-$2.5M upfront | $500K-$1M (integration) |
| Key Limitation | Selection bias, sustainability | Code inaccuracies, lack of clinical context |

Registries give you the microscope. Claims data gives you the telescope. Neither is enough alone. The International Council for Harmonisation (ICH) E2 proposal from June 2023 recommends combining both. Why? Because when you cross-check a signal from claims data with registry data, false positives drop by 40%. That’s huge. A claim might say “liver injury” after taking a new drug. But without knowing if the patient had pre-existing liver disease, or if their ALT levels spiked 3x, you can’t be sure. Registries fill that gap.
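
The liver-injury cross-check described here can be sketched as a simple adjudication step. Everything in this example is an illustrative assumption: the `RegistryRecord` fields, the category labels, and the reading of "spiked 3x" as peak ALT at least three times the patient’s own baseline. Real adjudication relies on clinical review, not a single threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryRecord:
    """Clinical detail a registry might hold. Field names are hypothetical."""
    patient_id: str
    preexisting_liver_disease: bool
    baseline_alt: float   # U/L before starting the drug
    peak_alt: float       # highest U/L after starting the drug


def adjudicate(flagged_ids, registry, spike_ratio=3.0) -> dict[str, str]:
    """Label each claims-flagged patient using registry detail."""
    by_id = {rec.patient_id: rec for rec in registry}
    labels = {}
    for pid in flagged_ids:
        rec = by_id.get(pid)
        if rec is None:
            labels[pid] = "unverified"      # no registry data to check against
        elif rec.preexisting_liver_disease:
            labels[pid] = "confounded"      # injury may predate the drug
        elif rec.baseline_alt > 0 and rec.peak_alt >= spike_ratio * rec.baseline_alt:
            labels[pid] = "supported"       # ALT rose at least 3x over baseline
        else:
            labels[pid] = "not supported"   # likely a false positive
    return labels


registry = [
    RegistryRecord("P001", False, 30.0, 120.0),
    RegistryRecord("P002", True, 80.0, 260.0),
    RegistryRecord("P003", False, 28.0, 35.0),
]
labels = adjudicate(["P001", "P002", "P003", "P004"], registry)
```

Of the four patients flagged by claims codes, only P001 ends up "supported"; P002 is confounded by prior disease, P003 shows no meaningful ALT rise, and P004 cannot be checked at all.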

How Regulators Are Adapting

The FDA’s Sentinel Initiative, launched in 2008, is the gold standard. It connects 11 major healthcare systems and 3 claims processors to monitor safety across more than 300 million patient records. It’s not just passive monitoring; it’s active surveillance. In 2022 alone, the FDA reviewed 107 RWE submissions, up from 29 in 2018. That’s an increase of roughly 270% in four years.

Therapeutic areas are shifting how they use these tools. Oncology leads with registries: 38% of RWE submissions use them, because cancer treatments are complex and patient outcomes are tracked closely. Cardiovascular drugs? Claims data dominates. Why? Because heart disease affects millions, and outcomes like hospital readmissions or strokes are well-coded in billing systems.

New tools are emerging. In 2023, Novartis began integrating wearable data (heart rate, activity levels) with claims data to monitor Entresto patients for signs of worsening heart failure. A 2024 JAMA Network Open study showed AI-powered signal detection reduced false positives by 28%. The FDA’s REAL program, started in September 2023, aims to standardize registry collection for 20 priority diseases by 2026, starting with rare conditions where traditional methods fail.

What’s Next?

The future of drug safety isn’t just about bigger data; it’s about smarter integration. Registries will get more automated, with EHRs feeding data directly into national databases. Claims systems will improve coding accuracy through AI-assisted validation. The goal? To make safety monitoring faster, more accurate, and more predictive.

For patients, this means fewer surprises after a drug is approved. For doctors, it means better tools to weigh risks. For regulators, it’s a way to keep pace with rapidly evolving treatments. The old model, waiting for a hundred reports of liver failure before acting, is gone. Real-world evidence, from registries and claims data, is now the backbone of modern pharmacovigilance.

Can claims data prove that a drug causes a side effect?

No, not on its own. Claims data can flag potential safety signals, like a spike in kidney injury reports after a drug launch. But proving causation requires clinical review. Did the patient have pre-existing conditions? Were other drugs involved? Was the diagnosis coded correctly? That’s why regulators always combine claims data with clinical adjudication, registry data, or controlled studies before taking action.

Why do registries cost so much to set up?

Because they’re not just databases; they’re systems. Setting up a registry means designing data collection forms, training staff, integrating with clinics, ensuring patient consent, securing data, and maintaining quality control. A national registry might involve hundreds of hospitals, thousands of clinicians, and millions of patient records. That takes people, technology, and ongoing funding. Small registries can be run on a shoestring, but they lack the scale and reliability needed for regulatory decisions.

Is real-world evidence accepted outside the U.S.?

Yes. The European Medicines Agency (EMA) has been using RWE since before 2020, and its Darwin EU network now connects 32 databases across 15 countries. Health agencies in Canada, Japan, Australia, and the UK also use registry and claims data to monitor drug safety. In fact, the EU often leads in data-sharing infrastructure, with more centralized systems than the U.S.

How do companies use this data before launching a drug?

Many companies now run “pre-market” observational studies using existing claims databases to estimate how many patients might be exposed, what comorbidities they have, and what other drugs they’re taking. This helps them design safer post-launch monitoring plans. For example, if a new diabetes drug is likely to be used mostly by elderly patients on multiple medications, the company will focus its safety tracking on drug interactions and kidney function.

Can patients opt out of having their data used?

Yes. In the U.S., claims data is de-identified under HIPAA rules, meaning names, addresses, and direct identifiers are removed before analysis. Patients can opt out of having their data shared for research through their insurer or state health department. Registries typically require explicit consent. But even with opt-outs, the volume of data is so large that removing a few thousand records doesn’t impact overall safety signals.

Drug safety isn’t a one-time check. It’s a continuous process. Registries and claims data are no longer just backup tools; they’re the frontline. And as systems get smarter, they’ll catch problems before they become crises.