How We Compared Clinical Trial and Cancer Incidence Data
An in-depth look at newly approved cancer drugs, who participates in their clinical trials and who is affected by those cancers.
by Riley Wong
For our story "Black Patients Miss Out On Promising Cancer Drugs," ProPublica compared the participants in clinical trials of new cancer treatments with the populations most at risk for the cancers those drugs target. To conduct this analysis, we compiled two main data sets:
- Clinical trial data: Participant data by race for clinical trials of every cancer drug approved by the U.S. Food and Drug Administration between January 2015 and June 2018.
- Cancer incidence data: Incidence data by race for 25 types of cancer, from the National Cancer Institute.
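The methodology does not spell out an exact comparison metric. As a hypothetical illustration only, and not necessarily ProPublica's calculation, one way to set trial participation against disease burden is to estimate each group's share of new cases from its incidence rate and its population share, then divide the group's share of trial participants by that figure. The function names, group labels, and all numbers below are placeholders.

```python
def expected_case_shares(rates_per_100k, population_shares):
    """Estimate each group's share of new cases.

    rates_per_100k: incidence rate per 100,000 people for each group.
    population_shares: each group's share of the population (placeholder values).
    """
    # Rate x population share is proportional to the group's number of new cases.
    weighted = {g: rates_per_100k[g] * population_shares[g] for g in rates_per_100k}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


def representation_ratios(trial_shares, case_shares):
    """Share of trial participants divided by share of new cases.

    A value below 1 means the group is a smaller slice of the trial than of
    the patients who develop the cancer.
    """
    return {g: trial_shares[g] / case_shares[g] for g in trial_shares}


# Hypothetical two-group example; none of these numbers come from the tables.
case_shares = expected_case_shares(
    {"group_a": 10.0, "group_b": 20.0},  # rates per 100,000
    {"group_a": 0.8, "group_b": 0.2},    # population shares
)
ratios = representation_ratios({"group_a": 0.9, "group_b": 0.1}, case_shares)
print(ratios)
```

Per-100,000 rates alone cannot give shares of cases; the population weights are what turn rates into relative case counts.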
Clinical Trial Data
FDA Drug Trials Snapshots
In 2012, as part of the FDA Safety and Innovation Act, Congress asked the FDA to report clinical trial participation by demographic subgroup. In 2013, the agency found minorities were often underrepresented, noting that, for many of the drugs under consideration, “there were too few African American or Black patients in the trials to enable meaningful subset analysis.”
For every new drug approved starting in 2015, the FDA published a “Drug Trials Snapshot,” which includes the demographic breakdown for the clinical trial participants by sex, race, and age subgroups. ProPublica has compiled this data for all FDA-approved drugs from January 2015 to mid-August 2018 into a single dataset. Download this dataset at ProPublica's Data Store.
Snapshots included clinical trials run both in the United States and internationally, but they did not report what percentage of trials were conducted in the U.S. until 2017. Though Asians appear to be well represented in most trials, many of these trials were likely based outside the United States. Analysis of 2017 data shows that, for drugs with at least 70 percent of trials conducted within the U.S., Asians made up only 1.7 percent of participants. Furthermore, the “Asian” category does not say whether participants are of East Asian, South Asian, Southeast Asian, or Pacific Islander descent.
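A figure like the 1.7 percent above depends on a filtering step along the lines sketched below. The records are invented, and the source does not say whether participants were pooled across drugs or shares were averaged per drug; this sketch assumes pooling, which gives larger trials more weight.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    drug: str            # hypothetical drug label
    us_share: float      # fraction of the drug's trials reported as U.S.-based (0-1)
    participants: int    # total trial participants
    asian_share: float   # fraction of participants reported as Asian (0-1)


def pooled_asian_share(snapshots, min_us_share=0.70):
    """Pool participants across drugs whose U.S. share meets the threshold."""
    kept = [s for s in snapshots if s.us_share >= min_us_share]
    total = sum(s.participants for s in kept)
    if total == 0:
        return None  # no drug met the threshold
    asian = sum(s.participants * s.asian_share for s in kept)
    return asian / total


# Invented records: only the first two clear the 70 percent threshold.
snapshots = [
    Snapshot("drug_1", us_share=0.90, participants=200, asian_share=0.02),
    Snapshot("drug_2", us_share=0.75, participants=800, asian_share=0.01),
    Snapshot("drug_3", us_share=0.30, participants=500, asian_share=0.40),
]
print(pooled_asian_share(snapshots))
```

With these made-up records, pooling gives 1.2 percent, while a simple average of the two kept drugs' shares would give 1.5 percent, so the choice of method matters.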
Reports did not include a Hispanic ethnicity category until 2017 and do not distinguish between white and non-white Hispanics or between Hispanics of European and Latin American descent.
Cancer Trials Data
From the FDA Drug Trials Snapshots data, ProPublica identified 32 drugs that are primarily used to treat cancer and were approved by the FDA between January 2015 and June 2018. To obtain more detailed racial breakdowns for these drugs, ProPublica manually compiled demographic information from the individual FDA Snapshot reports into one dataset, shown in the table below.
Five drugs were approved more than once by the FDA to treat different types of cancer. For example, Lenvima was first approved in 2015 for differentiated thyroid cancer, then in 2016 for renal cell carcinoma, and again in 2018 for hepatocellular carcinoma. Similarly, Imfinzi was approved in 2017 for urothelial carcinoma and in 2018 for non-small cell lung cancer. The FDA did not publish Snapshot data for the clinical trials behind these additional approvals, so we included data only for each drug's first approval.
Our final analysis for the story excluded one of the 32 drugs, Rydapt, which was approved for two uses. For its first approval, for acute myeloid leukemia, race was unreported for 57 percent of trial participants. Its second approval covered aggressive systemic mastocytosis, systemic mastocytosis with associated hematological neoplasm, and mast cell leukemia. Because the first two conditions are not cancers, and we could not separate out data for patients with mast cell leukemia, we did not include Snapshot data related to the second approval.
Clinical Trial Demographics by Race for FDA-Approved Cancer Drugs
|Brand Name||Maker||Cancer Type||White||African American||Asian||Other: American Indian or Alaska Native||Other: Native Hawaiian or Other Pacific Islander||Other: Multiple/Mixed||Other: Unreported||Other: Other||Other: Unknown/Missing||Other: Aggregate||United States (2017 Only)||Year|
|COTELLIC||Roche||melanoma with a BRAF V600E or V600K mutation||93.0%||N/A||N/A||7.0%||7.0%||N/A||2015|
|ODOMZO||Sun Pharma (NVS developed)||basal cell carcinoma||94.0%||<1%||0.0%||6.0%||N/A||2015|
|LONSURF||Taiho Oncology||colorectal cancer||58.0%||1.0%||35.0%||6.0%||N/A||2015|
|PORTRAZZA||Eli Lilly||squamous non-small cell lung cancer||83.0%||1.0%||8.0%||<1%||<1%||8.0%||N/A||2015|
|TAGRISSO||AstraZeneca||EGFR T790M mutation-positive non-small cell lung cancer||36.0%||1.0%||60.0%||<1%||2.0%||3.0%||N/A||2015|
|IBRANCE||Pfizer||HR-positive, HER2-negative breast cancer||89.7%||1.2%||6.1%||3.0%||3.0%||N/A||2015|
|ALECENSA||Genentech||ALK-positive non-small cell lung cancer||73.5%||1.6%||18.2%||0.4%||0.0%||0.4%||4.3%||1.6%||7.0%||N/A||2015|
|LENVIMA||Merck||differentiated thyroid cancer||79.3%||2.0%||17.9%||0.3%||0.5%||<1%||N/A||2015|
|EMPLICITI||BMS & Abbvie||multiple myeloma||84.0%||4.0%||10.0%||<1%||2.0%||<1%||2.0%||N/A||2015|
|YONDELIS||J&J||liposarcoma or leiomyosarcoma||76.0%||12.0%||4.0%||1.0%||3.0%||1.0%||3.0%||8.0%||N/A||2015|
|VENCLEXTA||AbbVie||chronic lymphocytic leukemia with 17p deletion||94.0%||3.0%||<1%||<1%||2.0%||1.0%||3.0%||N/A||2016|
|RUBRACA||Clovis Oncology||deleterious BRCA mutation associated ovarian cancer||78.0%||4.0%||7.0%||2.0%||9.0%||11.0%||N/A||2016|
|LARTRUVO||Eli Lilly||soft tissue sarcoma||86.0%||8.0%||3.0%||2.0%||2.0%||N/A||2016|
|BAVENCIO||Merck KGaA & Pfizer||Merkel cell carcinoma||92.0%||0.0%||3.0%||3.0%||1.0%||4.0%||58%||2017|
|ZEJULA||Tesaro||epithelial ovarian, fallopian tube, or primary peritoneal cancer||87.0%||1.0%||3.0%||<1%||9.0%||9.0%||70%||2017|
|ALUNBRIG||Takeda||ALK-positive non-small cell lung cancer||67.0%||1.0%||31.0%||1.0%||1.0%||NR||2017|
|RYDAPT||Novartis||FLT3-positive acute myeloid leukemia||38.0%||2.0%||2.0%||<1%||<1%||<1%||57.0%||58.0%||33%||2017|
|BESPONSA||Pfizer||B-cell precursor acute lymphoblastic leukemia||71.0%||2.0%||17.0%||10.0%||10.0%||47.0%||2017|
|VERZENIO||Eli Lilly||HR-positive, HER2-negative breast cancer||60.6%||2.5%||27.0%||3.2%||0.2%||6.5%||19.0%||2017|
|KISQALI||Novartis||HR-positive, HER2-negative breast cancer||82.0%||3.0%||8.0%||<1%||<1%||3.0%||4.0%||7.0%||32%||2017|
|NERLYNX||Puma||HER2-overexpressed/amplified breast cancer||81.0%||3.0%||13.0%||3.0%||3.0%||32.0%||2017|
|CALQUENCE||AstraZeneca||mantle cell lymphoma||74.0%||3.0%||0.0%||23.0%||23.0%||36.0%||2017|
|IDHIFA||Agios & Celgene||acute myeloid leukemia||77.0%||6.0%||<1%||<1%||16.0%||<1%||17.0%||83.0%||2017|
|BRAFTOVI+MEKTOVI||Array Biopharma||melanoma with a BRAF V600E or V600K mutation||91.0%||0.0%||3.0%||0.5%||1.0%||3.5%||6.0%||9.0%||2018|
Source: U.S. Food and Drug Administration; ProPublica analysis
Credit: Riley Wong/ProPublica
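Rows in the table above have fewer cells than the header has columns, apparently because empty "Other" cells were dropped, so most "Other" values cannot be assigned to a column by position. Judging from the rows themselves, though, the first three percentage cells appear to line up with White, African American, and Asian, and the last two cells with the U.S. share and the approval year. The hypothetical parser below relies only on those assumed positions and returns None for values given as "<1%", "N/A", or "NR".

```python
def parse_share(cell):
    """Turn '93.0%' or '58%' into a float; return None when no exact value is given."""
    cell = cell.strip()
    # '<1%' is only a bound, 'N/A' is not available, 'NR' is not reported.
    if cell.startswith("<") or cell in ("N/A", "NR"):
        return None
    return float(cell.rstrip("%"))


def parse_row(line):
    """Parse one flattened row such as '|COTELLIC||Roche||...||2015|'."""
    cells = line.strip().strip("|").split("||")
    return {
        "brand": cells[0],
        "maker": cells[1],
        "cancer_type": cells[2],
        # Assumed positions: the first three percentages appear to be
        # White, African American and Asian.
        "white": parse_share(cells[3]),
        "african_american": parse_share(cells[4]),
        "asian": parse_share(cells[5]),
        # Assumed positions: the last two cells appear to be U.S. share and year.
        "us_share": parse_share(cells[-2]),
        "year": int(cells[-1]),
    }


row = parse_row(
    "|IDHIFA||Agios & Celgene||acute myeloid leukemia||77.0%||6.0%||<1%"
    "||<1%||16.0%||<1%||17.0%||83.0%||2017|"
)
print(row)
```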
Cancer Incidence Data
The National Cancer Institute runs the Surveillance, Epidemiology, and End Results (SEER) Program, which tracks cancer statistics within the United States. For some of the most common cancer types, SEER provides Cancer Stat Facts, summary reports that contain incidence and mortality rates by race and binary gender, based on SEER 2011-2015 data. For other cancer types, the SEER Cancer Query Systems allow users to query the database for incidence and mortality statistics by cancer type, race, and gender.
The SEER age-adjusted incidence rate for a cancer type is the number of new cases of that cancer per 100,000 people per year, with each age group's rate weighted by that group's share of the U.S. standard population.
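Direct age standardization, the weighting described above, can be sketched as follows. The three age groups, case counts, populations, and standard weights are made up for illustration; SEER's published rates use finer age groups and its own standard population.

```python
def age_adjusted_rate(cases, population, standard_weights, per=100_000):
    """Directly age-standardized rate.

    cases, population: counts for each age group in the population studied.
    standard_weights: size (or share) of each age group in the standard population.
    """
    if not (len(cases) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must cover the same age groups")
    total_weight = sum(standard_weights)
    rate = 0.0
    for c, n, w in zip(cases, population, standard_weights):
        # Crude rate in this age group, weighted by the standard population's share.
        rate += (c / n) * (w / total_weight)
    return rate * per


# Three made-up age groups (young, middle, old).
cases = [10, 50, 200]
population = [100_000, 50_000, 20_000]
standard = [0.5, 0.3, 0.2]  # hypothetical standard-population shares

crude = sum(cases) / sum(population) * 100_000
adjusted = age_adjusted_rate(cases, population, standard)
print(round(crude, 1), round(adjusted, 1))
```

In this example the crude rate is about 152.9 per 100,000 while the adjusted rate is about 235, because the made-up standard population is older than the study population and the older groups have higher rates. Adjusting to a shared standard is what lets rates for groups with different age structures be compared.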
SEER also groups “Asian or Pacific Islander” into a single category and does not provide disaggregated data for patients of East Asian, South Asian, Southeast Asian, or Pacific Islander descent.
Cancer Incidence Rates per 100,000 People per Year, by Race
|Cancer Type||White||African American||Asian||Native American|
|acute myeloid leukemia||4.6||3.9||3.5||2.5|
|ALK-positive non-small cell lung cancer||54.2||62.5||35.2||36.3|
|B-cell precursor acute lymphoblastic leukemia||1.0||0.3||0.5||N/A|
|basal cell carcinoma||22.0||1.0||N/A||N/A|
|chronic lymphocytic leukemia with 17p deletion||5.3||3.9||1.1||1.6|
|deleterious BRCA mutation associated ovarian cancer||12.1||9.3||9.6||9.0|
|differentiated thyroid cancer||12.4||7.2||7.8||11.9|
|EGFR T790M mutation-positive non-small cell lung cancer||54.2||62.5||35.2||36.3|
|epithelial ovarian, fallopian tube, or primary peritoneal cancer||12.1||9.3||9.6||9.0|
|FLT3-positive acute myeloid leukemia||4.6||3.9||3.5||2.5|
|HER2-overexpressed/amplified breast cancer||17.7||22.1||19.3||N/A|
|HR-positive, HER2-negative breast cancer||97.1||76.4||71.5||N/A|
|mantle cell lymphoma||1.2||0.5||N/A||N/A|
|melanoma with a BRAF V600E or V600K mutation||28.4||1.1||1.5||5.3|
|Merkel cell carcinoma||0.7||0.0||N/A||N/A|
|soft tissue sarcoma||4.5||5.1||2.8||2.8|
|squamous non-small cell lung cancer||12.2||15.5||5.6||10.0|
Native American figures include both American Indians and Alaska Natives. All rates have been rounded to the nearest 0.1.
For urothelial carcinoma, we used the incidence rates for bladder cancer. Additional research found that urothelial carcinoma accounts for 90 percent of all bladder cancers.
For differentiated thyroid cancer (DTC), we used the incidence rates for thyroid cancer. Additional research found that DTC accounts for 90 percent of thyroid cancers.
For epithelial ovarian cancer, we used the incidence rates for ovarian cancer. Additional research found that epithelial ovarian cancer accounts for 90 percent of ovarian cancers.
For FLT3 mutation-positive acute myeloid leukemia (AML), we used the incidence rates for AML overall, as additional research found that the frequency of FLT3 mutations did not differ between races.
For ALK-positive non-small cell lung cancer (NSCLC), we used the incidence rates for NSCLC overall, as additional research found that race was not significantly associated with ALK rearrangement status.
For EGFR mutation-positive non-small cell lung cancer (NSCLC), we used the incidence rates for NSCLC overall. Although additional research has found that EGFR mutations are more common in Asian populations, data was not available to translate this finding into incidence rates by race.
For chronic lymphocytic leukemia (CLL) with 17p deletion, we used the incidence rates for CLL overall. Although additional research has found that black patients with CLL have the 17p deletion more often than non-black patients with CLL, data was not available to translate this finding into incidence rates by race.
Source: U.S. Food and Drug Administration; ProPublica analysis
Credit: Riley Wong/ProPublica
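The proxy choices in the notes above rest on a simple property, written out here with notation introduced only for this illustration: r_a and r_b are the parent-cancer incidence rates for two groups, and p is the share of the parent cancer's cases that fall in the subtype.

```latex
% r_a, r_b: parent-cancer incidence rates for groups a and b
% p: share of parent-cancer cases that belong to the subtype (same in both groups)
\frac{p\,r_a}{p\,r_b} = \frac{r_a}{r_b}
\qquad\text{whereas, if the share differs by group,}\qquad
\frac{p_a\,r_a}{p_b\,r_b} = \frac{p_a}{p_b}\cdot\frac{r_a}{r_b}
```

With p = 0.9, the share cited for urothelial carcinoma, differentiated thyroid cancer, and epithelial ovarian cancer, the parent rate overstates the subtype's absolute rate by a factor of 1/0.9, or about 1.11, but leaves comparisons between groups unchanged. When the share differs by group, as research suggests for EGFR mutations and the 17p deletion, the ratio between groups shifts by p_a/p_b.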
Some of the cancer drugs in our compiled dataset treat a specific subset of a cancer, e.g. HR-positive, HER2-negative breast cancer. Since these more specific incidence rates were not available in SEER, we looked at additional research on these specific subsets to calculate incidence rates:
For B-cell precursor acute lymphoblastic leukemia, we used additional distribution data to calculate the incidence of the B-cell precursor subtype among lymphoblastic leukemias overall.
For HR-positive, HER2-negative breast cancer, we used additional distribution data to calculate the incidence of the HR-positive, HER2-negative subtype among breast cancers overall.
For HER2-overexpressed/amplified breast cancer, we used additional distribution data to calculate the incidence of the HER2-overexpressed/amplified subtype among breast cancers overall.
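A minimal sketch of that calculation, assuming the distribution data give, for each racial group, the fraction of all breast cancer (or lymphoblastic leukemia) cases that belong to the subtype. The function name, group labels, and shares below are invented; a group with no share available is returned as None rather than guessed.

```python
def subtype_incidence(overall_rates, subtype_shares):
    """Estimate subtype incidence for each group.

    overall_rates: rate per 100,000 for the cancer as a whole (e.g. all breast cancer).
    subtype_shares: fraction of that group's cases belonging to the subtype.
    Groups without a share are returned as None instead of being guessed.
    """
    return {
        group: (rate * subtype_shares[group]) if group in subtype_shares else None
        for group, rate in overall_rates.items()
    }


# Invented numbers for illustration only.
overall = {"group_a": 100.0, "group_b": 80.0, "group_c": 70.0}
shares = {"group_a": 0.7, "group_b": 0.5}  # no distribution data for group_c
result = subtype_incidence(overall, shares)
print(result)
```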
For seven cancers, we drew on additional research for incidence rates by race: liposarcoma, leiomyosarcoma, follicular lymphoma, mantle cell lymphoma, soft tissue sarcoma, basal cell carcinoma, and Merkel cell carcinoma.
Caroline Chen contributed to this methodology.