The information in this archive of the Data Store is not actively updated. It is provided as a historical snapshot.

Data Store Archive

Browse datasets released between 2013 and 2023.

The ProPublica Data Store is no longer updated. The Data Store was a project to give readers access to the data behind our reporting. Datasets previously available in the Data Store are listed on this page for archival purposes.

Some free datasets are available for download under our terms of use. None of the datasets provided by ProPublica on this page are actively updated (even though some datasets might list a defunct schedule for updates).

Two types of product are listed on this page:

Free datasets

The following datasets are available for free download, according to our terms of use.

2016 Election: Congressional and Presidential Candidates

Source
Federal Election Commission, The Green Papers, Center for Responsive Politics, Google
Date released
November 2016
Dates covered
As of November 3, 2016
Topic
Politics
Terms of Use
Standard terms of use

A listing of active presidential and congressional candidates for 2016, with some additional columns, as of November 3, 2016.

Learn more about this dataset

A listing of active presidential and congressional candidates for 2016, with some additional columns. The basis for this data is the Federal Election Commission’s candidate master file, and includes the columns described there. ProPublica has removed candidates no longer running in the general election, based on fundraising data and The Green Papers. ProPublica has added a `clean_name` column that converts the candidate name from all-capital letters and makes it more suitable for display. In addition, we’ve added columns with the ID used by the Center for Responsive Politics and the ID used by Google’s Knowledge Graph Search, where available.
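As a rough illustration of what a `clean_name` conversion might involve, the sketch below reformats an all-capitals "LAST, FIRST" name for display. The function name, the assumed input format and the capitalization rules are assumptions made for illustration, not ProPublica's actual method.

```python
def clean_name(raw_name: str) -> str:
    """Convert an all-caps 'LAST, FIRST MIDDLE' name to 'First Middle Last'.

    Hypothetical helper: the input format and rules are assumed for illustration.
    """
    # Split on the first comma only; names without a comma are kept in order.
    last, _, rest = raw_name.partition(",")
    parts = rest.split() + last.split()
    # Capitalize each hyphen-separated piece so "SMITH-JONES" becomes "Smith-Jones".
    return " ".join(
        "-".join(piece.capitalize() for piece in word.split("-"))
        for word in parts
    )
```

A simple rule like this does not handle every name correctly (for example, "O'BRIEN" or "MCDONALD"), so real display names would likely need manual review.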

Get the data:

2018 Midterm Election Congressional Candidates

Source
Federal Election Commission, The Green Papers, Center for Responsive Politics, Google
Date released
September 2018
Dates covered
As of September 20, 2018
Topic
Politics
Terms of Use
Standard terms of use

A listing of active congressional candidates for the 2018 midterm elections, with some additional columns, as of September 21, 2018.

Learn more about this dataset

A listing of active congressional candidates for the 2018 midterm elections, with some additional columns. The basis for this data is the Federal Election Commission’s candidate master file, and includes the columns described there. ProPublica has removed candidates no longer running in the general election, based on fundraising data and The Green Papers. ProPublica has added a `clean_name` column that converts the candidate name from all-capital letters and makes it more suitable for display, as well as a `url` column with the candidate's official website. In addition, we’ve added columns with the ID used by the Center for Responsive Politics and the ID used by Google’s Knowledge Graph Search, where available.

Get the data:

ACA Plan Compare Data (2014-2015)

Source
Centers for Medicare & Medicaid Services
Date released
December 2014
Dates covered
2014-2015
Rows
79279
Topic
Health
Featured use
Comparing 2015 Obamacare Plans
Terms of Use
Standard terms of use

How is the cost of health insurance changing under the Affordable Care Act? This data compares differences between 2014 and 2015 ACA insurance plans.

Learn more about this dataset

This data compares differences between 2014 and 2015 Affordable Care Act insurance plans. The data comes already joined through a crosswalk file and includes fields that indicate whether a plan changed, and by how much. ProPublica used this data to create the "Will My Obamacare Health Care Costs Go Up?" app.

Get the data:

Alternative Schools in U.S. School Districts

Source
ProPublica analysis of U.S. Department of Education data
Date released
March 2017
Dates covered
2013-2014 School Year
Rows
1923
Topic
Education
Featured uses
Terms of Use
Standard terms of use

This data set provides an analysis of disparities between alternative and non-alternative schools across nearly 2,000 school districts.

Learn more about this dataset

Nearly 2,000 school districts in the United States had alternative schools during the 2013-14 school year. This data set provides details on the number of students enrolled in alternative schools within each district, as well as comparative metrics on alternative and non-alternative schools within each district, including student-to-teacher ratios, school funding, teacher experience level, access to counseling, graduation rates, and more.

All data is for the 2013-14 school year unless otherwise noted. The file was created by ProPublica, combining and summarizing several publicly available school-level data sets. The data set does not include information about individual schools.

The federal data used in our analysis relies on reports from states, which in turn often rely on reports from school districts. While the federal data collection efforts include some verification and data cleaning, the data is only as accurate as states’ record-keeping and reporting allows.

We identified alternative schools using a school type classification from the Common Core of Data (CCD), compiled by the U.S. Department of Education’s National Center for Education Statistics, with a few modifications. Some charter schools are authorized under their own administrative agency or under an agency other than a regular, local school district.

We reassigned such schools to the district where they are located geographically, to better capture the number of total and alternative students in each district. The reassignment was done using a geographic crosswalk provided by the Stanford Education Data Archive.

Additional information about using the data is included in the documentation provided with the data download.

Get the data:

Amazon Pricing Data

Source
Amazon, ProPublica
Date released
September 2016
Dates covered
Summer 2016
Rows
6973
Methodology
How We Analyzed Amazon's Shopping Algorithm
Topic
Business
Featured uses
Terms of Use
Standard terms of use

Data collected by ProPublica on product and shipping costs for 6,973 vendor listings of 250 best-selling products on Amazon.

Learn more about this dataset

ProPublica reporters examined Amazon’s shopping algorithm; we scraped data from the company's website to examine listings for 250 bestselling products across a wide range of categories, from electronics to household supplies, over a period of several weeks during summer 2016. We compared pricing and shipping costs for products offered by multiple vendors, including Amazon itself and sellers in the "Fulfilled by Amazon" program. In total, we examined 6,973 vendor listings.

Get the data:

Audio: Crying Children Inside a U.S. Customs and Border Protection Facility

Source
See story for details
Date released
June 2018
Topic
Politics
Featured use
Listen to Children Who’ve Just Been Separated From Their Parents at the Border
Terms of Use
Standard terms of use

ProPublica obtained audio from inside a U.S. Customs and Border Protection facility, in which children can be heard wailing as an agent jokes, “We have an orchestra here.”

Learn more about this dataset

The desperate sobbing of 10 Central American children, separated from their parents one day last week by immigration authorities at the border, makes for excruciating listening. Many of them sound like they’re crying so hard, they can barely breathe. They scream “Mami” and “Papá” over and over again, as if those are the only words they know.

The baritone voice of a Border Patrol agent booms above the crying. “Well, we have an orchestra here,” he jokes. “What’s missing is a conductor.”

Then a distraught but determined 6-year-old Salvadoran girl pleads repeatedly for someone to call her aunt. Just one call, she begs anyone who will listen. She says she’s memorized the phone number, and at one point, rattles it off to a consular representative. “My mommy says that I’ll go with my aunt,” she whimpers, “and that she’ll come to pick me up there as quickly as possible.”

Read the full story here.

Get the data:

Bryan Independent School District - Office for Civil Rights Investigation Emails

Source
Bryan Independent School District Public Records Request
Date released
April 2018
Dates covered
September 2013 - January 2018
Rows
~2660 emails, 3 Mbox files
Topic
Education
Featured use
Shutdown of Texas Schools Probe Shows Trump Administration Pullback on Civil Rights
Terms of Use
Standard terms of use

Emails related to an Office for Civil Rights investigation into the Bryan (Texas) Independent School District’s disciplinary practices.

Learn more about this dataset

This data set contains emails received by ProPublica through a public records request to the Bryan (Texas) Independent School District related to an Office for Civil Rights investigation into the school district’s disciplinary practices. About a dozen emails have been removed that contained lists of student names and contact information.

Get the data:

CDC Mortality Data

Source
Centers for Disease Control and Prevention
Dates covered
See website
Topic
Health
Featured use
Overdose
Terms of Use
Standard terms of use

The CDC's mortality and cause-of-death data set. ProPublica used this data to report on the dangers of Tylenol overdose.

Get the data:

Chicago Police Department Data on Gang Members

Source
Chicago Police Department
Date released
March 2018
Rows
128,000
Topic
Criminal Justice
Featured use
Chicago’s gang database is full of errors -- and records we have prove it
Terms of Use
Standard terms of use

This dataset contains descriptive information on individuals that the Chicago Police Department has classified as gang members.

Learn more about this dataset

This dataset contains descriptive information on individuals that the Chicago Police Department has classified as gang members. There are no names or identification included in the data, but each row includes the gang affiliation, race, age, date of subject’s first arrest or first entry into the data system, and the police beat where the first arrest occurred.

ProPublica Illinois has identified numerous problems with this data. Technically, there isn’t a stand-alone list or database of suspected gang members, though police often refer to it that way. The department tracks gang affiliation along with arrest records, reported crimes and other information in a massive data “warehouse” called the Citizen and Law Enforcement Analysis and Reporting (CLEAR) system.

Officers enter information about everyone who is arrested, as well as many people who are stopped but not charged with a crime. The department’s internal rules for classifying someone as a gang member are fuzzy. If suspects admit they are in a gang or have gang tattoos, that counts. In some instances, police base the decision on what they hear from sources deemed to have given them “reliable information” in the past.

Get the data:

Chicken Checker

Source
United States Department of Agriculture Food Safety and Inspection Service
Date released
November 2021
Dates covered
2000-2020
Rows
33,805
Methodology
About the data
Topic
Health
Featured use
Chicken Checker
Terms of Use
Standard terms of use

Data on every salmonella testing sample collected at U.S. poultry processing plants.

Learn more about this dataset

The USDA posts public data containing the results of every salmonella sample it takes at poultry processing plants nationwide, detailing when it took the sample, what type of poultry it sampled and whether or not it found salmonella. If salmonella was present, the records include information on the type of salmonella found and whether it was resistant to antibiotics.

While the USDA splits these into different files based on type of poultry — for example, ground chicken or whole turkey — and sample date, we combined the data into one file. The USDA uses this information to categorize plants based on whether or not they achieved the agency’s salmonella targets. ProPublica used this sampling dataset to calculate the salmonella positivity rates shown in the Chicken Checker app. While our app uses the most recent 52 weeks of data available, we are including all samples that have been available on the app since its publication.
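A positivity rate over a trailing 52-week window could be computed along the following lines. This is a minimal sketch: the record layout (a sample date paired with a true/false result) and the function name are assumptions, and the app's actual calculation may differ.

```python
from datetime import date, timedelta

def positivity_rate(samples, as_of, weeks=52):
    """Share of samples testing positive in the `weeks` before `as_of`.

    `samples` is an iterable of (sample_date, is_positive) pairs; this layout
    is hypothetical, not the USDA file format.
    Returns None when no samples fall inside the window.
    """
    start = as_of - timedelta(weeks=weeks)
    window = [positive for day, positive in samples if start < day <= as_of]
    if not window:
        return None
    return sum(window) / len(window)

# Made-up samples for illustration only.
samples = [
    (date(2020, 1, 15), True),
    (date(2020, 6, 1), False),
    (date(2020, 9, 30), False),
    (date(2020, 11, 2), True),
    (date(2018, 3, 3), True),  # older than 52 weeks, so it is ignored
]
rate = positivity_rate(samples, as_of=date(2020, 12, 31))
```

Returning None for an empty window avoids a division by zero and keeps plants with no recent samples distinct from plants with a rate of zero.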

Get the data:

Child Abuse Prevention and Treatment Act Reports, 2011-2015

Source
State and county child welfare agencies
Date released
December 2019
Dates covered
2011-2015
Rows
6,511
Topic
Criminal Justice
Provided in collaboration with
The Boston Globe
Featured use
Nobody Knows How Many Kids Die From Maltreatment and Abuse in the U.S.
Terms of Use
Standard terms of use

More than 6,500 state-level reports of children who died of abuse or neglect across 38 U.S. states.

Learn more about this dataset

This data set contains more than 6,500 state-level reports of children who died of abuse or neglect across 38 U.S. states. You can also search this data through our interactive website.

The Child Abuse Prevention and Treatment Act requires states to make available certain information about children who die of abuse or neglect. ProPublica requested this information in all 50 states, and the records contained in this dataset were provided by 38 states and, in three states, individual counties. In some states we have augmented the records with additional information. For example, in Alabama we have procured autopsy records for children whose autopsies were available.

Upon request, CAPTA requires states to list the age and gender of the child, and information about a household’s prior contact with welfare services. The information is supposed to help government agencies prevent child abuse, neglect and death, but reporting across states is so inconsistent that comparisons and trends are impossible to identify. Some states release more than they are required to, but most do not release enough. Journalists should not use this data to draw numerical conclusions about child abuse or neglect.

The free download of this CSV file includes the information publicly available on our site. Journalists may contact ProPublica at [email protected] for a copy of a CSV file containing additional information about each child that can be joined with the free dataset on the column "DBN." The non-public information includes the names of children we believe we’ve identified through news articles, links to relevant news articles, and narrative summaries of their deaths and any previous contact the children had with child welfare services. This information should be thoroughly fact-checked before publication. Each row in the dataset is one child. If more than one child was killed in a single incident, each child has his or her own row.
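Joining two CSV files on the "DBN" column might look like the sketch below, which uses Python's standard csv module. The sample contents and every column name other than "DBN" are invented for illustration and do not reflect the real files.

```python
import csv
import io

def join_on_dbn(public_csv, extra_csv):
    """Left-join rows of `extra_csv` onto `public_csv` using the "DBN" column."""
    extra_by_dbn = {row["DBN"]: row for row in csv.DictReader(extra_csv)}
    joined = []
    for row in csv.DictReader(public_csv):
        merged = dict(row)
        # Rows with no match in the second file keep only the public columns.
        merged.update(extra_by_dbn.get(row["DBN"], {}))
        joined.append(merged)
    return joined

# Made-up sample data; only the "DBN" column name comes from the description above.
public = io.StringIO("DBN,state,year\n1,AL,2012\n2,TX,2014\n")
extra = io.StringIO("DBN,summary\n1,Example narrative\n")
rows = join_on_dbn(public, extra)
```

A left join keeps every row of the free dataset, so records without additional information are not silently dropped.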

Get the data:

City of Chicago Camera Tickets and Warnings Data

Source
Chicago Department of Finance
Date released
July 2021
Dates covered
Jan. 1, 2010 to June 13, 2021
Rows
39,387,511
Topic
Transportation
Featured use
Chicago’s “Race-Neutral” Traffic Cameras Ticket Black and Latino Drivers the Most
Terms of Use
Standard terms of use

Information about every red-light and speed camera ticket and warning issued by the City of Chicago from January 1, 2010 to mid-2021.

Learn more about this dataset

This dataset contains information about every ticket and warning issued by the City of Chicago through its red-light and speed camera programs since January 1, 2010.

The data covers red-light tickets issued from midnight on Jan. 1, 2010, through late in the evening on June 13, 2021. Speed camera data runs from the morning of Oct. 16, 2013, when the program launched, through midday on May 4, 2021. The database contains every ticket that was vetted and sent to the city by the camera operators at the time the data was exported. The red-light and speed camera ticket review processes are separate, which is why this dataset has two end dates.

ProPublica reporters used this dataset to examine disparities in the city’s camera ticketing program. The reporters found that households in majority-Black and majority-Hispanic ZIP codes were ticketed at higher rates than their majority-white counterparts. Reporters supplemented their analysis with data previously obtained by ProPublica. That data, available here, includes red-light camera ticket data from the start of that program in 2003.

Each data point represents a single ticket or warning issued by the city and contains information about the type of violation, when and where a violation was recorded, the ZIP code of the registered vehicle owner and information about the recent status of the ticket (e.g., whether it was dismissed, paid or connected to a bankruptcy).

The dataset, which was provided to ProPublica by the Chicago Department of Finance on July 1, 2021, is a snapshot of the city’s camera tickets at that point in time. The status of tickets and amount due were current at the time the database was exported by the city.

Get the data:

City of Chicago Parking and Camera Ticket Data

Source
Chicago Department of Finance, ProPublica Illinois
Date released
May 2018
Dates covered
1996-2018 (see description for details)
Rows
28,272,580
Topic
Transportation
Provided in collaboration with
WBEZ
Featured uses
Terms of Use
Standard terms of use

A detailed dataset of parking, vehicle compliance, and camera tickets issued in Chicago. Includes details on when and where tickets were issued, the violation for which the vehicle was cited, payment status and more.

Learn more about this dataset

This dataset provides details on all parking and vehicle compliance tickets issued in Chicago from January 1, 1996 to May 14, 2018. It also includes data on camera tickets issued in Chicago from November 1, 2003 to May 3, 2018.

ProPublica Illinois, in collaboration with WBEZ, used this dataset to report on city sticker tickets, which come with the steepest fines of any parking citation and are the largest source of ticket debt in Chicago.

The data includes information on when, where, and by whom tickets were issued; de-identified license plates; vehicle make; registration zip code; the violation for which the vehicle was cited; the payment status and more. ProPublica Illinois has also added block-level address information to the location where a ticket was issued.

The City of Chicago has said that an official data dictionary does not exist. Through interviews with finance department officials and other reporting, we have compiled our own version, which is included with the download.

Get the data:

Civilian Complaints Against New York City Police Officers

Source
New York City’s Civilian Complaint Review Board
Date released
July 2020
Dates covered
September 1985 - January 2020
Rows
33,358
Topic
Criminal Justice
Featured uses
Terms of Use
Standard terms of use

A database of more than 12,000 civilian complaints filed against New York City police officers. Includes incidents ranging from September 1985 to January 2020.

Learn more about this dataset

This free download is a database of more than 12,000 civilian complaints filed against New York City police officers.

After New York state repealed the statute that kept police disciplinary records secret, known as 50-a, ProPublica filed a records request with New York City’s Civilian Complaint Review Board, which investigates complaints by the public about NYPD officers. The board provided us with records about closed cases for every police officer still on the force as of late June 2020 who had at least one substantiated allegation against them. The records span decades, from September 1985 to January 2020.

We have published, and are releasing for download here, a version of the data that excludes any allegations that investigators concluded did not occur and were deemed unfounded.

We chose to include the basic information disclosed by the CCRB about allegations that investigators deemed unsubstantiated. Unsubstantiated means the CCRB, which has limited investigative powers, was not able to confirm that the alleged incident happened and that it violated the NYPD’s rules.

We also chose to include cases where an investigator found that what a civilian alleged did happen but the conduct was allowed by the NYPD’s rules. The Police Department’s guidelines often give officers substantial discretion, particularly around use of force. Those cases are classified as “exonerated.”

All this information can help readers examine the records of officers who have been the subject of a pattern of complaints.

Each record in the data lists the name, rank, shield number, and precinct of each officer as of today and at the time of the incident; the age, race and gender of the complainant and the officer; a category describing the alleged misconduct; and whether the CCRB concluded the officers’ conduct violated NYPD rules.

Every complaint in the database was fully investigated by the CCRB, which means, among other steps, a civilian provided a sworn statement to investigators. The CCRB was not able to reach conclusions in many cases, in part because the investigators must rely on the NYPD to hand over crucial evidence, such as footage from body-worn cameras. Often, the department is not forthcoming despite a legal duty to cooperate in CCRB investigations. The CCRB gets thousands of complaints per year but substantiates a tiny fraction of them. Allegations of criminal conduct by officers are typically investigated not by the CCRB but by state or federal prosecutors in conjunction with the NYPD’s Internal Affairs Bureau or the FBI.

The download includes the information on this page, a layout table and basic glossary for the fields included.

Updated 7/27/20: Download was updated to include expanded documentation and shield numbers.

Get the data:

Clinical Trials: Participant Demographic Data

Source
U.S. Food and Drug Administration, Drug Trials Snapshots
Date released
August 2018
Dates covered
January 2015 to mid-August 2018
Rows
155
Methodology
How We Compared Clinical Trial and Cancer Incidence Data
Topic
Health
Featured use
Black Patients Miss Out On Promising Cancer Drugs
Terms of Use
Standard terms of use

This dataset contains the demographic breakdowns of participants in clinical trials for FDA-approved drugs between January 2015 and June 2018.

Learn more about this dataset

This dataset contains the demographic breakdowns of participants in clinical trials for FDA-approved drugs between January 2015 and June 2018. The FDA has been providing demographic reports for each approved drug since January 2015. While the FDA provides summary reports by year, sometimes in PDF format only, this dataset was compiled to include all available data across years in an easily usable format.

The columns of the dataset include: brand name; drug indication; percentage of women in the clinical trials; percentage of participants by race: white, black or African American, Asian, and other; percentage of participants of Hispanic ethnicity; percentage of participants who are age 65 and older; and year.

The "Other" race category was used as a catch-all for any of these categories: American Indian/Alaska Native (AI/AN), Native Hawaiian or Other Pacific Islander (NH/OPI), mixed race, multiple races, Unknown, Unreported, and Other. While the FDA also provides these demographic breakdowns by drug, which contains more detailed information, raw numbers for patients, and occasionally disaggregated "Other" categories, we did not include this information here. For individual drugs, the disaggregated "Other" categories are not consistent.

For drugs approved in 2015 and 2016, percentages for the "Other" category were provided in FDA summary reports. For 2017 drugs, we calculated this percentage by subtracting the other categories from 100%. For 2018 drugs, we manually compiled these percentages from the reports for each individual drug.
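The 2017 calculation is simple arithmetic: whatever remains after the reported race categories are summed is attributed to "Other." A small sketch, with made-up percentages and a hypothetical function name:

```python
def other_percentage(white, black, asian):
    """Percentage left for "Other" after subtracting reported categories from 100."""
    # Round to one decimal place to avoid floating-point noise in the result.
    return round(100 - (white + black + asian), 1)

# Illustrative values only, not taken from any FDA report.
other = other_percentage(white=77.0, black=7.0, asian=11.0)
```

Because the result is a remainder, any rounding in the reported categories carries over into the "Other" figure.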

The "Hispanic" ethnicity category was not included in the yearly summary reports for 2015 and 2016, although it is sometimes included in individual drug reports. Note that this percentage is one category out of the following: Hispanic, Not Hispanic, and Unknown/Unreported. Note also that some drugs report "Hispanic or Latino" whereas others only have "Hispanic."

ProPublica used this data in our piece about racial representation in cancer clinical trials. We analyzed this data to determine the race distribution of patients in clinical trials for cancer drugs. We also compiled a more detailed dataset, including disaggregated "Other" categories, using the FDA demographic reports specifically for drugs indicated to treat cancer.

Get the data:

Commander's Emergency Response Program Data

Source
Special Inspector General for Afghanistan Reconstruction
Date released
January 2015
Dates covered
2003-2012
Rows
17958
Topic
Military
Featured use
How U.S. Commanders Spent $2 Billion of Petty Cash in Afghanistan
Terms of Use
Standard terms of use

This dataset contains individual payments totaling $2 billion made by U.S. military commanders to the Afghan people during the Afghanistan war.

Learn more about this dataset

This data contains details on $2 billion in payments made by U.S. military commanders to the Afghan people during the Afghanistan war under the Commander's Emergency Response Program. It was released to ProPublica under a Freedom of Information Act request. The data was culled from several different databases by the Special Inspector General for Afghanistan Reconstruction (SIGAR).

Get the data:

COMPAS Recidivism Risk Score Data and Analysis

Source
Broward County Clerk’s Office, Broward County Sheriff's Office, Florida Department of Corrections, ProPublica
Dates covered
2013-2014
Methodology
How We Analyzed the COMPAS Recidivism Algorithm
Topic
Criminal Justice
Featured use
Machine Bias
Terms of Use
Standard terms of use

The data, code, and documentation behind our analysis of Northpointe, Inc.'s COMPAS risk-assessment algorithm for the story, "Machine Bias," by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner.

Learn more about this dataset

Across the nation, judges, probation and parole officers are increasingly using algorithms to assess a criminal defendant’s likelihood to re-offend. There are dozens of these risk assessment algorithms in use, including two leading nationwide tools offered by commercial vendors. Our story, "Machine Bias," set out to assess one of the commercial tools, called COMPAS (which stands for Correctional Offender Management Profiling for Alternative Sanctions), made by Northpointe, Inc. to discover the underlying accuracy of their recidivism algorithm and to test whether the algorithm was biased against certain groups.

The linked data includes: a database containing the criminal history, jail and prison time, demographics and COMPAS risk scores for defendants from Broward County from 2013 and 2014; code in R and Python; a Jupyter notebook; and other files needed for the analysis.

Get the data:

Consumer Bankruptcy Case Filings

Date released
October 2017
Topic
Business
Terms of Use
Standard terms of use

Raw data on bankruptcy cases from the Department of Justice, used in ProPublica's analysis of consumer bankruptcy between 2008 and 2015.

Learn more about this dataset

ProPublica's analysis of consumer bankruptcy filings used raw data on bankruptcy cases from the Department of Justice. The data is also available through ICPSR. The data used for ProPublica's analysis was limited to only cases filed between 2008 and 2015 that included consumer debts and were originally filed under either Chapter 7 or Chapter 13.
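The three filters described above could be expressed as a single predicate. This is a sketch under stated assumptions: the field names are hypothetical rather than the Department of Justice column names, and the year range is treated as inclusive.

```python
def in_analysis_scope(case):
    """Apply the three filters described above to one case record.

    `case` is a dict with hypothetical keys: "filing_year" (int),
    "original_chapter" (str) and "consumer_debt" (bool).
    """
    return (
        2008 <= case["filing_year"] <= 2015  # assumed inclusive on both ends
        and case["original_chapter"] in {"7", "13"}
        and case["consumer_debt"]
    )

# Made-up records for illustration only.
cases = [
    {"filing_year": 2010, "original_chapter": "7", "consumer_debt": True},
    {"filing_year": 2016, "original_chapter": "13", "consumer_debt": True},
    {"filing_year": 2012, "original_chapter": "11", "consumer_debt": True},
    {"filing_year": 2009, "original_chapter": "13", "consumer_debt": False},
]
selected = [c for c in cases if in_analysis_scope(c)]
```

Only the first record passes all three checks; the others fail on year, chapter and debt type respectively.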


Get the data:

Cook County Commercial and Industrial Property Tax Assessments

Source
Cook County Assessor's Office
Date released
December 2017
Dates covered
2002-2016
Rows
3.8 million
Topic
Business
Provided in collaboration with
Chicago Tribune
Terms of Use
Standard terms of use

This download includes three different data sets that were provided to ProPublica Illinois and the Chicago Tribune by the Cook County Assessor's Office.

Learn more about this dataset

This download includes three different data sets that were provided to ProPublica Illinois and the Chicago Tribune by the Cook County Assessor's Office. The three datasets are:

  • The raw first-pass or initial assessment values and market values for each property, identified only by its Property Index Number (PIN), which is unique for each parcel of property. Covers 2002-2015.
  • Property assessment data on each PIN that was the subject of an appeal, including property information, as well as initial assessed values, second-pass assessed values, and the final assessed values (incorporating any successful appeals to the Cook County Board of Review), as well as attorney names. Covers 2003-2016.
  • The raw final assessed values that were submitted to the Cook County Board of Review by the Cook County Assessor's office for each PIN. Covers 2002-2015.

The Assessor's Office did not provide documentation.

Get the data:

Cook County Regional Gang Intelligence Database

Source
Cook County Sheriff’s Office
Date released
July 2018
Dates covered
June 2018
Rows
25,063
Topic
Criminal Justice
Featured use
Like Chicago Police, Cook County and Illinois Officials Track Thousands of People in Gang Databases
Terms of Use
Standard terms of use

A snapshot of the gang database maintained by the sheriff’s office and jail in Cook County, Illinois, along with other law enforcement agencies. Personal identifiers have been removed.

Learn more about this dataset

Updated August 8, 2018.

This dataset is a snapshot of the gang database maintained by the sheriff’s office and jail in Cook County, Illinois, along with other law enforcement agencies. Names and other personally identifying details have been removed, but the data includes information about the gender, appearance, gang affiliation, zip code, and race of the individuals listed. Additional information is also included, such as whether the individual wears gang colors or has tattoos, has self-identified their gang involvement, is under probation and more.




Get the data:

Credibly Accused Priests

Source
U.S. Catholic archdioceses, dioceses, eparchies and religious orders; ProPublica reporting; Pontifical Yearbook (2019)
Date released
January 2020
Dates covered
2002-2020
Rows
6,754
Methodology
We Assembled The Only Nationwide Database of Priests Deemed Credibly Accused of Abuse. Here's How.
Topic
Religion
Featured use
Catholic Leaders Promised Transparency About Child Abuse. They Haven't Delivered.
Terms of Use
Standard terms of use

Data on more than 5,800 clergy who have been listed as credibly accused of sexual abuse in reports released by Catholic dioceses and religious orders.

Learn more about this dataset

This dataset contains all of the information included in ProPublica's interactive database that lets users search for clergy who have been listed as credibly accused of sexual abuse in reports released by Catholic dioceses and religious orders.

ProPublica combined information from nearly 180 lists into a single database. More than 6,700 names are included, and over 5,800 of them are unique. A little more than half of the people named were listed as being deceased.

Your download contains two tables. The first contains the names, dioceses, assignment histories, ordination dates, birth years and other information about credibly accused priests and clergy. The second, which can be joined to the first by the diocese id, contains additional information that was collected and reported out from dioceses, including the date of the list release and the Catholic population of the dioceses.

Last updated August 31, 2021.

Get the data:

Debt Collection Datasets

Source
ProPublica analysis, state court data (various jurisdictions)
Dates covered
Varies
Rows
Varies
Topics
  • Business
  • Finance
Featured use
So Sue Them: What We’ve Learned About the Debt Collection Lawsuit Machine
Terms of Use
Standard terms of use

This file contains data from a variety of state courts about how debt collectors and banks have used lawsuits to collect on old debts.

Learn more about this dataset

This file contains data from a variety of state courts about how debt collectors and banks have used lawsuits to collect on old debts. It was used to create the graphics for "So Sue Them: What We've Learned About the Debt Collection Lawsuit Machine," by Paul Kiel and Lena Groeger.

Get the data:

Defense Environmental Restoration Program Sites

Source
U.S. Department of Defense
Date released
November 2017
Dates covered
Site data as of 2015
Topics
  • Military
  • Environment
Featured use
Bombs In Our Backyard
Terms of Use
Standard terms of use

Data on all cleanup efforts administered by the Department of Defense at current and former military locations as of 2015.

Learn more about this dataset

The Defense Environmental Restoration Program, which is administered by the Department of Defense, measures and documents cleanup efforts at current and former military locations. These efforts include the cleanup of sites that contain toxic pollutants and contaminants in the soil or water, as well as sites that contain explosives or discarded military munitions.

This Oracle database, collected under the Defense Environmental Restoration Program, documents the Department of Defense’s cleanup program for active military installations, closed or closing installations and formerly used defense sites. ProPublica obtained this information, last updated in 2015, through a Freedom of Information Act request.

Get the data:

Documenting Hate News Index (Raw Data)

Source
Google News
Date released
August 2017
Dates covered
February 13, 2017 - Present
Rows
Varies
Topic
Criminal Justice
Featured use
Documenting Hate News Index
Terms of Use
Standard terms of use

News stories about hate crimes collected by Google News.

Learn more about this dataset

This download includes a set of news stories about hate crimes collected by Google News, including the title, date, publisher, location, keywords, and a brief summary of each story. The data is updated weekly.

Get the data:

Emergency Rooms, Hospital Inspection Reports

Source
Centers for Medicare & Medicaid Services
Date released
September 2019
Methodology
About ER Inspector's Data
Topic
Health
Featured use
ER Inspector
Terms of Use
Standard terms of use

While CMS releases data on all types of hospital violations, ProPublica's ER Inspector specifically pulls out the violations related to ER care. You can see all violations in the raw CMS data.

Learn more about this dataset

While CMS releases data on all types of hospital violations, ProPublica's ER Inspector specifically pulls out the violations related to ER care. You can see all violations in the raw CMS data.

ER-related violations include those relating to not properly assessing and treating patients, inadequate medical and nursing staff, or not following ER policies and procedures. It also includes violations of the Emergency Medical Treatment and Labor Act (EMTALA), which requires ERs to provide a medical screening examination to anyone who comes to the emergency department, regardless of their ability to pay.

Get the data:

Emergency Rooms, Timely and Effective Care (Hospital Level Data)

Source
Centers for Medicare & Medicaid Services
Date released
September 2019
Rows
90,402
Methodology
About ER Inspector's Data
Topic
Health
Featured use
ER Inspector
Terms of Use
Standard terms of use

Much of the data in ProPublica's ER Inspector interactive database comes from CMS's Timely and Effective Care datasets. While ER Inspector only uses hospital-level data, CMS also provides the data at the state and national level.

Learn more about this dataset

Much of the data in ProPublica's ER Inspector interactive database comes from CMS's Timely and Effective Care datasets. While ER Inspector only uses hospital-level data, CMS also provides the data at the state and national level.

This data set includes hospital-level data for measures of cataract surgery outcome, colonoscopy follow-up, heart attack care, emergency department care, preventive care, blood clot prevention, pregnancy and delivery care, and cancer care.

Get the data:

Facebook Ad Categories

Source
ProPublica, Facebook
Date released
December 2016
Rows
52235 / 29176
Topic
Business
Featured uses
Terms of Use
Standard terms of use

A unique data set of Facebook ad groups and interest categories, collected by ProPublica reporters.

Learn more about this dataset

This dataset includes two tables: data on the interest categories Facebook shows to users and the ad groups it shows to advertisers. ProPublica used this data to show that Facebook tells its users a lot of things it knows about them, but not all the things it's selling to advertisers.

Interest category data was compiled using a Chrome extension, built by ProPublica reporters. The extension showed users the interest categories Facebook assigned to them, and gave users the opportunity to share all of these categories with ProPublica. The data shared did not include any identifiable user information. Through this extension, ProPublica crowdsourced 52,235 unique interest categories.

The second table contains data scraped from the company's ad buying portal. This table shows what audiences Facebook allows ad buyers to target.

Get the data:

Federal Air Marshal Misconduct Database

Source
Transportation Security Administration
Date released
February 2016
Dates covered
November 2002 - February 2012
Rows
5214
Topic
Transportation
Featured use
The TSA Releases Data on Air Marshal Misconduct, 7 Years After We Asked
Terms of Use
Standard terms of use

Information on 5,214 cases of misconduct committed by federal air marshals and how they were disciplined.

Learn more about this dataset

Federal air marshals fly undercover on passenger planes and are trained to intervene in the event of a hijacking. This database contains information on 5,214 cases of misconduct committed by federal air marshals by date and field office and what discipline was meted out in response. The data covers November 2002 to February 2012.

Get the data:

Free the Files Filing Data

Source
ProPublica, Federal Communications Commission
Date released
January 2015
Dates covered
2012
Rows
66225
Topic
Politics
Featured use
Help ProPublica Unlock Political Ad Spending
Terms of Use
Standard terms of use

Data on approximately $1.2 billion in political ad buys in 33 markets during the 2012 election. This data set was created by nearly 1,000 volunteers and curated by ProPublica reporters.

Learn more about this dataset

Data on approximately $1.2 billion in political ad buys in 33 markets during the 2012 election. This data set was created by nearly 1,000 volunteers and curated by ProPublica reporters.

Get the data:

Georgia Title Lenders

Source
Georgia Department of Revenue, Google Maps, company websites
Date released
November 2022
Dates covered
October 2022
Rows
490
Methodology
How We Measured the Title-Lending Industry in Georgia
Topics
  • Business
  • Finance
Provided in collaboration with
The Current
Featured use
How Title Lenders Trap Poor Americans in Debt With Triple-Digit Interest Rates
Terms of Use
Standard terms of use

Title lender locations in Georgia

Learn more about this dataset

This data lists the title lenders located in Georgia that were identified by ProPublica and The Current, a nonprofit newsroom based in Savannah, Georgia.

Title lenders in Georgia are regulated under the state’s pawn shop statutes and licensed at the local level, so there is no official statewide list of storefronts that offer “title pawns.”

ProPublica and The Current compiled this list using information from Google Maps and corporate websites, along with vehicle lien data from Georgia Department of Revenue’s motor vehicle division. The news organizations also verified locations by calling stores and checking company websites to ensure that they were in operation and issued title pawns. Online-based title lenders are not included.

Some of the state’s licensed installment lenders offer auto-secured loans; however, these locations were excluded unless they referred to their product specifically as a “title pawn.”

This data was finalized in October 2022. However, because this list was compiled over the course of several months, it is possible that a limited number of store openings and closures that occurred close to the time of the data’s publication may not be reflected.

Counties and state legislative districts were added to this data using spatial joins with the U.S. Census Bureau’s TIGER shapefiles and the Atlanta Regional Commission’s Georgia legislative district shapefiles, respectively.

ProPublica and The Current found that these title lending storefronts are disproportionately located in lower-income ZIP codes and those with higher proportions of people of color.


For more detail on our analysis of the title-lending industry in Georgia, see the original story and the section at the end titled, “How We Measured the Title-Lending Industry in Georgia.”

Get the data:

Harris County Flood Control District Buyout Data

Source
Harris County Flood Control District
Date released
January 2018
Dates covered
1985-2017
Rows
3077
Topic
Environment
Featured use
Buyouts Won’t Be the Answer for Many Frequent Flooding Victims
Terms of Use
Standard terms of use

Property buyouts in Harris County made by the Harris County Flood Control District between 1985 and 2017.

Learn more about this dataset

Property buyouts in Harris County made by the Harris County Flood Control District between 1985 and 2017. Columns include buyout program, date executed, purchase amount and ZIP code.

Get the data:

Hawaii Seawall Exemptions

Source
Honolulu Star-Advertiser, Hawaii Department of Land and Natural Resources, City and County of Honolulu, Hawaii Legislature, Maui County, and Kauai County
Date released
December 2020
Dates covered
2000-2020
Rows
230
Topic
Environment
Provided in collaboration with
Honolulu Star-Advertiser
Featured uses
Terms of Use
Standard terms of use

Information about properties in Hawaii that received exemptions from local and state laws that bar property owners from building seawalls.

Learn more about this dataset

Hawaii’s beaches are owned by the public, and the government is required to preserve them. So years ago, officials adopted a “no tolerance” policy toward new seawalls, which scientists say are the primary cause of coastal erosion. But over the past two decades, oceanfront property owners across the state have used an array of loopholes in state and county laws to get around that policy, armoring their own properties at the expense of the environment and public shoreline access.

This dataset contains information about properties that received exemptions from these environmental laws (and were allowed to keep existing shoreline structures or build new ones) between 2000 and 2020, including the type of exemption, the location of the site, the dates of the exemptions, and in some cases the fees paid. ProPublica and the Honolulu Star-Advertiser used the data to create this interactive map.

The data, which covers more than 230 exemptions, was compiled from public records requests filed with Hawaii’s Department of Land and Natural Resources and the City and County of Honolulu’s Department of Planning and Permitting. The records include state approvals for seawall easements and emergency sandbags, as well as county approvals for new or illegally constructed seawalls.

Records on seawall easements were compiled from individual paper files archived at the DLNR, as well as annual government reports filed with the Hawaii Legislature. The data for emergency permits was derived from paper files at the DLNR. The City and County of Honolulu, as well as the counties of Maui and Kauai, provided files for shoreline setback variances requested by private property owners seeking approvals for shoreline hardening structures.

A handful of properties with a known exemption in the past two decades could not be linked up to an address from source documents and were not marked on the map. They are not included in this download.

Get the data:

Home Price Impact of Tax Cuts and Jobs Act of 2017

Source
Moody's Analytics
Date released
October 2019
Dates covered
Based on March 2019 data
Rows
3088
Methodology
How the estimated reductions were calculated
Topics
  • Business
  • Finance
Featured use
Trump’s Trillion-Dollar Hit to Homeowners
Terms of Use
Standard terms of use

A list of the estimated reduction in house values in about 3,000 counties throughout the United States as a result of 2017 policy changes.

Learn more about this dataset

This dataset is a county-by-county list of the estimated reduction in house values in about 3,000 counties throughout the country, as calculated by Mark Zandi, the chief economist of Moody’s Analytics. ProPublica used this data in "Trump’s Trillion-Dollar Hit to Homeowners," which highlighted its findings and identified the five counties with the largest estimated reductions: Essex County, New Jersey; Westchester County, New York, suburban New York City; Union County, New Jersey, which is adjacent to Essex County; New York County, the New York City borough of Manhattan; and Lake County, Illinois, suburban Chicago.

This dataset includes two columns — the county and the estimated percent reduction. To calculate the estimated reduction, Zandi took what financial analysts call the present value of the property tax and mortgage interest deductions that homeowners will lose over seven years (the average duration of a mortgage) because of changes in the tax law and subtracted it from the value of the typical house. That calculates the reduction in each county’s home values below what they would otherwise be. Zandi then adds an additional one percentage point of value shrinkage, which comes from the higher interest rates that he says will result from the higher federal budget deficits caused by the tax bill. He estimates that rates on 10-year Treasury notes, a key benchmark for mortgage rates, will be 0.2% higher than they would otherwise be, which in turn will make mortgage rates 0.2% higher.
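The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation as described on this page: the discount rate, the dollar figures and the function names are assumptions, and Zandi's actual inputs and model are not given here.

```python
def present_value(annual_loss, years, discount_rate):
    """Present value of a constant annual loss felt at the end of each year."""
    return sum(annual_loss / (1 + discount_rate) ** t
               for t in range(1, years + 1))

def estimated_reduction_pct(house_value, annual_lost_deduction_value,
                            years=7, discount_rate=0.04, rate_effect_pts=1.0):
    """Estimated percent reduction in a typical house's value.

    years=7 and rate_effect_pts=1.0 follow the description above;
    discount_rate is a hypothetical placeholder.
    """
    pv = present_value(annual_lost_deduction_value, years, discount_rate)
    # Lost deductions as a share of the house value, plus the
    # one-percentage-point effect attributed to higher interest rates.
    return pv / house_value * 100 + rate_effect_pts

# Invented example: a $400,000 house whose owner loses $2,000 a year in tax benefits.
print(round(estimated_reduction_pct(400_000, 2_000), 2))
```

With a discount rate of zero the present value is simply seven times the annual loss, which makes a convenient sanity check.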

Get the data:

Hospital Bed Capacity and COVID-19

Source
Harvard Global Data Institute
Date released
March 2020
Dates covered
2018
Rows
306
Methodology
How ProPublica Mapped Hospital Capacity for Coronavirus
Topic
Health
Featured uses
Terms of Use
Standard terms of use

A dataset of hospital bed capacity data for each of 306 U.S. hospital markets, including data for nine different models of COVID-19 infection scenarios.

Learn more about this dataset

A dataset of hospital bed capacity data for each of 306 U.S. hospital markets, including data for nine different models of COVID-19 infection scenarios. The data comes from a team of researchers at the Harvard Global Data Institute. They modeled various scenarios, in which 20%, 40% and 60% of the adult population would be infected with the novel coronavirus, many of whom would have no or few symptoms, and examined whether hospitals had the capacity to handle them if the cases came in over six months, 12 months and 18 months. Hospital bed figures were derived from recent surveys conducted by the American Hospital Association and data compiled by the American Hospital Directory. The data is divided into slightly more than 300 regions, also known as hospital referral regions.

Get the data:

House Office Expenditure Data

Source
U.S. House of Representatives Statement of Disbursements
Date released
July 2018
Dates covered
July 2009 - March 2018
Topic
Politics
Featured uses
Terms of Use
Standard terms of use

Data on official spending by the House of Representatives, including lawmakers’ offices, committee offices and administrative offices.

Learn more about this dataset

Members of the House of Representatives get an annual budget for their Washington and district offices, but how they spend it is up to them. Lawmakers are required to report the recipients of their office spending; this data details the official spending done by the House of Representatives, including lawmakers’ offices, committee offices and administrative offices.

Each quarter we release two CSV files: a summary file listing the office and total amount in one of a number of broad categories and a detail file listing individual recipients and amounts. The data is updated four times a year. You can download individual quarterly data on Represent.

To download all available data from July 2009 through March 2018, complete the form on this page.


Data dictionary

Summary files

BIOGUIDE_ID – the official ID of members of the House
OFFICE – the name of the House office
YEAR – the calendar year
QUARTER – the quarter of the year
CATEGORY – broad description of spending
YTD – year to date amount spent by office in that category
AMOUNT – amount spent by office in that category in quarter

Detail files

Has BIOGUIDE, OFFICE, QUARTER, YEAR, CATEGORY, AMOUNT, plus:

PAYEE – name of recipient
PURPOSE – specific purpose of spending
DATE – date of payment (optional)
START DATE – beginning of period which payment covers
END DATE – end of period which payment covers
TRANSCODE – House transaction code
TRANSCODELONG – description of House transaction code
RECORDID – House record number
RECIP (orig.) – original (non-standardized) recipient
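To show how the detail columns might be used, here is a minimal sketch that totals AMOUNT by office, year, quarter and category with Python's standard library. The sample rows, the payee names and the `Q1` quarter format are invented. It is also an assumption, not something stated above, that totals computed this way would match the AMOUNT column in the summary files.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Made-up detail rows using column names from the data dictionary above.
DETAIL_CSV = """OFFICE,QUARTER,YEAR,CATEGORY,PAYEE,AMOUNT
HON. EXAMPLE MEMBER,Q1,2018,TRAVEL,EXAMPLE AIRLINE,"1,250.00"
HON. EXAMPLE MEMBER,Q1,2018,TRAVEL,EXAMPLE HOTEL,300.50
HON. EXAMPLE MEMBER,Q1,2018,SUPPLIES AND MATERIALS,EXAMPLE STORE,49.99
"""

def summarize(detail_file):
    """Sum AMOUNT for each (OFFICE, YEAR, QUARTER, CATEGORY) combination."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(detail_file):
        key = (row["OFFICE"], row["YEAR"], row["QUARTER"], row["CATEGORY"])
        # Strip thousands separators in case amounts are formatted as text.
        totals[key] += Decimal(row["AMOUNT"].replace(",", ""))
    return dict(totals)

totals = summarize(io.StringIO(DETAIL_CSV))
for key, amount in sorted(totals.items()):
    print(*key, amount, sep=" | ")
```

`Decimal` is used instead of `float` so that dollar amounts add up without binary rounding error.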

Get the data:

Immigration and Customs Enforcement Arrest Data (2013-2017)

Source
Immigration and Customs Enforcement
Date released
April 2018
Dates covered
2013-2017
Rows
Varies
Topics
  • Criminal Justice
  • Politics
Featured use
In Pennsylvania, It’s Open Season on Undocumented Immigrants
Terms of Use
Standard terms of use

This data set provides summary statistics, broken down by region, on the number and type of administrative arrests made by Immigration and Customs Enforcement.

Learn more about this dataset

This download includes summary statistics on the number of administrative arrests of criminal and non-criminal individuals made by Immigration and Customs Enforcement (ICE) Enforcement and Removal Operations (ERO) by region (Area of Responsibility). The download includes three files with data on:

  • ERO At-Large Administrative Arrests for fiscal years January 1, 2013 - July 15, 2017.
  • ERO At-Large Administrative Arrests (July-November 2017)
  • ERO Administrative and At-Large Administrative Arrests (October 2017-December 2017)

Get the data:

Interim COVID-19 Vaccine Distribution Plans

Source
Various health departments
Date released
November 2020
Dates covered
Plans released between 10/16/20-11/10/20
Topic
Health
Featured use
Most States Aren’t Ready to Distribute the Leading COVID-19 Vaccine
Terms of Use
Standard terms of use

A combined download of all of the draft plans for distribution of a COVID-19 vaccine released by states, territories and metro areas in response to a CDC request.

Learn more about this dataset

The Centers for Disease Control and Prevention required 64 jurisdictions, including all 50 states, eight territories and six metropolitan areas, to submit plans for how they would distribute a COVID-19 vaccine. The first drafts of these plans were due by Oct. 16, 2020, and many states have posted them online. ProPublica has gathered available draft plans together in this repository. States will likely continue to update their plans beyond the versions available here. You can read about these plans and how states have prepared for a potential vaccine here. Last updated Nov. 10, 2020.

Get the data:

IRS Audit Rates by County

Source
Kim M. Bloomquist, IRS website
Date released
April 2019
Dates covered
2012-2015
Rows
3141
Methodology
GitHub Repo for IRS Audit Rates by County
Topic
Finance
Featured uses
Terms of Use
Standard terms of use

This data set contains the total number of income tax filings and the estimated number of audits per county, for the combined tax years 2012-15.

Learn more about this dataset

The earned income tax credit, or EITC, is a program designed to help boost low-income workers out of poverty. In response to pressure from congressional Republicans to root out incorrect payments of the credit, the IRS audits EITC recipients at higher rates than all but the richest Americans.

Kim M. Bloomquist, who served as a senior economist with the IRS’ research division for two decades, decided to map the distribution of audits to illustrate the dramatic regional effects of the agency’s emphasis on EITC recipients. In a study first published in Tax Notes, he found that because more than a third of all audits are of EITC recipients, the number of audits in each county is largely a reflection of how many taxpayers there claimed the credit.

The included data covers the total number of income tax filings and the estimated number of audits per county, for the combined tax years 2012-15.
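One natural use of these two figures is an audit rate per 1,000 filings for each county. The sketch below shows that calculation; the function name and the example numbers are invented, and because both inputs cover the combined 2012-15 tax years, the result is a rate for the whole period rather than a single year.

```python
def audits_per_thousand(filings, estimated_audits):
    """Estimated audits per 1,000 income tax filings over the same period."""
    if filings <= 0:
        raise ValueError("filings must be positive")
    return 1000 * estimated_audits / filings

# Invented figures for a hypothetical county.
rate = audits_per_thousand(filings=40_000, estimated_audits=300)
print(f"{rate:.1f} audits per 1,000 filings")
```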

The data is also available, along with our analysis scripts, on GitHub.

Get the data:

LEADS Gang File Summary Data

Source
Illinois State Police
Date released
July 2018
Dates covered
1993 to 2018
Rows
177
Topic
Criminal Justice
Featured use
Like Chicago Police, Cook County and Illinois Officials Track Thousands of People in Gang Databases
Terms of Use
Standard terms of use

Counts of individuals added into the LEADS (Law Enforcement Agencies Data System) Gang File by Illinois state and local police in each year since 1993.

Learn more about this dataset

This dataset provides counts of the number of people entered into the LEADS (Law Enforcement Agencies Data System) Gang File by Illinois state and local police in each year since 1993. Race and gender totals are also included for each year.

Get the data:

Los Angeles County Sheriff’s Deputy Contacts in Lancaster

Source
Los Angeles County Sheriff’s Department
Date released
February 2021
Dates covered
January 2019 to December 2019
Rows
3,854
Methodology
About the Data
Topics
  • Criminal Justice
  • Education
Provided in collaboration with
KPCC/LAist
Featured use
In a California Desert, Sheriff’s Deputies Settle Schoolyard Disputes. Black Teens Bear the Brunt.
Terms of Use
Standard terms of use

A dataset of contacts with Los Angeles County sheriff’s deputies in Lancaster, Calif., during the 2019 calendar year.

Learn more about this dataset

This data covers contacts with Los Angeles County sheriff’s deputies in Lancaster, California, during the 2019 calendar year.

ProPublica’s analysis found that of the contacts taking place in Lancaster that were listed as having “reasonable suspicion that the person was engaged in criminal activity” as the basis for the contact, a large number took place at Antelope Valley Unified High School District campuses. We also found that the contacts disproportionately involved Black teens.

We obtained raw data describing Los Angeles County Sheriff’s deputies’ contacts from the County of Los Angeles’ open data portal. The data describes all incidents where at least one person was detained or arrested, as well as details about each person involved in these contacts. To assess where contacts with deputies were taking place, ProPublica cleaned and geocoded the addresses reported for each contact. Contacts at schools were established based on the geocoded addresses and campus footprints obtained using the Python library OSMnx. A field in the data indicating whether the contact took place at a K-12 school was not reliably populated, so we have added two columns specifying whether a contact took place at an AVUHSD school campus and, if so, the name of the campus.
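Matching a geocoded contact to a campus footprint is, at its core, a point-in-polygon test. The sketch below shows one standard way to do that (ray casting) in plain Python. It is not ProPublica's code, and the campus name and coordinates are invented; a real analysis would more likely rely on a geospatial library and actual footprints.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside `polygon`.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular footprint; not a real campus boundary.
CAMPUSES = {
    "Example High School": [
        (-118.15, 34.69), (-118.14, 34.69), (-118.14, 34.70), (-118.15, 34.70),
    ],
}

def campus_for(lon, lat, campuses=CAMPUSES):
    """Return the name of the campus containing the point, or None."""
    for name, footprint in campuses.items():
        if point_in_polygon(lon, lat, footprint):
            return name
    return None

print(campus_for(-118.145, 34.695))
print(campus_for(-118.20, 34.695))
```

The two return values of `campus_for` correspond to the two added columns described above: whether the contact was on a campus and, if so, which one.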

The “About the Data” section of the article provides more detail on our analysis.

Get the data:

Medicare Part B Provider Utilization and Payment Data

Source
Centers for Medicare & Medicaid Services
Date released
May 2016
Dates covered
2012-2014
Topic
Health
Featured use
Treatment Tracker
Terms of Use
Standard terms of use

This data documents Medicare’s Part B program and the individual doctors and other health professionals serving more than 33 million seniors and people with disabilities.

Learn more about this dataset

This data documents Medicare’s Part B program and the individual doctors and other health professionals serving more than 33 million seniors and people with disabilities. The data includes all services that doctors performed for Part B patients 11 or more times that year. ProPublica uses this data to create the Treatment Tracker app and stories.

Get the data:

Medicare Part D Hepatitis C Prescribing Data (2014)

Source
Centers for Medicare & Medicaid Services
Date released
March 2015
Dates covered
2013-2014
Rows
6805
Topic
Health
Featured use
The Cost of a Cure: Medicare Spent $4.5 Billion on New Hepatitis C Drugs Last Year
Terms of Use
Standard terms of use

Medicare Part D prescription data for Hepatitis C drugs in 2014.

Learn more about this dataset

Medicare Part D prescription data for Hepatitis C drugs in 2014. This data set shows the 15-fold increase in Medicare’s spending on drugs to treat hepatitis C. The drugs cure the disease, but taxpayers are footing the bill.

Get the data:

Medicare Part D Prescribing Data (2010)

Source
Centers for Medicare & Medicaid Services
Dates covered
2010
Rows
20758453
Topic
Health
Terms of Use
Standard terms of use

A detailed data set of prescriptions written by providers under the Medicare Part D program, including all drugs prescribed to Part D patients 11 or more times during 2010.

Learn more about this dataset

This is the Medicare Part D prescription data for 2010. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Get the data:

Medicare Part D Prescribing Data (2011)

Source
Centers for Medicare & Medicaid Services
Dates covered
2011
Rows
21150242
Topic
Health
Terms of Use
Standard terms of use

A detailed data set of prescriptions written by providers under the Medicare Part D program, including all drugs prescribed to Part D patients 11 or more times during 2011.

Learn more about this dataset

This is the Medicare Part D prescription data for 2011. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Get the data:

Medicare Part D Prescribing Data (2012)

Source
Centers for Medicare & Medicaid Services
Dates covered
2012
Rows
21970751
Topic
Health
Terms of Use
Standard terms of use

A detailed data set of prescriptions written by providers under the Medicare Part D program, including all drugs prescribed to Part D patients 11 or more times during 2012.

Learn more about this dataset

This is the Medicare Part D prescription data for 2012. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Get the data:

Medicare Part D Prescribing Data (2013)

Source
Centers for Medicare & Medicaid Services
Dates covered
2013
Rows
21970751
Topic
Health
Terms of Use
Standard terms of use

This is the Medicare Part D prescription data for 2013. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older.

Learn more about this dataset

This is the Medicare Part D prescription data for 2013. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older.

Get the data:

Medicare Part D Prescribing Data (2016)

Source
Centers for Medicare & Medicaid Services
Date released
November 2018
Dates covered
2016
Topic
Health
Terms of Use
Standard terms of use

This is the Medicare Part D prescription data for 2016. The data includes all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older.

Learn more about this dataset

This is the Medicare Part D prescription data for 2016. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older.

Get the data:

Medicare Part D Prescribing Data, Patients 65 or Older (2011)

Source
Centers for Medicare & Medicaid Services
Dates covered
2011
Rows
16366282
Topic
Health
Terms of Use
Standard terms of use

A detailed data set of Medicare Part D prescriptions written only for patients 65 or older in 2011. The data include all drugs prescribed by doctors 11 or more times to these patients in 2011. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Learn more about this dataset

This is a subset of the Medicare Part D prescription data for 2011. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients who were 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Get the data:

Medicare Part D Prescribing Data, Patients 65 or Older (2012)

Source
Centers for Medicare & Medicaid Services
Dates covered
2012
Rows
16966011
Topic
Health
Terms of Use
Standard terms of use

A detailed data set of Medicare Part D prescriptions written only for patients 65 or older in 2012. The data include all drugs prescribed by doctors 11 or more times to these patients in 2012. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Learn more about this dataset

This is a subset of the Medicare Part D prescription data for 2012. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients who were 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier.

Get the data:

New Jersey Public Sector Contracts

Source
New Jersey Public Employment Relations Commission
Date released
July 2020
Rows
6,366
Topics
  • Criminal Justice
  • Finance
Featured use
New Jersey Public Employment Relations Commission
Terms of Use
Standard terms of use

This data set includes all public sector contracts filed with the New Jersey Public Employment Relations Commission (PERC).

Learn more about this dataset

This data set includes all public sector contracts (including police union contracts) filed with the New Jersey Public Employment Relations Commission (PERC). State law provides that public employers shall “file with the commission a copy of any contracts negotiated with public employee representatives following consummation of negotiations.” This requirement applies to all public sector employers in New Jersey, including police and firefighting departments, school districts, etc. As of July 2020, there were 6,366 contracts in this data set. It is updated regularly.

Some of the contracts in this data set, available here, were used in reporting on police contracts by the Asbury Park Press and ProPublica.

Get the data:

New Mexico School Discipline

Source
New Mexico Public Education Department
Date released
February 2023
Dates covered
2010-2022
Rows
285,917 discipline records
Methodology
How We Found the School District Responsible for Much of New Mexico’s Outsized Discipline of Native Students
Topics
  • Criminal Justice
  • Education
Provided in collaboration with
New Mexico In Depth
Featured use
This School District Is Ground Zero for Harsh Discipline of Native Students in New Mexico
Terms of Use
Standard terms of use

This data includes all disciplinary incidents reported by school districts in New Mexico to the state’s Public Education Department, as well as district-level enrollment figures.

Learn more about this dataset

This includes two types of data: discipline and enrollment.


Discipline Data

This data includes all disciplinary incidents reported by school districts in New Mexico to the state’s Public Education Department. In working with this data, we found it to have several limitations, which we note here. We also recommend reading the methodology post that was published along with the story.

The data was extracted from the state’s public schools database, called the Student Teacher Accountability Reporting System, and covers the 2010-11 to 2021-22 school years. The data was acquired in June 2022 through a public records request made by New Mexico In Depth and ProPublica.

At the time the data was received, reporting for the 2021-22 school year was not yet finalized, so the disciplinary data for that year is incomplete in this file. Data for the 2020-21 school year is complete, but it shows far fewer disciplinary actions than previous school years because of school closures during the COVID-19 pandemic.

ProPublica and New Mexico In Depth used this data to identify disproportionate rates of expulsion and referrals to law enforcement among Native American students in New Mexico. One district, Gallup-McKinley County Schools, played an outsized role in this disparity. The analysis focused on the 2016-17 to 2019-20 school years.

The Gallup-McKinley County Schools superintendent disputed our findings. He asserted that a much smaller number of students had been expelled, although that is contradicted by the district's reports to the state and to the U.S. Department of Education’s Office for Civil Rights.

The data is self-reported by districts, and the state told ProPublica and New Mexico In Depth that it validates the data. However, over the four-year period the news organizations found roughly 20 cases in which a school district, including Gallup-McKinley, recorded few or no disciplinary incidents for the first several months of a school year, despite reporting significant numbers in the rest of the year.

In addition, many of the disciplinary records involving pre-kindergarten students appear to be errors. The students in those records are mostly listed as Pacific Islander or Native Hawaiian, although very few such students are enrolled in the state. Our analysis excluded pre-kindergarten incidents.

Each record in the database represents a disciplinary action against a student. During the time period we analyzed, if a student faced multiple types of discipline for a single incident (e.g., both arrested and suspended in response to a fight), schools were instructed to record only the most severe punishment, according to STARS manuals from this period. (Starting in the 2022-23 school year, which is not covered in this database, the STARS manual indicates that the state has changed how multiple types of discipline are recorded for a single incident.)

If the same student was disciplined for multiple incidents over the course of a school year, those incidents appear as separate records in the database. There are no unique student identifiers in the data, so it cannot be used to calculate how many students were disciplined.

This download contains copies of the STARS manuals for the 2017-18 and 2019-20 through 2021-22 school years. Each manual includes a data dictionary for the discipline data. Other STARS documentation can be found on PED’s website.

Enrollment Data

The download includes a spreadsheet with enrollment data for the 2010-11 through 2021-22 school years, which includes breakdowns by racial group and counts of special education students and English-language learners.

Prior to the 2019-20 school year, the enrollment data survey date was at the end of the school year (June 30). For 2019-20 and later, the survey date is early in the fall semester (Oct. 1), and enrollments of five or fewer are masked.

The enrollment spreadsheet was compiled using STARS data received from a public records request and PED’s website. More recent enrollment figures and breakdowns by grade can be found on the site.

Get the data:

New York State Subsidy Programs

Source
Multiple New York State Agencies
Dates covered
2011-2014
Rows
Varies
Topic
Business
Featured uses
Terms of Use
Standard terms of use

Information on 12 major New York State subsidy programs, as received by ProPublica under Freedom of Information Law requests.

Learn more about this dataset

As part of our research into 12 of New York State's major subsidy programs, ProPublica, Investigative Post and the Columbia University Graduate School requested information from several state agencies under the state's Freedom of Information Law.

The agencies include Empire State Development, the state’s economic development arm, which provided data on seven programs: Commercial Tax Credits, Film Tax Credits, the Economic Development Fund, JOBSNow, Regional Economic Development Councils, StartUp New York and Excelsior; the New York Power Authority, which provided data on two additional subsidy programs; and the state’s Taxation and Finance Department, which provided data on the Brownfield Cleanup Program. This download includes the raw data as provided by the agencies. In some cases, it contains additional information about these programs that was not included in our online database.

You can read more about how we used this data in the methodology.

Get the data:

Northern Illinois Federal Gun Cases

Date released
October 2017
Dates covered
January 2007 - October 2017
Topic
Criminal Justice
Terms of Use
Standard terms of use

Bulk data about federal gun cases in the Illinois Northern District between January 2007 and October 2017.

Learn more about this dataset

ProPublica Illinois used this bulk data about federal gun cases as part of its reporting about gun trafficking in Chicago and northern Illinois. This data set includes information about federal gun cases under statutes dealing with unlicensed firearm dealing (18:922A.F), unlawful sale of a firearm (18:922C.F), unlawful shipping of a firearm (18:922E.F), possession of a firearm by a felon (18:922G.F), making a false statement in a gun purchase (18:924A.F), and use of a gun in drug trafficking (18:924C.F) in the Illinois Northern District from January 1, 2007 to October 4, 2017. Documentation and a data dictionary are available from PACER (pdf).

Get the data:

Nursing Home Compare Data

Source
Centers for Medicare & Medicaid Services
Topic
Health
Featured uses
Terms of Use
Standard terms of use

Use this dataset to compare nursing homes in a state based on the deficiencies cited by regulators and the penalties imposed in the past three years.

Learn more about this dataset

This is the Centers for Medicare and Medicaid Services' Nursing Home Inspection data, including general information about nursing homes, health deficiencies, and penalties, updated monthly. ProPublica used this data in our Nursing Home Inspect tool. The data sets we used in particular are Health Deficiencies, Penalties, and Provider Info.

Get the data:

Nursing Home Deficiencies Data

Source
Centers for Medicare & Medicaid Services
Date released
August 2016
Topic
Health
Featured uses
Terms of Use
Standard terms of use

The Centers for Medicare and Medicaid Services makes publicly available the full-text statements of nursing home deficiencies.

Learn more about this dataset

The Centers for Medicare and Medicaid Services makes publicly available the full-text statements of nursing home deficiencies. The linked data set is a zip file that contains 10 Excel files, one for each region of the country. ProPublica has been using this data to power the Nursing Home Inspect tool since August 2012. This data also supported ProPublica reporting that found that the government fails to ensure consistent penalties for nursing homes in different states.

Get the data:

Open Payments Data (2016)

Source
Centers for Medicare & Medicaid Services
Date released
July 2018
Dates covered
2016
Topic
Health
Terms of Use
Standard terms of use

The data set is a raw data download of the January 2018 release of the 2016 Open Payments data set.

Learn more about this dataset

Open Payments is a federal program, required by the Affordable Care Act, that collects information about the payments drug and device companies make to physicians and teaching hospitals for things like travel, research, gifts, speaking fees, and meals. It also includes ownership interests that physicians or their immediate family members have in these companies. New data is released annually.

Get the data:

Open Payments Raw Data Download

Source
Centers for Medicare & Medicaid Services
Date released
January 2015
Topic
Health
Terms of Use
Standard terms of use

Open Payments is a federal program, required by the Affordable Care Act, that collects information about the payments drug and device companies make to physicians and teaching hospitals.

Learn more about this dataset

Open Payments is a federal program, required by the Affordable Care Act, that collects information about the payments drug and device companies make to physicians and teaching hospitals for things like travel, research, gifts, speaking fees, and meals. It also includes ownership interests that physicians or their immediate family members have in these companies.

Get the data:

PAC Donor Similarity Scores

Source
Federal Election Commission; ProPublica analysis
Date released
January 2016
Dates covered
2014
Rows
435
Topic
Politics
Featured use
Campaign Donations Reflect the Sharp Split in Congress Among Republicans
Terms of Use
Standard terms of use

A data set of cosine similarity scores comparing the 2014 PAC donors of Paul Ryan, Speaker of the House of Representatives at the time of release, with those of other House members during the 113th Congress (2013-2014).

Learn more about this dataset

A GitHub repository that contains cosine similarity scores for PAC donors to congressional recipients. ProPublica’s analysis used a calculation called cosine similarity to compare each House member's donors to others; two members with an identical set of donors would receive a score of 1, while two with no PAC donors in common would get a score of 0.

The GitHub repository has cosine similarity scores comparing the 2014 PAC donors of Paul Ryan, Speaker of the House of Representatives at the time of release, with those of other House members during the 113th Congress (2013-2014).
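
As a rough illustration of the scoring described above, the sketch below computes cosine similarity for two sets of PAC donors. It treats each member's donors as a binary vector (a donor either gave or did not), which is an assumption: the source does not say whether ProPublica weighted donors by contribution amount. The function name and the sample PAC names are hypothetical.

```python
import math


def donor_cosine_similarity(donors_a, donors_b):
    """Cosine similarity between two donor sets, treated as binary vectors."""
    a, b = set(donors_a), set(donors_b)
    if not a or not b:
        # A member with no PAC donors has nothing in common with anyone.
        return 0.0
    # For 0/1 vectors the dot product is the number of shared donors,
    # and each vector's length is the square root of its donor count.
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical donor lists, for illustration only.
member_one = {"PAC A", "PAC B", "PAC C"}
member_two = {"PAC B", "PAC C", "PAC D"}

print(donor_cosine_similarity(member_one, member_one))  # identical donor sets
print(donor_cosine_similarity(member_one, member_two))  # partial overlap
```

With this binary formulation, identical donor sets score 1 and sets with no donors in common score 0, matching the endpoints given in the description.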

Get the data:

Pipeline Safety Data

Source
Pipeline & Hazardous Materials Safety Administration
Date released
November 2012
Dates covered
1986-Present
Topics
  • Business
  • Transportation
Featured uses
Terms of Use
Standard terms of use

The U.S. Pipeline & Hazardous Materials Safety Administration documents incidents affecting more than 2.5 million miles of oil and gas pipelines each year.

Learn more about this dataset

The U.S. Pipeline & Hazardous Materials Safety Administration documents incidents affecting more than 2.5 million miles of oil and gas pipelines each year.

Get the data:

Political Advertisements from Facebook

Source
ProPublica, Facebook Users
Date released
July 2020
Dates covered
August 2018 - July 2020
Topics
  • Politics
  • Business
Terms of Use
Standard terms of use

This database contains political ads that ran on Facebook and were submitted by thousands of users from around the world.

Learn more about this dataset

Note: This database was last updated in 2020. It should only be used as a historical snapshot. Researchers can access data about political ads on Facebook from the Ad Observer project by the NYU Cybersecurity for Democracy program.

This database contains ads that ran on Facebook and were submitted by thousands of users from around the world. ProPublica, the Globe and Mail, and Quartz asked readers to install browser extensions that automatically collected advertisements on their Facebook pages and sent them to our servers. We then used a machine learning classifier to identify which ads were likely political and included them in this dataset. The included fields are:

  • id: post ID number on Facebook
  • html: HTML of the ad as collected by the Political Ad Collector
  • political: number of Political Ad Collector users who have voted that the ad is political
  • not_political: number of Political Ad Collector users who have voted that the ad is not political
  • title: ad title
  • message: ad content
  • thumbnail: link for a thumbnail of the profile image (of the advertiser)
  • created_at: date ad was first collected by the Political Ad Collector
  • updated_at: the most recent time that it got an impression OR the most recent time it was voted on
  • lang: language of the ad. always en-US.
  • images: link for images included in the ad
  • impressions: number of times the ad has been seen by the Political Ad Collector
  • political_probability: calculated by the classifier. data only includes ads with a probability >=0.7
  • targeting: Facebook’s “Why am I seeing this?” disclosure provided to Political Ad Collector users
  • suppressed: value is false. suppressed ads are excluded from this data set because they were misclassified.
  • targets: a parsed version of targeting
  • advertiser: the account that posted the ad
  • entities: named entities mentioned in the ad, extracted using software
  • page: the page that posted the ad
  • lower_page: the Facebook URL of the advertiser that posted the ad (the “page” column, lowercased)
  • targetings: an array of one or more of Facebook’s “Why am I seeing this?” disclosures provided to Political Ad Collector users
  • paid_for_by: for political ads, the entity listed in Facebook’s required disclosure as having paid for the ad
  • targetedness: an internal metric for estimating how granularly an ad is targeted, used for sorting in the ProPublica search interface
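
To show how a few of these fields might be used together, here is a minimal sketch that reads rows with Python's `csv` module and computes the share of user votes that called each ad political. It assumes the download is a CSV whose column names match the field list above, which the source does not confirm. The sample rows and the `vote_share` and `load_ads` helpers are hypothetical.

```python
import csv
import io

# Hypothetical sample rows using column names from the field list above.
SAMPLE = """id,title,political,not_political,political_probability
1,Example ad one,8,2,0.95
2,Example ad two,0,0,0.71
"""


def vote_share(row):
    """Fraction of user votes that marked the ad political, or None if no votes."""
    yes = int(row["political"])
    no = int(row["not_political"])
    total = yes + no
    return yes / total if total else None


def load_ads(handle, min_probability=0.7):
    """Yield rows whose classifier score meets the threshold."""
    for row in csv.DictReader(handle):
        if float(row["political_probability"]) >= min_probability:
            yield row


for ad in load_ads(io.StringIO(SAMPLE)):
    print(ad["id"], ad["title"], vote_share(ad))
```

The default threshold of 0.7 mirrors the cutoff stated for `political_probability`; raising it would narrow the set to ads the classifier scored more confidently.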

Get the data:

Politicians tracked by Politwoops

Source
Twitter, ProPublica
Date released
July 2019
Rows
varies
Topic
Politics
Featured use
Politwoops
Terms of Use
Standard terms of use

A downloadable listing of all of the politicians whose Twitter accounts are tracked by Politwoops. Updates daily.

Learn more about this dataset

This dataset is a complete listing of the politicians whose Twitter accounts are currently tracked by Politwoops, a database of deleted tweets maintained by ProPublica. This file includes both campaign and official accounts for federal officeholders and candidates, along with governors and gubernatorial candidates. ProPublica has added columns with information on gender, Federal Election Commission candidate ID, branch of government, congressional district and the unique identifier given to all members of the House and Senate, where applicable. It does not include any deleted tweets. Updates daily.

Get the data:

Preferential Rents in New York City

Source
New York City Rent Guidelines Board, New York State Division of Housing and Community Renewal
Date released
June 2017
Dates covered
2003-2016
Topic
Business
Featured uses
Terms of Use
Standard terms of use

Data on where New York City landlords have taken advantage of a 2003 law enabling them to raise rents by more than the annual limits if they registered a high rent — often high above existing market rates — but charged tenants a lower, "preferential" rent.

Learn more about this dataset

In 2003, lawmakers in New York State passed a law that in effect allowed landlords to bypass annual limits on rent increases for their rent-stabilized apartments. Owners could raise rents by more than the annual limits if they registered a high rent, often high above existing market rates, but charged tenants a lower, “preferential” rent. Preferential rents are not regulated and can be raised up to the registered rate upon lease renewal. Today, more than 250,000 New York City apartments feature these rents.

We used this data to make an interactive map exploring the issue. The New York City Rent Guidelines Board received ZIP code level data on preferential rents as of 2016 from the New York State Division of Housing and Community Renewal and added information on the number of occupied rent-stabilized units in each ZIP. We extracted this data from the memo (also included) and produced a spreadsheet from it.

The spreadsheet includes all ZIP codes in NYC with preferential rents, along with the count of preferential rents, the number of occupied rent-stabilized units, and the percent of occupied rent-stabilized units with preferential rents. It also includes the total Major Capital Improvement costs allowed for 2016. Those costs cover all apartments regardless of rent level, and the Rent Guidelines Board warns that they are not correlated with preferential rents.

Get the data:

ProPublica's Afghan Waste Data

Source
Special Inspector General for Afghanistan Reconstruction, ProPublica
Date released
December 2015
Dates covered
2009-2015
Rows
132
Topic
Military
Featured use
We Blew $17 Billion in Afghanistan. How Would You Have Spent It?
Terms of Use
Standard terms of use

A unique data set, compiled by ProPublica reporters, that summarizes $17 billion in wasteful spending in Afghanistan.

Learn more about this dataset

ProPublica reviewed 235 SIGAR financial audits, special projects, program audits and inspection reports to compile this data set of $17 billion in wasteful spending. Financial audits were excluded from the final data. Of the other reports, only those that had specific monetary figures were used.

ProPublica also asked the military, the State Department, USAID and the United States Army Corps of Engineers for updates to some projects. As a result, some SIGAR reports were removed from the data, such as a teacher training facility that SIGAR had found was poorly built but had been fixed. Some figures were added that came from the government, such as the cost to fix buildings constructed with hazardous materials. There are 77 of these entries.

Based on SIGAR’s conclusions, we identified three main categories for 55 of the projects: waste, unsustainable, and at-risk.

  • Waste: A program, policy, purchase or building that has not fulfilled its purpose or achieved its goals, has involved misspending, or involved spending required as a result of poor decision making.
  • At Risk (On the Brink): A program, policy, purchase or building that is in danger of becoming waste. Possible reasons: the project is currently unused, won’t be used as intended, is underused or is vulnerable to theft or corruption.
  • Unsustainable (Budget Busters): A program, policy, purchase or building that is beyond the means, capabilities or desires of the Afghan government to operate, maintain or use.

There were 22 additional projects that were on the borderline of those three categories, and could be deemed wasteful from the point of view of the taxpayer. For example, a project might have fallen short of its goals but still be in use by Afghans, or it might have had mixed results. These were categorized as “You Be the Judge.”

ProPublica consulted six experts drawn from academia and government who were well versed in reconstruction in Afghanistan. Their suggestions and criticisms were incorporated into the final data.

Get the data:

Restraint and Seclusion Data

Source
Office for Civil Rights, U.S. Department of Education
Date released
June 2014
Dates covered
2011-2012
Rows
95,635
Topic
Education
Featured uses
Terms of Use
Standard terms of use

This data contains all instances of restraints and seclusions that public schools self-reported during the 2011-2012 school year. It is broken down by state, district, and school.

Learn more about this dataset

This data contains all instances of restraints and seclusions that public schools self-reported during the 2011-2012 school year. It is broken down by state, district, and school. This is the first time the federal government has attempted to collect this data from all schools, though beware: many school districts did not report. ProPublica used this data in our story on the use of restraints at school. Read our reporting recipe, below, for tips on how you can report this story.

Get the data:

Road Home Rebuilding Grants

Source
Louisiana Division of Administration
Date released
February 2023
Dates covered
2007 to 2022
Rows
130,054
Topic
Environment
Provided in collaboration with
The Times-Picayune | The New Orleans Advocate and WWL-TV
Featured uses
Terms of Use
Standard terms of use

Individual-level records of Road Home grants, insurance payments, damage estimates and property values for homes damaged or destroyed after hurricanes Katrina and Rita in Louisiana.

Learn more about this dataset

This data includes individual-level records for all property owners who received grants from the Road Home program, which was set up after hurricanes Katrina and Rita to cover repair and rebuilding costs that exceeded insurance payouts or FEMA aid for owner-occupied properties in Louisiana.

The data, acquired through public records requests to the Louisiana Division of Administration, includes details on the four types of grants available through the rebuilding program:

  • Compensation grants, the main type of funding available to homeowners.

  • Additional compensation grants, which were made available to lower-income homeowners.

  • Elevation grants, meant to help raise homes to prevent future flooding.

  • Individual mitigation measure grants, which could be used toward other efforts to prevent future flooding.

The data also contains state-generated estimates of the pre-storm value of each home and the cost to repair or rebuild it. Additional fields indicate whether properties were insured and how much the property owner received from insurance, FEMA or other sources.

While this data does not contain the names of property owners or addresses, geographic identifiers (using U.S. Census Bureau geographies from 2000) are available down to the census block level.

Note that there are some idiosyncratic aspects of the data contained in this table. Users are strongly encouraged to read the attached field definitions carefully as they detail these issues.

The most important caveat is that the grant fields indicate the total amount disbursed to the property owner. However, a variety of circumstances may have resulted in property owners owing some of that money to the state. These include situations in which property owners received additional money from insurance after they got their Road Home funds and cases in which checks were reissued due to fraud. To get an accurate accounting of the net amount received by each property owner, calculations must be made to adjust the grant or insurance values. Certain records may need to be excluded using flags included in the data.

Field names in the data are as provided by the Louisiana Division of Administration.

Get the data:

Salmon Testing Results

Source
ALS
Date released
December 2022
Dates covered
September 2021: Samples caught from Columbia River. Nov. 16, 2021: Samples submitted to laboratory. March 8, 2022: Samples tested and results returned to Oregon Public Broadcasting and ProPublica.
Rows
13 metals and 2 classes of chemicals
Topic
Environment
Provided in collaboration with
Oregon Public Broadcasting
Featured use
The U.S. Promised Tribes They Would Always Have Fish, but the Fish They Have Pose Toxic Risks
Terms of Use
Standard terms of use

Explore the levels of contaminants in 50 salmon from the Columbia River that were caught in September 2021.

Learn more about this dataset

ProPublica and Oregon Public Broadcasting purchased 50 salmon from Native fishermen along a stretch of the Columbia River. The majority of the fish were fall Chinook salmon, with two coho salmon and one steelhead. Ten fish of roughly the same size were placed in each of five coolers. The fish in each cooler were combined to create a total of five composite samples. The samples were reduced to skin-on fillets and tested by ALS for 13 metals and two classes of chemicals. The lab returned an analytical report, as well as spreadsheets with the testing results.

Get the data:

School Desegregation Orders Data

Source
U.S. Department of Justice; Stanford University
Date released
December 2014
Dates covered
1954-2014
Rows
Varies
Topic
Education
Featured uses
Terms of Use
Standard terms of use

Across the United States, some school districts are bound by orders to increase the racial integration of black and Latino students and improve their educational opportunities. This is a comprehensive data set of those desegregation orders from 1957 through December 2014.

Learn more about this dataset

Across the United States, some school districts are bound by orders to increase the racial integration of black and Latino students and improve their educational opportunities. This dataset collects information about all of those school desegregation orders, from 1957 through the end of 2014.

The data files include information about school desegregation orders mandated by federal courts and open school desegregation orders that resulted from voluntary agreements between school districts and the U.S. Department of Education’s Office for Civil Rights.

Get the data:

Spending at Trump Properties

Source
Federal and state agencies, ProPublica’s FEC Itemizer (Federal Election Commission)
Date released
June 2018
Dates covered
April 2015-June 2018 (varies by agency and campaign)
Topics
  • Politics
  • Business
Featured uses
Terms of Use
Standard terms of use

Details of spending at Trump properties by political campaigns and federal officials between 2015 and the present.

Learn more about this dataset

During the presidential race, Donald Trump’s campaign spent millions at his properties. When he became the presumptive nominee, Republican campaign committees followed suit. Since his inauguration, federal officials have continued the pattern, spending taxpayer money at his hotels, golf clubs and restaurants. ProPublica has collected the details of this spending since 2015* into an interactive graphic, Paying the President, and is releasing the data as a download.

Please note: Federal government spending is incomplete because many government agencies have actively fought requests to disclose spending at Trump properties. The data we have so far was released, in part, after lawsuits. We’ll continue to update this page as we receive more data.

Additionally, federal government spending data does not include all expenses from the Secret Service or from Coast Guard protection details. Some federal spending reports we received did not include transaction dates.

* Federal Election Commission data is from April 30, 2015 through May 8, 2018; federal agency expenditure data from the Department of Commerce is from Jan. 15, 2017 through April 10, 2018; data from the Department of Defense is from Jan. 20, 2017 through June 14, 2017; data from the Department of Homeland Security is from Jan. 20, 2017 through Feb. 13, 2018; data from the General Services Administration is from Jan. 20, 2017 through Nov. 20, 2017; data from the Department of State is from Jan. 20, 2017 through Aug. 2, 2017, with three expenditures with unknown transaction dates; state and local government agency expenditure data is from Jan. 20, 2017 through June 2018.

Get the data:

The Mar-a-Lago Crowd Documents

Source
Department of Veterans Affairs
Date released
August 2018
Dates covered
February 2017 to April 2018
Topics
  • Politics
  • Military
Featured use
The Shadow Rulers of the VA
Terms of Use
Standard terms of use

Documents obtained from the Department of Veterans Affairs showing how three outside advisers who often meet at Mar-a-Lago wield vast influence over the agency.

Learn more about this dataset

This download includes hundreds of pages of documents — including emails, calendars, expense reports, and other records — obtained from the Department of Veterans Affairs through the Freedom of Information Act showing how three outside advisers who often meet at Mar-a-Lago wield vast influence over the agency. The documents range from February 2017 to April 2018 and were released to ProPublica between May and August 2018.

Get the data:

Tobacco Bonds Underwriting Pitches

Source
Bond underwriters' responses
Date released
December 2014
Dates covered
1999-2014
Rows
265
Topic
Business
Featured use
Bankers Brought Rating Agencies ‘To Their Knees’ On Tobacco Bonds
Terms of Use
Standard terms of use

This data contains a summary of comments investment bankers made about credit rating agencies while pitching their underwriting services for tobacco bonds from 1999 onward.

Learn more about this dataset

This data contains a summary of comments investment bankers made about credit rating agencies while pitching their underwriting services for tobacco bonds from 1999 onward. The comments come from 140 underwriting pitches ProPublica collected under public records requests in more than a dozen states. The data shows that Wall Street pressed S&P, Moody’s and Fitch to assign more favorable credit ratings to their deals and bragged that the raters complied.

ProPublica used this data in "Bankers Brought Rating Agencies ‘To Their Knees’ On Tobacco Bonds".

Get the data:

Toxic Air Pollution Hot Spots

Source
ProPublica analysis of the EPA’s Risk Screening Environmental Indicators model
Date released
November 2021
Dates covered
2014-2018
Rows
41,188
Methodology
https://www.propublica.org/article/how-we-created-the-most-detailed-map-ever-of-cancer-causing-industrial-air-pollution
Topic
Environment
Featured use
The Most Detailed Map of Cancer-Causing Industrial Air Pollution in the U.S.
Terms of Use
Standard terms of use

Data on carcinogenic industrial air pollution.

Learn more about this dataset

These are the data files behind ProPublica’s “Sacrifice Zones” series. ProPublica analyzed five years of data from the EPA’s Risk Screening Environmental Indicators model to identify hot spots of cancer-causing industrial air pollution across the country. The model estimates concentrations of toxic chemicals near industrial plants in 810-by-810-meter squares of land, referred to as grid cells. ProPublica derived cancer risk estimates from the concentration numbers in the model’s grid cells, and averaged those estimates over a five-year period (2014-2018). At the time of publication, the data that ProPublica analyzed were the most recent available RSEI data.

After computing the average cancer risk estimates, we wrote an algorithm to identify toxic “hot spots,” defined as contiguous grid cells with an estimated incremental lifetime cancer risk greater than or equal to 1 in 100,000. That is, if a community of 100,000 people in the given area or grid cell were exposed to a toxic chemical continuously at the concentration provided in the RSEI data over a presumed lifetime of 70 years, roughly one additional individual might develop cancer from the exposure. That risk level is the exponential midpoint value in the EPA’s “fuzzy bright line,” a range of benchmarks for risks that the agency deems “acceptable.” The upper limit of this range was established in 1989, with the promulgation of emissions standards for the release of the chemical benzene as 1 in 10,000. At the low end of the range is 1 in 1 million.
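
The hot spot definition above can be sketched as a connected-components search over grid cells. This is an illustration under stated assumptions, not ProPublica's algorithm: it assumes cells are addressed by hypothetical integer `(column, row)` indices and that "contiguous" means sharing an edge, neither of which the source specifies. It also reads "exponential midpoint" as the geometric mean of the two ends of the range, which is an interpretation.

```python
import math
from collections import deque

THRESHOLD = 1e-5  # incremental lifetime cancer risk of 1 in 100,000


def find_hot_spots(risk_by_cell, threshold=THRESHOLD):
    """Group cells at or above the threshold into edge-connected clusters."""
    hot = {cell for cell, risk in risk_by_cell.items() if risk >= threshold}
    seen = set()
    spots = []
    for start in sorted(hot):
        if start in seen:
            continue
        # Breadth-first search outward from an unvisited hot cell.
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:
            x, y = queue.popleft()
            cluster.append((x, y))
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in hot and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        spots.append(sorted(cluster))
    return spots


# Geometric mean of the range ends: 1 in 10,000 and 1 in 1 million.
midpoint = math.sqrt(1e-4 * 1e-6)

# Hypothetical averaged risk estimates keyed by (column, row).
grid = {(0, 0): 2e-5, (1, 0): 1e-5, (2, 0): 5e-6, (5, 5): 3e-5}
print(find_hot_spots(grid))
print(midpoint)
```

In the sample grid, the two adjacent cells at or above the threshold form one cluster, the isolated cell forms another, and the cell below the threshold is dropped.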

The download available here includes a data dictionary, as well as two GeoJSON files and one CSV file:

  • One GeoJSON gives the grid cells (810-by-810-meter squares) where cancer risk estimates are above 1 in 100,000. Contiguous groups of these grid cells make up the hot spots on our map. The file also contains additional information for each grid cell, such as cancer risk estimates for each year of the five-year analysis period and population estimates.

  • The second GeoJSON contains the outlines of each of the hot spots identified by our analysis, with additional information such as the location of the hot spot and its area.

  • The CSV is a list of facilities that our analysis identified to be significant drivers of cancer risk in each of these hot spots.

For more detailed information on ProPublica’s analysis and the limitations of this data, please read our methodology.

Get the data:

Trump Administration Financial Disclosures

Source
White House counsel's office, U.S. Office of Government Ethics, various federal agencies
Date released
August 2017
Dates covered
1/2017 through 8/1/17
Rows
751
Topics
  • Politics
  • Business
Featured use
Here Are the Financial Disclosures of 349 Officials Trump Has Installed Across the Government
Terms of Use
Standard terms of use

A list of 750 Trump administration officials' financial disclosure forms, which lay out their financial holdings and employment backgrounds, and (when available) any ethics waivers provided to the official.

Learn more about this dataset

ProPublica has compiled disclosure forms that lay out Trump administration officials' financial holdings and employment backgrounds. This data set includes the name, agency, job title, and links to financial disclosure forms for 750 individuals, including White House staffers, President Trump’s Cabinet and the hundreds of members of so-called beachhead teams that the administration has installed at federal agencies. The data set also includes links to the ethics waivers that have been provided to 29 officials. Updated August 3, 2017

Get the data:

Trump Administration Political Appointees

Source
ProPublica, various federal agencies, Office of Personnel Management, Office of Government Ethics
Date released
October 2019
Dates covered
1/20/2017-10/14/19
Methodology
How We Compiled Trump Town
Topic
Politics
Featured uses
Terms of Use
Standard terms of use and the following additional terms:

You may not redistribute the entire dataset or use the data to create an online version of the database.

A unique database of Trump administration political appointees, cabinet members and White House staffers. Last updated October 14, 2019

Learn more about this dataset

Trump Town is a database of Trump administration political appointees, cabinet members and White House staffers. We created it by requesting staffing lists from individual agencies and the Office of Personnel Management. We then used those staff lists to request financial disclosure documents from the Office of Government Ethics and individual agencies. We parsed those financial disclosures to create a relational database that includes tables for organizations (former employers) and agencies, in addition to staffers. We also cross-referenced staffer names against our Represent lobbying database, and reviewed the matches to verify that they refer to the same people.

The database contains 14 tables: five created by ProPublica, as well as nine tables from financial disclosure documents, scraped and cleaned into usable data. The tables are available in the download both as 14 individual CSVs with primary and foreign keys, and a single SQL dump file containing the 14 tables. Complete documentation is included in the download and contains important information about how to use the data accurately.
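As a rough illustration of working with CSVs that are linked by primary and foreign keys, the sketch below loads two tables into an in-memory SQLite database and joins them. The table names, column names and rows are hypothetical stand-ins. The real ones are described in the documentation bundled with the download.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for two of the 14 CSVs; real file names and
# columns are defined in the documentation included with the download.
AGENCIES_CSV = "id,name\n1,Department of Examples\n2,Office of Samples\n"
STAFFERS_CSV = "id,name,agency_id\n10,Pat Doe,1\n11,Sam Roe,2\n12,Lee Poe,1\n"


def load_table(conn, table, csv_text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


conn = sqlite3.connect(":memory:")
load_table(conn, "agencies", AGENCIES_CSV)
load_table(conn, "staffers", STAFFERS_CSV)

# Follow the (assumed) foreign key staffers.agency_id -> agencies.id.
staff_by_agency = conn.execute(
    """
    SELECT agencies.name, COUNT(*) AS staff_count
    FROM staffers
    JOIN agencies ON staffers.agency_id = agencies.id
    GROUP BY agencies.name
    ORDER BY agencies.name
    """
).fetchall()
```

The SQL dump in the download would make the loading step unnecessary. The CSV route is shown only because it works with nothing beyond the Python standard library.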

Get the data:

Trump’s Beachhead Team Appointments

Source
Office of Personnel Management, Other Federal Agencies
Date released
August 2017
Dates covered
1/20/17 through 8/3/17
Rows
543
Topic
Politics
Featured uses
Terms of Use
Standard terms of use

ProPublica obtained a list of more than 1,000 Trump administration hires for beachhead teams, including dozens of lobbyists and some from far-right media. (Updated 8/31/17)

Learn more about this dataset

The White House said in January that around 520 staffers were being hired for the beachhead teams. Beachhead team members are temporary employees serving for stints of four to eight months, but many are expected to move into permanent jobs. Through a Freedom of Information Act request, ProPublica has obtained a list of more than 1,000 Trump administration hires. (Updated 8/31/17)

Get the data:

U.S. Congress: Bulk Data on Bills

Source
ProPublica's Represent App
Dates covered
January 1973-Present
Topic
Politics
Featured use
Represent
Terms of Use
Standard terms of use

Thousands of bills are introduced in Congress during each session. This dataset contains metadata for every bill introduced, including sponsors, cosponsors, committee actions, floor votes and a summary, along with the date of the last modification to the bill.

Learn more about this dataset

Thousands of bills are introduced in Congress during each session. This dataset contains metadata for every bill introduced, including sponsors, cosponsors, committee actions, floor votes and a summary, along with the date of the last modification to the bill. The primary download on this page includes data only for the current 118th Congress (2023-2024). This data is updated twice daily.

Data is provided in both JSON and XML formats.

Bulk data from previous congresses can be downloaded by clicking the links below. Bulk data for congresses before and including the 112th was generated by the Sunlight Foundation. Data for the 113th Congress and subsequent congresses was generated by ProPublica, using code from the @UnitedStates GitHub organization.

Historical Data:

Get the data:

U.S. Rape Clearance Data

Source
Various law enforcement agencies, Federal Bureau of Investigation, Newsy, Reveal, and ProPublica
Date released
November 2018
Dates covered
2014-2016
Rows
103
Methodology
How We Analyzed Rape Clearance Rates
Topic
Criminal Justice
Provided in collaboration with
Newsy
Featured uses
Terms of Use
Standard terms of use

Summary data on how police jurisdictions process rape cases, based on public records requests from 103 law enforcement agencies in cities with populations over 300,000.

Learn more about this dataset

This data was collected as part of an investigation by Newsy, Reveal, and ProPublica into how police process rape cases. Scripps/Newsy requested internal case management data from every major law enforcement agency in the United States serving a population of at least 300,000 people. The data provided in this download includes annual counts of total rape cases and number of clearances by type (arrest, unfounded, and exceptional) for each jurisdiction, as well as the columns in the data provided by the agencies that were used to distinguish rape cases from other crimes, to divide the data by reporting year and to tally case dispositions as either unfounded, cleared by arrest or cleared by exceptional means. (A complete description of the fields is included with the download.)

To create this data, we sent record requests to 103 law enforcement agencies and as of November 14, 2018, we had received data from 67 agencies. (A copy of the request sent to each jurisdiction is also included with the download. As more requests are filled, we will continue to update the rows for those districts.) The requests were for the years 2014 through 2016 and included the incident number of the offense, the date, the type of offense, whether it was unfounded, the date it was unfounded, the date it was cleared, the arrest date, the type of clearance and the reason for exceptional clearance if applicable. We parsed data from the FBI’s summary and NIBRS master files for the three years and calculated reported FBI clearance rates alongside our own analysis of the internal data from each jurisdiction.

We found that it was common for agencies to use a reporting category called “clearance by exceptional means” to mark rape cases as cleared without making an arrest, inflating the clearance rates that are often cited as a measure of police effectiveness. The FBI reporting system used by most agencies nationwide does not distinguish between the two types of clearance. These data provide a window into how often these agencies are using exceptional means to clear rape cases, as well as the prevalence of unfounded rape reports for jurisdictions that do not report that total to the FBI.
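To make the distinction between the two clearance types concrete, here is a minimal sketch that compares a combined clearance rate with an arrest-only rate. The field names are hypothetical, and dividing by total cases is an assumption. The methodology and data dictionary define the actual columns and calculations.

```python
from dataclasses import dataclass


@dataclass
class AgencyYear:
    # Hypothetical field names; see the data dictionary for the real ones.
    total_cases: int
    cleared_by_arrest: int
    cleared_exceptional: int


def clearance_rates(counts):
    """Return combined and arrest-only clearance rates as fractions.

    Returns None when there are no cases, since a rate is undefined.
    """
    if counts.total_cases == 0:
        return None
    cleared = counts.cleared_by_arrest + counts.cleared_exceptional
    return {
        # Rate when both clearance types are lumped together.
        "combined": cleared / counts.total_cases,
        # Rate when only arrests count as clearances.
        "arrest_only": counts.cleared_by_arrest / counts.total_cases,
        # Share of all clearances that came from exceptional means.
        "exceptional_share": counts.cleared_exceptional / cleared if cleared else 0.0,
    }


example = clearance_rates(AgencyYear(total_cases=200,
                                     cleared_by_arrest=30,
                                     cleared_exceptional=50))
```

In this made-up example the combined rate is well above the arrest-only rate, which is the kind of gap the analysis describes.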

Additional data available upon request

Upon request, Newsy can provide files containing additional data provided by each agency on rape cases. In many cases, Newsy/Scripps has aggregated the data from multiple files. The columns are as provided with the exception of any column named “YEAR,” which was added when data were provided in separate files or tabs for each year, and any column starting with the letters “MF,” which are columns that were added in the course of processing the data. All of the files have been restricted to rape cases as defined in the analysis file and data dictionary, and in some cases fields have been dropped from the originals to remove personally identifiable information.

Information about how to request these additional files is included in the download.

NOTE: This data has also been archived with Big Local News, which provides support for data collection and analysis for local journalists. For more information, contact Cheryl Phillips.

Get the data:

Voting Machine Age in 2016 Election

Source
Verified Voting
Date released
December 2017
Rows
6,403
Topic
Politics
Featured use
Election Security a High Priority — Until It Comes to Paying for New Voting Machines
Terms of Use
Standard terms of use

County-level data indicating the first year voting machines in the 2016 election were used.

Learn more about this dataset

County-level data indicating the first year voting machines in the 2016 election were used. The data comes from Verified Voting. We used this data in Election Security a High Priority — Until It Comes to Paying for New Voting Machines.

Get the data:

White House Complex Visitor Logs

Source
Property of the People; the Office of Management and Budget; the Office of the U.S. Trade Representative; the Office of National Drug Control Policy; the Office of Science and Technology Policy; and the Council on Environmental Quality.
Date released
November 2017
Dates covered
Jan. 20, 2017 through Sept. 6, 2017
Rows
8,807
Topic
Politics
Featured use
Here Are the White House Visitor Records the Trump Administration Didn’t Want You to See
Terms of Use
Standard terms of use

A dataset of visitors to the White House complex between January 20, 2017 and September 6, 2017. Last updated Nov. 21, 2017

Learn more about this dataset

The White House complex -- formally called the Executive Office of the President, or EOP -- is made up of more than a dozen offices and about 4,000 staffers who craft White House policy and support the president. It includes the White House itself, the National Security Council, the Office of Management and Budget, and other federal agencies.

Property of the People, a Washington-based nonprofit transparency group, successfully sued to force the administration to release the visitor logs and calendars of top agency officials from five agencies within the White House complex: the Office of Management and Budget; the Office of the U.S. Trade Representative; the Office of National Drug Control Policy; the Office of Science and Technology Policy; and the Council on Environmental Quality.

The court held that these agencies are subject to public disclosure through the Freedom of Information Act, even if the White House itself is not. The Trump administration refuses to release visitor logs for the White House, citing “grave national security risks and privacy concerns of the hundreds of thousands of visitors annually.”

The Obama White House also initially refused to release a list of its visitors, as had previous administrations. But in 2009, facing four lawsuits from government transparency groups and increasing public scrutiny, the Obama administration began voluntarily posting records of those who came in and out of the White House itself online.

The dataset covers the period between Jan. 20, the day of Trump’s inauguration, and about Sept. 6, although the date ranges differ by agency. The download includes both the original PDFs provided by the administration, as well as our CSV version of the data, used in our story, "Here Are the White House Visitor Records the Trump Administration Didn’t Want You to See."

The government redacted the names of some White House complex visitors, citing privacy reasons. Property of the People and the government are negotiating for the release of names currently redacted in some of the visitor logs and calendars. We plan to publish additional data, likely disclosed on a quarterly basis, as it becomes available.

The government noted in its response to Property of the People’s open-records request that it couldn’t guarantee that every visitor’s name was logged. Because the visitor logs and calendars are produced by the agencies themselves, meeting details might be mislabeled or incorrect. In some cases, where we couldn’t confirm the proper spelling of handwritten names or other text, we noted entries as “illegible.”


Get the data:

Workers’ Compensation Premium Rate Data

Source
Oregon Department of Consumer and Business Services
Topic
Business
Featured uses
Terms of Use
Standard terms of use

A state-by-state ranking of workers' compensation insurance rates paid by employers is produced every two years by Oregon’s Department of Consumer and Business Services.

Learn more about this dataset

Oregon’s Department of Consumer and Business Services (DCBS) produces this biennial ranking of workers’ compensation insurance rates paid by employers. The data from 50 states and Washington, D.C., showed that, as of 2014, employers were paying lower rates for workers’ compensation insurance than at any time in the past 25 years.

Get the data:

Workers’ Compensation State Reforms Data

Source
ProPublica research on state reform laws
Date released
March 2015
Dates covered
2002-2014
Rows
50
Topic
Business
Featured uses
Terms of Use
Standard terms of use

A detailed dataset summarizing major changes to state workers’ compensation laws since 2003.

Learn more about this dataset

This data tracks the changes to the major medical and wage-replacement benefits in workers’ comp.

To track the impact of workers’ compensation reforms nationwide, ProPublica assigned a starting value for each state by combining a ranking of average statutory benefits conducted by Actuarial & Technical Solutions of Bohemia, N.Y., and a report from the U.S. Department of Labor that monitored how many recommendations of a 1972 presidential commission on workers’ comp each state was following. ProPublica then analyzed state reform laws, using data from the National Council on Compensation Insurance Annual Statistical Bulletin, which rates the effects of legislation on benefit payments. In addition, ProPublica consulted reports from the Workers Compensation Research Institute and conducted interviews with stakeholders to determine how the changes compared to the historical norms provided by state workers’ comp systems.

States may have adopted additional legislation that is not included here because it didn’t affect the core benefits.

ProPublica used this data in our graphic Workers’ Compensation Reforms by State.

Get the data:

Premium datasets

The following premium datasets are no longer available for download, but they are listed for archival purposes. If you have any questions about them, email [email protected].

421-a Housing Subsidy Compliance Data

Source
NYC Dept. of Finance, Dept. of Housing Preservation and Development, Rent Guidelines Board, ProPublica
Date released
March 2016
Dates covered
1992-2016
Rows
6,650
Topic
Business
Featured uses

Detailed information on 6,650 properties in New York City that receive tax benefits under the $1.4 billion-a-year 421-a housing subsidy program, including program compliance details.

Learn more about this dataset

This data set includes detailed information on 6,650 properties receiving 421-a benefits as of the city's FY15/16 final roll and the status of their application for a final certificate of eligibility (FCE) to receive such benefits, based on data provided by the New York City Department of Finance and the Department of Housing Preservation and Development under the New York State Freedom of Information Law. The two databases were joined on property identifier data, and contextual and regulatory data was added from additional data sources.

Landlords who collect property tax benefits under the $1.4 billion-a-year 421-a program — the city’s single-largest housing subsidy — are required to provide tenants with rent-stabilized leases. These leases limit rent increases to city-set limits, such as the rent freeze announced in June 2016. Often, the city’s Finance Department gives out the tax benefit without an approved application on file showing that the landlord has met all the requirements of the program, including rent stabilization.

Note: The sample download includes both complete raw data files provided under the NYS FOI Law.

Bombs In Your Backyard Data

Source
U.S. Department of Defense
Date released
November 2017
Dates covered
Site data as of 2015
Rows
Varies
Methodology
Bombs in Your Backyard Methodology
Topics
  • Military
  • Environment
Featured uses

Information on the location, type, and costs of cleanup efforts administered by the Department of Defense at current and former military locations as of 2015.

Learn more about this dataset

Bombs in Your Backyard is an interactive map and database of military sites that contain toxic pollutants and contaminants in the soil or water, as well as sites that contain explosives or discarded military munitions. The data, which ProPublica obtained through a Freedom of Information Act request, comes from the Defense Environmental Restoration Program, which is administered by the Department of Defense. The program measures and documents cleanup efforts at current and former military locations. The data was last updated in 2015.

The original dataset contained 4,785 military installations with at least one hazardous site, and 40,688 total hazardous sites. It also contained information on the type and amount of contamination, the past and estimated future cost of cleanup, the type of restrictions to public access or future land use, the method of cleanup, and the date at which cleanup ended or is expected to end, among other things. In some cases, we added descriptions of military installations from the U.S. Army Corps of Engineers.

The Bombs in Your Backyard database, which is a simplified and restructured version of the original DERP database, contains 9 tables. The tables are available in the download both as 9 individual CSVs and as a single SQL dump file containing the 9 tables. The tables are:

  • installations
  • sites
  • media
  • contaminants
  • controls
  • restrictions
  • phases
  • remedies
  • states

To request additional information and download documentation for this dataset, please complete the form on this page.

Note: Our interactive database contains 4,785 military installations with at least one hazardous site, and 40,688 total hazardous sites. Because not all sites came with location data, only 3,611 installations and 24,809 sites appear on our maps (see “Site Locations” for more information), but all installations and sites in the DOD data appear in our tables.

Commercial and Industrial Property Values in Chicago and Cook County

Source
Illinois Department of Revenue, Cook County Assessor's Office
Date released
December 2017
Dates covered
2003-2016
Rows
Varies
Methodology
How We Analyzed Commercial and Industrial Property Assessments in Chicago and Cook County
Topics
  • Business
  • Finance
Provided in collaboration with
Chicago Tribune
Featured use
How the Cook County Assessor Failed Taxpayers

Commercial and industrial property value data for Cook County and Chicago, including tax assessment values, property sales records, and assessment appeals for 2003-2016.

Learn more about this dataset

ProPublica Illinois and the Chicago Tribune have created a research-ready dataset of commercial and industrial assessment and appeals data from 2003 to 2016 for the City of Chicago and Cook County, Illinois. The data, which underpinned our analysis in "How the Cook County Assessor Failed Taxpayers," cover tens of thousands of assessments.

Data from the Cook County Assessor's Office and the Illinois Department of Revenue have been cleaned, standardized, and combined to allow for easy analysis of data on various details, including assessed values, sale prices, appeals records, and attorney names. The dataset includes:

  • Initial property valuations over multiple reassessment periods within the City of Chicago: The data is compiled by Property Index Number (PIN), a unique identifier that geographically locates each parcel of property. Includes a description of the property class, the assessed value of the property (2008-2015), and a flag to identify properties with identical reassessments over multiple years.
  • Data resulting from a sales ratio study comparing the assessor’s valuations to actual sales prices, including comparisons with the first-pass (initial) assessment and the final (board of review) assessment. This analysis used data from the Illinois Department of Revenue (IDOR) real estate transfer declaration data. The data comes from the PTAX declaration form, which provides information on sales of properties throughout the state. The self-reported data includes the classification of the property (such as commercial or residential), the sales price and whether the sale is an arm’s-length transaction or a compulsory sale. The data has been hand-checked to ensure no related parties were included in the analysis, among other exclusions. This entailed examining the buyers and sellers and flagging any who appeared to be related.
  • Appeals of commercial and industrial property tax assessments in Cook County. This analysis used appeals data from the Cook County Assessor's Office (CCAO), which includes data on each PIN that was the subject of an appeal. The data includes initial assessed values (estimates based on market data and building characteristics), second-pass assessed values (which incorporate adjustments based on appeals granted by the assessor’s office) and the final assessed values (incorporating any successful appeals to the Cook County Board of Review). Attorney names are also included in this dataset. The 3.8 million records, which contained many spelling and typographical errors, were standardized using regular expressions and data cleaning tools in R, followed by extensive fact-checking and hand checks.
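The sales ratio comparison described in the list above can be sketched as follows. This is an illustration rather than the analysis code. The field names are hypothetical, and it assumes the assessor's valuation and the sale price are expressed on the same basis, so that a ratio of 1.0 means the property was valued at exactly its sale price.

```python
from statistics import median


def sales_ratios(sales):
    """Return valuation-to-price ratios for arm's-length sales only."""
    ratios = []
    for sale in sales:
        # Skip compulsory or related-party sales and unusable prices.
        if not sale["arms_length"] or sale["sale_price"] <= 0:
            continue
        ratios.append(sale["assessor_value"] / sale["sale_price"])
    return ratios


# Made-up records; the real data is keyed by Property Index Number (PIN).
sample_sales = [
    {"pin": "A", "assessor_value": 800_000, "sale_price": 1_000_000, "arms_length": True},
    {"pin": "B", "assessor_value": 500_000, "sale_price": 500_000, "arms_length": True},
    {"pin": "C", "assessor_value": 300_000, "sale_price": 600_000, "arms_length": True},
    {"pin": "D", "assessor_value": 900_000, "sale_price": 400_000, "arms_length": False},
]

ratios = sales_ratios(sample_sales)
median_ratio = median(ratios)
```

A median ratio below 1.0 would suggest that valuations tend to fall short of sale prices. The same function could be run separately on first-pass and final values to compare the two.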

To request commercial pricing information or purchase the data, complete the form on this page.

Dollars for Docs (2009-2013)

Source
Pharmaceutical Company Disclosures, ProPublica
Dates covered
2009-2013
Rows
3,362,932
Methodology
How We Assembled This Data
Topic
Health
Featured uses

A unique set of data of more than $4 billion in payments to doctors, other medical providers and health care institutions that were disclosed by 17 pharmaceutical companies from 2009 to 2013. ProPublica combined, cleaned, and standardized data from multiple sources.

Learn more about this dataset

A unique set of data of more than $4 billion in payments to doctors, other medical providers and health care institutions that were disclosed by 17 pharmaceutical companies from 2009 to 2013.

Prior to 2013, the federal government did not require pharmaceutical companies to disclose payments they made to medical providers. However, about $4 billion in payments to doctors, other medical providers and health care institutions were voluntarily disclosed by 17 pharmaceutical companies between 2009 and 2013. As of 2013, their combined prescription drug sales amounted to about 50 percent of the U.S. market.

ProPublica compiled individual disclosures into a single, comprehensive database that allows patients to search for their physician or medical center and receive a listing of all payments matching that name. The database can also be searched by state and by company. It can be filtered by year and payment category.

Compiling this database involved scraping and collating from a variety of formats. Caveats and additional details are listed in the readme file available as part of both the purchased dataset and the sample data available for download on this page.

IMPORTANT NOTE: This data set does not include provider NPI numbers, and it cannot be combined with the data available in the annually updated Dollars for Docs data available here.

Dollars for Docs Data (2013-2016)

Source
Centers for Medicare & Medicaid Services, ProPublica
Date released
June 2018
Dates covered
August 2013-December 2016
Rows
38,231,551
Topic
Health
Featured uses

Ready-to-use data on more than $9 billion paid by pharmaceutical companies to medical providers and health care institutions between August 2013 and December 2016. Updates annually

Learn more about this dataset

ProPublica has compiled ready-to-use data on billions of dollars paid by pharmaceutical companies to medical providers and health care institutions between August 2013 and December 2018. Dollars for Docs, ProPublica's free, online lookup tool, now covers nearly $12 billion in medical industry payments made to more than 1 million physicians in the United States. This premium dataset is a downloadable version of the database that powers our tool.

As of November 2019, the data is provided in two separate tranches: January 2017-December 2018 and August 2013-December 2016. (You can purchase or download a sample of the 2017-2018 data here.) This page includes information about the 2013-2016 data.

The data included is based on the January 2018 release of the Open Payments data, and has been cleaned, analyzed, and matched with additional information. The data can be combined with our exclusive NPI-Open Payments ID crosswalk, which enables users to match this information to other provider-level data.

The data is provided as CSV files, with documentation, and your purchase includes:

  • Documentation outlining our methodology, data sources, caveats for using the data, and a data dictionary for each of the included tables.
  • A summary for each provider in the database, including their specialty, contact information, and aggregated payment information (number of payments received, total monetary value of payments received, and number of companies from which payments were received).
  • A ready-to-use version of the federal Open Payments data, for which we have cleaned company names, drug names, and device names. We have standardized how each company, drug, and device is listed, and eliminated duplicates (meaning if a product was listed as both a drug and device, we figured out which was correct).
  • An analysis-ready version of the payments file, with each drug or device associated with a payment represented on its own line. Open Payments records sometimes include multiple drugs and/or devices associated with each payment. (Each drug or device is assigned the full value of the payment.) Our flattened file makes it easier to identify the number of payments attributed to a drug or device or identify which providers received payments associated with particular products.
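The flattening described in the last item above can be sketched as follows. The record layout is hypothetical. The one behavior taken from the text is that each product row carries the full value of the payment.

```python
def flatten_payments(payments):
    """Emit one row per (payment, product) pair.

    Each row carries the full payment amount, so summing amounts across
    the flattened rows double-counts payments that name several products.
    """
    rows = []
    for payment in payments:
        # Keep payments with no listed product as a single row.
        products = payment["products"] or [None]
        for product in products:
            rows.append({
                "payment_id": payment["payment_id"],
                "provider_id": payment["provider_id"],
                "product": product,
                "amount": payment["amount"],
            })
    return rows


# Made-up records for illustration only.
sample_payments = [
    {"payment_id": 1, "provider_id": "p1", "amount": 100.0, "products": ["Drug A", "Device B"]},
    {"payment_id": 2, "provider_id": "p2", "amount": 50.0, "products": ["Drug A"]},
    {"payment_id": 3, "provider_id": "p1", "amount": 20.0, "products": []},
]

flat = flatten_payments(sample_payments)
payments_for_drug_a = sum(1 for row in flat if row["product"] == "Drug A")
```

Because amounts repeat across product rows, the flattened form suits counting payments per product better than totaling dollars.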

Please note: CMS revises the data every six months, including corrections, additions, and removal of payments. Updates provided by CMS after January 2018 have not been included in this cleaned version of the data.

NPI-Open Payments ID Crosswalk

The NPI-Open Payments crosswalk, which matches providers' Open Payments IDs to their National Provider Identifier (NPI) number, was last updated October 3, 2019. To request student or journalist discount pricing for the crosswalk, contact us.

Custom data segments are also available, with selections based on information such as provider specialty, location, NPI number, or hospital affiliation, as well as drug, device, and/or manufacturer.

Dollars for Docs Data (2017-2018)

Source
Centers for Medicare & Medicaid Services, ProPublica
Date released
June 2019
Dates covered
January 2017-December 2018
Topic
Health
Featured uses

Ready-to-use data on more than $12 billion paid by pharmaceutical companies to medical providers and health care institutions. Updates annually

Learn more about this dataset

ProPublica has compiled ready-to-use data on billions of dollars paid by pharmaceutical companies to medical providers and health care institutions between August 2013 and December 2018. Dollars for Docs, ProPublica's free, online lookup tool, now covers nearly $12 billion in medical industry payments made to more than 1 million physicians in the United States. This premium dataset is a downloadable version of the database that powers our tool.

As of November 2019, the data is provided in two separate tranches: January 2017-December 2018 and August 2013-December 2016. This page includes information about the 2017-2018 data.

This data set is based on the June 2019 release of the federal Open Payments database, and has been cleaned, analyzed, and matched with additional information. The data can be combined with our exclusive NPI-Open Payments ID crosswalk, which enables users to match this information to other provider-level data.

The data will be provided as six separate CSV files, with documentation, and your purchase includes:

  • Documentation outlining our methodology, data sources, caveats for using the data, and a data dictionary for each of the included tables.
  • A ready-to-use version of the federal Open Payments general payments data, for which we have cleaned company names, drug names, and device names. We have standardized how each company, drug, and device is listed, and eliminated duplicates (meaning if a product was listed as both a drug and device, we figured out which was correct).
  • A summary for each provider in the database, including their specialty, contact information, and aggregated payment information (number of payments received, total monetary value of payments received, and number of companies from which payments were received).
  • A companion table to the payments file that identifies each drug or device associated with a payment on its own line. Open Payments records sometimes include multiple drugs and/or devices associated with each payment. This is a many-to-many table of payment-product pairs, which allows users to identify the number of payments attributed to a drug or device or identify which providers received payments associated with particular products.
  • Additional tables with basic information on products, companies, and teaching hospital payment recipients that can be joined with the other tables for more detailed analyses.
  • Not included: The NPI-Open Payments crosswalk, which matches providers' Open Payments IDs to their National Provider Identifier (NPI) numbers.

Please note: CMS updates the data every six months, including corrections, additions, and removal of payments. Updates provided by CMS after January 2018 have not been included in this cleaned version of the data.

Custom data segments are also available, with selections based on information such as provider specialty, location, NPI number, or hospital affiliation, as well as drug, device, and/or manufacturer.

Dollars for Docs: Hospital Analysis

Source
Centers for Medicare & Medicaid Services
Date released
June 2016
Dates covered
January-December 2014
Rows
4,816
Methodology
How We Compiled the Dollars for Docs Hospital Data
Topic
Health

A data set of U.S. hospitals and the percentage of their affiliated physicians who receive payments of various sizes from pharmaceutical and medical device companies.

Learn more about this dataset

This data set provides a summary analysis of Open Payments data, grouping total payments received by doctors at hospitals around the United States. Our goal was to compare U.S. hospitals based on the percentage of their affiliated physicians who receive payments of various sizes from pharmaceutical and medical device companies.

ProPublica's analysis of the Open Payments data was based on physicians’ primary hospital affiliations as reported by Medicare's Physician Compare tool in December 2014. This means our analysis excludes physicians who don't participate in Medicare and don't admit many patients to the hospital. The complete methodology is available below.

Detailed information about specific payments made to individual providers, including custom analysis of providers associated with individual hospitals, is available here.

Dollars for Profs

Source
National Institutes of Health, various universities
Date released
December 2019
Rows
37,204
Topics
  • Education
  • Business

A data set of more than 35,000 financial disclosure and conflict of interest records from faculty members at U.S. universities.

Learn more about this dataset

This unique data set contains 37,204 financial disclosure and conflict of interest records from university faculty members across the country, drawn from the National Institutes of Health and several public universities.

The NIH collects significant disclosures of financial relationships that could affect the design, conduct or reporting of NIH-funded research. ProPublica combined those disclosures with data about NIH grants, as the disclosure database did not originally include the researchers’ university. ProPublica also requested additional information about outside income from at least one public university in all 50 states. About 20 state universities complied with our requests (list below). ProPublica then combined the NIH data with the state disclosure data to create a single database.

The data download includes one Excel file, where each row is a single disclosed conflict of interest or financial disclosure.

Financial disclosure and conflict of interest records included were received from:

  • The National Institutes of Health
  • University of California (all locations)
  • The Office of the Illinois Secretary of State
  • Arizona State University
  • The University of Arizona
  • The University of Florida Medical School
  • The Georgia Institute of Technology
  • The University of Kentucky
  • Louisiana State University
  • The University of Texas (all locations)
  • The University of Utah

FBI Uniform Crime Reports (2014)

Source
Federal Bureau of Investigation
Date released
November 2016
Dates covered
2014
Rows
Varies
Topic
Criminal Justice
Provided in collaboration with
Investigative Reporters and Editors
Featured use
Where Hate Crimes Aren't Reported

Local, county, and state police departments across the country voluntarily report aggregate numbers for different categories of crime in their jurisdictions. Updated annually.

Learn more about this dataset

Local, county, and state police departments across the country voluntarily report aggregate numbers for different categories of crime in their jurisdictions. The Uniform Crime Reports, composed of six databases, include crime information reported to the FBI by law enforcement agencies around the country. Most of the data consist of the "index" crimes: murder, nonnegligent manslaughter, forcible rape, robbery, aggravated assault, burglary, larceny-theft, motor-vehicle theft and arson. These crimes, with the exception of arson, were chosen in 1929 to serve as an index for gauging fluctuations in the overall volume and rate of crime. Arson was added by Congressional mandate in 1979.

The six databases of the Uniform Crime Reports are: Return-A, Return-A Supplement, Supplemental Homicide Report (SHR), Police, Arson, and Age, Sex and Race (ASR, also known as Arrests). Each database, with the exception of SHR, is arranged by police reporting agency. Occurrences are presented as aggregates. The data is broken down by month. All the databases provide the region, state, county, city, metropolitan statistical area (MSA) and reporting agency identifier.

This version of the dataset, available only from Investigative Reporters and Editors in partnership with ProPublica, is an analysis-ready version of the data provided by the FBI, pre-processed as CSV or SQL files. (While the source data comes as unpacked, fixed-width .DAT files on a disc, IRE has processed the files to slice the fixed-width data into columns and, in the ASR and Return A tables, processed hexadecimal numbers into their true negative values.) Additional data years, from 1993 to 2013, are also available upon request.
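
As a rough illustration of what slicing fixed-width records into columns involves, here is a minimal Python sketch. The column names and offsets in `LAYOUT` are hypothetical placeholders, not the FBI's actual record layout, and the sketch does not attempt the hexadecimal-to-negative conversion mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (column name, start, end), zero-based and end-exclusive.
# The real offsets come from the FBI record layout, which is not reproduced here.
LAYOUT: List[Tuple[str, int, int]] = [
    ("state_code", 0, 2),
    ("agency_id", 2, 9),
    ("year", 9, 13),
    ("count", 13, 18),
]


def slice_record(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Cut one fixed-width record into named, whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


# A made-up 18-character record that follows the hypothetical layout above.
sample = "NJ0010100201400042"
row = slice_record(sample)
```

Each field stays a string after slicing, so leading zeros in identifiers are preserved; numeric columns can be converted afterwards, for example with `int(row["count"])`.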

Home Mortgage Disclosure Act

Source
Federal Financial Institutions Examination Council
Date released
September 2015
Dates covered
2014
Rows
11,875,464
Topic
Business
Provided in collaboration with
Investigative Reporters and Editors

A research-ready data set of individual home mortgage applications submitted to all banks, savings and loans, savings banks and credit unions with assets of more than $33 million. Includes demographic and Census data related to each application.

Learn more about this dataset

A research-ready data set of U.S. home mortgage loan applications, based on data from the federally mandated Home Mortgage Disclosure Act. In 2014, the most recent year for which data is available, about 11.7 million loan records were reported by 7,062 financial institutions. These records include applications for home purchase, for home improvement, and for refinancing.

Data generated by HMDA provides information on lending practices. This data set includes multiple files; the primary table is the Loan Application Register (LAR), which contains:

  • demographic information about loan applicants, including race, gender and income;
  • the purpose of the loan (e.g. home purchase or improvement);
  • whether the buyer intends to live in the home;
  • the type of loan (e.g. conventional or FHA insured);
  • the outcome of the loan application (e.g. approved or declined);
  • geographical information on applicants, such as Census tract, MA (metropolitan area), state and county, total population and percentage of minority population by Census tract.

Since 2004, the data also includes the "spread," showing the difference between the Treasury security interest rate and the loan's interest rate. Lenders are also given the opportunity to note reasons for denial in three fields, but those are seldom used.

Names, addresses, and other information on lending institutions are stored in additional tables and can be joined to the LAR data.
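
A join of this kind might look like the following Python sketch. The field names (`respondent_id`, `name`, `loan_amount`, `action`) and the sample rows are invented for illustration; the actual column names are defined in the documentation that accompanies the files.

```python
from typing import Dict, List

# Invented sample rows; real column names come from the dataset documentation.
lar_rows: List[Dict] = [
    {"respondent_id": "A1", "loan_amount": 150, "action": "approved"},
    {"respondent_id": "B2", "loan_amount": 90, "action": "declined"},
    {"respondent_id": "A1", "loan_amount": 210, "action": "approved"},
]
institutions: List[Dict] = [
    {"respondent_id": "A1", "name": "Example Savings Bank"},
    {"respondent_id": "B2", "name": "Sample Credit Union"},
]


def join_lar(lar: List[Dict], inst: List[Dict], key: str = "respondent_id") -> List[Dict]:
    """Attach institution fields to each application (a left join on `key`)."""
    lookup = {row[key]: row for row in inst}
    # Applications with no matching institution keep only their own fields.
    return [{**lookup.get(app[key], {}), **app} for app in lar]


joined = join_lar(lar_rows, institutions)
```

Building the lookup dictionary once keeps the join linear in the number of applications, which matters for a table with millions of rows.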

HMDA requires all banks, savings and loans, savings banks and credit unions with assets of more than $33 million and offices in metropolitan areas to report mortgage applications. The act was enacted by Congress in 1975 and is implemented by the Federal Reserve Board's Regulation C. Banks, savings and loan associations, credit unions, and mortgage and consumer finance companies are required to report HMDA data if they meet legal criteria for coverage.

Maximum Workers' Comp for Body Parts by State

Source
ProPublica research of state workers’ compensation laws
Date released
March 2015
Rows
55
Topic
Business

This dataset, based on detailed analysis by ProPublica reporters, includes the maximum permanent partial disability compensation that injured workers can receive for various body parts in 50 states, the District of Columbia and the federal system.

Learn more about this dataset

This dataset, based on detailed analysis by ProPublica reporters, includes the maximum permanent partial disability compensation that injured workers can receive for various body parts in 50 states, the District of Columbia and the federal system.

Creating the data involved more than 600 calculations relying on 52 separate formulas. We researched laws for all 50 states, the District of Columbia and the Federal Employees’ Compensation Act to calculate the maximum benefit injured workers can receive for the total loss or amputation of various body parts. Researchers then checked their calculations with state officials, attorneys and judges in those states. The benefit, often known as “permanent partial disability,” is in addition to temporary wage-replacement benefits and is intended to compensate workers who’ve suffered severe injuries, but can still work in some capacity.

Medical Marijuana Registry Programs

Source
Associated Press, state agencies, media reports
Date released
July 2019
Dates covered
Varies by state
Rows
varies
Methodology
Data illuminates marijuana legalization impact
Topics
  • Health
  • Business
Provided in collaboration with
Associated Press

State- and county-level data on patient enrollment in medical marijuana programs in the U.S. and Puerto Rico

Learn more about this dataset

The Associated Press has compiled a comprehensive dataset on medical marijuana registry programs as of April 2019 across the U.S. and Puerto Rico through formal records requests, published program reports, confirmed media reports and miscellaneous department documents. The dataset includes patient numbers at the state level for all participating states, for 12 states at the county level and DC at the ward level, but does not include data from California, Washington state or Maine. Years available vary by state, based on program age and data availability.

The data provided with this purchase includes:

  • A layout table. Outlines what data is available for each state (along with a PDF outlining state-by-state details and caveats).
  • State-level medical marijuana registry data. Includes number of total patients, minor patients, patients over 65 (70 or 71) years old, qualifying conditions, and gender of patients over time.
  • County-level medical marijuana registry data. Total number of patients over time. Not available for all states.
  • Qualifying conditions. Outlines which qualifying conditions are included in the data for each state. States may allow access to medical marijuana for more qualifying conditions than those listed here, but these are the categories available in the data.

National Consumer Bankruptcy Cases

Source
Department of Justice, American Community Survey
Date released
October 2017
Dates covered
2008-2015
Rows
9,099,556
Topic
Business
Featured use
Too Broke for Bankruptcy

A national data set of U.S. consumer bankruptcy cases initiated under Chapter 7 or Chapter 13 between 2008 and 2015.

Learn more about this dataset

ProPublica has created a research-ready national data set of consumer bankruptcy cases filed from 2008 through 2015 either under Chapter 7 or Chapter 13. The dataset, which was the underpinning for our analysis in "How The Bankruptcy System is Failing Black Americans," includes data on the approximately 9 million consumer bankruptcy cases in the United States. In addition to basic filing information such as important dates and locations, each case includes the debtor’s zip code, as well as income, asset, and liability information.

The file was created by combining and cleaning data from the Department of Justice and the American Community Survey.

The Department of Justice data is provided as snapshots. ProPublica cleaned and aggregated records to create a single record for each case. Demographic information for the filer's zip code was added to each case, including racial composition, median household income, and education level. In some cases, demographic variables included in the file were calculated by combining measures from the ACS data. (For example, percent black is calculated as the estimated size of the black population divided by the total estimated population.)
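
The percent-black example in the parenthetical amounts to a simple ratio of two ACS estimates. A minimal Python sketch follows; the function name and the sample numbers are illustrative, not taken from the file.

```python
from typing import Optional


def percent_of_population(group_estimate: float, total_estimate: float) -> Optional[float]:
    """Share of a zip code's estimated population in one group, as a percentage.

    Returns None when the total estimate is zero or negative, because the
    ratio is undefined for an area with no estimated residents.
    """
    if total_estimate <= 0:
        return None
    return 100.0 * group_estimate / total_estimate


# Illustrative numbers only: 2,500 of an estimated 10,000 residents.
example = percent_of_population(2_500, 10_000)
```

Because both inputs are survey estimates, the resulting percentage carries sampling error too, which is worth keeping in mind for zip codes with small populations.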

Note: The Justice Department’s dataset does not include identifying information about the debtor, such as name and address. Also, this data does not list the debtor’s attorney. Furthermore, a significant number of cases lack asset, liability or income data, most likely because such information was never filed in the first place. This is particularly prevalent among pro se cases.

Download the raw source data on bankruptcy case filings here.

National Flood Insurance Program Coverage

Source
FEMA
Date released
October 2017
Dates covered
2012, 2017, and 1978-2017
Rows
Varies
Topic
Environment
Provided in collaboration with
Associated Press

A cleaned, analysis-ready look at participation in the National Flood Insurance Program across more than 18,000 communities.

Learn more about this dataset

This dataset is a cleaned, analysis-ready look at community participation in the National Flood Insurance Program.

The original data, obtained from FEMA, captures in-force (active) policies, total insured value and total annual premium costs in more than 18,000 communities — across all 50 states and Washington, D.C. — on December 31, 2012 and June 30, 2017. Additionally, the Associated Press cleaned and enhanced the data with additional information, including:

  • standardization of community names across both survey years to enable comparisons between 2012 and 2017 participation,
  • aggregation of data at the county and state level, and
  • appending Census FIPS codes to county-level data.

The data also includes a summary table of total active policies, value, and premiums annually, between 1978 and 2017.

Using this dataset, the Associated Press showed that the number of active insurance policies has dropped 14 percent since peaking at 5.7 million in 2009. The steepest decline has been over the past five years. In 2017, there were a total of 4,943,218 policies active, and the insured property had a value of $1.23 trillion. In 2012, there were 5,496,457 policies active, and the insured property had a value of $1.27 trillion.

Since 2012, the number of properties covered under the flood insurance program has dropped 10 percent, from nearly 5.5 million to about 4.9 million. The decline comes as the program has struggled financially to cover growing losses from increasingly frequent flooding events.
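
The 10 percent figure can be checked against the policy counts quoted above with a short percent-change calculation. This is a sketch using only the two totals given in the text; the helper name is our own.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`; negative values are declines."""
    return 100.0 * (new - old) / old


# Active policy totals quoted in the description above.
policies_2012 = 5_496_457
policies_2017 = 4_943_218

change = percent_change(policies_2012, policies_2017)
```

Here `change` comes out at about -10.1, consistent with the roughly 10 percent drop cited in the text.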

National Inventory of Dams

Source
U.S. Army Corps of Engineers
Date released
September 2015
Dates covered
2002, 2013
Rows
90,580
Topic
Business
Provided in collaboration with
Investigative Reporters and Editors

Structural information on dams in the United States, including recent inspections, that provides insight into aging infrastructure, emergency preparedness, and dam inspections.

Learn more about this dataset

The National Inventory of Dams (NID) provides structural information on dams in the United States, including some information from recent inspections.

Dams are included if they meet at least one of the following criteria:

  • High hazard classification - loss of one human life is likely if the dam fails,
  • Significant hazard classification - possible loss of human life and likely significant property or environmental destruction,
  • Equal or exceed 25 feet in height and exceed 15 acre-feet in storage,
  • Equal or exceed 50 acre-feet storage and exceed 6 feet in height.
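
The four criteria above can be read as a single inclusion test. The Python sketch below encodes them as written; the function name, parameter names and the lowercase hazard labels are our own choices, not fields from the NID files.

```python
def meets_nid_criteria(hazard: str, height_ft: float, storage_acre_ft: float) -> bool:
    """Return True if a dam satisfies at least one of the listed criteria."""
    # High or significant hazard classification qualifies on its own.
    if hazard.lower() in ("high", "significant"):
        return True
    # At least 25 feet high AND more than 15 acre-feet of storage.
    if height_ft >= 25 and storage_acre_ft > 15:
        return True
    # At least 50 acre-feet of storage AND more than 6 feet high.
    if storage_acre_ft >= 50 and height_ft > 6:
        return True
    return False


tall_small_reservoir = meets_nid_criteria("low", 25, 16)
short_large_reservoir = meets_nid_criteria("low", 6, 50)
```

Note the mix of inclusive and strict thresholds: a 25-foot dam holding exactly 15 acre-feet does not qualify on size alone, and neither does a 6-foot dam holding 50 acre-feet.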

For a reporter covering infrastructure or breaking news involving one of these structures, the NID is an important resource. Journalists have used the data to produce stories on aging infrastructure, emergency preparedness and lack of adequate dam inspections.

Data is available from both 2002 and 2013, based on periodic reports by the U.S. Army Corps of Engineers.

New York State Nuisance Abatement Actions

Source
New York State Supreme Court nuisance abatement filings, 2010 U.S. Census, NYPD precinct shapefiles
Date released
May 2016
Dates covered
January 2013-June 2014
Rows
1,162
Topics
  • Criminal Justice
  • Business
Provided in collaboration with
New York Daily News

A unique data set of 1,162 nuisance abatement actions filed in five New York State Supreme Courts between January 2013 and June 2014, geocoded and matched with racial and ethnic demographics of each location's census tract.

Learn more about this dataset

This data set contains information entered from nuisance abatement actions filed by the New York Police Department (NYPD) during 2013 and the first half of 2014. The actions, which target businesses and homes NYPD says have been used for illegal activity, were filed in five state Supreme Courts covering New York, Bronx, Kings, Queens and Richmond counties.

NPI-Open Payments Crosswalk

Source
ProPublica, Centers for Medicare & Medicaid Services
Date released
October 2019
Dates covered
2017-2018
Topic
Health

A unique NPI-Open Payments crosswalk, which matches providers' Open Payments IDs to their National Provider Identifier (NPI) numbers. Last updated October 3, 2019.

Learn more about this dataset

Dollars for Docs, ProPublica's free online lookup tool, compiles data on nearly $12 billion in medical industry payments made to more than 1 million physicians in the United States between August 2013 and December 2018. Our unique crosswalk, which matches providers' Open Payments IDs to their National Provider Identifier (NPI) numbers, enables our premium Dollars for Docs dataset to be used in conjunction with other provider-specific data sets that use NPI numbers. ProPublica matched approximately 99% of physicians to their NPI number.

We matched information from the Open Payments physician table to the National Plan & Provider Enumeration System (NPPES) to associate a National Provider Identifier to each physician's record in the Open Payments data. To increase the chances of a successful match, we performed this match against four versions of the NPPES database: March 11, 2018; September 11, 2016; July 12, 2015; and April 11, 2011. For records where a clear match was not found, we researched these records and manually selected the match where possible. We did not guess in any circumstance. More details are available in the documentation included as a sample download on this page.
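
To make the idea of matching against several NPPES versions concrete, here is a loose Python sketch. It is not ProPublica's actual procedure: the matching key (first name, last name, state), the record fields and the sample data are all assumptions made for illustration. The sketch tries each snapshot in turn, accepts only an unambiguous match, and leaves ambiguous or missing records unmatched for manual review rather than guessing.

```python
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, str, str]


def normalize(first: str, last: str, state: str) -> Key:
    """Build a case- and whitespace-insensitive matching key (an assumed key)."""
    return (first.strip().upper(), last.strip().upper(), state.strip().upper())


def build_index(snapshot: List[Dict[str, str]]) -> Dict[Key, Set[str]]:
    """Map each key to the set of NPIs that share it in one snapshot."""
    index: Dict[Key, Set[str]] = {}
    for rec in snapshot:
        key = normalize(rec["first"], rec["last"], rec["state"])
        index.setdefault(key, set()).add(rec["npi"])
    return index


def match_npi(physician: Dict[str, str], indexes: List[Dict[Key, Set[str]]]) -> Optional[str]:
    """Return an NPI only when a snapshot yields exactly one candidate."""
    key = normalize(physician["first"], physician["last"], physician["state"])
    for index in indexes:
        candidates = index.get(key, set())
        if len(candidates) == 1:
            return next(iter(candidates))
        if len(candidates) > 1:
            return None  # ambiguous: leave for manual review, never guess
    return None  # not found in any snapshot


# Invented records standing in for two NPPES snapshots, newest first.
newer = [
    {"first": "Ana", "last": "Diaz", "state": "NY", "npi": "1000000001"},
    {"first": "John", "last": "Smith", "state": "CA", "npi": "1000000002"},
    {"first": "John", "last": "Smith", "state": "CA", "npi": "1000000003"},
]
older = [{"first": "Lee", "last": "Park", "state": "TX", "npi": "1000000004"}]
indexes = [build_index(newer), build_index(older)]

found_in_older = match_npi({"first": "lee", "last": "park ", "state": "tx"}, indexes)
ambiguous = match_npi({"first": "John", "last": "Smith", "state": "CA"}, indexes)
```

Checking older snapshots helps recover providers whose records changed or were deactivated in later NPPES releases, which is one plausible reason to match against more than one version.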

Last updated October 3, 2019. To request student or journalist discount pricing for the crosswalk, contact us.


Nursing Home Sanctions in Pennsylvania

Source
Pennsylvania Department of Health, Centers for Medicare & Medicaid Services and news reports
Date released
March 2016
Dates covered
January 1997-February 2016
Rows
897
Topic
Health
Provided in collaboration with
Reading Eagle

This unique data set aggregates information from nearly 900 administrative orders, between 1997 and 2016, to provide a detailed look at health and safety sanctions in Pennsylvania nursing homes. The data has been cleaned and verified by Reading Eagle reporters.

Learn more about this dataset

Over the past two decades, half of Pennsylvania’s nursing homes have received a fine, a ban, or a temporary or revoked license for serious deficiencies that jeopardized the health and safety of their residents. This unique data set aggregates information from nearly 900 administrative orders, dating back to 1997, to provide a detailed look at these sanctions.

The data was obtained through a Right-to-Know request to the Pennsylvania Department of Health and includes the sanction, the order date, fine amounts and the type of sanction (whether administrative in nature or a health and safety deficiency).

The database also includes the following information (updated as of Sept. 30, 2016) from Nursing Home Compare: number of certified beds; participation in Medicare and/or Medicaid; and star ratings.

Because nursing homes frequently change names and owners, the Reading Eagle updated this information using data from the Centers for Medicare & Medicaid Services, the health department, nursing home websites and press reports about facility closures.

The data was entered manually from letters provided in a PDF format, cleaned and standardized to ensure that all entries were presented in the same format. Then, ownership information was cross-checked against facility ID numbers and addresses.

Partisan Advantage in the 2016 and 2018 Elections

Source
Associated Press election services, individual states’ certified election results
Date released
April 2019
Dates covered
2016, 2018
Rows
varies
Methodology
How to quantify gerrymandering? Reporters find a way
Topic
Politics
Provided in collaboration with
Associated Press

This dataset includes vote totals by party in U.S. House races and 4,900 state-level House or Assembly races, as well as calculated measures of partisan advantage.

Learn more about this dataset

This data set contains the raw election data and an analysis of partisan advantage in all U.S. House races as well as about 4,900 state House and Assembly races in both 2018 and 2016 elections.

The analysis, conducted by the Associated Press after the 2016 and 2018 elections, used a mathematical formula called the "efficiency gap" to measure partisan advantage in the elections. The statistical analysis is designed to detect cases in which one party may have won, widened or retained its grip on power through partisan gerrymandering, the process of drawing congressional and state legislative seats to favor the majority party.

To produce efficiency gap scores for each state, the AP obtained vote totals for all U.S. and state House elections and calculated the share of the votes received by Republicans and Democrats in each district, excluding votes cast for independent and third-party candidates. From those figures, the AP calculated the statewide average share of the vote that each party received in state legislative races, and then compared that figure to the party's share of the seats won in that state.
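
The two quantities described in this paragraph, a party's average two-party vote share across districts and its share of seats won, can be sketched in Python as follows. This is only an illustration of those inputs with made-up vote totals; it does not reproduce the AP's efficiency gap formula, which is defined in the documentation included with the data.

```python
from typing import List, Tuple


def two_party_summary(districts: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Summarize districts given as (democratic_votes, republican_votes).

    Votes for independent and third-party candidates are assumed to have
    been dropped already, so each share is a share of the two-party vote.
    Returns (average Democratic vote share, Democratic seat share).
    """
    shares = [dem / (dem + rep) for dem, rep in districts]
    avg_vote_share = sum(shares) / len(shares)
    seat_share = sum(1 for dem, rep in districts if dem > rep) / len(districts)
    return avg_vote_share, seat_share


# Made-up totals for a four-district state.
districts = [(60, 40), (45, 55), (48, 52), (47, 53)]
vote_share, seat_share = two_party_summary(districts)
```

In this invented example one party averages half of the two-party vote but wins only one of four seats, the kind of gap between votes and seats that the analysis is designed to flag.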

This data set includes four files:

  • State-level vote totals and efficiency gap calculations for the 2018 U.S. House elections. Seven states have only one U.S. House seat and aren't included. North Carolina's calculations are based on its 12 certified districts.
  • State-level vote totals and efficiency gap analysis for U.S. House races for the 2016 elections. Seven states have only one U.S. House seat and thus have no district data. Those states are Alaska, Delaware, Montana, North Dakota, South Dakota, Vermont and Wyoming.
  • Efficiency gap analysis on state legislature races for House or Assembly seats in 2018. Six states excluded from the data (see caveats): Louisiana, Mississippi, Nebraska, New Jersey, North Dakota and Virginia.
  • Efficiency gap analysis on state legislature races for House or Assembly seats in 2016. Eight states are excluded from the data (see caveats): Alabama, Louisiana, Maryland, Mississippi, Nebraska, New Jersey, North Dakota and Virginia.

Additional details about the methodology and findings from the Associated Press analysis are available in the documentation included with purchase, as well as in the sample download available on this page.

Police Officer and State Trooper Earnings (New Jersey)

Source
463 municipal police departments, New Jersey State Police
Date released
January 2022
Dates covered
2019
Rows
23,890
Methodology
How we tracked the pay of 24,000 cops
Topic
Criminal Justice
Provided in collaboration with
NJ Advance Media
Featured use
The Pay Check

The 2019 earnings of 21,000 local officers and 2,900 state troopers in New Jersey, including overtime, off-duty jobs and contractual perks.

Learn more about this dataset

This data set, created as part of a two-year investigation by NJ Advance Media for NJ.com, includes the 2019 earnings of 21,000 local officers and 2,900 state troopers in New Jersey, including overtime, off-duty jobs and contractual perks that can add tens of thousands of dollars to the average officer’s paycheck. It is a comprehensive look at police compensation that researchers said has few parallels in the nation.

The effort required more than 700 public records requests and hundreds of hours of data entry, since many municipalities provided only paper records that had to be broken down by hand. The resulting interactive database can be found at https://projects.nj.com/paycheck/. For more details on how we cleaned and analyzed the data, see our methodology.

This data is based on payroll records provided to the media outlet through public records requests to 463 municipal police departments and the New Jersey State Police. To establish uniformity in the salary figures for each officer, state pension data was overlaid onto those records. (See our methodology.)

NJ Advance Media has standardized officer names. The data covers earnings for 2019.

What’s Included

The master file (paycheck-data.csv) contains earnings figures of every local and state police officer by name and department in New Jersey. Descriptions of each column can be found in the data dictionary. Also included are pc_notes.docx, which has notes to the data; and pc_data_diary.xlsx, which defines the breakdown of pay at each department.

Police Use of Force Reports (New Jersey)

Source
New Jersey state and municipal police departments; NJ.com
Date released
November 2018
Dates covered
2012-2016
Rows
70,556
Topic
Criminal Justice
Provided in collaboration with
NJ Advance Media
Featured use
The Force Report

This research-ready data set includes data on more than 70,000 police use-of-force incidents across all 468 New Jersey municipal police departments and the New Jersey State Police between 2012 and 2016.

Learn more about this dataset

This research-ready data set includes data on police use-of-force incidents across all 468 New Jersey municipal police departments and the New Jersey State Police between 2012 and 2016, compiled by NJ Advance Media.

Police are required to fill out the forms and detail what happened when they punch, pepper spray or use other force against someone in the state. NJ Advance Media filed 506 public records requests and received more than 70,000 forms covering 2012 through 2016. The data was cleaned, analyzed and compiled by department and officer to create a searchable database of use-of-force incidents and a premium data download, available here.

The premium data download includes:

  • Police use-of-force data that is sortable and quantifiable by town, officer, incident date or type of force, standardized by commonly accepted categories of force.
  • A full accounting of every form, each of which has 40 to 60 data points with every detail an officer included about a specific incident. Among the data: date, type of force, race, age, incident type, injuries, reasons for force and charges.
  • Documentation outlining the methodology, data sources and a data dictionary.

For more information about the available licenses, please review our Terms of Use.

Prescriber Checkup (2011)

Source
Centers for Medicare & Medicaid Services, American Geriatrics Society, First Databank, ProPublica
Date released
January 2013
Dates covered
2011
Rows
Varies
Topic
Health
Featured use
Prescriber Checkup

Ready-to-use data on prescriptions made under Medicare's popular prescription-drug program, Medicare Part D, which serves 39 million people and pays for more than 1/4 of prescriptions written nationwide.

Learn more about this dataset

This dataset is ProPublica's premium version of the Medicare Part D data for 2011. Medicare’s popular prescription-drug program serves 39 million people and pays for more than one of every four prescriptions written nationwide. We've cleaned and analyzed Medicare Part D data from 2011 to offer a detailed look at individual providers' prescribing habits. There are five total files:

  • A main provider file that includes overall information for each provider, including name, location and summary prescription tallies. This includes all providers who wrote at least 50 prescriptions for at least one drug in 2011. All totals refer only to Part D.
  • A breakdown of the drugs prescribed by each provider.
  • A lookup file for drugs to determine if they are antipsychotics, benzodiazepines, Schedule 2/Schedule 3 or Beers drugs. We used information from the American Geriatrics Society to identify Beers drugs. We used information from CMS and First Databank to identify narcotic, benzodiazepine and antipsychotic drugs. CMS provided a list of antipsychotic medications, as well as Schedule 2/Schedule 3 drugs as they appeared on the DEA’s list in 2011.
  • A summary file of the top 500 drugs dispensed in Medicare Part D in 2011, based on the number of fills, including the respective fill rank and cost rank in each state.
  • Rankings for drugs within each specialty and state.

Optional add-ons include:

  • Custom runs by location, drug, or provider

Prescriber Checkup (2012)

Source
Centers for Medicare & Medicaid Services, ProPublica
Date released
January 2015
Dates covered
2012
Rows
Varies
Topic
Health

Ready-to-use data on $103.7 billion in prescriptions made under Medicare's popular prescription-drug program, Medicare Part D, which serves 39 million people and pays for more than 1/4 of prescriptions written nationwide.

Learn more about this dataset

This dataset is ProPublica's premium version of the Medicare Part D data for 2012. Medicare’s popular prescription-drug program serves 39 million people and pays for more than one of every four prescriptions written nationwide. We've cleaned and analyzed Medicare Part D data from 2012 to offer a detailed look at individual providers' prescribing habits. There are five total files:

  • A main provider file that includes overall information for each provider, including name, location and summary prescription tallies. This includes all providers who wrote at least 50 prescriptions for at least one drug in 2012. All totals refer only to Part D.
  • A breakdown of the drugs prescribed by each provider.
  • A lookup file for drugs to determine if they are antipsychotics, benzodiazepines, Schedule 2/Schedule 3 or Beers drugs. We used information from the American Geriatrics Society to identify Beers drugs. We used information from CMS and First Databank to identify narcotic, benzodiazepine and antipsychotic drugs. CMS provided a list of antipsychotic medications, as well as Schedule 2/Schedule 3 drugs as they appeared on the DEA’s list in 2012.
  • A summary file of the top 500 drugs dispensed in Medicare Part D in 2012, based on the number of fills, including the respective fill rank and cost rank in each state.
  • Rankings for drugs within each specialty and state.

Prescriber Checkup (2013)

Source
Centers for Medicare & Medicaid Services
Date released
June 2015
Dates covered
2013
Rows
Varies
Topic
Health

Ready-to-use data on $103.7 billion in prescriptions made under Medicare's popular prescription-drug program, Medicare Part D, which serves 39 million people and pays for more than 1/4 of prescriptions written nationwide.

Learn more about this dataset

This dataset is ProPublica's premium version of the Medicare Part D data for 2013. Medicare’s popular prescription-drug program serves 39 million people and pays for more than one of every four prescriptions written nationwide. We've cleaned and analyzed Medicare Part D data from 2013 to offer a detailed look at individual providers' prescribing habits. There are five total files:

  • A main provider file that includes overall information for each provider, including name, location and summary prescription tallies. This includes all providers who wrote at least 50 prescriptions for at least one drug in 2013. All totals refer only to Part D.
  • A breakdown of the drugs prescribed by each provider.
  • A lookup file for drugs to determine if they are antipsychotics, benzodiazepines, Schedule 2/Schedule 3 or Beers drugs. We used information from the American Geriatrics Society to identify Beers drugs. We used information from CMS and First Databank to identify narcotic, benzodiazepine and antipsychotic drugs. CMS provided a list of antipsychotic medications, as well as Schedule 2/Schedule 3 drugs as they appeared on the DEA’s list in 2013.
  • A summary file of all drugs dispensed in Medicare Part D in 2013, based on the number of fills, including the respective fill rank and cost rank in each state.
  • Rankings for drugs within each specialty and state.

Optional add-ons include:

  • Custom runs by location, drug, or provider ID
  • Open Payments/NPI number crosswalk for linking prescribing data to Dollars for Docs data

Prescriber Checkup (2014)

Source
Centers for Medicare and Medicaid Services
Date released
November 2016
Dates covered
2014
Rows
Varies
Topic
Health

Ready-to-use data on 1.4 billion prescriptions made under Medicare's popular prescription-drug program, Medicare Part D, which serves 41 million people and pays for more than 1/4 of prescriptions written nationwide.

Learn more about this dataset

This dataset is ProPublica's premium version of the Medicare Part D data for 2014. Medicare’s popular prescription-drug program serves 41 million people and pays for more than one of every four prescriptions written nationwide. We've cleaned and analyzed Medicare Part D data from 2014 to offer a detailed look at individual providers' prescribing habits. There are five total files:

  • A main provider file that includes overall information for each provider, including name, location and summary prescription tallies. This includes all providers who wrote at least 50 prescriptions for at least one drug in 2014. All totals refer only to Part D.
  • A breakdown of the drugs prescribed by each provider.
  • A lookup file for drugs to determine if they fall into specific categories, including antibiotics, Beers list drugs, and more.
  • A summary file of all drugs dispensed in Medicare Part D in 2014, based on the number of fills, including the respective fill rank and cost rank in each state.
  • Rankings for drugs within each specialty and state.

Optional add-ons include:

  • Custom runs by location, drug, or provider ID
  • Open Payments/NPI number crosswalk for linking prescribing data to Dollars for Docs data

Prescriber Checkup (2015)

Source
Centers for Medicare and Medicaid Services
Date released
November 2016
Dates covered
2015
Rows
Varies
Topic
Health

Ready-to-use data on 1.4 billion prescriptions made under Medicare's popular prescription-drug program, Medicare Part D, which serves 41 million people and pays for more than 1/4 of prescriptions written nationwide.

Learn more about this dataset

This dataset is ProPublica's premium version of the Medicare Part D data for 2015. Medicare’s popular prescription-drug program serves 41 million people and pays for more than one of every four prescriptions written nationwide. We've cleaned and analyzed Medicare Part D data from 2015 to offer a detailed look at individual providers' prescribing habits. There are five total files:

  • A main provider file that includes overall information for each provider, including name, location and summary prescription tallies. This includes all providers who wrote at least 50 prescriptions for at least one drug in 2015. All totals refer only to Part D.
  • A breakdown of the drugs prescribed by each provider.
  • A lookup file for drugs to determine if they fall into specific categories, including antibiotics, Beers list drugs, and more.
  • A summary file of all drugs dispensed in Medicare Part D in 2015, based on the number of fills, including the respective fill rank and cost rank in each state.
  • Rankings for drugs within each specialty and state.

Prescriber Checkup (Current Year)

Source
Centers for Medicare and Medicaid Services
Date released
August 2017
Dates covered
2011-2015, years sold separately
Rows
Varies
Topic
Health

Ready-to-use data on billions of prescriptions made under Medicare's popular prescription-drug program, Medicare Part D, which serves 42 million people and pays for more than 1/4 of prescriptions written nationwide. Updates annually.

Learn more about this dataset

This dataset is ProPublica's premium version of the Medicare Part D data. We use it to publish our online lookup tool, Prescriber Checkup.

In 2016, the latest year for which our data is available, Medicare’s popular prescription-drug program served 42 million people and paid for more than one of every four prescriptions written nationwide. We've cleaned and analyzed Medicare Part D data each year from 2011-2016 to offer a detailed look at individual providers' prescribing habits. This page includes a representative sample of the data and the option to purchase the data from 2016, which is the most current year of data available; individual previous years can be purchased separately.

Purchase of this data set includes five total files:

  • A main provider file that includes overall information for each provider, including name, location and summary prescription tallies. This includes all providers who wrote at least 50 prescriptions (in Medicare Part D) for at least one drug in the covered calendar year. About 447,000 providers met that criterion.
  • A breakdown of the drugs prescribed by each provider.
  • A lookup file for drugs to determine if they fall into specific categories, including antibiotics, Beers list drugs, and more.
  • A summary file of all drugs dispensed in Medicare Part D in the covered calendar year, based on the number of fills, including the respective fill rank and cost rank in each state.
  • Rankings for drugs within each specialty and state.
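
As a rough illustration of the inclusion rule for the main provider file, the sketch below keeps any provider who wrote at least 50 prescriptions for at least one drug. The row layout (provider ID, drug name, claim count) and the function name are assumptions made for this example; they are not the column names used in ProPublica's files.

```python
from typing import Iterable, Set, Tuple

# Threshold stated in the dataset description: at least 50 prescriptions
# for at least one drug in the covered calendar year.
MIN_CLAIMS_FOR_ONE_DRUG = 50


def providers_meeting_threshold(
    drug_rows: Iterable[Tuple[str, str, int]]
) -> Set[str]:
    """Return the IDs of providers with >= 50 claims for any single drug.

    Each row is a hypothetical (provider_id, drug_name, claim_count) tuple.
    """
    return {
        provider_id
        for provider_id, _drug_name, claim_count in drug_rows
        if claim_count >= MIN_CLAIMS_FOR_ONE_DRUG
    }


# Made-up sample rows, for illustration only.
sample_rows = [
    ("P1", "DRUG A", 49),
    ("P1", "DRUG B", 50),   # P1 qualifies through this drug
    ("P2", "DRUG A", 30),
    ("P2", "DRUG B", 30),   # P2 never reaches 50 for a single drug
]
included = providers_meeting_threshold(sample_rows)
```

Note that the rule is applied per drug: a provider with many small tallies spread across drugs would not qualify under this reading.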

To purchase previous years, browse the archived years.

Prescribing Patterns and Industry Payments for Top 50 Drugs

Source
Centers for Medicare and Medicaid Services
Date released
July 2018
Dates covered
2016
Rows
2,677,042
Methodology
How We Analyzed Doctors’ Pharma Industry Ties and Medicare Prescribing
Topics
  • Health
  • Business

A custom data set for evaluating the relationship between provider prescribing habits and industry payments, based on 2016 Medicare Part D and Open Payments data.

Learn more about this dataset

This analysis-ready data set includes information on health care providers who prescribed any of the top 50 most-prescribed or top 50 most-costly brand-name drugs in Medicare’s prescription drug program, known as Medicare Part D, in 2016. ProPublica linked this prescribing data with information on industry payments to doctors under the Open Payments program to generate more than 2.6 million doctor-drug combinations.

Each doctor-drug combination includes the provider’s NPI number and specialty, the drug name, whether it is a top 50 drug by prescribing volume or price, the number and value of Medicare Part D claims made for each drug, the total number and value of industry payments received (by type), and total claims filled by each provider’s patients for all drugs under Medicare Part D.

ProPublica’s analysis found that for almost all of the 50 most-prescribed brand-name drugs in Medicare’s prescription drug program in 2016, physicians who had an interaction with the manufacturer involving that drug prescribed the drug at higher rates than physicians who did not. We also found that among providers who had such interactions, the dollar value of those interactions was larger for physicians who prescribed the drug than for those who did not. (As an additional sensitivity check, we conducted the same analysis looking at the 50 most-costly brand-name drugs in Medicare’s prescription drug program.)

With the available observational data we are not able to say whether payments lead to prescribing that is counter to patients’ interests, but our analysis provides new insight into the dynamics between doctors’ industry interactions and their prescribing.

Code behind our analysis is available on GitHub.

For each drug, health care providers are included in the dataset if they appeared in Open Payments in relation to the drug and/or if there were 11 or more claims for the drug from the provider under Medicare Part D. (The Part D data released from CMS redacts prescribers with fewer than 11 claims for a drug.)
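
The inclusion rule above can be sketched as a simple predicate. This is a minimal illustration, not ProPublica's code (which is published separately); the function and parameter names are hypothetical, and a missing claim count is assumed to stand in for a prescriber redacted by CMS.

```python
from typing import Optional

# CMS redacts prescribers with fewer than 11 claims for a drug,
# so 11 is the smallest claim count that can appear in the Part D data.
MIN_PART_D_CLAIMS = 11


def include_provider(in_open_payments: bool, part_d_claims: Optional[int]) -> bool:
    """Return True if a provider-drug pair meets the stated inclusion rule.

    in_open_payments: the provider appears in Open Payments for this drug.
    part_d_claims: Part D claims for this drug, or None if redacted or absent.
    """
    has_enough_claims = (
        part_d_claims is not None and part_d_claims >= MIN_PART_D_CLAIMS
    )
    # "and/or" in the description: either condition alone is sufficient.
    return in_open_payments or has_enough_claims
```

Under this reading, a provider with an industry payment tied to a drug is included even when no Part D claims for that drug are visible.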

Open Payments data from ‘general payments’ are included. Research payments and ownership interests are excluded.

Providers whose NPI could not be determined are not included in the data, nor are providers such as nurse practitioners, who are not covered by Open Payments.

ProPublica's Bailout Data

Source
Treasury Department; SEC filings
Dates covered
2009-Present
Rows
Varies
Topic
Business
Featured use
The Bailout: By The Actual Numbers

A database of expenditures by the Treasury Department through both the TARP bill and the separate bailout of Fannie Mae and Freddie Mac.

Learn more about this dataset

This is ProPublica’s Bailout data. The dataset is primarily derived from reports issued by the Treasury Department. The sole exception is data for Fannie Mae and Freddie Mac, which is derived from quarterly and annual SEC filings by the two companies (follow the links to each company’s filings).

The database includes data on expenditures by the Treasury Department via both the broader $700 billion TARP bill (later reduced to $475 billion) and the separate bailout of Fannie Mae and Freddie Mac. The downloadable version of the data available for purchase includes summary data for each entity that received funds from the bailout programs, as well as detailed data on all transactions with the Treasury by the given entity – both disbursements to the entity and payments back to the Treasury.

A readme file, included with the sample, provides more detailed information about what data is available.

Recovery Tracker Data

Source
Recovery.gov, USAspending.gov
Date released
July 2012
Dates covered
February 2009 - June 2012
Rows
472,059
Methodology
How We Compiled and Analyzed Stimulus Spending
Topic
Business
Featured use
Recovery Tracker

A clean, research-ready database of how federal stimulus money was spent between February 2009 and June 2012.

Learn more about this dataset

This database combines records from the recipient-reported data on Recovery.gov and Recovery Act grants and loans reported by agencies on USAspending.gov. In cases where we found the same record reported in both data sets, we removed the duplicates. It’s possible we missed some duplicates because of differences in the records. We filled in missing information and corrected data entry errors when we found them and could verify the information.
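
To show why duplicates can slip through when records differ slightly, here is a minimal sketch of combining two sources and dropping records that match on a normalized key. The field names (`award_id`, `amount`) and the matching rule are assumptions for illustration only; the description does not say how ProPublica matched records.

```python
def normalize_key(record):
    """Build a comparison key that ignores case, stray spaces and number format.

    'award_id' and 'amount' are hypothetical field names.
    """
    award_id = str(record["award_id"]).strip().upper()
    amount = round(float(record["amount"]), 2)
    return (award_id, amount)


def combine_sources(recipient_reported, agency_reported):
    """Concatenate both sources, keeping the first copy of each matching record."""
    seen = set()
    combined = []
    for record in list(recipient_reported) + list(agency_reported):
        key = normalize_key(record)
        if key in seen:
            continue  # same record already taken from an earlier source
        seen.add(key)
        combined.append(record)
    return combined


# Made-up records: the first entry in each list describes the same award.
recovery_gov = [
    {"award_id": "A-1 ", "amount": "100.00"},
    {"award_id": "b-2", "amount": "50"},
]
usaspending = [
    {"award_id": "a-1", "amount": 100},
    {"award_id": "C-3", "amount": 75},
]
combined = combine_sources(recovery_gov, usaspending)
```

A pair of records that disagree on the amount or carry different identifiers would both survive this kind of matching, which is the limitation the description warns about.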

With our data set, you can query by federal agency, state, county, or recipient, as well as total cash amount. Data also includes payment descriptions. You also can track companies by DUNS number to see what grants, loans or contracts they received. A DUNS number is a unique nine-digit number used to identify a company or an organization. The numbers are issued by Dun & Bradstreet, which provides business information for credit and marketing. (Warning: Some companies have multiple DUNS numbers for different locations.)
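
A small sketch of tracking awards by DUNS number follows. It assumes each award is a dictionary with hypothetical `duns` and `amount` fields, and it checks only the nine-digit format described above. Because some companies hold several DUNS numbers, totals per DUNS number are not necessarily totals per company.

```python
import re
from collections import defaultdict

# A DUNS number is nine digits; hyphens are stripped before checking.
DUNS_PATTERN = re.compile(r"[0-9]{9}")


def is_valid_duns(value: str) -> bool:
    """Return True if the value is exactly nine ASCII digits."""
    return DUNS_PATTERN.fullmatch(value) is not None


def total_by_duns(awards):
    """Sum award amounts per DUNS number, skipping malformed identifiers."""
    totals = defaultdict(float)
    for award in awards:
        duns = award["duns"].replace("-", "").strip()
        if is_valid_duns(duns):
            totals[duns] += award["amount"]
    return dict(totals)


# Made-up awards for illustration.
sample_awards = [
    {"duns": "12-345-6789", "amount": 100},
    {"duns": "123456789", "amount": 50},
    {"duns": "12345", "amount": 10},  # too short, ignored
]
totals = total_by_duns(sample_awards)
```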

We’ve also edited the code for the Catalog of Federal Domestic Assistance (CFDA code). Contracts from Recovery.gov do not include a CFDA number, so we’ve generated one based on the awarding agency to help you figure out what sorts of projects were funded in your area.

School Segregation Data

Source
Department of Education (Ed Facts and Common Core of Data), Stanford Education Data Archive, State Education Agencies, Associated Press
Date released
December 2017
Dates covered
2000-2001 to 2014-2015 school years
Rows
Varies
Topic
Education
Provided in collaboration with
Associated Press

A cleaned, analysis-ready look at diversity, achievement and segregation in each school division and in each individual traditional and charter K-12 school in the country.

Learn more about this dataset

This dataset is a cleaned, analysis-ready look at diversity, achievement and segregation in each school division and in each individual traditional and charter K-12 school in the country.

The original data, obtained from the National Center for Education Statistics and ED Facts under the Department of Education, provide annual enrollment figures and proficiency measures at all U.S. public schools from the 2000-01 school year to the 2014-15 school year. The AP cleaned and enhanced this data with additional information, including using the charter school crosswalk from the Stanford Education Data Archive to identify the district ID of the school district that geographically contains a given charter school.

Additionally, The Associated Press created several measures to weigh a school’s demographic similarity to its district, its concentration of certain demographic groups of students, and its ‘Entropy’ – a measure of evenness of demographic categories. At the district level, the AP has provided a similarity score and created an ‘Entropy Index’ for each district, so reporters could identify districts that are more partitioned along racial lines.
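
For readers unfamiliar with entropy as a measure of evenness, the sketch below computes a standard Shannon entropy over enrollment counts by demographic group, scaled to run from 0 to 1. This is a generic textbook formulation offered as an assumption; the description does not give the AP's exact formula, and the function names are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of the group shares in `counts`."""
    total = sum(counts)
    if total == 0:
        return 0.0
    result = 0.0
    for count in counts:
        if count > 0:  # a group with no students contributes nothing
            share = count / total
            result -= share * math.log(share)
    return result


def normalized_entropy(counts):
    """Entropy divided by its maximum, log(number of groups).

    1.0 means every group is the same size; 0.0 means a single group
    accounts for all students.
    """
    groups = len(counts)
    if groups < 2:
        return 0.0
    return entropy(counts) / math.log(groups)


# Made-up enrollment counts for four demographic groups.
evenly_mixed = normalized_entropy([25, 25, 25, 25])
single_group = normalized_entropy([100, 0, 0, 0])
```

Comparing a school's value with its district's is one way such a measure could flag schools whose makeup differs sharply from their surroundings.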

These measurements can be used to reach conclusions such as “50 percent of black students in Milwaukee attend schools that are at least 90 percent black” and “Norland Middle School is one of the most dissimilar schools in the Dade County School District: over 95% of its students are black, compared to only 22% in the district as a whole.”

Using this dataset, the Associated Press reported that:

  • Nationwide, segregation metrics such as the exposure index show that school segregation has been returning to its Civil Rights-era levels.
  • The proportion of charter schools that are over 99% nonwhite nationwide is 17%, in contrast to only 4.5% of traditional schools.
  • While charter school quality is highly varied across districts, charter schools that are overwhelmingly minority lag behind both more integrated charter and traditional schools in the same district.

Download additional documentation and a sample of the data by completing the form on this page.

Small Business Loans

Source
U.S. Small Business Administration
Date released
January 2016
Dates covered
1990 - 2015
Rows
1,357,810
Topic
Business
Provided in collaboration with
Investigative Reporters and Editors

Information on the loans given to small business owners and franchises under the Small Business Administration's popular 7a program. Includes all data 1990-2015.

Learn more about this dataset

The SBA 7a business loans database contains information about loans guaranteed by the U.S. Small Business Administration under its main lending program, known as 7a. The data include loans approved by the SBA since 1990. Congress created the agency in 1953 to help entrepreneurs form or expand small enterprises.

The SBA's 7a program provides loans to small business owners who can't obtain financing through traditional channels. The program operates through private-sector lenders who provide loans that are, in turn, guaranteed by the SBA. The SBA 7a program itself has no funds for direct lending or grants.

The data contain information on the business getting the loan including address and industry code, the bank lending the money, the amount loaned, and (where applicable) whether the loan was paid in full or charged off.

This data set includes all records 1990 - 2015.

Surgeon Scorecard

Source
Centers for Medicare & Medicaid Services Inpatient Limited Data Set; ProPublica
Date released
September 2015
Dates covered
2009-2013
Rows
23,370
Methodology
How We Measured Surgical Complications
Topic
Health

ProPublica's Surgeon Scorecard provides a unique quality-of-care metric, based on an analysis of nearly 17,000 surgeons performing one of eight elective procedures in Medicare.

Learn more about this dataset

ProPublica's Surgeon Scorecard provides a unique quality-of-care metric, based on an analysis of nearly 17,000 surgeons performing one of eight elective procedures in Medicare. This dataset is the raw data behind our online lookup tool.

The data, which has been carefully adjusted for differences in patient health, age and hospital quality, reveals wide variations in complication rates for some of the most routine elective procedures. We focused on eight common elective surgeries: knee replacements, hip replacements, three types of spinal fusions (one in the neck and two in the lower back), gall bladder removals, prostate removals, and prostate resections.

Derived from billing records for in-patient hospital stays from 2009 through 2013, this data set includes:

  • Basic identifying information for each included surgeon, as well as his/her performance measures, including the Adjusted Complication Rate, by procedure.
  • Basic descriptive information for each hospital as it appears in Surgeon Scorecard.
  • A crosswalk that allows the matching of surgeons to hospitals.
  • A table of "low-volume" surgeons who performed at least one but fewer than 20 of a particular procedure in our data.

Read a longer, more technical methodology and its appendices.

Treatment Tracker Data

Source
ProPublica, Centers for Medicare and Medicaid Services, National Plan and Provider Enumeration System, American Medical Association
Dates covered
2013-2015 (years sold separately)
Methodology
Treatment Tracker Methodology
Topic
Health

This dataset provides details on payments made through Medicare Part B to individual doctors and other health professionals for services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery. Updates annually.

Learn more about this dataset

This dataset is ProPublica's version of Medicare Part B data. It includes information on payments to individual doctors and other health professionals serving more than 33 million seniors and disabled individuals in Medicare's Part B program. Part B covers services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery. We use it to publish our online lookup tool, Treatment Tracker. Data is available for 2013, 2014, and 2015.

Our cleaned, aggregated version of this data includes two tables:

  • Providers: a summary table of the total number of patients treated by each provider in Medicare's Part B program and the total amount paid, among other things; and
  • Treatments: for each provider who received payments under Part B, our treatment-level data includes the number of times a service was performed, the number of patients treated, and the number of unique patient visits during which each service was performed; as well as detailed information about allowed, total, and average payments for the services provided.

Please note that the data does not include Medicare Advantage plans, which are the health plans Medicare beneficiaries can choose in place of the traditional program. (These programs are more popular in some parts of the country than others.) Nor does it include services delivered to patients with other coverage, such as private health insurance or Medicaid.

The Medicare Part B data captures both medical services and drugs dispensed in a facility or a physician's office. Because doctors have to purchase drugs, and they can be expensive, CMS specifically denotes payments relating to drugs and those relating to medical services. Those distinctions are noted in both files.

Note: The sample provided here includes documentation for data from 2015 and sample data from 2014. Both are representative of the data from all three available years: 2013, 2014, and 2015.

Treatment Tracker Data (2013)

Source
ProPublica, Centers for Medicare and Medicaid Services, National Plan and Provider Enumeration System, American Medical Association
Dates covered
2013-2015 (years sold separately)
Methodology
Treatment Tracker Methodology
Topic
Health

This dataset provides details on payments made through Medicare Part B to individual doctors and other health professionals for services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery. Updates annually.

Learn more about this dataset

This dataset is ProPublica's version of Medicare Part B data. It includes information on payments to individual doctors and other health professionals serving more than 33 million seniors and disabled individuals in Medicare's Part B program. Part B covers services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery. We use it to publish our online lookup tool, Treatment Tracker.

Our cleaned, aggregated version of this data includes two tables:

  • Providers: a summary table of the total number of patients treated by each provider in Medicare's Part B program and the total amount paid, among other things; and
  • Treatments: for each provider who received payments under Part B, our treatment-level data includes the number of times a service was performed, the number of patients treated, and the number of unique patient visits during which each service was performed; as well as detailed information about allowed, total, and average payments for the services provided.

Please note that the data does not include Medicare Advantage plans, which are the health plans Medicare beneficiaries can choose in place of the traditional program. (These programs are more popular in some parts of the country than others.) Nor does it include services delivered to patients with other coverage, such as private health insurance or Medicaid.

The Medicare Part B data captures both medical services and drugs dispensed in a facility or a physician's office. Because doctors have to purchase drugs, and they can be expensive, CMS specifically denotes payments relating to drugs and those relating to medical services. Those distinctions are noted in both files. Data is available for 2013, 2014, and 2015.

Note: The sample provided here includes documentation for data from 2015 and sample data from 2014. Both are representative of the data from all three available years: 2013, 2014, and 2015. See all years available here.

Treatment Tracker Data (2014)

Source
ProPublica, Centers for Medicare and Medicaid Services, National Plan and Provider Enumeration System, American Medical Association
Dates covered
2013-2015 (years sold separately)
Methodology
Treatment Tracker Methodology
Topic
Health

This dataset provides details on payments made in 2014 through Medicare Part B to individual doctors and other health professionals for services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery.

Learn more about this dataset

This dataset is ProPublica's version of Medicare Part B data. It includes information on payments to individual doctors and other health professionals serving more than 33 million seniors and disabled individuals in Medicare's Part B program. Part B covers services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery. We use it to publish our online lookup tool, Treatment Tracker.

This is the data from 2014. Our cleaned, aggregated version of this data includes two tables:

  • Providers: a summary table of the total number of patients treated by each provider in Medicare's Part B program and the total amount paid, among other things; and
  • Treatments: for each provider who received payments under Part B, our treatment-level data includes the number of times a service was performed, the number of patients treated, and the number of unique patient visits during which each service was performed; as well as detailed information about allowed, total, and average payments for the services provided.

Please note that the data does not include Medicare Advantage plans, which are the health plans Medicare beneficiaries can choose in place of the traditional program. (These programs are more popular in some parts of the country than others.) Nor does it include services delivered to patients with other coverage, such as private health insurance or Medicaid.

The Medicare Part B data captures both medical services and drugs dispensed in a facility or a physician's office. Because doctors have to purchase drugs, and they can be expensive, CMS specifically denotes payments relating to drugs and those relating to medical services. Those distinctions are noted in both files.

Note: The sample provided here includes documentation for data from 2015 and sample data from 2014. Both are representative of the data from all three available years: 2013, 2014, and 2015. See all years available here.

Treatment Tracker Data (2015)

Source
ProPublica, Centers for Medicare and Medicaid Services, National Plan and Provider Enumeration System, American Medical Association
Dates covered
2013-2015 (years sold separately)
Methodology
Treatment Tracker Methodology
Topic
Health

This dataset provides details on payments made in 2015 through Medicare Part B to individual doctors and other health professionals for services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery.

Learn more about this dataset

This dataset is ProPublica's version of Medicare Part B data. It includes information on payments to individual doctors and other health professionals serving more than 33 million seniors and disabled individuals in Medicare's Part B program. Part B covers services as varied as office visits, ambulance mileage, lab tests, and the doctor’s fee for open-heart surgery. We use it to publish our online lookup tool, Treatment Tracker.

This is the data from 2015. Our cleaned, aggregated version of this data includes two tables:

  • Providers: a summary table of the total number of patients treated by each provider in Medicare's Part B program and the total amount paid, among other things; and
  • Treatments: for each provider who received payments under Part B, our treatment-level data includes the number of times a service was performed, the number of patients treated, and the number of unique patient visits during which each service was performed; as well as detailed information about allowed, total, and average payments for the services provided.

Please note that the data does not include Medicare Advantage plans, which are the health plans Medicare beneficiaries can choose in place of the traditional program. (These programs are more popular in some parts of the country than others.) Nor does it include services delivered to patients with other coverage, such as private health insurance or Medicaid.

The Medicare Part B data captures both medical services and drugs dispensed in a facility or a physician's office. Because doctors have to purchase drugs, and they can be expensive, CMS specifically denotes payments relating to drugs and those relating to medical services. Those distinctions are noted in both files.

Note: The sample provided here includes documentation for data from 2015 and sample data from 2014. Both are representative of the data from all three available years: 2013, 2014, and 2015. See all years available here.

U.S. Freeze Seasons, 1917-2017

Source
National Oceanic and Atmospheric Administration
Date released
October 2017
Dates covered
1917-2017
Topic
Environment
Provided in collaboration with
Associated Press
Featured use
Science Says: Jack Frost nipping at your nose ever later

The dates of the first fall freeze and last spring freeze for 700 locations across the United States.

Learn more about this dataset

Each year, the National Weather Service records the dates of the first fall season freeze and the last spring season freeze at hundreds of weather stations around the country. The Associated Press collected data from 700 of the most complete locations between 1917 and 2017 and combined it with additional geographic information about the location of each station. The AP also calculated the mean and median for each location and appended a NOAA-provided probability of first freeze dates, based on the period from 1981 to 2010.
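
One way to compute a mean and median freeze date for a station is to convert each date to its day of the year first. The sketch below does that with Python's standard library; it is an assumption about the general approach, not the AP's actual procedure, and the sample dates are made up.

```python
from datetime import date
from statistics import mean, median


def day_of_year(d: date) -> int:
    """Return 1 for January 1, 2 for January 2, and so on."""
    return d.timetuple().tm_yday


def summarize_freeze_dates(dates):
    """Return (mean, median) day-of-year for one station's freeze dates.

    Leap years shift dates after February by one day; this sketch does
    not correct for that.
    """
    days = [day_of_year(d) for d in dates]
    return mean(days), median(days)


# Made-up first fall freeze dates for a single hypothetical station.
first_freezes = [date(2015, 10, 10), date(2016, 10, 20), date(2017, 10, 15)]
mean_day, median_day = summarize_freeze_dates(first_freezes)
```

Working in day-of-year units makes it straightforward to state results such as "two weeks later than average" as a simple difference.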

The raw data was provided by the National Centers for Environmental Information at the National Oceanic and Atmospheric Administration (NOAA). The data shows that the date of the first freeze of the year has fallen further down the calendar over the last several decades — and last year extended that trend. Last year, the average first frost day at the included stations was two weeks later than average, while the last frost of spring was nine days earlier.