Miseducation: About Our Data
Details behind our database on racial inequality across America’s schools.
ProPublica’s Miseducation interactive reveals disparities in discipline and educational opportunities in more than 96,000 public schools and 17,000 districts in the United States.
Most of the data in our interactive comes from the Civil Rights Data Collection (CRDC), which is administered by the U.S. Department of Education’s Office for Civil Rights. The department collects data every two years from all schools and districts across the country on a range of topics from Advanced Placement enrollment to suspension rates. The most recent data release, and the one displayed in our interactive, covers the 2015-16 school year. All public schools and districts are required to report data to the department. Many of the fields are broken down by race and ethnicity, providing a snapshot of inequities across the nation’s schools. The CRDC data was used as the master list for all schools and districts to be included in the interactive.
Our interactive also includes data from the department’s Common Core of Data (CCD) from the 2015-16 school year. The maps in our interactive include geographic information from the National Center for Education Statistics’ Education Demographic and Geographic Estimates (EDGE) dataset from the 2015-16 school year.
For the homepage and district pages, we included data from the Stanford Education Data Archive (SEDA), which was compiled and analyzed by researchers from the Stanford Center for Education Policy Analysis, including Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores and Erin Fahle. The SEDA dataset, which consists of pooled test score data from the 2008-09 to 2014-15 school years, reveals the average difference in grade-level equivalence of students from different racial groups.
For many schools, districts and states, we developed measures to illustrate segregation as well as disparities in discipline and access to educational opportunities.
To illustrate gaps in opportunity, we calculated incidence rates, or risk ratios, showing the likelihood of students in each racial group to participate in AP courses or gifted programs.
For our ratios, we used data from the CRDC, which is disaggregated by race and ethnicity. The gifted and talented data tallies students who participate in accelerated programs. The AP measure consists of high school students who are enrolled in at least one advanced course that gives them college credit if they pass a standardized exam. The discipline measure reflects the number of students who receive out-of-school suspensions.
When broken down by race and ethnicity, many student groups are too small to calculate risk ratios that are statistically significant. If a ratio is not statistically significant, we cannot be confident in the direction of the effect. In an effort to reduce the uncertainty of our measures, we calculated 95 percent confidence intervals for each risk ratio and do not display figures where the results were not statistically significant.
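As a rough illustration of the significance check described above, the sketch below computes a risk ratio and a 95 percent confidence interval. It assumes the common log-scale normal approximation and a single reference group. The article does not say which interval method or comparison group was used, so the function names, the formula choice and the sample numbers are all illustrative assumptions.

```python
import math

def risk_ratio_ci(group_events, group_total, ref_events, ref_total, z=1.96):
    """Return (ratio, lower, upper), or None when the ratio is undefined.

    Assumption: log-scale normal approximation for the interval;
    z=1.96 corresponds to a 95 percent confidence level.
    """
    if min(group_events, ref_events) <= 0 or min(group_total, ref_total) <= 0:
        return None  # zero counts make the ratio or its standard error undefined
    ratio = (group_events / group_total) / (ref_events / ref_total)
    se_log = math.sqrt(
        1 / group_events - 1 / group_total + 1 / ref_events - 1 / ref_total
    )
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

def is_significant(lower, upper):
    """Treat a ratio as significant when its interval excludes 1."""
    return lower > 1.0 or upper < 1.0

# Hypothetical counts: 30 of 200 students suspended in one group,
# 20 of 400 in the reference group.
ratio, lower, upper = risk_ratio_ci(30, 200, 20, 400)
show_ratio = is_significant(lower, upper)
```

With very small groups the interval widens and usually straddles 1, which is the situation in which a figure would be withheld.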
For more than 2,500 districts, we calculated a dissimilarity index, a commonly used measure of segregation. This index shows how evenly students of different races are distributed across the schools in a district. We grouped districts into three categories (low, medium and high) to describe levels of racial imbalance in a district. We broke down the data into six groups using Jenks breaks, then pooled the two lowest, the two middle and the two highest groups to make the three categories. We only calculated indices for districts with more than five schools, more than 1,000 students and more than 100 students in each racial subgroup.
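The sketch below shows the standard two-group form of the dissimilarity index: half the sum, over schools, of the absolute difference between each school's share of one group and its share of the other. It also applies the eligibility thresholds listed above. The function names and the input layout are assumptions for illustration, and the sketch does not reproduce the Jenks-breaks grouping.

```python
def dissimilarity_index(schools):
    """schools: list of (group_a_count, group_b_count), one tuple per school.

    Returns a value from 0 (evenly distributed) to 1 (fully separated),
    or None when either group has no students in the district.
    """
    total_a = sum(a for a, _ in schools)
    total_b = sum(b for _, b in schools)
    if total_a == 0 or total_b == 0:
        return None
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in schools)

def meets_thresholds(schools, district_enrollment):
    """Apply the stated cutoffs: more than five schools, more than 1,000
    students, and more than 100 students in each compared subgroup."""
    total_a = sum(a for a, _ in schools)
    total_b = sum(b for _, b in schools)
    return (
        len(schools) > 5
        and district_enrollment > 1000
        and total_a > 100
        and total_b > 100
    )

# Hypothetical district: two schools, each enrolling only one group.
fully_separated = dissimilarity_index([(100, 0), (0, 100)])
```

A value can be read as the share of one group's students who would need to change schools for every school to mirror the district's overall composition.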
We also display an achievement gap measure from SEDA, which shows the average difference in grade-level equivalence between black and white students or Hispanic and white students. This measure is only available for districts where both student subgroups have at least 20 students. We display an Empirical Bayes estimate for the measure.
For each school, district and state, we display a handful of measures that illustrate a student’s access to educational opportunities. We show a racial breakdown of students who are enrolled in gifted and talented programs or in at least one AP course.
For each school, district and state, we show the percent of students who receive free or reduced-price lunch, which is frequently used by researchers as a proxy for student poverty and comes from the CCD dataset. We also show a school’s four-year adjusted-cohort graduation rate, which is the rate of students who graduate from high school within four years with a diploma. To protect the privacy of students, the department has released graduation rate ranges for some small schools and districts, instead of exact numbers.
We also show the percentage of high school students who are enrolled in several types of advanced coursework, including:
- AP participation: The percentage of high school students enrolled in at least one AP course.
- SAT/ACT participation: The percentage of high school students who take the SAT or ACT.
- Science: The percentage of high school students enrolled in college-preparatory physics, chemistry or biology classes.
- Mathematics: The percentage of high school students enrolled in college-preparatory geometry, calculus or other advanced mathematics courses, like trigonometry, precalculus or statistics. We also include the percentage of all students who take algebra in eighth grade.
We also show a number of measures of staffing and other resources, including:
- Students per teacher: The number of students per full-time teacher at a school.
- Inexperienced teachers: The percentage of full-time teachers who have less than two years of teaching experience.
- Chronically absent teachers: The percentage of full-time teachers who missed more than 10 days of school, not including administratively approved leave like professional development.
- Average number of AP courses: The average number of AP courses per high school.
- Support staff: The number of counselors, social workers and licensed psychologists in a school. For districts and states, we show the support staff per 1,000 students. This figure may not be a whole number because of part-time workers.
- Credit recovery programs: Whether a school has a program to help failing students recover credits.
- Dual enrollment programs: Whether a school has a program that allows students to take courses at colleges.
- International Baccalaureate programs: Whether a school provides students with the specialized, college-preparatory program.
Some school districts are under a court-imposed desegregation order or have a desegregation plan supervised by a state or federal agency. Our database shows which districts are subject to such an order or plan. School districts are required to designate specific staff members to coordinate compliance with civil rights laws, such as handling complaints related to race, sex or disability discrimination. We show the contact information for the designated civil rights coordinators for each district.
When a school, district or state is in the top or bottom 10 percent for an opportunity measure among other schools, districts or states, we note it.
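One way to flag the top and bottom 10 percent is to rank the entities that have a value for a measure and mark the lowest and highest tenth. The sketch below does this with a simple rank cutoff. The article does not describe how ties or missing values were handled, so those choices, along with the function and parameter names, are assumptions.

```python
def flag_extremes(values, share_percent=10):
    """Map each key to "top", "bottom" or None by rank among its peers.

    values: dict of entity name -> measure value (None when not available).
    Assumption: entities with no value are left out of the ranking, and
    ties are broken by name.
    """
    ranked = sorted((v, k) for k, v in values.items() if v is not None)
    n = len(ranked)
    cutoff = n * share_percent // 100  # integer math avoids float surprises
    flags = {k: None for k in values}
    for _, key in ranked[:cutoff]:
        flags[key] = "bottom"
    for _, key in ranked[n - cutoff:]:
        flags[key] = "top"
    return flags

# Hypothetical measure values for 20 schools.
rates = {f"school_{i:02d}": float(i) for i in range(1, 21)}
flags = flag_extremes(rates)
```

The same function can be applied to any measure, as long as entities are compared only with peers of the same type.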
For each school, district and state, we display figures that illustrate a student’s likelihood of being disciplined, including a racial breakdown of students who have been expelled or suspended out of school. We include other measures of school discipline, including:
- Out-of-school suspensions: The total number and percentage of students who have been temporarily removed from school at least once for disciplinary reasons.
- In-school suspensions: The total number and percentage of students who have been temporarily removed from a classroom, while still under the supervision of school personnel, at least once for disciplinary reasons.
- Expulsions: The total number of students who have been removed permanently from a school for disciplinary reasons.
- Total days missed to suspension: The number of days that all students missed due to one or more out-of-school suspensions over an entire school year.
- Average number of days of a suspension: The total number of days all students missed due to out-of-school suspensions in a school, district or state, divided by the number of instances of out-of-school suspension.
- Student referrals to law enforcement: The number of times a school staff member referred a student to any law enforcement agency, including school police units, for incidents that occur on school grounds or during school-related events, regardless of whether an action is taken by law enforcement. Referrals may include citations, tickets and arrests.
- Student arrests: The total number of school-related arrests of students for any offense that takes place on school grounds or during school-related events. All arrests are considered referrals to law enforcement.
- Transfers to alternative schools: The total number of students who have been transferred to a nontraditional school for disciplinary reasons.
- Security staff: The total number of security guards and sworn law enforcement officers, who have arrest authority, at a school. This figure may not be a whole number because of part-time workers. For districts and states, we show the security staff per 1,000 students. Because of a technical error in the original survey related to the question on sworn law enforcement officers, this measure is an undercount (see more below).
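Two of the measures in this list are simple ratios. The sketch below takes the average length of a suspension to mean days missed per suspension instance, and expresses staffing per 1,000 students. Function and argument names are illustrative assumptions.

```python
def average_suspension_days(total_days_missed, suspension_instances):
    """Days missed per out-of-school suspension instance."""
    if not suspension_instances:
        return None  # no suspensions, so no average to report
    return total_days_missed / suspension_instances

def staff_per_1000(staff_fte, enrollment):
    """Staff per 1,000 students; staff_fte may be fractional
    because part-time workers count as a fraction of a position."""
    if not enrollment:
        return None
    return 1000 * staff_fte / enrollment

# Hypothetical district: 120 days missed over 40 suspensions,
# and 2.5 security staff positions for 500 students.
avg_days = average_suspension_days(120, 40)
security_rate = staff_per_1000(2.5, 500)
```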
When a school, district or state is in the top or bottom 10 percent for a discipline measure among other schools, districts or states, we note it.
There may be errors in the CRDC, as with any self-reported data. Though districts are required to ensure the accuracy of their data, some may still report incorrect figures. The Office for Civil Rights attempts to identify and probe data anomalies and occasionally releases updates. We intend to update the data in our interactive shortly after the office releases updates or corrections.
Hawaii’s Department of Education incorrectly reported data on its gifted and talented participation, so we removed these measures for the state. For schools that did not answer a survey question, or if there was an error in the data collection process, we marked the data as “Not Available.”
Beyond reporting errors, the survey also had technical issues. The survey question related to sworn law enforcement officers was incorrectly displayed in the most recent data collection, causing more than 69,000 schools to skip this required question. We have shown the data from the schools that did report law enforcement officers as required and have indicated that this is a minimum, not exact, number of staff.
The CRDC has rounded some data values, or replaced them with a special code, to prevent identification of students. Sometimes, this type of privacy protection occurs in fields with small numbers. For example, the gifted and talented program data redacts counts of two or fewer students in a category. In these cases, we rounded to one. Additionally, for some variables, the CRDC rounds student counts in groups of three for privacy reasons. For example, student counts from four to six are rounded to five, and counts from seven to nine are rounded to eight. In these cases, the displayed counts may be a slight undercount or overcount.
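The rounding pattern in the examples above can be sketched as mapping each count to the middle value of its group of three, so a displayed count is off by at most one student. The code below generalizes from the two stated examples. How counts outside those ranges are treated is an assumption, and `SUPPRESSED` is a hypothetical stand-in for the CRDC's special code, whose actual value is not given here.

```python
SUPPRESSED = None  # hypothetical placeholder for the CRDC's special code

def round_to_group_of_three(count):
    """Round a count to the middle of its group of three.

    Matches the stated examples (4-6 -> 5, 7-9 -> 8). Applying the same
    rule to other ranges is an assumption of this sketch.
    """
    if count < 1:
        return 0
    return ((count - 1) // 3) * 3 + 2

def display_count(raw):
    """Value to show for a gifted and talented count that may be redacted."""
    if raw is SUPPRESSED:
        return 1  # redacted counts of two or fewer are shown as one
    return raw

rounded = [round_to_group_of_three(n) for n in range(4, 10)]
```

Because each rounded value is the midpoint of its group, totals built from many rounded counts can drift slightly in either direction.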
For our analysis, we merged several different data sources. Our two main sources were datasets from the federal civil rights office and Common Core. While most schools have a universal identifier that is used in both sources, a number of schools do not have matching identifiers. We made an effort to link the two different identifiers, using a crosswalk that we pulled from individual school pages on the federal site, but we weren’t able to match some schools, and those schools may have fewer data points than the rest. The federal data designates the majority of schools as either a primary, elementary or high school. This enabled us to create district measures that specifically reflect high school enrollment (for example, the number of high school students taking an AP course). However, there are a handful of schools for which the data did not note the school level. For these schools, the denominator for a number of our measures is total district enrollment, instead of high school enrollment across the district.
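The matching step described above can be sketched as a lookup that first tries the shared identifier and then falls back to a crosswalk. The data layout, field names and function name below are assumptions for illustration only, not the actual pipeline.

```python
def merge_sources(crdc_rows, ccd_rows, crosswalk):
    """Join CRDC and CCD records school by school.

    crdc_rows: dict of CRDC school id -> dict of CRDC fields
    ccd_rows:  dict of CCD school id  -> dict of CCD fields
    crosswalk: dict of CRDC school id -> CCD school id, used only when
               the two sources do not share an identifier
    Returns (merged, unmatched_ids). The CRDC is the master list, so every
    CRDC school is kept even when no CCD record is found.
    """
    merged = {}
    unmatched = []
    for crdc_id, crdc_fields in crdc_rows.items():
        ccd_id = crdc_id if crdc_id in ccd_rows else crosswalk.get(crdc_id)
        ccd_fields = ccd_rows.get(ccd_id)
        if ccd_fields is None:
            unmatched.append(crdc_id)  # school will have fewer data points
            ccd_fields = {}
        merged[crdc_id] = {**ccd_fields, **crdc_fields}
    return merged, unmatched

# Hypothetical records: one shared id, one crosswalked id, one unmatched.
crdc = {"A1": {"ap_enrolled": 40}, "B2": {"ap_enrolled": 5}, "C3": {"ap_enrolled": 0}}
ccd = {"A1": {"frl_pct": 55.0}, "X9": {"frl_pct": 20.0}}
merged, unmatched = merge_sources(crdc, ccd, {"B2": "X9"})
```

Keeping the list of unmatched identifiers makes it easy to see which schools will show fewer data points.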
School addresses were sometimes missing in the 2015-16 school year data. For those schools, we used address data from the 2016-17 and 2017-18 school years. For the free and reduced-price lunch measure, Massachusetts did not report data for the 2015-16 school year, so we used figures from the 2014-15 school year.
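Filling a gap from another school year can be sketched as trying a list of years in order of preference and recording which year supplied the value. The order in which alternate years were tried is not stated above, so the order used here, like the function name and sample data, is an assumption.

```python
def first_available(values_by_year, preferred_years):
    """Return (value, year) for the first year with a non-empty value,
    or (None, None) when no listed year has one."""
    for year in preferred_years:
        value = values_by_year.get(year)
        if value:
            return value, year
    return None, None

# Hypothetical school whose 2015-16 address is blank.
addresses = {"2015-16": "", "2016-17": "1 Main St", "2017-18": "1 Main St"}
address, source_year = first_available(
    addresses, ["2015-16", "2016-17", "2017-18"]
)
```

Returning the source year alongside the value keeps it clear when a displayed figure comes from a different school year.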
Have tips? Find errors? Let us know.
Are you looking at our database and have some information you want to share about local disparities? Or are you finding data inaccuracies? Either way, we want to hear your thoughts. Please keep in mind that the data we reported is from the 2015-16 school year and comes from the U.S. Department of Education. Please send an email to [email protected].