ProPublica Data Institute 2016

An intensive workshop on how to use data, design and code for journalism. From June 1st to 15th in New York City.

Our Materials

Here are all of the materials we used to teach the 2016 ProPublica Data Institute: slides, exercises, links, and homework. This is not an online course and doesn’t have all the context or instruction to be a standalone class. However, we’re working on creating one. Sign up here to be notified of any updates to these materials, as well as any announcements about future workshops.

Want to use our slides? Our teaching materials fall under the same Creative Commons license we use across our site. Get more details here.


Welcome: May 31, 2016

Welcome Reception & Install Party

Command Line Tools (Macs Only)
  • Open your Terminal app (comes with all Macs) and paste this exact command into the window and press enter:
    • xcode-select --install
Day 1: June 1, 2016

Intro to Data Journalism

Intro to Spreadsheets


Where to Find and Load Data

In-Class Demos
  • Using Socrata to look at 3-1-1 calls from NYC
  • Tabula
  • Types of data (numeric, text, date)
  • Quirks of Excel (reformatting dates, dropping leading zeros)
  • Text files and types (csv, tab, fixed width, pipe)
  • Text delimiter (probably quotes, but maybe not)
  • Open your text file in a reader and examine it
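
The delimiter and leading-zero points above can be sketched in Ruby, the language used later in the workshop. This is a minimal illustration, not the in-class demo: the pipe-delimited sample data and the column names are made up, and only Ruby's standard csv library is assumed.

```ruby
require "csv"

# Hypothetical pipe-delimited sample; both ZIP codes begin with a zero.
raw = <<~TEXT
  zip|city|complaints
  07030|Hoboken|12
  00501|Holtsville|3
TEXT

# col_sep names the field delimiter; quote_char names the text delimiter.
rows = CSV.parse(raw, headers: true, col_sep: "|", quote_char: '"')

# CSV keeps every field as a String unless converters are requested,
# so "07030" survives intact. Convert only the truly numeric column.
records = rows.map do |row|
  { zip: row["zip"], city: row["city"], complaints: Integer(row["complaints"]) }
end

records.each { |r| puts "#{r[:zip]} #{r[:city]} #{r[:complaints]}" }
```

Treating identifiers such as ZIP codes as text and converting only the columns you plan to do math on avoids the dropped-zero problem that spreadsheets introduce on import.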

Advanced Spreadsheets: String Functions


Advanced Spreadsheets: Pivot Tables

  • What are pivot tables
  • How to copy and move them


  • Find and review a few potential data sources for your individual project. Be prepared to discuss what's in those datasets tomorrow.
Day 2: June 2

Evaluating Data

  • Fairfax County Arrest data
  • Look for missing values
  • Look for out of range values
  • Look at all the values that appear in a column/row
  • Look for truncation
  • Pivot - look for similar names
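
The checks listed above can also be written as a few lines of Ruby. This is a rough sketch under stated assumptions: the records, field names and the 0 to 120 age range are invented for illustration and are not taken from the Fairfax County data.

```ruby
# Hypothetical arrest records; names, ages and charges are made up.
arrests = [
  { name: "SMITH, JOHN",  age: 34,  charge: "DWI" },
  { name: "SMITH, JOHN ", age: nil, charge: "DWI" },
  { name: "DOE, JANE",    age: 210, charge: "Larceny" },
  { name: "DOE, JANE",    age: 41,  charge: "larceny" }
]

# Missing values: rows where the age was never filled in.
missing_age = arrests.select { |r| r[:age].nil? }

# Out-of-range values: ages outside an assumed plausible range.
out_of_range = arrests.select { |r| r[:age] && !(0..120).cover?(r[:age]) }

# Every value that appears in a column, with counts (a pivot-style tally).
charge_counts = arrests.map { |r| r[:charge] }.tally

# Similar names: group by a normalized form to expose spelling variants.
name_variants = arrests
  .group_by { |r| r[:name].strip.upcase }
  .transform_values { |rows| rows.map { |r| r[:name] }.uniq }

puts "Missing age: #{missing_age.length}"
puts "Out of range: #{out_of_range.map { |r| r[:age] }.inspect}"
puts "Charges: #{charge_counts.inspect}"
puts "Name variants: #{name_variants.inspect}"
```

The tally shows "Larceny" and "larceny" as separate values, and the name grouping shows a trailing space hiding in one spelling of the same person, which are the kinds of problems a pivot table would surface.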

OpenRefine

In-Class Demos
  • Efficiently fix common data problems
  • Data clustering with OpenRefine

Best Practices

In-Class Demos
  • Create a text document
  • Save a clean copy of your data
  • Keep track of your work
  • Describe your steps
  • Copy/paste functions
  • Screen grabs of dialogue boxes

Analyzing Data: One Variable


Analyzing Data: Two Variables

  • Cartesian coordinate review
  • Correlation
  • Scatterplots
  • Percent Change
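
As a worked illustration of two of the topics above, here is a small Ruby sketch of percent change and the Pearson correlation coefficient. The method names and sample numbers are our own assumptions for the example; the class itself covered these ideas in spreadsheets.

```ruby
# Percent change: (new - old) / old * 100.
def percent_change(old_value, new_value)
  raise ArgumentError, "old value must not be zero" if old_value.zero?

  (new_value - old_value) / old_value.to_f * 100
end

# Pearson correlation between two equal-length lists of numbers.
# Returns NaN if either list has no variation at all.
def correlation(xs, ys)
  unless xs.length == ys.length && xs.length > 1
    raise ArgumentError, "need two lists of equal length"
  end

  mean_x = xs.sum / xs.length.to_f
  mean_y = ys.sum / ys.length.to_f
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x = Math.sqrt(xs.sum { |x| (x - mean_x)**2 })
  spread_y = Math.sqrt(ys.sum { |y| (y - mean_y)**2 })
  covariance / (spread_x * spread_y)
end

puts percent_change(200, 250)                     # 200 rising to 250
puts correlation([1, 2, 3, 4], [2, 4, 6, 8]).round(3)
```

Dividing by the old value, not the new one, is the detail most often gotten wrong: a rise from 200 to 250 is a 25 percent increase, while a fall from 250 to 200 is a 20 percent decrease.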

Analyzing Data: M&Ms

  • Sampling
  • Introduction to hypothesis testing


  • Pick a dataset that can answer the questions your story is focused on, or adjust your project so that your questions are answerable with the data you have.
Day 3: June 3

Percent Change


Intro to Code

In-Class Demos
  • What coding languages have you heard of?
  • Using the web inspector

How Websites Work

In-Class Demos
  • How the Internet passes websites around
  • What HTML, CSS and JavaScript contribute to a webpage
  • Drawing a Website

Basic HTML


In-Class Demos
  • How to create your first HTML file
  • Shortcut to the basic HTML template
  • How to use:
    • <h1>
    • <h2>
    • <h3>
    • <p>
    • <img>
    • <a>
    • <ul>
    • <!-- Comments -->
  • Copy and paste this code and follow the instructions inside to format the page.
  • Can you fix this broken code?

Basic CSS

In-Class Demos
  • How to create your first CSS file
  • Shortcut to linking to your CSS file
  • How CSS styles work
  • Using your practice HTML file from before, add CSS styles to it such that you change the:
    • color
    • font-family
    • font-size
  • On your own, look up how to do the following in CSS, and add it to your HTML file as well:
    • underline text
    • bold text
    • italicize text
  • Going back to the Supreme Court article you formatted earlier, do the following using CSS:
    • Make the main headline dark red.
    • Use the font family "Georgia" for the main headline and the subheadline.
    • Center the text of the main headline and the subheadline.
    • Give the paragraphs a line height of 19 pixels.
    • Remove the underline from the links.
    • Make the "Related articles" label all uppercase.
    • Bonus: Make an underline appear when you hover over a link.

CSS Classes

In-Class Demos
  • How to write your own CSS Class
  • How CSS deals with conflict


  • Save this HTML onto your computer. Link to a new CSS file that you create. Write CSS to make the end result look like this image. You may only write CSS. You cannot edit the HTML file.
  • Report out and research your individual project. Know what sources you'll need to talk to and clarifications you might need on datasets.
  • Using HTML and CSS, lay out a one-page web portfolio for yourself. Don't worry too much about the final design, just make sure to get all of your information on the page.
📚 Weekend! 🎉
Day 4: June 6

CSS Layout

Quick Review
  • Review of Day 3 using the Paper Code exercise
  • Mars Rover exercise solution.
In-Class Demos
  • Why the Internet is just a bunch of boxes
  • How the box model works
  • How floats work
  • How positioning works
  • Clearfix demo
  • How to really use the Web Inspector
  • Open the web portfolio you made over the weekend. Try using the box model and floats to improve your layout.
  • Save this code onto your computer. Using the box model, floats, and positioning, write CSS such that it looks like this image.

Intro to Design Principles



  • Using the principles we discussed today, redesign your résumé. Email the before and after version to [email protected].
Day 5: June 7

Type, Layout & Color

Homework Review
  • Let’s take a look at those résumés! (Group Critique)
  • Name that Font!
  • Type Crimes!


In-Class Demos
  • Easy ways to sketch, even if you don't think of yourself as an artist
  • Start sketching out a basic layout for your project

Intro to Data Visualization

In-Class Demos
  • Graphic Evaluations
  • Why Visualize Data
  • History


  • Start laying out your individual projects, using both the CSS and the design principles you've learned.
Day 6: June 8

Intro to Data Visualization, Continued


Git & GitHub

In-Class Demos
  • Creating a repository and making commits
  • Rolling back to an earlier commit
  • Cloning someone else's repository
  • Forking a repository to make your own edits
  • Editing files & making commits in the GitHub website
  • Publishing web pages on GitHub Pages, in a gh-pages branch


  • If your individual project will include a data visualization, begin creating sketches of what you want it to look like. Otherwise, keep making progress on your project.
Day 7: June 9

Command Line (& More Git)

In-Class Demos
  • Basics: commands, arguments, option flags
  • Moving / renaming / deleting files
  • "open ." / "explorer ." to "double-click" on something
  • How to get more information about a command
  • Git on the Command Line (cheat sheet)

Intro to Programming

GitHub Repo
  • Clone this repository, which has all the code and slides for your next day and a half.
In-Class Demos
  • Basics of programming (variables, types, conditionals, iteration)
  • Methods and classes in Ruby
  • Handling errors with exceptions
  • Debugging classes
  • Sample class about dogs barking
  • Writing and refactoring a class to keep track of students and teachers in a school
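
For readers without access to the class repository, here is a hedged sketch of what a small Ruby class with methods and exception handling might look like. It is our own illustration of the topics listed above, not the code from the repository; the Dog class, its method names and its error messages are all assumptions.

```ruby
# A tiny class: state lives in instance variables, behavior in methods.
class Dog
  attr_reader :name, :barks

  def initialize(name)
    raise ArgumentError, "a dog needs a name" if name.to_s.strip.empty?

    @name = name
    @barks = 0
  end

  # Returns the bark as a string and keeps a running count.
  def bark(times = 1)
    unless times.is_a?(Integer) && times.positive?
      raise ArgumentError, "times must be a positive whole number"
    end

    @barks += times
    ("Woof! " * times).strip
  end
end

rex = Dog.new("Rex")
puts rex.bark(2)

# Handling an error with begin/rescue instead of letting the program crash.
begin
  rex.bark(0)
rescue ArgumentError => e
  puts "Could not bark: #{e.message}"
end
```

Raising an exception for bad input and rescuing it where the method is called keeps the validation rule in one place, which also makes the class easier to debug.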


  • Write a class that makes a routine task easier (for example, a todo list)
  • Continue working on your individual projects
Day 8: June 10

Programming for Data Manipulation and Scraping

In-Class Demos
  • Data interchange with JSON
  • Reading/writing files
  • Adding functionality to Ruby with RubyGems
  • Getting data from the Internet with RestClient
  • Parsing APIs with Ruby
  • Writing CSV spreadsheets with Ruby
  • Read and write data in JSON and CSV
  • Consume a JSON or XML API and transform it to get the data you want
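
The JSON-to-CSV workflow above might look roughly like the following sketch. It assumes only Ruby's standard json and csv libraries. The JSON string stands in for a response you would normally fetch from an API, and its field names and values are hypothetical.

```ruby
require "json"
require "csv"

# Hypothetical API response; in practice this text would come from a request.
json_text = '{"results":[{"name":"Ada","city":"Denver","stories":12},' \
            '{"name":"Ben","city":"Juneau","stories":7}]}'

# Parse the JSON and keep only the fields we care about.
data = JSON.parse(json_text)
rows = data["results"].map { |r| [r["name"], r["city"], r["stories"]] }

# Write the rows out as CSV text, header row first.
csv_text = CSV.generate do |csv|
  csv << ["name", "city", "stories"]
  rows.each { |row| csv << row }
end
puts csv_text

# Read the CSV back into an array of hashes and emit it as JSON again.
parsed = CSV.parse(csv_text, headers: true).map(&:to_h)
puts JSON.pretty_generate(parsed)
```

To work with files instead of strings, `File.write("stories.csv", csv_text)` and `File.read("stories.csv")` can wrap the same steps; the file name is only an example. Note that values read back from CSV arrive as strings, so "12" needs converting before you do arithmetic on it.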


  • Continue working on your scraper
  • Continue working on your individual projects. When we come back after the weekend, it should be in pretty good shape.
📚 Weekend! 🎉
Day 9: June 13

Programming for Data Manipulation and Scraping (Cont.)

In-Class Demos
  • Dealing with well-formed XML
  • Parsing HTML with Nokogiri
  • Scrape a site with Nokogiri and store the data in JSON or CSV
Day 10: June 14

Basic JavaScript and jQuery


  • Work on your individual projects and write down exactly what you still need help with tomorrow.
  • Start thinking about your 5-minute final presentations for Wednesday

User Testing


  • Finish your individual projects
  • Prepare for your 5-minute presentations tomorrow
Day 11: June 15

Final Presentations

Going Forward

  • How to keep learning on your own
  • How to publish your projects

Meet the Class of 2016

We're thrilled to announce the 12 outstanding journalists who will be joining us for the ProPublica Data Institute this year.

Adrian Garcia (@adriandgarcia) recently joined a news startup in Denver backed by the founder of Business Insider. He previously covered Northern Colorado trends, startup news and developments as a growth and data reporter for the Fort Collins Coloradoan. Adrian joined the Coloradoan in 2014 after graduating from the University of Colorado Boulder. His previous experience includes reporting for the Denver Post, I-News at Rocky Mountain PBS and his hometown paper The Pueblo Chieftain.

Allison Ross (@allisonsross) covers Kentucky’s largest school district as an education reporter for the Courier-Journal in Louisville. Prior to coming to the Bluegrass State, she worked as a senior banking reporter at Bankrate and also spent six years as a reporter at the Palm Beach Post in Florida.

Anthony Martinez (@amartinez1208) is a production assistant at public radio station WBEZ in Chicago. His work has appeared on National Public Radio's Latino USA, State of the Re:Union (R.I.P.), and at the School of the Art Institute of Chicago. He's also a proud alumnus of the Vocalo Storytelling Workshop and member of the Association of Independents in Radio where he's been a New Voice scholar, New Voice Captain and Entrepreneurial Fellow. When not at the radio station, he's often behind the camera at public access television's Chic-A-Go-Go, “Chicago’s Dance Show for Kids of All Ages.”

Carol Angela Davis (@carolangelad), JD, is a media entrepreneur, journalist, professor and pioneer in online and mobile media with the distinction of being one of the first online video bloggers in the U.S. (August 2000). She holds an A.B. in Political Science from Bryn Mawr College and a J.D. from Case Western Reserve University School of Law, as well as certifications in Gamification from the University of Pennsylvania Wharton School, in business journalism from Arizona State University's Walter Cronkite School of Journalism and Mass Communication, where she was a Reynolds Fellow, and in Digital Journalism from the Dow Jones News Fund at Western Kentucky University. In 2016 she was selected to participate in the Scripps Howard Academic Leadership Academy, and Hampton University awarded her its prestigious Academic Excellence Award for teaching and innovation in the design and delivery of course content. She is the creator of ¡MPACT-Ed!™ – Impactful, Measurable, Personal and Collaborative Teaching for Engaging Education, a teaching methodology to increase student engagement in all subjects including STEM. She is married, the mother of two adult children and the author of children’s books Itty Bitty Kitty, The Fool from Bonderpool (He Wouldn’t Go to School!) and Teeth Have Feelings Too! So Brush Them!

Lakeidra Chavis (@lakeidrachavis) is a reporter for KTOO Public Radio in Juneau, Alaska. Her work focuses on in-depth critical reporting while on the daily grind, covering social justice issues and institutional accountability. Prior to KTOO, Chavis interned for National Public Radio’s Morning Edition. She also spent time working in the Alaska bush, covering nearly 60 indigenous villages that can only be reached by plane. She received a B.A. in Psychology from the University of Alaska Fairbanks in May 2015.

Lisa Song (@lisalsong) is a reporter at InsideClimate News, where she covers climate change, environmental health and natural gas drilling. She is co-author of the "Dilbit Disaster" series, which won the 2013 Pulitzer Prize for National Reporting, and worked on stories for "Exxon: the Road Not Taken," which was a finalist for the 2016 Pulitzer for Public Service. Lisa has degrees in environmental science and science writing from MIT. She is also a fan of cat videos.

Marissa Gaston (@mrmarissalea) is 20 years old and from Murfreesboro, Tennessee, where she studies journalism at Middle Tennessee State University. She likes to read and write about topics and issues pertaining to women and minorities. This summer, she'll be working on launching a personal blog that explores those topics, as well as taking a traveling writing course. Her other interests include music, pop culture, and fashion. In her spare time, she's watching SNL reruns, listening to records, or casually working movie quotes into conversation. :)

Marquita Brown (@mbrownNR) is an education reporter who loves digging through data and is eager to integrate more interactive and digital elements into her work. She currently lives and works in Greensboro, N.C. She has also worked at newspapers in Roanoke, Va., and Jackson, Miss., her hometown. She is a University of Mississippi graduate. In her reporting, she has examined issues such as opportunity gaps facing students who are immigrants or refugees and inequities in teacher quality, school funding and student discipline. That work has earned her a number of awards and fellowships, including several from the Mississippi Press Association and a 2014 fellowship from Renaissance Journalism.

Princess Ojiaku (@artfulaction) is a science writer interested in topics of neuroscience, policy, culture, and society. In her past life as a neuroscience graduate student, she examined the neurons of growing young zebrafish and combed through over 20 years of data on the emotional and social development of people tracked from birth to young adulthood. Master's degrees in Biology and Public Policy helped to kickstart her journey with data, as she's found that knowing how to work with data and present it helps to tell a story well. She is thrilled to join the ProPublica Summer Data Institute and bring these skills to her writing.

Tesalon Felicien (@TesalonF) is a journalist from the Caribbean island of St. Lucia. He currently works as a freelancer for The Greenville News in Greenville, South Carolina, where he covers high school sports. He previously worked as a freelancer for The Advocate and served as an editorial intern at He received his bachelor’s from LSU Manship School of Mass Communication in 2013, where he was part of the first web production team for the school's student newspaper, The Daily Reveille. His work has ranged from creating interactive visuals on college student enrollment statistics to covering high school soccer.

Taylor Tiamoyo Harris (@ladytiamoyo) is a 2016 honor graduate of Howard University originally from Dallas, Texas. While at Howard, Taylor served as Editor-in-Chief of her college newspaper, The Hilltop. She also interned at the Dallas Morning News, The Washington Post, the Public Affairs Office of the U.S. Army, the Investigative Reporting Workshop at American University and various other organizations and companies while a student at Howard. Taylor is a 2016 and 2015 Knight Fellow for the Investigative Reporters and Editors (IRE) Conference, and a 2015 HBCU Digital Media Fellowship recipient for the Online News Association (ONA). She has also served on the 2016 nominating committee for the National Association of Black Journalists (NABJ), and received scholarships from NABJ and the Dallas Fort Worth Association of Black Journalists as well. Harris enjoys covering sports, education, metro news and politics, and aspires to become a data and investigative reporter.

Wendi C. Thomas (@wendi_c_thomas) is an award-winning independent journalist and a 2016 fellow at the Nieman Foundation for Journalism at Harvard University. Her work focuses on economic and racial justice. She is a regular columnist for The Memphis Flyer and a senior writing fellow with the Center for Community Change. From 2003 to 2014, she was the metro columnist and assistant managing editor at The (Memphis) Commercial Appeal. Previously she was an editor or reporter at The Charlotte Observer, The Tennessean and The Indianapolis Star.

Code of Conduct

1. Purpose

ProPublica believes the Summer Data Institute should be truly open for everyone. As such, we are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, disability, ethnicity or religion.

This code of conduct outlines our expectations for participant behavior as well as the consequences for unacceptable behavior.

We expect all of our instructors and students to help us create a safe and positive workshop for everyone.

2. Expected Behavior

Be considerate, respectful, and collaborative.

Refrain from demeaning, discriminatory or harassing behavior and speech.

Be mindful of your surroundings and of your fellow participants. Alert the Summer Data Institute organizers if you notice a dangerous situation or someone in distress.

3. Unacceptable Behavior

Unacceptable behaviors include: intimidating, harassing, abusive, discriminatory, derogatory or demeaning conduct by anyone participating in the Summer Data Institute.

Harassment includes: offensive verbal comments related to gender, sexual orientation, race, religion, disability; inappropriate use of nudity and/or sexual images in public spaces (including presentation slides); deliberate intimidation, stalking or following; harassing photography or recording; sustained disruption of talks or other events; inappropriate physical contact, and unwelcome sexual attention.

4. Consequences of Unacceptable Behavior

Unacceptable behavior will not be tolerated whether by instructors, students or ProPublica staff.

Anyone asked to stop unacceptable behavior is expected to comply immediately.

If someone engages in unacceptable behavior, the Summer Data Institute organizers may take any action we deem appropriate, up to and including discontinuation of any stipends and expulsion from the Institute.

5. What to Do If You Witness or Are Subject to Unacceptable Behavior

If you are subject to unacceptable behavior, notice that someone else is being subject to unacceptable behavior, or have any other concerns, please notify a Summer Data Institute organizer as soon as possible.

The Summer Data Institute organizers will be available to help participants contact building security or local law enforcement, to provide escorts, or to otherwise assist those experiencing unacceptable behavior to feel safe for the duration of the Institute.

Special thanks to the Portland Tech Workshops for creating their Code of Conduct and licensing it under Creative Commons Attribution-ShareAlike.