Cleaner Data


Along with David Eads, I gave a talk at the Chicago School of Data Conference on cleaning data, based on my experience working with convictions and elections data. The session detail page is here and my slides are here. You might find my speaker notes helpful; they’re in the GitHub repo for the slide deck.

Image: Mud Volcanos creative commons licensed (BY-ND) flickr photo by “Caveman Chuck” Coker

“A/B testing” the impact of school closures using crime data

This weekend, the Knight Lab is sponsoring a hack day focusing on Chicago Crime Data as made available by the Tribune’s crime data portal and API.

I’m a little wary of crime data. First, crime data does not equal a resident’s experience of safety. It’s easy to think of situations where crimes go unreported, or where increased community cohesiveness might lead to an increase in crime reports. Second, the way Chicago residents frame and parse crime stats often seems alarmist, and it often further stresses racial and economic tensions in gentrifying communities rather than offering a space for increased community collaboration or developing progressive solutions to neighborhood safety.

Are there uses of crime data that contribute to a different civic discourse? One idea that came to me is based on this current moment, when Chicago Public Schools (CPS) is slated to close a number of schools. One issue raised by critics is the safety of students who may have to cross gang boundaries to reach their new “welcoming school”. CPS’ proposal to provide resources for students who must attend a new school after their school has been closed includes an expansion of the “Safe Passage” program, which partners with neighborhood organizations to help increase safety for students on the way to and from school. From my knowledge as a caregiver of CPS students and as a frequent news consumer, I don’t have much sense of how successful this program has been so far. After the closures happen, how will CPS and city residents know how school closures affected students on their way to and from school?

I hypothesize that we might be able to use crime data as one way to see changes in communities after schools have been closed.  I also think this is a general case of “how does crime change along with some policy event”.  I imagine a web platform where residents can define an “experiment” by looking at a specific geography, types of crime and time period.  Crime data would then be compared before and after the test time period to see how crime changed.
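To make the “experiment” idea a little more concrete, here is a minimal sketch in Python of the before/after comparison. Everything in it is an assumption for illustration: the `Crime` record, the `area` label, the function names and the sample records are made up, and it does not use the Tribune’s API. It only counts reports in equal-length windows on either side of an event date, so it says nothing about seasonality or reporting behavior.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Crime:
    """One crime report. The fields are hypothetical, not the portal's schema."""
    occurred: date
    crime_type: str
    area: str  # assumed geography label, e.g. the blocks around a school


def count_by_type(crimes, area, crime_types, start, end):
    """Count reports per crime type in `area` with start <= occurred < end."""
    return Counter(
        c.crime_type
        for c in crimes
        if c.area == area
        and c.crime_type in crime_types
        and start <= c.occurred < end
    )


def compare_before_after(crimes, area, crime_types, event_date, window_days):
    """Compare counts in equal windows before and after `event_date`."""
    window = timedelta(days=window_days)
    before = count_by_type(crimes, area, crime_types, event_date - window, event_date)
    after = count_by_type(crimes, area, crime_types, event_date, event_date + window)
    return {
        t: {"before": before[t], "after": after[t], "change": after[t] - before[t]}
        for t in sorted(crime_types)
    }


# Invented sample records, only to show the shape of an "experiment".
sample = [
    Crime(date(2013, 5, 20), "BATTERY", "Zone A"),
    Crime(date(2013, 6, 10), "BATTERY", "Zone A"),
    Crime(date(2013, 6, 15), "THEFT", "Zone A"),
    Crime(date(2013, 6, 20), "BATTERY", "Zone B"),
]

result = compare_before_after(
    sample, "Zone A", {"BATTERY", "THEFT"}, date(2013, 6, 1), window_days=30
)
for crime_type, counts in result.items():
    print(crime_type, counts)
```

The output is a per-type count of what changed, not a verdict on whether anything worked.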

In general, I think it’s important to frame these experiments as “what changed” instead of “did this work” because I think the crime data set probably isn’t enough on its own to determine


  • What kind of crimes would be indicators of school commute safety? Or, should we look at crimes from specific time periods before and after school?
  • What methods do sociologists use to do these kinds of comparisons?
  • Which schools/communities currently participate in the “Safe Passage” program?

Other use cases:

  • Neighborhood cleanups
  • Proposed city legislation targeting liquor stores
  • “Positive loitering”
  • Negative outcomes for heightened targeting of youth by police

Crowdsourced usage help and observations for data visualizations

This is my pitch for the Media Ideation Fellowship.

Project Description

This project would provide a platform for user-contributed usage instructions (“move the slider to the right to see the data for different years”) or observations (“wow, DC has so many charter schools”) for web and print-based data visualizations. The project would help make data visualizations and their insights more accessible and provide a body of feedback for developers and journalists to create more usable visualizations. Journalists and developers can integrate the platform into their visualizations through a public API for web applications, and through QR codes and short URLs for print.

What problem or issue are you trying to address with this project?

We live in a culture which increasingly fetishizes policy decisions that are “data-driven”.  From the future of publicly-funded education in Chicago to the disconnection of residents from city infrastructure in Detroit, data fills a prominent role in the discourse around issues that profoundly impact our lives. Certainly, data has always driven decision-making by policy makers and evaluation of proposals by the public, and it’s an important part of civic process. The danger is for data to take on a magical quality instead of being framed as a tool that can be used, and abused, in the service of civic problem solving. If publics are to leverage data, we must be empowered consumers of this information.


While researching the Chicago teachers strike, I came upon this data visualization about the growth in charter schools:

While a careful reading of the instructions could have told me that this was a map of the US and that moving the slider shows the change in the number of charter schools over time, I just wanted to dive in and was confused. Having more human instructions that say “Hey, this is a map of the US” or “move the slider” and observations like “look how charter schools grew in D.C.  That documentary Waiting for Superman talked a lot about that” seemed like something that should exist.