Cleaner Data

Along with David Eads, I gave a talk on cleaning data based on my experience working with convictions and elections data at the Chicago School of Data Conference. The session detail page is here and my slides are here. You might find my speaker notes helpful. They’re in the GitHub repo for the slide deck.

Image: Mud Volcanos creative commons licensed (BY-ND) flickr photo by “Caveman Chuck” Coker


New music for back to school

Prince (the other one)

I woke up this morning to the din of the first day of school. Even though I’ve been out of school for a while, the fall still has a sense of beginning to me, even as the summer comes to a close.

Out of curiosity, I looked up the current band of a friend I’ve fallen out of touch with and was instantly pulled in by what I heard. Have you ever been haunted by a song and found yourself playing and replaying it a dozen times to see if the lyrics match up with the music’s powerful first impression? That’s what happened to me when I heard “How ya been feelin'”, a track from a forthcoming 7″ from Austin’s Prince. It has such a great combination of energy and sadness. The lyrics are direct, but use imagery that evokes something more than the words. The other recordings on the band’s Bandcamp page are pretty great as well.

Una Bèstia Incontrolable and Iron Lung

Last week I saw Una Bèstia Incontrolable and Iron Lung play, and I enjoyed hearing two bands play interesting heavy music that’s still grounded in punk and hardcore idioms. Seeing heavy bands live can be hard because the way we engage with the music sometimes feels so predictable compared with the music itself.

I really enjoyed this Noisey interview with Iron Lung because it revealed some surprising influences that I hadn’t listened to very intently including Flipper’s “Generic Flipper” and Rudimentary Peni’s “Death Church”.

Dystopian Society

Some dear friends just moved to Florence, Italy and I was curious what was going on in local music. This death rock band is the first thing that came up when I googled “Florence Italy DIY punk”.


Tools for visualizing network graphs

Tools for extracting structured data from a PDF file

These are tools that have been suggested to me for extracting structured data from PDF files:


New music for early August

Richard The Third by Richard Album and The Singles

I play in a cover band with the drummer of the Singles and he had a great one-liner describing this band which was something like “theatric power pop”. This is their new record and they’re on tour now.

Black Rainbow

They played Chicago this week and it was one of the best performances I’ve seen in a while. Direct but not boring punk music.

Sorrows and In School

Chicago’s queer punk fest, Fed Up Fest, was a few weeks ago. It was really great. Good music, and a vibe that felt fun and purposeful. While I was excited to finally get to see Limp Wrist, even if they hadn’t played their unannounced set, I would have been satisfied to see some awesome bands that I had never heard of before. Two of my favorites were New York hardcore bands Sorrows and In School.


Lock straps

I love my Soma Porteur Rack and I love how it helps me take weight off of my body and onto my bike.

However, my small u-lock didn’t securely fit on the rack with a single bungee cord.

So, I sewed these straps out of 1 1/2″ velcro.

I’m interested in seeing if the velcro makes lock-ups more annoying and how the velcro holds up to getting wet and dirty.


The Che Cafe vs. profit-seeking models at public universities


The Che Cafe, a collectively run all-ages music venue in San Diego, is facing closure by university administrators. Luckily, a court has temporarily halted their eviction.

It’s a nice venue and I’ve played there with Defiance, Ohio a number of times. Long-running, all-ages, DIY spaces are important, but this paragraph from a press release about the court order connects the cafe with larger dynamics around the financialization of public higher education playing out in so many of our communities and lives.

Arguably, the real reason for the lease termination is economic. And this is why non-students and the broader community should care and join this push to preserve the venue, even if you have never attended or heard of it before. The University administration has shifted to decisions rooted in valuing revenue-generation and profit-seeking above all else. The Che Facility does not bring in windfall profits for the University. It stands in contrast to a Starbuck’s licensed cafe, or a parking lot where each space brings in hundreds of dollars, or even to a new science building that can house researchers securing grant dollars from which the University can take a sizeable cut. The social spaces the University seems to prefer are privately operated, profit-driven and not dedicated to providing practical educational opportunities, self development and creative expression and growth that more traditional spaces like the Che Cafe affords.


Fuzzy-matching strategies

This is a list of strategies for doing quick fuzzy matches that I’m summarizing from a thread that started on June 9, 2014 on the NICAR-L mailing list.

Fuzzy Lookup Excel Add-on

This add-on created by Microsoft can be downloaded here.

It reportedly runs into trouble when trying to match ~3000 records with another ~3000 records.

Increasing the threshold from its default to a higher value might provide better performance.
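
To make the idea of a match threshold concrete, here is a minimal Python sketch. It is not the add-on's algorithm: it assumes a simple character-based similarity score from the standard library's `difflib`, and the names `similarity` and `fuzzy_lookup` are made up for illustration. It compares every record on the left with every record on the right and keeps the best candidate only if its score clears the threshold.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0 and 1; 1 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(left, right, threshold=0.8):
    """Pair each name in `left` with its best match in `right`.

    Hypothetical helper: a pair is kept only when the best score is at
    least `threshold`, so a higher threshold returns fewer, stricter matches.
    """
    matches = []
    for name in left:
        scored = [(similarity(name, candidate), candidate) for candidate in right]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            matches.append((name, best, round(score, 2)))
    return matches


if __name__ == "__main__":
    print(fuzzy_lookup(["Jon Smith", "Acme Inc"], ["John Smith", "Jane Doe"]))
```

A naive pairwise comparison like this one does work proportional to the product of the two list sizes, so matching ~3000 records against ~3000 records means roughly 9 million comparisons.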

Reconcile CSV

Reconcile CSV is a project of Open Knowledge Labs that is described as

Reconcile-csv is a reconciliation service for OpenRefine running from a CSV file. It uses fuzzy matching to match entries in one dataset to entries in another dataset, helping to introduce unique IDs into the system – so they can be used to join your data painlessly.

MySQL’s Soundex() function
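
Soundex reduces a name to a short phonetic code so that similar-sounding names can be matched on equal codes. The sketch below is a Python version of the common four-character American Soundex, written only to illustrate the idea. It is an assumption on my part rather than a port of MySQL's function: MySQL's SOUNDEX() can return strings longer than four characters, so its output may differ from this.

```python
# Letter groups that share a Soundex digit. Vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a four-character American Soundex code (illustrative sketch)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate two letters that share a code.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    # Pad with zeros, then keep the first letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for name in ("Smith", "Smyth", "Robert", "Rupert"):
        print(name, soundex(name))
```

Two records are then candidate matches when their codes are equal, which is the same comparison you would make in SQL by comparing the Soundex values of two name columns.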

OpenRefine

Dan Nguyen provided this recipe for OpenRefine:

If you’re looking for non-Excel/database solutions…you can also do it by hand with OpenRefine.

  1. Combine both lists into one file with a single name column
  2. Import it into Refine
  3. Create a second column called “refined_name_key” that is a duplicate of the original name field
  4. Cluster and de-dupe using Refine’s text-clustering
  5. Export out (into something like a CSV)
  6. Import this table into your existing setup
  7. Join the name fields of the two original tables against the “refined_name_key”
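
The recipe above relies on Refine's interactive clustering, but the core idea can be sketched in a few lines of Python. This is only an approximation, assuming the "fingerprint" key-collision approach (lowercase, strip punctuation, sort and de-duplicate the words); the function names `fingerprint` and `match_by_key` are hypothetical and the sample names are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that variants of a name collide."""
    value = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort and de-duplicate the remaining words.
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def match_by_key(list_a, list_b):
    """Join two name lists on their fingerprint key (the 'refined_name_key')."""
    by_key = defaultdict(list)
    for name in list_b:
        by_key[fingerprint(name)].append(name)
    return [
        (name, by_key[fingerprint(name)])
        for name in list_a
        if fingerprint(name) in by_key
    ]


if __name__ == "__main__":
    a = ["Smith, John", "Ann Lee"]
    b = ["john smith", "JOHN  SMITH.", "Bob Ray"]
    print(match_by_key(a, b))
```

Unlike the similarity-score approaches, this only catches differences in case, punctuation, accents and word order; misspellings still need a fuzzier method or a manual pass in Refine.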

Paxata

http://www.paxata.com/


Rewriting URLs for static files using PHP’s built-in webserver

I don’t particularly like coding in PHP, but I do think WordPress works well for building websites for small organizations in certain use cases. PHP’s built-in webserver, which was added in recent versions of PHP, helps make PHP web development feel closer to my flow using other languages and frameworks. In particular, it removes the overhead and context switch of having to configure instances of a webserver like Apache or Nginx for local development.

One feature of the built-in webserver is that you can define a “router” script to segment out serving of static assets or to direct certain paths to a CMS’ main PHP file.

There are lots of examples of making a router script that will work for one’s particular environment. I used this one for WordPress, because it’s just what came up first in my Google search.

However, I ran into trouble when I was trying to develop locally on a multi-site WordPress instance that used path prefixes rather than subdomains to identify certain blogs. For instance, /blog-1/ would go to one blog while /blog-2/ would go to another. I needed to replicate the functionality of these Apache rewrite rules that would remove the blog prefix from the path:

RewriteRule  ^([_0-9a-zA-Z-]+/)?(wp-.*) $2 [L]
RewriteRule  ^([_0-9a-zA-Z-]+/)?(.*\.php)$ $2 [L]

The first rule caused the most problems since I needed to return a static file at a path different than the one reflected in the request URL. I found the answer in this example from the built-in server docs of handling unsupported file types.

To rewrite the path of a static file, you need to:

  • Use a regex to update the path.
  • Figure out the mime type of the file and set the appropriate header.
  • Read the contents of the file and return them.

My finished router.php looks like this:

<?php
$root = $_SERVER['DOCUMENT_ROOT'];
chdir($root);
$path = '/'.ltrim(parse_url($_SERVER['REQUEST_URI'])['path'],'/');

// Do some URL rewriting
if (preg_match('/\/([_0-9a-zA-Z-]+\/)?(wp-.*)/', $path, $matches)) {
  $path = '/' . $matches[2];
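  if (file_exists($root . $path) && strpos($path, ".php") === false) {
    // The rewritten path is to a non-PHP file.  It's probably a static asset
    // or theme asset.  Load the file and return it.
    // Use the full filesystem path when detecting the mime type, since $path
    // on its own begins with "/" and is relative to the document root.
    header("Content-Type: " . mime_content_type($root . $path));
    return readfile($root . $path);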
  }
}

if (preg_match('/\/([_0-9a-zA-Z-]+\/)?(.*\.php)$/', $path, $matches)) {
  // The path is to some PHP file.  Remove the leading blog prefix.
  // Logic below will load this PHP file.
  $path = '/' . $matches[2];
}

set_include_path(get_include_path().':'.__DIR__);
if (file_exists($root.$path)) {
  if (is_dir($root.$path) && substr($path,strlen($path) - 1, 1) !== '/')
    $path = rtrim($path,'/').'/index.php';
  if (strpos($path,'.php') === false)
    return false;
  else {
    chdir(dirname($root.$path));
    require_once $root.$path;
  }
} else include_once 'index.php';


My first Divvy ride

I was riding to a party at a friend’s house when I heard a clattering below me. I looked down and realized that three spokes had broken, likely casualties of a winter of moisture and corrosive salts and a springtime of crater-like potholes.

I needed to get around the West Side to run a few errands and get a new wheel, so I decided to try Chicago’s Divvy bike share service as there were a number of stations that popped up around my neighborhood.

I was originally skeptical about the service because the price tag ($7/24 hours, $75/year) seemed a bit steep compared with the price of the service that I had used in Milan over the summer. But, for the utility it gave me, $7 felt pretty fair. I think the price is still steep for avid cyclists who might be visiting the city for more than a few days, and I wish there was a weekly option available. Similarly, the yearly pass is a good deal, but you have to wait for the service to mail you a fob. It would be nice to be able to buy a yearly membership and use it right away.

The bike felt heavy and slow compared to my usual ride, but the thick tires and upright riding position felt good and made it easy to navigate the craggy streets. Even though I’m short, the seat heights were set extremely low on many of the bikes at the station (if not stuck, they’re adjustable with a quick-release skewer) and I wonder how many Divvy users are losing a lot of efficiency by not adjusting the seatpost. The tires were pretty well inflated and the disk brakes worked well. It struck me that for many riders, the Divvy bikes likely offer a more comfortable, fun and safe riding experience than the poorly constructed or badly maintained bike in their garage. In some ways, the yearly membership is a good option for someone who doesn’t want to worry about purchasing, maintaining and securing a decent bike. Hopefully, the service will do a good job of maintaining the bikes.

I took a quick look at the system map and found that there were stations really close to where I needed to go. The stations are pretty visible, so I had some sense of where they were from my usual routes around the city. My favorite thing about using the Divvy bike was the way it changed my usual patterns. Since I couldn’t go directly to my destination, I was forced to walk down blocks I wouldn’t usually visit. While the primary goal of a transit system should be to get people where they need to go efficiently, I like public transportation because it’s another way to experience the city. I appreciated the interruption to my routine that having to find and dock the bikes at the stations offered.

While I enjoyed my experience with the service, I am concerned about how the service and its expansion exist within broader development dynamics in Chicago. It’s a highly visible asset that makes the city more livable to some, while big, tough problems like public education, access to affordable mental health care and residential segregation persist. Meshing transportation is a need for many city residents, and Divvy could be a platform, just like other transit systems, that brings together Chicagoans from different parts of the city and different cultural and economic backgrounds on relatively equal terms. Unfortunately, there are a number of factors that mediate Divvy as a node in a social mesh for Chicago: having nearby stations, having a credit card, having Internet access to find information or sign up, having $75 to spend all at once, and having the physical ability or experience and comfort to ride in city traffic. If there are efforts to address some of these things, I’d love to know more. While these shortcomings aren’t a reason to completely hate on the service, I can’t think of the service without thinking about city priorities and who they privilege.