Tools for visualizing network graphs

Tools for extracting structured data from a PDF file

These are tools that have been suggested to me for extracting structured data from PDF files:


New music for early August

Richard The Third by Richard Album and The Singles

I play in a cover band with the drummer of the Singles and he had a great one-liner describing this band which was something like “theatric power pop”. This is their new record and they’re on tour now.

Black Rainbow

They played Chicago this week and it was one of the best performances I’ve seen in a while. Direct but not boring punk music.

Sorrows and In School

Chicago’s queer punk fest, Fed Up Fest, was a few weeks ago. It was really great. Good music, and a vibe that felt fun and purposeful. While I was excited to finally get to see Limp Wrist, even if they hadn’t played their unannounced set, I would have been satisfied to see some awesome bands that I had never heard of before. Two of my favorites were New York hardcore bands Sorrows and In School.


Lock straps

I love my Soma Porteur Rack and I love how it helps me take weight off of my body and onto my bike.

However, my small u-lock didn’t securely fit on the rack with a single bungee cord.

So, I sewed these straps out of 1 1/2″ velcro.

I’m interested in seeing if the velcro makes lock-ups more annoying and how the velcro holds up to getting wet and dirty.


The Che Cafe vs. profit-seeking models at public universities


The Che Cafe, a collectively run all-ages music venue in San Diego, is facing closure by university administrators. Luckily, a court has temporarily halted their eviction.

It’s a nice venue and I’ve played there with Defiance, Ohio a number of times. Long-running, all-ages, DIY spaces are important, but this paragraph from a press release about the court order connects the cafe with larger dynamics around the financialization of public higher education playing out in so many of our communities and lives.

Arguably, the real reason for the lease termination is economic. And this is why non-students and the broader community should care and join this push to preserve the venue, even if you have never attended or heard of it before. The University administration has shifted to decisions rooted in valuing revenue-generation and profit-seeking above all else. The Che Facility does not bring in windfall profits for the University. It stands in contrast to a Starbuck’s licensed cafe, or a parking lot where each space brings in hundreds of dollars, or even to a new science building that can house researchers securing grant dollars from which the University can take a sizeable cut. The social spaces the University seems to prefer are privately operated, profit-driven and not dedicated to providing practical educational opportunities, self development and creative expression and growth that more traditional spaces like the Che Cafe affords.


Fuzzy-matching strategies

This is a list of strategies for doing quick fuzzy matches that I’m summarizing from a thread that started on June 9, 2014 on the NICAR-L mailing list.

Fuzzy Lookup Excel Add-on

This add-on created by Microsoft can be downloaded here.

It reportedly runs into trouble when trying to match ~3000 records with another ~3000 records.

Increasing the threshold from its default to a higher value might provide better performance.
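
To make the idea of a match threshold concrete, here is a minimal sketch in PHP. It is not how the Fuzzy Lookup add-on works internally; it just uses PHP's similar_text() to score pairs and drops anything below a cutoff. The fuzzy_match() helper and the sample names are made up for illustration.

```php
<?php
// Hypothetical helper: for each name in $left, return the best match in $right
// whose similarity (0.0 to 1.0) meets $threshold, or null when nothing qualifies.
function fuzzy_match(array $left, array $right, float $threshold): array
{
    $matches = [];
    foreach ($left as $name) {
        $best = null;
        $bestScore = 0.0;
        foreach ($right as $candidate) {
            // similar_text() reports the similarity as a percentage by reference.
            similar_text(strtolower($name), strtolower($candidate), $percent);
            $score = $percent / 100;
            if ($score >= $threshold && $score > $bestScore) {
                $best = $candidate;
                $bestScore = $score;
            }
        }
        $matches[$name] = $best;
    }
    return $matches;
}

// Made-up sample data.
$left = ['Jon Smith', 'Acme Corp.'];
$right = ['John Smith', 'ACME Corporation', 'Jane Smythe'];

print_r(fuzzy_match($left, $right, 0.6)); // looser threshold
print_r(fuzzy_match($left, $right, 0.9)); // stricter threshold
```

With these sample names, the looser threshold pairs “Acme Corp.” with “ACME Corporation”, while the stricter one leaves it unmatched. A higher threshold means fewer candidate pairs to review, at the cost of missing looser matches.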

Reconcile CSV

Reconcile CSV is a project of Open Knowledge Labs that is described as:

Reconcile-csv is a reconciliation service for OpenRefine running from a CSV file. It uses fuzzy matching to match entries in one dataset to entries in another dataset, helping to introduce unique IDs into the system – so they can be used to join your data painlessly.

MySQL’s Soundex() function
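
Soundex reduces a name to a short phonetic key, so names that sound alike can be joined on the key instead of the raw spelling. As a rough sketch of the idea, this uses PHP's soundex() rather than MySQL's SOUNDEX(); the two implementations can return different keys for the same input, so treat it as an illustration only. The group_by_soundex() helper and the sample names are hypothetical.

```php
<?php
// Hypothetical helper: group names by their Soundex key so that
// similar-sounding spellings end up in the same bucket.
function group_by_soundex(array $names): array
{
    $groups = [];
    foreach ($names as $name) {
        $groups[soundex($name)][] = $name;
    }
    return $groups;
}

// Made-up sample data.
$groups = group_by_soundex(['Robert', 'Rupert', 'Rubin', 'Smith', 'Smyth']);
print_r($groups);
```

Names that share a bucket are candidate matches. Soundex is tuned for English surnames, so the buckets still need a manual look.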

OpenRefine

Dan Nguyen provided this recipe for OpenRefine:

If you’re looking for non-Excel/database solutions…you can also do it by hand with OpenRefine.

  1. Combine both lists into one file with a single name column
  2. Import it into Refine
  3. Create a second column called “refined_name_key” that is a duplicate of the original name field
  4. Cluster and de-dupe using Refine’s text-clustering
  5. Export out (into something like a CSV)
  6. Import this table into your existing setup
  7. Join the name fields of the two original tables against the “refined_name_key”
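
As I understand it, Refine's default clustering method is key collision on a “fingerprint” of each value. The steps above can be approximated outside Refine with a sketch like this one. The fingerprint() and cluster_names() helpers are hypothetical and simpler than Refine's real keying function (for example, this version drops non-ASCII characters instead of normalizing them). The fingerprint plays the role of the “refined_name_key” column.

```php
<?php
// Approximate fingerprint key: lowercase, strip punctuation, then sort and
// de-duplicate the whitespace-separated tokens.
function fingerprint(string $name): string
{
    $clean = preg_replace('/[^a-z0-9\s]/', '', strtolower(trim($name)));
    $tokens = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    $tokens = array_unique($tokens);
    sort($tokens);
    return implode(' ', $tokens);
}

// Combine both lists and cluster the names that share a key.
function cluster_names(array $listA, array $listB): array
{
    $clusters = [];
    foreach (array_merge($listA, $listB) as $name) {
        $clusters[fingerprint($name)][] = $name;
    }
    return $clusters;
}

// Made-up sample data.
$listA = ['Smith, John', 'ACME Corp.'];
$listB = ['John Smith', 'Acme Corp'];
print_r(cluster_names($listA, $listB));
```

Each array key is a candidate join key. Clusters with more than one member are the fuzzy matches worth reviewing by hand.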

Paxata

http://www.paxata.com/


Rewriting URLs for static files using PHP’s built-in webserver

I don’t particularly like coding in PHP, but I do think WordPress works well for building websites for small organizations in certain use cases. PHP’s built-in webserver, which was added in PHP 5.4, helps make PHP web development feel closer to my flow using other languages and frameworks. In particular, it removes the overhead and context switch of having to configure instances of a webserver like Apache or Nginx for local development.

One feature of the built-in webserver is that you can define a “router” script to segment out serving of static assets or to direct certain paths to a CMS’ main PHP file.

There are lots of examples of making a router script that will work for one’s particular environment. I used this one for WordPress, because it’s just what came up first in my Google search.

However, I ran into trouble when I was trying to develop locally on a multi-site WordPress instance that used path prefixes rather than subdomains to identify certain blogs. For instance, /blog-1/ would go to one blog while /blog-2/ would go to another. I needed to replicate the functionality of these Apache rewrite rules that would remove the blog prefix from the path:

RewriteRule  ^([_0-9a-zA-Z-]+/)?(wp-.*) $2 [L]
RewriteRule  ^([_0-9a-zA-Z-]+/)?(.*\.php)$ $2 [L]
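
To show what these two rules do to a request path, here is a rough PHP sketch. The strip_blog_prefix() helper and the sample paths are hypothetical, and the sketch only mimics the pattern matching, not everything Apache does when it processes rewrites. Note that the dot before "php" is escaped here.

```php
<?php
// Hypothetical helper mirroring the two rewrite rules: strip an optional
// one-segment blog prefix from wp-* paths and from paths ending in .php.
function strip_blog_prefix(string $path): string
{
    // Per-directory rewrite rules see the path without its leading slash.
    $relative = ltrim($path, '/');
    $rules = [
        '#^([_0-9a-zA-Z-]+/)?(wp-.*)#',
        '#^([_0-9a-zA-Z-]+/)?(.*\.php)$#',
    ];
    foreach ($rules as $rule) {
        if (preg_match($rule, $relative, $matches)) {
            // Like the [L] flag, stop at the first rule that matches.
            return '/' . $matches[2];
        }
    }
    // No rule matched, so the path is left alone.
    return $path;
}

echo strip_blog_prefix('/blog-1/wp-content/themes/x/style.css'), "\n";
echo strip_blog_prefix('/blog-2/xmlrpc.php'), "\n";
echo strip_blog_prefix('/about/'), "\n";
```

The first two sample paths lose their blog prefix, while the third matches neither rule and passes through unchanged.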

The first rule caused the most problems since I needed to return a static file at a path different than the one reflected in the request URL. I found the answer in this example from the built-in server docs of handling unsupported file types.

To rewrite the path of a static file, you need to:

  • Use a regex to update the path.
  • Figure out the mime type of the file and set the appropriate header.
  • Read the contents of the file and return them.

My finished router.php looks like this:

<?php
// Router script for PHP's built-in webserver.
$root = $_SERVER['DOCUMENT_ROOT'];
chdir($root);
$path = '/'.ltrim(parse_url($_SERVER['REQUEST_URI'])['path'],'/');

// Do some URL rewriting
if (preg_match('/\/([_0-9a-zA-Z-]+\/)?(wp-.*)/', $path, $matches)) {
  $path = '/' . $matches[2];
  // is_file() skips directories such as /wp-admin/, which can't be read
  // as a static file.
  if (is_file($root . $path) && strpos($path, ".php") === false) {
    // The rewritten path is to a non-PHP file.  It's probably a static asset
    // or theme asset.  Load the file and return it.
    // Look up the MIME type using the full filesystem path.
    header("Content-Type: " . mime_content_type($root . $path));
    return readfile($root . $path);
  }
}

if (preg_match('/\/([_0-9a-zA-Z-]+\/)?(.*\.php)$/', $path, $matches)) {
  // The path is to some PHP file.  Remove the leading blog prefix.
  // Logic below will load this PHP file.
  $path = '/' . $matches[2];
}

// PATH_SEPARATOR is ':' on Unix-like systems and ';' on Windows.
set_include_path(get_include_path().PATH_SEPARATOR.__DIR__);
if (file_exists($root.$path)) {
  if (is_dir($root.$path) && substr($path,strlen($path) - 1, 1) !== '/')
    $path = rtrim($path,'/').'/index.php';
  if (strpos($path,'.php') === false)
    return false;
  else {
    chdir(dirname($root.$path));
    require_once $root.$path;
  }
} else include_once 'index.php';


My first Divvy ride

I was riding to a party at a friend’s house when I heard a clattering below me. I looked down and realized that three spokes had broken, likely casualties of a winter of moisture and corrosive salts and a springtime of crater-like potholes.

I needed to get around the West Side to run a few errands and get a new wheel, so I decided to try Chicago’s Divvy bike share service, as a number of stations had popped up around my neighborhood.

I was originally skeptical about the service because the price tag ($7/24 hours, $75/year) seemed a bit steep compared with the price of the service that I had used in Milan over the summer. But, for the utility it gave me, $7 felt pretty fair. I think the price is still steep for avid cyclists who might be visiting the city for more than a few days, and I wish there was a weekly option available. Similarly, the yearly pass is a good deal, but you have to wait for the service to mail you a fob. It would be nice to be able to buy a yearly membership and use it right away.

The bike felt heavy and slow compared to my usual ride, but the thick tires and upright riding position felt good and made it easy to navigate the craggy streets. Even though I’m short, the seat heights were set extremely low on many of the bikes at the station (if not stuck, they’re adjustable with a quick-release skewer) and I wonder how many Divvy users are losing a lot of efficiency by not adjusting the seatpost. The tires were pretty well inflated and the disk brakes worked well. It struck me that for many riders, the Divvy bikes likely offer a more comfortable, fun and safe riding experience than the poorly constructed or badly maintained bike in their garage. In some ways, the yearly membership is a good option for someone who doesn’t want to worry about purchasing, maintaining and securing a decent bike. Hopefully, the service will do a good job of maintaining the bikes.

I took a quick look at the system map and found that there were stations really close to where I needed to go. The stations are pretty visible, so I had some sense of where they were from my usual routes around the city. My favorite thing about using the Divvy bike was the way it changed my usual patterns. Since I couldn’t go directly to my destination, I was forced to walk down blocks I wouldn’t usually visit. While the primary goal of a transit system should be to get people where they need to go efficiently, I like public transportation because it’s another way to experience the city. I appreciated the interruption to my routine that having to find and dock the bikes at the stations offered.

While I enjoyed my experience with the service, I am concerned about how the service and its expansion exist within broader development dynamics in Chicago. It’s a highly visible asset that makes the city more livable to some, while big, tough problems like public education, access to affordable mental health care and residential segregation continue to persist. Meshing transportation is a need for many city residents, and Divvy could be a platform, just like other transit systems, that brings together Chicagoans from different parts of the city and different cultural and economic backgrounds on relatively equal terms. Unfortunately, there are a number of factors that mediate Divvy as a node in a social mesh for Chicago: having nearby stations, having a credit card, having Internet access to find information or sign up, having $75 to spend all at once, and having the physical ability or experience and comfort to ride in city traffic. If there are efforts to address some of these things, I’d love to know more. While these shortcomings aren’t a reason to completely hate on the service, I can’t think of the service without thinking about city priorities and who they privilege.


Notes and connections from reading “It’s Complicated”

Here are a few quotations that stood out to me as I was reading danah boyd’s recent work on youth and social media, It’s Complicated.

Since finishing the book, it’s been a useful frame for thinking about media stories about the Internet. While boyd’s focus and research practice were around youth use of social media, the book is a useful guide for navigating broader cultural narratives around media and technology. It feels as though youth have always been the focus of cultural aspirations and fears, so it makes sense that the way they use emerging media and the way they are framed with it speak to cultural attitudes and practices about the media at large.

Chapter 6: Inequality

Perhaps Robert Moses did not intentionally design the roadways to segregate Long Island racially and socioeconomically, but his decision to build low overpasses resulted in segregation nonetheless.  In other words, the combination of regulation and design produced a biased outcome regardless of the planner’s intention.

The first time I read about the social impact of Moses’ plans was in Jane Jacobs’ The Death and Life of Great American Cities.

This idea of paying attention to the outcome rather than the intention reminds me a lot of an awesome session I went to at the 2014 NICAR conference where Nikole Hannah-Jones talked about her reporting on housing segregation. It’s really amazing stuff. She contributed to the ProPublica series Living Apart and a This American Life segment.

Chapter 7: Literacy

In a networked world, in which fewer intermediaries control the flow of information and more information is flowing, the ability to critically question information or media narratives is increasingly important. Censorship of inaccurate or problematic content does not provide youth the skills they will one day need to evaluate information independently.

I thought this was an interesting breakdown of how a bug in MySpace’s code provided an opportunity for youth to learn HTML/CSS/JavaScript with varying levels of sophistication. I’d hazard to guess that the goal of a custom profile compelled youth who would never choose to take, or have access to, a computer science class at their schools to investigate these topics.

Excited by the ability to create “layouts” and “backgrounds,” teens started learning enough code to modify their profiles. Some teens became quite sophisticated technically as they sought to build extensive, creative profiles. Others simply copied and pasted code that they found online. But this technical glitch–combined with teens’ passion for personalizing their MySpace profiles–ended up creating an opportunity for teens to develop some technical competency.

Wikipedia often, but not always, forces resolution of conflicting accounts. Critics may deride Wikipedia as a crowdsourced, user-generated collection of information of dubious origin and accuracy, but the service also provides a platform for seeing how knowledge evolves and is contested.

How we picture the issue of digital inequality also has political implications. As communication scholar Dmitry Epstein and his coauthors argue, when society frames the digital divide as a problem of access, we see government and industry as the responsible party for addressing the issue. When society understands the digital divide as a skills issue, we place the onus of learning how to manage on individuals and families. … The burden of responsibility shifts depending on how we construct the problem rhetorically and socially. The language we use matters.

Chapter 8: Searching for a public of their own

Boyd, who apparently came of age in the same region of the United States where I grew up, often uses the shopping mall to compare and contrast digital publics inhabited by youth. Early in the book, boyd writes:

I also take for granted, and rarely seek to challenge, the capitalist logic that underpins American society and the development of social media. Although I believe that these assumptions should be critiqued, this is outside the scope of this project. **

The way in which markets mediate teen publics is huge. My first impulse was to lament the way in which public spaces for youth are now constructed for them in ways that are ultimately exploitative. Then I remembered how many of the publics I inhabited as a teenager were fundamentally commercial spaces: grocery store parking lots, record stores, clothing boutiques, basements of student housing at expensive private colleges. Ultimately, I think paying attention to the way that youth subvert the intended uses of public space is more interesting and informative than only focusing on the way in which capital mediates spaces for youth.

** On the other hand, boyd made a PDF of the book available, explaining:

My desire to be widely read is why I wanted to make the book freely available from the getgo. I get that not everyone can afford to buy the book. I get that it’s not available in certain countries. I get that people want to check it out first. I get that we haven’t figured out how to implement ‘grep’ in physical books. So I really truly get the importance of making the book accessible.

Re: a youth carefully curating his Facebook posts but feeling like Twitter is a more intimate medium.

Manu’s practice contradicts the assumptions then held by adults, who often saw Facebook as a more intimate site than Twitter because of each site’s technical affordances and defaults.

What makes a particular site or service more or less public is not necessarily about the design of the system but rather how it is situated within the broader social ecosystem. … In this way, the technical architecture of the system matters less than how users understand their relationship to it and how the public perceives any particular site.

I find that a lot of times, as someone who saw the evolution of much of the Internet and feels pretty comfortable using its tools, I end up using the Internet and emerging social technologies in a way that’s not very pragmatic, and as a result not very emergent.

This passage of the book reminded me of this anecdote from Cory Doctorow:

My old Informationweek editor, Mitch Wagner, once discovered some young girls holding a gossipy chat in the comments section of an old blog post of his; when he asked them what they were doing there, they told him that their school blocked all social media, so every day they picked a random blog-post somewhere on the Internet and used it as a discussion board for the day.

Other notes

In one chapter, boyd tells the story of college admissions officials who were shocked when they Googled an academically gifted high school student. His admission essays talked about trying to escape the gangs and violence of his neighborhood, but his social media profiles contained references to gangs and gang affiliations. boyd argues that this isn’t contradictory or disingenuous; it’s an example of youth deftly using different channels for different audiences for survival and mobility. Voicing gang affiliations could be necessary for survival within the student’s community and peer group, the audience of the social media profiles. Their aspirations for the future would be problematic or misunderstood by their social media viewers, but really important for officials at the college they wish to attend. Problems arise when information reaches across the intended audiences that people balancing complicated identities imagine.

I thought of this anecdote and analysis immediately when a friend shared writing about the Eagles dumping player DeSean Jackson over alleged connections to gangs.

Further reading


New music


C.R.E.A.M.

This site is hosted by 1 & 1. I started with them because they offered 3 years of free hosting and I've had no complaints since. Use the link below if you're looking for hosting.