Dealing with Drupal comment spam

I got word that the Pages to Prisoners website was drowning in comment spam.  I locked down permissions, installed the AntiSpam and reCAPTCHA modules and wrote up some quick docs about my approach to solve it before stumbling on this really helpful how-to post, How to: Take control of your Drupal site back from the comment spammers, which outlines most of the steps that I took.

In the final step of this how-to, the author recommends bulk deleting spam comments in SQL.  The example SQL query in the how-to is pretty intense, deleting all comments.  I found that by looking at the comments, I could identify frequently used phrases in the subject that clearly had nothing to do with content on the site.  I could then compose a SQL query that would blow away these comments in a way that felt safer than deleting everything.  This is the query that I used which took me about 15 minutes to blow away over 3000 comments:

DELETE FROM `comments` WHERE `subject` LIKE "%boot%" OR `subject` LIKE "%jewelry%" OR
`subject` LIKE "%ugg%" OR `subject` LIKE "%watches%"  OR `subject` LIKE "%handbag%" OR 
`subject` LIKE "%cartier%"  OR `subject` LIKE "%jordan%" OR `subject` LIKE "%vuitton%" OR 
`subject` LIKE "%replica%" OR `subject` LIKE "%Louboutin%" OR `subject` LIKE "%MOD%"  OR 
`subject` LIKE "%OEM%" OR `subject` LIKE "%viagra%" OR `subject` LIKE "%coach%" OR 
`subject` LIKE "%chanel%"  OR `subject` LIKE "%tiffany%"  OR `subject` LIKE "%armani%"  OR 
`subject` LIKE "%windows 7%"  OR `subject` LIKE "%bvlgari%" OR `subject` LIKE "%chanel%" OR 
`subject` LIKE  "%The whirlwinds Mens%"  OR `subject` LIKE "%footwear%"  OR 
`subject` LIKE "%gucci%" OR `subject` LIKE "%BALENCIAGA%" OR `subject` LIKE "%jersey%" OR
 `subject` LIKE "%shoe%" OR `subject` LIKE "%jacket%" OR `subject` LIKE "%vibram%" OR 
`subject` LIKE "%mobile phone%" OR `subject` LIKE "%cell phone%" OR `subject` LIKE "%burberry%" OR
`subject` LIKE "%blu ray%" OR `subject` LIKE "%blu-ray%" OR `subject` LIKE "%phone%" OR 
`subject` LIKE "%nike%" OR `subject` LIKE "%abercrombie%" OR `subject` LIKE "%dress%" OR 
`subject` LIKE "%charm%" OR `subject` LIKE "%jewellery%"  OR `subject` LIKE "%london%" OR 
`subject` LIKE "%www.%"  OR `subject` LIKE "%video game%" OR `subject` LIKE "%bag%" OR 
`subject` LIKE "%wedding%" OR `subject` LIKE "%rosetta stone%" 

low-rent gobo projection

I’ve always wanted to know how to project “stencils” on walls or sidewalks, as a lot of clubs or businesses have started to do.  After a lot of knowledge from the Chicago New Media list, I found out that these projectors are called gobo projectors.  I also found out that IKEA sold an inexpensive version of such a projector called Isbrytare.  Jim Dennewill has a good rundown of the projector.

My impression is that IKEA no longer makes this product.  I’m wondering if there are any other ~$40 projectors that are currently in production.

Photo by Jim via Flickr.

Bookmarklet to generate Flickr image attribution text and link

This is a jQuery bookmarklet to extract an attribution string and link from a Flickr photo page.

To use the bookmarklet bookmark this link or drag it to your browser’s bookmarks bar: Flickr Attribution

The code is available at github.

At the time that I wrote this bookmarklet, I was using the the Monochrome Author theme (similar to the Monochrome Pro theme) by Graph Paper Press. It requires that you have an image associated with each post, so I frequently grab Creative Commons licensed photos for posts where I didn’t take a photo. I got tired of building the photo attribution string and link back to the photo by hand, so I made the bookmarklet to generate it with one click.

This is my first attempt at writing a bookmarklet and using jQuery.

I make use of the very helpful jQuery Bookmarklet by Brett Barros with modifications by Paul Irish as well as the zeroclipboard library for copying the text to the system clipboard.

There are probably some bugs with this code as well as lots of room for improvement. In particular, it would be nice to have the z-index of the bar displayed by the widget set so it covers all the FLickr page elements, but I couldn’t set a high z-index without messing up the zeroclipboard functionality.

Finding duplicate records in a books to prisoners database application

High on my list of neglected tech. projects is the Testament books to prisoners database web application.  This is the database program that projects like the Midwest Pages to Prisoners Project use to track packages sent and returned and books requested in the hopes avoiding delays in delivering books to incarcerated people and to provide metrics that grant providers like.

One of the design challenges has to do with duplicate records.  Recipients of books are identified by their state/federal department of correction (DOC) number (if they’re in a state or federal prison – most jails don’t use ID numbers), their state of incarceration and their name.  I assume that the database was designed originally to minimize barriers for the book project volunteers so both the name and DOC# are free text fields.  Javascript is used to match existing records based on the DOC#, but there is still a large possibility for duplicate records.

The reason for duplicate records is that both the person writing to request books and the volunteer may list their name and/or DOC# inconsistently.  For instance, the state may store the DOC# in their database as A-123456 but the incarcerated person may write it as A123456 A-123-456 or just 123456.  Volunteers who don’t know about this and aren’t careful may not check beforehand for an existing record.

This is probably preventable through more sophisticated validation, but we still need a way to find duplicates in the existing records.  As this application is written in the Django framework, I want to try to use the Django API to find matches.

At first thought, it seems like I will have to iterate through each inmate record and check if there is a duplicate record.  This seems pretty slow, but I can’t think of a better way to do this.  At this point, there aren’t so many records that this approach will fail, but it would be nice to do something slicker.

The other problem is how to match a duplicate.  One approach might be to build a regexp for the DOC# (for instance, match either the first character or omit it, allow dashes or spaces between all characters, …) and then use the iregexp field lookup to try to find matches. One challenge with this is that the current Testament codebase is using Django 0.97 (I think) and iregexp is only available starting in 1.0.  Maybe it’s time we updated our code anyway.

There is also the Python difflib module that can compute deltas between strings.  However, it seems like this would slow things down even further because you would have to load each inmate object and then use difflib to compare the DOC#s.  I assume that the previous approach would be faster because the regexp matching happens at the database level.

String mangling in Excel/VBA

Often, I find myself needing to do string mangling in spreadsheets. For instance, in an Excel dump of an MS Access database, the names of institutions were sometimes written as “Foo, University of” or “Foo U”. I wanted these to be formatted as “University of Foo” and “Foo University” respectively. I may be misinformed but there doesn’t seem to be regexp style substitution in VBA. I could do this directly in the spreadsheet, but conditional use of functions gets to be hard to understand and difficult to edit. So, I wrote the following custom function in VBA.

Function FormatInstitutionName(institutionName As Variant) As String
'Formats an institution name for import into CiviCRM:
'"Foo, University of" => "University of Foo"
'"Foo U" => "Foo University"
    Dim newInstitutionName As String

    newInstitutionName = institutionName

    If InStr(newInstitutionName, ", University of") Then
        newInstitutionName = "University of " & Replace(newInstitutionName, ", University of", "")
    End If

    If InStr(Len(newInstitutionName) - 1, newInstitutionName, "U") Then
        newInstitutionName = Left(newInstitutionName, Len(newInstitutionName) - 2) & " University"
    End If

    FormatInstitutionName = newInstitutionName
End Function

Add this to your workbook by following the instructions at http://www.vertex42.com/ExcelArticles/user-defined-functions.html

Then call it in your spreadsheet cells like like:

=FormatInstitutionName(C2)

WordPress Mu spammed, Chickenfoot to the rescue

I found out last night that I had forgotten to turn off new blog registration on my WordPress Mu instance and that over 500 spammers had created new blogs on my site.  The admin interface allows you to bulk delete blogs but requires that you check the checkbox next to each blog to select it for deletion.  This was getting pretty tedious.  Though I’m sure I could have delved into the internals of WordPress and figured out how to delete these blogs at the database level, this was a little scary and I didn’t really want to spend the time to do the research necessary to feel confident about this method.

So, I installed the Chickenfoot Firefox extension and wrote this simple script that would check every checkbox on the administration page.  This was made easy because of the fact that the only checkboxes on this particular page were ones associated with blogs that I wanted to delete.  This reduced the number of clicks to delete blogs significantly and it only took me a few minutes of manual clicking to delete the hundreds of spam blogs.

for (chk = find(new XPath("//input[@type='checkbox']")); chk.hasMatch; chk = chk.next) {
  check(chk);
}

Photo by loveï½¥janine via Flickr.

Mailman Subscription Form with Drupal’s Webform

I’m trying to use Drupal’s webform module to make a subscription form to a Mailman mailing list.  Tracking thoughts and problems here.

I like webform because it stores form responses which might be useful.

Webform also lets you send e-mail on form submission and specify the sender/subject.  I was going to use this to send an e-mail from the address that user enters in the form to the -request address of the mailman list.

The problem is that the module lets you set the From: header but not the Sender: header and mailman uses the Sender: header to detect the requestor.

Options

  • Find a way to use tokens to set the subject to be ‘subscribe address=
  • Try to send my own e-mail with PHP code specified in the Additional Processing textarea of the Webform advanced settings fieldset when editing my form.

Setting up Embarq DSL with Wifi using Microsoft MN-700 wireless access point

My parents and brother use Embarq DSL and they just got a used Wireless router. Like most folks, they either had the DSL set up for them, or did the initial configuration without understanding or remembering what they did, so I’m writing this as a reference for them, for other family tech support people working with the same hardware or services, or as a general framework for how I figure these things out.

The DSL Service

The first thing I did was to try to figure out how their DSL connection worked. In most cases I’ve seen, there is either a DSL modem and the computer handles the PPPoE connection or the modem is also a router.

My parents’ systems are running Windows XP, so I went to Start->Connect To->Show All Connections. There were only the connections under the heading LAN or High Speed Internet. If the computer was doing PPPoE, it would be listed under the heading Broadband. When I double-clicked on the Local Area Networ€k connection icon and the Support tab, I saw that the computer was getting an IP address of 192.168.2.2. So, it looks like their DSL modem is also doing routing. The much easier way to tell this would have just been to flip over their modem and see that it was labeled ADSL router, although I’m guessing that most ADSL modems/routers can disable the routing, so maybe it’s a good to check and see how the computer is actually connecting.

The ADSL hardware

My family is using an Embarq EQ-66OR ADSL Router

The Wifi hardware

My brother acquired a Microsoft MN-700 Wireless Base Station. Since they got the hardware second hand, it didn’t come with any documentation. Luckily, I found it here: MN-700 Base Station Configuration Guide

Determining hardware capabilities

Since the DSL modem is doing the routing too, I think the easiest thing to do would be to configure the wireless base station to just act as a base station and not do any routing. Then I don’t have to figure out how to configure the modem or what my family’s login credentials are for the DSL service.

I looked at the table of contents for the configuration guide and saw that there were sections for Local Area Connection, Wide Area Connection, and Wireless Mode. What I want to do is to configure the access point in what is often called bridge mode. I started looking at the wireless mode section and realized that it was just choosing between 802.11b and 802.11g. I looked at the Wide Area Connection and Local Area Connection section and it seemed like you had to set a DHCP range for the local network. This wasn’t looking good. I searched for the word bridge and nothing came up. I quickly scanned through the documentation and found the section Base Station Mode. This is what I wanted! This illustrates the point that its always good to (quickly) Read The Friendly Manual.

Planning the network

Now that I know that the hardware can do what I want, and I know which hardware will do what, I can plan the network.

The ADSL modem (EQ-66OR) will do the PPPoE connection and the routing.

The wireless base station (MN-700) will just pass through wifi connections to the router.

The modem and the router will be connected using a standard ethernet cable going from the yellow port labeled Ethernet on the back of the DSL modem to the port labeled To Modem on the side of the wireless base station. Computers will either connect to the network wirelessly or through an Ethernet cable connected to the ports labeled 1-4 on the wireless base station.

So, for the basic connection, I’ll need:

  • The wireless base station
  • The ADSL modem/router
  • An Ethernet cable
  • Space in a power strip for the wall warts (AC adapters) for both the DSL modem and the wireless access point

Configuring the hardware

I should, hopefully, be able to leave the DSL modem configured the way it is, so all I’ll have to configure is the wireless access point.

I connected the DSL router from the computer and connected it to the wireless access point as I specified above.  I powered on the wireless access point and saw that a wireless network connection MSHOME was available.  I connected to this network.  From the docs, I found that I need to connect to the address http://192.168.2.1 in my browser in order to configure the access point.  This didn’t work, so I went back to Start->Connect To->Show all connections and double clicked on the Wirleless Area Connection and the Support tab to see that the computer was getting an IP address of 192.168.0.2 with a gateway of 192.168.0.4, so I typed http://192.168.0.1 in my browser.  I guess I should change my earlier rule to Quickly Read The Friendly Manual But Be Flexible (QRTFMBBF)  This still didn’t get me to an adminstration page. Sometimes access to the administrative pages is disabled for a wireless connection, so I disconnected from the wireless network and connected the computer to the access point with another ethernet cable.  I find that when doing these kinds of network installs its always good to have an extra network cable or two at hand.

When I connected to the access point with the wired network, the computer got an address of 192.168.2.4 and I was able to access the administration pages with my browser at http://192.168.2.1.

The administration page asked me for a password.  My brother said that he thought he had reset the base station back to its factory settings, so I tried the default password admin, which worked.

Set a new administration password

For security, I went to General settings->Change password and asked my mom for a password that she would remember.  I also made a secure note of this for future reference.

Put the access point in Base Station Mode

I set this by following the links to Security->Base Station Mode and chose the Access Point Mode radio button.  I had to select an access point name and chose tenhopedrive, my parents address, because I like giving wifi nodes some geographic significance.  I would use this name in my browser http://tenhopedrive to connect to the admin pages.  This is exactly what I had to do, because after changing the above setting, it rebooted the access point, logging me out of the admin pages.

Set access point SSID

I did this by following the Wireless Settings link and typed tenhopedrive (for consistency) into the Wireless network name (SSID) field and clicked the apply button.  Once again this caused the access point to reboot, and I had to reconnect to the wireless network under the new SSID.  I had to refresh the list of available wireless networks a few times in order to see the new name.

Testing the network

There were a few things that I wanted to make sure worked right, because I wasn’t sure how they would work.

Wired Network

Putting the access point into access point mode, I wasn’t sure how this would affect the wired network.  I made sure the computer was still connected to the access point by the Ethernet cable, disconnected from all wireless networks, and made sure I could access google.com in my browser.

Wireless Network Connection

I then disconnected the Ethernet connection and connected to the wireless connection and made sure that I could still connect to google.com in my browser.

Multiple Computers

The last thing I wanted to test, which is probably something I should have investigated initially, is whether or not multiple computers could connect through the DSL router. I wasn’t sure if they had set up some kind of MAC address filtering or something to only allow my mom’s laptop to connect.  I fired up my laptop and connected to the wireless network I just created to see if I could connect to google.com in my browser.  This worked just fine.

Afterthought: Port Fowarding

You know what they say about the best laid plans of mice and men.  I forgot that my brother might want to use port forwarding for P2P filesharing or chat apps.  With the way I configured it, I would have to configure this on the ADSL modem/router.  So, I tried to access this by pointing my browser at the gateway IP reported by Start->Connect To->Show All Connections->Wireless Area Connection->Support.  I found that I got an admin page login, and that the password was still set to the default, 1234.  This was bad since I wanted to have the wireless network be open (to make it easier for my family to not have to remember and additional password or get locked out of their wifi if their computers forgot the stored password and for good-neighborness).  So, I reset the router admin password to the same admin password as the access point and made a note of this.

I confirmed that I could set up port forwarding on the router and that they also had dyndns settings available, something that wasn’t available, as far as I could tell on the access point, so it looks like I made the right choice in using the ADSL modem/router for handling routing.  I went ahead and set this up too, but that’s beyond the scope of this howto.