Again with the checkcards

Sometimes it seems like all I do is rejigger my checkcards script in response to changes my local library makes to its website. When the script stopped working last week, I put it on my to-do list but was too busy to get around to it until today.

Quick recap: checkcards is a script (actually a Python script, a shell script, and a LaunchAgent plist, but it’s the Python script that does the heavy lifting) that logs into the library’s website, gathers up all the books my family has checked out or on hold, and sends my wife and me a nicely formatted email with that information every morning. Here’s what the email looks like on an iPhone: one table with the checked-out items,

Checked out

and one with the on-hold items.

On hold

For checked-out items, a light red background indicates an item that’s due or overdue, and a light yellow background indicates an item that’s due within two days. For on-hold items, a light red background indicates an item that’s ready to be picked up.
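The highlighting rule is small enough to sketch in a few lines. This is not lifted from checkcards itself: the function names, the color labels, and the choice to treat "within two days" as inclusive are all assumptions made for illustration.

```python
from datetime import date, timedelta

def checked_out_color(due, today):
    """Return a background color label for a checked-out item (hypothetical helper)."""
    if due <= today:
        return "light red"       # due today or overdue
    if due - today <= timedelta(days=2):
        return "light yellow"    # due within two days (inclusive, by assumption)
    return None                  # no highlight

def on_hold_color(ready):
    """Return a background color label for an on-hold item (hypothetical helper)."""
    return "light red" if ready else None
```

Passing `today` in as an argument, rather than calling `date.today()` inside the function, keeps the rule easy to check against any date.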

The last time I had to update the scripts, I made a big change, switching from the mechanize module to the twill module. These are both modules that act sort of like a web browser, allowing you to automate the interaction with a site. Twill seemed easier and cleaner, but today I had to switch back to mechanize.

Here’s what happened. The library revamped its entire website. Most of the changes seem to be cosmetic, but the login page—which you need to go through to get at your lists of checked-out and on-hold items—has changed more than just its CSS file.

New login page

The form on this page confused twill, making it think there were actually two <form> sections, one with the form items you see and another with a couple of hidden fields. A quick look at the page’s HTML confirmed that the hidden fields were there, but they were part of the form with the visible fields. Twill’s confusion on this matter caused it to send incomplete information to the server, so the script couldn’t log in. I tried what I could to get twill to interpret the form correctly, but nothing worked.
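That kind of "quick look" can also be done programmatically. Here is a minimal sketch using only Python 3's standard library, not twill or mechanize. The sample HTML is a made-up stand-in for the login page: the `code` and `pin` field names come from my script, but the hidden field names and values, the form's action, and the `FormLister` class are hypothetical.

```python
from html.parser import HTMLParser

# Made-up stand-in for the login page: one form holding two visible
# fields and two hidden ones. The real page's markup is not reproduced here.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text" name="code">
  <input type="password" name="pin">
  <input type="hidden" name="token" value="abc">
  <input type="hidden" name="next" value="/items">
</form>
"""

class FormLister(HTMLParser):
    """Collect (type, name) for every <input>, grouped by enclosing <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one list of inputs per <form> seen
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append([])
            self.in_form = True
        elif tag == "input" and self.in_form:
            self.forms[-1].append((attrs.get("type", "text"), attrs.get("name")))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

def list_forms(html):
    """Return a list of forms, each a list of (type, name) pairs."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms

forms = list_forms(SAMPLE_HTML)
for i, fields in enumerate(forms):
    print(i, fields)
```

If the hidden fields show up in the same list as the visible ones, they belong to the same form, and a tool that reports two forms is misreading the page.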

I then experimented with Ruby’s mechanize module. I don’t really know Ruby, but I figured it wouldn’t be too hard to put together a script if the module worked. It didn’t. In fact, it didn’t recognize the hidden fields at all.

So I went back to the Python mechanize module and tried again. My suspicion was that the problem I’d had with mechanize a couple of months ago wasn’t due to a deficiency in mechanize itself¹ but to its crappy documentation.

Don’t get me wrong. There’s plenty of documentation for mechanize, but a lot of it is out of date and almost all of it is focused on the lower-level classes and functions. Mechanize has a high-level class, called Browser, which I figured was the proper class to use, but there’s no real documentation on its methods. All the mechanize site has on Browser are some simple, incomplete examples on the home page.

I have to think mechanize’s programmers want us to use Browser, because it provides the most straightforward interface to mechanize’s capabilities. But by failing to provide complete documentation on its methods, they’re limiting the number of people who can use their library.

Two things put me on the right path for learning Browser’s methods:

  1. The source code itself.
  2. This blog post by Rogério Carvalho Schneider. Yes, it’s mostly a set of examples, but Schneider’s examples are more complete and cover more methods.

The relevant portion of my script is now this:

python:
# Excerpt: assumes `import mechanize` and that cardList, lURL, cURL,
# and hURL are defined earlier in the script.
# Go through each card, collecting the lists of items.
for card in cardList:
  # Open a browser and log in.
  br = mechanize.Browser()
  br.set_handle_robots(False)
  br.open(lURL)
  br.select_form(nr=0)
  br.form['code'] = card['code']
  br.form['pin'] = card['pin']
  br.submit()

  # Go to the page for items checked out and get the HTML.
  br.open(cURL)
  cHtml = br.response().read()

  # Go to the page for items on hold and get the HTML.
  br.open(hURL)
  hHtml = br.response().read()

Looking back through the history of this script, I see that this isn’t terribly different from what this section looked like before the switch to twill, but it’s different enough to make this version work. And I have a better understanding of why it works.

That understanding will come in handy the next time the library’s webmaster decides to freshen up the site and break my script.


  1. It really couldn’t be, as twill is built on mechanize.