Posts Tagged ‘python’

Updated Flickr URL script for TextExpander

Last week I wrote a little Python script that printed out the URL of a Flickr image when that image’s page is currently showing in Safari. I used that script with TextExpander to automatically type out the URL when I needed it without having to dig in a couple of levels to get the image URL by hand. I’ve since improved the script to be more flexible and easier to modify.

I won’t go through my motivation for writing the script; it’s laid out in last week’s post. I’ll just point out that there were two problems with the script as it was originally written:

  1. “Special” strings, like URLs, were buried in the code instead of defined at the beginning.
  2. It worked only when the current page in Safari was the main page for the photo. It failed when the current page was, for example, one of the “sized” pages for the image.

Both of these problems have been fixed with this new version.

 1:  #!/usr/bin/python
 2:  
 3:  import appscript
 4:  import re
 5:  import sys
 6:  from urllib import urlopen
 7:  
 8:  # The basic URL format for photos.
 9:  baseURL = 'http://www.flickr.com/photos/%s/%s/'
10:  
11:  # The regex for extracting user and photo info.
12:  infoRE = r'flickr\.com/photos/(.*)/(\d+)/?'
13:  
14:  # The various image URL suffixes.
15:  suffixes = {'master': '_m.jpg',
16:              'original':    '_o_d.jpg',
17:              'large':   '_b_d.jpg',
18:              'medium640':   '_z_d.jpg',
19:              'medium500':   '_d.jpg',
20:              'small': '_m_d.jpg',
21:              'thumbnail':   '_t_d.jpg',
22:              'square':  '_s_d.jpg'}
23:  
24:  # Get the URL of the frontmost Safari tab and extract the photo info.
25:  thisURL = appscript.app('Safari').documents[0].URL.get()
26:  info = re.findall(infoRE, thisURL)
27:  
28:  # Download the main page for that photo and get its "master URL."
29:  # Use the master to generate the URL for the medium500 image
30:  # and print it.
31:  try:
32:    user = info[0][0]
33:    id = info[0][1]
34:    pageURL = baseURL % (user, id)
35:    html = urlopen(pageURL).read()
36:    imageURL = re.search(r'<link\s+rel="image_src"\s+href="([^"]+)"', html).group(1)
37:    imageURL = imageURL.replace(suffixes['master'], suffixes['medium500'])
38:    sys.stdout.write(imageURL)
39:  
40:  # Print an error message if there's any problem.
41:  except:
42:    sys.stdout.write("wrongpagewrongpage")

Lines 8-22 pull all the special strings out to the top of the code, where they can be seen (and adjusted if Flickr changes its URL format). The new suffixes dictionary includes all the size possibilities, so it would be a simple matter to change the code to return, say, the Thumbnail URL; just change medium500 in Line 37 to thumbnail.
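As a rough sketch of how that one-word change generalizes, the suffix swap from Line 37 can be wrapped in a small helper that takes the size name as an argument. The function name sized_url is hypothetical, the code is written for Python 3 rather than the Python 2 of the script, and the sample master URL is the one from the example photo in last week’s post.

```python
# The suffixes dictionary from the script, repeated so this runs on its own.
suffixes = {'master':    '_m.jpg',
            'original':  '_o_d.jpg',
            'large':     '_b_d.jpg',
            'medium640': '_z_d.jpg',
            'medium500': '_d.jpg',
            'small':     '_m_d.jpg',
            'thumbnail': '_t_d.jpg',
            'square':    '_s_d.jpg'}


def sized_url(master_url, size):
    """Swap the master suffix for the suffix of the named size.

    Hypothetical helper; it does the same replace as Line 37 of the script.
    """
    return master_url.replace(suffixes['master'], suffixes[size])


master = 'http://farm5.static.flickr.com/4141/4812406557_36acccbccd_m.jpg'
print(sized_url(master, 'thumbnail'))
```

An unknown size name raises a KeyError here, which the script’s try/except would turn into the usual error message.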

In the previous version of this script, the URL of the current Safari page would be downloaded and searched for the special <link rel="image_src" > tag. The problem with this was that some Flickr image pages, in particular the pages associated with “sized” images, didn’t have this tag, so the search would fail. This version builds the main page URL for the photo from baseURL and downloads that instead of the current Safari page, ensuring that the <link> tag will be present.
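To see the normalization on its own, here is a minimal Python 3 sketch that applies infoRE and baseURL to a “sized” page URL without touching Safari or the network. The function name main_page_url is hypothetical, and the sample sized-page URL is an assumed format; the path Flickr actually uses may differ.

```python
import re

# Same strings as Lines 9 and 12 of the script.
baseURL = 'http://www.flickr.com/photos/%s/%s/'
infoRE = r'flickr\.com/photos/(.*)/(\d+)/?'


def main_page_url(this_url):
    """Map any URL under a photo's page to the photo's main page URL."""
    info = re.findall(infoRE, this_url)
    user, photo_id = info[0]      # raises IndexError if the URL doesn't match
    return baseURL % (user, photo_id)


# Assumed example of a "sized" page URL.
sized_page = 'http://www.flickr.com/photos/drdrang/4812406557/sizes/m/'
print(main_page_url(sized_page))
```

Whatever follows the photo ID in the URL is simply ignored, so the main page and any of its sub-pages map to the same address.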

Errors are now handled through exceptions instead of an if/else test. This allows us to handle a multitude of errors with a single error message.
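A stripped-down sketch of that pattern, in Python 3 and with the HTML passed in as a string so it runs without a network connection. The function name image_url_or_error is hypothetical. Unlike the script’s bare except, this version names the two exceptions the parsing steps can raise, so it would not catch download failures.

```python
import re


def image_url_or_error(page_url, html):
    """Return the medium500 image URL, or the error string if anything fails."""
    try:
        # IndexError here means the page URL isn't a Flickr photo URL.
        re.findall(r'flickr\.com/photos/(.*)/(\d+)/?', page_url)[0]
        # AttributeError here means the <link> tag wasn't found (search gave None).
        master = re.search(r'<link\s+rel="image_src"\s+href="([^"]+)"', html).group(1)
        return master.replace('_m.jpg', '_d.jpg')
    except (IndexError, AttributeError):
        return 'wrongpagewrongpage'


good_html = '<link rel="image_src" href="http://farm5.static.flickr.com/4141/4812406557_36acccbccd_m.jpg">'
print(image_url_or_error('http://www.apple.com/', ''))
print(image_url_or_error('http://www.flickr.com/photos/drdrang/4812406557/', good_html))
```

Two quite different failures, a wrong page and a missing tag, end up at the same message without a separate test for each.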

As before, I have this script saved as a Shell Script in TextExpander and tied to an abbreviation of ;500. Now it’s a snap to enter Flickr image URLs wherever I need them.


Flickr image URL via TextExpander

If you’re a Flickr user, you’ve probably been trying out its new layout. For the most part, I like it. The photos are bigger and there’s less clutter elsewhere on the page. But it’s not an unalloyed improvement. My biggest disappointment with the new layout has to do with using images from my Flickr stream here on the blog; it takes longer now to pluck out the URL of an image than it used to. This prompted me to write up a Python script—which I can call via TextExpander—that gets the URL of the image showing in the current Safari page.

Let me first clarify what I mean by “image URL.” I don’t mean the URL of the page that shows the image; that would be something like

http://www.flickr.com/photos/drdrang/4812406557/

No, I mean the URL of the image itself, specifically the 500-pixel wide size. That URL looks like

http://farm5.static.flickr.com/4141/4812406557_36acccbccd_d.jpg

I want the 500-pixel version because it’s a good size to fit in this blog.

Other sizes are available; they’ll have the same URL except for the part between the last underscore and the .jpg. We’ll talk about that in more detail later.

In the old Flickr layout, there was a set of buttons across the top of the photo.

Clicking the “All Sizes” button would take me to a page showing the Large version (1024 pixels wide) of the photo and a set of buttons for other sizes. Clicking the Medium button would take me to a similar page that was showing a 500-pixel wide version of the photo. Below that were a couple of text fields, the second of which contained the image URL for the Medium size.

I’d copy that and paste it into the post I was writing. It was a little cumbersome, but took only two or three clicks to get the URL I was after.

Now the “All Sizes” navigation is done through a menu that requires two clicks instead of one.

I still have to click the Medium button to get the size I want (no change there), but now there’s no field with the image URL. I have to right click (or control click) on the Download link and then drag to (or click on) the Copy Link item in the popup menu.

As I write out the steps, it doesn’t seem like the new layout requires me to do much more than the old one. One click more, maybe two, depending how you count. But it seems to go much slower because

So that’s the motivation for the script. Maybe my perception is off, but it sure seems to take a good deal longer to grab an image URL now than it used to.

Here’s the script itself:

 1:  #!/usr/bin/python
 2:  
 3:  import appscript
 4:  import re
 5:  import sys
 6:  from urllib import urlopen
 7:  
 8:  # Get the URL of the frontmost Safari tab.
 9:  pageURL = appscript.app('Safari').documents[0].URL.get()
10:  
11:  if 'flickr.com/photos' in pageURL:
12:    # Get the medium-sized image URL for the displayed photo.
13:    html = urlopen(pageURL).read()
14:    imageURL = re.search(r'<link\s+rel="image_src"\s+href="([^"]+)"', html).group(1)
15:    imageURL = imageURL.replace('_m.jpg', '_d.jpg')
16:    sys.stdout.write(imageURL)
17:  else:
18:    sys.stdout.write("wrongpagewrongpage")

Update 7/26/10
Here’s an improved version of this script that’s more flexible in how it gets the image URL and is easier to modify for other purposes.

If you want to use it or modify it for your own purposes, you’ll have to install the nonstandard appscript module. Line 9 uses that module to get the URL of the frontmost Safari page.

The rest of the script is just garden-variety Python. Line 13 retrieves the HTML of the photo page, and line 14 plucks out from it the “master URL” for the image. The <head> section of the photo page will have a <link> tag that looks like this:

<link rel="image_src" href="http://farm5.static.flickr.com/4141/4812406557_36acccbccd_m.jpg">

The href attribute is the master URL; all the different sizes of this photograph will have the same URL but for the part between the last underscore and the .jpg extension. Here’s a table of the options.

Size (width)      Suffix
Original          o_d
Large (1024)      b_d
Medium (640)      z_d
Medium (500)      d
Small (240)       m_d
Thumbnail (100)   t_d
Square (75)       s_d

Line 15 converts the master URL to a size-specific one for the smaller of the Medium sizes. Line 16 then sends it to standard output.
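The regular expression in Line 14 depends on rel coming before href in the tag. As an alternative sketch, not what the script does, Python 3’s standard html.parser module can pick out the same tag regardless of attribute order. The class and function names below are hypothetical.

```python
from html.parser import HTMLParser


class ImageSrcFinder(HTMLParser):
    """Remember the href of the first <link rel="image_src"> tag seen."""

    def __init__(self):
        super().__init__()
        self.master_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'link' and attrs.get('rel') == 'image_src' and self.master_url is None:
            self.master_url = attrs.get('href')


def medium500_url(html):
    """Return the 500-pixel image URL found in html, or None if there's no tag."""
    finder = ImageSrcFinder()
    finder.feed(html)
    if finder.master_url is None:
        return None
    # Same suffix swap as Line 15.
    return finder.master_url.replace('_m.jpg', '_d.jpg')


sample = ('<head><link href="http://farm5.static.flickr.com/4141/'
          '4812406557_36acccbccd_m.jpg" rel="image_src"></head>')
print(medium500_url(sample))
```

The regex is shorter and fine for a quick script; the parser is only worth it if the tag format turns out to vary.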

If your front Safari page isn’t a Flickr photo page, the test in Line 11 should catch that and the script will print wrongpagewrongpage instead of a URL. This may seem a little childish, but it’s a distinctive error message that can be selected for deletion with a quick double-click.

I have this script saved as a Shell Script in TextExpander, with an abbreviation of ;500. The semicolon is there because that’s the signal character I use at the beginning of all of my abbreviations. The 500 is the mnemonic for the width of the image.

Since creating this abbreviation, I’m finding it much easier to include photos from my Flickr stream in the blog.

The script could, of course, be modified to return URLs for other image sizes—just change Line 15. More interestingly, it could be the start of a script that downloads images of one or more sizes. I leave that as an exercise for the reader.


New Metra schedule for Simplenote

As I mentioned last November, I have plain text versions of the Metra commuter rail schedule between Chicago and Naperville (where I live) saved in Simplenote. Metra made some changes to the schedule this week, so I updated the files and decided to make them available in a GitHub repository.

On the iPhone, the schedules look like this:

Because Simplenote uses Helvetica, a proportional font, and doesn’t have adjustable tab stops (even if it did, there’s no tab key on the iPhone for entering them), the columns don’t line up perfectly, but they look OK. Until the iPhone gets a decent monospaced font, this will have to do.

There are six schedule files in the repository:

  1. Eastbound Monday through Friday
  2. Eastbound Saturday
  3. Eastbound Sunday
  4. Westbound Monday through Friday
  5. Westbound Saturday
  6. Westbound Sunday

Also in the repository is a Python script, metra.py, that I wrote to reformat the schedule times from the way they’re presented on the Metra web page.

I copy the schedule times from the box and paste them into a text editor. Generally, I get something that looks like this,

  08:40  10:40  12:40  02:40  04:40  06:40  08:40  10:40  12:40
Naperville  09:37  11:37  01:37  03:37  05:37  07:37  09:37  11:37  01:37

with some extra stuff at the beginning of each line that needs to be deleted to make it look like this:

08:40  10:40  12:40  02:40  04:40  06:40  08:40  10:40  12:40
09:37  11:37  01:37  03:37  05:37  07:37  09:37  11:37  01:37

Then I copy those lines and execute

pbpaste | python metra.py | pbcopy

which puts the reformatted schedule,

  8:40a            9:37a
10:40a          11:37a
12:40p            1:37p
  2:40p            3:37p
  4:40p            5:37p
  6:40p            7:37p
  8:40p            9:37p
10:40p          11:37p
12:40a            1:37a

onto the clipboard. It looks weird here, but that’s because you’re seeing it in a monospaced font, not Helvetica. Finally, I paste the times into the Simplenote webapp and do some minor editing. This usually consists of

Here’s metra.py:

 1:  #!/usr/bin/python
 2:  
 3:  import re
 4:  import sys
 5:  
 6:  # Collect the two rows of data.
 7:  start = sys.stdin.readline().split()
 8:  stop = sys.stdin.readline().split()
 9:  
10:  # Change leading zeros to two spaces.
11:  start = [re.sub(r'^0', '  ', s) for s in start]
12:  stop = [re.sub(r'^0', '  ', s) for s in stop]
13:  
14:  # Print the data as two columns, using a simple heuristic for am/pm.
15:  ap = 'a'
16:  for i in range(0, len(start)):
17:    if start[i][0:2] == '12':
18:      if ap == 'a':
19:        ap = 'p'
20:      else:
21:        ap = 'a'
22:    
23:    print ' %s%s          %s%s' % (start[i], ap, stop[i], ap)

The AM/PM test is in Lines 17-21. This works pretty well for the Naperville-Chicago schedule and would probably be OK for other schedules, too. I thought about writing a routine that would work in every case, but it just wasn’t worth the effort. With this simple test, I only had to change a few as and ps.
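To make the heuristic easier to poke at, here is a Python 3 sketch of the same am/pm flip as a function that returns the labelled times instead of printing padded columns. The names label_times and strip_zero are hypothetical, and the leading zero is dropped rather than replaced with spaces.

```python
def strip_zero(t):
    """Drop a single leading zero: '08:40' becomes '8:40'."""
    return t[1:] if t.startswith('0') else t


def label_times(start_row, stop_row):
    """Pair departures with arrivals and append a/p using the 12 o'clock flip."""
    ap = 'a'
    pairs = []
    for start, stop in zip(start_row.split(), stop_row.split()):
        # Flip between am and pm whenever a departure is in the 12 o'clock hour.
        if start.startswith('12'):
            ap = 'p' if ap == 'a' else 'a'
        # The arrival gets the same letter as its departure.
        pairs.append((strip_zero(start) + ap, strip_zero(stop) + ap))
    return pairs


starts = '08:40  10:40  12:40  02:40  04:40  06:40  08:40  10:40  12:40'
stops = '09:37  11:37  01:37  03:37  05:37  07:37  09:37  11:37  01:37'
for depart, arrive in label_times(starts, stops):
    print(depart, arrive)
```

Because the arrival just inherits the departure’s letter, a train leaving at 11:40 and arriving at 12:37 would come out as 12:37a. That is presumably the sort of entry that has to be fixed by hand.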

If you’re a Simplenote user who lives in Naperville, the schedule files are pretty handy as is. If you’re a Simplenote user who lives in another town on the Chicago-Aurora line, or on another Metra line entirely, you can use metra.py to create your own Simplenote files.

If you’re not a Simplenote user, you should give it a try. It may be that Jesse Grosjean’s Dropbox-syncing suite of programs will end up working more smoothly, but until that happens, Simplenote is leading the pack.


Monte Carlo and the Two Child Problem

In the previous post about the Two Child Problem, we thought about how the probabilities would change under different rules. In this post, let’s write those rules into a program and see how the probabilities change in a Monte Carlo (no relation to Monty Hall) simulation.

To review, the Two Child Problem is this:

Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

The answer depends on what rules we think the questioner is following. We’ll look at three cases:

  1. The questioner would never pose this problem if Mr. Smith had two daughters. The problem is restricted to families with at least one son and the question is always about the probability of two sons.
  2. The questioner isn’t restricted at all. He simply tells us about one child, chosen at random, in a two-child family and asks us if the other child is of the same sex.
  3. The questioner is biased toward boys. If there’s at least one boy in the family, that’s what he tells us; if the family has two girls, he tells us there’s at least one girl. In either case, he asks for the probability that the other child is of the same sex.

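In Monte Carlo simulation, we use the computer to generate lots of random events and then combine the counts of those random events to estimate probabilities. For the Two Child Problem, we’ll simulate “families” by generating pairs of letters: G for girls, B for boys. The counts we need to keep track of are:

  1. n, the total number of families.
  2. n_{sons}, the number of families with at least one son.
  3. n_{2sons}, the number of families with two sons.
  4. n_{2daughters}, the number of families with two daughters.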

Note that n_{sons} + n_{2daughters} = n.

For the first case, we’re eliminating from consideration the families with two daughters, so the probability will be

\frac{n_{2sons}}{n_{sons}}

For the second case, we include all the families. Since we’re choosing the “revealed” child at random and asking if the other child is of the same sex, it’s equivalent to going through the list of all the families and picking out the boy-boy and girl-girl families. The probability will be

\frac{n_{2sons} + n_{2daughters}}{n}

The third case is a little trickier. Recognize first that if the family has any boys, the questioner will ask about boys and the probability will be calculated as in the first case. The questioner will ask about girls only if the family has two girls, so the probability of having two children of the same sex under that condition is 1. We use conditional probability to combine these situations:

\begin{eqnarray} P(\textrm{same sex}) & = & P(\textrm{same sex} | \textrm{boys} \ge 1)P(\textrm{boys} \ge 1) \\ & & + P(\textrm{same sex} | 2\;\textrm{girls}) P(2\;\textrm{girls}) \end{eqnarray}

With our variables, this becomes

\left(\frac{n_{2sons}}{n_{sons}}\right) \cdot \left(\frac{n_{sons}}{n}\right) + 1 \cdot \left(\frac{n_{2daughters}}{n}\right)

With a little algebra this formula reduces to that of the second case. Which means that these two sets of rules are equivalent, even though they don’t seem to be.
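The algebra is just cancellation. Assuming n_{sons} > 0, which holds for any reasonably large sample, the n_{sons} in the first term cancels and the two fractions over n combine:

\begin{eqnarray} \left(\frac{n_{2sons}}{n_{sons}}\right) \cdot \left(\frac{n_{sons}}{n}\right) + 1 \cdot \left(\frac{n_{2daughters}}{n}\right) & = & \frac{n_{2sons}}{n} + \frac{n_{2daughters}}{n} \\ & = & \frac{n_{2sons} + n_{2daughters}}{n} \end{eqnarray}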

Here’s a Python program that implements these ideas.

 1:  #!/usr/bin/python
 2:  
 3:  from __future__ import division
 4:  from random import choice
 5:  
 6:  n = 10000
 7:  sexes = ('G', 'B')
 8:  families = []
 9:  
10:  for i in range(n):
11:    families.append((choice(sexes), choice(sexes)))
12:  
13:  nsons = len([x for x in families if 'B' in x])
14:  n2sons = len([x for x in families if x == ('B', 'B')])
15:  n2daughters = len([x for x in families if x == ('G', 'G')])
16:  
17:  print '''If we restrict ourselves to families that have at least one son,
18:  the probability of having two sons is %d/%d = %5.3f''' % (n2sons, nsons, n2sons/nsons)
19:  
20:  print
21:  
22:  print '''If we choose the "revealed" child at random, the probability of having
23:  two children of the same sex is %d/%d = %5.3f''' % (n2sons+n2daughters, n, (n2sons+n2daughters)/n)
24:  
25:  print
26:  
27:  print '''If we "reveal" boys in every case except when there are two daughters,
28:  the probability of having two children of the same sex is
29:  (%d/%d)*(%d/%d) + 1*(%d/%d) = %5.3f''' % (n2sons, nsons, nsons, n, n2daughters, n, n2sons/nsons*nsons/n+n2daughters/n)

We use the choice function from the random module to generate 10,000 simulated families as a list of tuples. Lines 13-15 then filter the list according to certain criteria and count the number of families left. Line 17 onward does the calculations according to the formulas above and prints the results.

Here’s a sample of the output.

If we restrict ourselves to families that have at least one son,
the probability of having two sons is 2520/7535 = 0.334

If we choose the "revealed" child at random, the probability of having
two children of the same sex is 4985/10000 = 0.498

If we "reveal" boys in every case except when there are two daughters,
the probability of having two children of the same sex is
(2520/7535)*(7535/10000) + 1*(2465/10000) = 0.498

Based on the reasoning of the earlier post, the answers are what we expected. But thinking the problem through from a Monte Carlo perspective does give a different view of what the various rules mean.
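One way to see that different view is to simulate what the questioner does, rather than count family types. The Python 3 sketch below is an illustration under assumptions, not the program above: the function name simulate, the seed, and the sample size are all arbitrary choices.

```python
import random


def simulate(n=20000, seed=1):
    """Estimate the three probabilities by acting out each set of rules."""
    rng = random.Random(seed)
    families = [(rng.choice('GB'), rng.choice('GB')) for _ in range(n)]

    # Case 1: the question is only ever asked about families with a son.
    with_son = [f for f in families if 'B' in f]
    case1 = sum(f == ('B', 'B') for f in with_son) / len(with_son)

    # Case 2: reveal one child at random and check whether the other matches.
    same = 0
    for f in families:
        k = rng.randrange(2)
        same += f[k] == f[1 - k]
    case2 = same / n

    # Case 3: reveal a boy if there is one, otherwise a girl, then check the other.
    same = 0
    for f in families:
        revealed = 'B' if 'B' in f else 'G'
        others = list(f)
        others.remove(revealed)       # set aside the revealed child
        same += others[0] == revealed
    case3 = same / n

    return case1, case2, case3


print('%5.3f  %5.3f  %5.3f' % simulate())
```

In both the second and third cases, the check comes out true exactly when the two children are of the same sex, no matter which child is revealed. So the two estimates are identical for any given list of families, not just close.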

The mantra of Richard Hamming’s book Numerical Methods for Scientists and Engineers is

The purpose of computing is insight, not numbers.

I think this exercise is a good illustration of that. We didn’t really have to write the Monte Carlo program; just working out how we were going to write it gave us an understanding of the similarities and differences in the three sets of rules.


Email merging and attachments with Python

Yesterday I had to send out a bunch of personalized emails. The messages were mostly the same, but each message had some unique information (aside from the To address). I’m sure there are several email merge programs available, and probably even some online tools for doing this, but since I’ve already written a couple of Python scripts that send mail, I figured it would be at least as fast to rejigger one of those as it would be to find and learn a new app.

The one thing that worried me was the need for an attachment to be included with each email. My earlier scripts were for text messages only, and I didn’t know how to do the base64 encoding of the (PDF) attachment. Python has a base64 library, and I’m sure I would have learned a lot by digging into it, but I didn’t want to learn base64—I just wanted to get the damned emails out the door.

So I cheated. I sent myself a test email with the PDF attached. Then I opened it in Mail and invoked the View▸Message▸Raw Source command (⌥⌘U).

I copied the block of gibberish that represents the encoded file, all 192 kilobytes of it, and pasted it straight into the appropriate spot in my Python source code, making this by far the longest script I’ve ever written. Not as elegant as opening the PDF, reading it in, and encoding it—but a hell of a lot faster.

The rest of the script was pretty prosaic: loop through the lines of STDIN to get the recipient-specific text, stick it into the appropriate places of the template with the % operator, and call sendmail via the Popen function of the subprocess library. It was the cheat that made the script fun.
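For comparison, here is a sketch of the non-cheating route in Python 3, where the standard email library does the base64 encoding of the attachment. It is not the script described above. The template text, the pipe-separated input format, the subject line, the function names, and the sendmail path are all assumptions made up for illustration.

```python
import subprocess
from email.message import EmailMessage

# Hypothetical template; %s slots are filled per recipient with the % operator.
TEMPLATE = """Dear %s,

Your reference number is %s. The form is attached.
"""


def build_message(sender, line, pdf_bytes, filename='form.pdf'):
    """Build one message from an input line assumed to be 'address|name|reference'."""
    address, name, ref = line.rstrip('\n').split('|')
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = address
    msg['Subject'] = 'Your form'
    msg.set_content(TEMPLATE % (name, ref))
    # The library base64-encodes the bytes and sets the MIME headers.
    msg.add_attachment(pdf_bytes, maintype='application', subtype='pdf',
                       filename=filename)
    return msg


def send(msg):
    """Hand the message to sendmail; the path is an assumption for this system."""
    # -t reads recipients from the headers; -oi stops a lone '.' ending the input.
    subprocess.run(['/usr/sbin/sendmail', '-t', '-oi'],
                   input=msg.as_bytes(), check=True)
```

A driver would read the PDF once in binary mode, then loop over the lines of standard input calling build_message and send for each. Keeping send separate means the messages can be built and inspected without mailing anything.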


Relative links in PNotes

In the last big change to my no-server wiki system—which I now call PNotes—I added the ability to categorize notes by subdirectory. This made PNotes more organized, but because I took some shortcuts in the programming I lost some of the portability of the system. In the last few days I’ve fixed the code to get portability back and uploaded the fixes to the PNotes GitHub repository.

One of my goals with PNotes was to create a wiki-like system that was self-contained within a folder—no reliance on a database system. I wanted to be able to move the PNotes folder anywhere—from place to place on my main computer, to my notebook, to someone else’s computer, to a CDROM or DVD, to my iPhone—and still have it work. When I added the ability to have notes in subdirectories, I decided to use a <base> tag in the <head> of each note to make it easier to generate the list of links in the sidebar.

The <base> tag made all the sidebar links absolute instead of relative, and this broke the portability I wanted. I could still move a PNotes folder anywhere on any of my computers because my computers have all the utility programs needed to regenerate the HTML files in their new location. All I had to do was reset the <base> and run make clean; make in the new directory. But I couldn’t zip up a copy of a PNotes folder and send it to someone else, nor could I put it on my iPhone; the <base> would be wrong and screw up all the links.

Now, through a combination of Python and JavaScript, PNotes portability has been restored. You’ll need one line in the project.info file,

dirname = notes

to tell the system the name of the PNotes folder. I’ve always just used “notes,” but you can use any name as long as you set the dirname option accordingly. (I strongly suggest you stick to alphanumeric characters; the folder name is used in a regular expression, so special characters like parentheses and hyphens could mess things up.)
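The PNotes code isn’t shown here, so the following Python 3 sketch is only a hypothetical illustration of why special characters matter when a folder name is dropped into a regular expression, and of how re.escape would sidestep the problem. The function names, the pattern, and the sample URLs are all invented for the example.

```python
import re


def path_below(dirname, url):
    """Return the part of url after the PNotes folder, or None if it's absent."""
    # re.escape makes characters like parentheses match literally.
    pattern = r'/%s/(.*)$' % re.escape(dirname)
    m = re.search(pattern, url)
    return m.group(1) if m else None


def prefix_to_root(dirname, url):
    """Return the '../' prefix that leads from a note back to the PNotes folder."""
    below = path_below(dirname, url)
    if below is None:
        return None
    return '../' * below.count('/')


note = 'file:///Users/me/notes/projects/bridge.html'
print(prefix_to_root('notes', note))

odd = 'file:///Users/me/notes (old)/index.html'
print(path_below('notes (old)', odd))
# Without escaping, the parentheses become a group and the match fails.
print(re.search(r'/%s/' % 'notes (old)', odd))
```

Sticking to alphanumeric folder names, as suggested, avoids the issue without any escaping at all.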

Strictly speaking, the new PNotes isn’t perfectly portable. For example, the two “Edit” links in the sidebar won’t work unless the computer understands the txmt:// URL scheme for opening a file in TextMate (or BBEdit). And the contacts links won’t work unless it understands the addressbook:// scheme and has the same list of contacts with the same Address Book IDs. But all the links to notes should work on any computer.

The latest changes have made the GitHub README for PNotes a little out of date. I’ll be fixing it up in the next few days.