Posts Tagged ‘programming’

Updated Flickr URL script for TextExpander

Last week I wrote a little Python script that printed out the URL of a Flickr image when that image’s page is currently showing in Safari. I used that script with TextExpander to automatically type out the URL when I needed it without having to dig in a couple of levels to get the image URL by hand. I’ve since improved the script to be more flexible and easier to modify.

I won’t go through my motivation for writing the script; it’s laid out in last week’s post. I’ll just point out that there were two problems with the script as it was originally written:

  1. “Special” strings, like URLs, were buried in the code instead of defined at the beginning.
  2. It worked only when the current page in Safari was the main page for the photo. It failed when the current page was, for example, one of the “sized” pages for the image.

Both of these problems have been fixed with this new version.

 1:  #!/usr/bin/python
 2:  
 3:  import appscript
 4:  import re
 5:  import sys
 6:  from urllib import urlopen
 7:  
 8:  # The basic URL format for photos.
 9:  baseURL = 'http://www.flickr.com/photos/%s/%s/'
10:  
11:  # The regex for extracting user and photo info.
12:  infoRE = r'flickr\.com/photos/(.*)/(\d+)/?'
13:  
14:  # The various image URL suffixes.
15:  suffixes = {'master': '_m.jpg',
16:              'original':    '_o_d.jpg',
17:              'large':   '_b_d.jpg',
18:              'medium640':   '_z_d.jpg',
19:              'medium500':   '_d.jpg',
20:              'small': '_m_d.jpg',
21:              'thumbnail':   '_t_d.jpg',
22:              'square':  '_s_d.jpg'}
23:  
24:  # Get the URL of the frontmost Safari tab and extract the photo info.
25:  thisURL = appscript.app('Safari').documents[0].URL.get()
26:  info = re.findall(infoRE, thisURL)
27:  
28:  # Download the main page for that photo and get its "master URL."
29:  # Use the master to generate the URL for the medium500 image
30:  # and print it.
31:  try:
32:    user = info[0][0]
33:    id = info[0][1]
34:    pageURL = baseURL % (user, id)
35:    html = urlopen(pageURL).read()
36:    imageURL = re.search(r'<link\s+rel="image_src"\s+href="([^"]+)"', html).group(1)
37:    imageURL = imageURL.replace(suffixes['master'], suffixes['medium500'])
38:    sys.stdout.write(imageURL)
39:  
40:  # Print an error message if there's any problem.
41:  except:
42:    sys.stdout.write("wrongpagewrongpage")

Lines 8-22 pull all the special strings out to the top of the code, where they can be seen (and adjusted if Flickr changes its URL format). The new suffixes dictionary includes all the size possibilities, so it would be a simple matter to change the code to return, say, the Thumbnail URL; just change medium500 in Line 37 to thumbnail.

In the previous version of this script, the URL of the current Safari page would be downloaded and searched for the special <link rel="image_src" > tag. The problem with this was that some Flickr image pages (in particular, the pages associated with “sized” images) didn’t have this tag, so the search would fail. This version builds the URL of the photo’s main page from baseURL and downloads that instead of the current Safari page, ensuring that the <link> tag will be present.
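Here’s a minimal sketch of that normalization step on its own, written for Python 3 so it can be tried without Safari or appscript. The function name main_photo_page and the example “sized” URL are my own illustrations, not part of the script, and the regex is a slightly stricter variant of infoRE that won’t let the user name run across a slash.

```python
import re

# Same format string as baseURL in the script.
BASE_URL = 'http://www.flickr.com/photos/%s/%s/'

# Variant of infoRE (an assumption): the user name may not contain a slash.
INFO_RE = re.compile(r'flickr\.com/photos/([^/]+)/(\d+)')

def main_photo_page(url):
    """Return the main photo page for a Flickr photo URL, or None if
    the URL doesn't look like a Flickr photo URL at all."""
    match = INFO_RE.search(url)
    if match is None:
        return None
    # groups() is the (user, photo id) pair captured by the regex.
    return BASE_URL % match.groups()

# A hypothetical "sized" page URL maps back to the main page.
print(main_photo_page('http://www.flickr.com/photos/drdrang/4812406557/sizes/m/'))
```

Anything after the photo ID is simply ignored, which is why the sized pages no longer cause trouble.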

Errors are now handled through exceptions instead of an if/else test. This allows us to handle a multitude of errors with a single error message.

As before, I have this script saved as a Shell Script in TextExpander and tied to an abbreviation of ;500. Now it’s a snap to enter Flickr image URLs wherever I need them.


A quick script for Avery 5160 labels

An occasional theme of these blog posts is the value of scripting as a way of showing your computer who’s boss. If you can’t write scripts—and here I’m adopting a broad definition of scripts to include macros and other customizations—you become a slave to your computer, forced to do things the way it wants you to instead of the way that’s most natural and efficient for you.

Tonight I helped my wife and a friend who needed to print up a bunch of labels for the neighborhood swim team. The computer program the swim team uses to run the meets and keep track of times can print labels, but not the type of label they needed. What they needed was a label for each swimmer who swam faster than the “city time”1 in one or more events this year. The labels were to be stuck to the backs of ribbons given out at tomorrow morning’s awards ceremony. (Oh yes, I forgot to mention: there was a looming deadline for these labels.)

What we had was a printed list of all the swimmers who were to get ribbons and the events in which they’d made city times. What we needed was to get that information printed on a set of Avery 5160 labels. Some division of labor was in order.

The list went to our daughter, the fastest non-professional typist I know. I asked her to retype the list of names and events into a text file with a simple format and email it to me. Yes, we could have transferred the file over the network here in the house, but it’s usually fastest to use the tools you’re most familiar with.

While she was typing, I pulled up my ancient Perl code for printing file folder labels and started modifying it. That script was written for Avery 5161 label sheets, which have two columns of ten labels each. The 5160 sheets have three columns of ten labels each, so the necessary changes were obvious:

These changes went smoothly, and after a few syntax errors—programming in Python has gotten me out of the habit of ending statements with a semicolon—the script was up and running. It takes a text file that looks like this:2

#Irene Hartnett|2010
50 Free, 50 Back, 50 Breast
50 Fly, 100 IM

#Rosalina Reial|2010
50 Fly

#Darrin Schrick|2010
50 Back, 50 Fly

#Douglas Dunnam|2010
50 Free, 50 Back, 50 Breast
50 Fly, 100 IM

#Carmina Jaworsky|2010
50 Fly

#Elliott Oland|2010
50 Free, 50 Back, 50 Breast
50 Fly, 100 IM

#Salena Angel|2010
25 Back

#Elroy Tigert|2010
50 Free

#Robin Turiano|2010
50 Back

#Jolyn Mcclerkin|2010
50 Free, 50 Breast
50 Fly, 100 IM

#James Wolf|2010
50 Free, 50 Back, 50 Breast
50 Fly, 100 IM

#Renea Addison|2010
50 Free, 50 Back, 50 Breast
50 Fly, 100 IM

#Jesse Bautista|2010
25 Free, 50 Free, 25 Back
25 Breast, 25 Fly

#Gail Tanous|2010
50 Free, 50 Breast, 50 Fly

#Miguel Loarca|2010
25 Free, 50 Free, 25 Back
25 Breast, 25 Fly

and generates a PDF that looks like this:

I didn’t ask my daughter to type in the hash marks, vertical bars, or the year; I added those to the file with a couple of search-and-replace commands. I used the same formatting rules as my file folder label program:

Here’s the script that does it:

  1:  #!/usr/bin/perl
  2:  
  3:  use Getopt::Std;
  4:  
  5:  # Usage/help message.
  6:  $usage = <<USAGE;
  7:  Usage: prlabels [options] [filename]
  8:  Print file folder labels on Avery 5160 sheets
  9:  
 10:    -r m : start at row m (range: 1..10; default: 1)
 11:    -c n : start at column n (range 1..3; default: 1)
 12:    -h   : print this message
 13:  
 14:  If no filename is given, use STDIN. A label entry is a plain text
 15:  series of non-blank lines. Blank lines separate entries.
 16:  
 17:  The first line of an entry is special. If it starts with a #, then it's
 18:  considered a header line. Everything in the header line up to the | is
 19:  printed flush left in bold and everything after the | is printed flush
 20:  right in bold. Subsequent lines are printed centered in normal weight.
 21:  If the first line of an entry doesn't start with #, it uses the header
 22:  of the previous entry.
 23:  USAGE
 24:  
 25:  # Set up geometry constants for Avery 5160.
 26:  $topmargin = 0.60;
 27:  $poleft = 0.4;
 28:  $pomiddle = 3.20;
 29:  $poright = 5.95;
 30:  $lheight = 1;
 31:  
 32:  # get starting point from command line if present
 33:  getopts('hr:c:', \%opt);
 34:  die $usage if ($opt{h});
 35:  
 36:  $row = int($opt{r}) || 1;    # chop off any fractional part and
 37:  $col = int($opt{c}) || 1;    # fall back to 1 if unset or zero
 38:  
 39:  # Bail out if position options are out of bounds
 40:  die $usage unless (($row >= 1 and $row <= 10) and 
 41:                     ($col >= 1 and $col <= 3));
 42:  
 43:  # Set initial horizontal and vertical positions.
 44:  if ($col == 1) {
 45:    $po = $poleft;
 46:  } elsif ($col == 2) {
 47:    $po = $pomiddle;
 48:  } else {
 49:    $po = $poright;
 50:  }
 51:  $sp = ($topmargin + ($row - 1)*$lheight);
 52:  
 53:  # Pipe output through groff and ps2pdf.
 54:  open OUT, "| groff | ps2pdf -";
 55:  # open OUT, "> labels.rf";    # for debugging
 56:  select OUT;
 57:  
 58:  # Set up document.
 59:  print <<SETUP;
 60:  .ps 11
 61:  .vs 15
 62:  .ll 2.20i
 63:  .ta 2.20iR
 64:  
 65:  SETUP
 66:  
 67:  # The troff code for formatting a single entry, with placeholders for
 68:  # positioning on the page. The magic numbers embedded in the formatting
 69:  # commands make the layout look nice.
 70:  $label = <<ENTRY;
 71:  .sp |%.2fi
 72:  .po %.2fi
 73:  .ft HB
 74:  %s
 75:  .ft H
 76:  .ce 3
 77:  %s
 78:  .ce 0
 79:  ENTRY
 80:  
 81:  # Slurp all the input into an array of entries.
 82:  $/ = "";
 83:  @entries = <>;
 84:  
 85:  $bp = 0;                  # we don't want to start with a page break
 86:  
 87:  foreach $body (@entries) {
 88:    # Parse and transform the header and body.
 89:    if ($body =~ /^#/) {    # it's a header line
 90:      ($header, $body) = split(/\n/, $body, 2);
 91:      $header = substr($header, 1);
 92:      $header =~ s/\|/\t/;
 93:    }  
 94:    $body =~ s/\s+$//;
 95:  
 96:    # Break page if we ran off the end.
 97:    if ($bp) {
 98:      print "\n.bp\n";      # issue the page break command
 99:      $bp = 0;              # reset flag
100:    }
101:    
102:    # Print the label.
103:    printf $label, $sp, $po, $header, $body;
104:    
105:    # Now we set up for the next entry.
106:    if ($col == 1){       # last entry was in the left column
107:      $col = 2;             # so the next will be in
108:      $po = $pomiddle;      # the middle column
109:    } elsif ($col == 2) { # last entry was in the middle column
110:      $col = 3;             # so the next will be in
111:      $po = $poright;       # the right column
112:    } else {              # last was in the right column
113:      $col = 1;             # so the next will be in
114:      $po = $poleft;        # the left column
115:      $row++;               # of the next row
116:      if ($row > 10) {      # we're at the end of the page
117:        $bp = 1;            # page break flag
118:        $row = 1;           # new page starts at top row
119:      }
120:      $sp = ($topmargin + ($row - 1)*$lheight);
121:    }
122:  }

I fed the input file into this script and sent the resulting PDF to the printer. Lots of labels in very little time.
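For anyone who’d rather not read Perl, here’s a rough Python 3 sketch of just the position bookkeeping, assuming the same geometry constants as the script. The function name label_positions is hypothetical, and the sketch tracks the page number directly instead of using a page break flag.

```python
# Geometry for Avery 5160, in inches, copied from the Perl script.
TOP_MARGIN = 0.60
OFFSETS = (0.4, 3.20, 5.95)   # left, middle, and right columns
LABEL_HEIGHT = 1.0
ROWS = 10

def label_positions(count, row=1, col=1):
    """Yield (page, vertical, horizontal) for `count` labels, filling
    each row left to right and starting a new page after row 10."""
    page = 1
    for _ in range(count):
        yield page, TOP_MARGIN + (row - 1) * LABEL_HEIGHT, OFFSETS[col - 1]
        col += 1
        if col > len(OFFSETS):    # ran off the right edge
            col = 1
            row += 1
            if row > ROWS:        # ran off the bottom of the sheet
                row = 1
                page += 1

for position in label_positions(4):
    print(position)
```

The row and col arguments play the same role as the -r and -c options: they let you start partway down a sheet that’s already been partly used.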

More important, we now have a tool for doing this again and again. Maybe next year we’ll be able to save a step by getting the swim team’s program to spit out a text file instead of a printed list that has to be retyped.


  1. This is a somewhat arbitrary “wheat from chaff” separator. It’s called a city time because those who beat it get to compete in a city-wide meet at the end of the season. 

  2. By the way, if you ever need to create some fake names, I suggest The Name Generator


Flickr image URL via TextExpander

If you’re a Flickr user, you’ve probably been trying out its new layout. For the most part, I like it. The photos are bigger and there’s less clutter elsewhere on the page. But it’s not an unalloyed improvement. My biggest disappointment with the new layout has to do with using images from my Flickr stream here on the blog; it takes longer now to pluck out the URL of an image than it used to. This prompted me to write up a Python script—which I can call via TextExpander—that gets the URL of the image showing in the current Safari page.

Let me first clarify what I mean by “image URL.” I don’t mean the URL of the page that shows the image; that would be something like

http://www.flickr.com/photos/drdrang/4812406557/

No, I mean the URL of the image itself, specifically the 500-pixel wide size. That URL looks like

http://farm5.static.flickr.com/4141/4812406557_36acccbccd_d.jpg

I want the 500-pixel version because it’s a good size to fit in this blog.

Other sizes are available; they’ll have the same URL except for the part between the last underscore and the .jpg. We’ll talk about that in more detail later.

In the old Flickr layout, there was a set of buttons across the top of the photo.

Clicking the “All Sizes” button would take me to a page showing the Large version (1024 pixels wide) of the photo and a set of buttons for other sizes. Clicking the Medium button would take me to a similar page that was showing a 500-pixel wide version of the photo. Below that were a couple of text fields, the second of which contained the image URL for the Medium size.

I’d copy that and paste it into the post I was writing. It was a little cumbersome, but took only two or three clicks to get the URL I was after.

Now the “All Sizes” navigation is done through a menu that requires two clicks instead of one.

I still have to click the Medium button to get the size I want (no change there), but now there’s no field with the image URL. I have to right click (or control click) on the Download link and then drag to (or click on) the Copy Link item in the popup menu.

As I write out the steps, it doesn’t seem like the new layout requires me to do much more than the old one. One click more, maybe two, depending how you count. But it seems to go much slower because

So that’s the motivation for the script. Maybe my perception is off, but it sure seems to take a good deal longer to grab an image URL now than it used to.

Here’s the script itself:

 1:  #!/usr/bin/python
 2:  
 3:  import appscript
 4:  import re
 5:  import sys
 6:  from urllib import urlopen
 7:  
 8:  # Get the URL of the frontmost Safari tab.
 9:  pageURL = appscript.app('Safari').documents[0].URL.get()
10:  
11:  if 'flickr.com/photos' in pageURL:
12:    # Get the medium-sized image URL for the displayed photo.
13:    html = urlopen(pageURL).read()
14:    imageURL = re.search(r'<link\s+rel="image_src"\s+href="([^"]+)"', html).group(1)
15:    imageURL = imageURL.replace('_m.jpg', '_d.jpg')
16:    sys.stdout.write(imageURL)
17:  else:
18:    sys.stdout.write("wrongpagewrongpage")

Update 7/26/10
Here’s an improved version of this script that’s more flexible in how it gets the image URL and is easier to modify for other purposes.

If you want to use it or modify it for your own purposes, you’ll have to install the nonstandard appscript module. Line 9 uses that module to get the URL of the frontmost Safari page.

The rest of the script is just garden-variety Python. Line 13 retrieves the HTML of the photo page, and line 14 plucks out from it the “master URL” for the image. The <head> section of the photo page will have a <link> tag that looks like this:

<link rel="image_src" href="http://farm5.static.flickr.com/4141/4812406557_36acccbccd_m.jpg">

The href attribute is the master URL; all the different sizes of this photograph will have the same URL but for the part between the last underscore and the .jpg extension. Here’s a table of the options.

Size (width)      Suffix
Original          o_d
Large (1024)      b_d
Medium (640)      z_d
Medium (500)      d
Small (240)       m_d
Thumbnail (100)   t_d
Square (75)       s_d

Line 15 converts the master URL to a size-specific one for the smaller of the Medium sizes. Line 16 then sends it to standard output.
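As a sketch of how the suffix table might be used to get any size, here’s a small Python 3 function that pulls the master URL out of a page’s HTML and swaps in the suffix you ask for. The names sized_url and SUFFIXES are hypothetical, and it assumes the master URL always ends in _m.jpg, as it does in the example above.

```python
import re

# Suffixes from the table, keyed by names of my own choosing.
SUFFIXES = {'original': '_o_d.jpg', 'large': '_b_d.jpg',
            'medium640': '_z_d.jpg', 'medium500': '_d.jpg',
            'small': '_m_d.jpg', 'thumbnail': '_t_d.jpg',
            'square': '_s_d.jpg'}
MASTER_SUFFIX = '_m.jpg'
LINK_RE = re.compile(r'<link\s+rel="image_src"\s+href="([^"]+)"')

def sized_url(html, size='medium500'):
    """Return the image URL of the given size, or None if the page
    has no usable <link rel="image_src"> tag."""
    match = LINK_RE.search(html)
    if match is None or not match.group(1).endswith(MASTER_SUFFIX):
        return None
    master = match.group(1)
    # Swap only the trailing suffix, leaving the rest of the URL alone.
    return master[:-len(MASTER_SUFFIX)] + SUFFIXES[size]

sample = '<link rel="image_src" href="http://farm5.static.flickr.com/4141/4812406557_36acccbccd_m.jpg">'
print(sized_url(sample))
print(sized_url(sample, 'thumbnail'))
```

Unlike the replace call in Line 15, this touches only the end of the URL, so an _m.jpg that happened to appear earlier in the string would be left alone.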

If your front Safari page isn’t a Flickr photo page, the test in Line 11 should catch that and the script will print wrongpagewrongpage instead of a URL. This may seem a little childish, but it’s a distinctive error message that can be selected for deletion with a quick double-click.

I have this script saved as a Shell Script in TextExpander, with an abbreviation of ;500. The semicolon is there because that’s the signal character I use at the beginning of all of my abbreviations. The 500 is the mnemonic for the width of the image.

Since creating this abbreviation, I’m finding it much easier to include photos from my Flickr stream in the blog.

The script could, of course, be modified to return URLs for other image sizes—just change Line 15. More interestingly, it could be the start of a script that downloads images of one or more sizes. I leave that as an exercise for the reader.


Embedded Google maps

I’m my company’s default webmaster, and I spent a few hours today learning how to embed Google maps into our web pages. I figured I’d write up an example before I forget.

There’s a whole family of Google Maps APIs, covering JavaScript, Flash, static images, Google Earth images, and more. The JavaScript API makes the most sense for my website. I read through the tutorial and started modifying its simple example while consulting the developer’s guide and sections of the reference.

There are dozens of possibilities for creating an embedded map. I chose a fairly simple map with a custom marker and an info window. The result looks like this static image,

and is generated by this HTML/JavaScript/CSS melange:

 1:  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 2:     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 3:  <html>
 4:  <head>
 5:     <title>Embedded Google Map</title>
 6:     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 7:     <script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=false"></script>
 8:     <script type="text/javascript">
 9:      function initialize() {
10:        var drangLL = new google.maps.LatLng(41.74593, -88.17815);
11:        var centerLL = new google.maps.LatLng(41.743, -88.18);
12:        var myOptions = {
13:          zoom: 15,
14:          center: centerLL,
15:          scrollwheel: false,
16:          mapTypeId: google.maps.MapTypeId.HYBRID
17:        };
18:        
19:        var map = new google.maps.Map(document.getElementById("map_canvas"), myOptions);
20:        
21:        var contentString = '<span style="font-family: Helvetica, Arial;font-size: 80%">I\'m crossing this bridge in the<br />Springbrook Prairie Preserve.<br />Zoom in to see it.</span>';
22:        var infoWindow = new google.maps.InfoWindow({
23:          content: contentString });
24:          
25:        var marker = new google.maps.Marker({
26:          position: drangLL, 
27:          map: map,
28:          icon: "http://www.leancrew.com/all-this/images2010/littledrang.png",
29:          title:"What am I doing?"});
30:          
31:        google.maps.event.addListener(marker, 'click', function(){infoWindow.open(map, marker);});
32:      }
33:  
34:     </script>
35:     <style type="text/css">
36:       #map_canvas {
37:         height: 500px;
38:         width: 600px;
39:         display: block;
40:         margin-left: auto;
41:         margin-right: auto;
42:       }
43:     </style>
44:  </head>
45:  <body onload="initialize()">
46:    <p id="map_canvas"></p>
47:  </body>
48:  </html>

Line 7 imports the API, and Lines 8-31 use its methods to generate the map.

A variable for the position (latitude and longitude) of the marker is defined in Line 10, and one for the position of the initial center of the map is defined in Line 11. The options for the map itself, defined in Lines 12-17, set the initial zoom, center, and map type. I also decided to disable zooming with the scrollwheel, one of Google’s stupider ideas and a behavior that drives me crazy. It’s called a scrollwheel because it’s for scrolling. Google Maps is the only place I’ve seen a scrollwheel used for zooming.

The map is created in Line 19 using the options defined earlier. It’s placed in the item with ID map_canvas, which is the paragraph in Line 46.

The info window and its contents are defined in Lines 21-23. Pretty much self-explanatory, I think.

The custom marker is defined and added to the map in Lines 25-29. By default, the bottom center of the image is placed at the given position; this can be changed if there’s another point on your image that you want to be the “hot spot.” I didn’t bother.

Line 31 associates the info window with the marker and causes it to appear when the marker is clicked.

Lines 35-43 set up some simple CSS properties for the map.

Finally, the map code is invoked via the onload handler in the <body> tag on Line 45.

The result is a map with my head on the bike path in the Springbrook Prairie Preserve. You can do all the panning and zooming you’re used to in Google Maps. Clicking on my head pops up the info window.

The coding was pretty simple; the most time-consuming part was getting the latitude and longitude of the marker just right. If you zoom all the way in, you’ll see that my head is on a bridge over a small creek. To get it centered on the bridge I had to define the latitude and longitude in drangLL to 5 decimal places.
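To get a feel for what five decimal places buys you, here’s a back-of-the-envelope Python 3 sketch. It treats the Earth as a sphere with a mean radius of 6,371 km, which is an approximation I’m assuming for illustration, and the function name meters_per_step is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean radius; spherical approximation

def meters_per_step(lat_deg, decimals):
    """Return the north-south and east-west distances, in meters,
    covered by one unit in the last decimal place of a coordinate."""
    step = 10 ** -decimals                       # degrees
    per_degree = math.pi * EARTH_RADIUS_M / 180  # meters per degree of latitude
    north_south = step * per_degree
    # Lines of longitude converge toward the poles, hence the cosine.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

ns, ew = meters_per_step(41.74593, 5)
print('%.2f m north-south, %.2f m east-west' % (ns, ew))
```

At this latitude, the last digit moves the marker by roughly a meter in either direction, which is about the precision needed to sit on a bike path bridge.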


New Metra schedule for Simplenote

As I mentioned last November, I have plain text versions of the Metra commuter rail schedule between Chicago and Naperville (where I live) saved in Simplenote. Metra made some changes to the schedule this week, so I updated and decided to make the files available in a GitHub repository.

On the iPhone, the schedules look like this:

Because Simplenote uses Helvetica, a proportional font, and doesn’t have adjustable tab stops (even if it did, there’s no tab key on the iPhone for entering them), the columns don’t line up perfectly, but they look OK. Until the iPhone gets a decent monospaced font, this will have to do.

There are six schedule files in the repository:

  1. Eastbound Monday through Friday
  2. Eastbound Saturday
  3. Eastbound Sunday
  4. Westbound Monday through Friday
  5. Westbound Saturday
  6. Westbound Sunday

Also in the repository is a Python script, metra.py, that I wrote to reformat the schedule times from the way they’re presented on the Metra web page.

I copy the schedule times from the box and paste them into a text editor. Generally, I get something that looks like this,

  08:40  10:40  12:40  02:40  04:40  06:40  08:40  10:40  12:40
Naperville  09:37  11:37  01:37  03:37  05:37  07:37  09:37  11:37  01:37

which has some extra stuff at the beginning of each line that needs to be deleted to make it look like this:

08:40  10:40  12:40  02:40  04:40  06:40  08:40  10:40  12:40
09:37  11:37  01:37  03:37  05:37  07:37  09:37  11:37  01:37

Then I copy those lines and execute

pbpaste | python metra.py | pbcopy

which puts the reformatted schedule,

  8:40a            9:37a
10:40a          11:37a
12:40p            1:37p
  2:40p            3:37p
  4:40p            5:37p
  6:40p            7:37p
  8:40p            9:37p
10:40p          11:37p
12:40a            1:37a

onto the clipboard. It looks weird here, but that’s because you’re seeing it in a monospaced font, not Helvetica. Finally, I paste the times into the Simplenote webapp and do some minor editing. This usually consists of

Here’s metra.py:

 1:  #!/usr/bin/python
 2:  
 3:  import re
 4:  import sys
 5:  
 6:  # Collect the two rows of data.
 7:  start = sys.stdin.readline().split()
 8:  stop = sys.stdin.readline().split()
 9:  
10:  # Change leading zeros to two spaces.
11:  start = [re.sub(r'^0', '  ', s) for s in start]
12:  stop = [re.sub(r'^0', '  ', s) for s in stop]
13:  
14:  # Print the data as two columns, using a simple heuristic for am/pm.
15:  ap = 'a'
16:  for i in range(0, len(start)):
17:    if start[i][0:2] == '12':
18:      if ap == 'a':
19:        ap = 'p'
20:      else:
21:        ap = 'a'
22:    
23:    print ' %s%s          %s%s' % (start[i], ap, stop[i], ap)

The AM/PM test is in Lines 17-21. This works pretty well for the Naperville-Chicago schedule and would probably be OK for other schedules, too. I thought about writing a routine that would work in every case, but it just wasn’t worth the effort. With this simple test, I only had to change a few as and ps.
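Here’s the same heuristic recast as a Python 3 function that returns the lines instead of printing them, which makes it easier to see where it goes wrong. The function name reformat is hypothetical; the logic is meant to match metra.py.

```python
import re

def reformat(start_row, stop_row):
    """Return two-column schedule lines, flipping between 'a' and 'p'
    each time a departure in the 12 o'clock hour is seen."""
    start = [re.sub(r'^0', '  ', s) for s in start_row.split()]
    stop = [re.sub(r'^0', '  ', s) for s in stop_row.split()]
    ap = 'a'
    lines = []
    for s, t in zip(start, stop):
        if s.startswith('12'):
            ap = 'p' if ap == 'a' else 'a'
        # The arrival gets the same suffix as the departure.
        lines.append(' %s%s          %s%s' % (s, ap, t, ap))
    return lines

rows = ('08:40  10:40  12:40  02:40  04:40  06:40  08:40  10:40  12:40',
        '09:37  11:37  01:37  03:37  05:37  07:37  09:37  11:37  01:37')
print('\n'.join(reformat(*rows)))
```

Because the arrival time inherits its suffix from the departure, a train that leaves at 11:40 in the morning and arrives at 12:37 would get an a on both times. Two departures in the same 12 o’clock hour would also flip the suffix twice. Those are the kinds of entries that need fixing by hand.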

If you’re a Simplenote user who lives in Naperville, the schedule files are pretty handy as is. If you’re a Simplenote user who lives in another town on the Chicago-Aurora line, or on another Metra line entirely, you can use metra.py to create your own Simplenote files.

If you’re not a Simplenote user, you should give it a try. It may be that Jesse Grosjean’s Dropbox-syncing suite of programs will end up working more smoothly, but until that happens, Simplenote is leading the pack.


Monte Carlo and the Two Child Problem

In the previous post about the Two Child Problem, we thought about how the probabilities would change under different rules. In this post, let’s write those rules into a program and see how the probabilities change in a Monte Carlo (no relation to Monty Hall) simulation.

To review, the Two Child Problem is this:

Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

The answer depends on what rules we think the questioner is following. We’ll look at three cases:

  1. The questioner would never pose this problem if Mr. Smith had two daughters. The problem is restricted to families with at least one son and the question is always about the probability of two sons.
  2. The questioner isn’t restricted at all. He simply tells us about one child, chosen at random, in a two-child family and asks us if the other child is of the same sex.
  3. The questioner is biased toward boys. If there’s at least one boy in the family, that’s what he tells us; if the family has two girls, he tells us there’s at least one girl. In either case, he asks for the probability that the other child is of the same sex.

In Monte Carlo simulation, we use the computer to generate lots of random events and then combine the counts of those random events to estimate probabilities. For the Two Child Problem, we’ll simulate “families” by generating pairs of letters: G for girls, B for boys. The counts we need to keep track of are:

  n: the total number of families
  n_{sons}: the number of families with at least one son
  n_{2sons}: the number of families with two sons
  n_{2daughters}: the number of families with two daughters

Note that n_{sons} + n_{2daughters} = n.

For the first case, we’re eliminating from consideration the families with two daughters, so the probability will be

\frac{n_{2sons}}{n_{sons}}

For the second case, we include all the families. Since we’re choosing the “revealed” child at random and asking if the other child is of the same sex, it’s equivalent to going through the list of all the families and picking out the boy-boy and girl-girl families. The probability will be

\frac{n_{2sons} + n_{2daughters}}{n}

The third case is a little trickier. Recognize first that if the family has any boys, the questioner will ask about boys and the probability will be calculated as in the first case. The questioner will ask about girls only if the family has two girls, so the probability of having two children of the same sex under that condition is 1. We use conditional probability to combine these situations:

\begin{eqnarray} P(\textrm{same sex}) & = & P(\textrm{same sex} | \textrm{boys} \ge 1)P(\textrm{boys} \ge 1) \\ & & + P(\textrm{same sex} | 2\;\textrm{girls}) P(2\;\textrm{girls}) \end{eqnarray}

With our variables, this becomes

\left(\frac{n_{2sons}}{n_{sons}}\right) \cdot \left(\frac{n_{sons}}{n}\right) + 1 \cdot \left(\frac{n_{2daughters}}{n}\right)

With a little algebra this formula reduces to that of the second case. Which means that these two sets of rules are equivalent, even though they don’t seem to be.
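The algebra amounts to canceling n_{sons} in the first term and combining what’s left over the common denominator:

```latex
\left(\frac{n_{2sons}}{n_{sons}}\right) \cdot \left(\frac{n_{sons}}{n}\right) + 1 \cdot \left(\frac{n_{2daughters}}{n}\right)
  = \frac{n_{2sons}}{n} + \frac{n_{2daughters}}{n}
  = \frac{n_{2sons} + n_{2daughters}}{n}
```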

Here’s a Python program that implements these ideas.

 1:  #!/usr/bin/python
 2:  
 3:  from __future__ import division
 4:  from random import choice
 5:  
 6:  n = 10000
 7:  sexes = ('G', 'B')
 8:  families = []
 9:  
10:  for i in range(n):
11:    families.append((choice(sexes), choice(sexes)))
12:  
13:  nsons = len([x for x in families if 'B' in x])
14:  n2sons = len([x for x in families if x == ('B', 'B')])
15:  n2daughters = len([x for x in families if x == ('G', 'G')])
16:  
17:  print '''If we restrict ourselves to families that have at least one son,
18:  the probability of having two sons is %d/%d = %5.3f''' % (n2sons, nsons, n2sons/nsons)
19:  
20:  print
21:  
22:  print '''If we choose the "revealed" child at random, the probability of having
23:  two children of the same sex is %d/%d = %5.3f''' % (n2sons+n2daughters, n, (n2sons+n2daughters)/n)
24:  
25:  print
26:  
27:  print '''If we "reveal" boys in every case except when there are two daughters,
28:  the probability of having two children of the same sex is
29:  (%d/%d)*(%d/%d) + 1*(%d/%d) = %5.3f''' % (n2sons, nsons, nsons, n, n2daughters, n, n2sons/nsons*nsons/n+n2daughters/n)

We use the choice function from the random module to generate 10,000 simulated families as a list of tuples. Lines 13-15 then filter the list according to certain criteria and count the number of families left. Line 17 onward does the calculations according to the formulas above and prints the results.

Here’s a sample of the output.

If we restrict ourselves to families that have at least one son,
the probability of having two sons is 2520/7535 = 0.334

If we choose the "revealed" child at random, the probability of having
two children of the same sex is 4985/10000 = 0.498

If we "reveal" boys in every case except when there are two daughters,
the probability of having two children of the same sex is
(2520/7535)*(7535/10000) + 1*(2465/10000) = 0.498

Based on the reasoning of the earlier post, the answers are what we expected. But thinking the problem through from a Monte Carlo perspective does give a different view of what the various rules mean.
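As a cross-check on the simulation, here’s a short Python 3 sketch that skips the random numbers and simply enumerates the four equally likely families, applying the same three formulas with exact fractions. This isn’t part of the original program; it assumes, as the simulation does, that boys and girls are equally likely and that the two children are independent.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families: GG, GB, BG, BB.
families = list(product('GB', repeat=2))
n = len(families)

nsons = sum(1 for f in families if 'B' in f)
n2sons = families.count(('B', 'B'))
n2daughters = families.count(('G', 'G'))

# The three cases, using the same formulas as the Monte Carlo program.
case1 = Fraction(n2sons, nsons)
case2 = Fraction(n2sons + n2daughters, n)
case3 = case1 * Fraction(nsons, n) + Fraction(n2daughters, n)

print(case1, case2, case3)   # prints 1/3 1/2 1/2
```

The simulated values of 0.334 and 0.498 are within sampling noise of these exact answers.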

The mantra of Richard Hamming’s book Numerical Methods for Scientists and Engineers is

The purpose of computing is insight, not numbers.

I think this exercise is a good illustration of that. We didn’t really have to write the Monte Carlo program; just working out how we were going to write it gave us an understanding of the similarities and differences in the three sets of rules.