Getting those Apollo photos from Flickr

I’m sure you’ve heard about this new Flickr account with about 13,000 scanned photos from the Apollo program. You may have scrolled through the albums and downloaded a few photos. And you may have thought it would be cool if you could just download all of them. If you have about 60 GB of disk space free, you can.

First, I should mention that Ryan M got there before me. I knew he was working on it because of this Twitter exchange between him and Stephen Hackett:

If anyone has a clever way to scrape these, let me know.

Not that I would try it.


Stephen Hackett (@ismh) Oct 2 2015 8:50 PM

I didn’t know until today that he’d posted the code he used as a Gist, which would’ve saved me some of the time I spent writing my own scripts. By the time I saw his code, I’d already done my own clumsy job, but I have shamelessly stolen some of his good ideas and incorporated them into my scripts to make me look smarter in this post.

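Ryan and I both access the Flickr API through Sybren Stüvel’s flickrapi Python library, which I’ve used before for several little utility scripts and snippets. And we both go through the same steps:

  1. Use the Flickr API to gather the list of albums, the photos in each album, and the URL of the original image for each photo.
  2. Download the images from those URLs.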

The fundamental difference between our approaches is this: Ryan does both of these steps in a single script, whereas I use one script for the first step and another for the second. I save the information gathered in the first step in a file that the second script reads.

Breaking up the functionality this way didn’t help me, but it will help you, because I’m making the file with all the information gathered through the Flickr API available to download. All you’ll need to download the Apollo photos are a standard Python distribution (the one that comes with OS X will do), the information file, and my second script. No need to install the flickrapi library, and more important, no need to wait through all the API calls. For whatever reason, Flickr’s responses are incredibly slow. Far and away, most of the time is spent gathering the information from the API—the downloads themselves are relatively quick. Having the information file will allow you to skip the most time-consuming part of the process.

Of course, I’m still going to show you both of my scripts. Here’s apollo-photo-list, the one that gathers the information:

 1:  #!/usr/bin/env python
 3:  from flickrapi import FlickrAPI
 4:  import json
 6:  # Flickr parameters
 7:  fuser = 'username'
 8:  key = 'apikey'
 9:  secret = 'apisecret'
10:  nasa = '136485307@N06'
12:  def getOriginalURL(id):
13:    s = flickr.photos.getSizes(photo_id=id)
14:    for i in s['sizes']['size']:
15:      if i['label'] == 'Original':
16:        return i['source']
18:  flickr = FlickrAPI(key, secret, format='parsed-json')
19:  psets = flickr.photosets.getList(user_id=nasa)
21:  for set in psets['photosets']['photoset']:
22:    print '"{}": '.format(set['title']['_content'].replace('/', '-'))
23:    photos = []
24:    pList = flickr.photosets.getPhotos(photoset_id=set['id'])
25:    for p in pList['photoset']['photo']:
26:      pDict = {}
27:      pDict['photoID'] = p['id']
28:      pDict['photoTitle'] = p['title']
29:      pDict['photoURL'] = getOriginalURL(p['id'])
30:      photos.append(pDict)
31:    print json.dumps(photos)
32:    print

The fuser, key, and secret values on Lines 7–9 come from registering with Flickr. I got the nasa user ID on Line 10 by interactively making API queries on one of the photos in the gallery. The rest of the script is pretty standard stuff. What the user-facing part of Flickr calls “albums,” the API calls “photosets,” so that’s why you see that term so often in the code.

I will say that getting the information as parsed JSON (see Line 18) instead of in the default ElementTree format makes the script much easier to read and write. This is what I stole from Ryan. In my first go-around, I used the default format, and it was a big mistake. The Flickr data formats are deeply nested, and it’s difficult and time-consuming to work out the structure when it’s in ElementTree format. Printing an ElementTree value usually results in an opaque response like this:

<Element rsp at 0x105d88560>

Printing a parsed JSON value, on the other hand, shows you everything. It tells you which items are dictionaries and which are lists, and it gives you the keys for all the dictionaries, no matter how deeply nested they are. I would’ve saved a lot of time if I’d used JSON from the start.
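To make that concrete, here’s a small sketch (Python 3 syntax, standard library only) that pretty-prints a nested structure shaped like the photoset list. The sample dictionary is a made-up stand-in, not real Flickr output; only the keys are taken from the script above.

```python
import json

# Hypothetical stand-in for a parsed-JSON response. The keys mirror the ones
# apollo-photo-list reads; the values are invented for illustration.
response = {
    "photosets": {
        "photoset": [
            {"id": "0001", "title": {"_content": "Example Album 1/A"}},
            {"id": "0002", "title": {"_content": "Example Album 2"}},
        ]
    }
}

# Pretty-printing exposes the nesting: braces are dictionaries, brackets are lists.
pretty = json.dumps(response, indent=2, sort_keys=True)
print(pretty)

# Once the structure is visible, drilling down is just a chain of keys.
titles = [s["title"]["_content"] for s in response["photosets"]["photoset"]]
print(titles)
```

With the structure laid out like this, working out the chain of keys needed in a loop is mostly a matter of reading down the indentation.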

The result of running apollo-photo-list, which took a few hours to complete, is a giant text file that looks almost like a JSON file. A little editing in BBEdit turned it into a legitimate JSON file that can be read and parsed by the second script.
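I did that editing by hand, but it could be scripted. The sketch below (Python 3 syntax) assumes the output looks exactly like what the print statements in apollo-photo-list produce: a quoted title with a colon, the JSON list on the next line, then a blank line. The function name and the sample text are mine, invented for illustration.

```python
import json

def to_json_text(raw):
    """Wrap the printed album records in braces and separate them with commas."""
    # Records are separated by blank lines; drop any empty leftovers.
    records = [r.strip() for r in raw.split("\n\n") if r.strip()]
    return "{\n" + ",\n".join(records) + "\n}"

# Hypothetical miniature of the script's output (two albums, invented values).
raw_output = (
    '"Album A": \n'
    '[{"photoID": "1", "photoTitle": "a1", "photoURL": "http://example.com/1.jpg"}]\n'
    '\n'
    '"Album B": \n'
    '[{"photoID": "2", "photoTitle": "b1", "photoURL": "http://example.com/2.jpg"}]\n'
    '\n'
)

albums = json.loads(to_json_text(raw_output))
print(sorted(albums))
```

This would choke on an album title that contains a double quote, so it’s a convenience for well-behaved output, not a general parser.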

You may be wondering why apollo-photo-list didn’t just create a large data structure and dump it out as a JSON file at the end. Why bother with print statements within the main loop? The reason is that sometimes the connection to Flickr craps out and the program stops because of a response error. Rather than lose all the information when that happens, I have the script print out what it collects as it goes. If there’s an interruption, I can see how far it’s gotten and restart the script from that point. For example, if the connection fails after printing out information on 15 albums, I can change Line 21 to

21:  for set in psets['photosets']['photoset'][15:]:

This will cause it to skip over the first 15 albums and start printing again with the 16th. I feel certain that if I hadn’t written the script this way I’d still be waiting for it to run once without a hiccup.
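The same write-as-you-go idea can be sketched with the restart point computed instead of edited in by hand. This is not what my script does; it’s a Python 3 illustration in which collect, flaky_fetch, and the album names are all hypothetical, and the fetch function stands in for the Flickr API calls.

```python
import io
import json

def collect(albums, fetch, out, start=0):
    """Write one record per album as soon as it is fetched.

    Returns the index to restart from: len(albums) if everything succeeded,
    otherwise the index of the album whose fetch failed.
    """
    for i, album in enumerate(albums[start:], start):
        try:
            photos = fetch(album)
        except IOError:
            return i  # everything before index i has already been written
        out.write('"{}": \n{}\n\n'.format(album, json.dumps(photos)))
        out.flush()
    return len(albums)

calls = {"n": 0}

def flaky_fetch(album):
    # Hypothetical stand-in for the API calls; fails on the third request only.
    calls["n"] += 1
    if calls["n"] == 3:
        raise IOError("simulated dropped connection")
    return [{"photoTitle": album + "-1"}]

albums = ["A", "B", "C", "D"]
out = io.StringIO()
resume_at = collect(albums, flaky_fetch, out)            # fails on "C"
finished = collect(albums, flaky_fetch, out, resume_at)  # picks up with "C"
print(resume_at, finished)
```

The slice albums[start:] plays the same role as the [15:] in the edited Line 21: nothing already written is fetched again.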

OK, when the first script is finally finished and I’ve edited the result into proper JSON format, I have a 1.8 MB text file called apollo.json. You can download a 300 kB zipped version of this from here and expand it on your local machine. It’s going to be the input file for the next script.

The script that actually does the downloading is get-apollo-photos:

 1:  #!/usr/bin/env python -u
 3:  import urllib2
 4:  import os
 5:  import sys
 6:  import json
 8:  # Make a holder directory for photos on the user's Desktop.
 9:  apolloDir = os.environ['HOME'] + '/Desktop/apollo'
10:  os.mkdir(apolloDir)
12:  # Read in the JSON from the file given on the command line
13:  sets = json.load(open(sys.argv[1]))
15:  # Make a subfolder for each set and download all the photos.
16:  for k in sets.keys():
17:    sys.stdout.write("Downloading set {}".format(k))
18:    subdir = "{}/{}".format(apolloDir, k)
19:    os.mkdir(subdir)
20:    for p in sets[k]:
21:      name = "{}/{}.jpg".format(subdir, p['photoTitle'])
22:      url = p['photoURL']
23:      image = urllib2.urlopen(url).read()
24:      imgFile = open(name, 'wb')
25:      imgFile.write(image)
26:      imgFile.close()
27:      sys.stdout.write('.')
28:    sys.stdout.write('\n')

Assuming you have both it and apollo.json on your Desktop, you run it like this from the command line:

cd ~/Desktop
./get-apollo-photos apollo.json

It will first create an apollo folder on your Desktop (Lines 9–10) and then fill it with subfolders and photos. The subfolders will have the same names as the Flickr albums,[1] and the photo files will have the same names as the Flickr photos. The photos you end up with are the highest resolution versions because why bother with anything else?
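One thing worth noting about those names: the album titles had their slashes replaced back in apollo-photo-list, but the photo titles go into the file path untouched. I don’t know whether any photo title actually contains a slash, so treat this Python 3 sketch as a precaution. The photo_path helper and the sample photo title are hypothetical.

```python
import os

def photo_path(base_dir, album, title):
    """Build the destination path for a photo, replacing slashes so neither
    the album name nor the photo title is read as a directory separator."""
    safe_album = album.replace("/", "-")
    safe_title = title.replace("/", "-")
    return os.path.join(base_dir, safe_album, safe_title + ".jpg")

# The album name comes from the footnote; the photo title is invented.
print(photo_path("apollo", "Apollo 11 Magazine 37/R", "example/photo"))
```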

To let you know that things are working, a message is printed whenever the script moves to a new album (Line 17) and a dot is printed whenever a photo has been downloaded (Line 27). Python would normally buffer this kind of output and display it only after the script is finished—which would defeat the purpose of the messages. The -u switch at the end of the shebang line (Line 1) tells Python to run unbuffered and display the messages as the script runs.
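If you’d rather not rely on the -u switch, an alternative is to flush after every write. This is a sketch of that technique in Python 3 syntax, not what get-apollo-photos does; the progress helper and the album name are made up.

```python
import io
import sys

def progress(text, stream=None):
    """Write a progress marker and flush it so it shows up right away."""
    if stream is None:
        stream = sys.stdout
    stream.write(text)
    stream.flush()

# Simulated run: one message for the album, one dot per photo.
progress("Downloading set Example Album")
for _ in range(3):
    progress(".")
progress("\n")

# The helper accepts any file-like object, which makes it easy to check.
buf = io.StringIO()
progress("...", buf)
```

The tradeoff is a flush call scattered through the code versus a single switch on the shebang line, which is why I went with the switch.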

Many of the photos are out of focus, poorly framed, or just plain dull. But then there are the good ones, and they are really good. I suspect most people will like the images of the Earth or the Moon or the various pieces of equipment. I do, too, but my favorite is this one, which I’ve seen many times before:

Neil Armstrong

This is Neil Armstrong back in the LEM after he and Buzz Aldrin have taken their walk on the Moon. The mixture of exhaustion, elation, pride, amazement, and wonder on his face is just delightful.

  1. Except that slashes in the album names (Apollo 11 Magazine 37/R) will be replaced by hyphens. Slashes in folder names are a terrible idea on a Unix system. 

Wobbly words in Tweetbot

If you’re using Tweetbot 4 and have the font set to San Francisco, you may have noticed that text in the Compose screen sometimes reformats itself in ways that don’t make sense. Before last night, I thought I was imagining this, but it happened so often as I was tweeting during the Cubs wildcard win over the Pirates[1] that I knew it was real.

We’re all used to certain types of reformatting as we type. When we get near the end of a line and start typing a long word, that word will jump to the next line when it gets long enough to hit the right edge of the text field and force a word wrap. What I’m talking about in Tweetbot is seeing words jump back and forth from line to line as I add text after them. Let me show you a couple of examples.

Here’s me typing a reply to noted Yankee fan John Gruber:

Tweetbot text jump 1

The last word, “the,” starts out on the third line of the reply, but when I type the “C” it suddenly jumps back to the second line. Later on, a similar thing happened:

Tweetbot text jump 2

When I typed the “s” onto the end of “Cardinals,” the “the” jumped back down to the third line again.

This all seemed very weird to me, so I asked (on Twitter, of course) if anyone knew why. Paul Haddad of Tapbots—a pretty authoritative source, I’d say—answered:

@drdrang the counter is using San Francisco and it defaults to proportional numbers.
Paul Haddad (@tapbot_paul) Oct 7 2015 10:29 PM

The right margin of the composition field is defined by the width of the countdown field in the upper right corner. When the counter gets narrower, so does the margin, and more text can fit on every line of the tweet. But in both of my examples, the count is two digits long, which suggests an unchanging width.

That’s the point of Paul’s tweet. While most fonts—even fonts that are otherwise proportional—use monospaced characters for the numerals, San Francisco is different. It has both monospaced and proportional numerals and it’s the proportional ones that happen to be used by default. So the margin width changes a little bit every time the countdown value changes, and that’s what’s causing the text to jump around.
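A toy calculation shows why two-digit counts can still differ in width. The numbers below are invented advance widths, not measurements of San Francisco, and counter_width is a hypothetical helper; the only point is that proportional digits make the total depend on which digits are showing.

```python
# Hypothetical digit widths in points; real values depend on the font and size.
PROPORTIONAL = {"0": 9.0, "1": 6.0, "2": 8.5, "3": 8.5, "4": 9.0,
                "5": 8.5, "6": 9.0, "7": 8.0, "8": 9.0, "9": 9.0}
MONOSPACED = {digit: 9.0 for digit in "0123456789"}

def counter_width(count, widths):
    """Total width of the countdown number when set with the given digit widths."""
    return sum(widths[digit] for digit in str(count))

# Going from 71 to 70 characters remaining:
print(counter_width(71, PROPORTIONAL), counter_width(70, PROPORTIONAL))
print(counter_width(71, MONOSPACED), counter_width(70, MONOSPACED))
```

With the made-up proportional widths the counter changes width between 71 and 70, and with the monospaced ones it doesn’t, which is the behavior Paul described.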

It probably doesn’t help that I use a relatively large font in Tweetbot. That magnifies the small differences in the widths of the numerals and means only a few words can fit on every line of the composition field. These two things will tend to cause more text jumps than if I were using a smaller font. But I need to see what I’m typing, so reducing the font size isn’t in the cards.

According to Paul, the next revision to Tweetbot will use monospaced numerals in the countdown, which will eliminate the jumpiness except when the number of digits changes. This’ll be much better, although personally I’d prefer either a right margin with a fixed width large enough to accommodate a three-digit count or to have the counter moved from the right margin to the otherwise unused space under the user’s avatar.

As tweets about this situation went back and forth between me, Paul, and the Egg McMuffinless Casey Liss, I got a tweet from Ian Bjorhovde directing me to this portion of the WWDC session on San Francisco, which talks about how to use the proportional and monospaced numerals.[2] It’s an interesting subtopic and lasts only about three minutes. Well worth your time.

David Loehr mentioned that he uses Avenir, the other font that Tweetbot allows. It doesn’t have the proportional numeral problem, but I find it a little too thin and “gray” for comfortable reading. I’ll stick with San Francisco and hope the Tweetbot revision makes it through Apple’s approval process quickly. The revision is also going to fix the chart labeling bug I talked about a few days ago.

  1. I mention this as a way of preserving a record of this rare Cub playoff appearance and victory for future skeptical generations. 

  2. I always pay attention when Ian tweets at me. He’s a bright guy in his own right, and he’s also the son of a pretty famous structural engineer whose publications I’ve used countless times. 

Better charts

Occasionally, I write a post about making charts. Sometimes these posts are rants about poor practice or my thoughts on good practice, but usually they’re more descriptive than prescriptive. I write about how I make charts with the expectation that my beliefs and tastes will come through and that I might have some small influence in stemming the tide of bad charts.

So far, I’ve had about as much effect as King Canute. The rising popularity of data journalism has brought with it some truly dumb charts. Reporters with no training or experience in communicating graphically are being given a crash course in some JavaScript plotting library and told to have at it. The result is a bunch of charts that seem OK at a casual glance but not when you look again. Here’s a political example from Morning Consult:

Republican polls

A layout that might work with a few candidates is a mess with fifteen. The legend overwhelms the chart, and there’s no rhyme or reason to the order of the names. The colors are too close to one another. The markers, which could be used to distinguish the candidates, are the same for each.[1] The labels for the horizontal axis are stupidly formatted over two lines. Worst of all, the polls are equally spaced horizontally even though the times between them vary from 5 to 14 days.
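The spacing problem is the easiest one to fix in code. Here’s a sketch using only the standard library; the poll dates are hypothetical, picked to give gaps between 5 and 14 days, and are not the dates from the Morning Consult chart.

```python
from datetime import date

# Hypothetical poll dates, chosen only to show gaps of unequal length.
poll_dates = [date(2015, 8, 2), date(2015, 8, 7),
              date(2015, 8, 21), date(2015, 8, 30)]

# Equal spacing: every poll gets the next integer slot, whatever its date.
equal_x = list(range(len(poll_dates)))

# Time-proportional spacing: days elapsed since the first poll.
true_x = [(d - poll_dates[0]).days for d in poll_dates]

print(equal_x)
print(true_x)
```

Plotting against the second list instead of the first keeps the slopes honest: a change over 14 days no longer looks as steep as the same change over 5.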

You might say this is nitpicking and that the important thing is that the chart communicates who’s winning and who’s moving up or down in the polls. You could also argue that there’s no reason to wrote an article with correct tenses or to gets the verbs to agree with the subjects. Them things isn’t important to communication, is they?[2]

Into this mess steps Kieran Healy, associate professor of sociology at a basketball university down in North Carolina (no, the other one). Kieran is perhaps best known on the internet for a data visualization post that has, unfortunately, become something of an evergreen. His charts are always tasteful and informative because he’s a smart guy and he’s thought a lot about how to communicate through plots.

This semester—half-semester, actually—Kieran’s going to impart his wisdom to grad students in his department through a special topics course. He’s starting out right, by demonstrating the evils of Excel’s overly cute 3D column charts:

3D Excel column chart

Image from Kieran Healy.

A handful of Duke sociology students won’t fix the world’s data visualization problems, but Kieran is making his class notes available on GitHub, so there’s hope that others will find them and learn.

  1. Distinguishing the data series is somewhat easier in the actual chart (as opposed to this screenshot) because you can click or tap the names in the legend and see the corresponding series light up. Of course, you have to know or guess that this is possible, otherwise you’d never try it. 

  2. Given my penchant for leaving typos and editing artifacts in my posts, this is a very dangerous paragraph. 


This post from last week by Kirk McElhearn and this followup today by Michael Tsai reminded me that Safari 9 has a new feature in the Develop menu: Responsive Design Mode. Unfortunately, it’s not as smart as I’d hoped it would be. Or maybe I’m not.

The idea behind RDM is to let web designers see what a site looks like on smaller (Apple) screens without continually resizing windows or reloading pages on other devices. You can see how it looks by just clicking a button associated with the device of interest. And, unlike pinned sites, this feature is available on Safari 9 even if you’re still running Yosemite.

I’m not a web designer, but I am responsible for this site, and I do occasionally fiddle with its layout. I (finally!) made a mobile layout back in June, and it would have been much easier if I’d been able to see the results of my CSS changes immediately on my Mac as I made them.

But I soon found that RDM doesn’t really emulate smaller devices. Here’s what my site looks like on my iPhone 5s:

iPhone 5s view

And here’s what it looks like in Safari RDM:

Safari Responsive Design Mode

Not exactly a faithful representation. And it’s no better in landscape.

iPhone 5s landscape view

Safari RDM landscape

I assume this is at least partially because I don’t really know what I’m doing. I have two CSS files for the site: the original style.css that’s meant for wider screens and mobile.css that’s meant for narrower screens. The file used is controlled by these three lines in the <head>:

<link rel="stylesheet" type="text/css" media="all and (max-device-width:480px) and (orientation:portrait)" href="resources/mobile.css" />
<link rel="stylesheet" type="text/css" media="all and (max-device-width:480px) and (orientation:landscape)" href="resources/style.css" />
<link rel="stylesheet" type="text/css" media="all and (min-device-width:480px)" href="resources/style.css" />

So I’m really choosing the style file on the basis of the device’s width, not the view’s width. This is a simple solution, and it works, even though it isn’t fully responsive. If you’re reading this on a phone and rotate it, the layout will change; but if you’re reading this on a notebook or desktop computer and make the browser window narrow, the layout won’t change.
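The selection logic in those three lines can be modeled in a few lines of Python. This is only a model of the media queries as written, with a hypothetical function name and example widths; notice that the browser window’s width isn’t an input at all, which is why narrowing the window changes nothing.

```python
def matching_stylesheets(device_width, orientation):
    """Return the stylesheets whose media queries match, in document order."""
    matches = []
    if device_width <= 480 and orientation == "portrait":
        matches.append("resources/mobile.css")
    if device_width <= 480 and orientation == "landscape":
        matches.append("resources/style.css")
    if device_width >= 480:
        matches.append("resources/style.css")
    return matches

# Example device widths in CSS pixels; 1440 stands in for a notebook screen.
print(matching_stylesheets(320, "portrait"))
print(matching_stylesheets(320, "landscape"))
print(matching_stylesheets(1440, "landscape"))
```

The model also shows a small overlap: at a device width of exactly 480, the third query matches along with one of the first two.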

I don’t really want to mess with the site’s layout again, and I certainly wouldn’t do so just to get it to work in Safari’s RDM, but there are two things that’ll probably force me into it: iOS 9’s Slide Over and Split View on iPads. I’m pretty sure those views don’t change the device-width, and if I want the site to display its narrow layout when it’s in those modes, I’ll have to make it truly responsive.

Eventually. I’m not particularly responsive, either.