Better option parsing in Python (maybe)

I’ve bitched in the past about Python’s option parsing modules. It’s not just the modules themselves I dislike, it’s the Python Standard Library’s back-and-forth support for different modules. Perhaps it’s unsurprising that the most attractive option parsing module comes from outside the Standard Library.

Option parsing modules are collections of functions or methods that help you write command-line programs that conform to the usual Unix pattern: the command itself, followed by one or more optional switches designated by leading dashes, followed by one or more arguments. The ncdc script I wrote about a couple of days ago is an example. The usage message is

usage: ncdc [options] date
Return NCDC weather report for the given date.
Options:
  -a     : return ASCII file instead of HTML
  -d     : return daily summary instead of hourly reports
  -m     : entire month, not just a single day
  -p     : precipitation (hourly only, overrides -d)
  -s STA : the station abbreviation
           O'Hare     ORD (default)
           Midway     MDW
           Palwaukee  PWK
           Aurora     ARR
           Waukegan   UGN
           Lewis      LOT
           DuPage     DPA
           Lansing    IGQ
           Joliet     JOT
           Kankakee   IKK
           Gary       GYY
  -h     : print this message

which lays out all of the options for changing the behavior of the script.

Instead of having to write your own code to tediously work through the sys.argv list and account for every permutation of the switches (including combinations like -adm), you import an option parsing module and use its functions to do that work for you. The problem is that in Python 2.7, there are three such modules.

There’s getopt, which is the simplest and least powerful. It’s based on the C and shell functions of the same name and provides a sort of bare-bones approach. I tend to use getopt in my scripts because it’s the easiest to remember, it allows me to write my usage message the way I want, and it doesn’t force me into a coding style I find convoluted. The downside is that parsing with getopt often takes up many lines, even when you don’t have that many options. Here’s an example of getopt from the Python docs:

python:
import getopt, sys

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    output = None
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-o", "--output"):
            output = a
        else:
            assert False, "unhandled option"
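
To see what getopt hands back when switches are bundled together, here’s a small sketch that feeds it a made-up argument list instead of sys.argv. The option string 'adhmps:' is an assumption chosen to match the ncdc usage message above, and the snippet is written for Python 3, so print is a function.

```python
import getopt

# Five plain flags plus -s, which takes a value (hence the trailing colon).
argv = ['-adm', '-s', 'MDW', '12/25/2014']
opts, args = getopt.getopt(argv, 'adhmps:')

print(opts)  # [('-a', ''), ('-d', ''), ('-m', ''), ('-s', 'MDW')]
print(args)  # ['12/25/2014']
```

The bundled -adm comes back as three separate pairs, so the loop that follows only has to deal with one switch at a time.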

Then there’s optparse, which was the hot thing several years ago. Back in the Python 2.6 days, optparse was the recommended library, and I wrote a few scripts with it, despite finding it difficult. Yes, it required fewer lines of code than getopt to do the parsing, but the lines tended to be long, and I had a hard time remembering all the arguments to the add_option method. Here’s an example of optparse from the docs:

python:
from optparse import OptionParser
[...]
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")

(options, args) = parser.parse_args()

What’s supposed to be cool about optparse is how it automatically puts the help arguments together into a usage message. It is cool, but I could never wrap my mind around all that action, dest, metavar stuff.
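
As a rough illustration of that automatic help, the sketch below builds the same two options and asks the parser for its help text instead of parsing anything. The program name 'report' is a placeholder invented for the example, and the code is written for Python 3, where optparse is still available.

```python
from optparse import OptionParser

# 'report' is a hypothetical program name used only for the help text.
parser = OptionParser(prog='report')
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")

# format_help() returns the text that -h would print.
help_text = parser.format_help()
print(help_text)
```

Every help string passed to add_option shows up in the result, along with a -h/--help entry that optparse adds on its own.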

In some ways, I’m glad I didn’t spend a lot of time getting comfortable with optparse, because with Python 2.7, it was deprecated. Now we’re all supposed to use argparse. But to me, argparse feels an awful lot like optparse. Here’s an example from the docs:

python:
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                   help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                   const=sum, default=max,
                   help='sum the integers (default: find the max)')

args = parser.parse_args()
print args.accumulate(args.integers)

You will not be surprised that I found this no easier to use than optparse. You will also not be surprised that I was annoyed to learn that the optparse code I felt compelled to write in 2.6 wasn’t going to survive another Python version bump. This is when I decided to go back to getopt—it’s too traditional to get deprecated.
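
To make the comparison concrete, here’s a sketch of how the ncdc switches might be declared with argparse. It isn’t code from the script; the dest names and the build_parser function are assumptions made for illustration, and it’s written for Python 3.

```python
import argparse

def build_parser():
    """Declare the ncdc switches from the usage message (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog='ncdc',
        description='Return NCDC weather report for the given date.')
    parser.add_argument('-a', dest='ascii', action='store_true',
                        help='return ASCII file instead of HTML')
    parser.add_argument('-d', dest='daily', action='store_true',
                        help='return daily summary instead of hourly reports')
    parser.add_argument('-m', dest='month', action='store_true',
                        help='entire month, not just a single day')
    parser.add_argument('-p', dest='precip', action='store_true',
                        help='precipitation (hourly only, overrides -d)')
    parser.add_argument('-s', dest='sta', metavar='STA', default='ORD',
                        help='the station abbreviation')
    parser.add_argument('date', help='the date of the report')
    return parser

ns = build_parser().parse_args(['-da', '-s', 'MDW', '12/25/2014'])
print(ns.ascii, ns.daily, ns.month, ns.precip, ns.sta, ns.date)
# True True False False MDW 12/25/2014
```

The parsing itself is one line, but each switch needs its own dest, action, and help arguments, which is the part that’s hard to keep in your head.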

This morning, though, my RSS feed had this post from Rob Wells, describing how option parsing in Python programs in general, and in my ncdc script in particular, could be streamlined by using Vladimir Keleshev’s docopt module.

I won’t recapitulate Rob’s post; you should just go read it. But I will say that what’s very appealing about docopt is that it turns optparse and argparse on their heads. Instead of writing code that generates a usage message, you write a usage message that docopt turns into code. Well, not code exactly, but a dictionary of all the switches and arguments that can be easily incorporated into code. For example, if I had used docopt in my ncdc code, and run the script like this,

ncdc -da -s MDW 12/25/2014

the parsing phase would end with this dictionary:

python:
{'-a': True,
 '-d': True,
 '-h': False,
 '-m': False,
 '-p': False,
 '-s': 'MDW',
 'DATE': '12/25/2014'}

You can imagine how straightforward it would be to fit this dictionary into the subsequent logic of the script.
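
Here’s one way that fitting might go, as a minimal sketch. It starts from the dictionary above written out as a literal, so docopt itself isn’t needed to run it. The settings_from function and its key names are hypothetical, and the fallback to ORD assumes that an omitted -s comes back as None.

```python
# The dictionary from the example above.
opts = {'-a': True, '-d': True, '-h': False, '-m': False,
        '-p': False, '-s': 'MDW', 'DATE': '12/25/2014'}

def settings_from(opts):
    """Turn switch-style keys into plainly named settings (hypothetical helper)."""
    precip = opts['-p']
    return {
        'sta': (opts['-s'] or 'ORD').upper(),  # assumed fallback to O'Hare
        'ascii': opts['-a'],
        'daily': opts['-d'] and not precip,    # -p overrides -d
        'month': opts['-m'],
        'precip': precip,
        'date': opts['DATE'],
    }

settings = settings_from(opts)
print(settings['sta'], settings['daily'], settings['date'])
# MDW True 12/25/2014
```

There’s no loop over the switches and no if/elif chain; the only real logic left is the rule that -p overrides -d.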

The usefulness of docopt depends on how easy it is to follow its rules for writing the usage message. Although the rules are relatively flexible, you can’t just format your usage message any way you please. Fortunately, the rules aren’t too different from my usual practice (see how little Rob had to change to get a docopt-compliant message), so I think I could use it without the kind of mental gymnastics I had to use to work with optparse.


Putting the paper in TaskPaper

Lately my RSS and Twitter streams have been loaded with talk of to-do apps and services. When I haven’t fallen off the list-making wagon (which happens far too often), I still use TaskPaper to make my lists and this old set of scripts and config files to print them out onto paper, which I then insert into a notebook I keep open on my desk. Unlike to-do apps, which tend to get hidden behind other windows on my computer and superseded by other apps on my phone, the paper to-do list is always in front of me, reminding me of the tasks I think I’ll never forget but always do. Also, I find the physical act of checking a box with a pen or pencil more satisfying than tapping or clicking.

Daily planner with TaskPaper list

I wrote about this setup five years ago, and I’ve changed only a few things since then:

The various scripts and config files are in this GitHub repository, which I just updated today to reflect what I’ve had running locally for some time. Sometimes I fall off the GitHub wagon, too. The system relies on a Markdown processor (I use Fletcher Penney’s MultiMarkdown), Jan Kärrman’s venerable html2ps script, and the ps2pdf utility that’s bundled with Ghostscript.

Because the TaskPaper format is very close to Markdown, it’s easy to make a few changes to turn it into a valid Markdown document. From there, MultiMarkdown turns it into HTML, html2ps turns that into PostScript, and ps2pdf turns that into a PDF. Through the magic of Unix pipelines, it’s easier than it sounds. The Keyboard Maestro macro that starts it all off looks like this:
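
The first step, nudging TaskPaper text into Markdown, might look something like the sketch below. It is not the code from the repository. The function name and the choice of level-two headings for projects are assumptions made for illustration, and it flattens nested tasks rather than preserving their indentation.

```python
def taskpaper_to_markdown(text):
    """Convert TaskPaper text to simple Markdown (illustrative sketch)."""
    out = []
    for line in text.splitlines():
        item = line.strip()
        if item.startswith('- '):
            # A task is already a valid Markdown list item.
            out.append(item)
        elif item.endswith(':'):
            # A project line becomes a heading set off by blank lines.
            out.extend(['', '## ' + item[:-1], ''])
        else:
            # Notes and blank lines pass through unchanged.
            out.append(item)
    return '\n'.join(out).strip() + '\n'

sample = "Office:\n\t- Call the client\n\t- Review report @due(friday)\nHome:\n\t- Fix the gate\n"
markdown = taskpaper_to_markdown(sample)
print(markdown)
```

The output of a function like this is what would be handed to the Markdown processor at the head of the pipeline.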

TaskPaper printing macro for Keyboard Maestro

During the course of a normal workday, the notebook sits open on my desk. I use it mostly for taking notes when I’m on the phone, but almost anything can go in it. Every morning I start on a new recto page and insert the to-do list in front of it, covering the last verso page of the previous day. If you’re wondering where my Daily Notes paper comes from, it’s here.

As the day goes on, I tick off the tasks I complete and write in new ones that arise. At the end of the day, I switch to TaskPaper (which is usually buried behind Calendar and Messages) and bring it up to date, ready to print out the list for the next day. To me, this is the best of both worlds: the paper is always visible in front of me, but I don’t have to rewrite undone tasks from the previous day because they’re saved on the computer. Also, a list printed from a computer always looks so neat and official.

Screens are nice, but most of us don’t have anywhere near as much screen space as we have desk space, which is why paper still has its place.


Weather history without the web

Weather plays an important role in many failures of engineering systems. Snow can overload a roof, ice can build up on transmission lines and towers, rain can infiltrate electrical boxes, and low temperatures can make oils viscous and difficult to pump. In trying to figure out why a failure occurred, I often find myself collecting weather data at the website of the National Centers for Environmental Information. The information kept by the NCEI is both broad and deep, but the web-based interface used to access it is cryptic and clumsy to use, so I wrote a script to speed up the process. The script isn’t as comprehensive as the web interface, but it’s much faster at getting to the information I usually need.

My first complaint about the NCEI web site is that the names for its various data products are long and obscure. The name of the information you’re looking for is seldom obvious, which makes it easy to forget from one visit to the next. Also, navigating the many pages that are supposed to lead you to the data set you want is tortuous, and it’s easy to go off on the wrong side path. My most useful tool in cutting through the NCEI nomenclature fog is this bookmark, which takes me directly to the access page for the data I usually want.

NCDC starting page

From this point on, the path to the data I want is no longer obscure, just tedious. Because there are so many weather stations across the country, organizing them in a single list would lead to endless scrolling, so the website has you drill down:

I don’t blame NCEI for this five-step process—they are, after all, serving a national audience—but because I’m almost always looking for data from just a handful of stations here in the Chicago area, I wanted a faster way to zip through these steps. So I studied the source code of the NCEI pages, figured out the names and input types of all the form elements, and wrote a script to replicate the drill-down process in a single step.

I call the script ncdc, because the NCEI was until recently known as the National Climatic Data Center, and I still think of it as the NCDC.[1] The script is set up with defaults that match my most common interests, so I usually need to type very little to get the data I want. For example,

ncdc 12/25/2014 > ord.html

will get me an HTML-formatted document with the hourly observation data at O’Hare for last Christmas.

Hourly snapshot page

If I want an ASCII-formatted file of that same data (which is easier to import into an analysis program), I use

ncdc -a 12/25/2014 > ord.txt

which gives me a CSV file. Here are examples of the files the script can generate:

Because ncdc works like any other Unix command, it’s easy to incorporate into pipelines and other scripts.

The usage message (available via ncdc -h) gives a brief rundown of what it can do:

usage: ncdc [options] date
Return NCDC weather report for the given date.
Options:
  -a     : return ASCII file instead of HTML
  -d     : return daily summary instead of hourly reports
  -m     : entire month, not just a single day
  -p     : precipitation (hourly only, overrides -d)
  -s STA : the station abbreviation
           O'Hare     ORD (default)
           Midway     MDW
           Palwaukee  PWK
           Aurora     ARR
           Waukegan   UGN
           Lewis      LOT
           DuPage     DPA
           Lansing    IGQ
           Joliet     JOT
           Kankakee   IKK
           Gary       GYY
  -h     : print this message

Here’s the source code:

python:
  1:  #!/usr/bin/env python
  2:  
  3:  import getopt
  4:  import requests
  5:  import sys
  6:  from dateutil.parser import parse
  7:  
  8:  help = '''usage: ncdc [options] date
  9:  Return NCDC weather report for the given date.
 10:  Options:
 11:    -a     : return ASCII file instead of HTML
 12:    -d     : return daily summary instead of hourly reports
 13:    -m     : entire month, not just a single day
 14:    -p     : precipitation (hourly only, overrides -d)
 15:    -s STA : the station abbreviation
 16:             O'Hare     ORD (default)
 17:             Midway     MDW
 18:             Palwaukee  PWK
 19:             Aurora     ARR
 20:             Waukegan   UGN
 21:             Lewis      LOT
 22:             DuPage     DPA
 23:             Lansing    IGQ
 24:             Joliet     JOT
 25:             Kankakee   IKK
 26:             Gary       GYY
 27:    -h     : print this message
 28:  '''
 29:  
 30:  # The NCDC location.
 31:  url = 'http://www.ncdc.noaa.gov/qclcd/QCLCD'
 32:  
 33:  # Dictionary of stations.           
 34:  stations = {'MDW': '14819',
 35:              'ORD': '94846',
 36:              'PWK': '04838',
 37:              'ARR': '04808',
 38:              'UGN': '14880',
 39:              'LOT': '04831',
 40:              'DPA': '94892',
 41:              'IGQ': '04879',
 42:              'JOT': '14834',
 43:              'IKK': '04880',
 44:              'GYY': '04807'}
 45:  
 46:  # Dictionary of report types. The keys are a tuple of (ascii, daily, precip).
 47:  reports = {(False, False, False): 'LCD Hourly Obs (10A)',
 48:             (True, False, False):  'ASCII Download (Hourly Obs.) (10A)',
 49:             (False, True, False):  'LCD Daily Summary (10B)',
 50:             (True, True, False):   'ASCII Download (Daily Summ.) (10B)',
 51:             (False, False, True):  'LCD Hourly Precip',
 52:             (True, False, True):   'ASCII Download (Hourly Precip.)'}
 53:  
 54:  # Handle options.
 55:  sta = 'ORD'
 56:  ascii = False
 57:  daily = False
 58:  month = False
 59:  precip = False
 60:  try:
 61:    optlist, args = getopt.getopt(sys.argv[1:], 'adhmps:')
 62:  except getopt.GetoptError as err:
 63:    sys.stderr.write(str(err) + '\n')
 64:    sys.stderr.write(help)
 65:    sys.exit(2)
 66:  for o, a in optlist:
 67:    if o == '-h':
 68:      sys.stderr.write(help)
 69:      sys.exit()
 70:    elif o == '-a':
 71:      ascii = True
 72:    elif o == '-d':
 73:      daily = True
 74:    elif o == '-m':
 75:      month = True
 76:    elif o == '-p':
 77:      precip = True
 78:    elif o == '-s':
 79:      sta = a.upper()
 80:    else:
 81:      sys.stderr.write(help)
 82:      sys.exit(2)
 83:  if precip:
 84:    daily = False
 85:  
 86:  # The date is the first argument. All other arguments will be ignored.
 87:  d = parse(args[0], dayfirst=False)
 88:  
 89:  # Assemble the payload for the POST.
 90:  # Start with the parts that don't change.
 91:  payload = {'stnid': 'n/a', 'prior': 'N', 'version': 'VER2'}
 92:  
 93:  # Add the day.
 94:  if month:
 95:    payload['reqday'] = 'E'
 96:  else:
 97:    payload['reqday'] = '{:02d}'.format(d.day)
 98:  
 99:  # Add the station id/year/month string.
100:  try:
101:    payload['yearid'] = '{}{:4d}{:02d}'.format(stations[sta], d.year, d.month)
102:    payload['VARVALUE'] = payload['yearid']
103:  except KeyError:
104:    sys.stderr.write('No such station!\n')
105:    sys.stderr.write(help)
106:    sys.exit(2)
107:  
108:  # Add the report type.
109:  payload['which'] = reports[(ascii, daily, precip)]
110:  
111:  # Go get the report and print it.
112:  r = requests.post(url, payload)
113:  print r.text

As you can see from Line 4, ncdc uses Kenneth Reitz’s excellent requests module, which you’ll have to install yourself because it’s not in the Python Standard Library, nor does Apple provide it with OS X. Line 6 imports the non-standard dateutil module, but you don’t have to worry about installing it because Apple provides it.

Lines 30–52 set up the global variables that are used to assemble the HTTP request later in the script. If you wanted to customize ncdc for your own use, the dictionary of stations in Lines 34–44 is what you’d want to edit. The keys are the three-letter airport codes, and the values are the five-digit station IDs. Most of these can be found in one of the files listed on this NCEI page, but I’ve found that the best way to get the codes and IDs is from the station list on the second drill-down page. They’re in parentheses at the end of each station in the list.

Station list with codes

Lines 54–84 handle the options you can give to ncdc. As is my habit, I’m using the simple getopt module because I don’t trust the more advanced modules to remain in the Standard Library. Also, I find their “simplified” way of generating a usage message harder to understand than just writing my own.

Lines 86–109 assemble all the information needed to pass to the NCEI server in a POST request. Lines 111–113 send the request and print the response.
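
If you want to check the payload without hitting the server, the assembly step can be pulled into a function, as in the sketch below. The form field names and station IDs are copied from the listing above. The build_payload function, its keyword arguments, and the use of datetime.date in place of dateutil are assumptions made for illustration, and the code is written for Python 3.

```python
from datetime import date

# A two-station subset of the dictionary in the listing above.
stations = {'ORD': '94846', 'MDW': '14819'}

def build_payload(sta, d, month=False, report='LCD Hourly Obs (10A)'):
    """Return the form fields for the POST without sending anything."""
    # Station ID, four-digit year, and two-digit month run together.
    yearid = '{}{:4d}{:02d}'.format(stations[sta], d.year, d.month)
    return {'stnid': 'n/a', 'prior': 'N', 'version': 'VER2',
            'reqday': 'E' if month else '{:02d}'.format(d.day),
            'yearid': yearid,
            'VARVALUE': yearid,
            'which': report}

payload = build_payload('MDW', date(2014, 12, 25))
print(payload['yearid'], payload['reqday'])
# 14819201412 25
```

An unknown station still raises KeyError here, so a caller could catch it the same way the script does.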

Although this script is longer than most I post here, it didn’t take that long to write. The source code of the NCEI web pages is straightforward, and it was easy to find the appropriate <form> element and pick out all the required inputs.


  1. You may have noticed that the URLs of the NCEI pages still carry the NCDC legacy. 


More calendar event automation

With the swicsfix script in place, the question becomes how best to use it. It’s written to be run from the command line, like this:

swicsfix MDW_to_BNA.ics

While that works perfectly well, and is how I ran it while writing and debugging, it isn’t the most efficient way to transform the .ics file because when I’m scheduling flights I don’t normally have a Terminal window open with the working directory set to my Downloads folder.

I can set up an Automator service to run swicsfix on the selected file(s). This allows me to right-click on a newly downloaded .ics file in the Finder and transform it via an item in the Services submenu.

SWA fix service

This is probably a little more efficient than the command line route, but it still forces me to open the Downloads folder and navigate a long menu.

To me, the best answer is Hazel, which can watch the Downloads folder, notice that there’s a new .ics file from Southwest, and take appropriate action. With the following rule, there’s no need for me to do anything except download the .ics files when I make my reservation and then confirm that I want the events added to my calendar.

Hazel rule for SWA flight events

This rule both runs swicsfix on the .ics file and then opens it for addition in my default calendar program (which happens to be Apple’s Calendar right now, but if I change back to BusyCal, this rule will change with me).

In theory, these same things could be done by defining a Folder Action in Automator. I didn’t have any luck when I tried creating a few Folder Actions years ago, so I stopped trying. I assume I could get them to work if I spent a little more time learning about them, but Hazel just seems easier.