Python doesn't play nicely with others

Today I want to talk about a script that appeared in a post on Mac OS X Hints a couple of days ago. What the script does isn’t of great interest to me, but the technique the programmer used to combine Python and AppleScript is.

The purpose of the script is to take a bunch of individual text files and convert them into notes in Apple Mail. This may sound nuts, but the programmer (who is anonymous) has an explanation:

I have been using plain text files (.txt) for storing my notes since the arrival of Notational Velocity a while ago. When I saw that Mountain Lion will have a dedicated Notes app, I decided it would be great to switch over to Mail’s notes system in preparation for the new OS.

Here’s the script. I want to go through it as I would one of my own scripts, describing what it does with particular emphasis on the section in which it executes an AppleScript from within Python.

python:
 1:  import sys
 2:  import os
 3:  print sys.argv
 4:  for filename in sys.argv[1:]:
 5:      print filename
 6:      text = open(filename,'r').readlines()
 7:      title = os.path.splitext(os.path.basename(filename))[0]
 8:      text = title +'\n'+' '.join(text)
 9:  
10:      # Store file contents in clipboard
11:      outf = os.popen("pbcopy", "w")
12:      outf.write(text)
13:      outf.close()
14:  
15:      cmd = """osascript<< END
16:    tell application "Mail"
17:      activate
18:    end tell
19:  
20:    tell application "System Events"
21:      tell process "Mail"
22:        click the menu item "New Note" of the menu "File" of menu bar 1
23:        click the menu item "Paste" of the menu "Edit" of menu bar 1
24:      end tell
25:    end tell
26:      END"""
27:  
28:      os.system(cmd)

We see in Line 4 that the script is intended to be given the list of files to be converted on the command line, probably through a *.txt argument.

Line 5 prints the name of the file currently being processed to let the user know how it’s progressing. Lines 6-8 create a string that consists of

  1. The name of the file.
  2. A newline.
  3. The contents of the file, slightly altered.

I understand why the filename gets put on the first line. The iOS Notes app—and, presumably, the upcoming Mountain Lion Notes app—uses the first line of a note as its title. It seems appropriate to use the filename the same way. I don’t understand, however, why the lines of the file are split into a list by the readlines() in Line 6 and then joined together with spaces in Line 8. The result will be a space at the beginning of every line in the file except the first. Perhaps the programmer meant to write Line 8 as

python:
 8:      text = title + '\n' + ''.join(text)

which wouldn’t add the spaces. Even better would be

python:
 6:      text = open(filename,'r').read()
 7:      title = os.path.splitext(os.path.basename(filename))[0]
 8:      text = title + '\n' + text

which eliminates the split-and-rejoin rigamarole entirely. The only reason I can think of for wanting to break the file up into lines is if the file might be extremely large. Given that these are coming from Notational Velocity, that seems quite unlikely. The whole point of using NV, or its cousin nvALT, is to have many small files with just one piece of information.
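To make the difference concrete, here’s a small sketch of what the two joins produce, followed by the read-it-all-at-once version wrapped in a function. It’s written in Python 3 syntax, and the name note_text is made up for illustration; it isn’t part of the original script.

```python
import os

# What readlines() would hand back for a three-line file.
lines = ['first\n', 'second\n', 'third\n']

spaced = ' '.join(lines)   # 'first\n second\n third\n' (stray leading spaces)
clean = ''.join(lines)     # 'first\nsecond\nthird\n'


def note_text(filename):
    """Return the file's base name (no extension), a newline, then its contents."""
    with open(filename, 'r') as f:
        body = f.read()
    title = os.path.splitext(os.path.basename(filename))[0]
    return title + '\n' + body
```

Calling note_text on a file named groceries.txt would give a string whose first line is groceries, which is what the Notes app would then pick up as the title.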

Lines 11-13 put the text just created on the clipboard through the Mac’s pbcopy command. But the script does this through the os.popen method, which has been deprecated since Python 2.6. The preferred style nowadays is to use the subprocess module, which (with an import subprocess added at the top of the script) could be done this way:

python:
11:      subprocess.Popen('pbcopy', stdin=subprocess.PIPE).communicate(text)

There are other ways to do this while still using the subprocess library, and it’s never been clear to me which is preferred, a complaint we’ll return to later.
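For reference, here is the Popen version as a self-contained function. This is only a sketch: it uses Python 3 syntax, so the text is encoded to bytes before it goes down the pipe, and send_to_command is a hypothetical name. On a Mac, the command would be ['pbcopy'].

```python
import subprocess


def send_to_command(command, text):
    """Pipe text to command's standard input and return its exit status."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    # communicate() writes the data, closes the pipe, and waits for the
    # command to finish.
    proc.communicate(text.encode('utf-8'))
    return proc.returncode
```

A zero return value means the command reported success; the original script never checks this.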

Lines 15-26 define, as a multiline string, a shell command that defines and runs, via osascript, an AppleScript.[1] Line 28 then executes that command via the os.system method. Unlike the os.popen method, os.system has not been deprecated, but it’s not recommended. According to the Python documentation for os.system,

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

which is not exactly a ringing endorsement for what the programmer of this script has done.

Before discussing other ways the AppleScript could have been defined and executed, look again at Lines 15-26 and marvel at the complexity of the quoting. We have quoted strings, like “System Events” in the AppleScript, which are wrapped in a shell here-document, which is then wrapped in Python triple quotes. It doesn’t look remarkable because the programmer has carefully used quoting constructs that don’t interfere with one another. The blandness of these lines is a testament to the thought that went into writing them.

Still, Python doesn’t want us to use os.system, so how should these lines have been written?[2] One way would be to use the subprocess.Popen method:

python:
import subprocess

cmd = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell
'''

subprocess.Popen('osascript', stdin=subprocess.PIPE).communicate(cmd)

This certainly works, but the communicate method is rather clumsy.[3] The documentation for subprocess says

The recommended approach to invoking subprocesses is to use the following convenience functions for all use cases they can handle. For more advanced use cases, the underlying Popen interface can be used directly.

The convenience function most suited to this application is subprocess.call, but it’s not especially easy to use when the command being called needs to read from STDIN.

Here’s one way to use subprocess.call:

python:
import subprocess
import tempfile

tf = tempfile.TemporaryFile()
tf.write('''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell''')
tf.seek(0)
subprocess.call('osascript', stdin=tf)

The subprocess.call line is simpler than the earlier subprocess.Popen line, but to get that simplicity we had to mess around with temporary files through the tempfile library. (Strictly speaking, we didn’t have to use tempfile, but that’s the safest way to create and dispose of temporary files within a Python script.)
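If you go this route more than once, the temporary-file dance can be hidden in a helper. The sketch below assumes Python 3 (TemporaryFile opens in binary mode by default, so the text is encoded first), and call_with_stdin is a name invented for this example.

```python
import subprocess
import tempfile


def call_with_stdin(command, text):
    """Run command with text on its standard input; return its exit status."""
    # TemporaryFile gives subprocess a real file descriptor to read from,
    # and the with block closes and removes the file when we're done.
    with tempfile.TemporaryFile() as tf:
        tf.write(text.encode('utf-8'))
        tf.seek(0)  # rewind so the command reads from the beginning
        return subprocess.call(command, stdin=tf)
```

With that in place, running the AppleScript would come down to call_with_stdin(['osascript'], cmd), though the helper is still more scaffolding than the job deserves.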

The problem is that the stdin argument to subprocess.call requires a file; it won’t take a string. It would be so much nicer if we could just write

python:
cmd = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell'''
subprocess.call('osascript', stdin=cmd)

but that’s not allowed. And if you’re thinking we could use the StringIO library to treat a string as if it were a file, you are

  1. My kind of people.
  2. About to be disappointed.

Unfortunately, subprocess.call won’t accept a StringIO object for the stdin argument; it needs a real file object (one with an underlying file descriptor) or a file descriptor.
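The reason is that subprocess asks whatever you pass as stdin for its file descriptor, and an in-memory StringIO doesn’t have one. Here’s a sketch of the failure, assuming Python 3’s io.StringIO (in Python 2 the error takes a different form); try_stringio_stdin is a hypothetical helper that exists only to show the exception.

```python
import io
import subprocess


def try_stringio_stdin(command, text):
    """Try a StringIO as stdin; return the error message, or None if it worked."""
    fake_file = io.StringIO(text)
    try:
        # Popen calls fake_file.fileno() to get a descriptor for the child
        # process, and io.StringIO raises io.UnsupportedOperation there.
        subprocess.call(command, stdin=fake_file)
    except io.UnsupportedOperation as err:
        return str(err)
    return None
```

The exception is raised before the command is ever started, so nothing runs at all.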

So what’s the best way to call an external command that reads from STDIN? Damned if I know. In the past, I’ve leaned toward Popen because it requires fewer lines of code and doesn’t require the importing of the tempfile library, but I’m starting to wonder if that’s a false economy.

What I’d really like is for Python to get its act together and come up with one simple and consistent method for executing external commands, feeding them input, and gathering their output. Like the programmer of the script we’re looking at, I’ve written programs that used os.system or os.popen because, in earlier versions of Python, that was the recommended way to do it. The convenience functions of the subprocess module are convenient only for commands that don’t read from STDIN; for those that do, you have to either mess around with temporary files or go the more awkward Popen and communicate route.
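For the record, here’s roughly what the full feed-it-input, gather-its-output round trip looks like with Popen and communicate. It’s a sketch in Python 3 syntax, so there’s encoding and decoding on either side of the pipe, and run_filter is a made-up name.

```python
import subprocess


def run_filter(command, text):
    """Feed text to command's stdin; return (exit status, stdout as text)."""
    proc = subprocess.Popen(command,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # communicate() sends the input, reads all of stdout, and waits for
    # the command to exit. stderr isn't piped, so the second item is None.
    out, _ = proc.communicate(text.encode('utf-8'))
    return proc.returncode, out.decode('utf-8')
```

It works, but three keyword arguments and a tuple unpacking is a lot of ceremony compared to a pair of backticks.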

And while I’m complaining, may I suggest that check_output is a stupid name for a function that returns the STDOUT of an external command? I don’t use it to just “check” the output, which sounds kind of dainty; I use it to get the output so I can use it elsewhere in my program.

Perl, because one of its first missions was to act as a glue language, is so much better than Python at this sort of thing. I can understand why Python would never adopt the backtick notation, but I don’t understand why there isn’t a standard library that handles external command calls in a simpler, more natural way.

I’ve looked into Kenneth Reitz’s envoy module, which promises to simplify the subprocess runaround into something nearly Perl-like in its simplicity. For example:

python:
import envoy

cmd = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell'''
envoy.run('osascript', data=cmd)

I don’t know what, if any, landmines are hidden within envoy, and I’d certainly prefer to stick with standard libraries for simple things like running external commands, but the convenience of envoy.run is pretty compelling. As Reitz says:

This is a convenience wrapper around the subprocess module.

You don’t need this.

But you want it.


  1. There is a typo in the script on the Mac OS X Hints page: the two less-than signs in Line 15 are separated by a space when they shouldn’t be. I’ve fixed the typo here.

  2. You’ll note that I’m ignoring the details of the AppleScript itself. Generally speaking, I dislike the use of GUI Scripting like we see in Lines 20-25, but I understand why the programmer did it that way. The AppleScript library for Mail doesn’t appear to have any commands for dealing with notes.

  3. So is having to type subprocess.PIPE, but that could be solved by

    python:
    from subprocess import Popen, PIPE
    

    which would shorten the lines considerably. For clarity, I decided to keep the module name in all the code.