Python doesn’t play nicely with others

Today I want to talk about a script that appeared in a post on Mac OS X Hints a couple of days ago. What the script does isn’t of great interest to me, but the technique the programmer used to combine Python and AppleScript is.

The purpose of the script is to take a bunch of individual text files and convert them into notes in Apple Mail. This may sound nuts, but the programmer (who is anonymous) has an explanation:

I have been using plain text files (.txt) for storing my notes since the arrival of Notational Velocity a while ago. When I saw that Mountain Lion will have a dedicated Notes app, I decided it would be great to switch over to Mail’s notes system in preparation for the new OS.

Here’s the script. I want to go through it as I would one of my own scripts, describing what it does with particular emphasis on the section in which it executes an AppleScript from within Python.

 1:  import sys
 2:  import os
 3:  print sys.argv
 4:  for filename in sys.argv[1:]:
 5:      print filename
 6:      text = open(filename,'r').readlines()
 7:      title = os.path.splitext(os.path.basename(filename))[0]
 8:      text = title +'\n'+' '.join(text)
 9:
10:      # Store file contents in clipboard
11:      outf = os.popen("pbcopy", "w")
12:      outf.write(text)
13:      outf.close()
14:
15:      cmd = """osascript<< END
16:    tell application "Mail"
17:      activate
18:    end tell
19:
20:    tell application "System Events"
21:      tell process "Mail"
22:        click the menu item "New Note" of the menu "File" of menu bar 1
23:        click the menu item "Paste" of the menu "Edit" of menu bar 1
24:      end tell
25:    end tell
26:      END"""
27:
28:      os.system(cmd)

We see in Line 4 that the script is intended to be given the list of files to be converted on the command line, probably through a *.txt argument.

Line 5 prints the name of the file currently being processed to let the user know how it’s progressing. Lines 6-8 create a string that consists of

  1. The name of the file.
  2. A newline.
  3. The contents of the file, slightly altered.

I understand why the filename gets put on the first line. The iOS Notes app—and, presumably, the upcoming Mountain Lion Notes app—uses the first line of a note as its title. It seems appropriate to use the filename the same way. I don’t understand, however, why the lines of the file are split into a list by the readlines() in Line 6 and then joined together with spaces in Line 8. The result will be a space at the beginning of every line in the file except the first. Perhaps the programmer meant to write Line 8 as

 8:      text = title + '\n' + ''.join(text)

which wouldn’t add the spaces. Even better would be

 6:      text = open(filename,'r').read()
 7:      title = os.path.splitext(os.path.basename(filename))[0]
 8:      text = title + '\n' + text

which eliminates the split-and-rejoin rigamarole entirely. The only reason I can think of for wanting to break the file up into lines is if the file might be extremely large. Given that these are coming from Notational Velocity, that seems quite unlikely. The whole point of using NV, or its cousin nvALT, is to have many small files with just one piece of information.
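To make the difference concrete, here is a minimal sketch of Lines 6-8 both ways. It is written for Python 3 (so `open` takes an `encoding` argument), and the function names are hypothetical; the first uses `read()`, the second reproduces the original `readlines()`/`' '.join` approach.

```python
import os


def note_text(path):
    """Title line (file name without extension) followed by the file's
    contents exactly as they are on disk."""
    title = os.path.splitext(os.path.basename(path))[0]
    with open(path, encoding="utf-8") as f:
        body = f.read()  # one string, no split-and-rejoin
    return title + "\n" + body


def note_text_original(path):
    """The original approach, for comparison. readlines() keeps each line's
    trailing newline, so joining with ' ' puts a space at the start of
    every line after the first."""
    title = os.path.splitext(os.path.basename(path))[0]
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return title + "\n" + " ".join(lines)
```

For a file named `Groceries.txt` containing `milk`, `eggs`, and `bread` on separate lines, the first function gives `"Groceries\nmilk\neggs\nbread\n"` and the second gives `"Groceries\nmilk\n eggs\n bread\n"`.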

Lines 11-13 put the text just created on the clipboard through the Mac’s pbcopy command. But it does it through the os.popen method, which has been deprecated since Python 2.6. The preferred style nowadays is to use the subprocess module, which could be done this way:

11:      subprocess.Popen('pbcopy', stdin=subprocess.PIPE).communicate(text)

There are other ways to do this while still using the subprocess library, and it’s never been clear to me which is preferred, a complaint we’ll return to later.
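For reference, here is a sketch of that hand-off in Python 3, where `subprocess.run` (added in 3.5) accepts an `input` argument and `text=True` (3.7) lets it take a `str` rather than bytes. The helper names and the `cmd` parameter are hypothetical; on a Mac you would pass `["pbcopy"]` as the command.

```python
import subprocess


def pipe_to(cmd, text):
    """Send `text` to the stdin of `cmd` (a list of program arguments)
    and return the exit status."""
    return subprocess.run(cmd, input=text, text=True).returncode


def pipe_to_popen(cmd, text):
    """The same hand-off spelled with Popen and communicate."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True)
    proc.communicate(text)  # writes text, closes stdin, waits for exit
    return proc.returncode


# On a Mac: pipe_to(["pbcopy"], "some note text")
```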

Lines 15-26 define, as a multiline string, a shell command that defines and runs, via osascript, an AppleScript.[1] Line 28 then executes that command via the os.system method. Unlike the os.popen method, os.system has not been deprecated, but it’s not recommended. According to the Python documentation for os.system,

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

which is not exactly a ringing endorsement for what the programmer of this script has done.
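The most literal subprocess translation of Line 28 keeps the shell and its here-document: `subprocess.run` with `shell=True` hands the whole string to `/bin/sh`, just as os.system does. A minimal Python 3.7+ sketch, with `cat` standing in for osascript so it runs on any system with a POSIX shell:

```python
import subprocess

# Same shape as Lines 15-28, with cat in place of osascript.
# With <<, the END terminator must begin its own line, with no leading spaces.
cmd = """cat << END
tell application "Mail" to activate
END"""

# shell=True passes the whole string to /bin/sh, as os.system does;
# check=True raises CalledProcessError on a non-zero exit status.
result = subprocess.run(cmd, shell=True, check=True,
                        capture_output=True, text=True)
print(result.stdout, end="")
```

This keeps all three layers of quoting, so it buys error checking and captured output rather than simplicity.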

Before discussing other ways the AppleScript could have been defined and executed, look again at Lines 15-26 and marvel at the complexity of the quoting. We have quoted strings, like “System Events” in the AppleScript, which are wrapped in a shell here-document, which is then wrapped in Python triple quotes. It doesn’t look remarkable because the programmer has carefully used quoting constructs that don’t interfere with one another. The blandness of these lines is a testament to the thought that went into writing them.
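One way to drop the here-document layer, and the shell along with it, is osascript’s `-e` option, which takes one line of script per `-e` and can be repeated to build up a multiline script. The helper below (a hypothetical name) only builds the argument list; the actual call is left as a comment because running it would drive Mail’s menus.

```python
def osascript_args(script):
    """Turn a multiline AppleScript into an osascript argument list with one
    -e option per non-blank line. The list goes straight to the program,
    with no shell in between, so the AppleScript's double quotes need no
    escaping."""
    args = ["osascript"]
    for line in script.splitlines():
        if line.strip():  # skip blank lines
            args += ["-e", line]
    return args


script = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell
'''

# On a Mac, running it would be:
#   import subprocess
#   subprocess.run(osascript_args(script), check=True)
```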

Still, Python doesn’t want us to use os.system, so how should these lines have been written?[2] One way would be to use the subprocess.Popen method:

import subprocess

cmd = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell'''

subprocess.Popen('osascript', stdin=subprocess.PIPE).communicate(cmd)

This certainly works, but the communicate method is rather clumsy.[3] The documentation for subprocess says

The recommended approach to invoking subprocesses is to use the following convenience functions for all use cases they can handle. For more advanced use cases, the underlying Popen interface can be used directly.

The convenience function most suited to this application is subprocess.call, but it’s not especially easy to use when the command being called needs to read from STDIN.

Here’s one way to use it:

import subprocess
import tempfile

tf = tempfile.TemporaryFile()
tf.write('''tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell''')
tf.seek(0)    # rewind so osascript reads the script from the beginning
subprocess.call('osascript', stdin=tf)

The subprocess.call line is simpler than the earlier subprocess.Popen line, but to get that simplicity we had to mess around with temporary files through the tempfile library. (Strictly speaking, we didn’t have to use tempfile, but that’s the safest way to create and dispose of temporary files within a Python script.)
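Here is a Python 3 sketch of the temporary-file route, wrapped in a hypothetical helper. Two details matter. `TemporaryFile` opens in binary mode unless told otherwise, and the file has to be rewound with `seek(0)` before the command runs, because the child reads from wherever the file’s current position is.

```python
import subprocess
import tempfile


def call_with_tempfile(cmd, text):
    """Feed `text` to `cmd` on stdin through a temporary file and return
    the command's exit status."""
    # TemporaryFile defaults to binary mode ("w+b"); ask for text mode.
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as tf:
        tf.write(text)
        tf.flush()  # push the text out of Python's buffer into the file
        tf.seek(0)  # rewind: the command reads from the current position
        return subprocess.call(cmd, stdin=tf)


# On a Mac: call_with_tempfile(["osascript"], cmd)
```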

The problem is the stdin argument to subprocess.call requires a file; it won’t take a string. It would be so much nicer if we could just write

cmd = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell'''
subprocess.call('osascript', stdin=cmd)

but that’s not allowed. And if you’re thinking we could use the StringIO library to treat a string as if it were a file, you are

  1. My kind of people.
  2. About to be disappointed.

Unfortunately, subprocess.call won’t accept a StringIO object for the stdin argument; it needs a real file, or at least a file object backed by an operating-system file descriptor.
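The underlying reason is that subprocess asks the stdin object for its file descriptor (via `fileno()`) so it can hand that descriptor to the child process, and a StringIO lives entirely in Python’s memory and has none. A quick Python 3 check, using a hypothetical helper name; in Python 3 the failure surfaces as `io.UnsupportedOperation`.

```python
import io
import subprocess
import sys


def accepts_as_stdin(fileobj):
    """Return True if subprocess will take `fileobj` as a child's stdin.
    subprocess calls fileobj.fileno(); an in-memory io.StringIO raises
    io.UnsupportedOperation because there is no descriptor to give."""
    try:
        # A do-nothing child process, just to see whether stdin is accepted.
        subprocess.call([sys.executable, "-c", "pass"], stdin=fileobj)
    except io.UnsupportedOperation:
        return False
    return True


print(accepts_as_stdin(io.StringIO('tell application "Mail" to activate')))
```

A real temporary file, by contrast, passes the same check.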

So what’s the best way to call an external command that reads from STDIN? Damned if I know. In the past, I’ve leaned toward Popen because it requires fewer lines of code and doesn’t require the importing of the tempfile library, but I’m starting to wonder if that’s a false economy.

What I’d really like is for Python to get its act together and come up with one simple and consistent method for executing external commands, feeding them input, and gathering their output. Like the programmer of the script we’re looking at, I’ve written programs that used os.system or os.popen because, in earlier versions of Python, that was the recommended way to do it. The convenience functions of the subprocess module are convenient only for commands that don’t read from STDIN; for those that do, you have to either mess around with temporary files or go the more awkward Popen and communicate route.

And while I’m complaining, may I suggest that check_output is a stupid name for a function that returns the STDOUT of an external command? I don’t use it to just “check” the output, which sounds kind of dainty, I use it to get the output so I can use it elsewhere in my program.
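In Python 3.4 and later, check_output takes an `input` argument, which covers the read-from-STDIN case without a temporary file. A sketch with a hypothetical helper name. For what it’s worth, the “check” in the name refers to the exit status: a non-zero status raises `CalledProcessError`, just as with check_call.

```python
import subprocess
import sys


def filter_through(cmd, text):
    """Send `text` to `cmd` on stdin and return its stdout as a str.
    Raises CalledProcessError if cmd exits with a non-zero status.
    input= needs Python 3.4+, text=True needs 3.7+."""
    return subprocess.check_output(cmd, input=text, text=True)


upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(filter_through(upper, "tell application"))  # prints TELL APPLICATION
```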

Perl, because one of its first missions was to act as a glue language, is so much better than Python at this sort of thing. I can understand why Python would never adopt the backtick notation, but I don’t understand why there isn’t a standard library that handles external command calls in a simpler, more natural way.

I’ve looked into Kenneth Reitz’s envoy module, which promises to simplify the subprocess runaround into something nearly Perl-like in its simplicity. For example:

import envoy

cmd = '''
tell application "Mail" to activate
tell application "System Events"
  tell process "Mail"
    click the menu item "New Note" of the menu "File" of menu bar 1
    click the menu item "Paste" of the menu "Edit" of menu bar 1
  end tell
end tell'''
envoy.run('osascript', data=cmd)

I don’t know what, if any, landmines are hidden within envoy, and I’d certainly prefer to stick with standard libraries for simple things like running external commands, but the convenience of envoy.run is pretty compelling. As Reitz says:

This is a convenience wrapper around the subprocess module.

You don’t need this.

But you want it.
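To get a feel for what such a wrapper involves, here is a toy envoy-style `run` in Python 3. It is not envoy’s code: it splits the command string with shlex, runs it without a shell, pipes `data` to stdin, and returns the exit status and both output streams. The attribute names echo the `status_code`, `std_out`, and `std_err` fields of envoy’s response object, but envoy itself handles more than this sketch does, such as pipelines.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class Response:
    """Result of a finished command; attribute names echo envoy's."""
    status_code: int
    std_out: str
    std_err: str


def run(command, data=None):
    """Toy envoy-style runner, not envoy's implementation: split the command
    string the way a shell would, run it without a shell, feed it `data` on
    stdin if given, and collect the exit status and both output streams."""
    proc = subprocess.run(
        shlex.split(command),
        input=data,          # None means the child inherits our stdin
        capture_output=True,
        text=True,
    )
    return Response(proc.returncode, proc.stdout, proc.stderr)


# On a Mac, something like: run("osascript", data=cmd)
```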

  1. There is a typo in the script on the Mac OS X Hints page: the two less-than signs in Line 15 are separated by a space when they shouldn’t be. I’ve fixed the typo here. 

  2. You’ll note that I’m ignoring the details of the AppleScript itself. Generally speaking, I dislike the use of GUI Scripting like we see in Lines 20-25, but I understand why the programmer did it that way. The AppleScript library for Mail doesn’t appear to have any commands for dealing with notes. 

  3. So is having to type subprocess.PIPE, but that could be solved by

    from subprocess import Popen, PIPE

    which would shorten the lines considerably. For clarity, I decided to keep the module name in all the code. 

14 Responses to “Python doesn’t play nicely with others”

  1. Carl says:

    I find that most of the time users who use readlines do so because they learned Python from a tutorial that used it (probably Dive into Python) and they’re not aware of read. There’s never really a good reason for using it, in my opinion. If you want to break up the text, use splitlines like any other string. If you want to iterate line by line, use for line in open(filename). readlines is nothing but disadvantages from my perspective.

  2. Carl says:

    Hmm, not sure if Dive into Python uses readlines after all. Maybe it was just the official tutorial…

  3. Hamelin says:

    I don’t see why you feel that the communicate method of Popen objects is so clumsy. Sure, it entails the construction of a throwaway object, of which only one method is called, so as the envoy module shows, calling a single function could be more appropriate. However, the Popen(...).communicate(...) idiom has the advantage that the setup of the process is separated from the data exchange between it and its Python parent. Personally, I like this way of shelling out.

  4. Dr. Drang says:

    When I wrote that weak defense of readlines being a memory saver, I was thinking about using readline (singular) to read in one line at a time. Readlines doesn’t do that; it, too, grabs the whole file at once, so there’s no memory savings.

    Unlike Carl, though, I do think readlines has its place. Reading in a file and turning it into a list of lines is a common enough procedure to justify its own method. It just wasn’t the right method to use here.

  5. Dr. Drang says:

    I can understand, Hamelin, the need sometimes to deal with the process separately from the data exchange, but that isn’t the common case. If it were, we wouldn’t have the call and check_call and check_output convenience functions. I don’t want to see communicate go away, I just want the convenience functions to be more convenient.

    If you use communicate often to feed strings to Popen objects, that’s the evidence that the convenience functions have failed to cover a common use case.

  6. al says:

    There is a typo in the script on the Mac OS X Hints page: the two less-than signs in Line 15 are separated by a space when they shouldn’t be. I’ve fixed the typo here.

    Where? The space between the two less-than signs in line 15 is still there: cmd = """osascript< < END

    Hate to be so nitpicky, but…

  7. Dr. Drang says:

    Well, al, I distinctly remember getting rid of that extra space somewhere. Must’ve been in some other copy of the script that didn’t get pasted into the post. Thanks for finding it.

    Also, apologizing for being nitpicky is unnecessary here.

  8. Clark says:

    I didn’t know about envoy. I installed it via easy_install, which, to my surprise, now consults the standard git repositories when checking for locations to install from. I’ll try it out later today as, like you, I’ve never been particularly happy with the subprocess library. (The other day I was trying to do something that involved reading the output over time of a process that stays running. I never did get it to work right and just gave up.)

  9. Wim Leers says:

    I agree. subprocess is annoying.

    Also, it seems the current state of running external commands in Python doesn’t really conform to PEP 20 (“The Zen of Python”) anymore, IMO. Most importantly:

    There should be one— and preferably only one —obvious way to do it.

    But also:

    Beautiful is better than ugly.

    And possibly:

    If the implementation is hard to explain, it’s a bad idea.

  10. Steve Dunham says:

    I can’t help you with your subprocess issues, but you should take a look at the “appscript” module for dealing with OSA scripting. I find it much easier to use osa directly than to hand off scripts to ‘osascript’. (It saves me from having to write any applescript code.)

    In the past, I’ve used appscript to import photos into iPhoto and arrange them into collections. It was reasonably fast and fairly easy to put together.

  11. Dr. Drang says:

    Steve, I’ve been using appscript for a few years. While it still works in Lion, Hamish Sanderson has stopped developing it and suggests that it not be used for future projects.

  12. qznc says:

    I can understand your frustration. I have been there as well. I wrote my own convenience wrappers.

    However, over time my code was used and debugged. One convenience function after another disappeared. Then I got it.

    Python’s subprocess does it right. Starting processes and communicating with them is a complex thing.

    The executable might not be found. You might want a timeout. You might want to ulimit the process. The subprocess might be killed (Ctrl-C). You might want to set/clear environment variables. You want to look at the subprocess’s exit code. Etc.

    The question remains whether Python should provide some wrapper for common cases. Well, I don’t think so. The Popen-communicate pattern is not that verbose and “Explicit is better than implicit”. ;)

  13. Heim says:

    In defense of readlines: it’s been there since long before iterators arrived in the language. It should have disappeared from tutorials by now (or they should explain how and where to use it, if you really want to), but…

  14. Nick Coghlan says:

    As others have noted, starting subprocesses correctly is a hard problem. We do want better convenience wrappers for shell scripting like tasks in the standard library, but we also want them to be the right wrappers.

    The problem is, in some ways, even worse than you describe, since you haven’t even touched on the challenges of dealing with text encodings correctly using the existing subprocess API for creating pipes.

    There have been a few different efforts along the lines of improved subprocess convenience wrappers of late:

    - My own Shell Command is, like subprocess, very output oriented (but does try to simplify several aspects of providing correctly quoted arguments to commands).
    - Kenneth’s envoy module is described in the article.
    - Vinay Sajip has published “sarge”, his own subprocess invocation helper.
    - The Julia programming language has a rather interesting approach to subprocess invocation and pipelines that may be adaptable to Python.

    But articles like this one really do help - better understanding the pain points in the raw subprocess.Popen API is the first step in designing a higher level API that better matches the “just do it” experience of shell scripting or Perl’s shell invocation.