Turning recordings into podcasts

In the last couple of years, as iCloud ramped up and iOS matured, I’ve found myself connecting my iPhone to my computer less and less. You’ve probably gone through the same transition. Back when contacts and calendar events synced through a wire, I’d put my phone in its dock as soon as I arrived at the office in the morning and make sure it got one last sync before I left at night. Now, with almost everything syncing over the air, I go days without making any wired connections. There is, however, one thing that keeps me plugging my phone in a few times a week: the recordings I make of radio shows streamed from the BBC. I’m now in the process of eliminating that last need for a wired connection.

The BBC has a Listen Again service, which allows you to stream virtually any of its shows for a week after they air. I’ve used this service for years, along with Audio Hijack Pro and a handful of scripts (which are in this GitHub repository), to automate the recording of shows and to put them in my iTunes library for later syncing. My goal now is to upload these recordings to a server and turn them into podcasts that I can download to my phone wirelessly via Downcast.

I’ve been fiddling with this for a few days now and am only partway along the path to a fully automated solution. I know what needs to be done to finish it, but I’m still thinking about the best way to do it. In broad terms, the steps are easy:

  1. Record the show and gather the metadata (title, date, songlist if appropriate) from the BBC website. These scripts are basically already written as part of the earlier iTunes-based automation.
  2. Upload the recording file to the server with an appropriate name. Uploading could be done through AppleScripting Transmit, a Python script using Paramiko, or a direct sftp script. Although I’ve been using Transmit to upload the files as I’ve worked through the process by hand, I doubt I’ll continue with it in the automated system, because AppleScripts don’t integrate well with other scripts. If I can fit everything into a single Python script, Paramiko will be the way to go, but I’m not sure that’s the best solution.

    In my current system, Audio Hijack Pro names the files according to the show being recorded and the date of the recording. I think a better name will be the eight-character code the BBC gives to every individual episode. I’m not sure whether it’s easier to have AHP name the file on my local computer as part of its post-processing or to rename the file as it’s uploaded.

  3. Update an RSS XML file on the server with the information about the newly recorded show. This is the file you specify to your podcatching app (Downcast, Instacast, iTunes, whatever) when you subscribe to a podcast. I’ve written the code that generates the text that needs to be added, but I haven’t figured out the best way to add it to the existing XML file.
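
The upload in step 2 could be sketched with Paramiko roughly as follows. This is Python 3, and the function names, the remote directory /var/www/bbc, and the host and user passed to open_sftp are all placeholders, not anything from the existing scripts. The upload helper accepts any object with a put method, so it can be tried out without a server.

```python
import os
import posixpath


def remote_path_for(code, remote_dir="/var/www/bbc"):
    """Build the server-side path for an episode, named by its BBC code."""
    return posixpath.join(remote_dir, code + ".m4a")


def upload_recording(sftp, local_path, code, remote_dir="/var/www/bbc"):
    """Upload one recording under its episode code.

    `sftp` is anything with a put(localpath, remotepath) method, such as
    the object returned by Paramiko's SSHClient.open_sftp().
    Returns the remote path and the file size in bytes.
    """
    remote_path = remote_path_for(code, remote_dir)
    sftp.put(local_path, remote_path)
    # The byte count is what the <enclosure length="..."> attribute needs.
    return remote_path, os.path.getsize(local_path)


def open_sftp(host, user):
    """Open an SFTP session with Paramiko.

    Paramiko is imported here so the helpers above work without it.
    Authentication is assumed to come from an SSH agent or default keys.
    """
    import paramiko
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(host, username=user)
    return client, client.open_sftp()
```

Renaming during the upload, as here, means Audio Hijack Pro can keep its current show-and-date file names on the local machine.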

A concrete example is probably the best way to explain what I want to do. Here’s the RSS file for my recordings of Sounds of the 70s, a weekly two-hour show that plays what you would guess from the title:[1]

xml:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">

<channel>
  <title>Sounds of the 70s</title>
  <description>Johnnie Walker plays classic tracks from the Seventies.</description>
  <link>http://bbc.co.uk/radio2</link>
  <language>en-us</language>
  <image>
    <title>Sounds of the 70s</title>
    <link>http://bbc.co.uk/radio2</link>
    <url>http://leancrew.com/bbc/70s.png</url>
  </image>
  <docs>http://www.rssboard.org/rss-specification</docs>
  <webMaster>drdrang@gmail.com</webMaster>

  <item>
    <title>Merry Xmas Everybody!</title>
    <link>http://bbc.co.uk/programmes/b01pd29s</link>
    <guid>http://leancrew.com/bbc/b01pd29s.m4a</guid>
    <description><![CDATA[I'd Like To Teach The World To Sing<br />
by The New Seekers<br />
<br />
Always On My Mind<br />
by Elvis Presley<br />
<br />
Mull Of Kintyre<br />
by Wings<br />
<br />
You Make Loving Fun<br />
by Fleetwood Mac<br />
<br />
Merry Xmas Everybody<br />
by Slade<br />
<br />
[cut]
<br />
It May Be Winter Outside<br />
by Love Unlimited<br />
<br />
White Christmas<br />
by Bing Crosby<br />
<br />
Put A Little Love In Your Heart<br />
by Jackie DeShannon]]></description>
    <enclosure url="http://leancrew.com/bbc/b01pd29s.m4a" length="86257527" type="audio/mpeg"/>
    <category>Podcasts</category>
    <pubDate>Sun, 23 Dec 2012 12:00:00 +0000</pubDate>
  </item>

  <lastBuildDate>Sat, 29 Dec 2012 13:00:00 -0600</lastBuildDate>
</channel>
</rss>

I’ve clipped out much of the song list because it isn’t relevant to the format.

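Everything outside the <item> block is information that applies to every episode. Within the <item> block is information specific to a particular episode: its title, a link to its page on the BBC site, a GUID (here, the URL of the recording), the song list as the description, the enclosure that points to the audio file and gives its size in bytes, a category, and the publication date.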

With every new episode, a new <item> block will be added to the file, and the <lastBuildDate> near the bottom of the file will be updated.

I’ve already written the code that generates the <item> block. It’s the rssitem function in the radio2.py module:

python:
def rssitem(code, length):
  'Generate an RSS entry for the given show.'
  
  try:
    # Scrape the episode page for its title, air date, and track list.
    (title, date, tlist) = episodeInfo(code)
    # Turn newlines into HTML line breaks so the list displays properly.
    tlist = str(tlist).replace('\n', '<br />\n')
    date = date.strftime("%a, %d %b %Y %H:%M:%S +0000")
    item = '''  <item>
      <title>{title}</title>
      <link>http://bbc.co.uk/programmes/{code}</link>
      <guid>http://leancrew.com/bbc/{code}.m4a</guid>
      <description><![CDATA[{tlist}]]></description>
      <enclosure url="http://leancrew.com/bbc/{code}.m4a" length="{length}" type="audio/mpeg"/>
      <category>Podcasts</category>
      <pubDate>{date}</pubDate>
    </item>
  '''.format(**vars())

    return item
  except Exception:
    # Any scraping or formatting failure means there's no item to return.
    return None

It takes the eight-character code and the length of the recording in bytes and returns the <item> block. It calls another function in the module, episodeInfo, which gathers the information by screen-scraping the episode page. The episodeInfo function is just a slightly expanded version of the trackList function I described in an earlier blog post, so I won’t bother describing it here.
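
To make the two conversions inside rssitem concrete, here is a small stand-alone sketch in Python 3. The helper name format_for_rss is made up for illustration, and the sample values are lifted from the feed shown earlier rather than scraped from the BBC. Like rssitem, it assumes the date is already in UTC, since the +0000 offset is hard-coded.

```python
from datetime import datetime


def format_for_rss(date, tlist):
    """Apply the same date and track-list conversions that rssitem uses."""
    # Newlines become HTML line breaks; the newline itself is kept so the
    # feed stays readable as plain text.
    description = str(tlist).replace("\n", "<br />\n")
    # RSS wants an RFC 822 style date. The offset is fixed at +0000.
    pub_date = date.strftime("%a, %d %b %Y %H:%M:%S +0000")
    return pub_date, description


# Hypothetical stand-ins for what episodeInfo might return.
air_date = datetime(2012, 12, 23, 12, 0, 0)
tracks = ("White Christmas\nby Bing Crosby\n\n"
          "Put A Little Love In Your Heart\nby Jackie DeShannon")

pub_date, description = format_for_rss(air_date, tracks)
print(pub_date)      # Sun, 23 Dec 2012 12:00:00 +0000
print(description)
```

The blank line between tracks turns into a line containing only <br />, which is where the empty-looking lines in the feed’s description come from.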

I’ve also written a command-line script, radio2-rssitem, that calls rssitem and outputs the <item> block.

python:
#!/usr/bin/python

import radio2
import sys
from getopt import getopt
import datetime

usage = '''radio2-rssitem [option] show length

options:
  -c:    use the episode's program code
  -h:    print this message

Without the -c option, it will generate a podcast RSS item for the most
recent episode of that show (70s, 60s, soul, at). If you use the -c option,
it can retrieve any episode using the given code.'''

# Parse the command line argument.
if len(sys.argv[1:]) < 2:
  print usage
  sys.exit()
else:
  optlist, args = getopt(sys.argv[1:], 'ch')
  if len(args) < 2:
    print usage
    sys.exit()
  if optlist:
    for o,a in optlist:
      if o == '-c':
        code = args[0]
        length = args[1]
      if o == '-h':
        print usage
        sys.exit()
  else:
    show = args[0]
    code = radio2.programCode(show)
    length = args[1]

# Construct the <item>.
item = radio2.rssitem(code, length)
if item:
  print item.encode('utf-8')
else:
  print "Not found"

This is what I’ve been using to construct the XML file by hand.

What I’m not sure about at the moment is the best way to insert new <item> blocks into the XML file automatically. They can’t just be appended to the end (via cat, for example), because they need to be inside the <channel> and <rss> blocks. Also, as I said before, I need to update <lastBuildDate>.
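
One option is to let an XML library do the insertion. The following is a minimal Python 3 sketch using the standard library’s xml.etree.ElementTree; add_item is a hypothetical helper, not part of radio2.py. Be aware that ElementTree does not preserve CDATA sections: when it rewrites the file, the song list comes out as escaped text (&lt;br /&gt;), which feed readers should treat the same way. It doesn’t preserve the original indentation, either.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def add_item(feed_path, item_xml, build_date=None):
    """Insert item_xml into the feed at feed_path and refresh <lastBuildDate>."""
    tree = ET.parse(feed_path)
    channel = tree.getroot().find("channel")
    # Strip surrounding whitespace so the fragment parses as one element.
    item = ET.fromstring(item_xml.strip())

    last_build = channel.find("lastBuildDate")
    if last_build is None:
        last_build = ET.SubElement(channel, "lastBuildDate")

    # Put the new <item> just before <lastBuildDate>, after the older items.
    channel.insert(list(channel).index(last_build), item)
    # formatdate gives an RFC 822 style date with the local UTC offset.
    last_build.text = build_date or formatdate(localtime=True)

    tree.write(feed_path, encoding="UTF-8", xml_declaration=True)
```

Something like add_item("70s.xml", radio2.rssitem(code, length)) would then replace the hand editing, assuming rssitem returned an item rather than None.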

I’m sure there’s a clever way to do it with sed, but I hate sed and don’t want to have anything to do with it. Another possibility is to use an extra file: one that has everything the XML file has except the last three lines. I could then construct the full XML file like this:

radio2-rssitem -c b01pd29s 86257527 >> 70s.xml.head
cat 70s.xml.head - << END > 70s.xml
  <lastBuildDate>`date +"%a, %d %b %Y %H:%M:%S %z"`</lastBuildDate>
</channel>
</rss>
END

The first line appends the output of radio2-rssitem to the extra file, 70s.xml.head. The remainder concatenates a three-line here-document to the end of 70s.xml.head and saves the result into 70s.xml; the - argument tells cat to read the here-document from standard input after the file. The here-document uses the date command to update <lastBuildDate>.
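
The same head-plus-tail assembly could also be done in Python, which might make it easier to fold everything into a single script. This is a rough Python 3 sketch; rebuild_feed and its parameters are names invented here. email.utils.formatdate with localtime=True produces a date in the same format as the date command above.

```python
from email.utils import formatdate

# The three lines that close every feed file.
TAIL = """  <lastBuildDate>{date}</lastBuildDate>
</channel>
</rss>
"""


def rebuild_feed(head_path, feed_path, item_xml, build_date=None):
    """Append item_xml to the head file, then write head + tail to feed_path."""
    # Equivalent of: radio2-rssitem ... >> 70s.xml.head
    with open(head_path, "a", encoding="utf-8") as head:
        head.write(item_xml)

    with open(head_path, encoding="utf-8") as head:
        body = head.read()

    # Equivalent of the cat-plus-here-document step.
    date = build_date or formatdate(localtime=True)
    with open(feed_path, "w", encoding="utf-8") as feed:
        feed.write(body + TAIL.format(date=date))
```

As with the shell version, the head file only ever grows, so the feed can always be regenerated from it.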

Assuming this is the direction I take, there’s still some fiddle work to be done. I have to get the size of the file, and I have to upload both the recording file and the new version of the XML to the server. Finally, I have to put all this into a script that can be called from an AppleScript because, unfortunately, Audio Hijack Pro can only run AppleScripts when a recording is finished.

[Image: AHP recording settings]

Looking at the previous paragraph, it doesn’t seem like that much work. In fact, as is often the case, writing this all out has helped me clarify a process that was only fuzzily defined when I started. If I were nicer to my readers, I’d sit on this post for a few days and then rewrite and publish it when all the scripts were in final form. But I’m not nice and I really don’t want to go back and rewrite this mess. If you’re interested in how things turned out, you can watch the GitHub repository.

Whichever way I end up automating this process, when I’m done my iPhone should be pretty much free of my computer. The only times I’ll need to make a wired connection will be those rare occasions when I want to change the mix of songs or add a movie.


  1. Don’t bother commenting on how lame music was in the 70s. It’s my lame music, and I listen to it not because it’s good, but because it’s mine. 

  2. I’m not trying to be a dick about it, but I’ve password-protected the directory that contains the recordings. The BBC is nice enough to provide the Listen Again feature, but it certainly doesn’t want people like me redistributing its material. 


8 Responses to “Turning recordings into podcasts”

  1. Ben says:

    That looks/sounds like a lot of work to handle the XML feed. Any reason you’re not just using an XML library to do the dirty work?

  2. Mike Watta says:

    I had a bit of an epiphany recently with generating RSS files - I used the Jinja2 templating python module that I was using to generate the rest of my static site. It made generating the file disgustingly easy

  3. Alex Chan says:

    I have to do something similar for podcasts that don’t provide their own RSS feeds, and I think I whipped up something similar to Mike Watta, inspired by static baked blogging systems. I generate the feed from scratch on every run, so I don’t need to add new <item>’s automatically.

    So here’s one thought: if you only have

    <lastBuildDate>`date +"%a, %d %b %Y %H:%M:%S %z"`</lastBuildDate>
    </channel>
    </rss>
    

    at the end of your XML file, and that’s constant, could you tell the script to insert the new item at line number (final line — 4)? I don’t know enough about Python to know if that’s possible or easy, but that’s how I think I would try to go about it.

  4. *sigh* says:

    I don’t know whether you’ve looked at get iplayer automator? OSX only it seems but a very clever way of slurping BBC content.

  5. Joshua Goodwin says:

    My solution involves running the get_iplayer Perl script on a server. Unlike Audio Hijack Pro, it only works with BBC shows, but it removes the need to leave a computer switched on, and the additional expense of downloading to the computer and then re-uploading.

  6. Gridlock says:

    You’re welcome

    — Licence fee payer

  7. Dr. Drang says:

    I do appreciate your support, Gridlock, but I’m pretty sure I’m an indirect supporter of BBC Radio, too. My cable package includes BBC America, which funnels money back to its corporate mother.

  8. T.M. says:

    I agree with Ben. String manipulation is not the right way to handle XML.