Archiving tweets

Last night, Justin Blanton wrote about this interesting IFTTT recipe that archives your tweets in a plain text file in Dropbox. This morning, people began tweeting about it as they realized how nice it would be to have an archive like that sitting on your computer, always up to date within a few minutes. Brett Terpstra wrote a variation on the recipe that put the entries into Markdown format. I wrote a little script to convert my ThinkUp archive into the format the IFTTT recipe uses, and now I have about 6,500 of my tweets in a Dropbox file.

I’ve been using Gina Trapani’s ThinkUp for quite a while to archive my tweets. It’s been rock solid for me, but I’ve been thinking about using a different method because:

  1. It’s overkill. I understand that many people need the statistics it provides, but I really just need a tweet archive.
  2. It’s inconvenient for searching. Log in, click to go to Tweets page, click to go to More page, click to go to Search. It would be far easier to search by grepping or acking a local file on the computer in front of me.
  3. It stores the tweets in a database rather than a plain text file. For ThinkUp’s purposes, a database is probably necessary, but for my limited needs, a text file is preferable because it’s more portable and easier to understand.

I’d been thinking about writing a script, to be run periodically by launchd, that would go get my recent tweets from Twitter and append them to a text file. I know how to write such a script; back in 2008, I wrote a script to grab my tweets from the previous day and post them here.[1] A script that just appended the tweets to a file should be even easier. But I started thinking about things like what sort of information should be included in the file and what the best format would be. The perfect became the enemy of the good, and I never wrote the script.

When I saw the IFTTT recipe for archiving tweets—which was, by the way, written by Hugo (@hugovk on Twitter), not by Justin Blanton as several people think—I saw that it had all I really needed. The format wasn’t what I would have chosen, but so what? It’s clear and should be easy to parse. Here’s what the tail of my archive looks like now:

I believe @jblanton owes @ttscoff one (1) full day of productivity.
July 03, 2012 at 03:22PM
http://twitter.com/drdrang/status/220251096495570945
- - - - - 

@ttscoff @jaheppler If you can install @thinkup, it’ll go back and pull old tweets into a database.
July 03, 2012 at 03:27PM
http://twitter.com/drdrang/status/220252567698018304
- - - - - 

@thanland @jaheppler Yep. About 1,000 of my pearls of wisdom are lost to me forever. Oh, the humanity!
July 03, 2012 at 03:33PM
http://twitter.com/drdrang/status/220253916225474560
- - - - - 

@BenjaminBrooks @gruber I had 9 SunOS visitors. According to Google Analytics, every one of them had a beard longer than @jdalrymple’s.
July 03, 2012 at 04:10PM
http://twitter.com/drdrang/status/220263296400498691
- - - - - 
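To illustrate how easy the format is to parse, here is a minimal Python 3 sketch. It is not part of the IFTTT recipe; the names `parse_archive` and `STAMP` are made up for this example, and it assumes every entry ends with a timestamp line and a URL line, followed by a `- - - - -` separator line, as in the sample above. A tweet whose own text contains a line of exactly `- - - - -` would confuse it.

```python
import re
from datetime import datetime

# Timestamp layout used in the archive, e.g. "July 03, 2012 at 03:22PM".
# %B and %p depend on the locale; an English locale is assumed.
STAMP = "%B %d, %Y at %I:%M%p"


def parse_archive(text):
    """Split archive text into (tweet_text, datetime, url) tuples."""
    entries = []
    # Entries are separated by a line of "- - - - -", possibly with
    # trailing whitespace.
    for chunk in re.split(r"^- - - - -[ \t]*$", text, flags=re.MULTILINE):
        lines = chunk.strip().splitlines()
        if len(lines) < 3:
            continue  # blank or malformed chunk
        url = lines[-1].strip()
        when = datetime.strptime(lines[-2].strip(), STAMP)
        body = "\n".join(lines[:-2])  # tweet text may span several lines
        entries.append((body, when, url))
    return entries
```

Calling `parse_archive(open(path, encoding="utf-8").read())` on the archive file should give one tuple per tweet, in file order.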

The one problem with the IFTTT recipe is that it’s prospective only; it won’t go back and grab the tweets you posted before activating the recipe. But since I had most of my tweets[2] in ThinkUp, all I needed to do was get the tweets out of its database and into a text file in the right format.

That was actually pretty easy because I’d done a similar thing before. ThinkUp has a command that will export your tweets to your local hard disk in CSV format.

[Screenshot: ThinkUp tweet export]

A CSV library is part of the standard Python distribution, so extracting the desired information and formatting it didn’t take much programming:

python:
 1:  #!/usr/bin/python
 2:  
 3:  import csv
 4:  import os
 5:  from datetime import datetime
 6:  import sys
 7:  
 8:  # Put your Twitter username here.
 9:  me = "drdrang"
10:  
11:  # Archive format.
12:  single = "%s\n%s\nhttp://twitter.com/" + me + "/status/%s"
13:  
14:  # Open the CSV file specified on the command line and read the field names.
15:  tfile = open(sys.argv[1])
16:  treader = csv.reader(tfile)
17:  fields = treader.next()
18:  
19:  # Fill a list with the tweets, with each tweet a dictionary.
20:  allInfo = []
21:  for row in treader:
22:    allInfo.append(dict(zip(fields,row)))
23:  
24:  # Collect only the info we need in a list of lists. Convert the date string
25:  # into a datetime object.
26:  tweets = [ [datetime.strptime(x['pub_date'], "%Y-%m-%d %H:%M:%S"), \
27:              x['post_id'], x['post_text']] \
28:              for x in allInfo ]
29:  
30:  # We put the date first so we can sort by date easily.
31:  tweets.sort()
32:  
33:  # Construct a new list of tweets formatted the way the IFTTT recipe does.
34:  out = [ single % \
35:          (x[2], x[0].strftime("%B %d, %Y at %I:%M%p"), x[1]) \
36:          for x in tweets ]
37:  
38:  print '\n- - - - -\n\n'.join(out)
40:  print '\n- - - - -'

Update 7/4/12
The original version of this script had a bug on Line 35. I had the hour code as %H (24-hour clock) instead of %I (12-hour clock) as IFTTT uses. If you used the original, as I did, your evening tweets will have stupid timestamps like “18:08PM.”

To fix this:

  1. Run the new version of the script without piping it to pbcopy. Note the last tweet.
  2. Open your ~/Dropbox/ifttt/twitter/twitter.txt file and delete all the tweets from the beginning through the one you just noted.
  3. Rerun the new version of the script and pipe the result to pbcopy.
  4. Paste the fixed tweets at the beginning of your ~/Dropbox/ifttt/twitter/twitter.txt file.

Sorry about the extra work this mistake caused.
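The difference between the two hour codes in the update is easy to see in isolation. This is a small Python 3 illustration, not part of the script; the example time is arbitrary, and the month name and AM/PM marker assume an English locale.

```python
from datetime import datetime

# An arbitrary evening time: 6:08 PM on July 3, 2012.
evening = datetime(2012, 7, 3, 18, 8)

# %H is the 24-hour hour, so pairing it with %p gives a nonsense stamp.
wrong = evening.strftime("%B %d, %Y at %H:%M%p")

# %I is the 12-hour hour, which is what the IFTTT format uses.
right = evening.strftime("%B %d, %Y at %I:%M%p")

print(wrong)
print(right)
```

Morning tweets come out the same under either code, which is why the bug only shows up in timestamps from 1 PM onward.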

The script, called tu2ifttt, expects the exported CSV file to be its argument, and it prints the transformed (and much simplified) archive to standard out. I piped the output to the clipboard,

python tu2ifttt ~/Downloads/posts-drdrang-twitter.csv | pbcopy

and pasted it at the beginning of the twitter.txt file in my Dropbox folder that the IFTTT recipe had created.[3] I may have needed to add an extra empty line between the old tweets I’d just pasted in and the newer ones that IFTTT had archived.

If you have a ThinkUp archive and would like to add your old tweets to your new IFTTT archive, just change the drdrang in Line 9 of the script to your username and run the pipeline on your downloaded CSV file. You’ll have all your old tweets on the clipboard, ready for pasting.
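The script above is Python 2 (`treader.next()` and the `print` statement won't run under Python 3). For anyone on Python 3, here is a rough port. It is a sketch, not a tested replacement: the names `convert` and `main` are made up, the CSV is assumed to be UTF-8, and the column names `pub_date`, `post_id`, and `post_text` are the ones the original script reads. One small difference is that it doesn't emit the extra blank line the original prints before the final separator.

```python
#!/usr/bin/env python3
import csv
import sys
from datetime import datetime

ME = "drdrang"  # put your Twitter username here
ENTRY = "{text}\n{stamp}\nhttp://twitter.com/{user}/status/{post_id}"
SEPARATOR = "\n- - - - -\n\n"


def convert(rows, user=ME):
    """Turn ThinkUp CSV rows (dicts) into IFTTT-style archive text."""
    # Date first in each tuple so that sorting orders the tweets by date.
    tweets = sorted(
        (datetime.strptime(r["pub_date"], "%Y-%m-%d %H:%M:%S"),
         r["post_id"], r["post_text"])
        for r in rows
    )
    out = [
        ENTRY.format(text=text, user=user, post_id=post_id,
                     stamp=when.strftime("%B %d, %Y at %I:%M%p"))
        for when, post_id, text in tweets
    ]
    return SEPARATOR.join(out) + "\n- - - - -\n"


def main(argv):
    if len(argv) != 2:
        print("usage: python3 tu2ifttt3 posts.csv", file=sys.stderr)
        return
    # newline="" is what the csv module expects for files it reads.
    with open(argv[1], newline="", encoding="utf-8") as f:
        sys.stdout.write(convert(csv.DictReader(f)))


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name like tu2ifttt3, it can be piped to pbcopy the same way as the original.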


So now I have a 1.2 MB file in my Dropbox folder that contains over 6,500 tweets of mine. I can quickly find that tweet from when my wife was watching Downfall by searching for “Hitler.”

My wife is watching “Downfall.” She’s mad because every time I hear Hitler yelling I start laughing.
  — Dr. Drang (@drdrang) Wed Jul 14 2010

Which is of vital importance.
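A plain grep returns only the matching line, not the timestamp and URL that go with it. If you want whole entries back, something like this Python 3 sketch would do; the function name `search_archive` is made up, and it assumes entries are separated by `- - - - -` lines as in the IFTTT format.

```python
import re


def search_archive(text, pattern):
    """Return whole archive entries that match a case-insensitive regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    # Split on separator lines, tolerating trailing whitespace.
    entries = re.split(r"^- - - - -[ \t]*$", text, flags=re.MULTILINE)
    # The search covers the entire entry, so a pattern can also match
    # the timestamp or the status URL.
    return [e.strip() for e in entries if e.strip() and rx.search(e)]


# Example use, assuming the archive lives where the recipe puts it:
#   import os
#   path = os.path.expanduser("~/Dropbox/ifttt/twitter/twitter.txt")
#   with open(path, encoding="utf-8") as f:
#       for entry in search_archive(f.read(), "hitler"):
#           print(entry, end="\n\n")
```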

The questions I need to answer now are:

  1. Should I trust IFTTT to keep running?
  2. Should I continue to use ThinkUp as a backup?
  3. Should I just write my own script for archiving each day’s tweets so I don’t have to rely on ThinkUp or IFTTT?
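For question 3, the part of such a script that doesn't depend on any particular Twitter library is small. The Python 3 sketch below assumes the tweets have already been fetched somehow (not shown) as `(status_id, utc_datetime, text)` tuples, and that status IDs grow over time, so the largest ID in the archive marks the last tweet saved. All the names here are hypothetical.

```python
import re
from datetime import timezone

STATUS_ID = re.compile(r"/status/(\d+)\s*$", re.MULTILINE)
STAMP = "%B %d, %Y at %I:%M%p"


def last_archived_id(archive_text):
    """Largest status ID found in the archive's URL lines, or 0 if none."""
    ids = [int(i) for i in STATUS_ID.findall(archive_text)]
    return max(ids) if ids else 0


def new_entries(archive_text, tweets, user, tz=None):
    """Format tweets newer than anything already archived.

    tweets: iterable of (status_id, naive UTC datetime, text).
    tz: target timezone; None means the machine's local timezone.
    """
    newest = last_archived_id(archive_text)
    out = []
    for status_id, when_utc, text in sorted(t for t in tweets if t[0] > newest):
        # Mark the naive time as UTC, then convert to the target zone.
        local = when_utc.replace(tzinfo=timezone.utc).astimezone(tz)
        out.append("%s\n%s\nhttp://twitter.com/%s/status/%d\n- - - - -\n\n"
                   % (text, local.strftime(STAMP), user, status_id))
    return "".join(out)
```

The returned string can be appended to the archive file as is; running it twice over the same batch of tweets adds nothing the second time.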

  1. Don’t judge me too harshly. People were doing that sort of thing back then. I shut it down in 2009. 

  2. I don’t have all of my tweets in ThinkUp because Twitter won’t let any program collect more than the most recent 3,200 tweets, and by the time I started using ThinkUp I was already past 4,200. 

  3. I had made my own copy of the IFTTT recipe and started it running before I went back and collected my older archive from ThinkUp. 


11 Responses to “Archiving tweets”

  1. Mat Packer says:

    Based on the recent history of web apps being bought, sold, killed off, I think the answers to your questions are pretty simple:

    1. No
    2. No
    3. Yes (and then share it with the world)

    Just my 2c anyway.

  2. Lukas says:

    @Mat: ThinkUp is open source and can be installed on your own server, so I don’t think there’s much chance of it being killed off in any relevant way :-)

  3. Michael says:

    Thanks for this posting!

    In my experience IFTTT is not yet reliable. Recently my recipes disappeared from my account page (but continued to run). I could only resolve it by deauthenticating IFTTT from my Twitter account.

  4. Lauri Ranta says:

    I’ve just been using the Twitter Ruby gem. Here’s a ridiculous single line example:

    ruby -rubygems -e 'require "twitter"; require "cgi"; Twitter.user_timeline("climagic", options = {:page => 1, :count => 100}).each { |tweet| puts tweet.id.to_s + " " + CGI.unescapeHTML(tweet.text.gsub("\n", " ")) }'

  5. Abraham Vegh says:

    I wouldn’t bother trying to outsource this. The few services that have sprung up in the past have usually gone south after about a year or two, even (especially?) the ones that have been for-pay from the outset.

    I’ve been using Tweetnest since its creation to save my tweets. As of right now, I’ve got about 41,000 of them stored in MySQL on a VPS that has cron refreshing it every 30 minutes to make sure I get everything.

    Before that (and for a time, in tandem with Tweetnest), I used the now-defunct Downstream, so I actually have nearly all of my ~56,000 tweets archived in one fashion or another, although the Tweetnest GUI is usually handy for finding that thing I said last month.

  6. Abraham Vegh says:

    …and I just realized ThinkUp can be self-hosted. And that its interface is better than Tweetnest.

  7. Dr. Drang says:

    I agree with Mat that the answers are simple, but I’m going with

    1. No.
    2. Yes, until I get my own archiving script up, running, and debugged.
    3. Yes.

    I’ll be writing it in Python, using a Twitter library that works more or less the same way as the Ruby library Lauri mentioned.

    The script won’t be quite as simple as Lauri’s; it’ll have to keep track of the last tweet archived so it doesn’t make duplicates, and it’ll convert the timestamps from UTC to my timezone.

  8. Jan Marcel says:

    I’ve made a minor change to your script in order to convert ThinkUp’s UTC timestamps to the machine’s timezone: https://gist.github.com/3056557. Sorry if it’s not very elegant…

  9. Clark says:

    Honest question: why do you want your old tweets?

  10. Andrew Heiss says:

    If you want to use Brett Terpstra’s Markdown syntax in your plaintext file and extract everything from ThinkUp, change line 12 to this:

    single = "%s\n\n[%s](http://twitter.com/" + me + "/status/%s)\n"
    
  11. Tobias says:

    I had my own Python script until the move to OAuth broke it. This really isn’t something I want to maintain myself. Now I simply occasionally launch http://www.riverfold.com/software/tweetlibrary/ . It can export CSV, but so far the built-in search was enough.