Labeling time series

Today I made a simple time series graph for a report. I’m happy with the way it turned out, but I’m kind of embarrassed at the hacky way I got it that way.

The values to be plotted were temperature readings from a local NCDC weather station. I had a text file with lines that looked like this:

2015-01-04-09-28  29  
2015-01-04-09-53  27  
2015-01-04-09-55  26  
2015-01-04-10-23  25  
2015-01-04-10-31  24  
2015-01-04-10-40  24  
2015-01-04-10-53  23  
2015-01-04-10-55  23  
2015-01-04-11-01  23  
2015-01-04-11-08  22  

The first item on each line is a date/time stamp in YYYY-MM-DD-hh-mm format. The second is the temperature in Fahrenheit.¹ Although I’m displaying this excerpt here with spaces separating the fields, the input file had them separated with tabs.

After consulting the Matplotlib documentation and looking at a couple of examples, I wrote up a simple plotting routine,

python:
 1:  #!/usr/bin/python
 2:  
 3:  from matplotlib import pyplot as plt
 4:  from matplotlib import dates
 5:  from datetime import datetime
 6:  import sys
 7:  
 8:  d = []
 9:  t = []
10:  for line in sys.stdin:
11:    dstamp, temp = line.rstrip().split('\t')
12:    d.append(datetime.strptime(dstamp, '%Y-%m-%d-%H-%M'))
13:    t.append(int(temp))
14:  
15:  days = dates.DayLocator()
16:  hours = dates.HourLocator()
17:  dfmt = dates.DateFormatter('%b %d')
18:  
19:  datemin = datetime(2015, 1, 4, 0, 0)
20:  datemax = datetime(2015, 1, 12, 0, 0)
21:  
22:  fig = plt.figure()
23:  ax = fig.add_subplot(111)
24:  ax.xaxis.set_major_locator(days)
25:  ax.xaxis.set_major_formatter(dfmt)
26:  ax.xaxis.set_minor_locator(hours)
27:  ax.set_xlim(datemin, datemax)
28:  ax.set_ylabel('Temperature (F)')
29:  ax.plot(d, t, linewidth=2)
30:  fig.set_size_inches(8, 4)
31:  
32:  plt.savefig('temperatures.pdf', format='pdf')

and got this result:

Initial temperature plot

There are a few things wrong with it. First, there are way too many minor tick marks. A minor tick every hour is just too often. I fixed that by changing Line 16 to

python:
16:   hours = dates.HourLocator(interval=3)

A minor tick every three hours looks much less cluttered.

Second, the plot needed a grid to make it easier to keep the reader’s eye aligned with the axes. I inserted the line

python:
29:  ax.grid(True)

just before the ax.plot command.

The biggest problem, though, was the location of the date labels. As you can see, they’re centered under the major tick marks associated with midnight of each day. Having a label aligned with a single tick mark would be fine if I had just one data point per day, but in this case there are about 40 temperature readings each day. The labeling of the axis should reflect the fact that a day is the entire block of time from one midnight to the next.

I see time series labeled like this fairly often, and it will probably not surprise you to hear that it annoys the shit out of me. It seems to be most common for data series that stretch out over years. A year’s worth of daily figures should not have a label like “2014” centered under the tick mark for January 1. Changing the text of the label to something like “1/1/14” is more accurate, but it’s lazy and inelegant. The best way to say “this is 2014” is to have obvious marks at either end of the year and center the year label between them.

The same principle holds for my temperature data. I already had the days’ borders marked with major tick marks and (after adding the ax.grid(True) line) vertical grid lines. All I needed to do was scootch the day labels to get them centered between the borders.
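Finding where the labels belong is just arithmetic: each label goes at the midpoint of the interval it names. A minimal standard-library sketch (`label_positions` is a hypothetical name, not part of the plotting script) that covers both the daily temperature labels and the yearly “2014” case:

```python
from datetime import datetime, timedelta

def label_positions(borders):
    """Return the midpoint of each interval between consecutive borders.

    borders: sorted datetimes marking interval edges (e.g. midnights).
    The label for [borders[i], borders[i+1]) belongs halfway between them.
    """
    return [a + (b - a) / 2 for a, b in zip(borders, borders[1:])]

# Day borders for the temperature plot: midnight Jan 4 through midnight Jan 12.
midnights = [datetime(2015, 1, 4) + timedelta(days=i) for i in range(9)]
day_centers = label_positions(midnights)   # noon of each day, Jan 4 to Jan 11

# The same rule puts a year label in the middle of the year.
year_center = label_positions([datetime(2014, 1, 1), datetime(2015, 1, 1)])[0]
print(day_centers[0], year_center)
```

Nine borders give eight labels, one per day, each at noon.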

It was at this point that I cheated.

There is, I’m sure, a Matplotlib command for moving labels the way I wanted, but all I could find on short notice were ways to move the labels closer to or farther from the axis—nothing about moving them along the axis. And I really wanted to get my report out the door and into the hands of my client.

So…

python:
 1:  #!/usr/bin/python
 2:  
 3:  from matplotlib import pyplot as plt
 4:  from matplotlib import dates
 5:  from datetime import datetime
 6:  import sys
 7:  
 8:  d = []
 9:  t = []
10:  for line in sys.stdin:
11:    dstamp, temp = line.rstrip().split('\t')
12:    d.append(datetime.strptime(dstamp, '%Y-%m-%d-%H-%M'))
13:    t.append(int(temp))
14:  
15:  days = dates.DayLocator()
16:  hours = dates.HourLocator(interval=3)
17:  dfmt = dates.DateFormatter('              %b %d')
18:  
19:  datemin = datetime(2015, 1, 4, 0, 0)
20:  datemax = datetime(2015, 1, 11, 23, 59, 59)
21:  
22:  fig = plt.figure()
23:  ax = fig.add_subplot(111)
24:  ax.xaxis.set_major_locator(days)
25:  ax.xaxis.set_major_formatter(dfmt)
26:  ax.xaxis.set_minor_locator(hours)
27:  ax.set_xlim(datemin, datemax)
28:  ax.set_ylabel('Temperature (F)')
29:  ax.grid(True)
30:  ax.plot(d, t, linewidth=2)
31:  fig.set_size_inches(8, 4)
32:  
33:  plt.savefig('temperatures.pdf', format='pdf')

I added a bunch of space characters to the front of the date formatting string in Line 17. After two or three attempts, I arrived at something that looked reasonably centered.

Final temperature plot

You may have noticed the other bit of hackery: to avoid having a “Jan 12” label sticking off the right edge of the plot, I changed the upper bound in Line 20 to 11:59:59 PM on January 11, one second before midnight. The one-second difference can’t be seen in the plot, and it means “Jan 11” is the last label.
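For the record, here is one way to get centered labels without padding the format string. It is a sketch, not the original script, and assumes a reasonably recent Matplotlib: the midnight ticks and grid lines stay put, their labels are blanked with `NullFormatter`, and each day gets its own label at noon through `annotate`, using data coordinates for x and axes-fraction coordinates for y. Because only the days inside the plot get labels, there is no “Jan 12” to hang off the edge, so the upper limit can be plain midnight. `label_days_centered` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line for interactive use
from matplotlib import pyplot as plt
from matplotlib import dates, ticker

def label_days_centered(ax, first_day, ndays, fmt='%b %d', pad=7):
    """Hypothetical helper: label each day at noon instead of at midnight.

    Keeps the midnight ticks and grid lines, blanks their labels, and adds
    one annotation per day, centered between that day's two midnights.
    pad is in points below the axis, roughly where tick labels normally sit.
    """
    ax.xaxis.set_major_formatter(ticker.NullFormatter())
    labels = []
    for i in range(ndays):
        noon = first_day + timedelta(days=i, hours=12)
        labels.append(ax.annotate(
            noon.strftime(fmt),
            xy=(dates.date2num(noon), 0),          # x in data units, y at axis bottom
            xycoords=('data', 'axes fraction'),
            xytext=(0, -pad), textcoords='offset points',
            ha='center', va='top'))
    return labels

fig, ax = plt.subplots(figsize=(8, 4))
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_minor_locator(dates.HourLocator(interval=3))
ax.set_xlim(datetime(2015, 1, 4), datetime(2015, 1, 12))  # no one-second trick
ax.set_ylabel('Temperature (F)')
ax.grid(True)
# ax.plot(d, t, linewidth=2)  # d and t as read from stdin in the script above
labels = label_days_centered(ax, datetime(2015, 1, 4), 8)
fig.savefig('temperatures.pdf')
```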

Almost everyone who works with computers has resorted to tricks like this at one time or another. We know it’s wrong, and we’re ashamed that we don’t know “the right way” to accomplish our goals. But mixed with the shame is a perverse pride in the ability to get something done even when we don’t really know what we’re doing.


  1. If you go on Twitter to tell me I should be using Celsius, I will block you. This report is being written for an audience that’s more comfortable with Fahrenheit, so that’s how I’m reporting the data. 


Screen captures with file upload

Things have come full circle. Several years ago I wrote a little script called snapftp, which combined screen capturing with file uploading via FTP to streamline the process of including screenshots in my posts here. As I mentioned last night, this later evolved into snapflickr, a similar script for screen capturing and uploading to Flickr. Now that I’ve returned to hosting images on my web server, I need to return to something similar to snapftp, but I want this one to be smarter.

First, I won’t be using FTP anymore. I never had any security issues with FTP, but it just makes more sense to use SCP. Second, I want to eliminate (or at least greatly reduce) the chance of duplicate file names. And finally, I’m going to implement this as a Keyboard Maestro macro, which gives me a bit more flexibility in divvying up the work between Python and shell scripts.

Using the new macro, called SnapSCP, is only slightly more complicated than using the SnapClip macro. I invoke it through the ⌃⇧4 keyboard combination, choose a window or rectangular area to capture, give the image file a name, and decide whether it should have a border. The file is then saved locally in my ~/Pictures/Screenshots folder and is uploaded to an images directory on the blog server. An HTML <img> tag is put on the clipboard for pasting into a post.¹

Here’s the GUI presented for setting the name and choosing whether to add a border:

SnapSCP

The Name field is used to populate the alt and title attributes of the img, and it’s also part of the file name. But it isn’t the complete file name, because that would make it too easy to create file name conflicts. So the file name is whatever’s written in the Name field, prefixed by the date in yyyymmdd format. Thus, the file name of the image above is 20150125-SnapSCP.png. Although I could never enter unique names over the course of a year, month, or even a week, I figure I can remember to give unique names during a single day.
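The naming rule is simple enough to isolate. A minimal Python 3 sketch, with `snapshot_filename` as a hypothetical name (the SnapSCP script below builds the same string inline in Lines 11 and 76):

```python
from datetime import date

def snapshot_filename(name, day=None, ext='png'):
    """Build a date-prefixed file name like 20150125-SnapSCP.png.

    name: the text typed into the Name field; surrounding whitespace is
    stripped, as the SnapSCP script does. day defaults to today.
    """
    day = day or date.today()
    return '{}-{}.{}'.format(day.strftime('%Y%m%d'), name.strip(), ext)

print(snapshot_filename('SnapSCP', date(2015, 1, 25)))
```

Passing the date explicitly makes the function easy to check; in normal use you’d leave it off and get today’s prefix.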

Here’s the Keyboard Maestro macro:

Keyboard Maestro SnapSCP

The first Python script is

python:
 1:  #!/usr/bin/python
 2:  
 3:  import Pashua
 4:  import tempfile
 5:  import Image
 6:  import sys, os
 7:  import subprocess
 8:  from datetime import date
 9:  
10:  # Local parameters
11:  dstring = date.today().strftime('%Y%m%d')
12:  type = "png"
13:  localdir = os.environ['HOME'] + "/Pictures/Screenshots"
14:  tf, tfname = tempfile.mkstemp(suffix='.'+type, dir=localdir)
15:  bgcolor = (61, 101, 156)
16:  bigshadow = (25, 5, 25, 35)
17:  smallshadow = (0, 0, 0, 0)
18:  border = 16
19:  
20:  # Dialog box configuration
21:  conf = '''
22:  # Window properties
23:  *.title = Snapshot
24:  
25:  # File name text field properties
26:  fn.type = textfield
27:  fn.default = Snapshot
28:  fn.width = 264
29:  fn.x = 54
30:  fn.y = 40
31:  fnl.type = text
32:  fnl.default = Name:
33:  fnl.x = 0
34:  fnl.y = 42
35:  
36:  # Border checkbox properties
37:  bd.type = checkbox
38:  bd.label = Background border
39:  bd.x = 10
40:  bd.y = 5
41:  
42:  # Default button
43:  db.type = defaultbutton
44:  db.label = Save
45:  
46:  # Cancel button
47:  cb.type = cancelbutton
48:  '''
49:  
50:  # Capture a portion of the screen and save it to a temporary file.
51:  subprocess.call(["screencapture", "-ioW", "-t", type, tfname])
52:  
53:  # Exit if the user canceled the screencapture.
54:  if not os.path.exists(tfname) or os.path.getsize(tfname) == 0:
55:    if os.path.exists(tfname): os.remove(tfname)
56:    sys.exit()
57:  
58:  # Open the dialog box and get the input.
59:  dialog = Pashua.run(conf)
60:  if dialog['cb'] == '1':
61:    os.remove(tfname)
62:    sys.exit()
63:  
64:  # Add a desktop background border if asked for.
65:  snap = Image.open(tfname)
66:  if dialog['bd'] == '1':
67:    # Make a solid-colored background bigger than the screenshot.
68:    snapsize = tuple([ x + 2*border for x in snap.size ])
69:    bg = Image.new('RGB', snapsize, bgcolor)
70:    bg.paste(snap, (border, border))
71:    bg.save(tfname)
72:  
73:  # Rename the temporary file using today's date (yyyymmdd) and the 
74:  # name provided by the user.
75:  name = dialog['fn'].strip()
76:  fname =  '{localdir}/{dstring}-{name}.{type}'.format(**locals())
77:  os.rename(tfname, fname)
78:  
79:  print fname

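Most of this follows the same logic as the script in the SnapClip macro, so I won’t repeat that explanation here. The main differences are:

  1. Line 11 builds the yyyymmdd date string used as the file name prefix.
  2. The dialog has a Name text field (Lines 25–34), and its default button is labeled Save.
  3. Instead of putting the image on the clipboard and deleting it, Lines 73–77 rename the temporary file to the date-prefixed name, and Line 79 prints the full path for the next step of the macro.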

As you can see in the KM screenshot, the output of this script, if any, is saved to a KM variable called fname. If this variable exists and is not empty, the image file is then uploaded to the server through this one-line shell script:

scp -P 22 "$KMVAR_fname" user@leancrew.com:path/to/all-this/images2015/

The -P option to scp should be set to the server’s SSH port. As you’ve probably guessed, my server’s SSH port isn’t actually set to 22.

The first argument is the local file, which we get from the KMVAR_fname environment variable. For each variable defined in a macro, Keyboard Maestro sets an environment variable of the form KMVAR_variable that can be accessed from any script. The quotes around the variable are there to accommodate characters (like spaces) that have special meaning to the shell.

The second argument is the remote directory to which the file is uploaded. Because I have SSH keys set up, I don’t need to provide a password—scp looks in the ~/.ssh/ directories on both the server and my local machine for the information necessary to log in securely.
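If you’d rather run the upload from Python, handing scp its arguments as a list sidesteps the quoting question entirely, because no shell is involved. A sketch, assuming the same KMVAR_fname variable; the port 2222 and the user@example.com:images/ destination are placeholders, and the function names are hypothetical:

```python
import os
import subprocess

def scp_command(local_path, dest, port):
    """Build an scp argument list. Each element reaches scp unmodified,
    so spaces or other shell metacharacters in local_path need no quoting."""
    return ['scp', '-P', str(port), local_path, dest]

def upload_from_km(dest='user@example.com:images/', port=2222):
    """Upload the file named in KMVAR_fname if that variable is set and nonempty.

    Placeholder dest and port; substitute your own server settings.
    The strip() guards against a trailing newline left by print.
    """
    local_path = os.environ.get('KMVAR_fname', '').strip()
    if not local_path:
        return None  # capture was canceled, so there is nothing to upload
    cmd = scp_command(local_path, dest, port)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if scp fails
    return cmd
```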

The final Python script generates the img tag and puts it on the clipboard.

python:
 1:  #!/usr/bin/python
 2:  
 3:  from datetime import date
 4:  import os
 5:  import os.path
 6:  import urllib
 7:  
 8:  # File names: with and without date prefix and extension.
 9:  fname = os.path.basename(os.environ['KMVAR_fname'])
10:  name = os.path.splitext(fname)[0].split('-', 1)[-1]
11:  fname = urllib.quote(fname)
12:  
13:  print '<img class="ss" src="http://leancrew.com/all-this/images2015/{fname}" alt="{name}" title="{name}" />'.format(**locals())

Like the shell script, it accesses the file name through the KMVAR_fname environment variable. It extracts the parts necessary for the src, alt, and title attributes, and combines them into a full image tag. Line 11 URL-encodes the file name, which isn’t actually necessary but seems like good practice. I’ve noticed, for example, that Transmit URL-encodes file names when you right-click on an item and choose the Copy URL command.
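On Python 3, `urllib.quote` has moved to `urllib.parse.quote` and `print` is a function, so the script needs small changes there. A sketch of the same logic under Python 3 (`img_tag` is a hypothetical name). The `html.escape` call is an addition not in the original script; it keeps a quotation mark typed into the Name field from breaking the `alt` and `title` attributes.

```python
import html
import os.path
from urllib.parse import quote

BASE_URL = 'http://leancrew.com/all-this/images2015/'

def img_tag(path, base_url=BASE_URL):
    """Build the <img> tag for an uploaded screenshot.

    The file name is URL-encoded for src; the bare name (date prefix and
    extension removed) is HTML-escaped for alt and title.
    """
    fname = os.path.basename(path)
    name = os.path.splitext(fname)[0].split('-', 1)[-1]
    attr = html.escape(name, quote=True)
    return '<img class="ss" src="{}{}" alt="{}" title="{}" />'.format(
        base_url, quote(fname), attr, attr)

print(img_tag('Screenshots/20150125-SnapSCP.png'))
```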

Because I use LaunchBar’s Clipboard History feature, I don’t have to paste the image tag immediately after running SnapSCP. I can take several screenshots in a row, in any order, and then paste the links into a post later. This is one of the many advantages of writing your own scripts—you can create commands that not only fit in with how you think, but also fit in with the rest of the tools you use.


  1. Although I write in Markdown, I don’t use the Markdown syntax for images. I’ve never found it especially intuitive, and it isn’t convenient for assigning classes to the image. But one of the great things about Markdown is that it allows you to drop down into HTML when you want to. 


Screen captures to clipboard again

Back in 2011, in an attempt to reduce site traffic when this blog was on a different host and was being run on WordPress, I began using my Flickr account to host the screen captures that accompany many of my posts. I wrote a little utility, called snapflickr, to make the process of capturing and uploading easier. Last year, I added the ability to save the screen captures to the clipboard. I did this so I could open the screen images quickly in Acorn for editing, but later found myself using it to get online receipts (captured from, for example, Amazon’s invoice pages) to add to my expense reports in Numbers. This worked, but it was a little clumsy: negotiating the additional options in the utility slowed down the process of getting a screenshot.

SnapFlickr options

Now that I have a static site and have switched hosts, it seemed like a good time to

  1. Stop using Flickr as a poor man’s CDN and go back to hosting images on the blog’s server.
  2. Split the utility in two: one that sends screenshots to the clipboard, and another that uploads them to the server.

I’m still tweaking the one that uploads to the server, but the screen capturing clipboard utility is ready for its closeup. It’s a Keyboard Maestro macro called SnapClip that looks like this in the KM Editor:

Keyboard Maestro SnapClip

When I press the ⌃⌥⌘4 keyboard combination, the cursor goes into screen capture mode, much as it would if I pressed the system standard ⇧⌘4 or ⌃⇧⌘4 combos. But after I’ve chosen the window or screen rectangle I want to capture, SnapClip displays this window:

SnapClip

The checkbox allows me to put a dark blue border around the captured image, which is the style I’ve adopted for full-window captures. I think it makes the captured image look more like a window on the Desktop than if the window is captured “bare.”

Of course, providing the option for the border means I can’t just use what’s built into the ⌃⇧⌘4 command. The Python script that SnapClip runs is this:

python:
 1:  #!/usr/bin/python
 2:  
 3:  import Pashua
 4:  import tempfile
 5:  import Image
 6:  import sys, os
 7:  import subprocess
 8:  
 9:  # Local parameters
10:  type = "png"
11:  localdir = os.environ['HOME'] + "/Pictures/Screenshots"
12:  tf, tfname = tempfile.mkstemp(suffix='.'+type, dir=localdir)
13:  bgcolor = (61, 101, 156)
14:  border = 16
15:  
16:  # Dialog box configuration
17:  conf = '''
18:  # Window properties
19:  *.title = Snapshot
20:  
21:  # Border checkbox properties
22:  bd.type = checkbox
23:  bd.label = Background border
24:  bd.x = 10
25:  bd.y = 40
26:  
27:  # Default button
28:  db.type = defaultbutton
29:  db.label = Clipboard
30:  
31:  # Cancel button
32:  cb.type = cancelbutton
33:  '''
34:  
35:  # Capture a portion of the screen and save it to a temporary file.
36:  subprocess.call(["screencapture", "-ioW", "-t", type, tfname])
37:  
38:  # Exit if the user canceled the screencapture.
39:  if not os.path.exists(tfname) or os.path.getsize(tfname) == 0:
40:    if os.path.exists(tfname): os.remove(tfname)
41:    sys.exit()
42:  
43:  # Open the dialog box and get the input.
44:  dialog = Pashua.run(conf)
45:  if dialog['cb'] == '1':
46:    os.remove(tfname)
47:    sys.exit()
48:  
49:  # Add a desktop background border if asked for.
50:  snap = Image.open(tfname)
51:  if dialog['bd'] == '1':
52:    # Make a solid-colored background bigger than the screenshot.
53:    snapsize = tuple([ x + 2*border for x in snap.size ])
54:    bg = Image.new('RGB', snapsize, bgcolor)
55:    bg.paste(snap, (border, border))
56:    bg.save(tfname)
57:    
58:  # Put the image on the clipboard and delete the temporary file.
59:  impbcopy = os.environ['HOME'] + '/Dropbox/bin/impbcopy'
60:  subprocess.call([impbcopy, tfname])
61:  os.remove(tfname)

As you can see from the import lines at the top, this script uses two nonstandard libraries: Pashua and Image.

The Pashua library provides the interface to Carsten Blüm’s Pashua application. This is a great little utility for adding simple GUIs to scripts. It’s similar in many ways to Platypus, but I find it generally easier to use.

The Image library is one of the modules in the Python Imaging Library (PIL), the old standby for image editing in Python. PIL is what I use to add the border to the screenshot.

Lines 9–14 define the parameters that govern the specifics of the rest of the program: the image format for the screenshot, the location and name of the temporary image file, and the size and color of the border.
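One small wrinkle in Line 12: `tempfile.mkstemp` returns an open file descriptor along with the path, and the scripts keep the descriptor in `tf` without ever closing it. That’s harmless in a script that exits a few seconds later, but here is a sketch that closes it right away, since screencapture writes to the path rather than the descriptor (`make_temp_image` is a hypothetical name):

```python
import os
import tempfile

def make_temp_image(localdir, ext='png'):
    """Create an empty temporary file in localdir and return its path.

    mkstemp opens the file and returns (fd, path). Only the path is needed,
    because another program (screencapture) will write to it, so the
    descriptor is closed immediately.
    """
    fd, path = tempfile.mkstemp(suffix='.' + ext, dir=localdir)
    os.close(fd)
    return path
```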

Lines 16–33 specify the layout of the window shown above.

Line 36 captures the image by running OS X’s screencapture command. I have the options set to start the process in window capture mode, but it’s easy to switch to rectangle capture mode by pressing the space bar. Because it might need to add a border, SnapClip saves the captured image to disk using the temporary file name defined in Line 12.
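The capture step can be pulled out into a couple of functions. A sketch, assuming screencapture’s `-i` (interactive), `-o` (no window shadow), `-W` (start in window-selection mode), and `-t` (format) flags behave as described in its man page. It treats a missing or zero-length file as a canceled capture; that test is an assumption about how a cancel shows up, since mkstemp has already created the empty file before screencapture runs. Function names are hypothetical.

```python
import os
import subprocess

def capture_command(path, fmt='png'):
    """Argument list for an interactive capture like Line 36's:
    -i interactive, -o no window shadow, -W start in window-selection
    mode, -t output format."""
    return ['screencapture', '-ioW', '-t', fmt, path]

def capture_canceled(path):
    """Treat a missing or empty file as a canceled capture (assumption:
    a cancel either leaves mkstemp's empty file alone or removes it)."""
    return not os.path.exists(path) or os.path.getsize(path) == 0

def capture(path, fmt='png'):
    """Run the capture (macOS only). Return True if an image was written;
    otherwise remove any leftover empty file and return False."""
    subprocess.call(capture_command(path, fmt))
    if capture_canceled(path):
        if os.path.exists(path):
            os.remove(path)  # only remove a file that is actually there
        return False
    return True
```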

Lines 43–47 put up the window and collect the user input. If I click the “Background border” checkbox, Lines 49–56 open the image file, add the border, and save the edited file back to disk.
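The border step in Lines 49–56 translates directly to Pillow, the maintained fork of PIL, where the module is imported as `from PIL import Image` rather than plain `import Image`. A sketch under that assumption (`add_border` is a hypothetical name; the color and width are the values from Lines 13–14). Using the capture’s alpha channel as the paste mask is an addition not in the original script: a window captured without its shadow may have transparent rounded corners, and the mask lets the blue background show through there.

```python
from PIL import Image  # Pillow; the old standalone "import Image" no longer works

BGCOLOR = (61, 101, 156)  # same dark blue as the scripts
BORDER = 16               # pixels added on each side

def add_border(snap, border=BORDER, bgcolor=BGCOLOR):
    """Return a new RGB image: snap pasted onto a solid background that is
    `border` pixels larger on every side."""
    w, h = snap.size
    bg = Image.new('RGB', (w + 2 * border, h + 2 * border), bgcolor)
    # If the capture has an alpha channel, use it as the mask so transparent
    # pixels keep the background color instead of being pasted opaquely.
    mask = snap if snap.mode == 'RGBA' else None
    bg.paste(snap, (border, border), mask)
    return bg
```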

Finally, Lines 58–61 put the image on the clipboard and clean up by deleting the temporary image file. I use Alec Jacobson’s impbcopy command line tool for this. It’s a clever little program that mimics the built-in pbcopy command, but for image data instead of text.
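If you’d rather not install a separate tool, AppleScript can also put PNG data on the clipboard. This is a swapped-in alternative, not what SnapClip does. A sketch assuming macOS’s osascript, with the file path handed to the script’s run handler as an argument so it needs no AppleScript quoting (function names hypothetical):

```python
import subprocess

# AppleScript that reads a PNG file and puts its data on the clipboard.
SCRIPT = [
    'on run argv',
    'set the clipboard to (read (POSIX file (item 1 of argv)) as «class PNGf»)',
    'end run',
]

def clipboard_command(path):
    """Build an osascript call: each -e adds one line of the script, and the
    trailing argument arrives in the run handler's argv list."""
    cmd = ['osascript']
    for line in SCRIPT:
        cmd += ['-e', line]
    return cmd + [path]

def copy_png_to_clipboard(path):
    """Put the PNG at path on the clipboard (macOS only)."""
    subprocess.call(clipboard_command(path))
```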

While this is not quite as efficient as the standard ⌃⇧⌘4 keyboard combo when I don’t want a border (it requires an extra tap on the Return key), it’s far easier to use when I do. And I prefer to remember just one keyboard combination for putting screen captures on the clipboard.


DNS migration

As predicted, I made a boneheaded mistake last night when switching DNS records to the site’s new host. Things seem to be working now, and if you’re reading this, it means the DNS propagation has made it to your service provider.

I got interrupted while in the middle of following Digital Ocean’s instructions. I had changed settings at my domain registrar and was in the middle of configuring the domain in DO’s control panel. I’d just set the A record when I had to leave the computer for a while. When I came back, I thought I was done and didn’t go on to set the CNAME record. Later, after the DNS changes had propagated to my service provider, the server errors told me I’d screwed something up. Luckily, it wasn’t hard to figure out and fix.

There were a couple of things I actually did well. First, before making any DNS changes, I tested the site on the new server to make sure I’d copied everything over and had configured Apache correctly. One way to do this is to substitute the IP number of the new server wherever leancrew.com or www.leancrew.com would appear in a URL. For example,

http://256.256.256.256/all-this/2015/01/dns-migration/

would be the way to address this page on the new server.¹ That’s kind of a pain in the ass, though, and it doesn’t help me test my various redirection and rewriting rules.² To make it easier to test everything before “going live,” I added this line to the /etc/hosts file on my local computer.

256.256.256.256 leancrew.com

The /etc/hosts file takes precedence over DNS, so it directed all my leancrew.com URLs to the new server. I commented the line out (by putting a hash mark at the front) when I wasn’t testing.
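Flipping that line on and off by hand is easy enough, but the bookkeeping can also be scripted. Two hypothetical helpers, sketched under the assumption that you read /etc/hosts into a string and write the result back yourself (writing it needs sudo); the fake 256.256.256.256 address is the one from the example above:

```python
def active_entries(hosts_text, hostname):
    """Return the addresses currently mapped to hostname, ignoring comments."""
    addrs = []
    for line in hosts_text.splitlines():
        fields = line.split('#', 1)[0].split()  # drop anything after a hash mark
        if len(fields) >= 2 and hostname in fields[1:]:
            addrs.append(fields[0])
    return addrs

def toggle_entry(hosts_text, address, hostname, enable):
    """Uncomment (enable=True) or comment out (enable=False) the line mapping
    address to hostname; all other lines are returned unchanged."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.lstrip('#').split()
        if fields[:1] == [address] and hostname in fields[1:]:
            line = line.lstrip('#').lstrip()
            if not enable:
                line = '#' + line
        out.append(line)
    return '\n'.join(out) + '\n'
```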

With leancrew.com sometimes pointing to the old server and sometimes to the new server, I needed to make sure I knew which one I was polling at any given time. The simplest way I found to do this was with curl:

curl -v http://leancrew.com

The -v option puts curl in verbose mode. In this mode, it doesn’t just fetch the data, it also prints a running log of the conversation with the server. Included in the conversation is the IP number of the server.

Another option I found useful was -L, which causes curl to follow any redirections. Combining -L with -v let me track the redirections to make sure they worked.

I won’t be surprised to find other mistakes over the next few days, but I think the big ones are behind me.

Update 1/22/15 10:28 AM
Well, that didn’t take long. I forgot to change Apache’s DirectoryIndex setting to include index.xml files. That screwed up the RSS feed. It’s fixed now, but some of the feed-syncing services will never check again.


  1. I hope it’s obvious to you that that’s a fake IP number. It’s not even legal. 

  2. This blog has changed from Blosxom to Movable Type to WordPress to static, and the rewriting rules allow very old links to still work.