Fear Liss

Casey Liss announced yesterday that he had changed jobs to become an iOS developer. I was happy for him, of course, as it’s been obvious for a while that being a Mac and iOS user in his personal life has made him wish he could switch to the Apple side in his professional life. But I was also struck by how brave this move was.

Changing jobs always takes a certain amount of bravery, especially as you accumulate responsibilities. Staying where you are, where you know the ropes, is easier than jumping into something new, no matter how qualified you may be. But that’s not the kind of bravery I was thinking of.

By moving into iOS development, Casey is entering a field where he’s a celebrity, despite having relatively little direct experience. That has to be weird. So weird that, if it were me, it’d worry me a lot more than, say, the learning curve for Cocoa libraries. There’s nothing specific I can point to that’s worth worrying about—and it’s that vagueness that would make me uneasy. Maybe that says more about me than it does about Casey’s situation.

In any event, congratulations, Casey! Both for the new job itself and the courage it took to jump into the unknown.

Tweaking a legend

To get the stacked area chart posted a couple of days ago, I used a slightly edited version of the Matplotlib script that generated the line-and-scatter chart from a week ago. I thought it might be helpful, mostly to future me, to discuss what I did and how the chart was improved by small changes.

This is the line-and-scatter chart I started with,

Apple moving averages

and this is the script that produced it:

  1:  #!/usr/bin/env python
  3:  from dateutil.relativedelta import *
  4:  from datetime import date
  5:  from sys import stdin, argv, exit
  6:  import numpy as np
  7:  import matplotlib.pyplot as plt
  8:  import matplotlib.dates as mdates
  9:  from matplotlib.ticker import MultipleLocator
 10:  from PIL import Image
 12:  # Initialize
 13:  phoneFile = 'iphone-sales.txt'
 14:  padFile = 'ipad-sales.txt'
 15:  macFile = 'mac-sales.txt'
 16:  lastYear = 2000
 17:  plotFile = argv[1]
 18:  if plotFile[-4:] != '.png':
 19:     plotFile = plotFile + '.png'
 21:  # Get the last Saturday of the given month.
 22:  def lastSaturday(y, m):
 23:    return date(y, m, 1) + relativedelta(day=31, weekday=SA(-1))
 25:  # Read the given data file and return the series. Also update the
 26:  # global variable lastYear to the last year in the data.
 27:  def getSeries(fname):  
 28:    global lastYear
 29:    qmonths = {'Q1': 12, 'Q2': 3, 'Q3': 6, 'Q4': 9}
 30:    dates = []
 31:    sales = []
 32:    for line in open(fname):
 33:      quarter, units = line.strip().split('\t')
 34:      units = float(units)
 35:      year, q = quarter.split('-')
 36:      year = int(year)
 37:      month = qmonths[q]
 38:      if month == 12:
 39:        qend = lastSaturday(year-1, month)
 40:      else:
 41:        qend = lastSaturday(year, month)
 42:      if qend.year > lastYear:
 43:        lastYear = qend.year
 44:      dates.append(qend)
 45:      sales.append(units)
 46:    ma = [0]*len(sales)
 47:    for i in range(len(sales)):
 48:      lower = max(0, i-3)
 49:      chunk = sales[lower:i+1]
 50:      ma[i] = sum(chunk)/len(chunk)
 51:    return dates, sales, ma
 53:  # Read in the data
 54:  macDates, macRaw, macMA = getSeries(macFile)
 55:  phoneDates, phoneRaw, phoneMA = getSeries(phoneFile)
 56:  padDates, padRaw, padMA = getSeries(padFile)
 58:  # Tick marks and tick labels
 59:  y = mdates.YearLocator()
 60:  m = mdates.MonthLocator(bymonth=[1, 4, 7, 10])
 61:  yFmt = mdates.DateFormatter('             %Y')
 62:  ymajor = MultipleLocator(10)
 63:  yminor = MultipleLocator(2)
 65:  # Plot the moving averages with major gridlines.
 66:  fig, ax = plt.subplots(figsize=(8,6))
 67:  ax.plot(macDates, macMA, 'g-', linewidth=3, label='Mac')
 68:  ax.plot(macDates, macRaw, 'g.')
 69:  ax.plot(phoneDates, phoneMA, 'b-', linewidth=3, label='iPhone')
 70:  ax.plot(phoneDates, phoneRaw, 'b.')
 71:  ax.plot(padDates, padMA, 'r-', linewidth=3, label='iPad')
 72:  ax.plot(padDates, padRaw, 'r.')
 73:  ax.grid(linewidth=1, which='major', color='#dddddd', linestyle='-')
 75:  # Set the upper limit to show all of the last year in the data set.
 76:  plt.xlim(xmax=date(lastYear, 12, 31))
 78:  # Set the labels
 79:  plt.ylabel('Unit sales (millions)')
 80:  plt.xlabel('Calendar year')
 81:  t = plt.title('Raw sales and four-quarter moving averages')
 82:  t.set_y(1.03)
 83:  ax.xaxis.set_major_locator(y)
 84:  ax.xaxis.set_minor_locator(m)
 85:  ax.xaxis.set_major_formatter(yFmt)
 86:  ax.yaxis.set_minor_locator(yminor)
 87:  ax.yaxis.set_major_locator(ymajor)
 88:  ax.set_axisbelow(True)
 89:  plt.legend(loc=(.15, .6), prop={'size':12})
 90:  fig.set_tight_layout({'pad': 1.5})
 92:  # Save the plot file as a PNG.
 93:  plt.savefig(plotFile, format='png', dpi=100)
 95:  # Add the head. Unfortunately, I don't know a way to get its
 96:  # size and location other than trial and error.
 97:  plot = Image.open(plotFile)
 98:  head = Image.open('snowman-head.jpg')
 99:  smallhead = head.resize((86, 86), Image.ANTIALIAS)
100:  plot.paste(smallhead, (300, 162))
101:  plot.save(plotFile)

This itself is an evolved version of a script that was discussed in a post back in July. If I were writing it from scratch today, I’d use Pandas to read in and manipulate the data, but I see no reason to do a full rewrite on a script that works—especially one that I’m doing for fun instead of profit.

The first thing I did to turn it into a stacked area chart was add a section to create new series with composite sales figures:

# Generate summed sales
x = macDates
y1 = macMA
y2 = [0.0]*(len(macMA)-len(padMA)) + padMA
y2 = [ a + b for a, b in zip(y1, y2) ]
y3 = [0.0]*(len(macMA)-len(phoneMA)) + phoneMA
y3 = [ a + b for a, b in zip(y2, y3) ]

This is certainly not the most elegant way to do this, but it was quick, and I didn’t want to spend a lot of time making a chart that I don’t really like in the first place. With this in place, x is the (redundant) list of dates for the entire domain of the plot; y1 is the (also redundant) list of Mac sales; y2 is the sum of Mac and iPad sales; and y3 is the sum of Mac, iPad, and iPhone sales. The x and y1 lists are unnecessary, but I wanted a new set of variable names to use in the later plotting commands.
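The pad-and-sum steps can also be wrapped in a small helper. This is only a sketch, not what the script does: `stack_series` is a name made up for illustration, and it assumes, as the code above does, that every series ends on the same date and that a shorter series simply starts later, so its zeros go at the front.

```python
def stack_series(*series):
    """Return cumulative sums of the given series, bottom layer first.

    Hypothetical helper, not part of the original script. Shorter series
    are left-padded with zeros, on the assumption that all series end at
    the same date and differ only in when they start.
    """
    length = max(len(s) for s in series)
    running = [0.0] * length
    stacked = []
    for s in series:
        padded = [0.0] * (length - len(s)) + list(s)
        running = [a + b for a, b in zip(running, padded)]
        stacked.append(running)
    return stacked


# Made-up numbers, just to show the shape of the result.
mac = [1.0, 2.0, 3.0, 4.0]
pad = [10.0, 20.0]
phone = [100.0, 200.0, 300.0]
y1, y2, y3 = stack_series(mac, pad, phone)
print(y1)
print(y2)
print(y3)
```

Each returned list is the top edge of one layer, so the order of the arguments is the stacking order from the bottom up.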

Then I deleted Lines 67–72 from the original script (these are the commands that did the line and scatter plotting) and replaced them with these:

ax.fill_between(x, 0, y1, facecolor='green', label='Mac')
ax.fill_between(x, y1, y2, facecolor='red', label='iPad')
ax.fill_between(x, y2, y3, facecolor='blue', label='iPhone')

This gave me my first iteration of the stacked area chart:

Stacked chart iteration 1

The fill_between command does pretty much what you’d think. It fills the space between two data series with the given color. It’s clever enough to know that 0 should be treated as the entire x axis, saving you the trouble of generating a list of zero values.

The problem with the graph at this point was that the colors were far too bright. I don’t mind saturated colors in lines and points, but they’re distracting when used in big areas. I’m plotting data here, not designing a superhero costume.

There are a couple of ways to fix this. One is to choose colors that are less saturated. The other is to increase the transparency of the fill. This lets the white background show through, which reduces the perceived saturation of the fill colors and has the added benefit of making the background grid visible in the filled areas.

To turn the fills from opaque to translucent, add an alpha parameter to the fill_between commands:

ax.fill_between(x, 0, y1, facecolor='green', alpha=.5, label='Mac')
ax.fill_between(x, y1, y2, facecolor='red', alpha=.4, label='iPad')
ax.fill_between(x, y2, y3, facecolor='blue', alpha=.5, label='iPhone')

The values I chose for alpha came from trial and error. To my eye, they make the grid lines appear about as dark in each of the filled areas.
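To see why a translucent fill looks less saturated, it helps to work out the color you get when a fill is composited over white. The sketch below uses the usual "over" blend, result = alpha × fill + (1 − alpha) × background, and assumes a pure white background with nothing else underneath (no grid lines, no overlapping fills). The function name is made up for illustration, and the RGB triples are approximations of the named colors.

```python
def blend_over(fill, alpha, background=(1.0, 1.0, 1.0)):
    """Composite a translucent RGB fill over an opaque background.

    Hypothetical helper for illustration; colors are RGB triples with
    components between 0 and 1.
    """
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fill, background))


# Approximate RGB values for the named colors, with the alphas used above.
fills = {
    'Mac':    ((0.0, 0.5, 0.0), 0.5),
    'iPad':   ((1.0, 0.0, 0.0), 0.4),
    'iPhone': ((0.0, 0.0, 1.0), 0.5),
}
for name, (rgb, alpha) in fills.items():
    blended = blend_over(rgb, alpha)
    print(name, tuple(round(c, 2) for c in blended))
```

With these alphas, every channel that was 0 in the pure color rises to at least 0.5, which is why the fills read as pastels instead of primaries.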

Stacked chart iteration 2

I almost published the post with the chart in this form. Because I’d been focused on getting the data plotted the way I wanted, I’d ignored the legend. Matplotlib had automatically taken care of changing the markers in the legend from lines to blocks of color, so I didn’t think much about it. But after getting the plotting done, I realized that the legend needed tweaking.

The legend is certainly accurate in its depiction of which color goes with which area, but the order can be improved. In the original line-and-scatter chart, the order didn’t matter too much, and having it in Mac-iPhone-iPad order made sense chronologically. Now that we have a stacked area chart, the stacking order of the legend should match the stacking order of the data. The purpose of a legend is to tell you what’s what, and by using position as well as color, we reinforce that communication.

One way to change the order of the legend is to change the order of the fill_between commands. This order would work:

ax.fill_between(x, y2, y3, facecolor='blue', alpha=.5, label='iPhone')
ax.fill_between(x, y1, y2, facecolor='red', alpha=.4, label='iPad')
ax.fill_between(x, 0, y1, facecolor='green', alpha=.5, label='Mac')

But I learned from this Stack Overflow discussion that creation order doesn’t always translate into legend order. A more robust way to set the legend order is to understand the legend command a little better and not rely on its defaults.

One of the parameters you can pass to legend is a list of handles to the individual data plots. The handles are the return values of the plotting commands, so first I had to change

ax.fill_between(x, 0, y1, facecolor='green', alpha=.5, label='Mac')
ax.fill_between(x, y1, y2, facecolor='red', alpha=.4, label='iPad')
ax.fill_between(x, y2, y3, facecolor='blue', alpha=.5, label='iPhone')

to

mac = ax.fill_between(x, 0, y1, facecolor='green', alpha=.5, label='Mac')
pad = ax.fill_between(x, y1, y2, facecolor='red', alpha=.4, label='iPad')
phone = ax.fill_between(x, y2, y3, facecolor='blue', alpha=.5, label='iPhone')

With variables associated with each plot, I could now set the order of the legend by including a handles parameter in the legend command, switching from

plt.legend(loc=(.15, .6), prop={'size':12})

to

plt.legend(handles=[phone, pad, mac], loc=(.15, .6), prop={'size':12})

That gave me the version I finally published:

Stacked area chart

The handles trick is something I know future me will want to use. Sometimes the order of the plotting commands can’t be changed because the chart is using opacity and the z position of its component plots to achieve a certain effect. In those situations, being able to change the legend order without changing the plotting order will save me the trouble of opening the chart PDF in a program like OmniGraffle or Graphic (née iDraw) and editing the legend by hand.
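When the plotting commands are generated in a loop or scattered through a script, keeping a variable for each one can be awkward. A variation, sketched here with made-up data, is to ask the axes for its handles and labels after plotting and pass them to `legend` in whatever order is needed. It assumes a Matplotlib recent enough to accept the `handles` parameter; the `Agg` backend line is there only so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')          # draw off-screen; not needed interactively
import matplotlib.pyplot as plt

# Made-up cumulative series standing in for y1, y2, and y3.
x = [0, 1, 2, 3]
y1 = [1, 1, 2, 2]
y2 = [1, 2, 4, 5]
y3 = [2, 4, 8, 11]

fig, ax = plt.subplots()
ax.fill_between(x, 0, y1, facecolor='green', alpha=.5, label='Mac')
ax.fill_between(x, y1, y2, facecolor='red', alpha=.4, label='iPad')
ax.fill_between(x, y2, y3, facecolor='blue', alpha=.5, label='iPhone')

# Collect handles and labels in plotting order, then reverse both so the
# legend reads top-to-bottom like the stack.
handles, labels = ax.get_legend_handles_labels()
legend = ax.legend(handles=handles[::-1], labels=labels[::-1])
print([t.get_text() for t in legend.get_texts()])
```

Reversing is enough here because the layers were plotted from the bottom up; any other permutation of the two lists works the same way, as long as handles and labels are reordered together.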

Breaking the rules

Earlier this evening I was looking through tweets that link here,[1] and after following a particular thread backward, I found this:


Benedict Evans (@BenedictEvans) Feb 4 2016 2:13 PM

(You can see a larger version of the image by clicking on it.)

My animus to stacked area charts led me to tweet this:

Remember, kids, never use stacked area charts.

Dr. Drang (@drdrang) Feb 4 2016 6:11 PM

That got a very quick response:

Remember kids, any charts ‘rule’ will produce bad charts. Judgement beats rules. twitter.com/drdrang/status…
Benedict Evans (@BenedictEvans) Feb 4 2016 6:14 PM

Evans is right.[2] Judgment does beat rules. I’m not averse to breaking rules occasionally, but you have to exercise good judgment when you do so. You have to have a reason.

My case against stacked area charts is here. In a nutshell, the problem with stacked area charts is that each of the items being graphed (except the one on the bottom) is distorted because it’s set upon a sloped and curving baseline, i.e., the top of the item graphed below it. This can hide behavior that’s present in the data and suggest behavior that isn’t.

In Evans’s graph, what’s being hidden is the iPad’s declining sales. Oh, it’s there, no doubt, but it isn’t as obvious as it should be because it’s sitting on top of the upward sloping iPhone.
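A few made-up numbers show how this hiding works. In the sketch below the upper series falls every period, but because the series beneath it rises faster, the top edge of the upper band still climbs. The figures are invented for illustration and are not Apple's sales; only the arithmetic matters.

```python
# Invented values for illustration only, not real sales data.
lower = [10.0, 20.0, 32.0, 46.0]   # rising series at the bottom of the stack
upper = [20.0, 18.0, 15.0, 11.0]   # falling series stacked on top of it

# The top edge of the upper band is the sum of the two series.
top_edge = [a + b for a, b in zip(lower, upper)]


def changes(series):
    """Period-to-period differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


print(changes(upper))      # every change is negative
print(changes(top_edge))   # every change is positive
```

To see the decline, the reader has to judge the thickness of a band squeezed between two rising curves, which is much harder than reading a line that slopes down.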

Now, you might argue that the purpose of Evans’s graph wasn’t to show the iPad’s decline. That’s probably true, but if the purpose was to show Apple’s devices rising and overtaking Windows PCs, why bother breaking Apple’s sales into its components? Why not just show Apple’s composite sales of Macs, iPads, and iPhones as a single line, growing up and crossing the Windows PC line?

If you’re going to show the individual components, you have an obligation to show them clearly, and Evans’s graph doesn’t do that. Interestingly, if he had given it some thought, Evans could have made a stacked area chart that presented the data with less distortion. Simply plot the iPad sales between the Mac and the iPhone.[3]

Stacked area chart

The iPad sales are less distorted in this view because the Mac sales provide a relatively flat baseline for the iPad to sit upon. Of course, the iPhone sales are more distorted than in Evans’s graph because of the iPad hump, but that’s less of a worry, I think, because iPhone sales are so much higher. Also, the gridlines in the background aid in seeing the heights of the individual components.

Is there some rule that you have to stack the sales in the order that the products were introduced? It’s certainly natural to stack them in that order, but it isn’t a rule. And if it were, this would be a good place to break it.

By the way, I don’t want to give the impression that I actually like this graph. I just think it’s better than Evans’s.[4] Lesser of two evils.

And as for judgment, it’s nice to talk about, but it’s better to apply.

  1. Oh, don’t tell me you’ve never done that for your site. 

  2. Not in the first sentence, of course. That’s just silly. But I realize he’s overstating to make a point, something he doesn’t seem to recognize when others do it. 

  3. I don’t have the Windows PC data, and I’m not going to go looking for it because I have no argument with that part of the plot. I’m sticking with the color convention I used in earlier charts rather than adopting the colors Evans used. Similarly, I’m using a four-quarter moving average instead of a four-quarter total. This is not a knock on Evans’s choices for color or scale; it was just faster for me to make a plot that was consistent with my earlier choices. 

  4. And not just in the order of the stacking. Although I do like Evans’s use of old-style numerals, his tick labels along the horizontal axis are an abomination. The eye-glazing repetition of Jun/Dec over 29 labels makes reading the labels harder than necessary, as does their vertical orientation. There’s no need for a label every six months, or even every year. They take up way too much vertical space and draw attention away from the data.

    I don’t think much of his legend, either. Areas should be designated by blocks of color, not streaks that are only slightly thicker than the marker for lines. 

A small adjustment to SnapClip

As I buy more business supplies online and fewer in regular retail stores, I find myself taking more screenshots of digital receipts to attach to my expense reports. My SnapClip Keyboard Maestro macro (along with its predecessor scripts) has been my go-to utility for this, even though I originally wrote it mainly to capture screenshots of windows for posting here on the blog. Because my pattern of use has changed, it seemed like a good time to give it a couple of tweaks to make it faster to use for its most common task.

The purpose of SnapClip is to give me a one-stop keyboard macro for screenshots that aren’t intended to be immediately uploaded to the leancrew server. It handles the following types of screenshot:

The changes I’ve made recently have been the addition of saving to a file and switching the default from window capture to rectangle capture. A small change in focus to better fit how I work now.

SnapClip is triggered by the ⌃⌥⌘4 keystroke combination. Like the built-in ⇧⌘4, it starts by presenting a crosshair for selecting a rectangular area but can be switched to window capture mode by tapping the spacebar. Once the rectangle or window has been selected, SnapClip presents the following dialog:

SnapClip dialog

The defaults are to save the image to the clipboard only and to not add a blue background border. Strictly speaking, I don’t need SnapClip to handle this default condition. Using the standard ⇧⌘4 capture and holding down the Control key when making the selection will put the screenshot on the clipboard instead of saving it to a file. The advantage of SnapClip is that I can do more than the built-in screen capture with a single keystroke combination.

I generally don’t use the background border option when taking rectangular screenshots, but I do like adding the border when taking window screenshots. It changes the screenshot from this

Screenshot without border

to this

Screenshot with border

Turning on the file saving option saves a copy of the screenshot to the Desktop with a filename in yyyymmdd-HHMMSS.png format. I do this when I expect to need the screenshot more than once. I may, for example, paste it immediately into a tweet or text message but also expect to incorporate it into a blog post or an email later.
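The filename format is an ordinary `strftime` pattern. As a quick check, here it is applied to a fixed timestamp (picked arbitrarily for the example) instead of the current time:

```python
from datetime import datetime

# An arbitrary fixed moment, used only so the output is predictable.
moment = datetime(2016, 2, 4, 18, 11, 0)
filename = moment.strftime("%Y%m%d-%H%M%S") + ".png"
print(filename)   # 20160204-181100.png
```

Because the fields run from the largest unit to the smallest and are zero-padded, sorting these filenames alphabetically also sorts them chronologically.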

That’s what SnapClip does. Here’s how it’s made. It is, as I said, a Keyboard Maestro macro:

Keyboard Maestro SnapClip macro

The only action in the macro is this Python script:

#!/usr/bin/env python

import Pashua
import tempfile
from PIL import Image
import sys, os
import subprocess
import shutil
from datetime import datetime

# Local parameters
type = "png"
localdir = os.environ['HOME'] + "/Pictures/Screenshots"
tf, tfname = tempfile.mkstemp(suffix='.'+type, dir=localdir)
bgcolor = (61, 101, 156)
border = 16
desktop = os.environ['HOME'] + "/Desktop/"
fname = desktop + datetime.now().strftime("%Y%m%d-%H%M%S." + type)

# Dialog box configuration
conf = '''
# Window properties
*.title = Snapshot

# Border checkbox properties
bd.type = checkbox
bd.label = Background border
bd.x = 10
bd.y = 60

# Save file checkbox properties
sf.type = checkbox
sf.label = Save file to Desktop
sf.x = 10
sf.y = 35

# Default button
db.type = defaultbutton
db.label = Clipboard

# Cancel button
cb.type = cancelbutton
'''

# Capture a portion of the screen and save it to a temporary file.
status = subprocess.call(["screencapture", "-io", "-t", type, tfname])

# Exit if the user canceled the screencapture.
if not status == 0:
  os.remove(tfname)
  sys.exit()

# Open the dialog box and get the input.
dialog = Pashua.run(conf)
if dialog['cb'] == '1':
  os.remove(tfname)
  sys.exit()

# Add a desktop background border if asked for.
snap = Image.open(tfname)
if dialog['bd'] == '1':
  # Make a solid-colored background bigger than the screenshot.
  snapsize = tuple([ x + 2*border for x in snap.size ])
  bg = Image.new('RGB', snapsize, bgcolor)
  bg.paste(snap, (border, border))
  bg.save(tfname)

# Put the image on the clipboard, save to Desktop if asked for,
# and delete the temporary file.
impbcopy = os.environ['HOME'] + '/Dropbox/bin/impbcopy'
subprocess.call([impbcopy, tfname])
if dialog['sf'] == '1':
  shutil.copyfile(tfname, fname)
os.remove(tfname)

The script uses two nonstandard Python modules:

  1. Pashua, which provides bindings to Carsten Blüm’s lovely Pashua utility for creating dialog boxes from short textual descriptions.
  2. Image from the Python Imaging Library, which, as its name suggests, provides image editing commands.

The script also calls impbcopy, Alec Jacobson’s command line utility for putting the contents of an image file onto the clipboard.

You’ll need all of these utilities and modules installed if you want to run your own SnapClip.

Most of the script was described in my post from last year. The differences between then and now are relatively minor:

If you’ve been reading ANIAT for any length of time, you’re sick of hearing me say this, but I’ll say it anyway. The great advantage of building your own tools is that you can make them fit exactly how you work. Even if how you work changes.